
Project 2: Fun with Filters and Frequencies!

Part 1: Fun with Filters

1.1: Convolutions From Scratch!

For this part, I started with a four-for-loop implementation that loops over the entire padded image array. For each (i, j) element of the image, we loop over the kernel, sum up the element-times-kernel products, and store that sum as the (i, j) value of the output to get the convolved image.
This version took very long, since it does a single scalar computation for every pixel-kernel pair. For a 1500x2000 image, it took roughly 10 seconds to complete one convolution.


import numpy as np

def brute_convolution_four(img_array, kernel):
    # Naive version: four nested loops, one scalar multiply-add per
    # (pixel, kernel element) pair. Note: this computes cross-correlation;
    # for symmetric kernels it equals convolution, otherwise flip the
    # kernel first (np.flip).
    h, w = img_array.shape
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    # Zero-pad so the output is the same size as the input
    padded = np.pad(img_array, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")

    output = np.zeros_like(img_array, dtype=np.float64)
    for i in range(h):
        for j in range(w):
            total = 0.0
            for n in range(kh):
                for m in range(kw):
                    total += padded[i + n, j + m] * kernel[n, m]
            output[i, j] = total

    return output

I then made this faster using just two for loops: still looping over the entire padded image array, but taking a slice of it and doing an elementwise multiplication of that slice with the kernel. After getting that product, I summed it and stored the result as the (i, j) value of the output.
This version was way faster, cutting the computation time down to around 2 seconds per convolution. It still wasn't close to the built-in convolve2d, however, which was almost instantaneous.


def brute_convolution_two(img_array, kernel):
    # Faster version: two loops, with the inner multiply-and-sum
    # vectorized over the kernel-sized window.
    h, w = img_array.shape
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    padded = np.pad(img_array, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")
    output = np.zeros_like(img_array, dtype=np.float64)

    for i in range(h):
        for j in range(w):
            # Elementwise product of the window and the kernel, then sum
            area = padded[i:i + kh, j:j + kw]
            output[i, j] = np.sum(area * kernel)
    return output

Original Image
Convolution on x with scipy.convolve2d
Brute force convolution on x
Built in convolution on y
Brute force convolution on y
Built in box convolution
Brute force box convolution

1.2: Finite Difference Operator

For this part, I used the built-in scipy.convolve2d to convolve the cameraman image with the Dy and Dx filters. Then I computed the gradient magnitude as the Euclidean norm of the Dy and Dx responses at each pixel. I then tested different thresholds to figure out the best one for classifying edges. I tried 0.7, but it missed some of the thinner edges, so I went down to 0.5, then 0.3, then 0.1.
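
A minimal sketch of this step (the filter definitions and names here are my own, assuming a grayscale float image in [0, 1]):

import numpy as np
from scipy.signal import convolve2d

def edge_image(img, threshold=0.35):
    # Finite-difference filters
    dx = np.array([[1.0, -1.0]])    # horizontal difference
    dy = np.array([[1.0], [-1.0]])  # vertical difference

    gx = convolve2d(img, dx, mode="same", boundary="symm")
    gy = convolve2d(img, dy, mode="same", boundary="symm")

    # Euclidean norm of the two responses = gradient magnitude
    magnitude = np.sqrt(gx**2 + gy**2)
    return magnitude > threshold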

Original cameraman image
Cameraman gradient image
Threshold = 0.7
Threshold = 0.3
Threshold = 0.1
Final threshold = 0.35

As I got to 0.3 and 0.1, a lot of noise started showing up. I couldn't get the real edges of the buildings in the back without also picking up the grass speckles using this method, so I went back up slightly past 0.3, to 0.35, to reduce the noise as much as possible while keeping the cameraman.

1.3: Derivative of Gaussian (DoG) Filter

For this part, I first created the 1D Gaussian filter G using cv2.getGaussianKernel(). Then I created the 2D kernel by taking the outer product of G with its transpose. Using that 2D kernel, I convolve the original image to smooth out the edges. I tried different sizes for the Gaussian, with 3 not smoothing enough and 10 blurring the original image too much, and ended up with a 6x1 kernel from getGaussianKernel. After that, I applied the same Dx, Dy convolutions to the smoothed image and computed the gradient magnitude using np.hypot. After trying a few thresholds, I ended up with a threshold of 0.12.
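
A sketch of the blur-then-differentiate pipeline (the kernel size and sigma below are placeholders, not the exact values I used):

import cv2
import numpy as np
from scipy.signal import convolve2d

img = np.random.rand(256, 256)       # stand-in for the grayscale cameraman
dx = np.array([[1.0, -1.0]])
dy = np.array([[1.0], [-1.0]])

g1d = cv2.getGaussianKernel(7, 1.0)  # 1D Gaussian (placeholder size/sigma)
g2d = g1d @ g1d.T                    # 2D kernel: outer product with its transpose

blurred = convolve2d(img, g2d, mode="same", boundary="symm")
gx = convolve2d(blurred, dx, mode="same", boundary="symm")
gy = convolve2d(blurred, dy, mode="same", boundary="symm")
edges = np.hypot(gx, gy) > 0.12      # gradient magnitude, then threshold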

Cameraman image blurred
Blurred cameraman gradient
Binarized cameraman

Now we try the single-convolution approach by creating derivative-of-Gaussian filters. We first convolve the Gaussian filter with the Dx and Dy filters. Then we convolve the cameraman image with these derivative-of-Gaussian filters. Once we have those responses, we can get the gradient magnitude using np.hypot and apply the same threshold to get the edge image.
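
A sketch of folding the derivative into the Gaussian first, reusing the names from the block above:

# Convolve the Gaussian with Dx/Dy once; the full convolution just grows
# the kernel by one column/row.
dog_x = convolve2d(g2d, dx)
dog_y = convolve2d(g2d, dy)

# One convolution per direction now replaces blur + derivative.
gx2 = convolve2d(img, dog_x, mode="same", boundary="symm")
gy2 = convolve2d(img, dog_y, mode="same", boundary="symm")
edges_dog = np.hypot(gx2, gy2) > 0.12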

Gradient of DoG filter
Edge image of DoG filter
Original binarized cameraman

As we can see, the DoG-filter edge image and the original Gaussian-blurred edge image look exactly the same.

Part 2: Fun with Frequencies!

2.1: Image "Sharpening"

For this part, I first blurred the original grayscale image by convolving it with a Gaussian kernel, separating out the lower frequencies. Then I extracted the high frequencies (the details) by subtracting the blurred grayscale image from the original grayscale image. Finally, I added the details back to the original image, using alpha = 0.8. Changing alpha changed how strong the detail lines were, giving more sharpening as alpha increased and less sharpening as alpha decreased.
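
A minimal sketch of the unsharp-mask step (ksize and sigma are illustrative assumptions):

import cv2
import numpy as np
from scipy.signal import convolve2d

def sharpen(img, alpha=0.8, ksize=9, sigma=2.0):
    # Low-pass with a Gaussian, keep the residual details, add them back
    # scaled by alpha (unsharp masking).
    g1d = cv2.getGaussianKernel(ksize, sigma)
    g2d = g1d @ g1d.T
    low = convolve2d(img, g2d, mode="same", boundary="symm")
    details = img - low              # high frequencies only
    return np.clip(img + alpha * details, 0.0, 1.0)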

Taj Mahal:

The result was sharper edges on the Taj Mahal itself, with the tree silhouettes and streets emphasized, as we can see in the details image.

Taj Mahal details
Original Image
Sharpened Image

Doe Library:

For Doe Library, the building edges were also sharpened, and the tree details were enhanced. It almost looked like each branch was visible.

Doe Library details
Original Image
Sharpened Image

Blur and Resharpen:

For this, I first blurred an image with the original alpha = 10 Gaussian kernel. Then I ran the same sharpening process: Gaussian-blur the grayscale for the low frequencies, subtract those from the original grayscale to get the high frequencies, and add them back to the original (blurred) color image.

Original Image
Details
Original Image (blurred)
Sharpened Image

The sharpening brought back some of the edges of the Campanile, but it could not un-smooth the edges of the trees or the smaller buildings in the back. A lot of the high frequencies were lost during the initial blurring, so even with the sharpening process, many of the edges could not be fully recovered.

2.2: Hybrid Images

For this part, I applied a low-pass filter and a high-pass filter to extract the low and high frequencies of the two images. For the low frequencies, I used a Gaussian blur with kernel size = 6 * sigma1; for the high frequencies, I used a Gaussian blur with kernel size = 6 * sigma2 and subtracted the blur from the original. I then added the two together, scaling the details by a factor alpha to control how strongly the high frequencies come through in the hybrid.
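
A sketch of the hybrid construction under these conventions (rounding 6 * sigma up to an odd kernel size is my assumption):

import cv2
import numpy as np
from scipy.signal import convolve2d

def gaussian_blur(img, sigma):
    ksize = 6 * int(sigma) + 1                # kernel size ~ 6 * sigma, made odd
    g1d = cv2.getGaussianKernel(ksize, sigma)
    g2d = g1d @ g1d.T
    return convolve2d(img, g2d, mode="same", boundary="symm")

def hybrid(img1, img2, sigma1, sigma2, alpha):
    # Low frequencies of img1 plus alpha-scaled high frequencies of img2
    low = gaussian_blur(img1, sigma1)
    high = img2 - gaussian_blur(img2, sigma2)
    return np.clip(low + alpha * high, 0.0, 1.0)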

Derek + Nutmeg

Sigma1 = 3, Sigma2 = 5, alpha = 2. The edges of Nutmeg weren't too prominent, so I enhanced them until Nutmeg was clearly visible up close.

Derek
Nutmeg
Dermeg

Messi + Ronaldo

Sigma1 = 3, Sigma2 = 5, alpha = 0.7. The edges of Ronaldo came out very strong, so I lowered alpha until Messi was still visible from afar.

Messi
Ronaldo
Messnaldo

My favorite hybrid was Messnaldo, so I plotted the log magnitude of the Fourier transform for the images used to create it.

Original Messi FFT
Low-pass Messi FFT
Original Ronaldo FFT
High-pass Ronaldo FFT
Hybrid FFT
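
For reference, a minimal sketch of how such a log-magnitude spectrum can be computed (the small epsilon to avoid log(0) is my addition):

import numpy as np

def log_magnitude(gray):
    # Log magnitude of the centered 2D Fourier transform
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(gray))) + 1e-8)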

Jenn + Chicken the Cat

Sigma1 = 12, Sigma2 = 14, alpha = 2.5. The original images here were much larger, so I had to use a bigger Gaussian kernel to get a usable blur.

Jenn the human
Chicken the Cat
Jennken

Multi-Resolution Blending and the Oraple Journey

2.3: Gaussian and Laplacian Stacks

To build the Gaussian stack, I repeatedly blurred the previous level with a normalized Gaussian kernel (size 6 * sigma), using reflect padding instead of fill padding. I didn't do any downsampling, so the images stay the same size as the blur progresses. To build the Laplacian stack, I took the differences between consecutive Gaussian levels: each Laplacian level is the current Gaussian level minus the next Gaussian level. For the last Laplacian level, I just appended the last Gaussian level.
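
A sketch of both stacks, assuming the gaussian_blur helper sketched in 2.2 (the reflect padding comes from boundary="symm" in convolve2d):

def gaussian_stack(img, levels, sigma):
    # Repeatedly blur the previous level; no downsampling, so every
    # level keeps the original size.
    stack = [img]
    for _ in range(levels - 1):
        stack.append(gaussian_blur(stack[-1], sigma))
    return stack

def laplacian_stack(img, levels, sigma):
    # Differences of consecutive Gaussian levels; the last entry is the
    # final (most blurred) Gaussian level itself.
    g = gaussian_stack(img, levels, sigma)
    return [g[i] - g[i + 1] for i in range(levels - 1)] + [g[-1]]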

Original apple image
Original orange image

Apple Laplacian Stack (Levels 0-3)

Level 0
Level 1
Level 2
Level 3

Orange Laplacian Stack (Levels 0-3)

Level 0
Level 1
Level 2
Level 3

2.4: Multiresolution Blending (A.K.A. The Oraple!)

To actually blend the images, I built the Laplacian stacks for both images with the same number of levels. Then I generated a Gaussian stack from the mask image. After that, I looped through each level i, blending the two Laplacian levels with the corresponding mask level as mask * A + (1 - mask) * B. To get the final image, I summed over all the blended levels to recover the pixel values. For my custom blend, I put a cat face onto an orange, using a circular mask to capture the cat's face.
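
A sketch of the blend, reusing the stack helpers sketched in 2.3 (the level count and sigma defaults are placeholders):

import numpy as np

def blend(img_a, img_b, mask, levels=5, sigma=2.0):
    # Blend each pair of Laplacian levels with the matching Gaussian
    # level of the mask, then sum all the blended levels.
    la = laplacian_stack(img_a, levels, sigma)
    lb = laplacian_stack(img_b, levels, sigma)
    gm = gaussian_stack(mask, levels, sigma)   # mask values in [0, 1]
    blended = [m * a + (1.0 - m) * b for a, b, m in zip(la, lb, gm)]
    return np.clip(sum(blended), 0.0, 1.0)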

Original apple image
Original orange image
Laplacian Stack
Apple mask
Orange mask
Oraple

Campanile x Big Ben

Big Ben
Campanile
Big Campanile

Chicken the Orange

Chicken the cat!
Orange
Orange Cat!