
Project 2: Fun with Filters and Frequencies!

Part 1: Fun with Filters

1.1: Convolutions From Scratch!

For this part, I started with a four-for-loop implementation that loops over the entire padded image array. For each (i, j) element of the image, we loop over the kernel, sum up the element-times-kernel products, and store that sum as the (i, j) value of the output to get the convolved image.
This version took very long, since it does a single scalar computation for every pixel-kernel pair. For a 1500x2000 image, it took roughly 10 seconds to complete one convolution.


import numpy as np

def brute_convolution_four(img_array, kernel):
    # Naive version: four nested loops, one scalar multiply-add per
    # (pixel, kernel element) pair. Note: this computes cross-correlation;
    # for symmetric kernels it equals convolution, otherwise flip the
    # kernel first (np.flip).
    h, w = img_array.shape
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    # Zero-pad so the output is the same size as the input
    padded = np.pad(img_array, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")

    output = np.zeros_like(img_array, dtype=np.float64)
    for i in range(h):
        for j in range(w):
            total = 0.0
            for n in range(kh):
                for m in range(kw):
                    total += padded[i + n, j + m] * kernel[n, m]
            output[i, j] = total

    return output

I then made this faster using just two for loops: still looping over the entire padded image array, but taking a slice of it and doing an elementwise multiplication of that slice with the kernel. After getting that product, I summed it and stored the result as the (i, j) value of the output.
This version was way faster, cutting the computation time down to around 2 seconds per convolution. It still wasn't close to the built-in convolve2d, however, which was almost instantaneous.


def brute_convolution_two(img_array, kernel):
    # Faster version: two loops, with the inner multiply-and-sum
    # vectorized over the kernel-sized window.
    h, w = img_array.shape
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    padded = np.pad(img_array, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")
    output = np.zeros_like(img_array, dtype=np.float64)

    for i in range(h):
        for j in range(w):
            # Elementwise product of the window and the kernel, then sum
            area = padded[i:i + kh, j:j + kw]
            output[i, j] = np.sum(area * kernel)
    return output

Original Image
Convolution on x with scipy.convolve2d
Brute force convolution on x
Built in convolution on y
Brute force convolution on y
Built in box convolution
Brute force box convolution

1.2: Finite Difference Operator

For this part, I used the built-in scipy.convolve2d to convolve the cameraman image with the Dy and Dx filters. Then I computed the gradient magnitude as the Euclidean norm of the Dy and Dx responses at each pixel. I then tested different thresholds to figure out the best one for classifying edges. I tried 0.7, but it missed some of the thinner edges, so I went down to 0.5, then 0.3, then 0.1.
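
A minimal sketch of this step (the filter definitions and names here are my own, assuming a grayscale float image in [0, 1]):

import numpy as np
from scipy.signal import convolve2d

def edge_image(img, threshold=0.35):
    # Finite-difference filters
    dx = np.array([[1.0, -1.0]])    # horizontal difference
    dy = np.array([[1.0], [-1.0]])  # vertical difference

    gx = convolve2d(img, dx, mode="same", boundary="symm")
    gy = convolve2d(img, dy, mode="same", boundary="symm")

    # Euclidean norm of the two responses = gradient magnitude
    magnitude = np.sqrt(gx**2 + gy**2)
    return magnitude > threshold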

Original cameraman image
Cameraman gradient image
Threshold = 0.7
Threshold = 0.3
Threshold = 0.1
Final threshold = 0.35

As I got to 0.3 and 0.1, a lot of noise started showing up. I couldn't get the real edges of the buildings in the back without also picking up the grass speckles using this method, so I went back up slightly past 0.3, to 0.35, to reduce the noise as much as possible while keeping the cameraman.

1.3: Derivative of Gaussian (DoG) Filter

For this part, I first created the 1D Gaussian filter G using cv2.getGaussianKernel(). Then I created the 2D kernel by taking the outer product of G with its transpose. Using that 2D kernel, I convolve the original image to smooth out the edges. I tried different sizes for the Gaussian, with 3 not smoothing enough and 10 blurring the original image too much, and ended up with a 6x1 kernel from getGaussianKernel. After that, I applied the same Dx, Dy convolutions to the smoothed image and computed the gradient magnitude using np.hypot. After trying a few thresholds, I ended up with a threshold of 0.12.
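
A sketch of the blur-then-differentiate pipeline (the kernel size and sigma below are placeholders, not the exact values I used):

import cv2
import numpy as np
from scipy.signal import convolve2d

img = np.random.rand(256, 256)       # stand-in for the grayscale cameraman
dx = np.array([[1.0, -1.0]])
dy = np.array([[1.0], [-1.0]])

g1d = cv2.getGaussianKernel(7, 1.0)  # 1D Gaussian (placeholder size/sigma)
g2d = g1d @ g1d.T                    # 2D kernel: outer product with its transpose

blurred = convolve2d(img, g2d, mode="same", boundary="symm")
gx = convolve2d(blurred, dx, mode="same", boundary="symm")
gy = convolve2d(blurred, dy, mode="same", boundary="symm")
edges = np.hypot(gx, gy) > 0.12      # gradient magnitude, then threshold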

Cameraman image blurred
Blurred cameraman gradient
Binarized cameraman

Now we try the single-convolution approach by creating derivative-of-Gaussian filters. We first convolve the Gaussian filter with the Dx and Dy filters. Then we convolve the cameraman image with these derivative-of-Gaussian filters. Once we have those responses, we can get the gradient magnitude using np.hypot and apply the same threshold to get the edge image.
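
A sketch of folding the derivative into the Gaussian first, reusing the names from the block above:

# Convolve the Gaussian with Dx/Dy once; the full convolution just grows
# the kernel by one column/row.
dog_x = convolve2d(g2d, dx)
dog_y = convolve2d(g2d, dy)

# One convolution per direction now replaces blur + derivative.
gx2 = convolve2d(img, dog_x, mode="same", boundary="symm")
gy2 = convolve2d(img, dog_y, mode="same", boundary="symm")
edges_dog = np.hypot(gx2, gy2) > 0.12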

Gradient of DoG filter
Edge image of DoG filter
Original binarized cameraman

As we can see, the DoG-filter edge image and the original Gaussian-blurred edge image look exactly the same.

Part 2: Fun with Frequencies!

2.1: Image "Sharpening"

For this part, I first blurred the original grayscale image by convolving it with a Gaussian kernel, separating out the lower frequencies. Then I extracted the high frequencies (the details) by subtracting the blurred grayscale image from the original grayscale image. Finally, I added the details back to the original image, using alpha = 0.8. Changing alpha changed how strong the detail lines were, giving more sharpening as alpha increased and less sharpening as alpha decreased.
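
A minimal sketch of the unsharp-mask step (ksize and sigma are illustrative assumptions):

import cv2
import numpy as np
from scipy.signal import convolve2d

def sharpen(img, alpha=0.8, ksize=9, sigma=2.0):
    # Low-pass with a Gaussian, keep the residual details, add them back
    # scaled by alpha (unsharp masking).
    g1d = cv2.getGaussianKernel(ksize, sigma)
    g2d = g1d @ g1d.T
    low = convolve2d(img, g2d, mode="same", boundary="symm")
    details = img - low              # high frequencies only
    return np.clip(img + alpha * details, 0.0, 1.0)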

Taj Mahal:

The result was sharper edges on the Taj Mahal itself, with the tree silhouettes and streets emphasized, as we can see in the details image.

Taj Mahal details
Original Image
Sharpened Image

Doe Library:

For Doe Library, the building edges were also sharpened, and the tree details were enhanced. It almost looked like each branch was visible.

Doe Library details
Original Image
Sharpened Image

Blur and Resharpen:

For this, I first blurred an image with the original alpha = 10 Gaussian kernel. Then I ran the same sharpening process: Gaussian-blur the grayscale for the low frequencies, subtract those from the original grayscale to get the high frequencies, and add them back to the original (blurred) color image.

Original Image
Details
Original Image (blurred)
Sharpened Image

The sharpening brought back some of the edges of the Campanile, but it could not un-smooth the edges of the trees or the smaller buildings in the back. A lot of the high frequencies were lost during the initial blurring, so even with the sharpening process, many of the edges could not be fully recovered.

2.2: Hybrid Images

For this part, I applied a low-pass filter and a high-pass filter to extract the low and high frequencies of the two images. For the low frequencies, I used a Gaussian blur with kernel size = 6 * sigma1; for the high frequencies, I used a Gaussian blur with kernel size = 6 * sigma2 and subtracted the blur from the original. I then added the two together, scaling the details by a factor alpha to control how strongly the high frequencies come through in the hybrid.
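
A sketch of the hybrid construction under these conventions (rounding 6 * sigma up to an odd kernel size is my assumption):

import cv2
import numpy as np
from scipy.signal import convolve2d

def gaussian_blur(img, sigma):
    ksize = 6 * int(sigma) + 1                # kernel size ~ 6 * sigma, made odd
    g1d = cv2.getGaussianKernel(ksize, sigma)
    g2d = g1d @ g1d.T
    return convolve2d(img, g2d, mode="same", boundary="symm")

def hybrid(img1, img2, sigma1, sigma2, alpha):
    # Low frequencies of img1 plus alpha-scaled high frequencies of img2
    low = gaussian_blur(img1, sigma1)
    high = img2 - gaussian_blur(img2, sigma2)
    return np.clip(low + alpha * high, 0.0, 1.0)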

Derek + Nutmeg

Sigma1 = 3, Sigma2 = 5, alpha = 2. The edges of Nutmeg weren't too prominent, so I enhanced them until Nutmeg was clearly visible up close.

Derek
Nutmeg
Dermeg

Messi + Ronaldo

Sigma1 = 3, Sigma2 = 5, alpha = 0.7. The edges of Ronaldo came out very strong, so I lowered alpha until Messi was still visible from afar.

Messi
Ronaldo
Messnaldo

My favorite hybrid was Messnaldo, so I plotted the log magnitude of the Fourier transform for the images used to create it.

Original Messi FFT
Low-pass Messi FFT
Original Ronaldo FFT
High-pass Ronaldo FFT
Hybrid FFT
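
For reference, a minimal sketch of how such a log-magnitude spectrum can be computed (the small epsilon to avoid log(0) is my addition):

import numpy as np

def log_magnitude(gray):
    # Log magnitude of the centered 2D Fourier transform
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(gray))) + 1e-8)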

Jenn + Chicken the Cat

Sigma1 = 12, Sigma2 = 14, alpha = 2.5. The original images here were much larger, so I had to use a bigger Gaussian kernel to get a usable blur.

Jenn the human
Chicken the Cat
Jennken

Multi-Resolution Blending and the Oraple Journey

2.3: Gaussian and Laplacian Stacks

To build the Gaussian stack, I repeatedly blurred the previous level with a normalized Gaussian kernel (size 6 * sigma), using reflect padding instead of fill padding. I didn't do any downsampling, so the images stay the same size as the blur progresses. To build the Laplacian stack, I took the differences between consecutive Gaussian levels: each Laplacian level is the current Gaussian level minus the next Gaussian level. For the last Laplacian level, I just appended the last Gaussian level.
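
A sketch of both stacks, assuming the gaussian_blur helper sketched in 2.2 (the reflect padding comes from boundary="symm" in convolve2d):

def gaussian_stack(img, levels, sigma):
    # Repeatedly blur the previous level; no downsampling, so every
    # level keeps the original size.
    stack = [img]
    for _ in range(levels - 1):
        stack.append(gaussian_blur(stack[-1], sigma))
    return stack

def laplacian_stack(img, levels, sigma):
    # Differences of consecutive Gaussian levels; the last entry is the
    # final (most blurred) Gaussian level itself.
    g = gaussian_stack(img, levels, sigma)
    return [g[i] - g[i + 1] for i in range(levels - 1)] + [g[-1]]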

Original apple image
Original orange image

Apple Laplacian Stack (Levels 0-3)

Level 0
Level 1
Level 2
Level 3

Orange Laplacian Stack (Levels 0-3)

Level 0
Level 1
Level 2
Level 3

2.4: Multiresolution Blending (A.K.A. The Oraple!)

To actually blend the images, I built the Laplacian stacks for both images with the same number of levels. Then I generated a Gaussian stack from the mask image. After that, I looped through each level i, blending the two Laplacian levels with the corresponding mask level as mask * A + (1 - mask) * B. To get the final image, I summed over all the blended levels to recover the pixel values. For my custom blend, I put a cat face onto an orange, using a circular mask to capture the cat's face.
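
A sketch of the blend, reusing the stack helpers sketched in 2.3 (the level count and sigma defaults are placeholders):

import numpy as np

def blend(img_a, img_b, mask, levels=5, sigma=2.0):
    # Blend each pair of Laplacian levels with the matching Gaussian
    # level of the mask, then sum all the blended levels.
    la = laplacian_stack(img_a, levels, sigma)
    lb = laplacian_stack(img_b, levels, sigma)
    gm = gaussian_stack(mask, levels, sigma)   # mask values in [0, 1]
    blended = [m * a + (1.0 - m) * b for a, b, m in zip(la, lb, gm)]
    return np.clip(sum(blended), 0.0, 1.0)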

Original apple image
Original orange image
Laplacian Stack
Apple mask
Orange mask
Oraple

Campanile x Big Ben

Big Ben
Campanile
Big Campanile

Chicken the Orange

Chicken the cat!
Orange
Orange Cat!