Convolution
Convolution is one of those ideas that sounds more intimidating than it is. At its core, it is just a sliding window operation.
Say you have a 1D signal, such as an accelerometer reading or a time series of audio samples. You also have a smaller array called a kernel (or filter). You slide the kernel along the signal, and at each position you multiply each kernel value by the signal value it overlaps and add up the products. That weighted sum becomes one output value, and the full sequence of outputs is the transformed, filtered signal.
1D Convolution
[Interactive demo: a 1×3 kernel slides across a 1D signal one step at a time, starting at position 0. The feature map builds up as the kernel detects its target pattern, and each output value reflects how well the kernel matches the local patch.]
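The sliding-window idea can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `slide_1d`, the example signal, and the kernel values are all made up for the example. Note that the kernel is applied without flipping it, which is the convention deep learning libraries usually follow (strictly speaking, this is cross-correlation).

```python
import numpy as np

def slide_1d(signal, kernel):
    """Slide `kernel` along `signal` and return the weighted sum at each position."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Only keep positions where the kernel fully overlaps the signal.
    n_out = len(signal) - len(kernel) + 1
    out = np.empty(n_out)
    for i in range(n_out):
        window = signal[i:i + len(kernel)]   # the patch under the kernel
        out[i] = np.sum(window * kernel)     # multiply elementwise, then add
    return out

# Hypothetical example data: a flat signal with a raised plateau in the middle.
signal = [0, 0, 1, 1, 1, 0, 0]
kernel = [-1, 0, 1]  # illustrative 1x3 kernel that responds to a rising step

feature_map = slide_1d(signal, kernel)
print(feature_map)  # [ 1.  1.  0. -1. -1.]
```

The output is positive where the signal steps up, zero on the flat plateau, and negative where it steps down, which is what "how well the kernel matches the local patch" looks like in numbers.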
In 2D, the world of images, the mechanics are identical, just extended into two dimensions. The kernel is now a small 2D matrix (say, 3 × 3) that slides across both the width and the height of the image. At each position, it multiplies each kernel value by the pixel value it covers and sums the products. The output is a feature map: a new 2D array that records how strongly the kernel's pattern was detected at each location.
2D Convolution
[Interactive demo: a 3×3 kernel slides across an image one step at a time, starting at position (row 0, col 0). The feature map builds up as the kernel detects its target pattern, and each output value reflects how well the kernel matches the local patch.]
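The 2D case is the same loop with one more axis. The sketch below assumes a tiny 5 × 5 "image" of increasing numbers and an averaging kernel, both chosen only to make the arithmetic easy to check by hand; `slide_2d` is a hypothetical helper name.

```python
import numpy as np

def slide_2d(image, kernel):
    """Slide a 2D kernel over a 2D image and return the feature map."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    # Output shrinks because the kernel must fit entirely inside the image.
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + kh, c:c + kw]   # pixels under the kernel
            out[r, c] = np.sum(patch * kernel)  # weighted sum for this position
    return out

# Hypothetical example data: pixel values 0..24 laid out row by row.
image = np.arange(25).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0  # illustrative averaging kernel

feature_map_2d = slide_2d(image, kernel)
print(feature_map_2d.shape)  # (3, 3)
print(feature_map_2d)
```

A 5 × 5 input with a 3 × 3 kernel gives a 3 × 3 feature map, since there are only three positions along each axis where the kernel fits completely.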
Why This Matters Practically
A 3 × 3 kernel designed to detect horizontal edges will produce large-magnitude values wherever there are horizontal edges in the image, regardless of where in the image those edges appear. The same kernel, applied at every position, detects the same feature everywhere. This is the key insight behind convolutional neural networks (CNNs): one small set of weights is reused across the entire image.
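To see the "same feature everywhere" property concretely, the sketch below applies one horizontal-edge kernel to two images that contain the same bright block in different places. The kernel values (a Prewitt-style pattern), the image sizes, and the helper name `slide_2d` are assumptions made for illustration.

```python
import numpy as np

def slide_2d(image, kernel):
    """Slide a 2D kernel over a 2D image and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Illustrative horizontal-edge kernel: dark above, bright below gives a positive value.
edge_kernel = np.array([[-1.0, -1.0, -1.0],
                        [ 0.0,  0.0,  0.0],
                        [ 1.0,  1.0,  1.0]])

# Two hypothetical 8x8 images with the same bright block at different locations.
image_a = np.zeros((8, 8))
image_a[2:4, 1:4] = 1.0          # block near the top-left
image_b = np.zeros((8, 8))
image_b[5:7, 4:7] = 1.0          # same block shifted 3 rows down, 3 columns right

resp_a = slide_2d(image_a, edge_kernel)
resp_b = slide_2d(image_b, edge_kernel)

# Location of the first strongest response in each feature map.
peak_a = np.unravel_index(np.argmax(resp_a), resp_a.shape)
peak_b = np.unravel_index(np.argmax(resp_b), resp_b.shape)
print(resp_a.max(), peak_a)
print(resp_b.max(), peak_b)
```

The peak response has the same strength in both feature maps, and its location moves by exactly the amount the block moved. The kernel did not need to be changed or retrained to find the edge in its new position.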
3D Convolution
In volumetric data (medical scans such as CT or MRI) or video (a sequence of 2D frames over time), convolution can extend to three dimensions. The kernel moves across width, height, and depth, where depth is the slice index in a scan or the time axis in a video. If you are interested in medical imaging or video analysis, 3D convolutions are worth exploring, but the focus going forward is 2D.
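For completeness, here is the same sliding-window loop with a third axis. It is only a sketch under assumed shapes: the volume is a made-up stack of four 6 × 6 slices (or frames) filled with ones, and `slide_3d` is a hypothetical helper name.

```python
import numpy as np

def slide_3d(volume, kernel):
    """Slide a 3D kernel through a 3D volume (depth, height, width)."""
    volume = np.asarray(volume, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kd, kh, kw = kernel.shape
    # One output position for every place the kernel fits fully inside the volume.
    out_shape = tuple(v - k + 1 for v, k in zip(volume.shape, kernel.shape))
    out = np.empty(out_shape)
    for d in range(out_shape[0]):
        for r in range(out_shape[1]):
            for c in range(out_shape[2]):
                block = volume[d:d + kd, r:r + kh, c:c + kw]
                out[d, r, c] = np.sum(block * kernel)
    return out

# Hypothetical example data: 4 slices (or video frames) of 6x6 values, all ones.
volume = np.ones((4, 6, 6))
kernel = np.ones((2, 3, 3))  # spans 2 slices and a 3x3 spatial patch

result = slide_3d(volume, kernel)
print(result.shape)  # (3, 4, 4)
```

Each output value here is 18, the number of entries in the 2 × 3 × 3 kernel, because every entry multiplies a one. With real scan or video data the kernel would instead respond to patterns that span neighboring slices or frames.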