| Package | Version | Citation |
|---|---|---|
| keras | 2.15.0 | @keras |
| safetensors | 0.1.2 | @safetensors |
| tensorflow | 2.16.0.9000 | @tensorflow |
What is Convolution?
Inspiration
Consider that you live in a high-rise apartment complex. Have you ever heard an ambulance go by? How does the sound of the siren change as the ambulance approaches your dwelling and then goes past it, to get lost amidst the surrounding buildings again?
The siren’s emitted sound is always the same. It is the local surroundings and the geometry of the echoes that bring the same sound to your ears again and again, but in altered, weighted form. The sound from the ambulance spreads in all directions, hits one or another of the buildings, reflects, and comes back to your ears after a delay, weighted by the strength of the echo geometry.
What you hear is the overlapping of multiple, weighted copies of the sound emitted by the ambulance. As long as you have a direct, i.e. non-reflected, path from the ambulance to your ears, the echoes are relatively subdued. Once the vehicle gets right into your building complex and you lose the direct line-of-sight path, the echoes take over and the sound becomes a confused mass that is barely recognizable.
What is Convolution?
All right, what does this have to do with convolution? Let us make some definitions first:
The free-space medium, plus the buildings and other things that reflect sound in our environment, is called the “Channel”. The channel acts as a conduit between a source (transmitter) and a receiver.
The geometry of the echoes that connect transmitter to receiver, including the bounces off the walls, the resulting path-delays, and the path-weightings, is together denoted as the impulse response of the channel. This is what the channel would put out at the receiver if the source transmitter were to emit a very-short-duration signal, like the squeak of a mouse.
Now, most signals emitted by a source are not “squeak-like”: the ambulance has a siren that continuously emits its well-known sound. Such a continuous signal can be mathematically decomposed into a series of “squeak-like” signals, which we call impulses.
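As a small sketch of this idea in discrete time (an assumption made here purely for illustration; the signals in this article are continuous), a sampled signal \(in[n]\) can be written as a sum of scaled, shifted unit impulses \(\delta[n-k]\):

\[ in[n] = \sum_{k} in[k]\, \delta[n-k] \]

Each term is one “squeak”, scaled by the signal’s value at that instant.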
So finally:
Each impulse undergoes the same geometry of path-delays and path-weightings posed by the channel impulse response. This is shown diagrammatically below:
We see that impulses in the input waveform that arrive later undergo weighting by the earliest of the path-delays and path-weightings. This should give you the intuition that, mathematically, this is like taking a weighted average, but with the sequence of weights inverted in time!
If \(in(t)\) is the emitted sound waveform, and \(f(t)\) is the channel impulse response, we write the output of the channel as:
\[ \Large{out(t) = \int_{-\infty}^{\infty} in(\tau)\, f(t-\tau)\, d\tau} \tag{1}\]
Note that we are integrating with respect to the delay \(\tau\), and \(f\) uses negative \(\tau\) as its variable. Hence it is inverted in time, as shown in the bottom left of Figure 1.
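For sampled signals, a discrete-time sketch of Equation 1 (assuming a finite-length impulse response) replaces the integral with a sum, which is the form we will implement in code:

\[ out[n] = \sum_{k} in[k]\, f[n-k] \]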
Convolution in Code
We’ll see how Equation 1 translates into code.
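As a minimal sketch, assuming NumPy and some made-up toy values for the waveform and the channel taps (none of these values come from the article), here is a direct implementation of the discrete convolution sum, checked against `np.convolve`:

```python
import numpy as np

def convolve_direct(inp, f):
    """Direct discrete convolution: out[n] = sum_k inp[k] * f[n - k]."""
    out = np.zeros(len(inp) + len(f) - 1)
    for n in range(len(out)):
        for k in range(len(inp)):
            if 0 <= n - k < len(f):
                out[n] += inp[k] * f[n - k]
    return out

# Toy "siren" waveform and toy channel impulse response
# (illustrative values only).
siren = np.array([0.0, 1.0, 0.5, -0.5, -1.0, 0.0])
channel = np.array([1.0, 0.6, 0.3])  # direct path plus two weaker, delayed echoes

out = convolve_direct(siren, channel)
print(out)
print(np.allclose(out, np.convolve(siren, channel)))  # should print True
```

The nested loop mirrors Equation 1: every input sample is delayed and weighted by the channel taps, and the delayed, weighted copies overlap and add at the output.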
Wait, But Why?
- A perceptron computes a weighted sum (a dot product) of its inputs; convolution applies that same dot-product operation repeatedly as the weights slide along the signal.
- Convolution is crucial to the workings of Convolutional Neural Networks. The early (spatial) filter layers in a CNN implement convolution with impulse responses (kernels) that learn to look for edges, curves, and similar canonical pieces in an input image.
- When we generate guitar-like sounds using the Karplus-Strong Guitar Algorithm, we are using a set of filters (with low-pass/band-pass impulse responses) in the feedback loop of a delay-line primed with random noise.
- Convolution can be seen as a series of Vector Dot Products between two vectors sliding past each other, as the sketch after this list illustrates.
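That last point can be sketched in NumPy as well (reusing the toy arrays from the previous section; again an illustrative sketch, not the article’s code): time-reverse the channel taps, zero-pad the input, and each output sample becomes the dot product of the reversed taps with a sliding window of the input.

```python
import numpy as np

siren = np.array([0.0, 1.0, 0.5, -0.5, -1.0, 0.0])
channel = np.array([1.0, 0.6, 0.3])

# Zero-pad so every window of length len(channel) is defined.
padded = np.pad(siren, (len(channel) - 1, len(channel) - 1))
flipped = channel[::-1]  # weights reversed in time, as in Figure 1

out = np.array([
    np.dot(flipped, padded[n:n + len(channel)])
    for n in range(len(siren) + len(channel) - 1)
])
print(np.allclose(out, np.convolve(siren, channel)))  # should print True
```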
References
To be Written Up.