Understanding Convolution in Deep Learning (2015)

(timdettmers.com)

142 points by teptoria 5 years ago | 20 comments

nabla9 5 years ago

Everyone must first get over the terminology confusion. Convolution in DL is actually cross-correlation, not convolution. In practice it does not matter, since the kernel is just flipped, but it can be very confusing when you try to learn and go through examples.

r_c_a_d 5 years ago

The terminology comes from signal processing, where a convolution in the frequency domain is equivalent to a multiplication in the time domain. I don't think anyone is thinking about the frequency domain in deep learning, but they still call the operators convolution kernels.

XMPPwocky 5 years ago

Fuck the frequency domain, here-

"Convolution with a kernel K" describes a system whose impulse response is K. In discrete time, suppose you have K=[1,2] and convolve [0,1,2,0] with it- you wind up with [0,1,4,4,0], if I'm awake enough for arithmetic.

Correlation with a kernel K is convolution with K time-reversed (i.e. [2,1])- you'd get [0,2,5,2,0] (again if I'm awake). Note that 5- right there, the input signal "lines up just right" with the kernel- 2x2 + 1x1. That's why it's called correlation- its output is big when the input looks like the kernel.
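The arithmetic here is easy to check mechanically. Below is a minimal sketch in plain Python; the helper names `convolve` and `correlate` are made up for illustration and do not come from the article or from any library.

```python
def convolve(signal, kernel):
    """Full discrete convolution: out[n] = sum_m signal[m] * kernel[n - m]."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def correlate(signal, kernel):
    """Cross-correlation, computed as convolution with the time-reversed kernel."""
    return convolve(signal, kernel[::-1])


x = [0, 1, 2, 0]
K = [1, 2]

conv_out = convolve(x, K)
corr_out = correlate(x, K)
print(conv_out)  # [0, 1, 4, 4, 0]
print(corr_out)  # [0, 2, 5, 2, 0]
```

The 5 in the correlation output is the position where the input [1,2] sits directly under the kernel [1,2].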

qppo 5 years ago

I mean ultimately it comes from functional analysis and differential equations (not signal processing).

It's a binary operator on functions that yields a third function. It has a lot of useful properties and equivalences; for example, the Fourier transform of a convolution is the product of the two functions' Fourier transforms (although that's a very roundabout way to describe it).
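The Fourier equivalence can be checked numerically on a tiny example. This is only a sketch: it uses a naive O(n^2) DFT written out by hand rather than a real FFT library, and the function names (`dft`, `convolve_via_dft`, `convolve_direct`) are invented for illustration.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse when inverse=True)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t, v in enumerate(values)
        )
        out.append(total / n if inverse else total)
    return out


def convolve_via_dft(a, b):
    """Convolve by multiplying the two spectra and transforming back."""
    n = len(a) + len(b) - 1  # zero-pad so circular convolution equals linear
    fa = dft(a + [0] * (n - len(a)))
    fb = dft(b + [0] * (n - len(b)))
    product = [p * q for p, q in zip(fa, fb)]
    return [round(v.real, 9) for v in dft(product, inverse=True)]


def convolve_direct(a, b):
    """Reference implementation: the plain sum-of-products definition."""
    out = [0] * (len(a) + len(b) - 1)
    for i, p in enumerate(a):
        for j, q in enumerate(b):
            out[i + j] += p * q
    return out


a = [0, 1, 2, 0]
b = [1, 2]
via_dft = convolve_via_dft(a, b)
direct = convolve_direct(a, b)
```

Both routes agree up to floating-point rounding; the zero-padding step matters, because without it the DFT product gives a circular convolution instead.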

You're actually introduced to convolution in middle school when you're taught how to multiply binomials to build a polynomial (at my middle school they called it "FOIL").
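To make the polynomial connection concrete: multiplying two polynomials convolves their coefficient lists. A minimal sketch, with a hypothetical helper `poly_multiply` and coefficients stored lowest degree first:

```python
def poly_multiply(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b  # x^i * x^j contributes to the x^(i+j) term
    return out


# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
product = poly_multiply([1, 2], [3, 4])
print(product)  # [3, 10, 8]
```

The middle coefficient, 10, is the "outer plus inner" part of FOIL: 1*4 + 2*3.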

YetAnotherNick 5 years ago

Multiplying in the frequency domain is convolution; in DL terminology, "convolution" is that convolution with the weights rotated by 180 degrees.
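The 180-degree relationship can be shown on a small 2D example. This is a sketch in plain Python with made-up helper names, using "valid" windows only (no padding, stride 1); it is not how any framework actually implements the operation.

```python
def rotate_180(kernel):
    """Flip a 2D kernel both vertically and horizontally."""
    return [row[::-1] for row in kernel[::-1]]


def cross_correlate_2d(image, kernel):
    """Sliding-window sum of products with the kernel as-is (the DL operation)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(ow)
        ]
        for r in range(oh)
    ]


def convolve_2d(image, kernel):
    """True convolution: the kernel is indexed backwards along both axes."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
                for i in range(kh) for j in range(kw))
            for c in range(ow)
        ]
        for r in range(oh)
    ]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

dl_conv = cross_correlate_2d(image, kernel)             # [[-4, -4], [-4, -4]]
true_conv = convolve_2d(image, kernel)                  # [[4, 4], [4, 4]]
rotated = convolve_2d(image, rotate_180(kernel))        # same as dl_conv
```

With an asymmetric kernel the two operations give different outputs, and rotating the kernel by 180 degrees makes them coincide.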

YetAnotherNick 5 years ago

Since it is mostly done with learned weights, whose initialization and every subsequent operation behave the same with or without the flip, you can picture it whichever way you find easier.

dnautics 5 years ago

It is a convolution. Grant Sanderson (3Blue1Brown) explains the relationship between a filter and the Fourier transform towards the end of this hot-off-the-presses video: https://mitmath.github.io/18S191/Fall20/lecture2/

m0zg 5 years ago

> actually cross-correlation

That doesn't help one understand what it is at all. Convolution in DL is simply a set of dot products of a patch from the input with a bunch of filters. Each resulting dot product is simply a measure of similarity between a patch and a filter. That's all there is to it.
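The "set of dot products" view can be sketched in a few lines. This example is 1D for brevity, the function names and the two filters are invented for illustration, and the dot product is left unnormalized, so it is only a rough similarity score.

```python
def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def conv_layer_1d(signal, filters):
    """For each filter, take one dot product per sliding patch of the signal."""
    width = len(filters[0])
    patches = [signal[i:i + width] for i in range(len(signal) - width + 1)]
    return [[dot(patch, f) for patch in patches] for f in filters]


signal = [0, 0, 1, 2, 1, 0, 0]
filters = [
    [1, 2, 1],   # responds to a "bump"
    [-1, 0, 1],  # responds to a rising edge
]

responses = conv_layer_1d(signal, filters)
print(responses[0])  # [1, 4, 6, 4, 1]
print(responses[1])  # [1, 2, 0, -2, -1]
```

The bump filter peaks at the patch [1, 2, 1], which is exactly the filter itself, while the edge filter is positive on the way up and negative on the way down.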

lacker 5 years ago

IMO calling it "convolution" in deep learning is extra confusing, because the word "convolution" means many fairly different things in other contexts.

The idea behind convolution in deep learning is that, if a particular pattern of pixels is meaningful, then it is probably also meaningful if you shift the whole thing in some direction. So you can force some layers of the network to be the same under translation, and it'll be faster to pick up some sorts of patterns.
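The translation idea can be illustrated directly: slide the same small kernel over an input, move the pattern, and the response moves with it. A minimal 1D sketch with a hypothetical helper `correlate_valid`:

```python
def correlate_valid(signal, kernel):
    """Apply the same kernel at every position (shared weights, no padding)."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


kernel = [1, 2, 1]
signal = [0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0]  # the same pattern moved two steps to the right

out = correlate_valid(signal, kernel)
out_shifted = correlate_valid(shifted, kernel)
print(out)          # [4, 6, 4, 1, 0]
print(out_shifted)  # [0, 1, 4, 6, 4]
```

The peak response of 6 moves from index 1 to index 3, matching the two-step shift of the input, and the kernel needed only three weights regardless of the signal length.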

Der_Einzige 5 years ago

You didn't explain why it's faster though.

It's faster because it reduces the dimensionality of the inputs down to something manageable (hundreds or low thousands). You can replace convolutions with most other types of dimensionality reduction (including other types of layers) and outside of image tasks you'll get very similar or even better performance.

elcritch 5 years ago

I wonder if that’d work by doing a 2D Fourier transform on an image beforehand, but you’re not reducing dimensionality that way.

bonoboTP 5 years ago

Convolution was an established term in image processing long before convnets were invented in the 90s (or 80s or whenever). It's the same thing. It's useful to learn a bit of basic image processing, edge detection, etc. before jumping straight into the flashiest, shiniest DL model with no basic foundations to conceptualize what is happening.

(Even before that it was used in signal processing.)

timkofu 5 years ago

Thanks for sharing this.

ellisv 5 years ago

Published in 2015