Deep Learning
Fast Fourier Convolution — A detailed view
Solving issues with Vanilla CNNs

Hello all, hope you are doing well. In this post, we will go through a research paper titled “Fast Fourier Convolution”. The paper examines the limitations of the convolution operator in CNNs and proposes a new operator to replace it. It reports improved accuracy on computer vision tasks such as image classification and action recognition.
The ResNet architecture was modified to use the FFC operator in place of its convolution layers, which improved Top-1 accuracy on the ImageNet dataset by 1.5 points. Accuracy improved in a similar manner on action recognition and human keypoint detection tasks across different architectures. More on this in the results section.
Problems with CNNs
- Receptive field: the part of the image that one filter can access. Most CNNs use 3x3 filters, which have small receptive fields. Stacking layers solves this to some extent, but context-sensitive tasks such as human pose estimation need a large receptive field.
- Cross-scale fusion: CNNs provide different levels of feature abstraction at different stages. For accurate spatial detection, fusing these features is preferred. Works like FPN do this, but with additional layers, which increases the complexity of the network.
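To make the receptive-field point concrete, here is a minimal sketch of the arithmetic. It assumes plain stride-1, undilated convolutions stacked one after another; real networks also use strides and pooling, which grow the receptive field faster. The function names are made up for illustration.

```python
import math

def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field (one side, in pixels) of stacked stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def layers_to_cover(extent, kernel_size=3):
    """Smallest number of such layers whose receptive field spans `extent` pixels."""
    return math.ceil((extent - 1) / (kernel_size - 1))

for n in (1, 2, 5, 10):
    print(n, stacked_receptive_field(n))   # 3, 5, 11, 21

print(layers_to_cover(224))   # 112
```

Under these assumptions, each extra 3x3 layer adds only 2 pixels per side, so covering a 224-pixel-wide input takes over a hundred layers.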
FFC is a novel convolutional operator that efficiently implements non-local receptive fields and fuses multi-scale information.
Idea
- Borrowed from the spectral domain (spectral convolution theorem)
- Updating a single value in the spectral domain affects all of the original data globally
- Convert the spatial features to spectral features, apply some operations, then convert back to spatial features
- Operations in the spectral domain extend the receptive field of the convolution to the full resolution of the input feature map
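The second and third points can be checked with a toy example. The sketch below is not from the paper: it uses a naive 1-D DFT written with the standard library and a made-up signal, edits one spectral value, and converts back.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft(); returns complex samples."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]   # made-up "spatial" samples

spectrum = dft(signal)      # spatial -> spectral
spectrum[1] += 4.0          # update a single value in the spectral domain
changed = idft(spectrum)    # spectral -> spatial (complex, since the conjugate bin was not edited)

# How far each spatial sample moved from its original value.
deltas = [abs(c - s) for c, s in zip(changed, signal)]
print([round(d, 3) for d in deltas])   # [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
```

Every sample moves by the same amount (4 / 8 = 0.5) even though only one spectral value was touched, which is the global effect the idea relies on.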
FFC Components
The idea is to replace the convolution layer (Conv2D) with the FFC block.
The FFC block consists of two paths: local and global. The local path applies ordinary convolutions to the input feature maps, and the global path operates in the spectral domain. You can get an idea from the figure below.

The actual FFC has interconnections between the two paths, as shown below.

The output blocks Y_l and Y_g are calculated as below.
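In equation form, the update can be sketched as follows. This assumes X_l and X_g denote the local and global parts of the input (the input channels split into two groups), and writes the post's f_l_g and f_g_l as f_{l→g} and f_{g→l}:

```latex
\begin{aligned}
Y_l &= f_l(X_l) + f_{g \to l}(X_g), \\
Y_g &= f_g(X_g) + f_{l \to g}(X_l).
\end{aligned}
```

Each output sums one term from its own path and one term that crosses over from the other path.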

Except for f_g(), the other three functions (f_l, f_l_g, f_g_l) are convolution layers. The f_g() function is a Spectral Transformer.
You can see the code block below for the FFC block for better understanding.
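What follows is a simplified NumPy sketch of the two-path data flow, not the authors' implementation. It makes several assumptions: 1x1 channel mixing stands in for the convolution layers, f_g is reduced to FFT, 1x1 mixing with ReLU on the stacked real and imaginary parts, then inverse FFT (batch norm and the other parts of the Spectral Transformer are left out), and all names, weights and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def pointwise_conv(x, weight):
    """1x1 convolution: mix channels at every pixel. x: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def spectral_transform(x, weight):
    """Stand-in for f_g: FFT -> channel mixing in the spectral domain -> inverse FFT.

    Real and imaginary parts are stacked as channels, so weight is (2*C_out, 2*C_in).
    """
    _, h, w = x.shape
    freq = np.fft.rfft2(x, norm="ortho")                       # (C, H, W//2 + 1), complex
    stacked = np.concatenate([freq.real, freq.imag], axis=0)   # (2C, H, W//2 + 1), real
    mixed = np.maximum(pointwise_conv(stacked, weight), 0.0)   # 1x1 mixing + ReLU
    c_out = mixed.shape[0] // 2
    freq_out = mixed[:c_out] + 1j * mixed[c_out:]              # back to complex
    return np.fft.irfft2(freq_out, s=(h, w), norm="ortho")     # (C_out, H, W), real

def ffc_block(x_l, x_g, params):
    """Y_l = f_l(X_l) + f_g_l(X_g);  Y_g = f_g(X_g) + f_l_g(X_l)."""
    y_l = pointwise_conv(x_l, params["f_l"]) + pointwise_conv(x_g, params["f_g_l"])
    y_g = spectral_transform(x_g, params["f_g"]) + pointwise_conv(x_l, params["f_l_g"])
    return y_l, y_g

c_l, c_g, h, w = 4, 4, 8, 8   # hypothetical channel split and feature-map size
params = {
    "f_l": rng.normal(size=(c_l, c_l)),
    "f_g_l": rng.normal(size=(c_l, c_g)),
    "f_l_g": rng.normal(size=(c_g, c_l)),
    "f_g": rng.normal(size=(2 * c_g, 2 * c_g)),
}
x_l = rng.normal(size=(c_l, h, w))
x_g = rng.normal(size=(c_g, h, w))

y_l, y_g = ffc_block(x_l, x_g, params)
print(y_l.shape, y_g.shape)   # (4, 8, 8) (4, 8, 8)

# Nudge one pixel of the global input and count the output positions that move.
x_g_perturbed = x_g.copy()
x_g_perturbed[0, 0, 0] += 1.0
y_l2, y_g2 = ffc_block(x_l, x_g_perturbed, params)
local_changed = int((np.abs(y_l2 - y_l).max(axis=0) > 1e-9).sum())
global_changed = int((np.abs(y_g2 - y_g).max(axis=0) > 1e-9).sum())
print(local_changed, global_changed)
```

In this sketch the nudge reaches Y_l only through the 1x1 f_g_l term, so it stays at one position, while the spectral path is expected to spread it over most of the map in Y_g. With real 3x3 convolutions the local footprint would be a small neighbourhood rather than a single pixel.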
