Paper Explained - Perceiver: General Perception with Iterative Attention (Full Video Analysis)

Inspired by the fact that biological creatures attend to multiple modalities at the same time, DeepMind has released its new Perceiver model. Based on the Transformer architecture, the Perceiver makes no assumptions about the modality of the input data and also addresses the long-standing quadratic-bottleneck problem of attention. It does this with a low-dimensional latent Transformer, into which the input data is fed repeatedly via cross-attention. The Perceiver’s weights can also be shared across layers, making it very similar to an RNN. Perceivers achieve competitive performance on ImageNet and state-of-the-art results on other modalities, all without any modality-specific adjustments to the architecture.
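The key trick can be sketched in a few lines: queries come from a small learned latent array, while keys and values come from the (much larger) input byte array, so the attention cost scales as O(N·M) in the input size M rather than O(M²). Below is a minimal NumPy sketch of that cross-attention step — not DeepMind's implementation; the learned projection matrices, multi-head structure, and subsequent latent self-attention are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_attention(latents, inputs):
    """Attend from a small latent array (N, D) to a large input array (M, D).

    Toy version: identity projections stand in for the learned
    W_q, W_k, W_v matrices of a real model.
    """
    q, k, v = latents, inputs, inputs
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (N, M), not (M, M)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over inputs
    return weights @ v                               # (N, D)

M, N, D = 10_000, 256, 64          # many input elements, few latents
inputs = rng.standard_normal((M, D))
latents = rng.standard_normal((N, D))  # learned queries in the real model

# One Perceiver block: cross-attend to the inputs; a real model would then
# run self-attention in the cheap N x N latent space and repeat, optionally
# sharing weights across repeats (the RNN-like view mentioned above).
latents = cross_attention(latents, inputs)
print(latents.shape)  # (256, 64) — output size is independent of M
```

Because the output shape depends only on the latent array, the same architecture can consume images, audio, or point clouds by simply flattening them into an (M, D) array.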

0:00 - Intro & Overview
2:20 - Built-In Assumptions of Computer Vision Models
5:10 - The Quadratic Bottleneck of Transformers
8:00 - Cross-Attention in Transformers
10:45 - The Perceiver Model Architecture & Learned Queries
20:05 - Positional Encodings via Fourier Features
23:25 - Experimental Results & Attention Maps
29:05 - Comments & Conclusion

Paper: [2103.03206] Perceiver: General Perception with Iterative Attention