Paper Explained - Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (Full Video Analysis)

The Nyströmformer (also spelled Nystromformer, Nyströmer, or Nystromer) is a new drop-in replacement that approximates the self-attention matrix in Transformers with linear memory and time requirements. Most importantly, it uses the Nyström method to subselect (or segment-mean) queries and keys as so-called landmarks and uses those to reconstruct the inherently low-rank attention matrix. This is relevant for many areas of Machine Learning, especially Natural Language Processing, where it enables longer sequences of text to be processed at once.
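
For illustration, here is a minimal sketch of the core idea (not the authors' implementation): landmarks are formed by segment-meaning the queries and keys, three small softmax matrices replace the full n×n attention, and the softmax attention is reconstructed through a pseudoinverse. The function name, the `num_landmarks` parameter, and the use of `torch.linalg.pinv` are illustrative choices; the paper instead approximates the Moore–Penrose pseudoinverse iteratively and adds a depthwise-convolution skip connection.

```python
import torch

def nystrom_attention(Q, K, V, num_landmarks=64):
    """Sketch of Nystrom-approximated softmax attention.

    Q, K, V: (batch, seq_len, d) tensors; seq_len is assumed to be
    divisible by num_landmarks for the segment-mean step.
    """
    b, n, d = Q.shape
    scale = d ** -0.5

    # Landmarks: segment means of the queries and keys (m << n).
    Q_land = Q.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)
    K_land = K.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)

    # Three small softmax matrices instead of the full n x n attention matrix.
    F1 = torch.softmax(Q @ K_land.transpose(-1, -2) * scale, dim=-1)       # (b, n, m)
    A  = torch.softmax(Q_land @ K_land.transpose(-1, -2) * scale, dim=-1)  # (b, m, m)
    B  = torch.softmax(Q_land @ K.transpose(-1, -2) * scale, dim=-1)       # (b, m, n)

    # Nystrom reconstruction: softmax(Q K^T / sqrt(d)) V  ~=  F1 @ pinv(A) @ (B @ V).
    # Multiplying B @ V first keeps every intermediate linear in n.
    return F1 @ (torch.linalg.pinv(A) @ (B @ V))

# Usage: a random batch with sequence length 1024 and 64 landmarks.
x = torch.randn(2, 1024, 64)
out = nystrom_attention(x, x, x, num_landmarks=64)
print(out.shape)  # torch.Size([2, 1024, 64])
```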

OUTLINE:
0:00 - Intro & Overview
2:30 - The Quadratic Memory Bottleneck in Self-Attention
7:20 - The Softmax Operation in Attention
11:15 - Nyström-Approximation
14:00 - Getting Around the Softmax Problem
18:05 - Intuition for Landmark Method
28:05 - Full Algorithm
30:20 - Theoretical Guarantees
35:55 - Avoiding the Large Attention Matrix
36:55 - Subsampling Keys vs Negative Sampling
43:15 - Experimental Results
47:00 - Conclusion & Comments

Paper: https://arxiv.org/abs/2102.03902
Code: https://github.com/mlpen/Nystromformer
Appendix: https://github.com/mlpen/Nystromformer/blob/main/Nystromformer_Supplement.pdf
LRA Results: https://twitter.com/tanmingxing/status/1359301186734620675
