A Big Collection of Modified Attention Mechanisms
极市平台
2020-08-29 08:09
极市 Editor's Note
How can attention be made more efficient? This article surveys the relevant papers and organizes their citation counts, code implementations, algorithmic complexity, and key ideas for easy side-by-side comparison. A minimal code sketch of the linear-attention idea shared by several of these papers follows the table.
Efficient Attention
Paper (citations) | Implementation | Complexity | AutoRegressive | Main Idea |
---|---|---|---|---|
Generating Wikipedia by Summarizing Long Sequences[1] (208) | memory-compressed-attention[2] | | | |
CBAM: Convolutional Block Attention Module[3] (677) | attention-module[4] | | | |
CCNet: Criss-Cross Attention for Semantic Segmentation[5] (149) | CCNet[6] | | | |
Efficient Attention: Attention with Linear Complexities[7] (2) | efficient-attention[8] | | | |
Star-Transformer[9] (24) | fastNLP[10] | | | |
Generating Long Sequences with Sparse Transformers[11] (139) | torch-blocksparse[12] | | | |
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond[13] (96) | GCNet[14] | | | |
SCRAM: Spatially Coherent Randomized Attention Maps[15] (1) | - | | | |
Interlaced Sparse Self-Attention for Semantic Segmentation[16] (13) | IN_PAPER | | | |
Permutohedral Attention Module for Efficient Non-Local Neural Networks[17] (2) | Permutohedral_attention_module[18] | | | |
Large Memory Layers with Product Keys[19] (28) | XLM[20] | | | |
Expectation-Maximization Attention Networks for Semantic Segmentation[21] (38) | EMANet[22] | | | |
Compressive Transformers for Long-Range Sequence Modelling[23] (20) | compressive-transformer-pytorch[24] | | | |
BP-Transformer: Modelling Long-Range Context via Binary Partitioning[25] (8) | BPT[26] | | | |
Axial Attention in Multidimensional Transformers[27] (5) | axial-attention[28] | | | |
Reformer: The Efficient Transformer[29] (69) | trax[30] | | | |
Transformer on a Diet[31] (2) | transformer-on-diet[32] | | | |
Sparse Sinkhorn Attention[33] (4) | sinkhorn-transformer[34] | | | |
SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection[35] (1) | - | | | |
Efficient Content-Based Sparse Attention with Routing Transformers[36] (11) | routing-transformer[37] | | | |
Longformer: The Long-Document Transformer[38] (15) | longformer[39] | | | |
Neural Architecture Search for Lightweight Non-Local Networks[40] (2) | AutoNL[41] | | | |
ETC: Encoding Long and Structured Data in Transformers[42] (2) | - | | | |
Multi-scale Transformer Language Models[43] (1) | IN_PAPER | | | |
Synthesizer: Rethinking Self-Attention in Transformer Models[44] (5) | - | | | |
Jukebox: A Generative Model for Music[45] (9) | jukebox[46] | | | |
GMAT: Global Memory Augmentation for Transformers[47] (0) | gmat[48] | | | |
Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers[49] (0) | google-research[50] | | | |
Hand-crafted Attention is All You Need? A Study of Attention on Self-supervised Audio Transformer[51] (0) | - | | | |
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention[52] (1) | fast-transformers[53] | | | |
Linformer: Self-Attention with Linear Complexity[54] (3) | linformer-pytorch[55] | | | |
Real-time Semantic Segmentation with Fast Attention[56] (0) | - | | | |
Fast Transformers with Clustered Attention[57] (0) | fast-transformers[58] | | | |
Big Bird: Transformers for Longer Sequences[59] (0) | - | | | |
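As a concrete illustration of the kind of modification these papers make, here is a minimal sketch of the kernelized linear-attention idea used, in spirit, by Efficient Attention[7] and Transformers are RNNs[52]: replace softmax(QKᵀ)V with φ(Q)(φ(K)ᵀV), which drops the cost from O(n²) to O(n) in sequence length. This sketch is not taken from any of the listed repositories; the feature map φ(x) = elu(x) + 1 and the tensor shapes are illustrative assumptions.

```python
# Minimal sketch of linearized (kernel feature-map) attention.
# Assumption: phi(x) = elu(x) + 1, as in "Transformers are RNNs" [52].
import torch
import torch.nn.functional as F


def linear_attention(q, k, v, eps=1e-6):
    """Compute attention as phi(Q) (phi(K)^T V), costing O(n) in the
    sequence length instead of the O(n^2) of softmax(Q K^T) V.

    q, k, v: tensors of shape (batch, seq_len, dim).
    """
    q = F.elu(q) + 1  # feature map phi(.) keeps values positive
    k = F.elu(k) + 1
    # Aggregate keys and values first: sum_n phi(k_n) v_n^T -> (batch, dim, dim)
    kv = torch.einsum("bnd,bne->bde", k, v)
    # Per-query normalizer: phi(q_n) . sum_n phi(k_n) -> (batch, seq_len)
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)
    # Combine: phi(q_n)^T (K^T V), normalized -> (batch, seq_len, dim)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)


if __name__ == "__main__":
    b, n, d = 2, 1024, 64
    q, k, v = (torch.randn(b, n, d) for _ in range(3))
    print(linear_attention(q, k, v).shape)  # torch.Size([2, 1024, 64])
```

Because φ(K)ᵀV is a fixed-size (dim × dim) summary of the whole sequence, memory and compute no longer grow quadratically with sequence length, which is the common thread behind many of the entries above.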