User contributions for Aelmancy
3 April 2025
- 01:14, 3 April 2025 diff hist +81 stat946W25 →Zero-Shot Text-to-Image Generation
- 01:08, 3 April 2025 diff hist +856 stat946W25 →Topic 19: MM-LLMs
25 March 2025
- 22:30, 25 March 2025 diff hist +264 stat946W25 →Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
- 21:54, 25 March 2025 diff hist +59 stat946W25 →Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
- 21:51, 25 March 2025 diff hist 0 N File:dynamic pruning 1.png No edit summary current
- 21:50, 25 March 2025 diff hist +128 stat946W25 →Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
- 21:44, 25 March 2025 diff hist +1,711 stat946W25 →Topic 6: KV Cache Compression
20 March 2025
- 21:40, 20 March 2025 diff hist +177 stat946W25 →EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models
- 21:38, 20 March 2025 diff hist 0 N File:echoatt results3.png No edit summary current
- 21:37, 20 March 2025 diff hist −1 stat946W25 →Results
- 21:37, 20 March 2025 diff hist +3 stat946W25 No edit summary
- 21:34, 20 March 2025 diff hist +1 stat946W25 →Results
- 21:33, 20 March 2025 diff hist +4 stat946W25 →Results
- 21:32, 20 March 2025 diff hist +468 stat946W25 →EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models
- 21:25, 20 March 2025 diff hist 0 N File:echoatt results2.png No edit summary current
- 21:22, 20 March 2025 diff hist 0 N File:echoatt results1.png No edit summary current
19 March 2025
- 20:57, 19 March 2025 diff hist +384 stat946W25 →EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models
- 20:37, 19 March 2025 diff hist +17 stat946W25 →Distillation with teacher's Pseudo-Labels
- 20:36, 19 March 2025 diff hist −1 stat946W25 →Knowledge Distillation
- 20:36, 19 March 2025 diff hist +250 stat946W25 →EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models
- 20:34, 19 March 2025 diff hist +1,935 stat946W25 →EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models
18 March 2025
- 20:43, 18 March 2025 diff hist +994 stat946W25 →EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models
- 19:56, 18 March 2025 diff hist +801 stat946W25 →EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models
- 19:40, 18 March 2025 diff hist +310 stat946W25 →Topic 5: KD / Pruning / Sharing
- 19:32, 18 March 2025 diff hist +55 N File:echoatt.png https://doi.org/10.48550/arXiv.2409.14595 current
- 19:30, 18 March 2025 diff hist +1,888 stat946W25 →Topic 5: KD / Pruning / Sharing
13 March 2025
- 22:13, 13 March 2025 diff hist +21 stat946W25 →Summary & Key Takeaways
- 22:08, 13 March 2025 diff hist +94 stat946W25 →Summary & Key Takeaways
- 22:06, 13 March 2025 diff hist 0 stat946W25 →Topic 12: State Space Models
- 22:04, 13 March 2025 diff hist +1,227 stat946W25 →Topic 12: State Space Models
- 18:21, 13 March 2025 diff hist 0 stat946W25 →Key Approaches to Linear Attention
- 18:21, 13 March 2025 diff hist +10 stat946W25 →Retentive Network (RetNet): A Successor to Transformer for Large Language Models
- 18:16, 13 March 2025 diff hist −1 stat946W25 →Performance, Limitations & Future Work
- 18:15, 13 March 2025 diff hist +5 stat946W25 →Performance, Limitations & Future Work
- 18:15, 13 March 2025 diff hist −3 stat946W25 →Key Approaches to Linear Attention
- 18:14, 13 March 2025 diff hist +89 stat946W25 →Key Approaches to Linear Attention
- 18:09, 13 March 2025 diff hist +114 stat946W25 →Key Approaches to Linear Attention
- 18:05, 13 March 2025 diff hist +4,584 stat946W25 →Topic 10: Linear Attention
11 March 2025
- 01:25, 11 March 2025 diff hist +2 stat946W25 →Retentive Network (RetNet): A Successor to Transformer for Large Language Models
- 01:23, 11 March 2025 diff hist +226 stat946W25 →Retentive Network (RetNet): A Successor to Transformer for Large Language Models
- 00:54, 11 March 2025 diff hist 0 stat946W25 →Topic 10: Linear Attention
- 00:52, 11 March 2025 diff hist +327 stat946W25 →Key Approaches to Linear Attention
- 00:50, 11 March 2025 diff hist +476 N File:retnet comparison.png comparing RetNet to other models. From @misc{sun2023retentivenetworksuccessortransformer, title={Retentive Network: A Successor to Transformer for Large Language Models}, author={Yutao Sun and Li Dong and Shaohan Huang and Shuming Ma and Yuqing Xia and Jilong Xue and Jianyong Wang and Furu Wei}, year={2023}, eprint={2307.08621}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2307.08621}, } current
- 00:40, 11 March 2025 diff hist +1,199 stat946W25 →Retentive Network (RetNet): A Successor to Transformer for Large Language Models
10 March 2025
- 19:05, 10 March 2025 diff hist −44 stat946W25 →Retentive Network: A Successor to Transformer for Large Language Models
- 19:04, 10 March 2025 diff hist +325 stat946W25 →Retentive Network: A Successor to Transformer for Large Language Models
- 18:45, 10 March 2025 diff hist 0 stat946W25 →Topic 10: Linear Attention
- 18:43, 10 March 2025 diff hist +181 stat946W25 →Retentive Network: A Successor to Transformer for Large Language Models
- 18:19, 10 March 2025 diff hist +921 stat946W25 No edit summary
- 17:39, 10 March 2025 diff hist +347 N File:retnet impossible triangle.png impossible triangle from @article{sun2023retentive, title={Retentive network: A successor to transformer for large language models}, author={Sun, Yutao and Dong, Li and Huang, Shaohan and Ma, Shuming and Xia, Yuqing and Xue, Jilong and Wang, Jianyong and Wei, Furu}, journal={arXiv preprint arXiv:2307.08621}, year={2023} } current