File list
This special page shows all uploaded files.
Date | Name | Size | User | Description | Versions |
---|---|---|---|---|---|
14:31, 19 March 2025 | results on WikiText2.png | 144 KB | K4liang | | 2 |
13:41, 19 March 2025 | QwithRMSNorm.png | 172 KB | K4liang | | 1 |
13:38, 19 March 2025 | rmsnorm.png | 196 KB | K4liang | | 1 |
19:32, 18 March 2025 | echoatt.png | 324 KB | Aelmancy | https://doi.org/10.48550/arXiv.2409.14595 | 1 |
23:47, 17 March 2025 | GLA Results2.png | 112 KB | W33jiang | | 1 |
23:37, 17 March 2025 | GLA Results.png | 131 KB | W33jiang | | 1 |
23:22, 17 March 2025 | RetNet Results.png | 171 KB | W33jiang | | 1 |
20:09, 17 March 2025 | invariance Theorem.png | 160 KB | K4liang | | 1 |
12:52, 17 March 2025 | sortcut.png | 125 KB | J3bright | | 1 |
11:12, 17 March 2025 | cost flashfft.png | 90 KB | J3bright | | 1 |
23:57, 15 March 2025 | BigBird Results.png | 480 KB | W33jiang | | 1 |
23:46, 15 March 2025 | BigBird Sparse Attention.png | 142 KB | W33jiang | | 1 |
23:05, 15 March 2025 | Flash Attentionv3-Results.png | 142 KB | X56tan | | 1 |
23:04, 15 March 2025 | Flash Attentionv3-PingPong.png | 57 KB | X56tan | | 1 |
00:25, 15 March 2025 | Screenshot 2025-03-15.png | 43 KB | M2ghorba | | 1 |
21:54, 14 March 2025 | KD.png | 115 KB | P2zheng | Sequence-level knowledge distillation | 1 |
18:30, 14 March 2025 | Screenshot 2025-03-14 182945.png | 28 KB | A4ngan | | 1 |
21:07, 13 March 2025 | Flash Attention V2 Attention forward + backward speed on A100 GPU.png | 255 KB | W33jiang | | 1 |
21:32, 12 March 2025 | RobustAlgorithm.jpg | 60 KB | K4liang | | 2 |
00:50, 11 March 2025 | retnet comparison.png | 71 KB | Aelmancy | comparing retnet to other models. from @misc{sun2023retentivenetworksuccessortransformer, title={Retentive Network: A Successor to Transformer for Large Language Models}, author={Yutao Sun and Li Dong and Shaohan Huang and Shuming Ma and Yuqing Xia and Jilong Xue and Jianyong Wang and Furu Wei}, year={2023}, eprint={2307.08621}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2307.08621}, } | 1 |
23:01, 10 March 2025 | RetNet dual.png | 64 KB | M54rahma | Dual form of RetNet | 1 |
17:39, 10 March 2025 | retnet impossible triangle.png | 65 KB | Aelmancy | impossible triangle from @article{sun2023retentive, title={Retentive network: A successor to transformer for large language models}, author={Sun, Yutao and Dong, Li and Huang, Shaohan and Ma, Shuming and Xia, Yuqing and Xue, Jilong and Wang, Jianyong and Wei, Furu}, journal={arXiv preprint arXiv:2307.08621}, year={2023} } | 1 |
00:26, 8 March 2025 | Overview of Sparse Sinkhorn Attention.png | 87 KB | X56tan | | 1 |
20:18, 7 March 2025 | Mamba2-VennDiag.png | 57 KB | X56tan | | 1 |
20:17, 7 March 2025 | Mamba2-SSD.png | 69 KB | X56tan | | 1 |
20:17, 7 March 2025 | Mamba2-architecture.png | 55 KB | X56tan | | 1 |
20:05, 7 March 2025 | Mamba2-Matrix Efficient Algo.png | 145 KB | X56tan | | 1 |
14:03, 7 March 2025 | memory-recall-tradeoff.png | 132 KB | Yc24wang | | 1 |
13:10, 7 March 2025 | Mamba-Architecture.png | 119 KB | Yc24wang | | 1 |
21:33, 6 March 2025 | H3 synthetic tasks eval.png | 75 KB | W33jiang | | 2 |
21:24, 6 March 2025 | Synthetic tasks.png | 80 KB | W33jiang | | 1 |
21:21, 6 March 2025 | H3 layer.png | 54 KB | W33jiang | | 1 |
18:07, 3 March 2025 | 13.13.png | 35 KB | Rtymkow | | 1 |
15:41, 2 March 2025 | mode collapse.png | 178 KB | Ksuszek | | 1 |
15:19, 2 March 2025 | RNN Structure.png | 75 KB | M54rahma | RNN structure | 1 |
21:01, 1 March 2025 | GAN part d.png | 28 KB | Ksuszek | | 1 |
21:01, 1 March 2025 | GAN part c.png | 28 KB | Ksuszek | | 1 |
21:01, 10 February 2025 | 10.11.png | 120 KB | Rtymkow | | 1 |
18:48, 7 February 2025 | RNN plot.png | 215 KB | W258xu | | 1 |
15:58, 7 February 2025 | ouput.png | 188 KB | Thudon | | 1 |
15:57, 7 February 2025 | output.png | 188 KB | Thudon | | 1 |
12:40, 7 February 2025 | sentence similarity matrix.png | 15 KB | Fjean | | 1 |
10:32, 7 February 2025 | Screenshot 2025-02-07 093140.png | 62 KB | A22amiri | | 1 |
09:14, 7 February 2025 | Screenshot 2025-02-07 081441.png | 240 KB | A22amiri | | 1 |
19:52, 4 February 2025 | 7.10a.png | 119 KB | Rtymkow | | 1 |
16:28, 4 February 2025 | double pendulum test.png | 78 KB | Ksuszek | | 1 |
01:46, 31 January 2025 | LSTM v.s. Dense model.png | 271 KB | Z238zhan | The plot of the accuracy of the LSTM model and the dense model | 1 |
20:15, 30 January 2025 | Recurrent NN .png | 429 KB | Z238zhan | Screenshot from Lecture 8 | 1 |
08:10, 30 January 2025 | Exercise8 1.png | 33 KB | C63ng | Plot for exercise 8.1 | 1 |
18:35, 28 January 2025 | different pooling strategies.png | 257 KB | Z238zhan | Validation accuracy for different pooling strategies on CIFAR-10 dataset | 1 |