User contributions for M2ghorba
6 April 2025
- 00:24, 6 April 2025 −16 stat946W25 →Overall Method
- 00:22, 6 April 2025 −1 stat946W25 →1. Initial Retrieval Step:
- 00:22, 6 April 2025 −2 stat946W25 →4. Iteration and Stopping:
- 00:21, 6 April 2025 −1 stat946W25 →2. Reason Step:
- 00:20, 6 April 2025 −1 stat946W25 →1. Initial Retrieval Step:=
- 00:19, 6 April 2025 +3,476 stat946W25 →Interleaving Retrieval with Chain-of-Thought Reasoning
5 April 2025
- 23:58, 5 April 2025 +1,661 stat946W25 →Interleaving Retrieval with Chain-of-Thought Reasoning
29 March 2025
- 04:25, 29 March 2025 −2 stat946W25 →Learning Transferable Visual Models From Natural Language Supervision
- 04:24, 29 March 2025 +1 stat946W25 →Architecture and Methodology
- 04:23, 29 March 2025 +31 stat946W25 →Architecture and Methodology
- 04:17, 29 March 2025 −1 stat946W25 →Training Data and Preprocessing=
- 04:16, 29 March 2025 +2,977 stat946W25 →Robust Speech Recognition via Large-Scale Weak Supervision
- 03:05, 29 March 2025 0 N File:9-1.png No edit summary current
- 02:44, 29 March 2025 +65 stat946W25 →Topic 19: MM-LLMs
- 02:37, 29 March 2025 −1 stat946W25 →Zero-Shot Transfer & Evaluation
- 02:36, 29 March 2025 +3 stat946W25 →Zero-Shot Transfer & Evaluation
- 02:35, 29 March 2025 +6 stat946W25 →Zero-Shot Transfer & Evaluation
- 02:34, 29 March 2025 −6 stat946W25 →Pretraining with Contrastive Learning
- 02:32, 29 March 2025 +8 stat946W25 →Pretraining with Contrastive Learning
- 02:28, 29 March 2025 +18 stat946W25 →Architecture and Methodology
- 02:26, 29 March 2025 +12 stat946W25 →Pretraining with Contrastive Learning
- 02:18, 29 March 2025 +3,951 stat946W25 →Topic 19: MM-LLMs
22 March 2025
- 00:33, 22 March 2025 +603 stat946W25 →Transformer-VQ: Linear-Time Transformers via Vector Quantization
- 00:20, 22 March 2025 −4 stat946W25 →Self-Attention in Transformers
- 00:19, 22 March 2025 −9 stat946W25 →Vector Quantization (VQ) Mechanism
- 00:19, 22 March 2025 −8 stat946W25 →Training Objective
- 00:16, 22 March 2025 −4 stat946W25 →Mathematical Formulation
- 00:03, 22 March 2025 +2,557 stat946W25 →Transformer-VQ: Linear-Time Transformers via Vector Quantization
21 March 2025
- 23:08, 21 March 2025 +975 stat946W25 →Key Innovations
- 22:54, 21 March 2025 +561 stat946W25 →Topic 6: KV Cache Compression
15 March 2025
- 00:31, 15 March 2025 0 stat946W25 →Findings: Attention Layers Are Less Crucial Than MLP Layers
- 00:31, 15 March 2025 0 stat946W25 →Topic 5: KD / Pruning / Sharing
- 00:29, 15 March 2025 −6 stat946W25 →Findings: Attention Layers Are Less Crucial Than MLP Layers
- 00:29, 15 March 2025 +115 stat946W25 →Findings: Attention Layers Are Less Crucial Than MLP Layers
- 00:25, 15 March 2025 0 N File:Screenshot 2025-03-15.png No edit summary current
- 00:15, 15 March 2025 +241 stat946W25 →Findings: Attention Layers Are Less Crucial Than MLP Layers
14 March 2025
- 23:39, 14 March 2025 −6 stat946W25 →Findings: Attention Layers Are Less Crucial Than MLP Layers
- 23:38, 14 March 2025 +703 stat946W25 →Attention Is All You Need But You Don’t Need All Of It For Inference of Large Language Models
- 23:31, 14 March 2025 +163 stat946W25 →Method: Selective Layer Skipping
- 23:27, 14 March 2025 +2 stat946W25 →Method: Selective Layer Skipping
- 23:27, 14 March 2025 +55 stat946W25 →Method: Selective Layer Skipping
- 22:54, 14 March 2025 +61 stat946W25 →Method: Selective Layer Skipping
- 22:49, 14 March 2025 +34 stat946W25 →Method: Selective Layer Skipping
- 22:47, 14 March 2025 +1,068 stat946W25 →Method: Selective Layer Skipping
- 22:38, 14 March 2025 −15 stat946W25 →Method: Selective Layer Skipping
- 22:37, 14 March 2025 +18 stat946W25 →Method: Selective Layer Skipping
- 22:35, 14 March 2025 +1,755 stat946W25 →Topic 5: KD / Pruning / Sharing
13 March 2025
- 00:17, 13 March 2025 +1,079 stat946W25 →Structured State Space (S4)
12 March 2025
- 23:23, 12 March 2025 +492 stat946W25 →Gated Linear Attention (GLA)
- 23:05, 12 March 2025 +414 stat946W25 →Introduction