
🚀 OpenMythos: Unlocking the Secrets of Recurrent-Depth Transformers

In current LLM development, developers face a major pain point: how can a model's complex-reasoning ability be improved without increasing its total parameter count? Traditional Transformers rely on stacking hundreds of layers with distinct weights, which creates immense memory pressure and computational redundancy. OpenMythos aims to break this impasse: based on a theoretical reconstruction of the Claude Mythos architecture, it explores the possibility of Recurrent-Depth Transformers (RDT), letting the model 'think deeper' by looping over shared weights within a single forward pass rather than simply adding more layers. This targets 'compute-adaptive reasoning', a core focus of current AI research. 🛠...
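To make the "weight loop" idea concrete, here is a minimal toy sketch of recurrent depth in NumPy. It is an illustrative assumption, not the actual OpenMythos or Claude Mythos design: a single shared weight matrix `W_shared` is applied repeatedly, so parameter count stays constant no matter how many depth steps the model "thinks" for, whereas a conventional stack would need one weight matrix per layer.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hidden width (toy size, chosen for illustration)

# One shared block: the SAME weight matrix is reused at every depth step.
# (Hypothetical toy; a real RDT block would be a full transformer layer.)
W_shared = rng.normal(scale=0.1, size=(d, d))

def recurrent_depth_forward(x, num_loops):
    """Apply the shared weights num_loops times (the 'weight loop')."""
    h = x
    for _ in range(num_loops):
        # Residual-style re-injection of the input keeps the loop anchored
        # to the original token representation.
        h = np.tanh(h @ W_shared + x)
    return h

x = rng.normal(size=(d,))
shallow = recurrent_depth_forward(x, num_loops=2)
deep = recurrent_depth_forward(x, num_loops=16)  # more compute, same weights

# Parameter count is constant in depth: d*d for any loop count,
# versus num_layers * d * d for a stack of distinct layers.
print(W_shared.size)  # 256 parameters regardless of num_loops
```

The key property of compute-adaptive reasoning shows up in `num_loops`: it can be chosen per input at inference time (more loops for harder prompts) without touching the parameter budget.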