AI Tool: OpenMythos

# [TITLE]
## 🔓 Decoding Claude Mythos: Exploring the Mystery of "Recurrent-Depth Transformers" with OpenMythos 🌀

---

# [IMAGE_PROMPT]
*A hyper-realistic 8k cinematic shot of a futuristic neural network architecture. The image features a glowing, iridescent, looped crystalline structure representing a recurrent transformer block, with data streams flowing in a recursive spiral. Neon blue and deep violet lighting, cybernetic circuitry background, high-tech laboratory atmosphere, volumetric lighting, Unreal Engine 5 render, extremely detailed.*

---

# [LABELS]
`#LLM` `#OpenSource` `#Transformer` `#RecurrentDepthTransformer` `#ClaudeMythos` `#DeepLearning` `#AIArchitecture` `#GitHub` `#Python` `#NeuralNetworks`

---

# [CONTENT]

### 🚀 Introduction

In the current LLM arms race, everyone is focused on parameter scale, but can true "reasoning capability" be achieved through architectural innovation instead? **OpenMythos** is an exciting open-source endeavor that attempts to reconstruct the rumored "Claude Mythos" architecture from first principles, based on publicly available research literature.

This project challenges the linear stacking pattern of traditional Transformers, introducing the concept of "Recurrent-Depth": the model "thinks deeply" by looping its hidden state multiple times within a single forward pass, without emitting any intermediate tokens.

---

### 🛠️ Key Features

OpenMythos is not a simple stack of layers, but a sophisticated system composed of three distinct stages:

1. **Prelude**: Standard transformer blocks responsible for the initial encoding of the input.
2. **Recurrent Block**: The core innovation. This block iterates up to `max_loop_iters` times. Unlike traditional models, it reuses the same set of weights, deepening reasoning through repeated updates of the hidden state $h$.
3. **Coda**: Final standard transformer layers that transform the highly abstract post-loop state into the final output.
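The three stages above can be sketched in plain PyTorch. This is a hypothetical reading of the description, not the actual OpenMythos code: the class name and layer choices are illustrative, and single `TransformerEncoderLayer`s stand in for full blocks.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the Prelude -> Recurrent Block -> Coda pipeline.
# Names follow the article's terminology, not the real OpenMythos API.
class ThreeStageSketch(nn.Module):
    def __init__(self, vocab_size, dim, n_heads=4, default_loops=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.prelude = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.recurrent = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.coda = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)
        self.default_loops = default_loops

    def forward(self, ids, n_loops=None):
        n = self.default_loops if n_loops is None else n_loops
        e = self.prelude(self.embed(ids))   # Stage 1: initial encoding
        h = e
        for _ in range(n):                  # Stage 2: ONE weight set, reused n times
            h = self.recurrent(h + e)       # re-inject the prelude encoding
        return self.head(self.coda(h))      # Stage 3: decode to vocabulary logits
```

Note that depth here is a runtime argument (`n_loops`), not a fixed property of the parameter count.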

Additionally, it supports switching between **MLA (Multi-head Latent Attention)** and **GQA (Grouped-Query Attention)**, and uses a **sparse MoE (Mixture of Experts)** structure in the feed-forward networks, granting it exceptional compute adaptability.
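For readers unfamiliar with sparse MoE feed-forward layers, here is a minimal top-k routing sketch. The routing scheme, expert sizes, and `top_k` value are assumptions for illustration, not the project's actual implementation.

```python
import torch
import torch.nn as nn

# Minimal top-k sparse Mixture-of-Experts FFN (illustrative, not OpenMythos's code).
class SparseMoE(nn.Module):
    def __init__(self, dim, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)   # scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):
        b, s, d = x.shape
        flat = x.reshape(-1, d)                   # route per token
        weights, idx = self.router(flat).softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over top-k
        out = torch.zeros_like(flat)
        for k in range(self.top_k):
            for e_id in range(len(self.experts)):
                mask = idx[:, k] == e_id          # tokens whose k-th choice is e_id
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e_id](flat[mask])
        return out.reshape(b, s, d)
```

Each token only activates `top_k` experts, so compute per token stays roughly constant while total parameter count grows with `n_experts`.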

---

### 💡 Why It Matters

**Why is this more powerful than simply adding more layers?**

**1. Silent Reasoning:**
Unlike Chain-of-Thought (CoT), which requires text output in order to "think," OpenMythos's reasoning happens in latent space. The model performs multiple internal logical loops before emitting a single token.

**2. Preventing Signal Drift:**
To avoid forgetting the original input over many loops, OpenMythos introduces an "Input Injection" mechanism: $h_{t+1} = A \cdot h_t + B \cdot e + \text{Transformer}(h_t, e)$. Here, $e$ is the encoding from the Prelude, ensuring the model stays anchored to the original prompt while deepening its thought process.
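The update above can be read as follows, with $A$ and $B$ as learned linear maps. This is one illustrative interpretation of the formula; the real block may condition on $e$ differently (here the transformer term approximates $\text{Transformer}(h_t, e)$ by feeding the sum $h + e$ through a shared layer).

```python
import torch
import torch.nn as nn

dim = 64
A = nn.Linear(dim, dim, bias=False)   # mixes the previous hidden state h_t
B = nn.Linear(dim, dim, bias=False)   # re-injects the prelude encoding e
block = nn.TransformerEncoderLayer(dim, 4, batch_first=True)  # shared weights

def step(h, e):
    # h_{t+1} = A·h_t + B·e + Transformer(h_t, e); the conditioning on e
    # is approximated here by summing h and e before the shared block.
    return A(h) + B(e) + block(h + e)

e = torch.randn(2, 16, dim)   # prelude encoding: (batch, seq, dim)
h = torch.zeros_like(e)
for _ in range(4):            # four "silent" reasoning loops
    h = step(h, e)
```

Because $e$ is added back at every step, the iterated state cannot drift arbitrarily far from the original prompt encoding.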

**3. Compute-Adaptive:**
By adjusting the number of loops (`n_loops`), users can balance "fast response" against "deep thinking" without switching model weights.
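The compute dial works because the recurrent weights are fixed while the loop count is a runtime argument. A toy stand-in for the recurrent block makes the point:

```python
import torch
import torch.nn as nn

dim = 32
shared = nn.Linear(dim, dim)   # one set of weights, reused on every loop

def run(e, n_loops):
    # Toy recurrent update standing in for the real block: the SAME
    # parameters are applied n_loops times, so compute scales with
    # n_loops while parameter count stays constant.
    h = torch.zeros_like(e)
    for _ in range(n_loops):
        h = torch.tanh(shared(h) + e)
    return h

e = torch.randn(2, dim)
fast = run(e, n_loops=1)   # cheap, shallow "reflex" answer
deep = run(e, n_loops=8)   # 8x the compute, identical weights
```

The same trade-off is what `model(ids, n_loops=...)` exposes in the quick-start below.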

---

### 📦 Quick Start

Installation is straightforward; you can use `pip` or `uv` directly:

```bash
pip install open-mythos
# or
uv pip install open-mythos
```

**Minimal Implementation:**
You can quickly define a configuration and initialize the model. Here is an example using the MLA attention mechanism:

```python
import torch
from open_mythos.main import OpenMythos, MythosConfig

# Configuration: model dimensions, number of experts, and loop iterations
cfg = MythosConfig(
    vocab_size=1000, dim=256, n_heads=8,
    max_loop_iters=4, n_experts=8,
    attn_type="mla", kv_lora_rank=32, q_lora_rank=64
)

model = OpenMythos(cfg)

# Dummy input; run the recurrent block for 4 loops during inference
ids = torch.randint(0, cfg.vocab_size, (2, 16))
logits = model(ids, n_loops=4)
print(f"Logits shape: {logits.shape}")
```

---

### 🔗 Conclusion & GitHub Link

OpenMythos is more than just a codebase; it is a "theoretical laboratory" for the future of LLMs. It demonstrates that gains in reasoning capability need not come from larger parameter counts; a more likely breakthrough lies in reusing existing weights more efficiently through recursion.

If you are interested in the evolution of Transformers, the revival of recurrent neural networks, or the underlying logic of Claude, this project is absolutely worth a star!

👉 **GitHub Repository:** [kyegomez/OpenMythos](https://github.com/kyegomez/OpenMythos)
👉 **Documentation:** Check out the `docs/open_mythos.md` in the repo for a full API reference.
