LoongForge: A Modular Engine for Cross-Modal Large Model Training

A modular, scalable, and highly efficient training framework designed for language, multimodal, and embodied models.

🔎 AT A GLANCE

Category: AI Training Framework
Pricing: Open Source
Best For: Large-scale model pre-training and SFT across heterogeneous hardware
GitHub Stars: ⭐ 136

🚀 Introduction

LoongForge is part of the Loong open-source series from Baidu's Baige AI infrastructure platform. Built on Megatron-LM with deep optimizations, it delivers an efficient, highly extensible solution covering the full pipeline from pre-training and continued pre-training to supervised fine-tuning (SFT).

🛠️ Key Features

Natively supports LLMs, VLMs, VLAs, and diffusion models; flexible component abstractions make it straightforward to add new modalities.

Provides advanced parallelism and memory-management optimizations that significantly reduce training costs and accelerate model development.

Offers native, high-performance support for both NVIDIA GPUs and Kunlun XPUs, ensuring seamless migration and stable scaling across heterogeneous hardware clusters.

💡 Tech Highlights

Uses a configuration-driven approach to assemble VLMs from interchangeable ViT and LLM components.
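The configuration-driven assembly described above might look roughly like the following. This is an illustrative sketch only; the registry, component names, and `build_vlm` helper are hypothetical and not LoongForge's actual API.

```python
# Hypothetical sketch of config-driven VLM assembly: each role in the config
# names an interchangeable component, and a registry builds it from its spec.
COMPONENT_REGISTRY = {
    "vit-large": lambda cfg: f"ViT(layers={cfg['layers']}, hidden={cfg['hidden']})",
    "llama-8b": lambda cfg: f"LLM(layers={cfg['layers']}, hidden={cfg['hidden']})",
}

vlm_config = {
    "vision_encoder": {"name": "vit-large", "layers": 24, "hidden": 1024},
    "language_model": {"name": "llama-8b", "layers": 32, "hidden": 4096},
}

def build_vlm(config):
    """Assemble a VLM by instantiating each component named in the config."""
    return {
        role: COMPONENT_REGISTRY[spec["name"]](spec)
        for role, spec in config.items()
    }

model = build_vlm(vlm_config)
print(model["vision_encoder"])  # ViT(layers=24, hidden=1024)
```

Swapping a component is then a one-line config change rather than a code change, which is the point of the abstraction.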

Assigns independent tensor-/data-parallel sizes to different model components (e.g., the vision encoder and the LLM) for optimal throughput and memory efficiency.
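To see why per-component parallel sizes matter, here is a minimal sketch (an assumption, not LoongForge's code) of how the data-parallel degree falls out once each component picks its own tensor-parallel size on the same cluster:

```python
# Illustrative sketch: with a fixed world size, a component's data-parallel
# degree is what remains after its tensor-parallel degree is chosen.
def dp_size(world_size, tp_size):
    """Data-parallel replicas left over after tensor parallelism."""
    assert world_size % tp_size == 0, "TP size must divide world size"
    return world_size // tp_size

world = 64
# A small vision encoder rarely needs the LLM's tensor-parallel degree.
vit_tp, llm_tp = 1, 8
print(dp_size(world, vit_tp))  # 64 ViT replicas
print(dp_size(world, llm_tp))  # 8 LLM replicas
```

Forcing both components onto the LLM's TP size would waste the ViT's capacity; giving each its own size keeps all 64 devices productively busy.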

Splits the vision encoder and LLM into independent tasks, eliminating pipeline bubbles and preventing ViT computation from blocking LLM throughput.

Uses a load-aware data redistribution algorithm to correct data-parallel load imbalance caused by sequence packing.
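The general idea behind load-aware redistribution can be sketched with a greedy longest-processing-time assignment: route each packed sample to the currently lightest data-parallel rank. This is a generic illustration of the technique, not the specific algorithm LoongForge ships.

```python
# Hedged sketch: balance per-rank token totals by greedily assigning the
# heaviest samples first to whichever rank currently has the least load.
import heapq

def redistribute(token_counts, num_ranks):
    """Return (total_tokens, rank, assigned_samples) per rank, balanced."""
    heap = [(0, rank, []) for rank in range(num_ranks)]
    heapq.heapify(heap)
    for tokens in sorted(token_counts, reverse=True):
        load, rank, items = heapq.heappop(heap)  # lightest rank so far
        items.append(tokens)
        heapq.heappush(heap, (load + tokens, rank, items))
    return sorted(heap)

# Packed samples with very uneven token counts across two ranks:
for total, rank, items in redistribute([900, 120, 800, 300, 450, 700], 2):
    print(f"rank {rank}: {total} tokens {items}")
```

Without redistribution, a naive round-robin split can leave one rank with far more tokens per step, and every other rank stalls at the gradient all-reduce waiting for it.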

📦 Quick Start

Install dependencies and configure the target hardware environment (NVIDIA GPU or Kunlun XPU).

Define ViT and LLM components in configuration files to assemble the desired multimodal model architecture.

Configure parallelism strategies (tensor/data parallel) and launch a pre-training or SFT job.

Ready to try LoongForge?

Go to the GitHub page →
