

AgentOdyssey: An Open-Ended Text Game Engine for Test-Time Continual Learning Agents

An open-ended game generation and evaluation framework designed to test the long-horizon planning and continual learning capabilities of LLM agents.

🔎 AT A GLANCE
Category: AI Agent Evaluation Framework / LLM Research Tool
Pricing: Free / Open Source
Best For: AI researchers studying continual learning, long-horizon planning, and autonomous agents
GitHub Stars: ⭐ 37

🚀 Introduction
AgentOdyssey is a lightweight interactive environment designed to challenge the adaptability of AI agents in unknown environments. It generates novel long-horizon text games and provides a rigorous evaluation mechanism for measuring agents' test-time continual learning performance.

🛠️ Key Fea...