Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL Review 2026-06-15 14 min read 0. Introduction
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation Review 2026-06-14 11 min read 0. Introduction
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning Review 2026-06-11 11 min read 0. Introduction