MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU Review 2026-06-09 13 min read 0. Introduction
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Review 2026-06-08 11 min read 0. Introduction
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Review 2026-06-07 11 min read 0. Introduction
KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance Review 2026-06-06 10 min read 0. Introduction