Kronos: A Foundation Model for the Language of Financial Markets Review

2026-05-21 12 min read

0. Introduction

Kronos does not treat financial time series as just another general time-series forecasting problem. The paper views candlestick (K-line) data as the language of financial markets: it converts each OHLCVA record into discrete tokens and runs next-token prediction with a decoder-only Transformer. Rather than carrying the ideas of general TSFMs such as TimesFM or Chronos over to financial data unchanged, it redesigns the tokenizer and the pretraining corpus around the noise, non-stationarity, price-volume interaction, and heavy-tail behavior of financial K-lines. That redesign is the core of the paper.

What is especially interesting is that the paper does not stop at building a single forecasting model. Kronos folds price series forecasting, return forecasting, realized volatility forecasting, synthetic K-line generation, and investment simulation into one discrete autoregressive framework. Instead of predicting financial data directly as a continuous regression target, it compresses the market state into a discrete code sequence and generates that sequence the way a language model would.

One-line summary: Kronos is a financial time-series foundation model that quantizes OHLCVA K-lines into coarse-to-fine discrete tokens and uses autoregressive pretraining on 12B+ K-line records to unify several financial prediction and generation tasks.

Reasons this paper is worth reading now:

It shows, from the angle of a domain-specific tokenizer and corpus, why general TSFMs do not always work well on financial data.
It extends the time-series foundation model from a plain forecasting backbone to a generative model that handles prediction, generation, and simulation together.
It is a good case study of why discrete tokenization can help on financial time series and what trade-offs coarse-to-fine factorization introduces.

The core of Kronos is simple. Financial market data carries structural biases that general time series do not, and to use a foundation model on it you have to redesign the representation contract before you scale the model. By representation contract I mean the agreement on which tokens an OHLCVA record becomes and in which autoregressive order those tokens are generated.

1. Problem Setting

1-1. Problem definition

The paper targets a general-purpose foundation model for financial K-line data.
A K-line is a multivariate time series containing Open, High, Low, Close, Volume, and Amount.
Financial K-lines have a low signal-to-noise ratio, strong non-stationarity, price-volume interaction, regime shifts, and heavy-tail events.
As a result, a TSFM that does well on general time-series benchmarks may still fall short of expectations on financial forecasting, volatility estimation, and synthetic generation.
The paper recasts the problem from continuous regression to discrete market language modeling.

1-2. Why previous approaches are insufficient

Traditional forecasting models often design a separate architecture per task, so price forecasting, volatility forecasting, generation, and backtesting signal generation are not tied into one framework.
General-purpose TSFMs train on time series from many domains, but financial data is a small share of their corpus and they may not capture the price-volume structure specific to K-lines.
Continuous prediction has to regress the joint structure of OHLCVA directly, which can make it fragile to noise and tail events.
In financial markets short-term patterns are very weak and regimes change often, so whether a lower loss actually translates into better investment signal quality has to be checked separately.
Existing generative time-series models can produce synthetic sequences, but handling forecasting and investment simulation with the same pretrained backbone has been less well organized.

In the end, the problem Kronos addresses is not how to attach a better forecasting head. The point is to recast financial K-lines as a token language that a foundation model can work with.

2. Core Idea

2-1. Main contribution

The main contributions of Kronos can be summarized in three points.

First, it does not use the OHLCVA K-line record as a continuous vector; it converts the record into hierarchical discrete tokens.
Second, it splits each token into a coarse subtoken and a fine subtoken, and uses an autoregressive order that predicts the coarse part first and the fine residual second.
Third, it pretrains a model family on 12B+ K-line records collected from 45 global exchanges and evaluates predictive and generative tasks together.

The important point here is to look at the tokenizer and the language model separately. The tokenizer compresses the continuous market state into a discrete vocabulary. The decoder-only Transformer learns the temporal dynamics of the resulting token sequence. Because of this separation, Kronos can handle forecasting in a form close to text generation.

2-2. Design intuition

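The design intuition behind Kronos is as follows.

For financial K-lines, shape, relative movement, volume interaction, and volatility pattern matter more than the continuous values themselves.
So instead of regressing raw values directly, converting the market state into discrete tokens can make pattern-level modeling easier.
However, using a single large vocabulary as-is makes the embedding table and the output head too large.
The token is therefore split into a coarse subtoken and a fine subtoken.
The coarse subtoken captures the main structure, and the fine subtoken fills in the residual detail.

The key factorization used in the paper can be read as follows.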

\[p(b_t \mid b_{<t}) = p(b_t^c \mid b_{<t}) \cdot p(b_t^f \mid b_{<t}, b_t^c)\]

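Here $b_t^c$ is the coarse subtoken and $b_t^f$ is the fine subtoken. Kronos does not predict the K-line token of a time step in one shot; it generates it in coarse-to-fine order. This design makes the model interpret a financial pattern in two stages: first the broad market state, then the finer variation.

In my view, the core of this paper is that it does not treat the tokenizer as mere preprocessing. In Kronos the tokenizer is not a component that fixes the input format of the downstream model; it is the most important modeling layer, the one that defines the semantic space of the market data.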

3. Architecture / Method

3-1. Overview

Item	Description
Goal	Build a domain-specific foundation model for financial K-line sequences
Input	OHLCVA K-line record
Tokenizer	Transformer autoencoder + Binary Spherical Quantization
Token structure	Coarse subtoken + fine subtoken
Backbone	Decoder-only autoregressive Transformer
Output usage	Forecasting, volatility estimation, synthetic K-line generation, investment simulation
Key difference	Recasts the problem as discrete financial language modeling rather than continuous regression

3-2. Module breakdown

1) K-line tokenization

The input is the OHLCVA vector at each time step.
Kronos maps it to a latent representation with a Transformer-based autoencoder.
It then applies Binary Spherical Quantization (BSQ) to quantize the continuous latent into a binary code.
The paper uses a 20-bit token and splits it into a coarse subtoken and a fine subtoken.
The design aims to keep a large effective vocabulary while avoiding the cost of predicting over one huge vocabulary directly.
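
As a rough illustration of why the split helps, the sketch below packs a 20-bit token id into two subtoken ids and compares the number of output classes in each case. The even 10/10 split, the choice of high bits as "coarse", and the helper names `split_token` and `merge_token` are assumptions for illustration; the text above only states that the token has 20 bits and is divided into a coarse and a fine part.

```python
TOKEN_BITS = 20   # token width reported in the paper
COARSE_BITS = 10  # assumption: an even split, chosen only for illustration
FINE_BITS = TOKEN_BITS - COARSE_BITS


def split_token(token: int) -> tuple[int, int]:
    """Split a 20-bit token id into (coarse, fine) subtoken ids."""
    if not 0 <= token < (1 << TOKEN_BITS):
        raise ValueError("token out of range")
    coarse = token >> FINE_BITS              # assumption: high bits are "coarse"
    fine = token & ((1 << FINE_BITS) - 1)    # low bits are "fine"
    return coarse, fine


def merge_token(coarse: int, fine: int) -> int:
    """Inverse of split_token."""
    return (coarse << FINE_BITS) | fine


# Output classes needed by one flat head vs. two factored heads.
flat_vocab = 1 << TOKEN_BITS
factored_vocab = (1 << COARSE_BITS) + (1 << FINE_BITS)
print(flat_vocab, factored_vocab)
```

Under these assumptions a flat head would score 1,048,576 classes per step, while two factored heads score 2,048 in total, at the price of making two sequential predictions per time step.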

Tokenizer training combines a reconstruction objective with a quantization objective.

Coarse-only reconstruction pushes the model to recover the main market shape.
Full-token reconstruction pushes it to recover the fine detail as well.
The BSQ commitment loss regularizes the continuous latent and the binary code so that they stay aligned.

The important point is that coarse and fine are not split arbitrarily; the loss design forces the two subtokens into different roles.
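
To make the quantization step concrete, here is a minimal sketch of the general BSQ idea as I understand it: project the latent onto the unit sphere, keep the sign of each coordinate as one bit, and measure how far the latent sits from its quantized point. The function name `bsq_quantize` and the exact form of the penalty are assumptions for illustration, not the paper's tokenizer code, and the entropy-style terms usually used with BSQ are omitted.

```python
import math


def bsq_quantize(latent):
    """Quantize a latent vector to a binary code on the unit sphere (sketch)."""
    norm = math.sqrt(sum(x * x for x in latent))
    if norm == 0.0:
        raise ValueError("latent must be non-zero")
    unit = [x / norm for x in latent]                # project onto the unit sphere
    bits = [1 if x >= 0 else 0 for x in unit]         # one bit per latent dimension
    scale = 1.0 / math.sqrt(len(latent))
    code = [scale if b else -scale for b in bits]     # quantized point, also unit norm
    # Commitment-style penalty: squared distance between latent and its code.
    commit = sum((u - c) ** 2 for u, c in zip(unit, code))
    return bits, code, commit


bits, code, commit = bsq_quantize([0.3, -1.2, 0.8, -0.1])
print(bits, round(commit, 4))
```

With a 20-dimensional latent this yields a 20-bit code, which matches the token width described above; how those bits are then assigned to the coarse and fine subtokens is decided by the hierarchical reconstruction losses, not by this step.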

2) Hierarchical autoregressive modeling

The tokenized sequence is fed into a decoder-only Transformer.
At each time step the coarse and fine subtokens get separate embeddings, which are concatenated and passed through a linear projection to form a fused input vector.
The Transformer uses causal attention, so it only sees past tokens.
For the next time step it first predicts the coarse subtoken distribution.
It then predicts the fine subtoken conditioned on the predicted coarse subtoken.

The training objective can be written roughly as follows.

\[\mathcal{L}_{ar} = -\mathbb{E}_{\mathcal{D}} \sum_t \left[ \log p(b_t^c \mid b_{<t}) + \log p(b_t^f \mid b_{<t}, b_t^c) \right]\]

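Interestingly, when training the fine subtoken the model does not rely only on the ground-truth coarse subtoken through teacher forcing; it conditions on a coarse subtoken sampled from its own prediction. The paper argues that this brings the training distribution closer to multi-step inference, where no ground-truth tokens are available, and thereby reduces exposure bias.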
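
The coarse-then-fine order and the per-step loss can be sketched with toy numbers. Everything below is hypothetical: the two-class distributions stand in for the model's softmax outputs at one time step, and `sample_step` and `step_nll` are illustrative names rather than functions from the Kronos code base.

```python
import math
import random

# Toy conditional distributions for a single time step (hypothetical numbers).
P_COARSE = [0.7, 0.3]                 # p(b^c | history)
P_FINE_GIVEN_COARSE = [
    [0.9, 0.1],                       # p(b^f | history, b^c = 0)
    [0.2, 0.8],                       # p(b^f | history, b^c = 1)
]


def sample_step(rng):
    """Sample the coarse subtoken first, then the fine one conditioned on it."""
    coarse = rng.choices(range(len(P_COARSE)), weights=P_COARSE)[0]
    fine_probs = P_FINE_GIVEN_COARSE[coarse]
    fine = rng.choices(range(len(fine_probs)), weights=fine_probs)[0]
    return coarse, fine


def step_nll(coarse, fine):
    """Negative log-likelihood of one (coarse, fine) pair under the factorization."""
    return -(math.log(P_COARSE[coarse])
             + math.log(P_FINE_GIVEN_COARSE[coarse][fine]))


rng = random.Random(0)
print(sample_step(rng), round(step_nll(0, 0), 4))
```

Summing `step_nll` over time steps and averaging over sequences gives the objective above. In this toy setup the fine distribution is looked up with whatever coarse value was just drawn, which mirrors the inference-time situation the sampled-coarse training trick is meant to prepare for.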

3) Model family

The paper presents three Kronos model sizes.

Model	Layers	Hidden dim	Heads	Token bits	Params
Kronos-small	8	512	8	20	24.7M
Kronos-base	12	832	16	20	102.3M
Kronos-large	18	1664	32	20	499.2M

All models are decoder-only Transformers, and the paper caps the maximum context length at 512 tokens. This choice reflects resource constraints and the deployment budget. Longer horizons are handled not by a longer context but by switching temporal granularity, for example 1-minute, 5-minute, or daily bars.

4) Temporal embedding and Transformer details

The model uses temporal embeddings to reflect the intraday, weekly, and monthly seasonality of financial markets.
Minute-of-day, hour-of-day, day-of-week, day-of-month, and month-of-year features are embedded.
The Transformer blocks use causal self-attention, RoPE, Pre-LN, and RMSNorm.
Training uses AdamW with a cosine learning-rate schedule and linear warm-up.
Warm-up starts at 10% of the peak learning rate and runs for 15,000 steps.
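
The schedule described above can be sketched as a plain function of the step count. The 10% starting fraction and the 15,000 warm-up steps come from the text; `peak_lr`, `total_steps`, and the `final_frac` floor are hypothetical parameters that the text does not specify.

```python
import math


def learning_rate(step, peak_lr, total_steps, warmup_steps=15_000,
                  warmup_start_frac=0.10, final_frac=0.0):
    """Linear warm-up from a fraction of the peak, then cosine decay."""
    if step < warmup_steps:
        start = warmup_start_frac * peak_lr
        return start + (peak_lr - start) * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    floor = final_frac * peak_lr  # assumption: decay target, not given in the text
    return floor + 0.5 * (peak_lr - floor) * (1.0 + math.cos(math.pi * progress))


for s in (0, 7_500, 15_000, 100_000):
    print(s, learning_rate(s, peak_lr=1e-3, total_steps=100_000))
```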

None of this is structurally unfamiliar. The novelty lies in the K-line tokenization and the hierarchical prediction order rather than in the Transformer block itself.

5) Inference as market trajectory generation

Inference in Kronos resembles text generation.

Convert the past K-line context into tokens.
Sample future tokens autoregressively.
Decode the generated tokens back into a continuous OHLCVA trajectory.
Adjust temperature, top-p, and sample count per task.
For tasks where precision matters, such as forecasting, use a Monte Carlo rollout that samples several future trajectories and averages them.

In the paper's setup, price series forecasting and return forecasting use temperature 0.6, top-p 0.90, and 10 samples. Realized volatility forecasting uses temperature 0.9, top-p 0.90, and 1 sample. Synthetic K-line generation uses temperature 1.0, top-p 0.95, and 1 sample.
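
The sampling knobs above can be illustrated with a small standalone sketch of temperature-scaled nucleus (top-p) sampling and step-wise averaging of sampled trajectories. The `SETTINGS` values are the ones reported above; the function names and the list-of-logits interface are assumptions for illustration and are not the Kronos inference API.

```python
import math
import random

SETTINGS = {  # values as reported in the paper's setup
    "price_and_return_forecasting": {"temperature": 0.6, "top_p": 0.90, "samples": 10},
    "realized_volatility": {"temperature": 0.9, "top_p": 0.90, "samples": 1},
    "synthetic_generation": {"temperature": 1.0, "top_p": 0.95, "samples": 1},
}


def sample_token(logits, temperature, top_p, rng):
    """Temperature-scaled nucleus (top-p) sampling over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]      # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:                                  # smallest set with mass >= top_p
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept])[0]


def mean_trajectory(paths):
    """Average several sampled trajectories step by step (Monte Carlo rollout)."""
    return [sum(step) / len(step) for step in zip(*paths)]


cfg = SETTINGS["price_and_return_forecasting"]
rng = random.Random(0)
print(sample_token([2.0, 1.0, 0.1], cfg["temperature"], cfg["top_p"], rng))
print(mean_trajectory([[1.0, 2.0], [3.0, 4.0]]))
```

A lower temperature sharpens the distribution before the top-p cut, so fewer tokens survive the cut; averaging over several rollouts then trades sample diversity for a steadier forecast.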

In my view this inference design matters quite a lot. Kronos is closer to a probabilistic trajectory generator than to a point forecast model, so the sampling hyperparameters and the aggregation method can affect the results.

4. Training / Data / Recipe

4-1. Data

The pretraining corpus is one of the paper's strongest claims.

Item	Description
Scale	12B+ K-line observations
Coverage	45 global exchanges
Frequencies	7 sampling frequencies
Asset coverage	Financial asset classes such as stocks, futures, forex, options, and crypto
Input fields	Mainly OHLCVA

The appendix also gives data quality control serious attention. Raw financial K-lines can contain artifacts such as missing values, illiquid periods, stagnant prices, structural breaks, and abnormal price jumps. Kronos applies frequency-specific thresholds to filter them out.

The parts of the preprocessing that stand out are the following.

A missing value in a price field is treated as a sequence boundary rather than imputed.
Missing volume and amount values are imputed with zero.
During training, volume and amount are randomly set to zero in 5% of samples as a regularizer, so that price-only prediction still works when volume information is unavailable.
Low-quality segments are removed based on price jumps, illiquidity, and price stagnation.
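
The first three rules above can be sketched on bars stored as plain dictionaries. The field names, the use of `None` for a missing value, and the decision to zero out volume for a whole sample at once are assumptions for illustration; the frequency-specific quality filters from the last rule are not reproduced here because the text does not give their thresholds.

```python
import random

PRICE_FIELDS = ("open", "high", "low", "close")


def split_on_missing_price(bars):
    """Cut a bar sequence wherever a price field is missing (None)."""
    segments, current = [], []
    for bar in bars:
        if any(bar[k] is None for k in PRICE_FIELDS):
            if current:                      # close the segment at the gap
                segments.append(current)
            current = []
            continue
        filled = dict(bar)
        for k in ("volume", "amount"):       # zero-impute missing volume/amount
            if filled[k] is None:
                filled[k] = 0.0
        current.append(filled)
    if current:
        segments.append(current)
    return segments


def maybe_drop_volume(segment, rng, p=0.05):
    """With probability p, zero out volume and amount for one training sample."""
    if rng.random() >= p:
        return segment
    return [{**bar, "volume": 0.0, "amount": 0.0} for bar in segment]


bars = [
    {"open": 1.0, "high": 1.2, "low": 0.9, "close": 1.1, "volume": 10.0, "amount": 11.0},
    {"open": 1.1, "high": 1.3, "low": 1.0, "close": 1.2, "volume": None, "amount": None},
    {"open": None, "high": None, "low": None, "close": None, "volume": None, "amount": None},
    {"open": 1.5, "high": 1.6, "low": 1.4, "close": 1.5, "volume": 7.0, "amount": 10.5},
]
segments = split_on_missing_price(bars)
print(len(segments), [len(s) for s in segments])
```

Treating the gap as a boundary means the model never sees the bar before the gap and the bar after it as neighbors, which is the point of the first rule.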

With financial data, data cleaning matters as much as model architecture, and the paper recognizes this fairly clearly.

4-2. Training strategy

Training can be viewed as two stages.

Tokenizer training
- A Transformer autoencoder quantizes the OHLCVA record into a discrete binary code.
- A hierarchical reconstruction loss separates the roles of the coarse and fine subtokens.
- A quantization objective stabilizes the BSQ codebook and the encoder output.
Autoregressive pretraining
- A decoder-only Transformer is trained on the tokenized K-line sequence.
- The objective is the negative log-likelihood of predicting the next token's coarse subtoken and then its fine subtoken.
- The model is scaled across small, base, and large sizes.
- To reduce imbalance in the raw corpus, the sampling weights of crypto, futures, and forex data are raised.

The setup resembles language model pretraining, except that it uses financial state tokens instead of natural language tokens.

4-3. Engineering notes

From a practitioner's point of view, the reusable parts of the recipe are the following.

Do not regress continuous OHLCVA directly; build a pattern vocabulary through discretization first.
A coarse-to-fine token split eases the trade-off between the expressiveness of a large vocabulary and the cost of the output head.
How missing values and structural breaks are handled matters for financial data. Sequences that are stitched together incorrectly teach the model regime transitions that never happened.
Randomly zeroing volume and amount is a useful regularizer when volume is unreliable or when the definition of volume differs across assets.
At inference, averaging multiple sampled trajectories can be more stable than a single deterministic prediction.
However, sampling temperature and top-p differ by task, so a deployment needs its own validation regime for them.

When reproducing or adapting Kronos, the first thing to check is the data schema, not the model code. Which assets, frequencies, corporate actions, and liquidity conditions are allowed into a single sequence is likely to have a large effect on performance.

5. Evaluation

5-1. Main results

The paper evaluates Kronos on five representative tasks.

Task	Metric	Purpose
Price series forecasting	IC, RankIC	Quality of the predicted future OHLC price series
Return forecasting	IC, RankIC	Quality of the cross-sectional return signal
Realized volatility forecasting	MAE, R2	Quality of the volatility estimate
Synthetic K-line generation	discriminative score, TSTR IC, TSTR RankIC	Realism and usefulness of synthetic sequences
Investment simulation	AER, IR	Whether the prediction signal is useful in a backtest
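
For readers less familiar with the first two metrics, here is a minimal sketch under the usual convention that IC is the Pearson correlation between predictions and realized values and RankIC is the same correlation computed on ranks (Spearman). That convention, and the function names, are my assumptions; the paper's exact evaluation protocol, for example how scores are averaged across dates, is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def ic(pred, realized):
    return pearson(pred, realized)


def rank_ic(pred, realized):
    return pearson(ranks(pred), ranks(realized))


pred = [0.02, -0.01, 0.03, 0.00]        # hypothetical model scores for four assets
realized = [0.01, -0.02, 0.05, 0.02]    # hypothetical realized returns
print(round(ic(pred, realized), 4), round(rank_ic(pred, realized), 4))
```

RankIC only depends on ordering, so it is less sensitive to a few extreme returns than IC, which is one reason it is popular for cross-sectional signals.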

The baselines comprise 25 models. The categories include non-pre-trained full-shot time-series models, zero-shot TSFMs, econometric volatility models, and generative time-series models.

The headline results reported by the authors are as follows.

On price series forecasting, Kronos reports a 93% RankIC improvement over the leading TSFM.
On the same task, it reports an 87% RankIC improvement over the best non-pre-trained baseline.
On realized volatility forecasting, it reports 9% lower MAE.
On synthetic K-line generation fidelity, it reports a 22% improvement.
On investment simulation, it reports the best AER and IR among the baselines.

These results line up with the paper's claim: a finance-specific corpus and discrete tokenization can suit financial K-line tasks better than a general TSFM.

5-2. What really matters in the experiments

The most important experiment is the ablation rather than the final scores. Table 2 compares model paradigms.

Model	Prediction space	Objective	Price RankIC	Return RankIC	Volatility MAE
Direct-AR	Continuous	MSE	0.0149	0.0399	0.0565
Prob-AR	Continuous	NLL	0.0102	0.0329	0.0464
Kronos-Parallel	Discrete	Cross-Entropy	0.0226	0.0505	0.0461
Kronos	Discrete	Cross-Entropy	0.0254	0.0622	0.0384

Two readings of this table matter.

Discrete prediction comes out stronger than continuous prediction.
Within discrete prediction, predicting the coarse subtoken first and the fine subtoken conditionally does better than predicting both in parallel.
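
The size of the gap between the sequential and parallel variants is easier to judge as a relative change. The snippet below only does arithmetic on the numbers from the ablation table; the dictionary layout and the helper `relative_change` are illustrative.

```python
ABLATION = {  # values copied from the ablation table
    "Direct-AR": {"price_rankic": 0.0149, "return_rankic": 0.0399, "vol_mae": 0.0565},
    "Prob-AR": {"price_rankic": 0.0102, "return_rankic": 0.0329, "vol_mae": 0.0464},
    "Kronos-Parallel": {"price_rankic": 0.0226, "return_rankic": 0.0505, "vol_mae": 0.0461},
    "Kronos": {"price_rankic": 0.0254, "return_rankic": 0.0622, "vol_mae": 0.0384},
}


def relative_change(new, old):
    """Relative change of `new` with respect to `old`."""
    return (new - old) / old


for metric in ("price_rankic", "return_rankic", "vol_mae"):
    change = relative_change(ABLATION["Kronos"][metric],
                             ABLATION["Kronos-Parallel"][metric])
    print(f"{metric}: {change:+.1%}")
```

Relative to Kronos-Parallel, the sequential model is roughly 12% higher on price RankIC, about 23% higher on return RankIC, and about 17% lower on volatility MAE.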

So the point of the paper is not simply that it uses tokenization. Modeling the sequential dependency between subtokens contributes to the performance gap.

Another important point is test-time scaling. Kronos improves forecasting stability by sampling several trajectories and averaging them. This feels similar to self-consistency or multiple-sample decoding in LLMs. In financial prediction, though, too much sample diversity blurs the signal and too little fails to reflect uncertainty. Temperature, top-p, and sample count are therefore not just inference details; they are core knobs that set the model's behavior.

Finally, the investment simulation results should be read with care. Good AER and IR in a backtest suggest that the model's signal may be useful, but they do not guarantee live trading performance. Transaction cost, slippage, liquidity, universe selection, short-sale constraints, market impact, and regime change all have to be examined separately.

6. Limitations

A financial backtest result is not a guarantee of investment performance. The paper includes an investment simulation, but real trading adds slippage, fees, liquidity, capacity, execution delay, and risk management.
The pretraining corpus is hard to reproduce. The 12B+ K-line records and the cleaning pipeline are central, and fully reproducing the same global market data and preprocessing is difficult.
The temporal split is strict, but regime coverage is a separate issue. The paper uses data through June 2024 for training and data from July 2024 onward for testing. There is no guarantee that the market regime of a particular period represents future regimes.
The context length is limited to 512 tokens. Using several frequencies lets you adjust the horizon, but the architecture does not directly take in long regime memory or cross-asset context.
How well the tokenizer preserves tail events needs ongoing checks. The argument that BSQ and coarse-to-fine tokens suit heavy-tail patterns is persuasive, but whether rare crisis events or market microstructure shocks are preserved well enough has to be validated per use domain.
The OHLCVA-centered representation does not directly include order-book or news signals. K-lines are universal and accessible, but for high-frequency execution or event-driven trading, order flow, news, fundamentals, and macro data can matter.
Calibration of the model output is a separate concern. Autoregressive sampling can produce uncertainty-like behavior, but that does not mean the result is a calibrated forecast distribution.

7. My Take

7-1. Why this matters for my work

This paper shows clearly that, when looking at a time-series foundation model, the tokenizer and the data domain come before the backbone.
For financial data in particular, a finance-specific corpus and market-aware discretization may matter more than the broad pretraining of a generic TSFM.
Kronos is a good example of what design work is needed to move numerical time series into language modeling.
What I find practically interesting is that it is not just a next-value predictor: it generates future trajectories and reuses them for volatility, returns, synthetic data, and backtest signals.

7-2. Reuse potential

There are five points I would like to reuse.

Domain-specific tokenizer first
- Define a domain-specific discrete state before feeding a raw numerical series into a Transformer.
Coarse-to-fine factorization
- Predict the coarse token and the fine token in sequence instead of predicting over a large vocabulary in one step.
Data cleaning as model design
- Treat missing values, illiquid periods, stagnant prices, and structural breaks as part of the model's input contract.
Sampling-based forecasting
- Generate multiple rollouts rather than a single forecast, and choose the aggregation per task.
Unified predictive and generative evaluation
- Look beyond forecasting scores to synthetic generation and downstream simulation as well.

These ideas apply outside finance too. Wherever a continuous time series has domain-specific state transitions, for example industrial sensor data, energy demand, traffic flow, or medical monitoring, coarse-to-fine discrete tokenization is worth considering.

7-3. Follow-up papers

A decoder-only foundation model for time-series forecasting
Chronos: Learning the Language of Time Series
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
PLUTUS: A Well Pre-trained Large Unified Transformer can Unveil Financial Time Series Regularities
iTransformer: Inverted Transformers Are Effective for Time Series Forecasting
PatchTST: A Time Series is Worth 64 Words
MarketGPT: Developing a Pre-trained Transformer for Modeling Financial Time Series
Qlib: An AI-oriented Quantitative Investment Platform

8. Summary

Kronos redefines financial K-lines as a discrete market language rather than a continuous OHLCVA sequence.
Its core structure is a BSQ-based K-line tokenizer combined with coarse-to-fine subtoken autoregressive modeling.
It is pretrained on 12B+ K-line records from 45 global exchanges at 7 frequencies, and evaluated jointly on price forecasting, return forecasting, volatility forecasting, synthetic generation, and investment simulation.
In the ablation, discrete prediction and sequential subtoken modeling come out stronger than continuous regression and parallel subtoken prediction.
Even so, backtests must not be read as live trading performance, and data reproducibility, regime shift, calibration, context length, and the absence of non-K-line signals all have to be weighed alongside the results.
