2025-01-05 18:11
There's a lot of confusion about o1's RL training and the emergence of RL as a popular post-training method. Yes, these use the same loss functions and similar data. BUT, the amount of compute spent on o1's RL training is much closer to pretraining scale.
The words we use to describe training are already strained, but o1 may be better viewed as next-token pretraining, then RL pretraining, and then some normal post-training.