2025-01-05 18:11
There's a lot of confusion about o1's RL training and the emergence of RL as a popular post-training loss function. Yes, these are the same loss functions and similar data. BUT, the amount of compute used for o1's RL training is much more in line with pretraining. The words we use to describe training are strained already, but o1 may be better viewed as next-token pretraining, RL pretraining, and then some normal post-training.
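For reference, a minimal PyTorch sketch of the two objectives being compared, not code from the post: the next-token cross-entropy loss used in pretraining, and a REINFORCE-style policy-gradient loss of the kind used in RLHF-style post-training. The tensor shapes, the per-sequence reward source, and the REINFORCE form are assumptions for illustration; the point is that the RL objective itself is the familiar one, and what distinguishes o1-style training is the amount of compute spent on it.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Standard pretraining objective: cross-entropy over the vocabulary,
    # averaged over all tokens. logits: (B, T, V), targets: (B, T).
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

def policy_gradient_loss(logits: torch.Tensor, sampled: torch.Tensor,
                         rewards: torch.Tensor) -> torch.Tensor:
    # REINFORCE-style objective: weight the log-prob of each sampled token
    # by a scalar reward per sequence (e.g. from a verifier or reward model).
    # logits: (B, T, V), sampled: (B, T), rewards: (B,). Assumed shapes.
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)  # (B, T)
    return -(rewards.unsqueeze(-1) * chosen).mean()
```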
Author: Nathan Lambert (natolambert)