2024-09-08 21:54
I've been interested in continued pretraining lately, so I ran a test on Llama 3.1 8B base: a mix of dclm-baseline, StarCoder Python, and FineTome-100k instructions prepared in both ChatML and Llama formats, at 64k context length. I still need to run some real evals, but I can't believe how good the vibes are after just 650 steps of PEFT training on a data mix like this. https://huggingface.co/ericflo/Llama-3.1-8B-ContinuedTraining
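Continued pretraining just means resuming plain next-token-prediction training on a base model with new data (here web text, code, and instructions) rather than starting from scratch. Below is a minimal sketch of what a run like this might look like, assuming the HuggingFace transformers / peft / datasets stack; the dataset IDs, column names, mixing ratios, LoRA settings, and hyperparameters are illustrative guesses, not the exact recipe behind the linked checkpoint.

```python
# Hedged sketch of PEFT (LoRA) continued pretraining on a mixed corpus.
# Dataset IDs, column names, and hyperparameters are assumptions for illustration.
import random

import torch
from datasets import IterableDataset, load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "meta-llama/Llama-3.1-8B"  # base model, not the instruct variant
CTX = 65536                       # 64k context length, as in the post

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# PEFT: train small LoRA adapters instead of all 8B parameters.
model = get_peft_model(
    model,
    LoraConfig(
        r=64,
        lora_alpha=128,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    ),
)

def mixed_stream():
    """Yield a weighted mix of web text and Python code as plain 'text' rows.

    Instruction data (e.g. FineTome-100k rendered with ChatML / Llama chat
    templates) would be turned into plain text and mixed in the same way.
    """
    web = iter(load_dataset("mlfoundations/dclm-baseline-1.0",
                            split="train", streaming=True))
    code = iter(load_dataset("bigcode/starcoderdata", data_dir="python",
                             split="train", streaming=True))
    rng = random.Random(0)
    while True:
        if rng.random() < 0.7:
            yield {"text": next(web)["text"]}      # column name assumed
        else:
            yield {"text": next(code)["content"]}  # column name assumed

def tokenize(batch):
    # A real 64k run would pack many documents per sequence; truncation
    # keeps this sketch simple.
    return tokenizer(batch["text"], truncation=True, max_length=CTX)

train_ds = (
    IterableDataset.from_generator(mixed_stream)
    .map(tokenize, batched=True, remove_columns=["text"])
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama31-8b-continued",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-4,
        max_steps=650,  # the post reports ~650 PEFT steps
        bf16=True,
        logging_steps=10,
        remove_unused_columns=False,
        report_to="none",
    ),
    train_dataset=train_ds,
    # Standard causal-LM collator: labels are the input ids, shifted inside the model.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```

Interleaving streams with fixed probabilities is one simple way to control the domain mix; the dual ChatML/Llama formatting mentioned in the post would live in the (elided) step that renders FineTome conversations into plain text before mixing.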
First reply (within one hour):

Paul Nelson (paullovesdominoes): What is continued pretraining?
