2024-09-08 21:54
Have been interested in continued pretraining lately, so ran a test on Llama 3.1 8B base: a mix of dclm-baseline, starcoder python, and finetome-100k instructions prepared in both chatml and llama formats, at 64k ctx len. Still need to run some real evals, but the vibes are surprisingly good after just 650 steps of peft training on a data mix like this.
https://huggingface.co/ericflo/Llama-3.1-8B-ContinuedTraining
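
For reference, a minimal sketch of what a run in this spirit looks like with transformers + peft + datasets. The dataset IDs, column names, mixing ratios, and LoRA hyperparameters below are illustrative assumptions, not the exact recipe behind the linked checkpoint, and it only renders the chatml variant of the instruction data.

```python
# Sketch of continued pretraining with a streamed, weighted data mix + LoRA.
# All dataset IDs, column names, ratios, and hyperparameters are assumptions.
import torch
from datasets import load_dataset, interleave_datasets
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "meta-llama/Llama-3.1-8B"  # base (non-instruct) checkpoint
CTX_LEN = 65536                   # 64k token context

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token

# Three streamed sources: web text, Python code, and instruction data.
web = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)
code = load_dataset("bigcode/starcoderdata", data_dir="python", split="train", streaming=True)
chat = load_dataset("mlabonne/FineTome-100k", split="train", streaming=True)

def render_chatml(example):
    # Render a ShareGPT-style conversation into chatml-formatted plain text.
    role = {"human": "user", "gpt": "assistant", "system": "system"}
    text = "".join(
        f"<|im_start|>{role.get(t['from'], t['from'])}\n{t['value']}<|im_end|>\n"
        for t in example["conversations"]
    )
    return {"text": text}

# Reduce every source to a single "text" column so they can be interleaved.
web = web.map(lambda ex: {"text": ex["text"]}).select_columns(["text"])
code = code.map(lambda ex: {"text": ex["content"]}).select_columns(["text"])
chat = chat.map(render_chatml).select_columns(["text"])

# Weighted mix: mostly web, some code, a little instruction data (assumed ratios).
mixed = interleave_datasets([web, code, chat], probabilities=[0.6, 0.3, 0.1], seed=0)

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=CTX_LEN)

train_ds = mixed.map(tokenize, remove_columns=["text"])

model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = get_peft_model(model, LoraConfig(
    r=64, lora_alpha=128, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
))

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama31-8b-continued",
        max_steps=650,                   # streamed data, so train by step count
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```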