2024-09-08 21:54
Have been interested in continued pretraining lately, so ran a test on Llama 3.1 8B base: a mix of dclm-baseline, starcoder python, and finetome-100k instructions prepared in both chatml and llama formats, at 64k ctx len. Still need to run some real evals, but the vibes are surprisingly good after just 650 steps of peft training on a data mix like this.
https://huggingface.co/ericflo/Llama-3.1-8B-ContinuedTraining
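
For reference, a minimal sketch of what a run in this spirit looks like with transformers + peft + datasets. The dataset IDs, column names, mixing ratios, and LoRA hyperparameters below are illustrative assumptions, not the exact recipe behind the linked checkpoint, and it only renders the chatml variant of the instruction data.

```python
# Sketch of continued pretraining with a streamed, weighted data mix + LoRA.
# All dataset IDs, column names, ratios, and hyperparameters are assumptions.
import torch
from datasets import load_dataset, interleave_datasets
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "meta-llama/Llama-3.1-8B"  # base (non-instruct) checkpoint
CTX_LEN = 65536                   # 64k token context

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token

# Three streamed sources: web text, Python code, and instruction data.
web = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)
code = load_dataset("bigcode/starcoderdata", data_dir="python", split="train", streaming=True)
chat = load_dataset("mlabonne/FineTome-100k", split="train", streaming=True)

def render_chatml(example):
    # Render a ShareGPT-style conversation into chatml-formatted plain text.
    role = {"human": "user", "gpt": "assistant", "system": "system"}
    text = "".join(
        f"<|im_start|>{role.get(t['from'], t['from'])}\n{t['value']}<|im_end|>\n"
        for t in example["conversations"]
    )
    return {"text": text}

# Reduce every source to a single "text" column so they can be interleaved.
web = web.map(lambda ex: {"text": ex["text"]}).select_columns(["text"])
code = code.map(lambda ex: {"text": ex["content"]}).select_columns(["text"])
chat = chat.map(render_chatml).select_columns(["text"])

# Weighted mix: mostly web, some code, a little instruction data (assumed ratios).
mixed = interleave_datasets([web, code, chat], probabilities=[0.6, 0.3, 0.1], seed=0)

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=CTX_LEN)

train_ds = mixed.map(tokenize, remove_columns=["text"])

model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = get_peft_model(model, LoraConfig(
    r=64, lora_alpha=128, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
))

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama31-8b-continued",
        max_steps=650,                   # streamed data, so train by step count
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```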