Mike Lewis (Meta) – “Science and Scaling: How (really) to Pre-train a Llama”
Abstract: Pre-trained language models form the basis for much of modern NLP, and while the basic pre-training recipe is well known, many details of model development are hidden in secretive research labs. Based on experience[…]