Zephyr: Direct Distillation of LM Alignment: Experimental Details

We conduct all of our fine-tuning experiments using Mistral 7B (Jiang et al., 2023), which is the current state-of-the-art base LM at the 7B parameter scale, and matches the performance of much larger models like LLaMa 34B on many NLP benchmarks.
We use the Transformer Reinforcement Learning (TRL) library for fine-tuning, in conjunction with DeepSpeed ZeRO-3 and FlashAttention-2 to optimize memory and improve training speed.
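The article does not reproduce the training scripts, but a minimal sketch of this part of the setup with the Hugging Face stack might look as follows. The model id, the `attn_implementation` flag (available in recent transformers releases), and the ZeRO-3 config values are illustrative assumptions, not the authors' exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Mistral 7B base model with FlashAttention-2 enabled.
# The attn_implementation flag requires the flash-attn package to be installed.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Minimal DeepSpeed ZeRO-3 configuration (assumed values for illustration).
# "auto" entries are filled in from the Hugging Face TrainingArguments when
# the dict is passed via the `deepspeed=...` argument.
ds_zero3_config = {
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": True},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
```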
All models are trained with the AdamW optimizer and no weight decay. We did not experiment with parameter-efficient techniques such as LoRA, but expect similar results to hold with these methods.
All experiments were run on 16 A100s using bfloat16 precision and typically took 2-4 hours to complete.
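Continuing the sketch above, the optimizer and precision choices map directly onto standard TrainingArguments fields. The dataset path, batch size, learning rate, and epoch count below are illustrative placeholders rather than the paper's hyperparameters, and some keyword names (e.g. SFTConfig vs. TrainingArguments, dataset_text_field) have moved between TRL releases.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Hypothetical SFT dataset with a "text" column; substitute the actual
# fine-tuning data used for distillation.
train_dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

training_args = TrainingArguments(
    output_dir="zephyr-sft",
    per_device_train_batch_size=8,    # illustrative value, not from the paper
    gradient_accumulation_steps=1,
    learning_rate=2e-5,               # illustrative value, not from the paper
    num_train_epochs=1,
    optim="adamw_torch",              # AdamW optimizer
    weight_decay=0.0,                 # no weight decay, as stated above
    bf16=True,                        # bfloat16 precision
    logging_steps=10,
    deepspeed=ds_zero3_config,        # ZeRO-3 config dict from the sketch above
)

trainer = SFTTrainer(
    model=model,                      # model loaded in the previous sketch
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

Launching this script with `accelerate launch` or `deepspeed` across the GPUs distributes the sharded ZeRO-3 states over the available devices.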