Zephyr: Direct Distillation of LM Alignment: Experimental Details

We conduct all of our fine-tuning experiments using Mistral 7B (Jiang et al., 2023), which is the current state-of-the-art base LM at the 7B parameter scale, and matches the performance of much larger models like LLaMa 34B on many NLP benchmarks.
We use the Transformer Reinforcement Learning (TRL) library for fine-tuning, in conjunction with DeepSpeed ZeRO-3 and FlashAttention-2 to optimize memory and improve training speed.
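The article does not reproduce the training scripts, but a minimal sketch of this part of the setup with the Hugging Face stack might look as follows. The model id, the `attn_implementation` flag (available in recent transformers releases), and the ZeRO-3 config values are illustrative assumptions, not the authors' exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Mistral 7B base model with FlashAttention-2 enabled.
# The attn_implementation flag requires the flash-attn package to be installed.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Minimal DeepSpeed ZeRO-3 configuration (assumed values for illustration).
# "auto" entries are filled in from the Hugging Face TrainingArguments when
# the dict is passed via the `deepspeed=...` argument.
ds_zero3_config = {
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": True},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}
```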
All models are trained with the AdamW optimizer and no weight decay. We did not experiment with parameter-efficient techniques such as LoRA, but expect similar results to hold with these methods.
All experiments were run on 16 A100s using bfloat16 precision and typically took 2-4 hours to complete.
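Continuing the sketch above, the optimizer and precision choices map directly onto standard TrainingArguments fields. The dataset path, batch size, learning rate, and epoch count below are illustrative placeholders rather than the paper's hyperparameters, and some keyword names (e.g. SFTConfig vs. TrainingArguments, dataset_text_field) have moved between TRL releases.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Hypothetical SFT dataset with a "text" column; substitute the actual
# fine-tuning data used for distillation.
train_dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

training_args = TrainingArguments(
    output_dir="zephyr-sft",
    per_device_train_batch_size=8,    # illustrative value, not from the paper
    gradient_accumulation_steps=1,
    learning_rate=2e-5,               # illustrative value, not from the paper
    num_train_epochs=1,
    optim="adamw_torch",              # AdamW optimizer
    weight_decay=0.0,                 # no weight decay, as stated above
    bf16=True,                        # bfloat16 precision
    logging_steps=10,
    deepspeed=ds_zero3_config,        # ZeRO-3 config dict from the sketch above
)

trainer = SFTTrainer(
    model=model,                      # model loaded in the previous sketch
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

Launching this script with `accelerate launch` or `deepspeed` across the GPUs distributes the sharded ZeRO-3 states over the available devices.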