Mistral AI's Open-Source Mixtral 8x7B Outperforms GPT-3.5
Briefly

Mistral AI continues its mission to deliver the best open models to the developer community. Moving forward in AI requires taking new technological turns beyond reusing well-known architectures and training paradigms. Most importantly, it requires making the community benefit from original models to foster new inventions and usages.
The key idea of MoE models is to replace the feed-forward layers of the Transformer block with a combination of a router plus a set of expert layers. During inference, the router in a Transformer block selects a subset of the experts to activate. In the Mixtral model, the output for that block is computed by applying the softmax function to the top two experts.
Read at InfoQ
[
]
[
|
]