Apple Researchers Detail Method to Combine Different LLMs to Achieve State-of-the-Art Performance

Multimodal LLMs combine language and vision models for improved text generation.
Design choices for MLLMs include model architecture and pre-training data approaches.