[The] pre-training data set consists of ... data we have licensed from publishers, curated publicly available or open-sourced datasets ... no private Apple user data is included in the data mixture.
The technical paper highlights the responsible sourcing of training data for the Apple Foundation Models, which includes web data and data licensed from undisclosed publishers.