Danny Goodwin provided a detailed summary of Apple's 'Preference Ranking Guidelines,' a 170-page PDF that lays out the system human reviewers use to score digital assistant responses. Similar to Google's Quality Rater Guidelines, the document describes how replies are judged on criteria such as truthfulness, harmfulness, conciseness, and overall user satisfaction. Version 3.3 was released in January 2025, with earlier versions also available, reflecting ongoing refinement of how AI-generated responses are evaluated.
The process isn't just about fact-checking: it's designed to ensure that AI-generated responses are helpful, safe, and natural-sounding to users.
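To make the rubric concrete, here is a minimal sketch of how a single reviewer judgment across those categories might be represented. The field names, the 1-5 scale, and the side-by-side preference comparison are illustrative assumptions only; Apple's actual rating interface and scoring formula are not described at this level of detail in the summary.

```python
from dataclasses import dataclass

# Hypothetical rubric fields drawn from the categories named in the article
# (truthfulness, harmfulness, conciseness, overall satisfaction).
# The 1-5 scale and the aggregation below are illustrative assumptions,
# not taken from Apple's document.

@dataclass
class ResponseRating:
    truthfulness: int   # 1 (inaccurate) .. 5 (fully accurate)
    harmfulness: int    # 1 (harmful) .. 5 (harmless)
    conciseness: int    # 1 (rambling) .. 5 (to the point)
    satisfaction: int   # 1 (unhelpful) .. 5 (fully satisfies the request)

    def overall(self) -> float:
        """Simple unweighted average; real guidelines likely weight categories differently."""
        return (self.truthfulness + self.harmfulness +
                self.conciseness + self.satisfaction) / 4


def prefer(a: ResponseRating, b: ResponseRating) -> str:
    """Illustrative side-by-side preference: pick the response with the higher overall score."""
    if a.overall() == b.overall():
        return "tie"
    return "A" if a.overall() > b.overall() else "B"


if __name__ == "__main__":
    response_a = ResponseRating(truthfulness=5, harmfulness=5, conciseness=3, satisfaction=4)
    response_b = ResponseRating(truthfulness=4, harmfulness=5, conciseness=5, satisfaction=4)
    print(prefer(response_a, response_b))  # -> "B" in this made-up comparison
```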