How GenAI Platforms Generate Answers
Briefly

Upon receiving a prompt, a genAI platform first checks whether its training data contains enough information to answer. If it does, the process ends there and no external sources are used. If it does not, the platform queries search engines, and a page's visibility then depends on its search ranking. The platform identifies relevant URLs and may crawl those pages to extract information. Pages with clear headings, short factual sentences, and Q&As are more likely to be included in an answer, but only if the URL was found and crawled, and inclusion does not guarantee a citation. The cited source is often not the one that supplied the answer: some studies suggest citations come from URLs surfaced at the search stage that were neither crawled nor used, others attribute them to publisher partnerships, and some cited URLs are hallucinations that never existed.
"Upon receiving a prompt, genAI platforms first check whether their training data includes enough info on the topic to answer. In many cases, training data is sufficient, and the process stops there. Training data doesn't store URLs, nor does it rank sources, unlike traditional search engines. The data comes from known brands with clear value propositions that answer or solve a need."
"Next, if the training data is insufficient, the platform will query search engines, as humans do. At this point, visibility relies on search rankings. No genAI platform discloses where it searches, but studies reveal it is mostly Google. The platforms presumably rely on highly ranked URLs, although I've seen no definitive clarity on that selection process."
"At this point, a genAI platform has performed searches and found URLs for answers. Then it may crawl those pages to extract information. This is where clear headings, short factual sentences, and Q&As lead to inclusion in answers, but only if the URL was found and crawled."
"But inclusion does not always mean a citation. How do the platforms choose which sources to cite? In many cases, it's not the source that supplied the answer. Some independent studies suggest that citations originate in the retrieval stage (step 2 above) but were neither crawled nor used to create an answer. Others believe citations are part of official partnerships with publishers. Some URLs are hallucinations and never existed."
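The steps quoted above can be sketched as a small pipeline. This is only an illustration of the described flow, not how any platform is actually built; no genAI platform discloses its implementation. Every name here (`generate_answer`, `answer_from_training`, `search`, `crawl`, `Answer`) is hypothetical, and the choice to cite the whole search result list is an assumption made to mirror the studies mentioned in the last excerpt.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Answer:
    text: str
    # Pages that were actually crawled and used to build the answer.
    used_sources: list[str] = field(default_factory=list)
    # URLs shown to the reader as citations (may differ from used_sources).
    citations: list[str] = field(default_factory=list)


def generate_answer(
    prompt: str,
    answer_from_training: Callable[[str], Optional[str]],  # hypothetical
    search: Callable[[str], list[str]],                    # hypothetical
    crawl: Callable[[str], Optional[str]],                 # hypothetical
) -> Answer:
    # Step 1: if training data is sufficient, stop. No URLs are involved.
    text = answer_from_training(prompt)
    if text is not None:
        return Answer(text=text)

    # Step 2: otherwise query a search engine; visibility depends on ranking.
    ranked_urls = search(prompt)

    # Step 3: crawl the found pages; only pages that are found and
    # successfully crawled can contribute to the answer.
    extracted: dict[str, str] = {}
    for url in ranked_urls:
        page_text = crawl(url)
        if page_text is not None:
            extracted[url] = page_text

    # Step 4 (assumption): citations are taken from the search-stage list,
    # so a cited URL is not necessarily one that supplied the answer.
    return Answer(
        text=" ".join(extracted.values()),
        used_sources=list(extracted),
        citations=list(ranked_urls),
    )
```

With stub functions in which `search` returns two URLs and `crawl` succeeds for only one of them, `used_sources` holds one URL while `citations` holds both, which is the gap between inclusion and citation that the excerpt describes.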
Read at Practical Ecommerce