
"AI image generators have historically struggled to spell because they generally used diffusion models, which work by reconstructing images from noise. 'The diffusion models [...] are reconstructing a given input,' Asmelash Teka Hadgu, founder and CEO of Lesan AI, told TechCrunch in 2024."
"Researchers have since explored other mechanisms for image generation, like autoregressive models, which make predictions about what an image should look like and function more like an LLM."
"OpenAI explained that the new model has 'thinking capabilities,' which give it the ability to search the web, make multiple images from one prompt, and double-check its creations."
"The model's knowledge cuts off in December, and it has a stronger understanding of non-Latin text rendering in languages like Japanese, Korean, Hindi, and Bengali."
AI image generation has evolved rapidly, with models like ChatGPT Images 2.0 producing realistic outputs that can be used in real-world applications. Earlier generators, which generally relied on diffusion models, struggled with spelling and coherence; researchers have since explored autoregressive models, which function more like an LLM. The new model features 'thinking capabilities' that let it search the web, make multiple images from one prompt (for example, diverse marketing assets), and double-check its creations. It also shows a stronger understanding of non-Latin text rendering, although details about the underlying technology remain undisclosed.
Read at TechCrunch