Good from Afar, But Far from Good: AI Prototyping in Real Design Contexts
Briefly

"Over the past few months, the UX design field has been flooded with AI-powered prototyping tools that generate interfaces instantly from natural-language prompts. Despite the massive marketing hype, our evaluation with real design scenarios revealed that these tools can follow instructions to achieve a general goal, but they lack the sophistication to weigh design tradeoffs and produce thoughtful, high-quality designs without extensive guidance from humans."
"To understand whether AI prototyping tools can craft thoughtful, context-aware designs comparable to those created by human designers, we conducted an evaluation using an actual project - redesigning the profile page for individuals registered for NN/G's live online training. This evaluation focused on three categories of AI tools: AI-assisted design tools that focus on generating static wireframes or design mockups AI-assisted (vibe) coding tools that generate interactive code-based prototypes General-purpose AI chatbots capable of generating prototypes"
"To mirror different stages of the real-world design process, we wrote prompts providing various levels of context: Prompt 1 - broad text prompt: General context and page goals, with minimal detail Prompt 2 - detailed text prompt: General context, page goals, and explicit outlining of components, design language, and interaction states Prompt 3 - text plus design artifacts: Supporting visuals at varying fidelity levels, including a photo of a hand sketch, the image of the design mockup, and a link to a Figma design frame"
AI prototyping tools can follow instructions to meet general interface goals but struggle to make nuanced design tradeoffs and require extensive human guidance for high-quality outcomes. An empirical evaluation used a real NN/G profile-page redesign to compare AI-assisted static design tools, AI-assisted code-generation tools, and general chatbots. The evaluation tested three prompt levels: broad text, detailed text, and text plus design artifacts including sketches and Figma frames. Heuristic assessments compared AI outputs with NN/G designer work. Results show that more specific prompts and supporting artifacts improve outputs, but tools still fall short of producing context-aware, thoughtful designs without human iteration.
Read at Nielsen Norman Group