MM-Navigator demonstrates that large multimodal models can perform zero-shot GUI navigation, combining accurate screen interpretation with precise action localization. The system achieved 91% accuracy in generating reasonable action descriptions and 75% accuracy in executing correct actions on single-step instructions.
These findings highlight the substantial improvements of MM-Navigator over previous GUI navigators and establish a foundation for future research on the GUI navigation task.
This work underscores the potential of GPT-4V to effectively interpret smartphone GUIs and fulfill user instructions accurately, improving user interactions with devices.