Microsoft Researchers Say New AI Model Can 'See' Your Phone Screen | HackerNoon
Briefly

MM-Navigator demonstrates that large multimodal models excel in zero-shot GUI navigation through advanced screen interpretation and precise action localization capabilities.
The system achieved 91% accuracy in generating reasonable action descriptions and 75% accuracy in executing the correct action for single-step instructions.
The findings highlight MM-Navigator's significant improvements over previous GUI navigators, establishing a foundation for future research on the GUI navigation task.
This work underscores the potential of GPT-4V to effectively interpret smartphone GUIs and fulfill user instructions accurately, improving user interactions with devices.
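To make the "precise action localization" idea concrete, here is a minimal sketch of the kind of grounding step such a system needs: the multimodal model sees a screenshot whose interactive elements are overlaid with numeric tags, replies with a tag ID (e.g. "Tap element 2"), and the controller maps that ID back to a tappable screen coordinate. The `UIElement` type, the reply format, and the parsing logic are illustrative assumptions, not the paper's actual code.

```python
# Hedged sketch: map a multimodal model's tag-based reply to a tap coordinate.
# Element data and reply parsing are assumptions for illustration only.
import re
from dataclasses import dataclass


@dataclass
class UIElement:
    tag: int                          # numeric mark drawn on the screenshot
    bbox: tuple[int, int, int, int]   # (left, top, right, bottom) in pixels


def tap_point(reply: str, elements: list[UIElement]) -> tuple[int, int]:
    """Parse a reply like 'Tap element 2 ...' into a tap coordinate."""
    match = re.search(r"\b(\d+)\b", reply)
    if not match:
        raise ValueError(f"no element tag found in reply: {reply!r}")
    tag = int(match.group(1))
    by_tag = {e.tag: e for e in elements}
    if tag not in by_tag:
        raise ValueError(f"model referenced unknown tag {tag}")
    left, top, right, bottom = by_tag[tag].bbox
    # Tap the center of the referenced element's bounding box.
    return ((left + right) // 2, (top + bottom) // 2)


elements = [
    UIElement(tag=1, bbox=(0, 0, 100, 40)),      # e.g. a search bar
    UIElement(tag=2, bbox=(20, 200, 120, 260)),  # e.g. a Settings icon
]
print(tap_point("Tap element 2 to open Settings.", elements))  # (70, 230)
```

Resolving a symbolic tag rather than asking the model for raw pixel coordinates is one common way to keep action localization reliable, since the model only has to name a marked element.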