Gate.io news, February 17: Microsoft has released V2.0 of OmniParser, its visual parsing framework for agents, on its official website. OmniParser can turn models such as DeepSeek-R1, GPT-4o, and Qwen-2.5VL into AI agents that operate a computer. Compared with V1, V2 detects small interactive UI elements more accurately and runs inference faster, cutting latency by 60%. On ScreenSpot Pro, a high-resolution agent benchmark, V2 paired with GPT-4o reached 39.6% accuracy, while GPT-4o on its own scored only 0.8%. Alongside V2, Microsoft also open-sourced omnitool, a Docker-based Windows system that covers screen understanding, positioning, action planning, and execution, and serves as a key tool for turning large models into agents.
GateUser-d6ca73f1
· 02-23 07:30
Can I go to the Spot zone?
GateUser-50c1e0dd
· 02-17 03:19
Bull Run 🐂
GateUser-50c1e0dd
· 02-17 02:39
Bull run 🐂
Mmhreyan8513
· 02-17 00:26
Ape In 🚀 Bull Run 🐂 HODL Tight 💪 1000x Vibes 🤑 1000x Vibes 🤑 HODL Tight 💪 Bull Run 🐂 Ape In 🚀
Microsoft open-sources an innovative framework that can turn DeepSeek into an AI agent