Microsoft's Phi-4-multimodal AI model handles speech, text, and video
Microsoft's new small language model aids developers in creating multimodal AI applications for lightweight devices.

Meta says its latest AI models answer more 'contentious' questions than the last version
Llama 4 addresses more contentious topics with improved balance and lower refusal rates than its predecessor.

Last month in AI - March 2025
AI models are rapidly evolving with enhancements in multimodal capabilities, context sizes, and language support, reflecting the industry's pace of innovation.

Viral AI company DeepSeek releases new image model family | TechCrunch
DeepSeek's Janus Pro AI models claim to outperform OpenAI's DALL-E 3 on key evaluation benchmarks.

Mistral launches a free tier for developers to test its AI models | TechCrunch
Mistral AI launched a free tier for developers to experiment with its AI models, aiming to attract more users and reduce costs.

Astra Is Google's Answer to the New ChatGPT
Google and OpenAI demonstrate impressive advancements in multimodal AI models, according to MIT assistant professor Pulkit Agrawal.

Google's AI Mode can now see and search with images
Google's AI Mode now includes multimodal capabilities, enabling it to analyze images and provide detailed responses about their contents.

Google AI Mode rolls out to more testers with new image search feature
Google's AI Mode is expanding in the US, bringing advanced search capabilities to more users.

Google jumps on the agentic AI bandwagon
Google's Gemini 2.0 introduces advanced agentic AI, moving beyond basic functions to provide more utility and enhanced user interaction.

Google goes "agentic" with Gemini 2.0's ambitious AI agent features
Google unveiled Gemini 2.0, a multimodal AI model capable of generating text, images, and speech with improved performance and features for developers.

Project Astra is the future of AI at Google
Next-gen bots like Google's Project Astra aim to be truly useful assistants.

Microsoft's new AI agent can control software and robots
Microsoft introduced Magma, a groundbreaking AI model that combines visual and language processing for enhanced control of software interfaces and robotic systems.

Search engine Baidu launches two new AI models
Baidu launched AI models ERNIE X1 and ERNIE 4.5, emphasizing performance and cost-effectiveness in the AI race.

'Death Of Perplexity AI?' Google's New AI Overview Triggers The Internet; Co-Founder Is Humored
Google introduced AI Mode in its search engine, enhancing user queries with advanced AI capabilities.

I tested the new Copilot Voice, Microsoft's AI voice assistant. You can, too - for free
Microsoft's Copilot Voice enhances AI conversations with emotional understanding and free access, making multimodal AI assistants more accessible and interactive.

The Ray-Ban Meta Smart Glasses have multimodal AI now
Smart glasses are evolving with features like multimodal AI, enhancing user experiences.

OpenAI unveils newest AI model, GPT-4o
GPT-4o will enhance ChatGPT with memory capabilities, real-time translation, and text-vision interaction, simplifying accessibility for all users.

5 Useful Datasets for Training Multimodal AI Models
Multimodal datasets are essential for training versatile AI models, improving their performance and understanding across various data types.

DeepSeek Dropped Another Open-Source AI Model, Janus Pro
DeepSeek's Janus-Pro improves multimodal understanding and text-to-image generation.

Alibaba introduces new AI models and tools at developer summit
Alibaba Cloud introduces enhanced AI offerings to support global developers and industries. The Qwen AI model suite expansion includes new LLMs and multimodal capabilities.

The Most Capable Open Source AI Model Yet Could Supercharge AI Agents
Molmo, an open source multimodal AI model, enhances accessibility for developers to create advanced AI agents that can perform useful tasks on computers.

The Future of GPT4All | HackerNoon
GPT4All aims to be the primary solution for accessible language models, enhancing distribution and compatibility across different hardware platforms.

Amazon is hedging its big bet on Anthropic with its own AI video model, report says
Amazon's development of Olympus suggests a strategic dual approach in AI, combining in-house capabilities with partnerships.

Amazon Nova models can process text, photos and videos
Amazon's Nova multimodal models enhance adaptability for enterprises by handling both textual and visual inputs.

OpenAI Poaches 3 Top Engineers From DeepMind
OpenAI is bolstering its multimodal AI efforts by hiring engineers from Google DeepMind, reflecting intense competition for AI talent in the industry.

OpenAI debuts mini version of its most powerful model yet
OpenAI launches GPT-4o mini, a cost-efficient small model, to increase accessibility amid rising competition.

OpenAI could debut a multimodal AI digital assistant soon
OpenAI is developing a new multimodal AI model with improved image and audio interpretation capabilities.

Hugging Face model SmolVLM requires a lot less compute
SmolVLM is an efficient multimodal model that significantly reduces GPU requirements, making it suitable for various applications and more cost-effective for organizations.

DreamLLM: Additional Related Works to Look Out For | HackerNoon
LLMs are fundamentally transforming the landscape of Natural Language Processing with advancements in model size and training techniques.

Building a Flexible Framework for Multimodal Data Input in Large Language Models | HackerNoon
Multimodal AI enhances capabilities by integrating various data types, yet creating these systems presents technical challenges and complexities.

Building complex gen AI models? This data platform wants to be your one-stop shop
Encord expands its multimodal AI data platform by adding audio and document annotation capabilities, elevating its service to AI teams.

Meta confirms it may train its AI on any image you ask Ray-Ban Meta AI to analyze | TechCrunch
Meta may use images shared with AI to train models, raising privacy concerns for Ray-Ban Meta users who may not understand data usage.

Ray-Ban Meta Smart Glasses Can Now Tell You What You're Looking At
Ray-Ban Meta smart glasses now offer multimodal AI capabilities, including object identification and voice command controls.

Meta AI can now understand and edit your photos | TechCrunch
Meta AI is enhancing photo editing and interaction capabilities, competing closely with Google and OpenAI. Multimodal capabilities allow for photo sharing and inquiry-based interactions, enhancing user experience. AI can edit images contextually, creating a dynamic way for users to interact with their photos.

Meta AI Unveils First Two Versions of Llama 3 | Entrepreneur
Meta released Llama 3 models, enhancing Meta AI's capabilities to be more intelligent and diverse.

The Future of Generative AI (2024): 8 Predictions to Watch
Generative AI is rapidly becoming integral across industries, evolving with new applications, while posing challenges around job displacement and the need for workforce adaptation.

Gartner: 40% of generative AI solutions to be multimodal by 2027
Gartner predicts that 40% of generative AI solutions will be multimodal by 2027, significantly increasing from just 1% in 2023.

Meta just stuck its AI somewhere you didn't expect it - a pair of Ray-Ban smart glasses
AI integration could revolutionize smart glasses technology.

The AI glasses market comes into focus
The AI glasses market is diversifying with varying features and price points, emphasizing either innovation or affordability.

Google Gemini Nano vs Apple Intelligence: Which AI Assistant is Better? - Yanko Design
A comparison of the AI upgrades in Google's Pixel 9 and Apple's iPhone 16 to help with choosing a smartphone.

Recap of Google I/O 2024: Gemini 1.5, Project Astra, AI-powered Search Engine
Google introduced the Gemini 1.5 Pro AI model with a 2 million token context window, highlighting advancements in AI capabilities.

RealReports enhances property document analysis with new multimodal AI feature
RealReports unveiled a new feature for its AI assistant Aiden, using multimodal AI to summarize property documents quickly and efficiently.

Ray-Ban Meta smart glasses do the AI thing without a projector or subscription
The Ray-Ban Meta smart glasses now feature multimodal AI, enhancing their functionality and interaction with users.

Google's medical AI destroys GPT's benchmark and outperforms doctors
AI models like Google's Med-Gemini are advancing to process diverse medical information, approaching real-world doctor capabilities.