Meta has announced five major new AI models and research efforts, including a multimodal system that can process both text and images, next-generation language models, music generation, AI speech detection, and work to improve diversity in AI systems.
These releases come from Meta's Fundamental AI Research (FAIR) team, which has been dedicated to advancing AI through open research and collaboration for over a decade. As AI innovates rapidly, Meta believes it is critical to engage with the global community.
“By making this research public, we hope to encourage iterative research and ultimately contribute to advancing AI in a responsible manner,” Meta said.
Chameleon: Multimodal Text and Image Processing
Included in the release are key components of Meta's “Chameleon” models under a research license. Chameleon is a family of multimodal models that can understand and generate both text and images, unlike most large language models, which are typically unimodal.
“Just as humans can process words and images simultaneously, Chameleon can process and deliver images and text simultaneously,” Meta explains. “Chameleon can take any combination of text and images as input and output any combination of text and images.”
From generating creative captions to prompting entirely new scenes with a mix of text and images, the potential use cases are virtually endless.
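The core idea behind this kind of model is treating text and images as tokens in one shared vocabulary, so a single sequence can freely interleave both modalities. The sketch below illustrates that idea only; the tokenizers, vocabulary sizes, and offsets are hypothetical and are not Meta's actual implementation.

```python
# Illustrative sketch of the shared-token-space idea (assumption: toy
# tokenizers and vocabulary sizes, not Chameleon's real components).

TEXT_VOCAB_SIZE = 1000       # hypothetical text vocabulary range: 0..999
IMAGE_VOCAB_OFFSET = 1000    # hypothetical image codebook ids start here

def text_tokens(words):
    """Toy text tokenizer: map each word into the text vocabulary range."""
    return [sum(map(ord, w)) % TEXT_VOCAB_SIZE for w in words]

def image_tokens(patch_ids):
    """Toy image tokenizer: discrete codebook ids shifted past the text range."""
    return [IMAGE_VOCAB_OFFSET + p for p in patch_ids]

# One interleaved sequence: caption words followed by image patch tokens.
# A single model trained on such sequences can consume and emit either modality.
sequence = text_tokens(["a", "photo", "of"]) + image_tokens([5, 17, 42])
print(sequence)
```

Because both modalities live in one token space, the same next-token machinery that generates text can, in principle, generate image tokens as well.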
Multi-token prediction for faster language model training
Meta has also released pre-trained models for code completion that use “multi-token prediction,” under a non-commercial research license. Training traditional language models is inefficient because they predict only one next word at a time; multi-token models predict several future words simultaneously and therefore train faster.
“While [the one-word] approach is simple and scalable, it is also inefficient; it requires orders of magnitude more text than a child would need to achieve the same level of language fluency,” Meta says.
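The idea can be sketched as a shared trunk with several output heads, where each head predicts a different future token offset so one forward pass yields a training signal for multiple positions. This is a minimal toy illustration, not Meta's actual architecture; the linear trunk, head shapes, and loss are assumptions for clarity.

```python
import numpy as np

# Toy sketch of multi-token prediction (assumption: tiny linear model,
# not Meta's real code model). A shared trunk encodes the context and
# k independent heads each predict a different future token offset.

rng = np.random.default_rng(0)
vocab, dim, k = 10, 8, 4                       # vocab size, hidden size, future tokens

trunk = rng.normal(size=(vocab, dim)) * 0.1    # embedding acting as the shared trunk
heads = rng.normal(size=(k, dim, vocab)) * 0.1 # one output head per future offset

def multi_token_loss(context_token, future_tokens):
    """Cross-entropy summed over k future positions from one forward pass."""
    h = trunk[context_token]                   # shared hidden state, shape (dim,)
    loss = 0.0
    for i, target in enumerate(future_tokens): # head i predicts offset i+1
        logits = h @ heads[i]                  # shape (vocab,)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        loss += -np.log(probs[target])
    return loss

# One training example supervises k future tokens instead of just the next one.
loss = multi_token_loss(context_token=3, future_tokens=[1, 4, 2, 7])
print(round(float(loss), 3))
```

A standard next-token model would need k separate steps to generate the same supervision, which is the inefficiency the quote above refers to.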
JASCO: An enhanced text-to-music conversion model
On the creative side, Meta's JASCO generates music clips from text while also accepting inputs such as chords and beats for finer control over the output.
“While existing text-to-music models like MusicGen rely primarily on textual input for music generation, our new model, JASCO, can accept a variety of inputs, such as chords and beats, giving us more control over the generated musical output,” Meta explains.
AudioSeal: AI-generated voice detection
Meta claims AudioSeal is the first audio watermarking system designed specifically to detect AI-generated speech. It can pinpoint AI-generated segments within longer audio clips, with detection up to 485 times faster than previous methods.
“AudioSeal is released under a commercial license. This is just one example of the responsible research we've been sharing to prevent the misuse of generative AI tools,” Meta said.
Improving text-to-image diversity
Another important release aims to improve diversity in text-to-image models, which often exhibit geographic and cultural bias.
Meta developed automated metrics to assess potential geographic differences and conducted a large-scale annotation study of over 65,000 annotations to understand how people around the world perceive geographic representations.
“This will increase the diversity and representation of AI-generated images,” Meta said. To improve diversity across generative models more broadly, the company released the associated code and annotations.
Meta says that by making these groundbreaking models publicly available, it hopes to foster collaboration and drive innovation within the AI community.
(Photo: Dima Solomin)
Want to learn more about AI and big data from industry leaders? Check out the AI & Big Data Expo in Amsterdam, California and London – this comprehensive event will take place alongside other major events such as Intelligent Automation Conference, BlockX, Digital Transformation Week and Cyber Security & Cloud Expo.
Find out about upcoming enterprise technology events and webinars hosted by TechForge here.