Meta has announced five major new AI models and research efforts, including a multimodal system that can process both text and images, next-generation language models, a music generation model, AI-generated speech detection, and work to improve diversity in AI systems.
These releases come from Meta's Fundamental AI Research (FAIR) team, which has been dedicated to advancing AI through open research and collaboration for more than a decade. As AI innovation accelerates, Meta believes it is critical to engage with the global community.
“By making this research public, we hope to encourage iterative research and ultimately contribute to advancing AI in a responsible manner,” Meta said.
Chameleon: Multimodal Text and Image Processing
Included in the release are key components of Meta's “Chameleon” models, shared under a research license. Chameleon is a family of multimodal models that can simultaneously understand and generate both text and images, unlike most large language models, which are typically unimodal.
“Just as humans can process words and images simultaneously, Chameleon can process and deliver images and text simultaneously,” Meta explains. “Chameleon can take any combination of text and images as input and output any combination of text and images.”
From generating creative captions to directing new scenes with text and images, the potential use cases are virtually endless.
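Conceptually, Chameleon follows an early-fusion design: images are converted into discrete tokens that share a single vocabulary and a single transformer with text. The toy sketch below illustrates that interleaving; the vocabulary sizes, model dimensions, and helper names are illustrative assumptions, not Chameleon's actual implementation.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; Chameleon's real vocabularies are much larger.
TEXT_VOCAB = 1000        # hypothetical text token ids: 0 .. 999
IMAGE_VOCAB = 512        # hypothetical discrete image-codebook ids
VOCAB = TEXT_VOCAB + IMAGE_VOCAB  # one shared vocabulary for both modalities
D_MODEL = 128

def image_tokens_to_shared_ids(image_tokens: torch.Tensor) -> torch.Tensor:
    """Map image-codebook ids into the shared vocabulary, after the text range."""
    return image_tokens + TEXT_VOCAB

class TinyEarlyFusionLM(nn.Module):
    """A minimal decoder-style model over a mixed text+image token stream."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        seq_len = ids.size(1)
        # Causal mask so each position only attends to earlier tokens.
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        h = self.blocks(self.embed(ids), mask=mask)
        return self.lm_head(h)   # logits over text *and* image tokens

# Interleave a text prompt with discrete tokens of an image, as early fusion does.
text_ids = torch.randint(0, TEXT_VOCAB, (1, 8))
image_ids = image_tokens_to_shared_ids(torch.randint(0, IMAGE_VOCAB, (1, 16)))
sequence = torch.cat([text_ids, image_ids, text_ids], dim=1)

logits = TinyEarlyFusionLM()(sequence)
print(logits.shape)  # (1, 32, VOCAB): next-token predictions for either modality
```

Because every position predicts over the combined vocabulary, the same model can continue a sequence with either text or image tokens, which is what lets Chameleon output any combination of the two.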
Multi-token prediction for faster language model training
Meta has also released pre-trained models for code completion that use “multi-token prediction”, available under a non-commercial research license. Traditional language models are trained to predict only the next word, which is inefficient. Multi-token models instead predict several future words simultaneously, allowing them to train faster.
“While [the one-word] approach is simple and scalable, it is also inefficient; it requires orders of magnitude more text than a child would need to achieve the same level of language fluency,” Meta says.
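In practice, multi-token prediction attaches several output heads to a shared trunk so that each position is supervised on the next few tokens at once. The minimal PyTorch sketch below illustrates that training objective; the GRU trunk, layer sizes, and four-token horizon are illustrative assumptions rather than Meta's released configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, D_MODEL, N_FUTURE = 1000, 128, 4   # predict the next 4 tokens at once

class MultiTokenModel(nn.Module):
    """Shared trunk + one output head per future position (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.trunk = nn.GRU(D_MODEL, D_MODEL, batch_first=True)  # stand-in for a transformer trunk
        self.heads = nn.ModuleList([nn.Linear(D_MODEL, VOCAB) for _ in range(N_FUTURE)])

    def forward(self, ids):
        h, _ = self.trunk(self.embed(ids))                             # (batch, seq, d_model)
        return torch.stack([head(h) for head in self.heads], dim=2)    # (batch, seq, N_FUTURE, vocab)

def multi_token_loss(logits, ids):
    """Each position is trained on the next N_FUTURE tokens, not just the next one."""
    losses = []
    for k in range(N_FUTURE):
        # Targets shifted by k+1; drop positions that run off the end of the sequence.
        pred = logits[:, : ids.size(1) - (k + 1), k, :]
        target = ids[:, k + 1 :]
        losses.append(F.cross_entropy(pred.reshape(-1, VOCAB), target.reshape(-1)))
    return sum(losses) / N_FUTURE

ids = torch.randint(0, VOCAB, (2, 32))
model = MultiTokenModel()
loss = multi_token_loss(model(ids), ids)
loss.backward()
print(float(loss))
```

Each forward pass yields multiple training signals per position, which is the source of the efficiency gain Meta describes.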
JASCO: An enhanced text-to-music conversion model
On the creative side, Meta's JASCO generates music clips from text while also accepting inputs such as chords and beats, offering finer control over the output.
“While existing text-to-music models like MusicGen rely primarily on textual input for music generation, our new model, JASCO, can accept a variety of inputs, such as chords and beats, giving us more control over the generated musical output,” Meta explains.
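To make the idea of symbolic conditioning concrete, the sketch below expands a chord progression and a beat grid into frame-level signals that a JASCO-style generator could consume alongside a text prompt. The chord vocabulary, frame rate, and conditioning dictionary are illustrative assumptions; JASCO's real inputs and API are not reproduced here.

```python
import torch

FRAME_RATE = 50  # hypothetical conditioning frames per second

# Hypothetical chord vocabulary; JASCO's real symbol set is not documented here.
CHORD_IDS = {"N": 0, "C": 1, "Am": 2, "F": 3, "G": 4}

def chords_to_frames(progression, total_seconds):
    """Expand (chord, start_second) pairs into a per-frame chord-id tensor."""
    n_frames = int(total_seconds * FRAME_RATE)
    frames = torch.zeros(n_frames, dtype=torch.long)
    for (chord, start), (_, end) in zip(progression, progression[1:] + [("N", total_seconds)]):
        frames[int(start * FRAME_RATE): int(end * FRAME_RATE)] = CHORD_IDS[chord]
    return frames

def beats_to_frames(bpm, total_seconds):
    """Mark beat positions as a binary per-frame signal."""
    n_frames = int(total_seconds * FRAME_RATE)
    frames = torch.zeros(n_frames)
    step = int(FRAME_RATE * 60 / bpm)
    frames[::step] = 1.0
    return frames

# Conditioning bundle a JASCO-like generator could receive alongside the text prompt.
conditioning = {
    "text": "an upbeat acoustic pop track",
    "chords": chords_to_frames([("C", 0.0), ("Am", 2.0), ("F", 4.0), ("G", 6.0)], total_seconds=8.0),
    "beats": beats_to_frames(bpm=120, total_seconds=8.0),
}
print(conditioning["chords"].shape, int(conditioning["beats"].sum()))  # 400 frames, 16 beats
```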
AudioSeal: AI-generated voice detection
Meta claims AudioSeal is the first audio watermarking system designed specifically to detect AI-generated speech. The system can pinpoint AI-generated segments within longer audio clips, and it does so up to 485 times faster than traditional detection methods.
“AudioSeal is released under a commercial license. This is just one example of the responsible research we've been sharing to prevent the misuse of generative AI tools,” Meta said.
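Meta does not detail AudioSeal's architecture in this announcement, but the idea of localized detection can be illustrated with a toy example: a detector assigns a score to every audio frame, and contiguous runs of high scores are reported as suspected AI-generated segments. The untrained convolutional detector below is a random stand-in for illustration only, not the released AudioSeal models or their API.

```python
import torch
import torch.nn as nn

SAMPLE_RATE = 16_000

# Untrained stand-in for a watermark detector: per-sample scores in [0, 1].
# AudioSeal's real detector is a trained model; this only mirrors its output shape.
detector = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=321, padding=160),
    nn.ReLU(),
    nn.Conv1d(8, 1, kernel_size=1),
    nn.Sigmoid(),
)

def flag_segments(audio: torch.Tensor, threshold: float = 0.5, min_seconds: float = 0.2):
    """Return (start_sec, end_sec) spans whose per-sample score exceeds the threshold."""
    with torch.no_grad():
        scores = detector(audio.unsqueeze(0).unsqueeze(0)).squeeze()  # (samples,)
    flagged = scores > threshold
    segments, start = [], None
    for i, hit in enumerate(flagged.tolist() + [False]):   # sentinel closes a trailing run
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            if (i - start) / SAMPLE_RATE >= min_seconds:
                segments.append((start / SAMPLE_RATE, i / SAMPLE_RATE))
            start = None
    return segments

clip = torch.randn(5 * SAMPLE_RATE)          # 5 seconds of placeholder audio
print(flag_segments(clip))                   # spans a trained detector would report
```

Scoring frames directly, rather than re-running a full classifier over many overlapping windows, is one way a watermark detector can localize generated segments quickly, which is consistent with the speedup Meta reports.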
Improving diversity in text-to-image models
Another important release aims to improve diversity in text-to-image models, which often exhibit geographic and cultural bias.
Meta developed automated metrics to assess potential geographic disparities and conducted a large-scale annotation study, collecting more than 65,000 annotations to understand how people around the world perceive geographic representation.
“This will increase the diversity and representation of AI-generated images,” Meta said. The company has released the associated code and annotations to help improve diversity across generative models.
Meta says that by making these groundbreaking models publicly available, it hopes to foster collaboration and drive innovation within the AI community.
(Photo: Dima Solomin)