Mistral AI has announced Mistral NeMo, a 12B model created in partnership with NVIDIA. The new model boasts an impressive context window of up to 128,000 tokens and delivers state-of-the-art reasoning, world knowledge, and coding accuracy for its size category.
The collaboration between Mistral AI and NVIDIA has resulted in a model that not only pushes the boundaries of performance but also prioritizes ease of use. Because it relies on a standard architecture, Mistral NeMo is designed to be a drop-in replacement for systems currently using Mistral 7B.
To encourage adoption and further research, Mistral AI has made both pre-trained base and instruction-tuned checkpoints available under the Apache 2.0 license. This open-source approach is attractive to researchers and enterprises alike and is likely to accelerate the integration of the model into various applications.
One of the key features of Mistral NeMo is quantization awareness during training, which enables FP8 inference without sacrificing performance. This efficiency can be crucial for organizations looking to deploy large language models cost-effectively.
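To make the idea concrete, here is a minimal PyTorch sketch of the FP8 round-trip that quantization-aware training prepares a model to tolerate. It assumes PyTorch 2.1 or later, which ships the torch.float8_e4m3fn dtype, and is an illustration of the precision format rather than Mistral's actual training pipeline.

```python
# Minimal sketch of an FP8 (E4M3) round-trip. Quantization-aware training
# teaches the model to tolerate exactly this kind of rounding error, so
# FP8 inference matches full-precision quality.
import torch

weights = torch.randn(4, 4, dtype=torch.float32)

# Cast weights down to 8-bit floating point (E4M3: 4 exponent, 3 mantissa bits)...
w_fp8 = weights.to(torch.float8_e4m3fn)

# ...and back up to float32 to measure the rounding error introduced.
w_restored = w_fp8.to(torch.float32)

print("max round-trip error:", (weights - w_restored).abs().max().item())
```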
Mistral AI provided a performance comparison between the Mistral NeMo base model and two recent open-source pre-trained models: Gemma 2 9B and Llama 3 8B.
“The model is designed for global, multilingual applications. It is trained on function calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi,” Mistral AI explains.
“This is another step towards putting cutting-edge AI models in the hands of everyone, in all the languages that shape human culture.”
Mistral NeMo has introduced Tekken, a new tokenizer based on Tiktoken. Trained on over 100 languages, Tekken provides improved compression efficiency for both natural language text and source code compared to the SentencePiece tokenizer used in previous Mistral models. The company claims that Tekken provides approximately 30% better compression efficiency for source code and several major languages, with even more significant improvements for Korean and Arabic.
Mistral AI also claims that Tekken outperforms the Llama 3 tokenizer at compressing text for about 85% of all languages, which could give Mistral NeMo the edge in multilingual applications.
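These compression claims can be spot-checked with a short script. The sketch below uses the Hugging Face transformers library and counts tokens per character (fewer tokens per character means better compression); the repository IDs are assumptions about the public Hugging Face listings for Mistral NeMo and Llama 3, and access to them may be gated.

```python
# Hedged sketch: compare tokenizer compression on a small code sample.
from transformers import AutoTokenizer

SAMPLE = "def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)"

for repo in ["mistralai/Mistral-Nemo-Base-2407",   # Tekken tokenizer (assumed repo ID)
             "meta-llama/Meta-Llama-3-8B"]:        # Llama 3 tokenizer (assumed ID; gated)
    tok = AutoTokenizer.from_pretrained(repo)
    n_tokens = len(tok.encode(SAMPLE, add_special_tokens=False))
    print(f"{repo}: {n_tokens} tokens for {len(SAMPLE)} characters "
          f"({len(SAMPLE) / n_tokens:.2f} chars/token)")
```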
The model weights are now available on Hugging Face, in both base and instruct versions. Developers can start experimenting with Mistral NeMo using the mistral-inference tool and adapt it with mistral-finetune. On Mistral's platform, the model is accessible under the name open-mistral-nemo.
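For users of Mistral's platform, a minimal example with the official mistralai Python SDK (v1) might look like the following. The model name open-mistral-nemo comes from Mistral's announcement; the script assumes a MISTRAL_API_KEY environment variable.

```python
# Minimal sketch of calling Mistral NeMo on Mistral's platform via the
# official mistralai Python SDK (v1).
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="open-mistral-nemo",
    messages=[{"role": "user", "content": "Summarize Mistral NeMo in one sentence."}],
)
print(response.choices[0].message.content)
```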
Highlighting the collaboration with NVIDIA, Mistral NeMo is also packaged as an NVIDIA NIM inference microservice, available at ai.nvidia.com. This integration streamlines deployment for organizations already invested in NVIDIA's AI ecosystem.
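NIM endpoints expose an OpenAI-compatible API, so a hosted deployment can be queried with the standard openai client, as in this hedged sketch. The catalog model ID shown is an assumption about how NeMo is listed, and an NVIDIA_API_KEY from the catalog is required.

```python
# Hedged sketch of querying NVIDIA's hosted, OpenAI-compatible NIM endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

response = client.chat.completions.create(
    model="mistralai/mistral-nemo-12b-instruct",  # assumed catalog model ID
    messages=[{"role": "user", "content": "Hello from a NIM microservice!"}],
)
print(response.choices[0].message.content)
```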
The release of Mistral NeMo marks a major step towards democratizing advanced AI models. By combining high performance, multilingual capabilities, and open-source availability, Mistral AI and NVIDIA are positioning the model as a versatile tool for a wide range of AI applications across industries and research disciplines.
(Photo: David Clode)
See also: Meta follows Apple in withholding AI models from EU users