- Google is developing a new large language model called Gemini 1.5.
- Like the original Gemini, the new model is multimodal, able to handle multiple data types.
- Gemini 1.5 uses a “mixture of experts” architecture for greater efficiency and has a much larger context window.
Google has just raised the bar in the AI wars, the fierce competition playing out between big tech companies and startups.
The technology giant, owned by parent company Alphabet Inc., announced that it is developing a new large language model (LLM) called Gemini 1.5. According to The Verge, the first version of the technology, Gemini 1.5 Pro, will be released soon for early testing.
The news was outlined by Sundar Pichai, CEO of Google and Alphabet, and Demis Hassabis, CEO of Google DeepMind, in a company blog post published last week, just two months after the original Gemini was announced. Gemini is Google's answer to OpenAI's GPT-4 and the other LLMs being created by startups and large technology companies alike.
Gemini is a next-generation multimodal AI model, meaning the technology can process multiple types of data, including images, text, audio, video, and code in a range of programming languages. The technology is intended to be used as both a business tool and a personal assistant.
Gemini isn't Google's first foray into AI. In early February, the tech company conducted a “cleanup” of its various AI tools, consolidating them all under the Gemini name.
Gemini 1.5's improvements go far beyond what the original Gemini was capable of. Here's what we know so far:
Uses a “mixture of experts” model
Gemini 1.5 promises to be faster and more efficient thanks to a specialization technique called “mixture of experts,” or MoE. Rather than running the entire model every time a query is received, an MoE model routes each query to the most relevant of its specialized subnetworks, or “experts,” generating a good answer using only a fraction of its processing power.
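Google hasn't published Gemini 1.5's internals, but the core idea behind MoE, top-k gating, is simple enough to sketch. The toy Python example below is a minimal illustration only; the layer sizes, expert count, and class names are invented for this sketch and not drawn from Gemini.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class MoELayer:
    """Toy mixture-of-experts layer with top-k gating.

    Each "expert" is a small linear map; a gating network scores the
    experts for a given input, and only the top_k highest-scoring
    experts are actually run. Skipping the rest is where the compute
    savings in an MoE architecture come from.
    """

    def __init__(self, dim, num_experts=8, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.experts = [rng.normal(size=(dim, dim)) for _ in range(num_experts)]
        self.gate = rng.normal(size=(dim, num_experts))
        self.top_k = top_k

    def forward(self, x):
        scores = softmax(x @ self.gate)            # how relevant is each expert?
        chosen = np.argsort(scores)[-self.top_k:]  # indices of the top_k experts
        # Run only the chosen experts, weighted by their renormalized
        # gate scores; the remaining experts stay idle for this input.
        weights = scores[chosen] / scores[chosen].sum()
        return sum(w * (x @ self.experts[i]) for w, i in zip(weights, chosen))

layer = MoELayer(dim=16)
out = layer.forward(np.ones(16))
print(out.shape)  # (16,)
```

The efficiency win is that only top_k of the num_experts weight matrices are multiplied per input, so per-query compute stays roughly flat even as the model's total parameter count grows.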
Context window is now larger
Much of an AI model's power is determined by its context window: how much information it can take in at once. That information is broken into building blocks, which can be words, images, video, audio, or code. In the AI world, these building blocks are known as tokens.
The original Gemini could handle up to 32,000 tokens, but Gemini 1.5 Pro's context window can handle up to 1 million. This means the new LLM can analyze far more data in one go: 1 hour of video, 11 hours of audio, codebases of more than 30,000 lines of code, or more than 700,000 words, according to Google's blog post.
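To make the token numbers concrete, here is a small Python sketch using tiktoken, OpenAI's open-source tokenizer. Google uses its own tokenizer for Gemini, so the counts below only illustrate how text maps to tokens, not Gemini's exact figures.

```python
# pip install tiktoken
import tiktoken

# tiktoken is OpenAI's tokenizer, used here purely for illustration;
# Gemini's own tokenizer will produce somewhat different counts.
enc = tiktoken.get_encoding("cl100k_base")

text = "Gemini 1.5 Pro's context window can handle up to 1 million tokens."
tokens = enc.encode(text)
print(len(tokens), "tokens")       # short English words are roughly one token each
print(enc.decode(tokens) == text)  # True: encoding is reversible

# English prose averages roughly 0.7 words per token, which is how a
# 1,000,000-token window works out to Google's "700,000+ words" figure.
print(f"~{int(1_000_000 * 0.7):,} words in a 1M-token window")
```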
Improved performance compared to previous version
When the new AI model was tested against its predecessor, Gemini 1.5 Pro came out ahead in 87% of the benchmarks Google used, the company says.
Additionally, in a test known as a “needle in a haystack” evaluation, Gemini 1.5 found small pieces of text embedded within blocks of data as large as 1 million tokens 99% of the time.
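Google hasn't published its evaluation harness, but the basic recipe behind a needle-in-a-haystack test is straightforward to sketch. In the Python below, query_model is a hypothetical placeholder for any LLM API call, and the needle text and recall check are invented for illustration.

```python
import random

def needle_in_haystack_trial(query_model, haystack_sentences, needle, question):
    """Run one trial of a "needle in a haystack" evaluation.

    A known fact (the needle) is hidden at a random position inside a
    long distractor document (the haystack); the model passes if its
    answer to a question about the needle contains the fact.
    """
    position = random.randrange(len(haystack_sentences) + 1)
    document = haystack_sentences[:position] + [needle] + haystack_sentences[position:]
    prompt = " ".join(document) + f"\n\nQuestion: {question}"
    answer = query_model(prompt)
    return "magic number is 42" in answer.lower()  # illustrative expected fact

# Usage with a dummy model standing in for a real LLM API:
filler = ["The sky was a pale shade of grey that morning."] * 10_000
needle = "Remember this: the magic number is 42."
hits = sum(
    needle_in_haystack_trial(
        lambda prompt: "The magic number is 42.",  # placeholder model
        filler,
        needle,
        "What is the magic number?",
    )
    for _ in range(100)
)
print(f"recall: {hits}%")  # a real harness reports a score like Gemini 1.5's 99%
```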
Gemini 1.5 also generates better responses to very long queries, without requiring users to spend extra time fine-tuning their prompts. According to Google, testers gave Gemini 1.5 a grammar manual for an obscure language, and the LLM learned to translate the text into English at a level comparable to a human learning from the same material.
Subjected to enhanced safety testing
As AI becomes more powerful, so do concerns about the technology's role in safety issues, from weaponization to deception. Google says Gemini 1.5 went through extensive ethics and safety testing in preparation for wide release, and that it has conducted research into AI safety risks and developed techniques to mitigate potential harm.