Rakuten Unveils New AI Models Optimized for Japanese
- New Large Language Model and first Small Language Model deliver greater efficiency and aim to make AI applications accessible to everyone
Tokyo, December 18, 2024 – Rakuten Group, Inc. has unveiled two new AI models: Rakuten AI 2.0, the company’s first Japanese large language model (LLM) based on a Mixture of Experts (MoE) architecture, and Rakuten AI 2.0 mini, the company’s first small language model (SLM). Both models will be released to the open-source community by Spring 2025 to empower companies and professionals developing AI applications.
Rakuten AI 2.0 is an 8x7B MoE foundation model based on the Rakuten AI 7B model released in March 2024. The MoE model comprises eight 7-billion-parameter models, each acting as a separate expert. Each token is sent to the two most relevant experts, as decided by the router. The experts and the router are continually trained together on vast amounts of high-quality Japanese and English language data.
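To make the routing concrete, the sketch below shows a toy top-2 Mixture of Experts layer. It is an illustrative example only, not Rakuten’s released implementation; the layer dimensions, expert structure and gating details are assumptions.

```python
# Toy top-2 Mixture of Experts layer (illustrative only; dimensions, expert
# structure and gating details are assumptions, not Rakuten's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        # The router scores every expert for each token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Independent feed-forward "experts".
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                            # (tokens, experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # two best experts per token
        weights = F.softmax(weights, dim=-1)               # normalize the two gate weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, k] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route 16 random "tokens" through the layer.
layer = Top2MoELayer(d_model=64, d_ff=256)
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```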
Rakuten AI 2.0 mini is a 1.5 billion parameter foundation model and the first SLM developed by the company. The model was trained from scratch on extensive Japanese and English language datasets that were curated and cleaned through an in-house multi-stage data filtering and annotation process, ensuring high performance and accuracy in text generation tasks.
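As a rough illustration of what a multi-stage text filter can look like, the generic sketch below chains language identification, quality heuristics and exact deduplication. It does not describe Rakuten’s in-house pipeline; every threshold and heuristic here is an assumption.

```python
# Generic multi-stage corpus filter (illustrative only; not Rakuten's pipeline).
import hashlib
import re

def stage_language(doc: str) -> bool:
    """Stage 1 (illustrative): keep documents containing Japanese script or plain ASCII text."""
    return bool(re.search(r"[\u3040-\u30ff\u4e00-\u9fff]", doc)) or doc.isascii()

def stage_quality(doc: str, min_chars: int = 200) -> bool:
    """Stage 2 (illustrative): drop very short or mostly-symbol documents."""
    letters = sum(ch.isalnum() for ch in doc)
    return len(doc) >= min_chars and letters / max(len(doc), 1) > 0.5

def stage_dedupe(docs):
    """Stage 3 (illustrative): exact deduplication by content hash."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc

def filter_corpus(docs):
    """Chain the stages; real pipelines add annotation and model-based filters."""
    return list(stage_dedupe(d for d in docs if stage_language(d) and stage_quality(d)))
```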
"At Rakuten, we see AI as a catalyst to augment human creativity and drive greater efficiency. Earlier this year, we launched a 7B Japanese LLM to accelerate AI solutions for local research and development," commented Ting Cai, Chief AI & Data Officer of Rakuten Group. "Our new cutting-edge Japanese LLM and pioneering SLM set new standards in efficiency, thanks to high-quality Japanese language data and innovative algorithms and engineering. These breakthroughs mark a significant step in our mission to empower Japan’s businesses and professionals to create AI applications that truly benefit users."
High efficiency with advanced architecture
Rakuten AI 2.0 employs a sophisticated Mixture of Experts architecture that dynamically selects the most relevant experts for each input token, optimizing computational efficiency and performance. The model offers performance comparable to dense models roughly eight times larger, while consuming approximately a quarter of the computation of such dense models during inference.
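The efficiency gain follows from the routing arithmetic. The back-of-the-envelope calculation below is a simplified estimate that counts only expert parameters and ignores shared attention and embedding layers; it is not an official figure.

```python
# Simplified per-token compute estimate for an 8x7B top-2 MoE (expert blocks only;
# shared attention/embedding parameters are ignored, so this is an approximation).
total_expert_params = 8 * 7e9    # parameters stored across all eight experts
active_expert_params = 2 * 7e9   # parameters actually run for any single token

print(active_expert_params / total_expert_params)  # 0.25 -> roughly a quarter of the compute
```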
Increased performance
Rakuten has conducted model evaluations with the LM Evaluation Harness*4 for Japanese and English capability measurements. The harness evaluates language models on a wide range of Natural Language Processing and Understanding tasks that reflect the characteristics of the target language. Averaged over eight Japanese tasks, Rakuten AI 2.0 scored 72.29, up from the 62.93 achieved by Rakuten AI 7B, the open LLM Rakuten released in March 2024.
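For readers who want to run this kind of measurement themselves, the sketch below uses the publicly available LM Evaluation Harness Python API. The task names, few-shot setting and Hugging Face model identifier are placeholders rather than the exact configuration Rakuten used.

```python
# Minimal sketch of running the LM Evaluation Harness from Python.
# Assumptions: harness version 0.4+, placeholder task names and few-shot count,
# and an illustrative Hugging Face model ID rather than Rakuten's exact setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # Hugging Face transformers backend
    model_args="pretrained=Rakuten/RakutenAI-7B",  # illustrative model identifier
    tasks=["jcommonsenseqa", "jnli"],              # placeholder Japanese task names
    num_fewshot=3,                                 # assumed few-shot setting
)
print(results["results"])                          # per-task metrics as a dict
```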
global.rakuten.com/corp/news/press/2024/1218_01.html