2024-10-28 | ETV Bharat / technology

Sarvam AI Launches Multilingual AI Model Sarvam 1, Boosting AI For Indic Languages

Hyderabad: Bengaluru-based startup Sarvam AI has launched Sarvam 1, a new open-source large language model (LLM) trained on 11 languages: Bengali, English, Gujarati, Hindi, Kannada, Marathi, Malayalam, Oriya, Punjabi, Tamil, and Telugu.

Sarvam 1 is a 2-billion-parameter model trained on 4 trillion tokens, using a custom tokeniser and Nvidia H100 Tensor Core GPUs. It is claimed to be up to four times more efficient than other AI models trained on Indian languages.

To enable multilingual tasks, the Sarvam-2T training corpus includes 20 per cent of datasets in Hindi, English, and programming languages. To address the shortage of high-quality training data for Indian languages, Sarvam AI used synthetic data generation methods to build its datasets. Developers can use the Sarvam 1 base model, available on Hugging Face, to build their own AI applications for Indic languages.
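
For developers who want to try the base model, the following is a minimal sketch of loading it with the Hugging Face transformers library. The repository ID "sarvamai/sarvam-1" and the helper name complete are assumptions for illustration and do not come from Sarvam AI's announcement; check the model page for the exact ID and licence terms. Because this is a base model rather than a chat model, the sketch treats it as plain text completion.

```python
# Assumption: the base model is published under this Hugging Face repository ID.
MODEL_ID = "sarvamai/sarvam-1"


def complete(prompt: str, max_new_tokens: int = 32, model_id: str = MODEL_ID) -> str:
    """Return the model's continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the model weights on first run; requires transformers and torch.
    print(complete("भारत की राजधानी"))
```

Running the script downloads the weights on first use, so it needs network access and enough memory for a 2-billion-parameter model.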

In December 2023, Sarvam AI launched the country's first Hindi LLM, OpenHathi, built on Meta AI's Llama models. In August 2024, the startup also launched its first foundational AI model, Sarvam 2B.

Meta also talked about Sarvam AI at its recently held 'Build with AI Summit' in Bengaluru, calling it a startup that is advancing AI for Indic languages and creating Hindi LLMs while operating under limited resources.

Meta Chief AI Scientist in Bengaluru