Sarvam AI is a Bengaluru-based artificial intelligence startup building foundation models. Its early model, Sarvam-1 (2B parameters), focused on multilingual Indic text, followed by larger models such as Sarvam-30B and Sarvam-105B for advanced reasoning and long-context tasks.
The company also develops speech-to-text, text-to-speech, and vision systems to support enterprise and public-sector use cases. Sarvam AI has released several models with open weights, signaling a lean towards an open-source approach.
they did a kickass demo at the AI summit, esp. the multilingual translation capabilities… sadly all such good events were crowded out of the media coverage by golgappa university
btw are these available for download? would like to try the models and see.
I highly doubt that, considering all their previous models were finetunes as well.
I am 99% sure the 30b is a Qwen 3 finetune. Everyone does this, even Microsoft.
The 105b model is likely a downsized version of a GLM.
Also, making a model from scratch with a completely new architecture is generally a bad idea, because you will have to build your own inference tooling and debug problems that others have already solved.
I doubt Sarvam AI has the budget and talent to compete with Chinese and American labs to come up with entirely new architectures which might not even work as well as established ones.
They have categorically stressed that they built the two models from scratch. It would be a massive PR blunder on their side if they were finetunes. They would get roasted on social media 10x more than golgappa univ then.
These architectures are not that exotic in the LLM space; it is mostly a matter of training data and compute. They claim to outperform the Chinese models, but I guess that is mostly for Indian-context usage, and in practice they are probably at the level where Chinese AI labs were in 2025.
If they keep them closed, there won’t be any way to verify it. Besides, finetuning a Base Model is arguably building it from scratch (if you stretch the definition), as the Data is yours.
Unless I see proof and they explicitly talk about architecture etc., I am 90% sure these are finetunes.
It’s not hard to outperform the Base Models if you feed them good Data, especially on Indian languages, which the Base Models are not even trained on.
Edit: They are nowhere near where Chinese Labs were in 2025. In 2025, Chinese Labs were releasing the best OSS models by far.
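Short of weight-level analysis, one crude public check for the "finetune vs. from scratch" question is whether a released config's architecture hyperparameters line up exactly with a known base model. A minimal sketch of that comparison, assuming HF-style config dicts (all the numeric values below are placeholders for illustration, not actual Sarvam or Qwen figures):

```python
# Compare architecture hyperparameters from two HF-style config dicts.
# Identical values across all of these fields is strong (not conclusive)
# evidence of shared lineage. All numbers below are PLACEHOLDERS.

FINGERPRINT_KEYS = [
    "hidden_size", "num_hidden_layers", "num_attention_heads",
    "num_key_value_heads", "intermediate_size", "vocab_size", "rope_theta",
]

def shared_fingerprint(cfg_a, cfg_b, keys=FINGERPRINT_KEYS):
    """Return the subset of architecture fields on which two configs agree."""
    return {k for k in keys if k in cfg_a and k in cfg_b and cfg_a[k] == cfg_b[k]}

# Hypothetical configs for illustration only.
candidate = {"hidden_size": 4096, "num_hidden_layers": 48,
             "num_attention_heads": 32, "num_key_value_heads": 4,
             "intermediate_size": 11008, "vocab_size": 151_936,
             "rope_theta": 1e6}
known_base = dict(candidate)        # pretend every shape matches...
known_base["vocab_size"] = 200_000  # ...except an extended tokenizer

matches = shared_fingerprint(candidate, known_base)
print(sorted(matches))  # every field except vocab_size
```

An extended vocab with otherwise identical shapes is roughly what a continued-pretrain or finetune for a new language family tends to look like, whereas a genuinely new architecture would diverge on several of these fields at once.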
Not at all. Anyone who calls a finetuned model "built from scratch" is outright lying.
In that case they will face well-deserved flak. You can wait some time and see.
A few people who tried it posted on X as much. Also, we have to compare these two models’ performance with similar-parameter models, not the 500B or 1T parameter models.
Lots of buzzwords in the article, but it kinda confirms 2 things:
The 30b model is likely based on Nvidia Nemotron 3 Nano 30b A3b. More than a finetune, but less than a new architecture from scratch. The Nvidia models feel heavily inspired by Qwen 3 30b A3b.
Edit: I would say this is similar to how companies say they made their own website when in reality they used Squarespace, Wix, or a similar website builder tool.
I still maintain that the most important thing here is the Datasets.
It doesn’t take that much effort once you have the Data. It’s all about training, either through this Nemotron framework or by finetuning and upscaling an existing base model.
You can rent compute if you have even a decent-sized budget.
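As a rough sketch of what "renting compute" implies, here is a back-of-the-envelope cost estimate using the common 6·N·D training-FLOPs approximation. Every input (token count, GPU throughput, utilization, hourly price) is an illustrative assumption, not a figure from Sarvam or the article:

```python
# Back-of-the-envelope training cost estimate using the ~6*N*D FLOPs
# rule of thumb. ALL inputs below are illustrative assumptions.

def training_cost_estimate(n_params, n_tokens, peak_flops_per_gpu, mfu,
                           usd_per_gpu_hour):
    """Estimate GPU-hours and rental cost for one dense training run."""
    total_flops = 6 * n_params * n_tokens        # standard dense-training approximation
    sustained = peak_flops_per_gpu * mfu         # effective FLOP/s after utilization losses
    gpu_hours = total_flops / sustained / 3600
    return gpu_hours, gpu_hours * usd_per_gpu_hour

# Hypothetical run: 30B dense params, 3T tokens, H100-class GPU at
# ~1e15 peak FLOP/s (bf16), 40% MFU, $2.50 per GPU-hour rental.
hours, cost = training_cost_estimate(30e9, 3e12, 1e15, 0.40, 2.50)
print(f"{hours:,.0f} GPU-hours, ~${cost:,.0f}")  # 375,000 GPU-hours, ~$937,500
```

Note that for a sparse MoE like a 30b-A3b, N in the 6·N·D rule would be the active parameters (~3B), cutting the estimate by roughly 10x, which is part of why a decent-sized budget plus rented compute can plausibly cover such a run.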