SARVAM AI just launched

Sarvam AI is a Bengaluru-based artificial intelligence startup building foundation models. Its early model, Sarvam-1 (2B parameters), focused on multilingual Indic text, followed by larger models such as Sarvam-30B and Sarvam-105B for advanced reasoning and long-context tasks.

The company also develops speech-to-text, text-to-speech, and vision systems to support enterprise and public-sector use cases. Sarvam AI has released several models with open weights, suggesting they are leaning towards an open-source approach.

3 Likes

apparently their app has a waiting list, and anyone with an invite code can get access. please get me one. :face_holding_back_tears::backhand_index_pointing_right::backhand_index_pointing_left:

1 Like

Pricing

they did a kickass demo at the AI summit, esp. the multilingual translation capabilities… sadly all such good events were crowded out by the media coverage of golgappa university :sweat_smile:

btw are these available for download? would like to try the models and see.

5 Likes

yes, there were some good showcases by fractal too. sadly the limelight was taken away from sarvam by those attention seekers.
btw here’s the app link: https://play.google.com/store/apps/details?id=ai.sarvam.indus

you can try their OSS models, they are available on Hugging Face.

1 Like

are the 30B and 105B models available as OSS? was not able to find them in LM Studio currently. will keep an eye out though.

So what's the verdict, is this any good?

All their OSS Models: sarvamai (Sarvam AI)

I am guessing the 30B model is a finetune of Qwen 3 30B A3b and the 105B model is likely a finetune of a GLM.

No, they are not finetunes. both LLMs were built from the ground up by sarvam.

I highly highly doubt that. Considering all their previous models were finetunes as well.

I am 99% sure the 30b is a Qwen 3 fine-tune. Everyone does this, even Microsoft.
The 105b model is likely a downsized version of a GLM.

Also, making a model from scratch with a completely new architecture is generally a bad idea, because you will have to build your own inference tooling and debug problems that have already been solved elsewhere.

I doubt Sarvam AI has the budget and talent to compete with Chinese and American labs to come up with entirely new architectures which might not even work as well as established ones.

They have categorically stressed that they built the two models from scratch. It would be a massive PR blunder on their part if these were finetunes. They would get roasted on social media 10x more than golgappa univ then.

These architectures are not particularly exotic in the LLM space; it is mostly a matter of training data and compute. They claim to outperform the Chinese models, but I guess that is mostly for Indian-context usage, and in practice they are probably at the level where Chinese AI labs were in 2025.

1 Like

If they keep them closed, there won't be any way to verify it. Besides, finetuning a base model is arguably building it from scratch (if you stretch the definition), since the data is yours.

Unless I see proof and they explicitly talk about architecture etc., I am 90% sure these are finetunes.

It's not hard to outperform the base models if you feed them good data, especially on Indian languages, which the base models were barely trained on.

Edit: They are nowhere near where Chinese labs were in 2025. In 2025, Chinese labs were releasing the best OSS models by far.
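Fwiw, for the open-weight checkpoints you can do a crude check yourself: a finetune keeps the base model's architecture-defining hyperparameters in config.json bit-for-bit identical, since finetuning doesn't change the model's shape. A rough sketch with made-up config values (not the real numbers for Qwen or Sarvam):

```python
# Heuristic finetune check: if every architecture-defining hyperparameter
# matches a known base model's config.json, it is very likely a finetune.
ARCH_KEYS = [
    "architectures", "hidden_size", "num_hidden_layers",
    "num_attention_heads", "num_key_value_heads",
    "intermediate_size", "vocab_size",
]

def likely_same_architecture(config_a: dict, config_b: dict) -> bool:
    """True if every architecture key present in both configs matches."""
    shared = [k for k in ARCH_KEYS if k in config_a and k in config_b]
    return bool(shared) and all(config_a[k] == config_b[k] for k in shared)

# Hypothetical config.json excerpts, purely illustrative values.
base_model = {"hidden_size": 2048, "num_hidden_layers": 48, "vocab_size": 151936}
suspect    = {"hidden_size": 2048, "num_hidden_layers": 48, "vocab_size": 151936}
print(likely_same_architecture(base_model, suspect))  # True -> smells like a finetune
```

It's not proof either way (a from-scratch model could copy hyperparameters too), but a mismatch in vocab size or layer count would rule a finetune out quickly.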

2 Likes

Do we have their research papers? Or anything talking about the model architecture?

1 Like

Not at all. anyone who calls a finetuned model "built from scratch" is outright lying.

in that case they will face well-deserved flak. just wait a while and see.

A few people who tried it posted on X saying as much. also we have to compare these two models' performance against models of similar parameter count, not 500B or 1T parameter models.

Lots of buzzwords in the article, but it kinda confirms 2 things:

The 30b model is likely based on Nvidia Nemotron 3 Nano 30b A3b. More than a finetune, but less than a new architecture from scratch. The Nvidia models themselves feel heavily inspired by Qwen 3 30b A3b.

The 105b model is likely based on Nvidia’s Nemotron Super/Ultra.

Edit: I would say this is similar to how companies say they made their own website when in reality they used Squarespace or Wix or a similar website builder.

Using the Nemotron framework is only one step above using raw CUDA.

No. this is more like companies building their websites using React.

I still maintain that the datasets are the most important thing here.

It doesn't take that much effort once you have the data. It's all about training, whether through this Nemotron framework or by finetuning and upscaling an existing base model.

You can rent compute if you have even a decent sized budget.

Data is what will make or break your model.

I'm looking for something like this: https://arxiv.org/pdf/2505.09388

The architecture itself might not be new but it seems performant enough.

It’ll be nice if they publish the proper details rather than just benchmarks.

Good to see homegrown LLMs.
They have made tall claims, and hopefully they're able to back them up.

Lmao I hope they at least release proper benchmarks.