News: DeepSeek.com making waves!

Chinese GPU Manufacturers Push Out Support For Running DeepSeek's AI Models On Local Systems, Intensifying the AI Race

Chinese consumer GPU manufacturers have now started to bring support for running DeepSeek's R1 models on local systems, jumping into the global AI race.

Well, China has been making serious waves in the AI industry, not only releasing a model capable of competing with mainstream options from OpenAI but also advancing its AI hardware capabilities. We recently reported on how Huawei is preparing to challenge NVIDIA's dominance with its Ascend 910C AI chip, and now Chinese GPU manufacturers, notably Moore Threads and Baidu's Kunlun, have stepped up to support DeepSeek's R1 model on their consumer GPUs, fueling the race for computational capability.

Starting with Moore Threads: the firm has added support for DeepSeek's distilled models through a deployment service that is claimed to be compatible with the MTT S80 and MTT S4000 consumer GPUs, the latter designed specifically for workstation workloads. The firm has also enabled DeepSeek's model on its KUAE cluster, an in-house cluster built explicitly for AI workloads and powered by MTT S4000 GPUs.

This is a notable achievement for Moore Threads, since the ability to run DeepSeek's AI models on local machines should drive professional consumers' adoption of the company's products. The MTT S80 and MTT S4000 can also be deployed for inference workloads, which means the support is fairly extensive. That said, the performance of DeepSeek's models on these GPUs is unknown, and it almost certainly won't match what AMD and NVIDIA offer.

In addition, Baidu, the famous Chinese tech company, has built an in-house AI cluster around its Kunlun Core P800 AI chips. According to MyDrivers, the Core P800 performs 20-50% better than comparable mainstream GPUs, supports 8-bit inference, and carries significantly lower deployment and maintenance costs. The chip is said to fully support DeepSeek's V3/R1 models, and inference deployment is reportedly quite straightforward.
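The report gives no detail on how the Core P800 implements 8-bit inference, but the general idea is easy to sketch. Below is a generic, hypothetical Python illustration (not Kunlun's actual scheme, and the function names are made up): weights are stored as int8 plus a single float scale, quartering memory versus float32, which mainly speeds up memory-bound inference.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: int8 weights + one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor for computation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is 4x smaller than float32, at a small accuracy cost
max_err = np.abs(dequantize(q, scale) - w).max()
print(q.nbytes, w.nbytes)  # 65536 vs 262144 bytes
print(f"max abs error: {max_err:.4f}")
```

Real deployments typically quantize per channel or per group rather than per tensor, but the storage-versus-accuracy trade-off is the same.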

Baidu's AI cluster is said to consist of 30,000 Core P800 AI GPUs and will be up and running soon. These achievements clearly show that Chinese GPU manufacturers haven't been held back by export restrictions; instead, they have shifted their hardware arsenal toward domestically manufactured products, making them sustainable in the long run.

Source: https://wccftech.com/chinese-gpu-ma...-running-deepseek-ai-models-on-local-systems/
 

China Doesn't Need "Cutting-Edge" Accelerators To Progress With AI; DeepSeek's Newest "FlashMLA" Project Now Brings In 8x TFLOPS Power Boost With NVIDIA's H800 GPUs

DeepSeek's FlashMLA Will Help China's AI Industry To Squeeze Out Maximum Power From NVIDIA's Cut-Down Hopper GPUs

China has reportedly found a workaround for NVIDIA's "cut-down" AI accelerators, as DeepSeek's newest project has squeezed out eight times the TFLOPS from the Hopper H800 AI accelerators.

It seems China isn't depending on anyone to scale up its hardware capabilities, as domestic companies, notably DeepSeek, are using software to find workarounds with the equipment they have available. DeepSeek's latest development is one of the wildest we have seen in the markets: according to the firm, it has squeezed significant performance out of NVIDIA's "cut-down" Hopper H800 GPUs, essentially by optimizing memory consumption and the allocation of resources across inference requests.

Just a quick background: DeepSeek is holding an "Open Source" week, during which it plans to release technologies and tools to the general public through GitHub repositories. Day one was a strong start, as the firm unveiled FlashMLA, a "decoding kernel" designed specifically for NVIDIA's Hopper GPUs. Before we get into how it works, let's take a quick look at the improvements it brings, and they are remarkable.

DeepSeek claims to have squeezed out 580 TFLOPS for BF16 matrix multiplication on the Hopper H800, roughly eight times the industry's standard rating. On top of that, efficient memory utilization lets FlashMLA reach a memory bandwidth of up to 3,000 GB/s, approaching the H800's theoretical peak. The key point is that all of this is achieved purely through code rather than hardware changes.

DeepSeek's FlashMLA implements "low-rank key-value compression", which, in simple terms, factorizes the key-value data into much smaller latent representations, allowing faster processing and cutting memory consumption by 40%-60%. Another interesting inclusion is a block-based paging system, which allocates memory dynamically depending on the demands of the task instead of reserving a single fixed amount. This lets models process variable-length sequences much more efficiently, ultimately improving performance.
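The article doesn't include FlashMLA's actual CUDA code, but both ideas above can be sketched at a high level. Here is a toy numpy illustration; the dimensions, projection matrices, and class names are all invented for the example, not DeepSeek's real implementation. Instead of caching full keys and values per token, the model caches one small latent vector and reconstructs K and V from it at attention time, and the cache grows one fixed-size block at a time rather than pre-reserving a worst-case-length buffer.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent = 1024, 128  # toy sizes; the latent is much smaller

# Learned projections in a real model; random here for illustration
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)

def compress(h: np.ndarray) -> np.ndarray:
    """Low-rank compression: only this small latent goes into the cache."""
    return h @ W_down  # shape (d_latent,)

def expand(c: np.ndarray):
    """Reconstruct keys and values from the cached latent at attention time."""
    return c @ W_up_k, c @ W_up_v

class PagedKVCache:
    """Block-based paging: allocate fixed-size blocks only as tokens arrive,
    so short sequences never pay for a maximum-length buffer."""
    def __init__(self, block_tokens: int = 64, dim: int = d_latent):
        self.block_tokens, self.dim = block_tokens, dim
        self.blocks, self.length = [], 0

    def append(self, latent: np.ndarray) -> None:
        if self.length % self.block_tokens == 0:  # current block full, or no blocks yet
            self.blocks.append(np.empty((self.block_tokens, self.dim)))
        self.blocks[-1][self.length % self.block_tokens] = latent
        self.length += 1

cache = PagedKVCache()
for _ in range(100):  # a 100-token sequence fits in two 64-token blocks
    cache.append(compress(rng.standard_normal(d_model)))

# Caching one latent instead of separate full-width K and V per token:
saving = 1 - d_latent / (2 * d_model)
print(f"{len(cache.blocks)} blocks allocated, cache entry {saving:.0%} smaller")
```

The toy ratio here is deliberately exaggerated; the 40%-60% figure quoted above depends on real model dimensions and on the extra up-projection work traded for the smaller cache.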

DeepSeek's development shows that progress in AI computing doesn't hinge on a single factor; the field is far more diverse, and FlashMLA makes that clear. For now, the tool appears to be specific to Hopper GPUs, and it will be interesting to see what sort of performance it could unlock on the H100.

Source: https://wccftech.com/china-doesnt-need-cutting-edge-accelerators-to-progress-with-ai/

This is what I was talking about: a Korea Times report states that China is way ahead of Korea in semiconductor manufacturing, and now in AI too, with limited resources, they are doing wonders and ****ing the woke west.

We also did the same thing with limited resources: Chandrayaan by ISRO, the 3-stage nuclear programme by BHAVINI, and also the indigenous nuclear submarines Arihant and Arighat by BHAVINI. You know the US and its allies didn't provide us the required machinery or fuel sources, imposing sanctions through the IAEA and NSG.