News: DeepSeek.com making waves!

Chinese GPU Manufacturers Push Out Support For Running DeepSeek's AI Models On Local Systems, Intensifying the AI Race

Chinese consumer GPU manufacturers have now started to bring support for running DeepSeek's R1 models on local systems, jumping into the global AI race.

Well, China has been making serious waves in the AI industry, not only releasing a model capable of competing with mainstream options from OpenAI but also advancing its AI hardware capabilities. We recently reported on how Huawei is preparing to challenge NVIDIA's dominance with its Ascend 910C AI chip, and now Chinese GPU manufacturers, notably Moore Threads and Baidu's Kunlun, have stepped up to support DeepSeek's R1 model on their consumer GPUs, fueling the race for computational capability.

Starting with Moore Threads: the firm has added support for DeepSeek's distilled models through a deployment service that is claimed to be compatible with the MTT S80 and MTT S4000 consumer GPUs, the latter designed specifically for workstation workloads. The firm has also enabled DeepSeek's model on its KUAE cluster, an in-house cluster built explicitly for AI workloads and powered by MTT S4000 GPUs.

This is a notable achievement for Moore Threads, since the ability to run DeepSeek's AI models on local machines should drive professional consumers' adoption of the company's products. The MTT S80 and MTT S4000 can also be deployed for inference workloads, which means the support is fairly extensive. That said, the performance of DeepSeek's models on these GPUs is unknown, and it almost certainly won't match what AMD and NVIDIA offer.

In addition, Baidu, the famous Chinese tech company, has built an in-house AI cluster around its Kunlun Core P800 AI chips. According to MyDrivers, the Core P800 performs 20-50% better than comparable mainstream GPUs, supports 8-bit inference, and carries significantly lower deployment and maintenance costs. The chip is said to fully support DeepSeek's V3/R1 models, and inference deployment is reportedly quite straightforward.
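The report gives no detail on how the Core P800 implements 8-bit inference, but the general idea is easy to sketch. Below is a generic, hypothetical Python illustration (not Kunlun's actual scheme, and the function names are made up): weights are stored as int8 plus a single float scale, quartering memory versus float32, which mainly speeds up memory-bound inference.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: int8 weights + one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor for computation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is 4x smaller than float32, at a small accuracy cost
max_err = np.abs(dequantize(q, scale) - w).max()
print(q.nbytes, w.nbytes)  # 65536 vs 262144 bytes
print(f"max abs error: {max_err:.4f}")
```

Real deployments typically quantize per channel or per group rather than per tensor, but the storage-versus-accuracy trade-off is the same.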

Baidu's AI cluster is said to consist of 30,000 Core P800 AI GPUs and will be up and running soon. These achievements clearly show that Chinese GPU manufacturers haven't been held back by export restrictions; instead, they have shifted their hardware arsenal toward domestically manufactured products, making them sustainable in the long run.

Source: https://wccftech.com/chinese-gpu-ma...-running-deepseek-ai-models-on-local-systems/
 

China Doesn't Need "Cutting-Edge" Accelerators To Progress With AI; DeepSeek's Newest "FlashMLA" Project Now Brings In 8x TFLOPS Power Boost With NVIDIA's H800 GPUs

DeepSeek's FlashMLA Will Help China's AI Industry To Squeeze Out Maximum Power From NVIDIA's Cut-Down Hopper GPUs

China has reportedly found a workaround for NVIDIA's "cut-down" AI accelerators, as DeepSeek's newest project has squeezed out eight times the TFLOPS from the Hopper H800 AI accelerators.

It seems China isn't depending on anyone to scale up its hardware capabilities, as domestic companies, notably DeepSeek, are using software to find workarounds with the equipment they have available. DeepSeek's latest development is one of the wildest we have seen in the markets: according to the firm, it has squeezed significant performance out of NVIDIA's "cut-down" Hopper H800 GPUs, essentially by optimizing memory consumption and the allocation of resources across inference requests.

Just a quick background: DeepSeek is holding an "Open Source" week, during which it plans to release technologies and tools to the general public through GitHub repositories. Day one was a strong start, as the firm unveiled FlashMLA, a "decoding kernel" designed specifically for NVIDIA's Hopper GPUs. Before we get into how it works, let's take a quick look at the improvements it brings, and they are remarkable.

DeepSeek claims to have squeezed out 580 TFLOPS for BF16 matrix multiplication on the Hopper H800, roughly eight times the industry's standard rating. On top of that, efficient memory utilization lets FlashMLA reach a memory bandwidth of up to 3,000 GB/s, approaching the H800's theoretical peak. The key point is that all of this is achieved purely through code rather than hardware changes.

DeepSeek's FlashMLA implements "low-rank key-value compression", which, in simple terms, factorizes the key-value data into much smaller latent representations, allowing faster processing and cutting memory consumption by 40%-60%. Another interesting inclusion is a block-based paging system, which allocates memory dynamically depending on the demands of the task instead of reserving a single fixed amount. This lets models process variable-length sequences much more efficiently, ultimately improving performance.
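The article doesn't include FlashMLA's actual CUDA code, but both ideas above can be sketched at a high level. Here is a toy numpy illustration; the dimensions, projection matrices, and class names are all invented for the example, not DeepSeek's real implementation. Instead of caching full keys and values per token, the model caches one small latent vector and reconstructs K and V from it at attention time, and the cache grows one fixed-size block at a time rather than pre-reserving a worst-case-length buffer.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent = 1024, 128  # toy sizes; the latent is much smaller

# Learned projections in a real model; random here for illustration
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)

def compress(h: np.ndarray) -> np.ndarray:
    """Low-rank compression: only this small latent goes into the cache."""
    return h @ W_down  # shape (d_latent,)

def expand(c: np.ndarray):
    """Reconstruct keys and values from the cached latent at attention time."""
    return c @ W_up_k, c @ W_up_v

class PagedKVCache:
    """Block-based paging: allocate fixed-size blocks only as tokens arrive,
    so short sequences never pay for a maximum-length buffer."""
    def __init__(self, block_tokens: int = 64, dim: int = d_latent):
        self.block_tokens, self.dim = block_tokens, dim
        self.blocks, self.length = [], 0

    def append(self, latent: np.ndarray) -> None:
        if self.length % self.block_tokens == 0:  # current block full, or no blocks yet
            self.blocks.append(np.empty((self.block_tokens, self.dim)))
        self.blocks[-1][self.length % self.block_tokens] = latent
        self.length += 1

cache = PagedKVCache()
for _ in range(100):  # a 100-token sequence fits in two 64-token blocks
    cache.append(compress(rng.standard_normal(d_model)))

# Caching one latent instead of separate full-width K and V per token:
saving = 1 - d_latent / (2 * d_model)
print(f"{len(cache.blocks)} blocks allocated, cache entry {saving:.0%} smaller")
```

The toy ratio here is deliberately exaggerated; the 40%-60% figure quoted above depends on real model dimensions and on the extra up-projection work traded for the smaller cache.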

DeepSeek's development shows that progress in AI computing doesn't hinge on a single factor; the field is far more diverse, and FlashMLA makes that clear. For now, the tool appears to be specific to Hopper GPUs, and it will be interesting to see what sort of performance it could unlock on the H100.

Source: https://wccftech.com/china-doesnt-need-cutting-edge-accelerators-to-progress-with-ai/

This is what I was talking about: a Korea Times report states that China is way ahead of Korea in semiconductor manufacturing, and now in AI too, with limited resources, they are doing wonders and ****ing the woke west.

We also did the same thing with limited resources: Chandrayaan by ISRO, the 3-stage nuclear programme by BHAVINI, and also the indigenous nuclear submarines Arihant and Arighat by BHAVINI. You know the US and its allies didn't provide us the required machinery or fuel sources, imposing sanctions through the IAEA and NSG.