China's AI industry has delivered a seismic shock to the global tech sector. DeepSeek, a little-known AI startup, has built a model reportedly as powerful as OpenAI's ChatGPT at a fraction of the investment cost. Silicon Valley investor Marc Andreessen described the breakthrough as a "Sputnik moment", and it has rattled global tech markets.
The fallout was immediate: US tech stocks shed nearly $1 trillion in value in a single day. Nvidia, the world's most valuable chipmaker, saw its shares drop 17%, erasing $589 billion in market capitalization, reportedly the largest one-day loss for any company in US history.
The AI models from the Chinese startup quickly gained widespread adoption, and DeepSeek's app eventually surpassed ChatGPT to become the most downloaded app on the App Store. DeepSeek V3 and DeepSeek R1 rival OpenAI's existing models o1 and o3.
DeepSeek is a Chinese AI company headquartered in Hangzhou, founded by entrepreneur Liang Wenfeng, who is also the CEO of the quantitative hedge fund High-Flyer. Liang reportedly began working on AI in 2019 through High-Flyer and has been dedicated to research in the domain ever since. He is the controlling shareholder of DeepSeek, and according to a Reuters report, High-Flyer owns patents relating to the chip clusters used for training AI models.
What makes DeepSeek's models stand out is their performance and open-source nature. DeepSeek V3 was trained on a budget of just $6 million, a fraction of the hundreds of millions that companies such as OpenAI, Meta, and Google invest in their frontier models.
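To put that figure in perspective, DeepSeek's V3 technical report attributes the cost to roughly 2.79 million H800 GPU-hours priced at an assumed rental rate of $2 per GPU-hour. The back-of-envelope check below uses those reported numbers; the rental price is the paper's own assumption, not a market quote:

```python
# Figures from DeepSeek's V3 technical report; the $2/GPU-hour rental
# price is the paper's stated assumption, not a market quote.
gpu_hours = 2_788_000            # total H800 GPU-hours for training
price_per_gpu_hour = 2.00        # assumed rental cost in USD
total = gpu_hours * price_per_gpu_hour
print(f"Estimated training cost: ${total / 1e6:.2f} million")  # ~$5.58M
```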
DeepSeek is most often compared with the US AI powerhouse OpenAI because both are known for building large language models. Earlier this month, DeepSeek V3, one of the first models the company unveiled, surpassed GPT-4o and Claude 3.5 Sonnet on numerous benchmarks.
DeepSeek V3 stands out because of its Mixture-of-Experts (MoE) architecture. An MoE model works like a team of specialists who divide up the work: instead of one big model handling every input, a router sends each token to the experts best suited to it. DeepSeek V3 was also trained on large, high-quality datasets that give the model broad language understanding and task-specific capabilities. Additionally, the model uses a technique called Multi-Head Latent Attention (MLA) to improve efficiency and cut the costs of training and deployment, allowing it to compete with the most advanced models of the day.
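The routing idea is easy to see in miniature. The sketch below shows an illustrative top-2 MoE forward pass in plain Python; the expert count, dimensions, and function names are invented for the example and are not DeepSeek's implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(token, experts, router_weights, k=2):
    # Router scores say how well each expert "fits" this token.
    scores = softmax(router_weights @ token)
    top_k = np.argsort(scores)[-k:]             # indices of the k best experts
    gate = scores[top_k] / scores[top_k].sum()  # renormalized mixing weights
    # Only the selected experts run; the rest of the model stays idle.
    return sum(g * experts[i](token) for g, i in zip(gate, top_k))

# Toy setup: 4 experts, each a random linear map over an 8-dim token.
rng = np.random.default_rng(0)
experts = [lambda t, W=rng.standard_normal((8, 8)): W @ t for _ in range(4)]
router_weights = rng.standard_normal((4, 8))

out = moe_forward(rng.standard_normal(8), experts, router_weights)
print(out.shape)  # (8,)
```

Because only the selected experts run for each token, a model can have a very large total parameter count while activating only a small slice of it per token, which is where MoE's cost savings come from.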
Even as the AI community was still absorbing DeepSeek V3, the company launched its next model, DeepSeek R1. R1 can reason through a problem before answering, spending extra computation at inference time, an approach known as test-time compute. Built on the same MoE architecture, R1 matches, and often surpasses, OpenAI's frontier model o1 on tasks like math, coding, and general knowledge. It is also reportedly 90-95% more affordable than o1.
DeepSeek R1 is open source, powerful, and free. Like OpenAI's o1, it is a reasoning model that takes time to work through a prompt before producing a response. Unlike o1, however, R1 makes its reasoning visible: the model shows its chain of thought as it works toward an answer.
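In practice, the visible reasoning arrives as a separate field in the API response. The sketch below assumes DeepSeek's OpenAI-compatible endpoint and the `reasoning_content` field described in its public documentation; both are assumptions that may change:

```python
from openai import OpenAI

# Endpoint, model name, and the `reasoning_content` field follow
# DeepSeek's public API docs at the time of writing; treat them as
# assumptions that may change.
client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)

msg = resp.choices[0].message
print("Chain of thought:\n", msg.reasoning_content)  # the visible reasoning
print("Final answer:\n", msg.content)                # the answer itself
```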
R1 arrives at a time when industry giants are pumping billions into AI infrastructure. DeepSeek has delivered a state-of-the-art model that is competitive with theirs, and by making it open source, the company invites others to replicate its work. Since R1's release, questions have been mounting over whether such massive expenditures are necessary, bringing intense scrutiny to the industry's current approach.
It is well known that training AI models requires massive investment, yet DeepSeek found a way around much of the infrastructure and hardware cost. The company reduced the cost of building its models by using the Nvidia H800, a chip with deliberately reduced capabilities made for the Chinese market. While US AI labs train on advanced GPUs like the Nvidia H100, DeepSeek relied on the H800, which has lower chip-to-chip bandwidth.
In 2022, US regulators introduced export rules that, citing national security concerns, barred Nvidia from selling two advanced chips, the A100 and H100, to China. In response, Nvidia designed the A800, a version of the A100 with some capabilities reduced, making it legal for export to China; the H800 is the analogous variant of the H100. DeepSeek's engineers relied on low-level code optimizations to improve memory usage, which reportedly ensured that performance was not held back by the chips' limitations.
Another key aspect of creating AI models is training, which consumes massive resources. According to the company's research paper, DeepSeek trains only the necessary parts of its model for each input, keeping expert workloads even with a technique called auxiliary-loss-free load balancing.
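In rough terms, the technique gives each expert a bias that affects which experts are selected, but not how much weight they receive, and nudges those biases between batches so the load evens out. The sketch below illustrates the idea in the spirit of the DeepSeek-V3 paper; the expert count, update rate, and batch setup are invented for the demo and are not DeepSeek's actual code:

```python
import numpy as np

# Illustrative sketch of auxiliary-loss-free load balancing: a
# per-expert bias steers WHICH experts get picked, but not HOW MUCH
# each contributes to the output.
rng = np.random.default_rng(0)
n_experts, k, gamma = 8, 2, 0.01
bias = np.zeros(n_experts)                 # per-expert routing bias

def route(scores, bias, k):
    """Pick top-k experts by biased score; gate by the unbiased score."""
    chosen = np.argsort(scores + bias)[-k:]
    gates = scores[chosen] / scores[chosen].sum()
    return chosen, gates

loads = np.zeros(n_experts)
batch = rng.random((1024, n_experts))      # stand-in affinity scores per token
for scores in batch:
    chosen, _ = route(scores, bias, k)
    loads[chosen] += 1

# Between batches, nudge underloaded experts up and overloaded ones down.
# No gradient flows through `bias`, so balancing never distorts training.
bias += gamma * np.sign(loads.mean() - loads)
print(loads, bias)
```

The design choice matters because the conventional fix, an auxiliary balancing loss, adds a gradient term that can degrade model quality; adjusting a non-differentiable bias sidesteps that trade-off.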