
China is vast, yet largely shielded from foreign influence, and that shield gives Chinese companies a huge advantage.
Because China’s domestic market is heavily censored, and policies bar non-vetted foreign products from operating in the country, a market of more than a billion internet users is effectively left open to local players.
And in a world where AI has taken the spotlight, thanks in large part to OpenAI's introduction of ChatGPT, the East has taken notice too, with players large and small eager to compete in this lucrative industry.
And DeepSeek is one of them.
Unlike bigger players like Baidu, which has its own ERNIE chatbot, DeepSeek ventures a bit further by introducing an AI that can reason.
And remarkably, its product, DeepSeek-R1, is able to rival the likes of OpenAI o1.
DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!
o1-preview-level performance on AIME & MATH benchmarks.
Transparent thought process in real-time.
Open-source models & API coming soon!
Try it now at https://t.co/v1TFy7LHNy #DeepSeek
— DeepSeek (@deepseek_ai) November 20, 2024
DeepSeek, a China-based AI research lab, has introduced DeepSeek-R1 as one of the first AI models designed for advanced reasoning.
The model emphasizes "reasoning" by simulating thoughtful problem-solving, allowing it to self-check and better handle complex queries.
DeepSeek-R1 has reportedly matched OpenAI’s o1-preview model on benchmarks like AIME, a challenging math competition used to test models, and MATH, a collection of word problems.
Similar to OpenAI o1, the reasoning process can result in slower response times, with answers sometimes taking tens of seconds.
Those extra seconds are spent, in effect, fact-checking: the model takes more time to consider a question or query before committing to an answer, which helps it avoid some of the pitfalls that trip up other models.
DeepSeek has what it calls the 'DeepThink' feature, which represents a step forward in AI-driven reasoning and contextual understanding.
It leverages the advanced architecture of DeepSeek's models, particularly their Multi-head Latent Attention (MLA) and Mixture-of-Experts (MoE) systems, to improve efficiency and inference capability. DeepThink focuses on complex problem solving, coding support, and maintaining extended context across interactions, making it suitable for detailed, iterative work like coding, data analysis, and strategic planning.
This feature is part of DeepSeek's broader goal of merging chat and coding functionality while keeping its models cost-efficient and open-source, positioning it as a direct competitor to the most prominent LLMs.
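To make the MoE idea concrete, here is a minimal, generic sketch of a top-2 Mixture-of-Experts layer in PyTorch. It illustrates the general technique, not DeepSeek's actual implementation; the hidden size, expert count, and routing details are arbitrary placeholders.

```python
# Illustrative top-2 Mixture-of-Experts layer (generic sketch, not DeepSeek's code).
# Hidden size, expert count, and top-k are arbitrary placeholder values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so most parameters stay
        # idle on any given forward pass -- the source of MoE's efficiency.
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(16, 256)
print(MoELayer()(tokens).shape)  # torch.Size([16, 256])
```

The routing step is what lets a model grow its total parameter count without growing per-token compute at the same rate, which is how MoE designs keep large models comparatively cheap to serve.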
Inference Scaling Laws of DeepSeek-R1-Lite-Preview
Longer Reasoning, Better Performance. DeepSeek-R1-Lite-Preview shows steady score improvements on AIME as thought length increases.
— DeepSeek (@deepseek_ai) November 20, 2024
Despite its strengths, DeepSeek-R1 is not flawless.
For example, it struggles with simpler logic puzzles like tic-tac-toe.
It also has fewer safeguards: many users report that they managed to bypass its filters and get the AI to answer questions such as how to make methamphetamine, a potent central nervous system stimulant.
DeepSeek-R1, however, manages to avoid answering politically sensitive questions.
This reflects the strict regulatory pressures from the Chinese government, which mandates alignment with "core socialist values."
The emergence of reasoning models like DeepSeek-R1 and o1 comes as traditional scaling laws—relying on ever-increasing data and compute—face diminishing returns.
Test-time compute, which allows models additional processing time during tasks, underpins this new approach. Microsoft CEO Satya Nadella recently highlighted test-time compute as a potential “new scaling law.”
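As a rough illustration of what "spending more test-time compute" can mean in practice, the sketch below samples several independent answers and keeps the most common one, a self-consistency-style vote. It is a generic toy, not DeepSeek's or OpenAI's method; sample_answer is a hypothetical stand-in for a real model call.

```python
# Toy illustration of test-time compute: more samples per query = more inference
# compute, traded for a more reliable final answer.
from collections import Counter
import random

def sample_answer(question: str) -> str:
    # Placeholder: a real system would sample a full chain of thought
    # from the model and extract its final answer.
    return random.choice(["42", "42", "41"])

def answer_with_extra_compute(question: str, n_samples: int = 8) -> str:
    answers = [sample_answer(question) for _ in range(n_samples)]
    # The majority answer across attempts is usually more reliable
    # than any single attempt.
    return Counter(answers).most_common(1)[0][0]

print(answer_with_extra_compute("What is 6 * 7?"))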
DeepSeek plans to open-source the model and provide API access. Its earlier general-purpose model, DeepSeek-V2, pressured competitors like ByteDance and Baidu to lower usage prices or make models free.
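Once that API ships, usage could plausibly look like the sketch below, assuming an OpenAI-compatible endpoint; the base URL and model name here are placeholders, not confirmed values.

```python
# Hypothetical sketch of calling a DeepSeek reasoning model, assuming an
# OpenAI-compatible API. Endpoint and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.example/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-r1-lite-preview",            # placeholder model name
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)
print(response.choices[0].message.content)
```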
With backing from High-Flyer Capital, DeepSeek utilizes its own cutting-edge infrastructure, including a 10,000-GPU cluster costing $138 million.
The organization aims to achieve breakthroughs in AI capabilities, pushing toward "superintelligent" systems.