Has China achieved AI breakthrough with DeepSeek? – Artifex.News, Tue, 28 Jan 2025


For over two years, San Francisco-based OpenAI has dominated artificial intelligence (AI) with its generative pre-trained language models. The startup's chatbot penned poems, wrote long-format stories, found bugs in code, and helped search the Internet (albeit with a knowledge cut-off date). Its ability to generate coherent sentences flawlessly dazzled users around the world.

Far away, across the Pacific Ocean, in Beijing, China made its first attempt to counter America's dominance in AI. In March 2023, Baidu received the government's approval to launch its AI chatbot, Ernie Bot. Ernie was touted as China's answer to ChatGPT after the bot received over 30 million user sign-ups within a day of its launch.

But the initial euphoria around Ernie gradually ebbed as the bot fumbled and dodged questions about China's President Xi Jinping, the Tiananmen Square crackdown and human rights violations against the Uyghur Muslims. In response to questions on these topics, the bot replied: "Let's talk about something else."

Late to the AI party

As the hype around Ernie met the reality of Chinese censorship, several experts pointed out the difficulty of building large language models (LLMs) in the communist country. Google's former CEO and chairman, Eric Schmidt, in a talk at the Harvard Kennedy School of Government in October 2023, said: "They [China] were late to the party. They didn't get to this [LLM] AI space early enough." Mr. Schmidt further pointed out that a lack of language training data and China's unfamiliarity with open-source ideas could make the Chinese fall behind in the global AI race.

As these Chinese tech giants trailed, the U.S. tech giants marched forward with their advances in LLMs. Microsoft-backed OpenAI cultivated a new crop of reasoning chatbots with its 'o' series that were better than ChatGPT. These AI models were the first to introduce inference-time scaling, which refers to letting a model spend more compute while it is generating an answer, so that it can reason its way through harder problems.

AI trader turned AI builder

While the Chinese tech giants languished, High-Flyer, a Zhejiang-based hedge fund that used AI for trading, set up its own AI lab, DeepSeek, in 2023. Within a year, the AI spin-off developed the DeepSeek-v2 model, which performed well on several benchmarks and was able to provide the service at a significantly lower cost than other Chinese LLMs.

When DeepSeek-v3 was launched in December, it stunned AI companies. The Mixture-of-Experts (MoE) model was pre-trained on 14.8 trillion tokens and has 671 billion total parameters, of which 37 billion are activated for each token.

A MoE model uses different "experts", or sub-models, that specialise in different aspects of language or tasks. Each expert is activated only when it is relevant to a particular input. This makes the model more efficient, saves resources and speeds up processing.
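The routing idea can be sketched in a few lines of NumPy. This is a toy illustration only: the expert count, sizes and router here are invented for clarity and are far simpler than DeepSeek-v3's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 4      # sub-models specialising in different tasks
TOP_K = 2          # experts activated per token (DeepSeek-v3 similarly
                   # activates only 37B of its 671B parameters per token)
DIM = 8

# Each "expert" is just a small weight matrix in this sketch.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((DIM, N_EXPERTS))

def moe_layer(token: np.ndarray) -> np.ndarray:
    # The router scores every expert for this token...
    scores = token @ router
    # ...but only the top-k experts are actually run,
    # which is what saves compute.
    top = np.argsort(scores)[-TOP_K:]
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.standard_normal(DIM))
print(out.shape)  # (8,)
```

The key point is that the cost per token scales with the two activated experts, not with all four, while the router still lets every expert contribute to the inputs it is best at.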

Training despite American sanctions

According to the technical paper released on December 26, DeepSeek-v3 was trained for 2.78 million GPU hours on Nvidia's H800 GPUs. By comparison, Meta's Llama 3.1, trained on Nvidia's H100 chips, consumed 30.8 million GPU hours, roughly eleven times as many.

After the early success of DeepSeek-v3, High-Flyer built its most advanced reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, which have potentially disrupted the AI industry by becoming some of the most cost-efficient models in the market.

When compared to OpenAI’s o1, DeepSeek’s R1 slashes costs by a staggering 93% per API call. This is a huge advantage for businesses and developers looking to integrate AI without breaking the bank. 

The savings don't stop there. Unlike older models, R1 can run on high-end local computers, so there is no need for costly cloud services or pesky rate limits. This gives users the freedom to run AI tasks faster and cheaper without relying on third-party infrastructure.

Plus, R1 is designed to be memory-efficient: it requires only a fraction of the RAM typically needed by a model of its calibre. Separately, by batching (processing multiple requests at once) and leveraging the cloud, this model further lowers costs and speeds up performance, making it even more accessible to a wide range of users.
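Why batching lowers cost can be shown with a toy example: stacking many requests into one matrix operation produces the same numbers as handling them one by one, but in fewer, larger operations, which GPUs execute far more efficiently. The sizes below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((512, 512))        # stand-in for model weights
requests = [rng.standard_normal(512) for _ in range(64)]

# Unbatched: one forward pass per request.
singles = [r @ W for r in requests]

# Batched: stack the requests and run a single pass.
batch = np.stack(requests) @ W             # shape (64, 512)

# Same results, but 64 small multiplies collapsed into one large one.
assert np.allclose(np.stack(singles), batch)
```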

A close contest

While it may not be quite as advanced as OpenAI's o3, R1 still offers quality comparable to o1. According to benchmark data on both models on LiveBench, o1 edges out R1 on overall performance, with a global average score of 75.67 compared to the Chinese model's 71.38. OpenAI's o1 continues to perform well on reasoning tasks, holding a nearly nine-point lead over its competitor, making it the go-to choice for complex problem-solving, critical thinking and language-related tasks.

When it comes to coding, mathematics and data analysis, the competition is much tighter. In data analysis in particular, R1 proves to be the better choice for analysing large datasets.

One important area where R1 fails miserably, in a way reminiscent of Ernie Bot, is topics that are censored in China. For instance, to any question on Chinese President Xi Jinping, the Tiananmen Square protests, or the Uyghur Muslims, the bot tells its users: "Let's talk about something else."

Unlike Ernie, and despite the reality of Chinese censorship, DeepSeek's R1 has this time soared in popularity globally. It has already surpassed major competitors like ChatGPT, Gemini, and Claude to become the most downloaded app in the U.S. (In India, DeepSeek sits at the third spot in the productivity category, behind the Gmail and ChatGPT apps.) This meteoric rise highlights just how quickly the AI community is embracing R1's promise of affordability and performance.

Smaller models rise

While OpenAI's o3 continues to be the state-of-the-art AI model out there, it may only be a matter of time before other models take the lead in building super-intelligence.

DeepSeek shows that, through its distillation process, it can effectively transfer the reasoning patterns of larger models into smaller ones. This means that, instead of training smaller models from scratch using reinforcement learning (RL), which can be computationally expensive, the knowledge and reasoning abilities acquired by a larger model can be transferred to smaller models, resulting in better performance.

In its technical paper, DeepSeek compares the performance of distilled models with models trained using large-scale RL. The results indicate that the distilled models outperformed smaller models trained with large-scale RL but without distillation. Specifically, a 32-billion-parameter base model trained with large-scale RL achieved performance on par with QwQ-32B-Preview, while the distilled version, DeepSeek-R1-Distill-Qwen-32B, performed significantly better across all benchmarks. (Qwen is part of an LLM family on Alibaba Cloud.)
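The core of distillation can be demonstrated numerically: a small "student" is trained to imitate a frozen "teacher's" output distribution rather than learning from scratch. The toy one-layer models below are invented for illustration and are not DeepSeek's actual recipe, but they show the mechanism of matching soft targets.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((256, 10))          # toy inputs

# Pretend this frozen matrix is the expensive "teacher" model.
teacher_W = rng.standard_normal((10, 3))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

teacher_probs = softmax(X @ teacher_W)      # soft targets to imitate

# The student starts from scratch and takes gradient steps on
# cross-entropy against the teacher's soft targets.
student_W = np.zeros((10, 3))
for _ in range(500):
    p = softmax(X @ student_W)
    grad = X.T @ (p - teacher_probs) / len(X)   # softmax + CE gradient
    student_W -= 0.5 * grad

agreement = (softmax(X @ student_W).argmax(1)
             == teacher_probs.argmax(1)).mean()
print(f"student matches teacher on {agreement:.0%} of inputs")
```

The student never sees ground-truth labels or does any RL; it inherits the teacher's behaviour directly, which is why distillation is so much cheaper than training with large-scale RL.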

This, in essence, would mean that inference could shift to the edge, changing the landscape of AI infrastructure companies as more efficient models could reduce reliance on centralised data centres. 

The future of AI race

While distillation is a powerful method for enabling smaller models to achieve high performance, it has limits. For instance, since distilled models are tied to their "teacher" model, the limitations of the larger model are also transferred to the smaller ones. Distilled models may also be unable to replicate the full range of capabilities or nuances of the larger model, which can affect their performance on complex or multi-faceted tasks.

Distillation is an effective tool for transferring existing knowledge, but it may not be the path to major paradigm shifts in AI on its own. That means the need for GPUs will only increase as companies build more powerful, more intelligent models.

DeepSeek's R1 and OpenAI's o1 are the first reasoning models that actually work, and R1 is the first successful demonstration of using RL for reasoning. From here, more compute power will be needed for training, running experiments, and exploring advanced methods for creating agents. There are many ways to leverage compute to improve performance, and right now, American companies are in a better position to do so, thanks to their larger scale and access to more powerful chips.




Is Chinese AI Startup Really A 'Disruptor'? – Artifex.News, Tue, 28 Jan 2025




New Delhi:

There is a new kid on the Artificial Intelligence-driven chatbot / Large Language Model (LLM) block, and it is threatening to blow the rest out of the water. Meet DeepSeek, developed by a Hangzhou-based research lab with a fraction of the budget (if you believe the reports) used to make ChatGPT, Gemini, Claude AI, and others created by United States-based computer labs.

And the latest offerings, DeepSeek V3, a 671-billion-parameter 'mixture of experts' model, and DeepSeek R1, an advanced reasoning model that is possibly better than OpenAI's o1, have underlined its status as a potential heavyweight financial and technological disruptor in this field.

How much of a disruptor is it?

As of Monday, DeepSeek is the top downloaded app on Apple's App Store in the US. Let that sink in: a Chinese-developed chatbot is now the most-downloaded app in the US.

And that disruption, even if seen as a ‘potential’ one at this time, has raised doubts about how well some US tech companies have invested the billions pledged towards AI development.


Either way, the quality and cost efficiency of DeepSeek’s models have flipped this narrative; even if, in the long run, this particular Chinese model flops, that it was developed with a fraction of the financial and technological resources available to firms in the West is an eye-opener.

Again, how much of a disruptor is it?

Well, last month DeepSeek’s creators said training the V3 model required less than $6 million (although critics say the addition of costs from earlier development stages could push eventual costs north of $1 billion) in computing power from Nvidia’s H800 chips, a mid-range offering. “Did DeepSeek really build OpenAI for $5 million? Of course not,” Bernstein analyst Stacy Rasgon told Reuters.

But break down the available financials and it gets quite remarkable.

OpenAI's o1 charges $15 per million input tokens.

DeepSeek’s R1 charges $0.55 per million input tokens.

The pricing, therefore, absolutely blows the competition away.

And, depending on end-use cases, DeepSeek is believed to be between 20 and 50 times more affordable, and efficient, than OpenAI's o1 model. In fact, its logical reasoning test scores are staggering; DeepSeek outperforms ChatGPT and Claude AI by seven to 14 per cent.

Dev.to, a popular online community for software developers, said it scored 92 per cent in completing complex, problem-solving tasks, compared to 78 per cent by GPT-4.

Input tokens, by the way, are the units of information that make up a prompt or question. They are what the model analyses to understand the context of a query or instruction.
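A back-of-the-envelope calculation using the per-million-input-token prices quoted above makes the gap concrete. (The monthly traffic figure is a hypothetical volume, and output-token pricing, which differs, is ignored here.)

```python
O1_PRICE = 15.00      # USD per 1M input tokens (OpenAI o1, as quoted above)
R1_PRICE = 0.55       # USD per 1M input tokens (DeepSeek R1, as quoted above)

tokens = 250_000_000  # hypothetical month of chatbot traffic

o1_cost = tokens / 1_000_000 * O1_PRICE
r1_cost = tokens / 1_000_000 * R1_PRICE

print(f"o1: ${o1_cost:,.2f}   R1: ${r1_cost:,.2f}")
print(f"R1 is ~{o1_cost / r1_cost:.0f}x cheaper on input tokens")
```

On input tokens alone the ratio works out to roughly 27x; blended figures that include output tokens land in the 20-to-50-times range cited above.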

For context, OpenAI is believed to spend $5 billion every year to develop its models.

So, even if DeepSeek’s critics (see above) are right, it is still a fraction of OpenAI’s costs.

That spending translates, as OpenAI boss Sam Altman has pointed out, into significantly enhanced computing capabilities; for the DeepSeek model to deliver comparable processing power on its relatively shoestring budget is an eyebrow-raiser.

And Mr Altman acknowledged that, calling the R1 model “very impressive”.

Google boss Sundar Pichai went one step further, telling CNBC at Davos, "I think we should take the development out of China very seriously." And US President Donald Trump sounded a "wake-up" call.

And then there are the hundreds of billions of dollars that US companies have lost amid this week's rout in tech stocks; chip-maker Nvidia, for example, lost nearly $600 billion in market value, and the tech-rich Nasdaq index finished Monday down by more than three per cent, with the unwelcome possibility of a further drop depending on AI giants Meta and Microsoft's expected earnings reports.


For context, Meta and Microsoft both have their own AI models, at the forefront of which are Llama and Copilot; the former is an LLM first released in February 2023, and the latter is now an integrated feature in various Microsoft 365 applications, such as MS Word and Excel.

While neither is, arguably, on the same tech level as OpenAI's models, Meta and MS have invested billions in AI and LLM projects, both in the US and abroad. For example, some analysts believe big US cloud companies will spend $250 billion this year on AI infrastructure alone.

But what really makes DeepSeek special is more than the cost and technology.

It is that, unlike its competitors, it is genuinely open-source.

The R1 code is completely open to the public under the MIT License, which is a permissive software license that allows users to use, modify, and distribute software with few restrictions.

This means you can download it, use it commercially without fees, change its architecture, and integrate it into any of your existing systems.

DeepSeek is also faster than GPT-4, more practical and, according to many experts, even understands regional idioms and cultural contexts better than its Western counterparts.

There is much more to consider.

How, for example, does DeepSeek affect diplomatic and military ties between China and the US (and India also, actually), and what are the ethical problems with truly open-source AI models?

But what is undeniable is that China's DeepSeek is a disruptor. And experts believe China has now leapfrogged, narrowing the gap from 18 months to just six months behind state-of-the-art AI models developed in the US.

Meanwhile, DeepSeek’s success has already been noticed in China’s top political circles.

On January 20, the day R1 was released to the public (and also the day Trump was sworn in as US President), founder Liang Wenfeng attended a closed-door symposium for businessmen and experts hosted by Chinese Premier Li Qiang. His presence has been seen as a sign that DeepSeek could be important to Beijing's policy goal of achieving self-sufficiency in strategic industries like AI.

With input from agencies





