The OpenAI and DeepSeek rivalry - What will happen next?

When AI models start recycling AI generated data, what happens then?

By Markus Backman

January 27th, 2025

  • DeepSeek-R1, a Chinese AI model, took the world by storm on January 20, 2025

  • OpenAI accuses DeepSeek of stealing its model through "distillation," but DeepSeek denies the claims

  • OpenAI itself faces criticism for how it obtains its own training data, including lawsuits from media companies

  • As the popularity of these AI models increases, they could start recycling their own creations, leading to the spread of biased information

  • The technology war between the U.S. and China is just beginning, and we'll probably see more of it soon

This is what happened

We know about DeepSeek-R1 — the AI model created by a small Chinese company called DeepSeek that took the world by storm on January 20, 2025.

The young company, founded less than two years ago, successfully launched DeepSeek-R1: a ChatGPT-style service, billed as "groundbreaking," built on an LLM (large language model).

What's notable about this case is that DeepSeek claims to have developed it with a very small budget of $6 million USD — a tiny fraction of what it cost to create OpenAI's ChatGPT.

They say they achieved this by using far fewer advanced chips, a claim that sent Nvidia's stock plummeting, wiping out $600 billion in market value. It was the biggest single-day market-value loss in U.S. history.

However, experts are questioning whether DeepSeek's methods are legitimate or if they copied data from OpenAI, the company behind ChatGPT.

OpenAI has seen huge success with its GPT-4 and o1 models. Like o1, R1 is a chatbot that users can hold open-ended conversations with, asking about anything.

DeepSeek's R1 uses significantly fewer resources (mainly energy) than its competitors, making it a so-called "revolutionary" alternative.

OpenAI reportedly claims it has evidence China’s DeepSeek ‘used its model’ to train AI chatbot

Shortly after the release of DeepSeek's latest version on January 20, it became the most downloaded application in Apple's App Store in the U.S.

Consequently, DeepSeek is now under scrutiny regarding the methods it used to train its language model. Many are questioning whether DeepSeek secretly copied OpenAI, despite OpenAI's claim that its data is secure.

DeepSeek denies these allegations, stating that they achieved their results by using cheaper chips (processors) that consume fewer resources.

The emergence of DeepSeek has led investors to reassess the foundations of the U.S. stock market boom, which has been largely driven by the belief that AI advancements require massive computing power. If DeepSeek’s claims are true, that assumption may no longer hold.

OpenAI, on the other hand, has accused DeepSeek of using its proprietary models to train DeepSeek's chatbot. They allege that DeepSeek employed a technique known as "distillation," where one AI model learns from another by repeatedly querying it to replicate its responses. OpenAI claims this violates its terms of service.
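The idea behind distillation, as described above, can be sketched in a few lines: a "student" model is trained on a "teacher" model's responses instead of on original human-written data. This is a purely illustrative sketch; the function names and the toy teacher are hypothetical stand-ins, not OpenAI's or DeepSeek's actual systems.

```python
# Illustrative sketch of "distillation": collect (prompt, teacher response)
# pairs by repeatedly querying a larger model, then use them as training
# data for a smaller one. All names here are hypothetical placeholders.

def teacher_answer(prompt: str) -> str:
    """Stand-in for querying a large 'teacher' model (e.g. via an API)."""
    return f"answer to: {prompt}"

def build_distillation_dataset(prompts):
    """Collect (prompt, teacher response) pairs as training data."""
    return [(p, teacher_answer(p)) for p in prompts]

prompts = ["What is an LLM?", "Explain web scraping."]
dataset = build_distillation_dataset(prompts)
# A 'student' model would then be fine-tuned on `dataset` so it learns to
# reproduce the teacher's responses at a fraction of the training cost.
```

The alleged violation is not the fine-tuning itself but the querying step: harvesting a competitor's outputs at scale is what OpenAI's terms of service forbid.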


Is OpenAI a hypocrite?

OpenAI reportedly claims DeepSeek "distilled" data to train its own model. While this may be true, we shouldn't ignore how OpenAI obtained its data in the first place.

AI training data doesn't just appear out of nowhere. It's pulled from sources around the web and other databases, where the content and information was originally created by humans.

OpenAI has faced significant criticism over how it gathers data. For instance, it is currently in early hearings for a lawsuit filed by The New York Times, in which media companies allege that OpenAI used their intellectual property without authorization.

That said, I'm not endorsing DeepSeek either. The point is that all LLM-based platforms are trained using data created by people. So, is it not hypocritical to accuse a competitor of "stealing" data?

What happens in the future?

There's a lot of speculation about what's next. Personally, I think this is just the beginning of LLMs (Large Language Models), and we’ll continue to see more cases like this as more competitors enter the game.

As this trend continues, it will undoubtedly become cheaper to develop AI models like ChatGPT and DeepSeek.

One of the biggest risks is how LLMs obtain their training data. A large part of it comes from web scraping, which means pulling information from websites, articles, blogs, forums, and other open-access sources.
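To make "web scraping" concrete, here is a minimal sketch of extracting visible text from an HTML page using only Python's standard library. The sample page is a made-up placeholder; real training pipelines crawl at a vastly larger scale and are expected to respect robots.txt and site terms of service.

```python
# Minimal sketch of web scraping as a training-data source: parse an HTML
# page and keep only its visible text. The sample page is a placeholder.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text fragments from an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for text between tags; skip pure whitespace.
        if data.strip():
            self.chunks.append(data.strip())

page = "<html><body><h1>A blog post</h1><p>Written by a person.</p></body></html>"
extractor = TextExtractor()
extractor.feed(page)
print(extractor.chunks)
```

The scraped fragments, multiplied across billions of pages, are exactly the human-created content the article is talking about.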


More and more people are using chatbots for just about everything. At some point, AI-generated content will start getting recycled. Then what?

Initially, training these AI models required data created by humans, but these language models may start copying data that was created by AI. If AI keeps training on AI, we could end up in a loop where everything is just recycled, making it difficult to determine the original author behind any content.

Information could become biased, and eventually, we might struggle to distinguish between human-made and AI-generated content.
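The recycling loop described above can be simulated in miniature. In this toy sketch (my own illustration, not a real training run), each "generation" of a model is trained only on samples of the previous generation's output; because every generation can only re-emit documents it has seen, the number of distinct documents can never grow, and in practice it shrinks.

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

# Toy simulation of AI training on AI: each generation samples (with
# replacement) from the previous generation's output, as if scraping
# AI-generated text. Real model collapse involves probability
# distributions, but the direction is the same: diversity only shrinks.

def next_generation(corpus, size):
    """'Train' the next model on samples of the previous model's output."""
    return [random.choice(corpus) for _ in range(size)]

corpus = [f"human-written text {i}" for i in range(100)]  # generation 0
diversity = [len(set(corpus))]
for _ in range(5):
    corpus = next_generation(corpus, 100)
    diversity.append(len(set(corpus)))

print(diversity)  # unique documents per generation; never increases
```

After a handful of generations, most of the original human-written documents have been sampled out of existence, which is the "recycled loop" the article warns about.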

There's also a political consequence to all of this. A technology war has already emerged between the U.S. and China, and I find it hard to believe this rivalry will de-escalate anytime soon.
