The OpenAI and DeepSeek rivalry - What will happen next?

When AI models start recycling AI generated data, what happens then?

By Markus Backman

January 27th, 2025

  • DeepSeek-R1, a Chinese AI model, took the world by storm on January 20, 2025

  • OpenAI accuses DeepSeek of stealing its model through "distillation," but DeepSeek denies the claims

  • OpenAI itself faces criticism for how it obtains its own training data, including lawsuits from media companies

  • As the popularity of these AI models increases, they could start recycling their own creations, leading to the spread of biased information

  • The technology war between the U.S. and China is just beginning, and we'll probably see more of it soon

This is what happened

We know about DeepSeek-R1 — the AI model created by a small Chinese company called DeepSeek that took the world by storm on January 20, 2025.

The young company, founded less than two years ago, successfully launched DeepSeek-R1: a ChatGPT-style service, billed as "groundbreaking," built on an LLM (large language model).

What's notable about this case is that DeepSeek claims to have developed it with a very small budget of $6 million USD — a tiny fraction of what it cost to create OpenAI's ChatGPT.

They say they achieved this by using far fewer advanced chips, a claim that sent Nvidia's stock plummeting, wiping out $600 billion in market value. It was the biggest single-day market-value loss in U.S. history.

However, experts are questioning whether DeepSeek's methods are legitimate or if they copied data from OpenAI, the company behind ChatGPT.

OpenAI has seen huge success with its GPT-4 and o1 models. Like o1, R1 is a chatbot that users can hold open-ended conversations with, asking about anything.

DeepSeek's R1 uses significantly fewer resources (mainly energy) than its competitors, making it a so-called "revolutionary" alternative.

OpenAI reportedly claims it has evidence China’s DeepSeek ‘used its model’ to train AI chatbot

Shortly after the release of DeepSeek's latest version on January 20, it became the most downloaded application in Apple's App Store in the U.S.

Consequently, DeepSeek is now under scrutiny regarding the methods it used to train its language model. Many are questioning whether DeepSeek secretly copied OpenAI, despite OpenAI's claim that its data is secure.

DeepSeek denies these allegations, stating that they achieved their results by using cheaper chips (processors) that consume fewer resources.

The emergence of DeepSeek has led investors to reassess the foundations of the U.S. stock market boom, which has been largely driven by the belief that AI advancements require massive computing power. If DeepSeek’s claims are true, that assumption may no longer hold.

OpenAI, on the other hand, has accused DeepSeek of using its proprietary models to train DeepSeek's chatbot. They allege that DeepSeek employed a technique known as "distillation," where one AI model learns from another by repeatedly querying it to replicate its responses. OpenAI claims this violates its terms of service.
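The idea behind distillation, as described above, can be sketched in a few lines: a "student" model is trained on a "teacher" model's responses instead of on original human-written data. This is a purely illustrative sketch; the function names and the toy teacher are hypothetical stand-ins, not OpenAI's or DeepSeek's actual systems.

```python
# Illustrative sketch of "distillation": collect (prompt, teacher response)
# pairs by repeatedly querying a larger model, then use them as training
# data for a smaller one. All names here are hypothetical placeholders.

def teacher_answer(prompt: str) -> str:
    """Stand-in for querying a large 'teacher' model (e.g. via an API)."""
    return f"answer to: {prompt}"

def build_distillation_dataset(prompts):
    """Collect (prompt, teacher response) pairs as training data."""
    return [(p, teacher_answer(p)) for p in prompts]

prompts = ["What is an LLM?", "Explain web scraping."]
dataset = build_distillation_dataset(prompts)
# A 'student' model would then be fine-tuned on `dataset` so it learns to
# reproduce the teacher's responses at a fraction of the training cost.
```

The alleged violation is not the fine-tuning itself but the querying step: harvesting a competitor's outputs at scale is what OpenAI's terms of service forbid.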


Is OpenAI a hypocrite?

OpenAI reportedly claims DeepSeek "distilled" data to train its own model. While this may be true, we shouldn't ignore how OpenAI obtained its data in the first place.

AI training data doesn't just appear out of nowhere. It's pulled from sources around the web and other databases, where the content and information was originally created by humans.

OpenAI has faced significant criticism over how it gathers data. For instance, it is currently in early hearings for a lawsuit filed by The New York Times, in which media companies allege that OpenAI used their intellectual property without authorization.

That said, I'm not endorsing DeepSeek either. The point is that all LLM-based platforms are trained using data created by people. So, is it not hypocritical to accuse a competitor of "stealing" data?

What happens in the future?

There's a lot of speculation about what's next. Personally, I think this is just the beginning of LLMs (Large Language Models), and we’ll continue to see more cases like this as more competitors enter the game.

As this trend continues, it will undoubtedly become cheaper to develop AI models like ChatGPT and DeepSeek.

One of the biggest risks is how LLMs obtain their training data. A large part of it comes from web scraping, which means pulling information from websites, articles, blogs, forums, and other open-access sources.
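To make "web scraping" concrete, here is a minimal sketch of extracting visible text from an HTML page using only Python's standard library. The sample page is a made-up placeholder; real training pipelines crawl at a vastly larger scale and are expected to respect robots.txt and site terms of service.

```python
# Minimal sketch of web scraping as a training-data source: parse an HTML
# page and keep only its visible text. The sample page is a placeholder.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text fragments from an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for text between tags; skip pure whitespace.
        if data.strip():
            self.chunks.append(data.strip())

page = "<html><body><h1>A blog post</h1><p>Written by a person.</p></body></html>"
extractor = TextExtractor()
extractor.feed(page)
print(extractor.chunks)
```

The scraped fragments, multiplied across billions of pages, are exactly the human-created content the article is talking about.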


More and more people are using chatbots for just about everything. At some point, AI-generated content will start getting recycled. Then what?

Initially, training these AI models required data created by humans, but these language models may start copying data that was created by AI. If AI keeps training on AI, we could end up in a loop where everything is just recycled, making it difficult to determine the original author behind any content.

Information could become biased, and eventually, we might struggle to distinguish between human-made and AI-generated content.
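The recycling loop described above can be simulated in miniature. In this toy sketch (my own illustration, not a real training run), each "generation" of a model is trained only on samples of the previous generation's output; because every generation can only re-emit documents it has seen, the number of distinct documents can never grow, and in practice it shrinks.

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

# Toy simulation of AI training on AI: each generation samples (with
# replacement) from the previous generation's output, as if scraping
# AI-generated text. Real model collapse involves probability
# distributions, but the direction is the same: diversity only shrinks.

def next_generation(corpus, size):
    """'Train' the next model on samples of the previous model's output."""
    return [random.choice(corpus) for _ in range(size)]

corpus = [f"human-written text {i}" for i in range(100)]  # generation 0
diversity = [len(set(corpus))]
for _ in range(5):
    corpus = next_generation(corpus, 100)
    diversity.append(len(set(corpus)))

print(diversity)  # unique documents per generation; never increases
```

After a handful of generations, most of the original human-written documents have been sampled out of existence, which is the "recycled loop" the article warns about.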

There's also a political consequence to all of this. A technology war has already emerged between the U.S. and China, and I find it hard to believe this rivalry will de-escalate anytime soon.
