DeepSeek Is Not a Sputnik Moment, It Is a Model T Moment

by Malcolm Murray

As someone who thinks about AI day in and day out, it is always fascinating to see which events in the AI space break out of the AI bubble and into the attention of the wider public. ChatGPT in November 2022 was of course one. The podcast-creating ability of Google’s NotebookLM almost got there, but didn’t quite reach the “getting-texts-from-grandma” level of virality. But this week, with DeepSeek’s launch of its R1 model, we had another event at the ChatGPT level that again resulted in questioning texts from spouses and colleagues.

There have already been a thousand takes on this and I apologize in advance if you’re already sick of the subject. However, I hope this piece can give you what Brad DeLong calls Value Above Replacement, since it shows where the myriad takes fit in the current broader narratives. I also shine a light on the “Model T” aspect, which I feel has been somewhat overlooked.

First, there is the geopolitical take, or what we can more formally call the delta between the U.S. and China. This is why Marc Andreessen and others referred to DeepSeek as a “Sputnik moment”. The long-standing assumption was that China was 1-2 years behind the U.S. in developing AI models. That assumption was shattered this week; it turns out China is only a few months behind. This also relates to the long-cherished view in the U.S. of China as solely a fast follower, only able to copy the U.S. The DeepSeek engineers put an end to that by pioneering some very smart machine learning techniques, such as wringing greater efficiency out of Mixture of Experts (MoE) and Multi-Head Latent Attention (MLA). So it makes sense that this would be a shock to many Americans. However, the Sputnik analogy per se doesn’t fully make sense. Given all the focus the U.S. already has on AI and the huge investments it is already making, it is unclear how this “Sputnik moment” would change things. Trump, Altman and co. just announced $500 billion in funding for Stargate, so what are they going to do as a DeepSeek response, announce another $500 billion? That seems a bit hard given that most of the money in the Stargate announcement was already committed years ago and the rest of it might be vapor dollars that do not actually exist.

Second, there is the trade policy take, or more formally, questioning the efficacy of export controls. Many took DeepSeek’s prowess as a sign that the chip export controls have not worked. This is one of the few areas of overlap between the Biden and Trump administrations, and Trump was expected to keep Biden’s restrictions in place. This assumption was also shattered for many people, who said, “look, it backfired; we forgot that necessity is the mother of invention, and this only made China more likely to innovate”. This is a natural, but incorrect, take. First, it shows an incorrect understanding of the timelines, as most of the chips DeepSeek used were acquired before the restrictions were put in place. It also erroneously assumes that DeepSeek wouldn’t have been able to reach even better performance with more chips, and that it wouldn’t give anything for more and better ones. Chips will clearly continue to matter. If anything, DeepSeek shows that the LLM triad of data, compute and algorithms is still very much in play and we are not at risk of running out of any of the three anytime soon (DeepSeek also used a lot of synthetic data).

Third, there is the economic take, or rather the delta between closed-source and open-source models. Valuations for all companies in any part of the AI value chain, both listed ones such as Nvidia and private ones such as OpenAI, have skyrocketed over the past few years. Part of their sky-high valuations lies in the assumption that they will eventually have monopolies and moats. According to this take, for Nvidia, the allegedly much smaller training cost for R1 suggests that fewer chips will be needed in the future, and for OpenAI, the fast replication of its models’ capabilities suggests that it will not be able to charge high margins for its products. This take is likely also flawed. For the public companies, after a long period of rising stock prices, many asset managers likely appreciated a moment to take some gains. There were also rumors that what really rattled the market was a leak of an upcoming Trump threat of tariffs on Taiwan. OpenAI’s valuation doesn’t seem to have taken much of a hit, if recent rumors of it doing a new funding round of $40 billion are correct. Even as its lead on models is withering, its efforts to turn itself into a product company seem to be succeeding. Even if AI models are commoditized, a dominant position in the AI enterprise market would still guarantee OpenAI good margins.

Fourth, there is the product take. Some have argued that what matters in the DeepSeek story is product design choices. More specifically, since most recent AI development has taken place under the radar, this is the first “grandma text” moment in AI in a while, and therefore the first time the average AI user experiences models displaying their Chain-of-Thought process in real time. This is seen as one factor explaining the unexpected popularity of the DeepSeek app specifically among the wider public. This could also be connected to the interesting trend of extremely popular Chinese apps in the U.S. It really was quite bizarre to see the TikTok-RedNote migration, with users in the heartland of the U.S. flocking to an app named after Mao and full of indecipherable language and cultural references.

Finally, there is the simple glee take. All those who have loudly complained about Silicon Valley tech bros and their newfound clout in the U.S. administration were happy to see Sam Altman and company get a bloody nose. Similarly, there was glee at the upset for the Trump administration. Heady with its new powers, the administration has been issuing Executive Orders left, right and center (actually, mostly far right of center), and people seem happy to see anything that threatens its confidence. This take probably has some explanatory power given the strong emotions at play.

What I would like to highlight, however, is one aspect that I think has gone somewhat underappreciated: that of impending AI ubiquity. The release of R1 could be more of a “Model T moment” than a Sputnik moment. Ford’s release of the Model T was the beginning of the automobile becoming a mass-market product. This might prove to be a similar inflection point, one that leads to AI with superhuman reasoning capabilities becoming ubiquitous. R1 is a model in the new paradigm of “reasoning models”, with above-human capabilities on tasks like math and coding, which we had not seen released as open source before this. Further, it was ridiculously cheap to train and is ridiculously cheap to run. People have been arguing over the exact numbers, and of course the total cost of training the model was not just $6 million. However, this misses the larger point: whatever the cost was, it was an order of magnitude lower than that of comparable models. Whether that cost advantage came about through more legally questionable methods such as distillation or purely through engineering brilliance does not really matter, since DeepSeek demonstrates that both of these will be present in droves in the future.

The reasoning models are the beginning of the new paradigm of inference-time compute, where the model spends computing power at the time of generating answers, in addition to during training. This paradigm is just getting started. The fact that a small upstart company, using an older generation of chips, could replicate state-of-the-art performance in only a few months suggests that we are in for a wild ride over the next few years, with very capable models proliferating. Further, the fact that running inference on R1 is also much cheaper than running inference on OpenAI’s o1 suggests that we will soon have intelligence “too cheap to meter” running on every device. This ubiquitous intelligence might be good news for the Global South; India, for example, seems to have taken notice. It is also good for smaller AI companies and, of course, for scientific progress as a whole. But it remains to be seen if it is good news for the species that rose to the top of the food chain based on its one big advantage: widely distributed, unsurpassed intelligence.
