World’s biggest economies battle to dominate advanced processing power that will affect defence and climate modelling
Richard Waters in San Francisco
The US is about to vault into a new era of supercomputing, with a once-in-a-decade leap forward in processing power that will have a big effect on fields ranging from climate change research to nuclear weapons testing.
But the national swagger usually prompted by such breakthroughs is likely to be muted. China passed this milestone first and is already well on the way to building an entire generation of advanced supercomputers beyond anything yet in use elsewhere.
What makes the advances all the more remarkable, according to US experts in the field, is that China’s achievement was made with local technology, after Washington blocked access to the American hardware long considered to be critical to such systems.
The build-up in China’s supercomputing program, which dates back more than two decades, has led to a “stunning situation” where the country now leads the world, said Jack Dongarra, a US supercomputing expert.
The most advanced supercomputers are used to improve simulations of highly complex systems, for instance creating better models of climate change or the effects of nuclear blasts. But their secret use in classified areas, such as defeating encryption, is likely to also make them key tools in national security, according to Nicholas Higham, professor of mathematics at the University of Manchester.
China already had more supercomputers on the Top 500 list of the world’s most powerful computers than any other country — 186 compared with 123 in the US. Now, by beating the US to the next big breakthrough in the field and planning a spate of such machines, it is in a position to seize the high ground of computing for years to come.
The Chinese breakthrough has come in the race to build so-called exascale supercomputers, systems that can handle 10 to the power of 18 calculations per second. That makes them a thousand times faster than the first of the petaflop systems that preceded them more than a decade ago.
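For readers who want the arithmetic behind that comparison, the factor of a thousand follows directly from the standard prefixes (no machine-specific figures are assumed here):

```latex
% Exascale vs petascale, in floating-point operations per second (flops)
\[
  1\ \text{exaflops} = 10^{18}\ \text{flops}, \qquad
  1\ \text{petaflops} = 10^{15}\ \text{flops}, \qquad
  \frac{10^{18}}{10^{15}} = 10^{3} = 1000.
\]
```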
In recent months, work has been under way at the US Department of Energy’s Oak Ridge national laboratory in Tennessee to assemble and test the first of three exascale systems planned in the country. If the inevitable “bugs” are ironed out, the arrival of exascale computing in the US could be confirmed at the end of May with the publication of the twice-yearly Top 500 listing, according to Dongarra, who maintains the list.
By contrast, China’s first exascale system has been running for more than a year and has since been joined by a second, according to a recent presentation by David Kahaner, director of the Asian Technology Information Program, whose research is widely cited as the most authoritative.
China has not officially disclosed that it has two exascale systems. But their existence was confirmed late last year when scientific research run using the machines was entered for the Gordon Bell prize, with one paper taking top honours in the international supercomputing competition.
The country with the most advanced supercomputers has a clear advantage in national defence over its adversaries, said Horst Simon, who until recently was deputy director of the US energy department’s Lawrence Berkeley national laboratory.
China’s decision not to officially confirm its supercomputing breakthrough is a departure from decades of history in the field, where scientists usually talk openly about their achievements and countries have been quick to claim bragging rights to the top machines. The secrecy may have been to prevent further retaliation from the US, according to experts.
Washington imposed targeted sanctions against five Chinese organisations involved in supercomputing in 2019, then followed up a year ago with another round against seven more groups. The second wave was put in place the month after China’s first exascale system had been fired up.
A previous Chinese effort to break the exascale barrier had relied on technology from US chipmaker AMD, leaving it vulnerable to US trade restrictions. In contrast, its current two exascale systems are based on domestic chip designs. The local developers of the chips used in the two giant new systems — Tianjin Phytium Information Technology and Shanghai High-Performance Integrated Circuit Design Center — were both on last year’s US sanctions list.
“I think it’s quite impressive that they were able to put in place a system based on their own technology over a very short period of time,” said Dongarra. He added that it was unclear whether the chips were manufactured in mainland China — which is still years behind in matching the world’s most advanced chip fabs — or in Taiwan.
China has been building a domestic industry around supercomputing for years, first shocking its main rivals in the US and Japan in 2010 when it unveiled what was then the world’s fastest machine. But the dawn of the exascale computing era could be a chance to grab a clearer lead.
While the US has three exascale systems in the works, China’s goal is to have 10 systems by 2025, according to Kahaner.
His research shows Chinese companies are now more focused on domestic competition than on what their international rivals are doing. As a gap opens up between the two nations, the US should consider loosening its sanctions against China’s leading national supercomputing centre at Wuxi in the hope of “a deeper glimpse into these [Chinese] systems”, according to Kahaner.

Despite China’s lead in hardware, Kahaner and others point to the breadth of US capabilities as a strength, particularly when it comes to software. Half of the $3.2bn cost of the US energy department’s three exascale computers stems from a decade-long effort to write programs to run on the new computing architecture. Also, Chinese research in advanced mathematics seldom shows up in fields related to supercomputers, said Higham.
Regarding his call for greater collaboration between China and the US, Kahaner said: “Access to new systems allows experimentation, which benefits all parties. To the maximum extent possible, consistent with security and fair/balanced competition, more access is better.”
But with China yet to publicly acknowledge its new supercomputing prowess and the US still pressing for sanctions against China to try to limit its rise as a tech power, that may remain a distant hope.
China may have already crossed the exascale barrier - twice.
The country is secretly operating the two most powerful supercomputers in the world, and is the first nation to run systems capable of more than one exaflops (10^18 floating-point operations per second), The Next Platform reports.
Officially, the title of the world's most powerful supercomputer is currently held by Japan's Fugaku supercomputer, capable of 442 petaflops.
Citing an anonymous source, TNP claims that the National Supercomputing Center in Wuxi is home to the Sunway “Oceanlite” supercomputer.
This system is a successor to the Sunway TaihuLight, officially China's most powerful supercomputer. In March, China ran the Linpack benchmark on Oceanlite, which hit 1.3 exaflops peak performance and 1.05 exaflops sustained, while consuming 35MW of power.
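Taking those reported figures at face value, a rough back-of-the-envelope calculation gives the implied energy efficiency in the gigaflops-per-watt terms used by the Green500 list (the inputs are the article's numbers, not independently verified):

```python
# Rough efficiency implied by the figures reported above.
# Assumes the reported sustained Linpack performance (1.05 exaflops)
# and power draw (35 MW) are accurate.
sustained_flops = 1.05e18   # 1.05 exaflops sustained
power_watts = 35e6          # 35 MW

gflops_per_watt = sustained_flops / power_watts / 1e9
print(f"Implied efficiency: {gflops_per_watt:.0f} GFLOPS per watt")  # ~30
```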
The new supercomputer is being used, among other things, for quantum simulation, with new research expected to be announced soon. It is thought to feature 42 million cores built from Chinese-designed chips.
At the same time, China has another supercomputer at the National University of Defense Technology (NUDT) capable of around the same performance, although its power consumption is not known.
The Tianhe-3 supercomputer is based on Phytium's FeiTeng chips, which were developed after US trade sanctions stopped China from acquiring Intel Xeon Phi processors.
The new system was also benchmarked in March. The following month, Phytium and Sunway were added to a list of Chinese companies sanctioned by the US government. Phytium was cut off from chip manufacturer TSMC.
It is not known why China has not publicly disclosed the two supercomputers, even though major systems are traditionally ranked twice a year on the Top500.
The US is soon set to launch its own exascale system - one that it has called the first exascale supercomputer in the world. At 1.5 exaflops (likely 1.3 sustained performance), Frontier will become the new world's most powerful supercomputer. Work is currently underway installing the 29MW system. It will soon be joined in the US by the oft-delayed 1 exaflops Aurora supercomputer, and in 2023 by the 2 exaflops El Capitan system.
All these systems could pale into insignificance when compared to an even more ambitious project TNP's source disclosed - China's 'Futures' program, which hopes to develop a 20 exaflops system by 2025.
There are no greater bragging rights in supercomputing than those that come with a top-ten listing on the twice-yearly ranking of the world’s most powerful systems – the Top500. And there are no countries more inclined to throw themselves (and billions) into that competition this decade than the U.S. and China.
Today, the latest results were announced (much more on those here). The expected first exascale machine in the U.S., “Frontier” at Oak Ridge National Laboratory, duly appeared, but notably absent were China’s results, which, if published, would have shown two separate exascale-class machines.
This would have been a major mainstream news story had China decided to publicize its results – and on several fronts.
The most obvious is being first to both peak and sustained exascale with double-precision floating point on the LINPACK benchmark (the metric by which supercomputing performance is gauged). Second, this would have been demonstrated on two separate systems with two separate homegrown processor and accelerator architectures. Third, it would have meant several billions in investment in supercomputing technology across two sites (hence serious commitment from the Chinese government over the long haul).
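For readers unfamiliar with how a LINPACK number is produced, here is a minimal sketch of the idea: time a dense double-precision solve of Ax = b and divide the standard operation count by the elapsed time. This is an illustration on a toy problem size, not the actual distributed HPL code or its run rules:

```python
# Minimal sketch of how a LINPACK-style flops figure is derived:
# time a dense FP64 solve of Ax = b, then divide the standard HPL
# operation count (2/3 * n^3 + 2 * n^2) by the elapsed time.
# Illustrative only -- the real HPL benchmark is distributed, blocked,
# and governed by strict run rules.
import time
import numpy as np

n = 4000                                  # tiny compared with real HPL runs
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

start = time.perf_counter()
x = np.linalg.solve(A, b)                 # LU factorisation + triangular solves in FP64
elapsed = time.perf_counter() - start

flops = (2 / 3) * n**3 + 2 * n**2         # standard HPL operation count
print(f"~{flops / elapsed / 1e9:.1f} GFLOPS sustained on this solve")
```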
All of this would have shown that despite its own billions in technology investments in the last decade, the U.S. could not arrive first with functional performance at exascale.
Yet China kept this quiet. Well, mostly.
Instead of courting the press-friendly, mainstream attention HPC gets twice each year, China quietly discussed the systems in papers showing real-world application performance, and it made sure word got out in other ways beyond the Top500.
In late October, The Next Platform confirmed and reported that two separate exascale supercomputers – the first with such capabilities in the world – had exceeded the exascale threshold in both peak and sustained LINPACK performance. Since then, many have wondered why China would choose not to publish these results, given the intense, public rivalry over top system status throughout the last decade.
When we first got word of benchmark results reaching exascale back in April (the results came in in March, just before trade restrictions cracked down on the facilities and vendors behind the systems), the first inklings came from a contact at a facility in China – one well known to followers of the Top500. The conversation at that time was off the record and indicated displeasure that so much engineering work would not be recognized globally, which suggests the decision to keep the results quiet was made early, if not in advance. It took another several months to gather enough comprehensive information for us to publish confirmation.
Ultimately, while China might have been able to knock the long-reigning #1 “Fugaku” powerhouse in Japan out of the top spot, that effect, too, might not have made the lasting impression China hoped for with these dual exascale systems.
With Every Reason to Claim Bragging Rights …
All of this reminds us of the many reasons China would have had to publicize the results beyond the obvious – claiming the title on not just one, but two, exascale machines. This would have made China the first in the world to an HPC performance milestone that has been the subject of billions of dollars of U.S. investments over the last several years.
A public announcement via the Top500 list, in either its June edition or this week, would also have drawn attention to the significant material investments China has made in homegrown semiconductor, networking, and software technologies. Much more detail can be found by diving into the Sunway and Phytium architectures and manufacturing backgrounds. And while there are no “new” architectures in either exascale system, both represent a noteworthy scalability leap, alongside strong performance in demanding HPC areas that shows the systems’ capability to do mixed-precision work (good for AI/ML) as well as tightly coupled, FP64-driven traditional supercomputing.
Having an HPC complement to its existing large-scale compute infrastructure among companies like Alibaba, Baidu, Tencent and others in China would be another source of bragging rights. These companies are all pushing to build their own native processors, accelerators, and software ecosystems. Having the supercomputing/research side of native technologies would be a further sign of strength.
On that note, China would also be able to showcase systems that can handle both general-purpose HPC and emerging AI. When results were released for the quantum simulation work on the Sunway system, we believe China was not just showing real-world, tightly coupled HPC performance, but also that it could handle complex mixed-precision workloads, which are common in AI (FP16, Int-8, etc). In short, it would be touting both AI and simulation capabilities – a valuable aspect for all emerging large systems – and all without the conventional Nvidia or AMD GPUs that U.S. and European systems deploy for AI and low-precision work.
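For those outside HPC, a small example of what “mixed precision” trades away, and why hardware support at both ends of the range matters; these are standard IEEE floating-point formats, nothing specific to the Sunway or Phytium systems:

```python
# FP16 keeps roughly 3 decimal digits of precision in a quarter of the
# memory of FP64, which keeps 15-16 digits. Standard IEEE formats,
# nothing specific to the machines discussed above.
import numpy as np

x64 = np.float64(1.0) + np.float64(1e-4)
x16 = np.float16(1.0) + np.float16(1e-4)

print(x64)   # 1.0001 -- the small increment survives in FP64
print(x16)   # 1.0    -- the increment is rounded away in FP16
print(np.float16(0).nbytes, "bytes vs", np.float64(0).nbytes, "bytes per value")
```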
And this may seem minor to those outside supercomputing, but think about it: in addition to demonstrating technological prowess and the scalability of multiple homegrown architectures, there is the lost ability to show the hard work of the teams in China, often more than a thousand people involved in bringing an entire cutting-edge system to life (manufacturers, designers, architects, programmers, sysadmins, etc.). That these HPC professionals did not have a chance to celebrate such a milestone on the international stage is a shame. Heated disputes between nations or not, let’s not forget these are people, many of whom have spent careers working toward this coveted goal. This does matter, even if the bigger international picture obscures it.
Competitive Strategy, Perception, and Of Course, Politics
While we have not confirmed a single direct reason, we have gathered a multitude of views over the last couple of weeks from national lab HPC leads in the U.S., Japan, and Europe, all of whom agreed that the decision not to publicize is unexpected and baffling but, generally speaking, purely political. However, given the nuanced views, both political and technical, we do have some ideas.
As mentioned above, there could simply be some strategic silence on China’s part for competitive purposes. The Chinese government, which backed these systems to the tune of billions of dollars (not just the design and build but ongoing facilities and power), likely had the final say in the strategic announcement (or lack thereof) of the machines.
What is most interesting is that instead of listing on the Top500, the teams confirmed the systems’ existence through Gordon Bell Prize paper submissions. For reference, this is the most coveted award in supercomputing beyond top system status via the Top500. The submissions for the Sunway system in particular established that the machines exist and are in production, as well as showcasing performance and scalability – albeit with a cherry-picked set of applications.
This suggests China was more eager to show “real-world” production and use of these systems than to claim the highly publicized top place on the Top500 and the crown for first to reach exascale. In short, it gets the recognition for technical merit without putting system specs out there for LINPACK or the more real-world-focused benchmarks in HPC like HPCG, Graph500, or Green500.
Since China has built systems simply to game the Top500 in the past – including a closely replicated, AMD-like system that was later removed from the list – one might say these exascale machines are another such game. But not so, according to the sources we spoke with for the original story close to the benchmark results. In this case the machines are legitimate and highly capable, which means the trade war – likely a big part of this story – is also at the heart of the decision not to publicize important results.
The most recent U.S. restrictions, which bar relationships with the labs and vendors behind both exascale systems, came in April, a month after benchmarks were run on each machine. It is unclear whether the decision to withhold reporting on the achievement came from waiting for the June Top500 list or from other reasons, but those we spoke with suspect the real aim was to avoid being knocked off the number one spot too quickly by the U.S.
The “Frontier” machine in the U.S. was expected to appear at the top of today’s Top500 rankings, well above either of China’s systems. Had China listed in June or on today’s list, with “Frontier” taking the top slot and “Aurora” at Argonne (with a projected 2+ exaflops peak) to follow, its machines would have held top placement for only a relatively short time. That matters considering the lifespan of these large machines (five years on average) and the potential for newer machines to push China’s systems further down the list.
The semiconductor shortage affected big systems more than expected, and China likely did not foresee “Frontier” missing the November list for that reason.
Of the opinions we gathered about why China chose silence, one stands out as a bit “out there” on the surface but is worth repeating: if the U.S. and Europe are hell-bent on rolling out several exascale-class systems in the next three years, and China blew its budget on being first – and on two systems to boot – it might be in its best interest to take its ball and go home. In other words, if China “won’t play Top500” anymore, which has long been a yardstick for national supercomputing competition, is that list valuable any longer?
Put yet another way, by choosing to publish prize-geared papers using the machines as a “soft announce”, or by running LINPACK and letting those results “accidentally” slip without ever publishing them, yes, China loses the big press day of the top system, but only this last time. The list as a metric is no longer international in the way it has been for years. The tit-for-tat over top systems has bounced between the U.S. and China throughout that time.
It’s hard to claim dominance when your only real contender won’t come to the plate.
While the Top500 has driven architectures over its decades of existence, from around 2008 it drove competition between the U.S. and China in particular – with a fierceness that has finally resulted in a flame-out, this time by choice.
What is clear is that China has set itself on its own nationalistic technological path. There are problems with that, not the least of which is a lack of fabs and semiconductor manufacturing prowess. All of that lies beyond its borders – for now (she said ominously). With multiple architectural options to go with, a strong hyperscale base within China to trade hardware and software tooling with, and every political reason to stay this course for the long term, the news China didn’t make during this Top500 list is much bigger than any announcement it might have made.
None of this bodes well for the future of the Top500 list, of course. While its creators have been open about its shortcomings and have built companion benchmarks like HPCG and HPL-AI, the double-precision floating point metric is less relevant for bandwidth-limited real-world applications. Even still, the announcement of each list has meant the world pays attention to global supercomputing, and that is a big deal – especially for the national labs and organizations that rely on funding for the next big machine. The international competition, especially between the U.S. and China, has also highlighted the growing ambitions of both, with HPC as a touchstone topic.
We expect that the current TaihuLight and other Chinese systems already on the list will appear until they are decommissioned. And perhaps we won’t see any other top-ten-class machines from China for some time, maybe years – not because it doesn’t have them, but because it will choose other paths to publicizing them.