UC Davis Presents KiloCore, the World's First Processor With 1,000 Cores

University of California, Davis

When processing power is all you want, there can never be enough of it. Modern computers, no matter their size, need a lot of computing power to multitask or to perform complex calculations such as encryption.

Just when we think we have enough firepower to make our computers do what we want, there is a chance to go beyond the ordinary.

Looking briefly back at history, there was a time when a single-core CPU wasn't enough, so we created two cores. When two weren't enough, we created three, then four, six, eight, and so on. Now the count has climbed much further: a team of researchers at the University of California, Davis, has designed a microchip with 1,000 processors, each capable of running independently.

The chip, called KiloCore, was first presented at the 2016 Symposium on VLSI Technology and Circuits in Honolulu on June 16th.

According to the paper, titled "A 5.8 pJ/Op 115 Billion Ops/sec, to 1.78 Trillion Ops/sec 32nm 1000-Processor Array" by Brent Bohnenstiehl, Aaron Stillmaker, Jon Pimentel, Timothy Andreas, Bin Liu, Anh Tran, Emmanuel Adeagbo and Bevan Baas, the array also contains 12 memory modules. At 0.84 V, its 1,000 cores can execute 1 trillion instructions per second while dissipating 13.1 W.
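The paper's figures can be cross-checked with simple arithmetic, since average energy per operation is power divided by throughput. Below is a minimal Python sketch; the helper names are our own, and it assumes the quoted power is spent entirely on the stated operation rate.

```python
# Energy per operation = power (W) / throughput (ops/s).
# Input figures are the ones quoted from the paper.

PICO = 1e-12

def energy_per_op_pj(power_w: float, ops_per_sec: float) -> float:
    """Average energy per operation, in picojoules."""
    return power_w / ops_per_sec / PICO

def power_w(energy_pj: float, ops_per_sec: float) -> float:
    """Power in watts for a given energy per op and throughput."""
    return energy_pj * PICO * ops_per_sec

# 0.84 V operating point: 1 trillion instructions/s at 13.1 W
high = energy_per_op_pj(13.1, 1e12)

# Low-power point from the paper's title: 5.8 pJ/op at 115 billion ops/s
low = power_w(5.8, 115e9)

print(f"{high:.1f} pJ/op at 1 trillion ops/sec")
print(f"{low:.2f} W at 115 billion ops/sec")
```

The 5.8 pJ/op figure in the title works out to roughly 0.67 W at 115 billion ops/sec, in line with the 0.7 W that Baas cites for the chip's most efficient mode.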

With those specs in hand, the team believes that KiloCore, which contains 621 million transistors, is the highest clock-rate processor ever designed in a university.

"To the best of our knowledge, it is the world's first 1,000-processor chip and it is the highest clock-rate processor ever designed in a university," said Bevan Baas, professor of electrical and computer engineering, who led the team of graduate students in the UC Davis Department of Electrical and Computer Engineering that designed the chip architecture.


KiloCore was made using IBM's older 32nm PD-SOI (partially depleted silicon-on-insulator) CMOS technology, and each of its cores can run independently. The cores can also change their frequency, and each has a separate router circuit with its own clock. Overall, the chip has 2,012 oscillators (1,000 for the cores, 1,000 for the routers, and 12 for the SRAM blocks).

Advances in 14nm technology have raised the possibility of many-core processors finding their way into mobile devices, which are usually much smaller. In terms of raw power, however, many-core chips are not universally helpful, because many tasks are better served by fewer but faster cores.

Another feature of the design is that each processor can shut down when it is not in use. The cores operate at an average maximum clock frequency of 1.78 GHz and transfer data directly to each other rather than through a pooled memory area that can become a bottleneck.

While other multiple-processor chips have been created before, none has exceeded about 300 processors, according to an analysis by Baas' team. Most of those were created for research purposes, and only a few are sold commercially.


Beyond its raw capability, KiloCore is also energy efficient. Baas said that the chip is the most energy-efficient "many-core" processor ever reported. At its most efficient operating point, the 1,000 processors can execute 115 billion instructions per second while using only 0.7 watts, low enough for KiloCore to be powered by a single AA battery.

To save energy, KiloCore also does without caches. Each core has a 128x40-bit instruction memory and a 256x16-bit data memory, and the chip adds twelve 64 KB SRAM memory modules, in contrast to most modern 14nm chips, which rely on caches.
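A quick tally shows how small this on-chip storage is by modern standards. This is a minimal Python sketch, assuming that KB means 1,024 bytes and that every one of the 1,000 cores has identical local memories.

```python
# Tally of on-chip memory, using the sizes given in the text.
CORES = 1000
IMEM_WORDS, IMEM_BITS = 128, 40   # per-core instruction memory: 128 x 40-bit
DMEM_WORDS, DMEM_BITS = 256, 16   # per-core data memory: 256 x 16-bit
SRAM_MODULES, SRAM_KB = 12, 64    # independent 64 KB memory modules

def bytes_for(words: int, bits: int) -> int:
    """Size in bytes of a memory with `words` entries of `bits` bits each."""
    return words * bits // 8

imem = bytes_for(IMEM_WORDS, IMEM_BITS)        # bytes per core
dmem = bytes_for(DMEM_WORDS, DMEM_BITS)        # bytes per core
local_total_kb = CORES * (imem + dmem) / 1024  # assumes 1 KB = 1024 bytes
shared_total_kb = SRAM_MODULES * SRAM_KB

print(f"per core: {imem} B instruction + {dmem} B data")
print(f"all cores: {local_total_kb:.0f} KB local, {shared_total_kb} KB in modules")
```

Each core gets 640 bytes of instruction memory and 512 bytes of data memory, so the whole chip holds under 2 MB of on-chip storage.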

"Each processor issues one in-order instruction per cycle into its 7-stage pipeline from either its 128x40-bit local instruction memory or an independent memory module," the paper explains.
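Single-issue, in-order execution makes peak throughput easy to estimate: cores × clock frequency × instructions per cycle. This is a minimal sketch, assuming every core issues an instruction on every cycle at the 1.78 GHz average maximum clock the researchers report, a rate real programs will not sustain because of stalls and idle cores.

```python
CORES = 1000
AVG_MAX_CLOCK_HZ = 1.78e9   # average maximum clock frequency reported for the cores
ISSUE_PER_CYCLE = 1         # one in-order instruction per cycle, per the paper

def peak_ops_per_sec(cores: int, clock_hz: float, ipc: int) -> float:
    """Upper bound on throughput: assumes no stalls and every core busy."""
    return cores * clock_hz * ipc

peak = peak_ops_per_sec(CORES, AVG_MAX_CLOCK_HZ, ISSUE_PER_CYCLE)
print(f"{peak / 1e12:.2f} trillion ops/sec")
```

The result matches the upper figure of 1.78 trillion ops/sec in the paper's title.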

"Communication on-chip is accomplished by a high-throughput circuit-switched network and a complementary very-small-area packet-switched network. The source-synchronous circuit-switched network supports communication between adjacent and distant processors, as resources allow, with each link supporting a maximum rate of 28.5 Gbps," according to the paper.

This dissipates 16 percent less energy than fetching instructions from local memory. The researchers even go so far as to say that their chip executes instructions more than 100 times more efficiently than a modern laptop processor.

The researchers also say that "each processor core can run its own small program independently of the others," which is more flexible than the SIMD (Single Instruction, Multiple Data) approach used by GPUs. The idea is to break an application down into many smaller pieces, each of which can run in parallel on a different processor, enabling high throughput with lower energy use, Baas said.
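To illustrate the difference from SIMD, the sketch below splits a toy computation into stages that each run their own small program and hand results directly to the next stage. It is only an analogy written in Python with threads and queues: the stage programs are hypothetical, the code says nothing about KiloCore's actual instruction set or programming tools, and Python threads do not execute truly in parallel.

```python
import queue
import threading

def core(program, inbox, outbox):
    """A hypothetical 'core': applies its own small program to each item."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: forward it downstream and stop
            outbox.put(None)
            return
        outbox.put(program(item))

# Each stage runs a *different* program (MIMD), unlike SIMD, where every
# lane executes the same instruction on different data.
programs = [
    lambda x: x * 3,   # stage 0: scale
    lambda x: x + 1,   # stage 1: offset
    lambda x: x % 7,   # stage 2: reduce modulo 7
]

# Links between neighbouring stages: data moves directly from one stage
# to the next instead of through a shared pooled memory.
links = [queue.Queue() for _ in range(len(programs) + 1)]
threads = [
    threading.Thread(target=core, args=(p, links[i], links[i + 1]))
    for i, p in enumerate(programs)
]
for t in threads:
    t.start()

for x in range(5):
    links[0].put(x)
links[0].put(None)

results = []
while (y := links[-1].get()) is not None:
    results.append(y)

for t in threads:
    t.join()
print(results)
```

Each stage is independent and could, on a many-core chip, occupy its own processor. The queues stand in for the point-to-point links between processors.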

So if we think our supercomputers can already do whatever we want, KiloCore opens up new possibilities. For applications ranging from video processing, wireless coding and decoding, and cryptographic functions to parallel scientific data processing, datacenter record processing, and nuclear test simulation, 1,000 cores is indeed plenty.

The KiloCore project at the university was funded by the U.S. Department of Defense. The team has also completed a compiler and automatic program-mapping tools for programming the chip.