Supercomputer In A Soda Can – The NanoCluster packs 100GB RAM in an Ultra-Compact Design

The Evolution of Supercomputers: From Massive Rooms to Handheld Devices

In the past, supercomputers were enormous, room-sized machines, humming away under raised floors and liquid cooling. They were used mainly for processing seismic data or simulating complex nuclear physics, and they required massive infrastructure to support them.

Over time, technological advances have radically transformed the size and capabilities of these computers. Today, a device the size of a soda can can offer up to 28 processor cores and more than 100 GB of memory. This shift reflects significant progress in processor design and embedded technologies, packing substantial computing power into a very small footprint.

Modular Design and the Raspberry Pi Model

The device is built around a modular design: the system is assembled from multiple units such as Raspberry Pi Compute Modules (CM4 or CM5). These ARM-based modules slot into a compact, easily expandable carrier with LEGO-like flexibility, making it easy for system administrators to manage the cluster and tailor it to their needs.

Each compute module sits on a simple M.2-style adapter board and contributes up to 16 GB of memory and four processing cores. With seven modules installed, theoretical performance can reach roughly 112 gigaflops, which may surpass some modern personal computers such as a MacBook Air with an M2 processor, particularly in tasks that benefit from parallel processing.
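As a rough sanity check rather than a published specification, the 112-gigaflop figure is consistent with a simple cores-times-throughput estimate; the per-core number below is an assumption, not a measured value:

    # Back-of-the-envelope check of the quoted peak figure (assumed numbers, not measured).
    nodes = 7              # compute modules in a fully populated cluster
    cores_per_node = 4     # cores per Compute Module (Cortex-A76 on CM5, Cortex-A72 on CM4)
    gflops_per_core = 4.0  # assumed nominal per-core throughput

    peak_gflops = nodes * cores_per_node * gflops_per_core
    print(f"Theoretical peak: {peak_gflops:.0f} GFLOPS")  # -> 112 GFLOPS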

Power Management and Cooling: Challenges of Performance and Efficiency

The system can be powered in two ways: via a USB-C port with a 65 W GaN charger, or over Power over Ethernet (PoE++), which delivers up to 60 W. This power budget is closely tied to the cooling system, which makes balancing performance against device temperature a real challenge.
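To see why that ceiling matters, a rough estimate helps; the per-node and overhead figures below are assumptions, since actual draw varies with workload, storage, and module revision:

    # Rough power-budget estimate; every figure here is an assumption for illustration.
    nodes = 7
    watts_per_node_full_load = 7.5   # assumed draw of one compute module under sustained load
    switch_and_fan_watts = 5.0       # assumed overhead for the switch, fan and carrier board

    total_watts = nodes * watts_per_node_full_load + switch_and_fan_watts
    print(f"Estimated full-load draw: {total_watts:.1f} W of a roughly 60-65 W budget")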

Under heavy processor load, such as running stress-ng --matrix 0 across all six or seven nodes, the system approaches the limits of its power budget, which can cause performance throttling or even instability on some nodes. Temperatures climb above 85°C and the fan runs at maximum speed, producing around 58 decibels of noise, about the level of a loud conversation. The system keeps working, but it is far from quiet.
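A minimal way to reproduce this kind of load test is to launch stress-ng on every node at once over SSH. The sketch below assumes the nodes are reachable under hypothetical hostnames (node1 through node7) and already have stress-ng installed:

    import subprocess

    # Hypothetical hostnames; adjust to match your own cluster's naming.
    nodes = [f"node{i}" for i in range(1, 8)]

    # Launch stress-ng's matrix workload on every core of every node for 10 minutes.
    # "--matrix 0" starts one worker per CPU core on each node.
    procs = [
        subprocess.Popen(
            ["ssh", host, "stress-ng", "--matrix", "0", "--timeout", "600s"]
        )
        for host in nodes
    ]

    # Wait for all nodes to finish, then report their exit codes.
    for host, proc in zip(nodes, procs):
        proc.wait()
        print(f"{host}: exit code {proc.returncode}")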

Advantages of Network Control in the System

One distinctive feature of this design is a managed, RISC-V-based network switch hidden beneath the main board. It supports managed-network features such as VLANs, port-level control, and direct access to the switch controller. The current web interface is only available in Chinese and has some issues when accessed through a browser, but being able to control the cluster's entire network behavior from any node is a genuine technical advantage.

It is worth mentioning that the system’s total power consumption ranges between 20 and 70 watts depending on the load. This reflects a good balance between performance and energy efficiency.

Network Performance and Its Impact on Distributed Tasks

The single 1 Gbps uplink port is a bottleneck under heavy data loads: roughly 125 MB/s of theoretical throughput has to be shared by the whole cluster, which limits performance and makes running distributed storage systems such as Ceph over this network a significant challenge.

Within the cluster, however, each node gets a full 1 Gbps link. That is enough for most hobbyist Kubernetes distributions, for distributed AI workloads such as running Llama models across nodes, and for continuous integration and continuous delivery (CI/CD) pipelines, often using tools like distcc for distributed compilation.

The system shows a notable improvement on complex tasks: a full operating-system kernel compilation, for example, drops from 45 minutes to 22 minutes when four nodes work concurrently, demonstrating how effective parallel processing is in this configuration.
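As a sketch of how such a distributed build can be driven with distcc, assuming the distccd daemon is already running on the worker nodes and using placeholder hostnames and paths:

    import os
    import subprocess

    # Hypothetical worker hostnames; distccd must already be running on each of them.
    workers = ["node2", "node3", "node4"]

    env = os.environ.copy()
    # "host/4" caps distcc at four parallel jobs per worker node.
    env["DISTCC_HOSTS"] = " ".join(f"{host}/4" for host in workers)

    # Hand compilation off to distcc; link steps still run on the local node.
    subprocess.run(
        ["make", "-j", "16", "CC=distcc gcc"],
        cwd="/path/to/linux",   # placeholder path to the kernel source tree
        env=env,
        check=True,
    )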

Design and Flexibility: A Platform Tailored for Hobbyists and Experimenters

The board is clearly designed for hobbyists and tinkerers, with every component reflecting a deliberate choice in favor of flexibility and expandability: M.2 adapters, USB-C ports, and support for NVMe SSDs. It also has a smart power-failover system that can switch between PoE and USB-C power delivery depending on the actual load.

However, this is not a plug-and-play system. You will need to load the operating system images yourself, understand the power consumption limits, and possibly tweak a custom fan control script (a minimal sketch of what such a script looks like follows below), because the cluster does not run completely smoothly out of the box.
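A fan control script on a device like this usually boils down to a loop that reads the SoC temperature and adjusts a PWM duty cycle. The sketch below assumes Linux's standard thermal sysfs interface; the PWM path and the temperature thresholds are hypothetical and will differ on real hardware:

    import time

    TEMP_PATH = "/sys/class/thermal/thermal_zone0/temp"   # standard Linux thermal sysfs
    PWM_PATH = "/sys/class/hwmon/hwmon0/pwm1"             # hypothetical; varies by board,
                                                          # and pwm1_enable may need to be
                                                          # set to manual mode first

    def read_temp_c() -> float:
        # The kernel reports millidegrees Celsius in this file.
        with open(TEMP_PATH) as f:
            return int(f.read().strip()) / 1000.0

    def set_fan(duty: int) -> None:
        # PWM duty cycle, 0 (off) to 255 (full speed).
        with open(PWM_PATH, "w") as f:
            f.write(str(duty))

    while True:
        temp = read_temp_c()
        if temp > 80:
            set_fan(255)        # full speed above 80 degC
        elif temp > 65:
            set_fan(160)        # moderate speed in the mid range
        else:
            set_fan(80)         # quiet idle speed
        time.sleep(5)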

Yet these challenges are part of the system's appeal: they open the door to hands-on learning and experimentation, making it an ideal platform for users who want to dig into the intricacies of distributed computing and cluster management.

Educational Value in a Small, Low-Cost Device

The NanoCluster is priced between $50 and $150 depending on configuration, making it an economical choice that opens wide horizons for hands-on experimentation without the high costs of enterprise-grade equipment.

It is not for everyone, particularly anyone uncomfortable with troubleshooting tasks such as working with UART headers or reading dense technical documentation. But it is an ideal learning environment for developers, educators, and hobbyists eager to explore the concepts of distributed computing.

This device is not expected to replace a traditional workstation, nor is it intended for cryptocurrency mining, producing high-quality animations, or hosting massive databases. Its primary purpose is learning and experimentation.

In an era where we can easily carry large language models (LLMs) on our portable devices, owning a compact supercomputer the size of a soda can for practical experiments seems like a logical and future-ready step in the field of computing.
