CM4 Cluster Node with NVMe 512GB

This cluster node is pretty much identical to the original CM4 node, except that it has additional NVMe storage onboard. It's based on the Waveshare carrier board. Its main reason for existence is to run the OpenSearch-based Intrusion Detection System (IDS), but it also handles general container duties.

The essential difference between this board and the original is the inclusion of support for NVMe hardware (18).

Samsung NVMe drive attached

20220121: Samsung NVMe drive overheats, causing node failure. Touching it "burnt" the finger. The nice white sticker on the memory chip effectively insulates the chip, preventing heat from escaping. It was removed and a heatsink added. Unfortunately the only heatsink on hand was 5mm high, which prevents the carriage from being removed. Adding it was a one-way trip. So far so good.

Heatsink added to Samsung NVMe after sticker removed.

20220503: The cheap and cheerful NVMe drive dies.

May 03 01:14:34 localhost kernel: [   64.480292] nvme nvme0: I/O 12 QID 0 timeout, disable controller
May 03 01:14:34 localhost kernel: [   64.588242] nvme nvme0: Device shutdown incomplete; abort shutdown
May 03 01:14:34 localhost kernel: [   64.588624] nvme nvme0: Identify Controller failed (-4)
May 03 01:14:34 localhost kernel: [   64.588636] nvme nvme0: Removing after probe failure status: -5
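
A log like the one above is easy to miss on a headless node. As a rough sketch only, the following Python scans kernel log text for the NVMe messages seen in this failure. The function name and the list of markers are my own illustration, taken from the four lines above and not from any official list of NVMe errors.

```python
import re

# Sample lines copied from the log above. In practice the text could come
# from saved `dmesg` or `journalctl -k` output.
SAMPLE_LOG = """\
May 03 01:14:34 localhost kernel: [   64.480292] nvme nvme0: I/O 12 QID 0 timeout, disable controller
May 03 01:14:34 localhost kernel: [   64.588242] nvme nvme0: Device shutdown incomplete; abort shutdown
May 03 01:14:34 localhost kernel: [   64.588624] nvme nvme0: Identify Controller failed (-4)
May 03 01:14:34 localhost kernel: [   64.588636] nvme nvme0: Removing after probe failure status: -5
"""

# Message fragments seen in this particular failure (not an exhaustive list).
FAILURE_MARKERS = (
    "timeout, disable controller",
    "Device shutdown incomplete",
    "Identify Controller failed",
    "Removing after probe failure",
)

# Matches the "nvme nvme0: message" part of a kernel log line.
NVME_LINE = re.compile(r"nvme (nvme\d+): (.*)$")


def find_nvme_failures(log_text):
    """Return (controller, message) pairs for lines containing a failure marker."""
    hits = []
    for line in log_text.splitlines():
        match = NVME_LINE.search(line)
        if match and any(marker in match.group(2) for marker in FAILURE_MARKERS):
            hits.append((match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    for controller, message in find_nvme_failures(SAMPLE_LOG):
        print(f"{controller}: {message}")
```

Feeding it a healthy boot log should produce no output, so it could sit in a cron job and only make noise when the drive starts misbehaving.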

The drive was originally purchased on eBay; the ad states there are no returns and the warranty is 30 days (which is about how long it ran). Also, according to the sticker covering the hot bits, this unit takes 3.3 volts at 1.6 amps, or 5.28 watts, which is why it gets super hot.

The fact that this drive draws that much power calls into question the viability of having NVMe on the board at all. In general, Pi-type devices are power constrained by the onboard electronics, so powering everything from the MacBook Pro's giant mega USB-C adaptor didn't really have any effect. So 9 watts for the CM4 itself, 5.3 watts for the NVMe, add in a USB keyboard and monitor to light up the graphics and bingo: 15+ watts, or 3 amps at 5 volts. There seems to be plenty of discussion around this topic out there on the web. For now, however, a new lower capacity replacement NVMe is on order, which includes a heatsink and hopefully uses less power (and doesn't get so hot) - more when it arrives.
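
The power sums above can be written out as a small Python sketch. The 3.3 V, 1.6 A and 9 W figures come from the text; the 5 V supply rail and the 1 W allowance for the keyboard and friends are my assumptions, picked only so that the total lands at the "15+ watts or 3 amps" mentioned above.

```python
# Figures quoted in the text.
NVME_VOLTS = 3.3   # from the drive's sticker
NVME_AMPS = 1.6    # from the drive's sticker
CM4_WATTS = 9.0    # rough figure for the CM4 itself

# Assumptions for illustration only, not measured values.
PERIPHERAL_WATTS = 1.0  # placeholder for USB keyboard and similar
SUPPLY_VOLTS = 5.0      # assumed supply rail feeding the board


def nvme_watts(volts=NVME_VOLTS, amps=NVME_AMPS):
    """Power drawn by the NVMe drive: P = V * I."""
    return volts * amps


def total_watts(peripheral_watts=PERIPHERAL_WATTS):
    """CM4 plus NVMe plus an allowance for peripherals."""
    return CM4_WATTS + nvme_watts() + peripheral_watts


def supply_amps(watts, volts=SUPPLY_VOLTS):
    """Current the supply must deliver for a given load: I = P / V."""
    return watts / volts


if __name__ == "__main__":
    total = total_watts()
    print(f"NVMe:   {nvme_watts():.2f} W")
    print(f"Total:  {total:.2f} W")
    print(f"Supply: {supply_amps(total):.2f} A at {SUPPLY_VOLTS:.0f} V")
```

Swapping in the sticker figures from a different drive shows quickly how much headroom is left.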

On the whole I'd have to warn people against buying this cheap and cheerful option as I did.

The ultimate failure is somewhat unexplained. Searching for the error that crops up at boot time reveals this is a known failure mode. I was unable to find a definitive explanation of what precisely happened, and nobody I could find had managed to recover a drive. Placing the NVMe in the big Ryzen machine failed in pretty much the exact same place, so with a completely different CPU plus everything else, this points to a dead, dead drive.

Nevertheless while it was working, during its short life, it was pretty good and stable after the heatsink was added.

20220507: Replacement NVMe drive arrives.

Being a much smaller capacity unit this guy consumes waaaaay less power. Nevertheless the sticker was removed and placed on the back.

Also noteworthy was KingSpec's recommendation to use ReiserFS. It was tried here since it's still available as a kernel module in Ubuntu. Unfortunately it crashed, so the KingSpec recommendation was abandoned.

ReiserFS crashes the node dead! Goodbye ReiserFS.

It was pointed out by Greg that Hans Reiser, the code's author, was convicted of murdering his wife which ultimately led to the demise of the file system.

Ultimately XFS produced the goods with astonishing speeds recorded.

The heatsink from the old failed NVMe drive was attached (not pictured) but that was only because it was there. This drive doesn't need a heatsink at all and never gets hot. This unit is recommended.

This node has been added to the Docker Swarm as hostc.localdomain bringing the total to 15.

 
NVMe software support

Software support is provided by the standard Ubuntu Server 20.04 LTS 64-bit image installed in the usual way. The device shows up as /dev/nvme0 and /dev/nvme0n1.

crw-------  1 root root    235,   0 Jan 10 15:56 nvme0
brw-rw----  1 root disk    259,   0 Jan 10 15:56 nvme0n1
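
The leading "c" and "b" in the listing above mark the controller as a character device and the namespace as a block device. As a small sketch, the same check can be made from Python with the standard os and stat modules. The helper name describe_node is made up for this example, and the paths assume the drive enumerates as nvme0 as it does here.

```python
import os
import stat


def describe_node(path):
    """Classify a path as 'block', 'char', 'other' or 'missing'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "char"
    return "other"


if __name__ == "__main__":
    # Paths as they appear on this node; other systems may number them differently.
    for node in ("/dev/nvme0", "/dev/nvme0n1"):
        print(f"{node}: {describe_node(node)}")
```

If the namespace comes back as "missing" after a boot, the drive has probably dropped off the bus.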

The NVMe has been formatted as XFS on the basis of this review, where the overall winner is said to be XFS. Otherwise there has been no previous experience with this filesystem.

 

CM4 8GB before heatsink mounting - no support screws

The connector specification document claims the connector assembly to be "locking", meaning it requires a certain force to release. This force appears to be enough to hold the CM4 in place, so no mounting screws have been used. Clearly, if your project involves any vibration, secure mounting needs to be used.

Software compatibility issues

This board makes use of the FE1.1S 4-port USB hub chip, shown as (17) in the above image.

Unfortunately it was not possible to get this running with a standard keyboard plugged in to the USB port. A support request to Waveshare was answered very promptly, with the suggestion that the operating system did not have a driver for the chip. However, this author has not been able to determine whether that chip has a driver in Ubuntu Server 20.04 LTS or not. This is still an ongoing issue, but it's now somewhat moot as the OS runs very well in headless mode.

UPDATE
Fixed: The CM4 uses a different USB controller from the Pi 4, so different settings are needed in /boot/firmware/config.txt to enable it.

# Enable USB port on carrier board
#
dtoverlay=dwc2
#dr_mode=host
otg_mode=1

reference: https://github.com/raspberrypi/firmware/issues/1500

This is the subject of considerable debate and time wasting by thousands of people.....
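
Since a stray "#" in config.txt is enough to undo the fix, here is a minimal Python sketch that lists which settings are actually active in a config.txt-style text. It's a simplification: it ignores conditional sections such as [cm4] or [all], and the function name active_settings is just something made up for this example.

```python
# The settings shown above, as they appear in /boot/firmware/config.txt.
SAMPLE_CONFIG = """\
# Enable USB port on carrier board
#
dtoverlay=dwc2
#dr_mode=host
otg_mode=1
"""


def active_settings(config_text):
    """Return (key, value) pairs for uncommented key=value lines.

    Simplified: conditional section headers like [cm4] are not interpreted.
    """
    settings = []
    for raw_line in config_text.splitlines():
        # Drop anything after a '#', then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings.append((key.strip(), value.strip()))
    return settings


if __name__ == "__main__":
    for key, value in active_settings(SAMPLE_CONFIG):
        print(f"{key} = {value}")
```

On the node itself, the text could be read from /boot/firmware/config.txt in place of the sample string.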

The software application this node runs

Obviously the alarms went off...... https://www.youtube.com/watch?v=OWwOJlOI1nU