Rock64 SBC: Occasional crashes at high loads

Hi!

I recently purchased a Rock64 (4GB version) and a “Modell B” Premium Aluminium Casing. I successfully installed DietPi on a 16GB eMMC. There is currently no other HW attached to the SBC.

uname -a
Linux DietPi 4.4.180-rockchip64 #1 SMP Thu Jun 6 08:08:17 CEST 2019 aarch64 GNU/Linux

Unfortunately, the SBC crashes occasionally at higher loads (the console freezes and the SBC is no longer pingable). When idling, the SBC runs stably for days.

I attached another computer to the SBC serial console to log the kernel output when the crashes occur. I simulated load by the following command:

stress -c 4 -m 1 -t 5m

The crash occurs at very different times: sometimes early, sometimes only after I have run the command above several times. My first impression was a thermal problem in the small housing without convection, but the peak temperature is well below 80 °C.
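One way to double-check the thermal theory is to log temperature and CPU clock while the stress run is active. The following is only a sketch: it assumes that thermal_zone0 is the SoC sensor and that cpu0 exposes cpufreq on this board, and the function names are made up for illustration.

```shell
#!/bin/sh
# Convert a sysfs temperature reading (millidegrees Celsius) to degrees with one decimal.
millideg_to_c() {
    printf '%d.%d\n' "$(( $1 / 1000 ))" "$(( ($1 % 1000) / 100 ))"
}

# Print one timestamped sample per interval (seconds, default 2) until interrupted.
# Assumed paths: thermal_zone0 as the SoC sensor, cpu0 for the current clock.
log_temp_and_freq() {
    interval="${1:-2}"
    while :; do
        temp="$(cat /sys/class/thermal/thermal_zone0/temp)"
        freq="$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq)"
        printf '%s  %s C  %s MHz\n' "$(date +%T)" "$(millideg_to_c "$temp")" "$(( freq / 1000 ))"
        sleep "$interval"
    done
}

# Example: sample in the background while the stress command from above runs.
# log_temp_and_freq 2 & stress -c 4 -m 1 -t 5m; kill $!
```

If the last samples before a freeze are far below the throttling range, overheating becomes a less likely cause.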

Here are two logs from the serial console:

  1. https://pastebin.com/Mbfirn2d
  2. https://pastebin.com/LyZX6wnX

Unfortunately, I’m not an expert in reading kernel logs. I see lines with “BUG” or “Oops” and wonder if they are linked to the crash.
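For anyone who wants to pull the relevant lines out of a saved serial log, a small filter like the following might help. It is a sketch, and the pattern list is an assumption (common kernel fault markers, not a complete set); serial.log is a hypothetical file name.

```shell
#!/bin/sh
# Print the lines of a saved serial log that usually mark a kernel fault,
# prefixed with their line numbers.
# The patterns are case-sensitive, so "BUG" also matches words such as "DEBUG".
kernel_fault_lines() {
    grep -nE 'BUG|Oops|Unable to handle kernel|Kernel panic|Call trace' "$1"
}

# Example:
# kernel_fault_lines serial.log
```

The first match is usually the most interesting one, since later messages are often follow-up errors.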

Can anyone help me?

Regards, rasputin

Hmm, I can't tell anything from the output either: https://github.com/rockchip-linux/kernel/issues/67

Did you monitor RAM usage as well, and check dmesg for voltage or other kinds of errors?

Found: https://github.com/rockchip-linux/kernel/issues/67 and https://github.com/ayufan-rock64/linux-build/issues/299
And some others… So it seems to be related to this rockchip64 kernel, sadly no response from devs at all :frowning:.

In that case, play around a bit with the CPU governor as well, to see if it has some influence.
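Before changing anything, it can help to see which governor and frequency limits are currently active. A minimal sketch using the standard Linux cpufreq sysfs files (the function name is made up, and the optional argument only exists so the function can be pointed at a different directory tree):

```shell
#!/bin/sh
# Print the active governor and the frequency limits of every CPU that exposes cpufreq.
# The optional argument overrides the sysfs root (default: /sys/devices/system/cpu).
show_cpufreq() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -r "$dir/scaling_governor" ] || continue
        printf '%s: governor=%s min=%s kHz max=%s kHz\n' \
            "$(basename "$(dirname "$dir")")" \
            "$(cat "$dir/scaling_governor")" \
            "$(cat "$dir/scaling_min_freq")" \
            "$(cat "$dir/scaling_max_freq")"
    done
}

# Example:
# show_cpufreq
```

The governors the kernel offers are listed in scaling_available_governors in the same directory.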

Hi!

I played around a little bit more with different stressors.

The kernel oopses seem to be related to the modules lz4hc*, zlib or similar.

The last log from the list above was produced with the latest kernel version 4.4.182.

Best regards
rasputin

rasputin
Not 100% sure, but I think this line
Modules linked in: lz4hc lz4hc_compress zlib rk_vcodec lzo zram ip_tables x_table
only states which modules are loaded, not necessarily that they are related to the error.

You can compare with: lsmod
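The comparison could be scripted roughly like this. It is a sketch with made-up function names; it reads /proc/modules, which is the file lsmod itself formats, and serial.log is a hypothetical file name.

```shell
#!/bin/sh
# Extract the module names from a "Modules linked in:" line on stdin, one per line.
modules_from_oops() {
    sed -n 's/.*Modules linked in: *//p' | tr -s ' ' '\n' | grep -v '^$'
}

# For each module name on stdin, report whether it is currently loaded.
# /proc/modules has the module name in its first column.
compare_with_loaded() {
    while read -r mod; do
        if grep -q "^$mod " /proc/modules 2>/dev/null; then
            echo "$mod loaded"
        else
            echo "$mod not-loaded"
        fi
    done
}

# Example:
# grep 'Modules linked in' serial.log | tail -n 1 | modules_from_oops | compare_with_loaded
```

If every name shows up as loaded on a healthy system too, the line is just an inventory and says little about the cause.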

I see zram… Ahh the ARMbian zRam implementation was added by some recent update of their rootfs package… This caused issues in another case already, so please try the steps provided here: https://dietpi.com/forum/t/troubleshooting-help-how-to-identify-random-reboots-and-or-non-responsive-device/3270/1
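To see whether zram is actually in use before and after following those steps, something like the following could be used. This is only a sketch with made-up function names; it does not reproduce the linked steps, it just reads the standard /proc/swaps and /proc/mounts tables.

```shell
#!/bin/sh
# List zram devices that are active as swap, reading a /proc/swaps-style table.
# The optional argument overrides the file (default: /proc/swaps).
zram_swap_devices() {
    awk 'NR > 1 && $1 ~ /^\/dev\/zram/ { print $1 }' "${1:-/proc/swaps}"
}

# List zram-backed mounts (device and mount point) from a /proc/mounts-style table.
# The optional argument overrides the file (default: /proc/mounts).
zram_mounts() {
    awk '$1 ~ /^\/dev\/zram/ { print $1, $2 }' "${1:-/proc/mounts}"
}

# Example:
# zram_swap_devices; zram_mounts
```

Empty output from both functions suggests that no zram device is active as swap or mounted as a filesystem.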

Aside from that, dmesg does not report any voltage/power-related errors, does it? If it does, try another power supply. But only after ruling out zRam as the cause :wink:.

MichaIng

Thank you for your support!

I successfully disabled zram. This was also good for another reason: I frequently had problems with a corrupted filesystem within the RAM disk, especially for /var/log.

I also followed your advice to adjust the “cpu governor”. I ended up with a stable system by selecting the “conservative” governor and reducing the maximum CPU speed to 1296 MHz (80% of hardware maximum). I ran stress tests for hours without any trouble or kernel crashes. Fortunately, CPU speed is not a critical resource for me; a stable system is more important.
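For readers who want to try the same values by hand, a rough sysfs equivalent is sketched below. This is an assumption about how such a setting can be applied, not necessarily how it was done here (DietPi offers it through its own configuration tool); the function name is made up, root is required on real hardware, and the change does not survive a reboot.

```shell
#!/bin/sh
# Set a governor and a maximum frequency (in kHz) on every CPU that exposes cpufreq.
# The optional third argument overrides the sysfs root (default: /sys/devices/system/cpu).
set_cpu_limits() {
    governor="$1"
    max_khz="$2"
    root="${3:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -w "$dir/scaling_governor" ] || continue
        echo "$governor" > "$dir/scaling_governor"
        echo "$max_khz" > "$dir/scaling_max_freq"
    done
}

# Example with the values mentioned above (1296 MHz = 1296000 kHz):
# set_cpu_limits conservative 1296000
```

Running the stress test again afterwards shows whether the lower clock alone is enough to keep the system stable.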

Thank you very much again!

Regards, rasputin

This thread can be set as “solved”.