DietPi on Rock64: better USB3 performance than Ayufan Stretch and Armbian Bionic!

Hi,

Just wanted to share this with the Dietpi devs to see what they think,

I recently purchased a Rock64 (1GB RAM version) and am very pleased with the price and performance compared to the Raspberry Pi 3B+ and also against my 3.2GHz Intel dual-core (Haswell) Pentium file server.

Been doing various tests on the rock64 using the Dietpi image, Ayufan’s Debian Stretch minimal image and Armbian’s Ubuntu Bionic minimal image.

One of the tests was to set up all 5 environments with Docker and run a SABnzbd container in each.

In all 5 environments (including the RPi 3B+ and my Intel file server), download performance of the SABnzbd container is equal (I bandwidth-limited them all to 30Mbps and they all held that rate consistently throughout the download). The biggest difference between them is the I/O performance when unrar unpacks a download. This is what I get (note: exact same download in all tests):

  1. Intel dual-core Pentium file server running at 3.2GHz (SSD connected via SATA III) running Debian Stretch minimal - A 5GB rar set unpacks in 58 seconds.
  2. Raspberry Pi 3B+ (SSD connected via USB2) running DietPi - A 5GB rar set unpacks in 5 minutes 30 seconds.
  3. Rock64 (SSD connected via USB3) running Ayufan’s Debian Stretch minimal - A 5GB rar set unpacks in 3 minutes.
  4. Rock64 (SSD connected via USB3) running Armbian Bionic minimal - A 5GB rar set unpacks in 3 minutes.
  5. Rock64 (SSD connected via USB3) running DietPi - A 5GB rar set unpacks in 1 minute 57 seconds!
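
To put the five timings above side by side, here is a small sketch that converts them to seconds and derives an approximate throughput and a ratio against the DietPi run. The names are made up for illustration, and treating "5GB" as 5000 MB is an assumption, so the MB/s figures are only rough.

```python
# Unpack times for the same rar set, copied from the list above.
UNPACK_SECONDS = {
    "Intel Pentium (SATA III)": 58,
    "Raspberry Pi 3B+ (USB2)": 5 * 60 + 30,
    "Rock64 Ayufan Stretch (USB3)": 3 * 60,
    "Rock64 Armbian Bionic (USB3)": 3 * 60,
    "Rock64 DietPi (USB3)": 1 * 60 + 57,
}

SET_SIZE_MB = 5000  # assumption: "5GB" taken as 5000 MB


def throughput_mb_s(seconds, size_mb=SET_SIZE_MB):
    """Average unpack throughput in MB/s for one run."""
    return size_mb / seconds


def relative_to(baseline, times=UNPACK_SECONDS):
    """How many times longer each run took than the baseline run."""
    base = times[baseline]
    return {name: secs / base for name, secs in times.items()}


ratios = relative_to("Rock64 DietPi (USB3)")
for name, secs in UNPACK_SECONDS.items():
    print(f"{name}: {secs}s, ~{throughput_mb_s(secs):.1f} MB/s, x{ratios[name]:.2f}")
```

Under that assumption the two 3 minute runs come out at roughly 1.5 times the DietPi time.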

I’m very impressed with the Rock64 USB3 performance but am puzzled as to why it is so much faster when using DietPi than with Ayufan’s Stretch image and the Armbian Ubuntu Bionic image. All 3 seem to run the same kernel version.

What do you think the difference could be? I have tried my best to ensure all of these environments are the same and the same SSD drive has been used for all tests.

I’m actually now considering ditching my Intel file server and getting a dual USB3 dock for attaching 2 x large HDDs to the rock64 running DietPi. :slight_smile:

I have also compiled snapraid for the rock64 and I get 81MB/s when doing a parity sync using a USB3 connected SSD (data drive) and a USB3 connected HDD (parity drive), connected to the rock64 via a usb3 hub.

Thanks very much for maintaining a great distro!

Very interesting results, thanks for sharing.

I am also wondering about that huge difference. Let’s go through some possible reasons:

  • Make sure that you unrar onto disk and not accidentally into RAM, as DietPi by default uses a RAMdisk for the /DietPi and /var/log directories. /tmp as well, but that should be a RAMdisk on nearly all Linux images. But I guess you are sure about this already.
  • As well I guess you stopped any background services, that could interfere, e.g. cron execution?
  • We mount drives by default with the async and noatime options. The first (async) means disk changes are held in RAM until a certain block size is reached, but this should be the default on nearly all Linux distros as well. The second (noatime) is at least not part of the mount defaults, and means file access times are not saved. This also reduces file access overhead a bit, but yeah, you will only see “last modified” and not “last accessed” times :wink:. It could be checked what the defaults are on the Debian/Ubuntu images and whether that actually influences unpacking, where only new files should be created. Even if it does, these settings should never have that big an influence.
  • As you used docker, the SABnzbd + unrar version/setup should be the same.
  • One last thing: if cron was up, Armbian places a cron job that runs every minute and sets the priority of every network file transfer process to real-time: /etc/cron.d/make_nas_processes_faster. The cron job itself should not have any impact, as it is just a very quick scan and task, but I am not sure how the raised process priorities might affect other running tasks with lower priority. However, on an idle system this should also not influence unpacking speed in any measurable way.
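
The mount-related points above (RAMdisk vs. disk, async, noatime) can be checked on each image by looking at /proc/mounts. A minimal sketch follows; the function name, the sample text and the /mnt/ssd mount point are made up for illustration.

```python
def mount_info(mounts_text, mount_point):
    """Return (fstype, options) for mount_point from /proc/mounts-style text.

    Each line has the fields: device, mount point, fstype, options, dump, pass.
    Returns None when the mount point is not listed.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[2], fields[3].split(",")
    return None


# Hypothetical sample; on a real system read the text from /proc/mounts.
sample = (
    "/dev/sda1 /mnt/ssd ext4 rw,noatime,data=ordered 0 0\n"
    "tmpfs /tmp tmpfs rw,nosuid,nodev 0 0\n"
)

fstype, options = mount_info(sample, "/mnt/ssd")
print(fstype, "noatime" in options)             # target is a real filesystem
print(mount_info(sample, "/tmp")[0] == "tmpfs")  # a RAMdisk shows up as tmpfs
```

On a live system the same function could be fed `open("/proc/mounts").read()`; a target that reports tmpfs would mean the unpack went into RAM.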

So in the end I have no clue what might have such a huge impact. Check the above and also check htop for other running background processes that might interfere.

Hi MichaIng,

In answer to above queries:

  • Definitely extracting to disk and not RAM (I only have a 1GB RAM rock64 and the extracted file is 4.4GB in size).
  • I stopped and disabled any unused services before testing.
  • In all cases, USB3 SSD mounted using defaults,noatime
  • Same sabnzbd docker image used for all tests.
  • I have to admit to having not checked cron jobs before running the tests. I thought they would just be minimal stuff like logrotate, maybe some time sync stuff etc.

Following on from the tests in SABnzbd, I simplified the testing and ran the test below using the non-free version of unrar directly from the command line (Docker removed from the equation) on the DietPi image and the Ayufan Stretch minimal image. (The Armbian SD card has been reused in another SBC today, so I didn’t test Armbian again.)

The test results below make me wonder how accurate the job reporting is for download speed and unpack speed in sabnzbd.

A 4.4GB set of 90 rar files was unpacked using UNRAR 5.30 beta 2 freeware on a USB3 connected SSD, from and to the same folder on the SSD. Tests were carried out straight after first boot.

SSD using ext4 filesystem and mounted using defaults,noatime

Command used for all tests: “time unrar x file001.rar”

Dietpi results - Test 1

real 1m53.955s
user 0m21.585s
sys 0m32.429s

Dietpi results - Test 2

real 1m53.862s
user 0m21.681s
sys 0m32.538s


Ayufan Stretch minimal - Test 1

real 1m57.922s
user 0m21.627s
sys 0m32.276s

Ayufan Stretch minimal - Test 2

real 1m57.887s
user 0m21.561s
sys 0m32.210s
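
For comparing the runs above, the "real" lines can be converted to seconds and averaged per image. This is only a sketch; the helper name is made up and the values are copied from the four results above.

```python
import re


def real_seconds(time_output):
    """Extract the 'real' wall-clock time from `time` output, in seconds."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", time_output)
    if match is None:
        raise ValueError("no 'real' line found")
    return int(match.group(1)) * 60 + float(match.group(2))


dietpi = [real_seconds("real 1m53.955s"), real_seconds("real 1m53.862s")]
ayufan = [real_seconds("real 1m57.922s"), real_seconds("real 1m57.887s")]

dietpi_mean = sum(dietpi) / len(dietpi)
ayufan_mean = sum(ayufan) / len(ayufan)
gap = ayufan_mean - dietpi_mean

print(f"DietPi mean: {dietpi_mean:.3f}s, Ayufan mean: {ayufan_mean:.3f}s")
print(f"gap: {gap:.3f}s ({gap / ayufan_mean * 100:.1f}% of the Ayufan time)")
```

With these numbers the gap is about 4 seconds, far smaller than the difference SABnzbd reported.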

Nice results :smiley:. Yep, these look more realistic. Also, the fact that both results are so close on each system indicates that they were accurate and not disturbed. Yay, DietPi still wins, even if only narrowly :sunglasses:.

The reason, when unpacking 4.4 GB with 1 GB RAM, could indeed be the lower idle RAM usage, which allows slightly more caching, and perhaps a tiny bit more free CPU time as well, since some otherwise default system services are disabled/masked (dbus, NTP, additional TTYs and some others).

  • CPU at 100% during unpack?
  • Did the task actually fill RAM, at least with cache (yellow bar in htop)?
  • Ah, and what about the swap file? DietPi pre-creates one; it should be 1 GB on a 1 GB system, so 2 GB overall is assured. But a swap file that is actually in use might slow down parallel unpacking onto the same disk.
  • swappiness is set to 1 on DietPi, so the swap file is used only in rare cases. On default systems it is used more/earlier, which theoretically allows more cache use (of otherwise unused RAM) but also produces disk I/O sooner. Not sure about the pros/cons in combination with unpacking.

=> So the swap file could be taken out of the equation as well by disabling it on all systems before benchmarking, at least if RAM came even close to filling up and swap was used.
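
To see whether swap played a role at all, the kernel exposes the current swappiness in /proc/sys/vm/swappiness and the swap usage in /proc/swaps. A small sketch of reading the latter; the function name and the sample text (including the /var/swap path) are made up for illustration.

```python
def swap_used_kib(swaps_text):
    """Sum the 'Used' column (KiB) of /proc/swaps-style text.

    Columns are: Filename, Type, Size, Used, Priority. The first line is
    the header and is skipped.
    """
    total = 0
    for line in swaps_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 4:
            total += int(fields[3])
    return total


# Hypothetical sample; on a real system read the text from /proc/swaps.
sample_swaps = (
    "Filename\t\t\t\tType\t\tSize\tUsed\tPriority\n"
    "/var/swap                               file\t\t1048572\t2048\t-2\n"
)

print(swap_used_kib(sample_swaps))  # KiB of swap in use during the sample
```

Checking this value right after an unpack run on each image would show whether the swap file was touched; a header-only file means no swap is configured.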

As I seem to have the rock64 testing bug at the moment, I thought I would try the same unrar test again, same SSD etc but this time on a clean install of Arch Arm (which uses unrar 5.61 and kernel 4.18.12 with no swap and no services running)

Arch Arm results

real 1m21.725s
user 0m18.597s
sys 0m24.845s

32 seconds faster this time!

Maybe unrar 5.61 is making the difference, or maybe the USB3 drivers (or filesystem caching mechanics) improved significantly between kernel 4.4 and kernel 4.18? Or maybe Arch Arm is running the Rock64 at 1.5GHz by default (rather than 1.3GHz)? Not sure how to check this.

Just ran lscpu in Arch Arm and it looks like it is running the CPU at the same max clock as Dietpi:

Architecture: aarch64
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: ARM
Model: 4
Model name: Cortex-A53
Stepping: r0p4
CPU max MHz: 1296.0000
CPU min MHz: 408.0000
BogoMIPS: 48.00
L1d cache: unknown size
L1i cache: unknown size
L2 cache: unknown size
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

I bet it’s the newer kernel version. I remember that something about disk I/O was improved, and perhaps the USB3 drivers on top of that.
But there could be other (contributing) reasons, as Arch is a really different distro, while Armbian, Ayufan Stretch, Raspbian and DietPi are all based on Debian.

You should be able to check CPU clocks via:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq

Indeed, on dietpi.com it’s listed with 1.3 GHz, while on the Arch Linux ARM page it’s listed as 1.5 GHz. I could not find anything in official specifications so far…

Very strange. We just run it on defaults and do not change anything about the CPU clock. So if it’s really 1.3 GHz on DietPi but 1.5 GHz on a default Arch install, something must have been done there. It would be interesting to find out; perhaps we can then offer overclocking for Rock64 as well.

Higher clocks should of course show up as higher temps. It would be interesting to check that as well:

cat /sys/class/thermal/thermal_zone0/temp
cat /sys/devices/platform/sunxi-i2c.0/i2c-0/0-0034/temp1_input
cat /sys/class/hwmon/hwmon0/device/temp_label
  • One of these should match :wink:.

Result of cpu frequency command is:
1296000
1296000
1296000
1296000

Only the first temperature command worked:
53750

Note the above temperature is higher as the rock64 is not idle at the moment. I am now running the same docker sabnzbd container test as before.
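
The raw values above are easier to read after unit conversion: scaling_cur_freq reports kHz and the thermal zone reports millidegrees Celsius. A tiny sketch with made-up helper names, using the values posted above:

```python
def khz_to_mhz(raw):
    """Convert whitespace-separated scaling_cur_freq values (kHz) to MHz."""
    return [int(value) / 1000 for value in raw.split()]


def millidegrees_to_celsius(raw):
    """Convert a thermal_zone temp reading (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000


print(khz_to_mhz("1296000\n1296000\n1296000\n1296000"))
print(millidegrees_to_celsius("53750"))
```

So all four cores report 1296 MHz, which matches the "CPU max MHz" line from lscpu, and the SoC sits at 53.75°C under load.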

Further feedback on this whilst testing with Arch Arm:

Tried doing some scp copying of data from the Rock64 to my file server and it bombed out part way through with I/O errors when reading from the USB3 connected SSD.

I compiled snapraid for Arch Arm and then tried to do a parity sync and that failed part way through with I/O errors when reading from the USB3 connected SSD.

I switched back to DietPi and did another snapraid sync, and I get I/O errors at the same point, so it looks like either the SSD is on its last legs (it is very old now) or it is not getting enough voltage to power itself through the USB3 port.

I need to find another mains powered enclosure to move the SSD into to rule out the USB port power issue.

Moved the SSD to a mains powered enclosure and the I/O error issue has gone away. So clearly the SSD needs more power than the rock64 can provide, regardless of distro.

snapraid sync performance is also better under Arch Arm: I get 91MB/s when doing a parity sync from USB3 SSD (data drive) to USB3 HDD (parity drive), both connected through a USB3 hub to the Rock64 USB3 port.

hdparm -Tt stats from Arch Arm for the USB3 connected SSD and HDD:

/dev/sdc: (USB3 SSD - Samsung 840 Basic 120GB)
 Timing cached reads:   1860 MB in  2.00 seconds = 930.65 MB/sec
 Timing buffered disk reads: 1002 MB in  3.00 seconds = 333.59 MB/sec

/dev/sdb: (USB3 HDD - SAMSUNG HD204UI)
 Timing cached reads:   1774 MB in  2.00 seconds = 886.91 MB/sec
 Timing buffered disk reads: 394 MB in  3.01 seconds = 130.94 MB/sec
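
To compare the hdparm figures above with the snapraid sync rate, the MB/sec values can be pulled out of the output. A sketch follows; the function name is made up, the sample text is the HDD output from above, and the regular expression assumes exactly the line format shown there.

```python
import re

HDPARM_LINE = re.compile(
    r"Timing (cached|buffered disk) reads:\s+(\d+) MB in\s+([\d.]+) seconds =\s+([\d.]+) MB/sec"
)


def hdparm_rates(output):
    """Map 'cached' / 'buffered disk' to the reported MB/sec."""
    return {m.group(1): float(m.group(4)) for m in HDPARM_LINE.finditer(output)}


hdd_output = """/dev/sdb: (USB3 HDD - SAMSUNG HD204UI)
 Timing cached reads:   1774 MB in  2.00 seconds = 886.91 MB/sec
 Timing buffered disk reads: 394 MB in  3.01 seconds = 130.94 MB/sec
"""

hdd = hdparm_rates(hdd_output)
sync_rate = 91  # MB/s, the snapraid parity sync rate reported above
print(hdd)
print(f"sync runs at {sync_rate / hdd['buffered disk'] * 100:.0f}% of the HDD's buffered read rate")
```

hdparm measures sequential reads only, so this ratio is a rough indication rather than a write benchmark of the parity drive.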

Ah yep, this is the case most of the time. Always attach SSDs/HDDs to an SBC with a dedicated power supply. USB power is not sufficient in most cases and/or decreases stability.

In most cases a USB port can only reliably power something like a USB stick.

I had to splice a 5V DC power supply into the external drive on my Nextcloud box, otherwise it wouldn’t spin up and it would crash my SBC. The RPi has a hack to make it output more power, but it’s easier to cut a cable and splice power into it, or to use a dedicated power supply for the drive. That provided proper power and is much more stable.

On RPi, the default max current for all USB ports together(!) is 0.6A (=3W); with the max USB power setting (enabled by default on DietPi), it is increased to 1.2A (=6W).

This might be different on Rock64, but even if the USB ports provided enough power, the PSU of the board might still be insufficient.

Yep, I’ve had a few USB 2.0 ports on various boards underpower a 0.7A drive.
Which makes sense, as the USB 2.0 standard is 0.5A max.

USB 3.0 seems fine at 0.9A. All SSDs and low-power platter drives should also be fine.
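
The current and power figures quoted in the last few posts follow from simple arithmetic at the nominal 5 V bus voltage. A sketch using only the numbers mentioned above (names are made up; real boards may set different limits than the spec values):

```python
USB_VOLTS = 5.0  # nominal USB bus voltage

# Per-port limits from the USB standards as quoted above, in amps.
SPEC_LIMIT_A = {"USB 2.0": 0.5, "USB 3.0": 0.9}


def watts(amps, volts=USB_VOLTS):
    """Power in watts for a given current at the bus voltage."""
    return amps * volts


def within_spec(drive_amps, port):
    """True if the drive's current draw fits the spec limit of the port."""
    return drive_amps <= SPEC_LIMIT_A[port]


print(watts(0.6), watts(1.2))        # RPi default vs. max USB power setting
print(within_spec(0.7, "USB 2.0"))   # the 0.7A drive on a USB 2.0 port
print(within_spec(0.7, "USB 3.0"))   # the same drive on a USB 3.0 port
```

This only covers the current budget; it says nothing about the voltage actually arriving at the drive.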

Easy. Use

sbc-bench -m

to monitor what’s happening in another shell while running benchmarks (benchmarking without monitoring is pretty useless):

wget https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh
chmod 755 sbc-bench.sh
sudo ./sbc-bench.sh -m

Since you’re using a Rock64 you can also check whether DietPi deleted rock64_diagnostics.sh. If not, you already have my code and can run

rock64_diagnostics.sh -m

Results will then look like this (for more examples check links in right column here https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md):

Time        CPU    load %cpu %sys %usr %nice %io %irq   Temp
05:38:47: 1392MHz  3.19  26%   7%  17%   0%   1%   0%  45.5°C
05:39:17: 1392MHz  3.14  78%   1%  76%   0%   0%   0%  56.8°C
05:39:48: 1392MHz  3.19  84%   1%  82%   0%   0%   0%  58.6°C
05:40:19: 1392MHz  3.38  85%   1%  83%   0%   0%   0%  61.7°C
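
If the monitoring output is saved to a file, lines like the ones above can be parsed to pull out peak temperature and to confirm the clock stayed constant during a benchmark. A sketch that assumes exactly the column layout shown above (the function name is made up):

```python
def parse_monitor_line(line):
    """Parse one monitoring line: time, clock, load, six % columns, temp."""
    f = line.split()
    return {
        "time": f[0].rstrip(":"),
        "mhz": int(f[1].replace("MHz", "")),
        "load": float(f[2]),
        "cpu_pct": int(f[3].rstrip("%")),
        "sys_pct": int(f[4].rstrip("%")),
        "usr_pct": int(f[5].rstrip("%")),
        "io_pct": int(f[7].rstrip("%")),
        "temp_c": float(f[9].rstrip("°C")),
    }


sample_lines = [
    "05:38:47: 1392MHz  3.19  26%   7%  17%   0%   1%   0%  45.5°C",
    "05:39:17: 1392MHz  3.14  78%   1%  76%   0%   0%   0%  56.8°C",
    "05:39:48: 1392MHz  3.19  84%   1%  82%   0%   0%   0%  58.6°C",
    "05:40:19: 1392MHz  3.38  85%   1%  83%   0%   0%   0%  61.7°C",
]

rows = [parse_monitor_line(line) for line in sample_lines]
print("peak temp:", max(r["temp_c"] for r in rows))
print("clocks seen:", sorted({r["mhz"] for r in rows}))
```

A clock value that drops while the temperature climbs would point to throttling during the run.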

This is a Rock64 with a recent Armbian image (there I added the 1.4GHz cpufreq OPP, but we did not allow the 1.5 GHz setting since too many instability reports occurred). With ayufan images you can dynamically load a DT overlay for the higher clockspeeds, and since DietPi is just a modified userland on top of ayufan’s work it should work exactly the same there (for details do a web search for ‘ayufan DT overclock’ or something like that).

SSDs with trashed performance are quite common when powered by an SBC. The reason is usually undervoltage and not undercurrent (on almost all SBC USB ports there are current limiters in action exceeding the commonly known 500mA for USB2 or 900mA for USB3 – ROCK64 allows 650mA on each USB2 port and 950 mA on the USB3 port but a RPi 3 uses 1.2A for all 4 USB2 ports combined or a NanoPi M4 for example has one global 2A current limiter for all 4 USB3 ports – you always need to study schematics).

But as already said: usually it’s undervoltage (cable/contact resistance between PSU and board and again between board and disk) causing the problems and not limited current (same on the RPi 3 where 1.2A are set by default – compare with ‘vcgencmd get_config int’ output – but due to Polyfuses and Micro USB powering the voltage available to USB peripherals often drops below 4.5V and then majority of external SSDs get in trouble, the majority of 2.5" HDD already has trouble with less than 4.75V)
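
The undervoltage effect described above is Ohm's law: the voltage reaching the disk is the supply voltage minus current times the total cable/contact resistance. A sketch with the 4.5V and 4.75V thresholds from the post; the 5.0 V supply, 0.9 A draw and 0.3 ohm resistance are made-up illustration values, not measurements.

```python
SSD_MIN_VOLTS = 4.5    # below this most external SSDs get in trouble (from the post)
HDD_MIN_VOLTS = 4.75   # most 2.5" HDDs already struggle below this (from the post)


def volts_at_device(supply_volts, amps, resistance_ohms):
    """Voltage left at the device after the drop across cables and contacts."""
    return supply_volts - amps * resistance_ohms


# Hypothetical example values: 5.0 V PSU, 0.9 A draw, 0.3 ohm total resistance.
v = volts_at_device(5.0, 0.9, 0.3)
print(f"{v:.2f} V at the disk")
print("SSD ok:", v >= SSD_MIN_VOLTS)
print("2.5\" HDD ok:", v >= HDD_MIN_VOLTS)
```

With these assumed values the disk sees about 4.73 V, which would still suit an SSD but already falls short for a typical 2.5" HDD, even though the current limit is never reached.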