HDD filesystem corruption at reboot / poweroff

Hello, I’ve just subscribed to this board. I wrote what follows on the PINE64 forum a couple of days ago, but it went unnoticed and I am still trying to determine where exactly I should ask for help.
My setup consists of a Rock64 4G board connected to a WDLabs Pi drive. I have replaced the HDD cable delivered by (the now defunct) WDLabs with another one with a USB 3 interface. The WDLabs Pi drive has no SATA connector, see here. I have verified that the disk is not driven by UAS; it is in fact using the usb-storage driver:

[    2.745235] usb-storage 5-1:1.0: USB Mass Storage device detected
[    2.749781] scsi host0: usb-storage 5-1:1.0
[    2.753639] usbcore: registered new interface driver usb-storage
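
For completeness, this is roughly how I double-checked which driver the disk is bound to (assuming the usbutils package is installed; the sysfs path is just the standard driver directory):

lsusb -t                                  # USB device tree, shows Driver=usb-storage or uas per interface
ls -l /sys/bus/usb/drivers/usb-storage/   # devices currently bound to the usb-storage driver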

To test the above setup I used DietPi, which has always performed well on my Raspberries: I copied it to an SD card, updated it to the latest release (6.21.1) and applied the ayufan kernel 4.4.132-1075. Once the test was over, I flashed the SPI to enable USB boot and then restarted from scratch, transferring the DietPi image directly onto the HDD using Rufus on a Windows machine. The DietPi partitioning scheme was therefore replicated faithfully on the HDD, and indeed after the initial boot the /dev/sda7 partition (root) is properly extended to the maximum size of the disk (314 GB).
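
For reference, this is how I verified the resulting layout after the first boot (assuming the HDD shows up as /dev/sda, as it does here):

lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT /dev/sda   # partition layout and sizes on the HDD
df -h /                                          # confirms the root filesystem was expanded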

Unlike other Rock64 users, so far I haven’t experienced any problem with the HDD while the system is running, probably thanks to the particular HDD I am using, which was engineered specifically for the Raspberry Pi environment. True, this host is only running Pi-hole, NUT and the Unifi Controller (+ MongoDB) right now, so the system load is not very high at the moment, but performance is quite good except for the boot, which takes ~50 seconds from power-on to the moment the login prompt is shown.
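
In case it matters, this is how I would break down that ~50 seconds (plain systemd tooling, nothing DietPi-specific assumed):

systemd-analyze                  # total time spent in kernel and userspace
systemd-analyze blame            # per-unit startup times, slowest first
systemd-analyze critical-chain   # the chain of units that actually gates the boot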

My problem is instead with the reboot / poweroff phases, where 50% of the time the EXT4 filesystem is left in a dirty state, and where I have already experienced more than once the loss / corruption of the /boot/dietpi directory as well as of the /boot/dietpi.txt file. In those cases I had to boot the system from a DietPi installation on an SD card to repair / restore the HDD partition. Something is definitely going wrong in the last second(s) before the power is cut, as if the HDD is not given enough time to flush any pending writes. The strange thing is that most (all?) of the corruption happens on the DietPi-specific files in /boot, which is where the RAMdisk is saved back to disk.
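
When that happened I repaired it from the SD card along these lines (illustrative only; the device name depends on how the HDD enumerates when booted from SD, /dev/sda7 being the root partition in my case):

umount /dev/sda7 2>/dev/null   # make sure the HDD partition is not mounted
fsck.ext4 -f /dev/sda7         # force a full check of the root filesystem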

I also have another setup based on the same version of DietPi, but using a Raspberry Pi 3 and the special cable WDLabs provided to power the drive directly from the power supply instead of via the USB port. That system has absolutely NO problem at all. Unfortunately the same cable cannot be used with the Rock64, not only because of a lack of physical space but also because the cable uses MicroUSB connectors and not barrel ones.

So: what am I doing wrong? Did I make a mistake by putting the DietPi image directly on the HDD? Or is there really an issue with the Rock64, or with the DietPi distro when running on the Rock64? I have patched the shutdown sequence in the dietpi-ramdisk service to pause for 2 seconds after the sync and then run the sync command again, in the hope of giving the HDD some more time to flush any pending writes. Unfortunately it didn’t cure the issue. In the meantime I have to cross my fingers whenever I have to reboot or power off the system.
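
For completeness, the change I made amounts to roughly this (a paraphrase of my local edit, not the exact upstream code):

sync       # flush the RAMdisk contents that were just written back to /boot
sleep 2    # give the USB HDD a moment to commit its cache
sync       # flush once more before systemd proceeds with unmount/poweroff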

Thanks in advance for any hint you may give me (or for redirecting me to the appropriate place to discuss this issue).

Best enable persistent journald logging to see the shutdown order; for this, only the following directory is required:
mkdir /var/log/journal
Then, after a reboot, check journalctl to derive the exact shutdown sequence and look for possible errors or wrong ordering.
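
Once the directory exists and the system has been rebooted at least once, something like this shows the previous boot’s shutdown messages (plain journalctl options, nothing DietPi-specific):

journalctl --list-boots       # list available boots with their indices
journalctl -b -1 -r           # previous boot, newest entries first (shutdown at the top)
journalctl -b -1 -p warning   # only warnings and errors from the previous boot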

It can’t hurt to check dmesg as well for any errors regarding the drive, power or I/O.
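
E.g. something like this to narrow it down (standard util-linux dmesg options):

dmesg --level=err,warn                  # only warnings and errors from the current boot
dmesg | grep -iE 'sda|usb|i/o error'    # messages mentioning the drive or the USB bus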

The dietpi-ramdisk service definitely stops before any local HDD is unmounted, at least on the current DietPi v6.21. There was an issue with v6.20 (or rather with a certain systemd package update) where drives mounted via the x-systemd.automount flag were allowed to unmount before or while DietPi-RAMdisk was running.

So just to be safe, check your /etc/fstab and, if present, remove all x-systemd.automount flags, at least from the root/boot drives.
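
For illustration, an affected entry would change from the first line below to the second (UUID and mount point are placeholders, not taken from your system):

UUID=xxxx-xxxx  /mnt/hdd  ext4  defaults,noatime,x-systemd.automount  0  2   # before: lazy automount
UUID=xxxx-xxxx  /mnt/hdd  ext4  defaults,noatime                      0  2   # after: ordinary mount, normal unmount ordering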

What you did manually was to add a sleep to /DietPi/dietpi/func/dietpi-ramdisk? This could/should be done in this line, right after the sync command: https://github.com/Fourdee/DietPi/blob/dev/dietpi/func/dietpi-ramdisk#L123
Actually on shutdown the system should call this already, and it should give enough time for all drives to sync, but AFAIK this is on a timer, and in the case of large, slow drives with a large async cache (block size) that might not be sufficient.
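
As a quick sanity check before rebooting, the amount of data still waiting to be written out can be read from the kernel directly (standard /proc/meminfo fields, nothing board-specific):

grep -E '^(Dirty|Writeback):' /proc/meminfo           # data not yet flushed to disk
sync && grep -E '^(Dirty|Writeback):' /proc/meminfo   # should be close to zero right after a sync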