XU4 no longer boots/connects properly since update; root partition has fsck issues after every restart

I would assume that after the install is done and I reboot the device, it wouldn’t update the kernel once I connect it to the internet, right?

Well, the first-run setup will fail, because completing it requires internet access. But let’s first see whether the images boot at all.

Ah, I forgot this works more like a netinst iso rather than containing the common packages ready for install.

Sure enough, both images boot without problems. Did another restart, no fsck or I/O errors.

Here’s dmesg and journalctl outputs in case they’re useful at all:

dmesg.2.log (35.4 KB)
journalctl.2.log (55.9 KB)

OK, but I guess you did not complete the first boot process because of the missing network, correct?

It does indeed look like the kernel. If this is the case, we might need to report it to Armbian. :thinking:

I captured the logs when I didn’t connect the device to the network.

After that I tried to do a normal install with the Ethernet cable plugged in. It did the usual apt update/upgrade, and sure enough, as you suspected, the I/O errors came back after the restart and I couldn’t proceed with the setup.

I find it rather strange that nobody else has raised this issue until now. Am I the only one who has it?

@Joulinar: Sorry to pester you with this question, but is there a workaround until a solution can be found? I would still like to be able to use my SBC if possible.

As said already, it might be related to the kernel, and doing the apt upgrade will update to the kernel version in question. Maybe you can check which packages are available to upgrade? That way we could decide which packages to put on hold. It should be enough to:

  • boot the system without network
  • wait until the first-run setup fails
  • connect network to enable SSH access
  • run apt update and apt list --upgradable
  • share the output
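
As a minimal sketch of how the kernel-related entries could be picked out of that list: the function name list_kernel_upgrades is made up for illustration, and matching on the "linux-" prefix is an assumption based on how the Armbian kernel packages are named.

```shell
#!/bin/sh
# Filter `apt list --upgradable` output (read from stdin) down to the
# kernel-related packages. The "linux-" prefix is an assumption based on
# Armbian's package naming; the function name is illustrative only.
list_kernel_upgrades() {
    # Keep lines starting with "linux-" and print only the package name,
    # i.e. everything before the first "/".
    grep '^linux-' | cut -d/ -f1
}

# Usage on the device (network is needed for `apt update`):
#   apt update
#   apt list --upgradable 2>/dev/null | list_kernel_upgrades
```

Anything this prints is a candidate for being put on hold before the upgrade is allowed to run.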

If it is the kernel, I’m not sure what we can do about this, because we don’t maintain the XU4 kernel ourselves.

Well you are the only one reporting such issues. Probably not enough users on the XU4.

Here’s the output requested:

# apt list --upgradable
Listing... Done
armbian-firmware/bookworm,bookworm 23.11.1 all [upgradable from: 23.08.0-trunk]
base-files/stable 12.4+deb12u4 armhf [upgradable from: 12.4+deb12u1]
curl/stable-security 7.88.1-10+deb12u5 armhf [upgradable from: 7.88.1-10+deb12u1]
debian-archive-keyring/stable 2023.3+deb12u1 all [upgradable from: 2023.3]
debianutils/stable 5.7-0.5~deb12u1 armhf [upgradable from: 5.7-0.4]
libc-bin/stable,stable-security 2.36-9+deb12u3 armhf [upgradable from: 2.36-9+deb12u1]
libc-l10n/stable,stable-security 2.36-9+deb12u3 all [upgradable from: 2.36-9+deb12u1]
libc6/stable,stable-security 2.36-9+deb12u3 armhf [upgradable from: 2.36-9+deb12u1]
libcurl4/stable-security 7.88.1-10+deb12u5 armhf [upgradable from: 7.88.1-10+deb12u1]
libdbus-1-3/stable 1.14.10-1~deb12u1 armhf [upgradable from: 1.14.8-2~deb12u1]
libgnutls30/stable 3.7.9-2+deb12u1 armhf [upgradable from: 3.7.9-2]
libgssapi-krb5-2/stable 1.20.1-2+deb12u1 armhf [upgradable from: 1.20.1-2]
libk5crypto3/stable 1.20.1-2+deb12u1 armhf [upgradable from: 1.20.1-2]
libkrb5-3/stable 1.20.1-2+deb12u1 armhf [upgradable from: 1.20.1-2]
libkrb5support0/stable 1.20.1-2+deb12u1 armhf [upgradable from: 1.20.1-2]
libnghttp2-14/stable,stable-security 1.52.0-1+deb12u1 armhf [upgradable from: 1.52.0-1]
libpam-modules-bin/stable 1.5.2-6+deb12u1 armhf [upgradable from: 1.5.2-6]
libpam-modules/stable 1.5.2-6+deb12u1 armhf [upgradable from: 1.5.2-6]
libpam-runtime/stable 1.5.2-6+deb12u1 all [upgradable from: 1.5.2-6]
libpam0g/stable 1.5.2-6+deb12u1 armhf [upgradable from: 1.5.2-6]
libssl3/stable,stable-security 3.0.11-1~deb12u2 armhf [upgradable from: 3.0.9-1]
libsystemd-shared/stable 252.19-1~deb12u1 armhf [upgradable from: 252.12-1~deb12u1]
libsystemd0/stable 252.19-1~deb12u1 armhf [upgradable from: 252.12-1~deb12u1]
libudev1/stable 252.19-1~deb12u1 armhf [upgradable from: 252.12-1~deb12u1]
linux-dtb-current-odroidxu4/bookworm 23.11.1 armhf [upgradable from: 23.02.2]
linux-image-current-odroidxu4/bookworm 23.11.1 armhf [upgradable from: 23.02.2]
linux-u-boot-odroidxu4-current/bookworm 23.11.1 armhf [upgradable from: 23.02.2]
locales/stable,stable-security 2.36-9+deb12u3 all [upgradable from: 2.36-9+deb12u1]
openssl/stable,stable-security 3.0.11-1~deb12u2 armhf [upgradable from: 3.0.9-1]
perl-base/stable 5.36.0-7+deb12u1 armhf [upgradable from: 5.36.0-7]
systemd-sysv/stable 252.19-1~deb12u1 armhf [upgradable from: 252.12-1~deb12u1]
systemd-timesyncd/stable 252.19-1~deb12u1 armhf [upgradable from: 252.12-1~deb12u1]
systemd/stable 252.19-1~deb12u1 armhf [upgradable from: 252.12-1~deb12u1]
tzdata/stable 2023c-5+deb12u1 all [upgradable from: 2023c-5]
udev/stable 252.19-1~deb12u1 armhf [upgradable from: 252.12-1~deb12u1]

Please try the following: If the XU4 hangs on boot, attach a keyboard and type some random characters. This hopefully fills the system’s entropy pool, allowing boot to continue. Then install an entropy daemon:

apt install haveged

Strangely, I tested it without an entropy daemon and it worked well with this kernel version, which should feed the entropy pool by itself. But let’s see.

@MichaIng but this does not explain the I/O errors, which seem to show up after running apt update or while using the latest image.

An empty entropy pool could cause a lot of different issues, so I would not rule this out.

What caught my attention is that the random: crng init done message appears over 90 seconds after boot, while it should show up after a few seconds. I just checked again in my case, without an entropy daemon, and here it works well though:

root@OdroidXU4:~# dmesg | grep rng
[    3.744052] exynos-trng 10830600.rng: Exynos True Random Number Generator.
[   10.036423] random: crng init done
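
To compare this timing between systems, the timestamp can be pulled out of the dmesg output directly. This is only a sketch: the function name crng_init_time is invented here, and it assumes the default dmesg timestamp format shown above.

```shell
#!/bin/sh
# Print the kernel timestamp (seconds since boot) of the first
# "random: crng init done" message found in dmesg-style input on stdin.
# Assumes the default "[   seconds.micros] message" line format.
crng_init_time() {
    sed -n 's/^\[ *\([0-9.]*\)] random: crng init done.*/\1/p' | head -n 1
}

# Usage on the device:
#   dmesg | crng_init_time
```

A value of a few seconds matches what is described as normal here, while a value above 90 would match the delayed case.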

It also shows the driver for the hardware random number generator, which is built into the kernel:

root@OdroidXU4:~# modinfo exynos-trng
name:           exynos_trng
filename:       (builtin)
license:        GPL v2
file:           drivers/char/hw_random/exynos-trng
description:    H/W TRNG driver for Exynos chips
author:         Łukasz Stelmach

This means that rng-tools5 should work as well as (or better than) an entropy daemon, if this for some reason really is the issue here. @andoru can you check the output of these commands in your case as well?

Since our new images work well in your case, a difference might be Bullseye vs. Bookworm, but the kernel package is the same in both cases, so still strange. EDIT: Ah, the issues re-appeared …

Sure, I’ll try this out next week.

Please keep in mind however that the root partition is mounted as read-only with the new kernel. So I’m not sure if I’ll be able to install haveged.

What should I do after installing haveged and issuing the commands you mentioned?
Where should I type the random keys and how can I tell that I’ve increased the entropy enough?

You can remount it writable:

mount -o remount,rw /
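
Whether the remount actually took effect can be checked from the mount table. The following is a rough sketch with an invented function name (root_mount_mode); the optional file argument exists only so it can be tried against a saved copy of /proc/mounts.

```shell
#!/bin/sh
# Report whether / is mounted "ro" or "rw", reading a mount table in
# /proc/mounts format. The optional argument (a path to such a table) is
# an illustrative convenience; by default /proc/mounts is used.
root_mount_mode() {
    mounts_file=${1:-/proc/mounts}
    # Field 2 is the mount point, field 4 the comma-separated options.
    # If / is listed more than once, the last entry wins.
    awk '$2 == "/" { mode = ($4 ~ /(^|,)ro(,|$)/) ? "ro" : "rw" }
         END { if (mode) print mode }' "$mounts_file"
}

# Usage on the device:
#   mount -o remount,rw / ; root_mount_mode
```

If it still prints ro after the remount attempt, the kernel refused the remount, which would fit a filesystem that was forced read-only after write errors.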

Ideally, attach a monitor. When kernel messages stop, hit some keys on the keyboard, which hopefully will make boot continue. When systemd boots, it uses /dev/random at some point, and if the entropy pool is empty, it hangs forever trying to get random characters from there. Once the pool is filled, it gets the wanted random characters and the process continues.

When haveged is installed, it fills the entropy pool with an additional algorithm (HAVEGE) so that this hanging boot should not happen in the first place. So just reboot and see whether this part is solved.

Generally, to check the size of the entropy pool:

cat /proc/sys/kernel/random/entropy_avail

This shows 256 in my case at all times, even when I use /dev/random continuously, e.g.:

cat /dev/random > test

The above command should be aborted via CTRL+C quickly and the file removed, since it otherwise creates an endlessly growing file :wink:.
This shows that the pool is filled pretty fast here. But maybe there are differences between some of the XU4 variants/revisions.
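
A bounded variant of the same test avoids the growing file altogether. This is a sketch only: read_random_bytes is a made-up helper name, and head -c is not strictly POSIX, although the common GNU, BusyBox and BSD versions support it.

```shell
#!/bin/sh
# Read a fixed number of bytes from a random device and print how many
# bytes were actually read, without writing anything to disk.
# Arguments: byte count, then optionally the device (default /dev/random).
read_random_bytes() {
    count=$1
    device=${2:-/dev/random}
    # wc -c pads its output on some systems, so strip spaces.
    head -c "$count" "$device" | wc -c | tr -d ' '
}

# Usage on the device:
#   cat /proc/sys/kernel/random/entropy_avail
#   read_random_bytes 1048576
#   cat /proc/sys/kernel/random/entropy_avail
```

If the read returns promptly and entropy_avail is unchanged afterwards, the pool is being refilled quickly enough.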

Alright, I tested what you posted @MichaIng, here’s the result:

The kernel messages don’t seem to pause at all; they scroll constantly once the monitor gets an HDMI signal. So I’m not sure what you mean by “when kernel messages stop”.

I tried both typing random keys as soon as the kernel messages (with the linux penguins at the top) showed up on screen, and immediately after turning on the XU4. Both had the same result.

After logging in with the default credentials, and failing the step in the setup, I got dropped at the root prompt, so I issued the commands you suggested:

I got the same entropy_avail cat output as you do, and I’m still not able to mount the root partition as writeable, and thus can’t install haveged.

I’ll attach the dmesg/journalctl logs after this in case something changed:

dmesg.3.log (37.4 KB)
journalctl.3.log (60.9 KB)

Not sure if the modinfo output is useful to you in this case, but I’ll happily provide it if it is.

[    3.898658] exynos-trng 10830600.rng: Exynos True Random Number Generator.
...
[   10.046612] random: crng init done

Okay, it seems entropy is not the issue.

[    4.686539] mmcblk0: mmc0:0001 SD256 118 GiB 
[    4.699197] exynos5-dmc 10c20000.memory-controller: error -ENXIO: IRQ drex_0 not found
[    4.705637] exynos5-dmc 10c20000.memory-controller: error -ENXIO: IRQ drex_1 not found
[    4.713777] exynos5-dmc 10c20000.memory-controller: DMC initialized, in irq mode: 0
...
[    7.830765] I/O error, dev mmcblk0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
[    7.855616] I/O error, dev mmcblk0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 2
...
[   11.023732] I/O error, dev mmcblk0, sector 8192 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 2
[   11.033759] Buffer I/O error on dev mmcblk0p1, logical block 0, lost sync page write
[   11.042993] EXT4-fs (mmcblk0p1): I/O error while writing superblock

It seems to be a particular issue with the SD card.
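
For comparing several boots or cards, the block-layer errors in a log can be tallied per device. This is a rough sketch with an invented function name (count_io_errors); it only matches the "I/O error, dev <name>," message form seen in the excerpt above, not the "Buffer I/O error" or EXT4 lines.

```shell
#!/bin/sh
# Count "I/O error, dev <name>," kernel messages per block device in
# dmesg-style input on stdin. Prints one "<device> <count>" line each.
count_io_errors() {
    sed -n 's/.*I\/O error, dev \([^,]*\),.*/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage on the device, or against one of the attached logs:
#   dmesg | count_io_errors
#   count_io_errors < dmesg.3.log
```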

I see the DMC errors as well in my case, so this is unrelated:

root@OdroidXU4:~# dmesg -l 0,1,2,3
[    0.017511] CPU4: Spectre v2: firmware did not set auxiliary control register IBE bit, system vulnerable
[    0.019096] CPU5: Spectre v2: firmware did not set auxiliary control register IBE bit, system vulnerable
[    0.020855] CPU6: Spectre v2: firmware did not set auxiliary control register IBE bit, system vulnerable
[    0.022368] CPU7: Spectre v2: firmware did not set auxiliary control register IBE bit, system vulnerable
[    2.420287] samsung-pinctrl 13400000.pinctrl: Failed to create device link (0x180) with soc
[    4.470046] exynos5-dmc 10c20000.memory-controller: error -ENXIO: IRQ drex_0 not found
[    4.476516] exynos5-dmc 10c20000.memory-controller: error -ENXIO: IRQ drex_1 not found
[    4.761440] OF: graph: no port node found in /soc/hdmi@14530000

Can you run this command as well, just in case I have overlooked something else?

And do you have another SD card to test with? The SanDisk High Endurance is a good one, actually, but since I do not have any issues with my Samsung EVO Select (Amazon’s “EVO Plus” rebranding), this is the only relevant difference I see.

@MichaIng OP already tried many cards. The issue seems to be unrelated to the SD card, as other OSes like Ubuntu or Android work. Same with the older DietPi image, as long as you don’t update the kernel.

The issue is occurring on the latest DietPi and Armbian.

My thought: it is one of the latest kernel packages.

Okay, not sure then. It is strange that I do not face any issues with the current kernel, and that nobody else among the not-so-small number of DietPi XU4 systems has reported issues like that :thinking:.

It would be unusual for this to affect the SD card, but did you try another PSU? Maybe it is a power/voltage issue after all. No better idea than that.

If this was a power issue, wouldn’t the other kernel versions and other OSes also be affected?

If yes, then could we try @Joulinar 's suggestion from earlier with blacklisting the new kernel?

It is possible that peak power usage is higher with a different kernel.

To make sure that it really is the new kernel, try the following:

  1. Boot with the old Bookworm.7z image, but without Ethernet cable attached
  2. On login, you should see a related error; exit from there to land in the console
  3. apt-mark hold linux-{image,dtb}-current-odroidxu4
  4. Attach Ethernet; if the interface does not come up automatically, run ifup eth0
  5. exec bash

Then firstrun setup should start, but skip upgrading the kernel packages.
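
The hold from step 3 can also be written with the package names spelled out, which avoids relying on the shell's brace expansion. This is a sketch only: hold_xu4_kernel and the APT_MARK override are hypothetical names added here so the command can be dry-run; apt-mark hold, showhold and unhold are the real subcommands.

```shell
#!/bin/sh
# Command used to manage holds. APT_MARK is a hypothetical override that
# allows a dry run, e.g. APT_MARK=echo; by default the real apt-mark runs.
APT_MARK=${APT_MARK:-apt-mark}

# Put the two XU4 kernel packages named in step 3 on hold.
hold_xu4_kernel() {
    "$APT_MARK" hold linux-image-current-odroidxu4 linux-dtb-current-odroidxu4
}

# Usage (as root):
#   hold_xu4_kernel && apt-mark showhold
# To allow kernel upgrades again later:
#   apt-mark unhold linux-image-current-odroidxu4 linux-dtb-current-odroidxu4
```

apt-mark showhold should then list both packages, and a later apt upgrade should report them as kept back.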

Did what you suggested, and I was able to completely go through the setup, even after the restart.

I’ve attached the dmesg and journalctl logs after I finished the setup.

dmesg.4.log (38.8 KB)
journalctl.4.log (75.1 KB)

I think it’s safe to say that I won’t remove the hold on the kernel packages until this gets fixed somehow; this has caused enough frustration.

Thank you @Joulinar and @MichaIng for the assistance.

So this means your system is now able to update and reboot without these file system issues?