NFS hangs after upgrade/reboot

Both Raspis I use and the NAS are in the same local network.

thaonha
Please don’t mix your topic in. You described an issue with a system not booting and with WiFi. The issue uweD reports is about NFS shares that cannot be mounted; the system itself boots without issues.

Sorry, it is the exact problem I’m facing: the system hangs (it boots fine and SSH works, but it does not respond to any command), so I have to do a fresh install.

The other post was a request for a new image, if possible. I never had a problem with it before, so I explained why I need a new one.

Still your issue is different. In this case it is not possible to mount the NFS share at all. Not even initially.

I have taken a tcpdump on port 2049, which is used for NFSv4:

|19:02:00,646805|         V4 Call (Reply In 56          |NFS: V4 Call (Reply In 56) LOOKUP DH: 0x62d40c52/nfs
|         |(1012)   ------------------>  (2049)   |
|19:02:00,647235|         V4 Reply (Call In 55          |NFS: V4 Reply (Call In 55) LOOKUP
|         |(1012)   <------------------  (2049)   |
|19:02:00,647339|         1012 → 2049 [ACK] Se          |TCP: 1012 → 2049 [ACK] Seq=2773 Ack=2613 Win=64000 Len=0 TSval=136386644 TSecr=25737635
|         |(1012)   ------------------>  (2049)   |
|19:02:00,647711|         V4 Call (Reply In 59          |NFS: V4 Call (Reply In 59) LOOKUP DH: 0x62d40c52/nfs
|         |(1012)   ------------------>  (2049)   |
|19:02:00,648116|         V4 Reply (Call In 58          |NFS: V4 Reply (Call In 58) LOOKUP
|         |(1012)   <------------------  (2049)   |
|19:02:00,648199|         1012 → 2049 [ACK] Se          |TCP: 1012 → 2049 [ACK] Seq=2949 Ack=2905 Win=64000 Len=0 TSval=136386645 TSecr=25737635
|         |(1012)   ------------------>  (2049)   |
|19:02:00,649052|         V4 Call (Reply In 62          |NFS: V4 Call (Reply In 62) GETATTR FH: 0x45896613
|         |(1012)   ------------------>  (2049)   |
|19:02:00,649502|         V4 Reply (Call In 61          |NFS: V4 Reply (Call In 61) GETATTR
|         |(1012)   <------------------  (2049)   |
|19:02:00,649624|         1012 → 2049 [ACK] Se          |TCP: 1012 → 2049 [ACK] Seq=3133 Ack=3073 Win=64000 Len=0 TSval=136386646 TSecr=25737635
|         |(1012)   ------------------>  (2049)   |
|19:02:00,650036|         V4 Call (Reply In 65          |NFS: V4 Call (Reply In 65) GETATTR FH: 0x45896613
|         |(1012)   ------------------>  (2049)   |
|19:02:00,650512|         V4 Reply (Call In 64          |NFS: V4 Reply (Call In 64) GETATTR
|         |(1012)   <------------------  (2049)   |
|19:02:00,650632|         1012 → 2049 [ACK] Se          |TCP: 1012 → 2049 [ACK] Seq=3313 Ack=3229 Win=64000 Len=0 TSval=136386647 TSecr=25737635
|         |(1012)   ------------------>  (2049)   |
|19:02:00,651314|         V4 Call (Reply In 68          |NFS: V4 Call (Reply In 68) GETATTR FH: 0x45896613
|         |(1012)   ------------------>  (2049)   |
|19:02:00,651856|         V4 Reply (Call In 67          |NFS: V4 Reply (Call In 67) GETATTR
|         |(1012)   <------------------  (2049)   |
|19:02:00,652027|         1012 → 2049 [ACK] Se          |TCP: 1012 → 2049 [ACK] Seq=3489 Ack=3349 Win=64128 Len=0 TSval=136386648 TSecr=25737635
|         |(1012)   ------------------>  (2049)   |
|19:02:00,652677|         V4 Call (Reply In 71          |NFS: V4 Call (Reply In 71) GETATTR FH: 0x45896613
|         |(1012)   ------------------>  (2049)   |
|19:02:00,653246|         V4 Reply (Call In 70          |NFS: V4 Reply (Call In 70) GETATTR Status: NFS4ERR_DELAY
|         |(1012)   <------------------  (2049)   |
|19:02:00,653451|         1012 → 2049 [ACK] Se          |TCP: 1012 → 2049 [ACK] Seq=3661 Ack=3449 Win=64128 Len=0 TSval=136386650 TSecr=25737635
|         |(1012)   ------------------>  (2049)   |
|19:02:00,761447|         V4 Call (Reply In 74          |NFS: V4 Call (Reply In 74) GETATTR FH: 0x45896613
|         |(1012)   ------------------>  (2049)   |
|19:02:00,762067|         V4 Reply (Call In 73          |NFS: V4 Reply (Call In 73) GETATTR Status: NFS4ERR_DELAY
|         |(1012)   <------------------  (2049)   |

Here one can see that the hanging system is stuck in an endless loop of GETATTR FH: 0x45896613 calls, each answered with GETATTR Status: NFS4ERR_DELAY.

OK, this does not explain why it behaves differently on the working system, where the GETATTR is answered properly. It only explains why one has to interrupt the process.
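To confirm the loop described above without scrolling through the capture, one could count the delayed replies in a text export of it. This is only a minimal sketch, assuming the flow listing has been saved to a file; the function name and the file name capture.txt are made up for illustration.

```shell
# count_delay_replies FILE
# Prints how many lines of a text export contain an NFS4ERR_DELAY reply.
count_delay_replies() {
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c 'NFS4ERR_DELAY' "$1"
}
```

Called as `count_delay_replies capture.txt`. If the number keeps growing between two exports taken a few seconds apart, the client is most likely still retrying the same GETATTR.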

I prepared a new clean system, based on the most recent downloadable Bullseye image.
On this new clean system I opened dietpi-drive_manager and tried "Add network drive".
Unfortunately it hangs at the same point as mentioned earlier. If needed, I can post the terminal output.

I think dietpi-drive_manager needs to be adapted.

On the other hand, there is a command that works on all systems:
mount -t nfs -o nfsvers=4 192.168.16.32:/nfs /mnt/nfs/

If one tries to add the minor version as well, it gets confusing, since the server runs 4.2.
-o nfsvers=4,minorversion=2 and -o nfsvers=4,minorversion=1 hang as well; only -o nfsvers=4,minorversion=0 works. Therefore it’s better to omit the minor version.
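The minor-version comparison above could be repeated on another client with a small helper. This is a rough sketch only: the function names and the 20 second limit are my own choice, the share and mount point are the ones from the mount command in this thread, and it has to run as root. It also assumes a hanging mount reacts to the signal sent by timeout, which may not always be the case.

```shell
# Share and mount point as used in this thread; override via environment if needed.
SERVER_EXPORT=${SERVER_EXPORT:-192.168.16.32:/nfs}
MOUNT_POINT=${MOUNT_POINT:-/mnt/nfs}

# nfs_opts MINOR: build the -o string; an empty MINOR leaves the minor version out.
nfs_opts() {
    if [ -n "$1" ]; then
        printf 'nfsvers=4,minorversion=%s\n' "$1"
    else
        printf 'nfsvers=4\n'
    fi
}

# try_mount MINOR: attempt the mount with a time limit and report the result.
try_mount() {
    try_opts=$(nfs_opts "$1")
    # timeout (coreutils) ends the mount attempt after 20 seconds
    if timeout 20 mount -t nfs -o "$try_opts" "$SERVER_EXPORT" "$MOUNT_POINT"; then
        echo "$try_opts: mounted"
        umount "$MOUNT_POINT"
    else
        echo "$try_opts: failed or timed out"
    fi
}
```

A test run would then be `for v in 0 1 2; do try_mount "$v"; done`, followed by `try_mount ""` for the variant without a minor version.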

In the drive manager we don’t specify the NFS version at all. Looks like adding nfsvers=4 might be mandatory in your case. Did you try to set your NFS server to 4.0?

No, the GUI of the server does not support setting different minor versions.

OK, but adding nfsvers=4 manually to /etc/fstab is working for you now?

Probably the server does not communicate its NFSv4 minor version correctly. Client and server already talk via the v4 protocol, but maybe the client tries v4.2 while the server actually supports v4.0 only, which is then what gets used or tried automatically with nfsvers=4?

I can’t remember any other case where defining the NFS version explicitly was required, as server and client usually negotiate it automatically, so to me this looks like a bug on the NFS server. On the other hand, you say that on a second system it works OOTB.

Can you give more details about hardware and kernel of the two systems?

uname -a

The server supports v4.2, as reported by the nfsstat command on the working system:

root@pi3udnc:~# nfsstat -m
/mnt/nfs from 192.168.16.32:/
 Flags: rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.16.28,local_lock=none,addr=192.168.16.32

On the non-working system, now mounted via this line in /etc/fstab:

192.168.16.32:/nfs /mnt/nfs nfs nfsvers=4,nofail

I get version 4.0 as requested

root@pi2udhifi:~# nfsstat -m
/mnt/nfs from 192.168.16.32:/nfs
 Flags: rw,relatime,vers=4.0,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.16.10,local_lock=none,addr=192.168.16.32
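To compare the negotiated version on several clients without reading the whole Flags line, the vers= value can be pulled out of the nfsstat -m output. A small sketch, assuming the output has the two-line layout shown above; the function name is made up.

```shell
# nfs_mount_versions: reads `nfsstat -m` output on stdin and prints
# "<mount point> <negotiated version>" for every NFS mount listed.
nfs_mount_versions() {
    awk '
        # Header line, e.g. "/mnt/nfs from 192.168.16.32:/nfs"
        $2 == "from" { mp = $1 }
        # Flags line: pick the value of the vers= option
        /Flags:/ && match($0, /[ ,]vers=[0-9.]+/) {
            print mp, substr($0, RSTART + 6, RLENGTH - 6)
        }
    '
}
```

Used as `nfsstat -m | nfs_mount_versions`. For the two listings above this gives `/mnt/nfs 4.2` on pi3udnc and `/mnt/nfs 4.0` on pi2udhifi.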

The systems are a small QNAP NAS as NFS server and a Raspberry Pi 2 and a Raspberry Pi 3 as NFS clients.

NFS-server:

QNAS: [~] # uname -a
Linux QNAS-T20 4.2.8 #1 SMP Thu Mar 24 00:53:39 CST 2022 aarch64 GNU/Linux
[~] #



Server name	QNAS-T20
Model name	TS-230
CPU	Realtek RTD1296 Quad-Core ARM Cortex-A53 Processor @ 1.4GHz (4 cores)
Serial number	Q20BB20161
Total memory	2 GB
Firmware version	5.0.0.1986 Build 20220324
System uptime	5 days 0 hour(s) 24 minute(s)
Time zone	(GMT+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
Filename encoding	English

NFS-client
Working system:

root@pi3udnc:~# uname -a
Linux pi3udnc 5.10.103-v7+ #1529 SMP Tue Mar 8 12:21:37 GMT 2022 armv7l GNU/Linux
root@pi3udnc:~#
root@pi3udnc:~# dmesg
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 5.10.103-v7+ (dom@buildbot) (arm-linux-gnueabihf-gcc-8 (Ubuntu/Linaro 8.4.0-3ubuntu1) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1529 SMP Tue Mar 8 12:21:37 GMT 2022
[    0.000000] CPU: ARMv7 Processor [410fd034] revision 4 (ARMv7), cr=10c5383d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] OF: fdt: Machine model: Raspberry Pi 3 Model B Rev 1.2
[

Not working system:

root@pi2udhifi:~# uname -a
Linux pi2udhifi 5.15.32-v7+ #1538 SMP Thu Mar 31 19:38:48 BST 2022 armv7l GNU/Linux
root@pi2udhifi:~#
root@pi2udhifi:~# dmesg
[    0.000000] Booting Linux on physical CPU 0xf00
[    0.000000] Linux version 5.15.32-v7+ (dom@buildbot) (arm-linux-gnueabihf-gcc-8 (Ubuntu/Linaro 8.4.0-3ubuntu1) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1538 SMP Thu Mar 31 19:38:48 BST 2022
[    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
[    0.000000] CPU: div instructions available: patching division code
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] OF: fdt: Machine model: Raspberry Pi 2 Model B Rev 1.1
[

The kernel version differs between the two systems:

Linux pi3udnc 5.10.103-v7+
Linux pi2udhifi 5.15.32-v7+
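To compare clients quickly, the release string can be reduced to its kernel series. A tiny sketch with a made-up function name; it only cuts the string and does not check anything else.

```shell
# kernel_series RELEASE: reduce a release string, such as the output of
# `uname -r`, to its major.minor series.
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}
```

Run as `kernel_series "$(uname -r)"`. For the two release strings above it yields 5.10 and 5.15.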

You could try rolling back to 5.10.103 to check whether this makes a difference.

How do I roll back to this kernel?
Another question in this context:
Both systems are updated to DietPi v8.3.1.
Shouldn’t they have the same kernel version?

There is no relation between the DietPi version and the kernel version, as we don’t do any kernel development. We always use the kernel provided by the underlying base image. In your case that is Raspberry Pi OS with the respective RPi OS kernel. This means that to roll back a kernel, you need to follow the official RPi OS instructions: https://github.com/Hexxeh/rpi-update#options

The following should downgrade to version 5.10.95.
!! WARNING !! You do this at your own risk.

apt install rpi-update
sudo rpi-update 39821d33e777cde9ba1a3cc8a73cfdd62fbbd2de
reboot

Now that we are talking about it: it looks like your system pi3udnc is a Debian Buster one, while pi2udhifi seems to be Bullseye. You can check it as follows:

echo $G_DISTRO_NAME $G_RASPBIAN

This would be a major difference between both systems.

I did the rollback on the newly created test system. The NFS mount works fine on the downgraded kernel. This explains why the NFS mount did not work anymore after updating to DietPi v8.3.1: the update pulled in a kernel update, which caused the NFS problem. In fact I don’t know which kernel was active before; I did not pay attention. The update was from DietPi 8.3.0 to 8.3.1. Perhaps you can derive it from your listings.

The not working system was created using a first installation based on Bullseye and was regularly updated.

root@pi2udhifi:~# echo $G_DISTRO_NAME $G_RASPBIAN
bullseye 0
root@pi2udhifi:~#

The well-working system was created using a Buster-based installation and manually upgraded to Bullseye using the upgrade description from DietPi: https://dietpi.com/blog/?p=811

root@pi3udnc:~# echo $G_DISTRO_NAME $G_RASPBIAN
bullseye 1
root@pi3udnc:~#

Basically there is no relation between the RPi OS kernel and the DietPi version. The only thing dietpi-update does is execute an apt upgrade at the beginning. This triggers an update of all APT-managed packages, including the kernel provided by the RPi devs. Usually you should see the kernel update available on the working system as well. Just run apt update. Similar information should be shown on the DietPi banner if APT package updates are available.
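To see whether a kernel update is among the pending APT upgrades, the upgradable list can be filtered. This is a sketch under the assumption that the kernel and firmware come from the packages raspberrypi-kernel and raspberrypi-bootloader, as is usual on RPi OS; on other base images the names differ. The function name is made up.

```shell
# pending_kernel_updates: filters `apt list --upgradable` output on stdin and
# keeps only the lines for the kernel and bootloader packages.
# Assumption: the packages are named raspberrypi-kernel / raspberrypi-bootloader.
pending_kernel_updates() {
    grep -E '^raspberrypi-(kernel|bootloader)/'
}
```

Used as `apt update && apt list --upgradable 2>/dev/null | pending_kernel_updates`. No output means no kernel update is pending for those packages.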

Issues with the kernel need to be reported to the RPi devs directly, as we don’t do kernel development: https://github.com/raspberrypi/firmware/issues

Linking the likely related GitHub issue: https://github.com/MichaIng/DietPi/issues/5358

Looks like any Linux 5.15 client (on GitHub it was an RK3399 system with an Armbian 5.15 kernel) is unable to mount QNAP NFS shares. Before we spam kernel developers, based on this finding I’d actually suggest contacting QNAP support, in the hope that they are able to replicate it with other Linux 5.15 NFS clients.