Kernal panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,2)

Dweopi918 · 31 July 2023 19:11

Required Information

I know some information is missing, I am unsure how I can get it since I cannot boot, will provide gladly if there is a way.

SBC model | RPI 4b (8GB)
Power supply used | 5V 3A Samsung
SSD ~~card~~ used | Samsung 870 EVO SATA 2.5" SSD 1TB

Expected behaviour

Boot up normally

Actual behaviour

can’t boot:

Extra details

I can’t pinpoint a specific point when it stopped being able to boot, but I don’t think it was right after an update or else I would have noticed.

For the record I tried connecting the SSD to my laptop (windows) and I could see a small partition with some dietpi files but not the main partition since it’s ext4.

What can I do? Preferably without flashing the SSD since I have some things that are not backed up. Also, if the small partition showed up on my laptop does it mean the SSD is fine?

Joulinar · 1 August 2023 09:47

Windows is not able to read ext4 at all. Therefore, it is not displayed. There are ways to make ext4 partitions available, but they require Windows 11 and WSL2

It looks like you have data corruption on your SSD. How is the SSD supplied with power? With its own power supply or just via the RPI4’s USB port? The RPi4 is not designed to power HDD or SSD via the USB port. This is a limitation of the SBC and we strongly recommend using an external power supply or powered USB hub.

What you could try is to flash a new SD card with a new image and boot from it. Then connect your SSD (do not mount the SSD) and run a file system check.

Dweopi918 · 3 August 2023 10:30

Firstly thanks for the help!

The SSD was supplied with power through the PI itself which I didn’t know was an issue. I will order an enclosure with its own power supply for the future.

In the meantime I was unable to mount the main partition using WSL2 (I am on an appropriate windows 11 version), ran into the similar errors.

First I ran:

wsl --mount \\.\PHYSICALDRIVE1 --partition 2 --type ext4

Which resulted in:

The disk was attached but failed to mount: Input/output error.
For more details, run 'dmesg' inside WSL2.
To detach the disk, run 'wsl.exe --unmount \\.\PHYSICALDRIVE1'.

dmesg inside the container showed:

[   11.469312] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[   11.469809] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[   11.470157] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -22
[   11.470567] misc dxg: dxgk: dxgkio_query_adapter_info: Ioctl failed: -2
[   49.223123] hv_balloon: Max. dynamic memory size: 8028 MB
[  154.449449] WSL (225): Mapped partition 2 from device /sys/block/sdc to /dev/sdc2
[  169.608516] JBD2: recovery failed
[  169.608917] EXT4-fs (sdc2): error loading journal
[  169.611503] WSL (225) ERROR: UtilMount:1626: mount(/dev/sdc2, /mnt/wsl/PHYSICALDRIVE1p2, ext4
[  169.611506] , 0x0, ) failed 5

Is there anything I could do at this point to try and recover some files (or even the entire disk) or should I start fresh and nuke the data?

Joulinar · 3 August 2023 10:46

one think you could try, to flash a new image to SD card, boot the system of SD, connect your SSD (not mount) and perform some file system checks.

drummer_nrg · 6 August 2023 03:16

I encountered roughly the same problem: moving DIetPi on my RPi4b from SDCard to external USB enclosure. I downloaded the latest RPi 4 image, wrote the image to the USB device (I am on LinuxMint, and I used “Make Bootable USB stick” option from the file explorer: this turned out to be a mistake). Unmounted the USB device, plugged into the RPi, and rebooted without the SDCard.

All went fine; DietPi boots and installs. However, upon first reboot I saw nearly the same BIOS screen output as yours: can’t mount the second partition, do not see an ext4 filesystem.

So the issue is that you need to write the entire disk image properly, I saw two proper partitions and then unused space when I used “Open with…” and then selected “Disk Image Writer” which uses Disks. This worked properly upon install and first reboot!

Joulinar · 6 August 2023 13:56

How is your USB hard drive supplied with power? Does it have its own power supply or is it only powered via the USB port of the RPi4? Because the USB ports of the RPi4 are not technically designed to power things larger than a pen stick. Especially at peak times or high loads, there can be problems with the power supply and this can then lead to data corruption or data loss. We therefore recommend that you always use a separate power supply unit for the external disks or a powered USB hub.

drummer_nrg · 7 August 2023 12:50

About 24 hours later, I agree with you now!; it is the best recommendation!

A few reboots of the RPi with only the USB drive attached (only powered by RPi through USB) and it has been fine.

However, I proceeded to boot off the SDcard with the USB drive attached, rsync’d most of /mnt/dietpi_userdata from SDcard to the USB mount, shutdown, and removed the SDcard. Upon power up, I get a different error, basically bad blocks on the ext4 partition, but the boot process proceeds during a filesystem check and repair. After a reboot or two, I’ve got a working system again.

So I agree, at this point, this is likely a power issue when I have combined storage media!

My hypothesis is that my USB drive (an earlier release of the OWC Envoy Pro Elektron - USB-C Portable Storage Solution with M.2 media) plus my SDcard is too large a power draw, but without the SDcard, it may be fine.

I’ll try it out for a few more days, I don’t mind the experiment because I can always fall back to my SDcard! I may try to connect a Kill-A-Watt later to see if I can measure the total system power draw between storage options. It would be nice to know if there’s any way to determine power draw from Debian.

Joulinar · 7 August 2023 15:15

It is not about the total power consumption of the SBC. It’s more about the maximum power that the USB ports can provide. If I’m not mistaken, it’s even divided between them.

Dweopi918 · 16 August 2023 12:06

So I booted dietpi from an sd card and connected my SSD (this time with an enclosure that supplies power externally).

What can I try now?

So far I went to dietpi-drive_manager and ran a “check & repair”. It’s currently running. Seems like it doesn’t do anything. A terminal window opened at the bottom after pressing “ok” for the check, but it doesn’t further display anything, still shows the “ok window”. Not sure if something is stuck or if this is intended behavior and it’s just a long process.

For anyone wondering it does just appear stuck but is actually running, let it run for a bit.

The result is as follows:

Pressing “ok” on said screen doesn’t do anything and it appears to be stuck again after pressing ok.

Anything else I could do besides the drive_manager check?

Thanks again for the help and patience

Joulinar · 16 August 2023 17:09

probably better to run the file system checks manually. You would need to correct the issues on your SSD.

fsck -a /dev/sdxx

/dev/sdxx would need to be replaced by correct partition name. To check the name of your drive

lsblk -o name,fstype,label,size,ro,type,mountpoint,partuuid,uuid

Dweopi918 · 18 August 2023 12:40

Couldn’t repair with fsck so I decided to just reflash and start fresh. I tried copying a large directory on the new install of dietpi but twice now that it fails and the entire system becomes read only until I restart the PI.

Quick google lead me to believe something is wrong with the drive so I ran a smartctl check (smartctl -t long /dev/sda) but it seems to have passed just fine?

smartctl -a /dev/sda

Results:

=== START OF INFORMATION SECTION ===
Model Family:     WD Blue / Red / Green SSDs
Device Model:     WDC  WDS100T2B0A-00SM50
Serial Number:    22013B802470
LU WWN Device Id: 5 001b44 8b78b8363
Firmware Version: 415020WD
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Aug 18 13:28:35 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   ---    Old_age   Always       -       4772
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       100
165 Block_Erase_Count       0x0032   100   100   ---    Old_age   Always       -       571435780815
166 Minimum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       7
167 Max_Bad_Blocks_per_Die  0x0032   100   100   ---    Old_age   Always       -       102
168 Maximum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       63
169 Total_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       633
170 Grown_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       0
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Average_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       28
174 Unexpected_Power_Loss   0x0032   100   100   ---    Old_age   Always       -       72
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       947
194 Temperature_Celsius     0x0022   061   060   ---    Old_age   Always       -       39 (Min/Max 26/60)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
230 Media_Wearout_Indicator 0x0032   005   005   ---    Old_age   Always       -       0x053502500535
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 NAND_GB_Written_TLC     0x0032   100   100   ---    Old_age   Always       -       29666
234 NAND_GB_Written_SLC     0x0032   100   100   ---    Old_age   Always       -       41126
241 Host_Writes_GiB         0x0030   253   253   ---    Old_age   Offline      -       12020
242 Host_Reads_GiB          0x0030   253   253   ---    Old_age   Offline      -       50492
244 Temp_Throttle_Status    0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4772         -

Selective Self-tests/Logging not supported

I am a bit confused then. Is the drive OK or does it have a problem?
If it’s ok why does the system keep locking up as readonly when doing something as simple as copying a folder.

PS: fsck -a also shows no errors:
/dev/sda2: clean, 367760/61046784 files, 10085954/244157622 blocks

Any insights will be welcomed! Thanks!

Jappe · 18 August 2023 13:14

Before we try other things, the closest idea would be to try another drive and/or cable/enclosure.
You flashed new and still got problems, despite smartctl and fsck find nothing, which is very odd.

Dweopi918 · 18 August 2023 13:44

Yeah., I suppose it’s possible the drive was initially corrupted due to me supplying it power through the pi (even though it was running like that for a year), and that now the issues I am experiencing are due to the enclosure/cable being defective somehow.

The enclosure/cable are brand new by the way ( Amazon.com: UGREEN External Hard Drive Enclosure for 3.5 2.5 Inch SATA Drives).

Assuming I can get my hand on another SSD, which I am not sure I can at the moment, what’s a good way to check if the error is still occurring (which would suggest the enclosure is bad), since smartctl and fsck report fine. Just copy the same folder again?

Do you think I should try to test the SSD with the old enclosure that requires supplying power through the PI, to figure out if the new enclosure is defective? Or is it a bad idea since undersupplying it with power could cause more issues?

Thanks

Jappe · 20 August 2023 10:50

No I wouldn’t try the unpowered enclosure.
You don’t need to use a SSD, just test the adapter with another hard drive to figure out if your SSD has some damage.

Joulinar · 20 August 2023 13:52

maybe @MichaIng has some ideas on how to troubleshoot further.

MichaIng · 20 August 2023 19:42

Please run fsck with forced check, otherwise it skips an actual check as long as no dirty bit is set:

fsck -f /dev/sda2

And just to rule some things out, try to replace the USB cable, switch to another USB port and (as suggested by Jappe) replace the SSD with any other SATA drive, respectively.

Dweopi918 · 20 August 2023 21:27

I did end up reflashing and trying it (before your comment in my defense hah). I don’t want to speak too soon but ~~so far with this enclosure it hasn’t failed yet~~ (whereas with the other enclosure I never get this far over 3 different installations and many tries with each installation).

And as I was writing this comment it failed again, the system turned read only.
By the way restarting just now the system turned on but was very slow/slugish.

I turned off the PI and booted from the SD card so I could run fsck against the SSD.

fsck -f -n /dev/sda2

Results:

fsck from util-linux 2.38.1
e2fsck 1.47.0 (5-Feb-2023)
Pass 1: Checking inodes, blocks, and sizes
Inode 1078470 extent tree (at level 1) could be narrower.  Optimize? no

Inode 1078510 extent tree (at level 2) could be narrower.  Optimize? no

Inode 1078817 extent tree (at level 2) could be narrower.  Optimize? no

Inode 1079093 extent tree (at level 2) could be narrower.  Optimize? no

Inode 1079095 extent tree (at level 2) could be narrower.  Optimize? no

Inode 1080934 extent tree (at level 2) could be narrower.  Optimize? no

Inode 1320756 extent tree (at level 2) could be narrower.  Optimize? no

Inode 2237441 extent tree (at level 2) could be narrower.  Optimize? no

Inode 2237523 extent tree (at level 2) could be narrower.  Optimize? no

Inode 2237527 extent tree (at level 2) could be narrower.  Optimize? no

Inode 2237536 extent tree (at level 2) could be narrower.  Optimize? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda2: 657151/61046784 files (0.5% non-contiguous), 18433284/244157622 blocks

I am no expert on parsing said results but it seems like nothing is critical on there? This just keeps getting confusing. Unfortunately, I don’t have another drive. I could buy another SSD (which is what I am trying to avoid by attempting to figure out if the SSD is the issue).

I am going to switch to another USB port and try again. But it’s a pain in the ass to troubleshoot since I don’t know what triggers the system to become read only. Like just now when I though the enclosure was the issue but turns out it still failed eventually. Any suggestion for a stress test that could trigger that?

I even ran a disk benchmark from drive manager when first starting with the underpowered enclosure and it seemed to pass just fine.

For the record booting again from the SSD (still the same USB port) and it seems it’s no longer slow/slugish, so perhaps it’s just the first boot after crashing to read only (I also observed this slowness after the system crashing with the properly power supplied enclosure)

MichaIng · 20 August 2023 21:54

Jep indeed, the filesystem seems to be fine.

It is an ext4 filesystem option to automatically remount read-only in case of I/O errors, to prevent actual/further filesystem corruption as good as possible. But I am also not sure why this happens in your case even if doing a drive benchmark does not cause an issue.

One thing, out of interest, what does this show?

hdparm -B 255 /dev/sda2

Usually SSDs do not support APM (Advanced Power Management) anyway, but in case this does does, the above command disables it. It should not cause issues anyway, but let’s just rule out everything.

Btw, does the kernel show any other error?

dmesg -l 0,1,2,3

And is/was the RPi throttled as of too high temperature or low voltage?

vcgencmd get_throttled

Dweopi918 · 20 August 2023 22:02

Just to be sure, should I try those things while booted from the SD or SSD?

MichaIng · 20 August 2023 22:27

When booting from SSD. The first at best shortly after boot, so see whether unexpectedly things remain stable, the other two diagnose commands are most valuable in case after the drive mounted read-only.