SSD goes missing... possible connection with Samba?

Evening everybody,

first of all I would like to thank the team for coming up with DietPi. It has been a very nice experience with my Raspberry Pi 4 so far due to DietPi :slight_smile:

Now to the issue at hand: For whatever reason, a certain kind of load seems to make my SSD go AWOL. I’m using an external case and had no problems whatsoever at first, but lately I’ve been experiencing some weird issues. Writing to the drive from the Raspberry Pi itself works wonderfully, yet the drive has disappeared on me a couple of times now when it is accessed remotely. This happens mainly when a Windows PC accesses the SSD as a Samba share, but I have seen the same thing during an FTP download: the transfer rate drops and I can’t access the drive anymore, not even on the Raspberry Pi.

Something I’ve noticed when this happens is that the DietPi Drive Manager is not showing me the drive’s xfs filesystem, but this:

I have never heard of a filesystem called “Net”. And it also looks like all the other options for the drive are gone. Does the Raspberry Pi think that my external drive has suddenly become a mounted network share from another computer?

Trying to run “ls” in the mount point’s folder gives me an I/O error message.

After restarting the Raspberry Pi, the drive manager shows me this:

As you can see, the other options are back again and I can actually mount the drive without a single issue.

I’ve tried to find a solution to this problem, but to be honest, I can’t even tell what’s going on. It has been quite some time since I’ve dealt with Linux, and while I assume that something causes the connection to the drive to be disrupted, I wouldn’t even know where to look for the source of this whole thing.

I’d appreciate it if you guys could lend me a hand here.

Cheers

EDIT: I was just able to recreate the issue: While running an ffmpeg command that turns f4v files into mp4 by copying all streams, the drive crashed on me again. After restarting, mounting the drive, stopping both nmbd and smbd and running ffmpeg again, I had no issue and all files were successfully created. To me this points straight at sharing the drive via Samba. Is there any way I can log what’s going on when the crash happens?

Hi,

Many thanks for your report. Once your drive disappears, you could have a look at the kernel error messages. Maybe you can find some error messages regarding USB devices.

dmesg -l err,crit,alert,emerg

You can also check the status of the USB device:

lsblk -o name,label,size,ro,type,mountpoint,partuuid,uuid
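
Since `dmesg` is only consulted after the fact here, it may help to record the kernel messages while the crash is happening. The following is a minimal sketch, not an official DietPi tool: the function name `filter_storage_errors` and the log path in the comment are made up for illustration, and the pattern list is just a guess at what is relevant for a USB drive with an XFS filesystem.

```shell
#!/bin/sh
# Sketch: keep only kernel log lines that look related to USB or storage trouble.
# The function name and the pattern list are illustrative assumptions.

filter_storage_errors() {
    # --line-buffered makes grep write each match immediately, so the last
    # lines before a crash still reach the log file.
    grep --line-buffered -E 'xhci_hcd|I/O error|XFS \(|uas|usb-storage'
}

# Intended use (run as root, log to the SD card rather than the SSD that
# is about to disappear; the path is a placeholder):
#   dmesg --follow --ctime | filter_storage_errors >> /root/usb-crash.log
```

Writing the log to the SD card matters because the drive under test is the one that goes away.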

Does your external case have its own power supply?

We highly recommend using an external power supply for your SSD. We have seen a couple of reports where external HDDs/SSDs had quite some issues when powered only through the USB port. In most cases, the issues are gone once the drive is connected to its own power supply.

Thanks, I’ll look into that.


I just managed to force another crash.

root@DietPi:~# dmesg -l err,crit,alert,emerg
[ 5335.445477] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[ 5335.447503] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 5335.525443] print_req_error: I/O error, dev sda, sector 367394728
[ 5335.525497] print_req_error: I/O error, dev sda, sector 367390632
[ 5335.525537] print_req_error: I/O error, dev sda, sector 367387560
[ 5335.525576] print_req_error: I/O error, dev sda, sector 367388584
[ 5335.525615] print_req_error: I/O error, dev sda, sector 367392680
[ 5335.525651] print_req_error: I/O error, dev sda, sector 361440792
[ 5335.525660] print_req_error: I/O error, dev sda, sector 361439768
[ 5335.525673] print_req_error: I/O error, dev sda, sector 367396776
[ 5335.525686] print_req_error: I/O error, dev sda, sector 361455128
[ 5335.525701] print_req_error: I/O error, dev sda, sector 361456152
[ 5335.533666] XFS (sda1): writeback error on sector 361437720
[ 5335.537049] XFS (sda1): writeback error on sector 361486872
[ 5335.539202] XFS (sda1): writeback error on sector 361462296
[ 5335.539232] XFS (sda1): metadata I/O error in "xlog_iodone" at daddr 0xee7b080 len 64 error 5
[ 5335.539302] XFS (sda1): Log I/O Error Detected.  Shutting down filesystem
[ 5335.539313] XFS (sda1): Please umount the filesystem and rectify the problem(s)
[ 5335.583470] XFS (sda1): writeback error on sector 330838608



root@DietPi:~# lsblk -o name,label,size,ro,type,mountpoint,partuuid,uuid
NAME        LABEL    SIZE RO TYPE MOUNTPOINT PARTUUID                             UUID
mmcblk0            119,1G  0 disk
├─mmcblk0p1 boot     256M  0 part /boot      2fed7fee-01                          592B-C92C
└─mmcblk0p2 rootfs 118,9G  0 part /          2fed7fee-02                          706944a6-7d0f-4a45-9f8c-7fb07375e9f7

EDIT: Googling the error messages leads me to believe that I’m not the only one with this sort of problem:

https://www.reddit.com/r/raspberry_pi/comments/f0xkyw/usb_complete_failure_xhci_host_controller_not/
https://www.raspberrypi.org/forums/viewtopic.php?t=262649
https://www.raspberrypi.org/forums/viewtopic.php?t=263332
https://archlinuxarm.org/forum/viewtopic.php?f=65&t=14172

Well, you have I/O errors on this drive. Usually this points to issues with the drive itself, a bad cable, or some issue on the RPi. If possible, try to run a self-test on the SSD, use another USB port, change the cable, and check whether you can power the SSD from its own supply.

Well I’m no expert by any means, but I reckon the input/output errors are due to the connection to the drive being lost, no?
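
The timestamps in the dmesg output above fit this reading: the xhci_hcd messages (5335.445) come before the first I/O error on sda (5335.525), and the XFS shutdown follows after that. A small sketch for pulling the first line of each kind out of a saved kernel log; the function name `first_events` and the three labels are made up here, not part of any tool.

```shell
#!/bin/sh
# Sketch: print the first kernel log line of each category, in the order
# they appear, so the sequence of events is easy to see.
# Function name and labels are illustrative assumptions.

first_events() {
    awk '
        /xhci_hcd/        && !seen["usb"]++ { print "usb: " $0 }
        /I\/O error, dev/ && !seen["io"]++  { print "io: "  $0 }
        /XFS \(/          && !seen["xfs"]++ { print "xfs: " $0 }
    '
}

# Intended use on a live system:
#   dmesg -l err,crit,alert,emerg | first_events
```

If the `usb:` line shows up first, the controller or adapter dropped out before the filesystem complained, which suggests the I/O errors are a consequence rather than the cause.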

Regarding the drive: SMART tells me there are no issues (no critical warnings, no media and data integrity errors)

root@DietPi:~# smartctl -a /dev/sda -d sntjmicron
smartctl 7.1 2019-12-30 r5022 [armv7l-linux-4.19.118-v7l+] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZVLW256HEHP-000L7
Serial Number:                      S35ENA1K272709
Firmware Version:                   5L7QCXB7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 256.060.514.304 [256 GB]
Unallocated NVM Capacity:           0
Controller ID:                      2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          256.060.514.304 [256 GB]
Namespace 1 Utilization:            244.637.016.064 [244 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 b28102e969
Local Time is:                      Thu Jun 11 10:22:42 2020 CEST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Warning  Comp. Temp. Threshold:     69 Celsius
Critical Comp. Temp. Threshold:     72 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.60W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     5.10W       -        -    2  2  2  2        0       0
 3 -   0.0400W       -        -    3  3  3  3      210    1500
 4 -   0.0050W       -        -    4  4  4  4     2200    6000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        32 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    5%
Data Units Read:                    38.939.450 [19,9 TB]
Data Units Written:                 29.900.926 [15,3 TB]
Host Read Commands:                 307.986.706
Host Write Commands:                229.812.741
Controller Busy Time:               1.028
Power Cycles:                       3.049
Power On Hours:                     886
Unsafe Shutdowns:                   141
Media and Data Integrity Errors:    0
Error Information Log Entries:      1.611
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               32 Celsius
Temperature Sensor 2:               36 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0       1611     0  0x010e  0x4202  0x028            0     0     -
  1       1610     0  0x000d  0x4202  0x028            0     0     -
  2       1609     0  0x0f0c  0x4202  0x028            0     0     -
  3       1608     0  0x0e0b  0x4202  0x028            0     0     -
  4       1607     0  0x0d0a  0x4202  0x028            0     0     -
  5       1606     0  0x0c09  0x4202  0x028            0     0     -
  6       1605     0  0x0b08  0x4202  0x028            0     0     -
  7       1604     0  0x0a07  0x4202  0x028            0     0     -
  8       1603     0  0x0906  0x4202  0x028            0     0     -
  9       1602     0  0x0805  0x4202  0x028            0     0     -
 10       1601     0  0x0704  0x4202  0x028            0     0     -
 11       1600     0  0x0603  0x4202  0x028            0     0     -
 12       1599     0  0x0502  0x4202  0x028            0     0     -
 13       1598     0  0x0401  0x4202  0x028            0     0     -
 14       1597     0  0x030f  0x4202  0x028            0     0     -
 15       1596     0  0x020e  0x4202  0x028            0     0     -
... (48 entries not shown)

and I somewhat doubt that the cable is faulty simply because everything is running smoothly except when I remote access the drive and put some stress on it. Of course I can’t be sure, but I think it’s reasonable to assume that a faulty drive/cable would impact everything and not just during certain scenarios.

I’ll try to get a more in-depth log for a crash, maybe that will tell us something.

EDIT: By “everything else is running smoothly” I mean that I have not encountered a single issue running downloads with JDownloader or SABNZBD for example.

I can only share the experience from our board. Most of the issues with external drives are connected to insufficient power. There might be issues in the kernel, but that is out of our scope, as DietPi uses Raspberry Pi OS for RPi devices. This includes the kernel as well.

BTW, there is already quite a long post on HDD issues. It might not fit 100%, but it is probably worth reading:
https://dietpi.com/forum/t/external-hdd-issues/3829/1
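
One way to get at least a hint about the power theory on the Pi itself is the firmware's throttling flags, which `vcgencmd get_throttled` prints as a hex value. The sketch below decodes only the two under-voltage bits (bit 0: under-voltage right now, bit 16: under-voltage has occurred since boot). The function name `decode_undervoltage` is made up, and note the assumption behind it: these flags describe the Pi's own supply, they are not a direct measurement of what the SSD draws.

```shell
#!/bin/sh
# Sketch: decode the under-voltage bits of `vcgencmd get_throttled`.
# Function name is an illustrative assumption.

decode_undervoltage() {
    value=${1#throttled=}   # accept "throttled=0x50005" or plain "0x50005"
    if [ $(( ${value} & 0x1 )) -ne 0 ]; then
        echo "under-voltage detected now"
    fi
    if [ $(( ${value} & 0x10000 )) -ne 0 ]; then
        echo "under-voltage has occurred since boot"
    fi
    if [ $(( ${value} & 0x10001 )) -eq 0 ]; then
        echo "no under-voltage reported"
    fi
}

# Intended use on the Pi, ideally right after a heavy transfer:
#   decode_undervoltage "$(vcgencmd get_throttled)"
```

A clean result does not rule out a marginal adapter or cable, but a set bit 16 after a crash would be a strong pointer towards power.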

I had loads of issues with an SSD and DietPi. I moved back to Raspbian and have had no real issues. Personally I think it must be kernel related. A small SSD draws an insignificant amount of power, so power, especially with an official supply, is not the issue. If it was an HDD I’d agree with you.

There is quite a long thread on it.

Also note this is a Pi4; I ran a Pi2B+ with a PiDrive and DietPi Stretch for many years with no issues.

DietPi does not ship its own kernel. It uses the default kernel provided by the base image. In the case of the Raspberry Pi, that is the kernel provided with Raspbian.

All I can say is that with the same HW, a DietPi install regularly disconnects from the SSD, while Raspbian doesn’t.

Just to provide an update on the whole situation:

Per this comment and based on this thread at the Raspberry Pi Forums, I switched from UAS to usb-storage and have not encountered a single issue since!
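
For anyone landing here later: on a Raspberry Pi, the usual way to make a specific USB adapter fall back from UAS to usb-storage is a `usb-storage.quirks=<vendor>:<product>:u` entry on the kernel command line in /boot/cmdline.txt. The sketch below only builds the new line so it can be reviewed before the file is edited. The function name `add_uas_quirk` and the ID `aaaa:bbbb` are placeholders, not the values from this thread; the real ID comes from `lsusb`.

```shell
#!/bin/sh
# Sketch: prepend a usb-storage quirk that disables UAS for one adapter.
# aaaa:bbbb is a placeholder ID; look up the real vendor:product with lsusb.

add_uas_quirk() {
    cmdline=$1
    usb_id=$2
    case "$cmdline" in
        # Leave the line alone if a quirks entry already exists.
        *usb-storage.quirks=*) printf '%s\n' "$cmdline" ;;
        *) printf 'usb-storage.quirks=%s:u %s\n' "$usb_id" "$cmdline" ;;
    esac
}

# Preview only; nothing is written to /boot/cmdline.txt here.
add_uas_quirk "console=tty1 root=/dev/mmcblk0p2 rootwait" "aaaa:bbbb"
```

/boot/cmdline.txt has to stay a single line, so the result replaces the existing line rather than being appended below it. After a reboot, `lsusb -t` should list the adapter with `Driver=usb-storage` instead of `Driver=uas`.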

thx for sharing your solution, much appreciated