Help! Cannot Understand Logs

Required Information

  • DietPi version:
G_DIETPI_VERSION_CORE=8
G_DIETPI_VERSION_SUB=7
G_DIETPI_VERSION_RC=1
G_GITBRANCH='master'
G_GITOWNER='MichaIng'
G_LIVE_PATCH_STATUS[0]='not applicable'
  • Distro version: bullseye 0
  • Kernel version:
Linux Grimlock 5.15.32-v8+ #1538 SMP PREEMPT Thu Mar 31 19:40:39 BST 2022 aarch64 GNU/Linux
  • SBC model: RPi 3 Model B+ (aarch64)
  • Power supply used: Official 5.1V 2.5A Raspberry Pi Micro USB Power Supply
  • SD card used: SanDisk ultra 32GB

Hello!
I’ve been having weird issues for the past month wherein my Pi locks up & freezes when put under high load. (Accessing Nextcloud files, or browsing my Photoprism gallery.)
It’s extremely frustrating, & no amount of waiting results in it coming back online, I just have to pull the power.

I highly doubt it’s the power supply, I do have a external 2.5" HDD connected (powered through the Pi), but I’m regularly running vcgencmd get_throttled just waiting to catch some sort of error, but… nothing.
The SDcard is unlikely as well. It’s pretty new (~6ish months?), no problems as far as I can tell.

I’ve tried overclocking, underclocking, overvolting, etc. to no avail.
While I didn’t run extensive tests (not exactly needed at the Pi locks up imminently under load), a fresh install of dietpi doesn’t seem to run into this issue.

I could never find anything in journal, syslog afterwards, so I ran journalctl -xf & then stressed the Pi.
I got the following:

Aug 16 09:51:33 Grimlock kernel: kworker/3:1H: page allocation failure: order:0, mode:0x800(GFP_NOWAIT), nodemask=(null),cpuset=/,mems_allowed=0
Aug 16 09:51:33 Grimlock kernel: CPU: 3 PID: 90 Comm: kworker/3:1H Tainted: G         C        5.15.32-v8+ #1538
Aug 16 09:51:33 Grimlock kernel: Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
Aug 16 09:51:33 Grimlock kernel: Workqueue: kblockd blk_mq_run_work_fn
Aug 16 09:51:33 Grimlock kernel: Call trace:
Aug 16 09:51:33 Grimlock kernel:  dump_backtrace+0x0/0x1b8
Aug 16 09:51:33 Grimlock kernel:  show_stack+0x20/0x30
Aug 16 09:51:33 Grimlock kernel:  dump_stack_lvl+0x8c/0xb8
Aug 16 09:51:33 Grimlock kernel:  dump_stack+0x18/0x34
Aug 16 09:51:33 Grimlock kernel:  warn_alloc+0x11c/0x1a0
Aug 16 09:51:33 Grimlock kernel:  __alloc_pages_slowpath.constprop.150+0xb78/0xba0
Aug 16 09:51:33 Grimlock kernel:  __alloc_pages+0x2b0/0x330
Aug 16 09:51:33 Grimlock kernel:  new_slab+0x34c/0x358
Aug 16 09:51:34 Grimlock kernel:  ___slab_alloc.part.102.constprop.112+0x434/0x7d8
Aug 16 09:51:34 Grimlock kernel:  __slab_alloc.isra.103.constprop.111+0x58/0x90
Aug 16 09:51:34 Grimlock kernel:  __kmalloc+0x508/0x510
Aug 16 09:51:34 Grimlock kernel:  bcm2835_dma_create_cb_chain+0x78/0x380
Aug 16 09:51:34 Grimlock kernel:  bcm2835_dma_prep_slave_sg+0x178/0x350
Aug 16 09:51:34 Grimlock kernel:  bcm2835_sdhost_request+0x460/0x668
Aug 16 09:51:34 Grimlock kernel:  __mmc_start_request+0x84/0x1d0
Aug 16 09:51:34 Grimlock kernel:  mmc_start_request+0x8c/0xb8
Aug 16 09:51:34 Grimlock kernel:  mmc_blk_mq_issue_rq+0x368/0x9b0
Aug 16 09:51:34 Grimlock kernel:  mmc_mq_queue_rq+0x11c/0x2b8
Aug 16 09:51:34 Grimlock kernel:  blk_mq_dispatch_rq_list+0x128/0x7e8
Aug 16 09:51:34 Grimlock kernel:  __blk_mq_do_dispatch_sched+0x250/0x2d0
Aug 16 09:51:34 Grimlock kernel:  __blk_mq_sched_dispatch_requests+0x160/0x1e8
Aug 16 09:51:34 Grimlock kernel:  blk_mq_sched_dispatch_requests+0x44/0x80
Aug 16 09:51:34 Grimlock kernel:  __blk_mq_run_hw_queue+0x6c/0xa8
Aug 16 09:51:34 Grimlock kernel:  blk_mq_run_work_fn+0x28/0x38
Aug 16 09:51:35 Grimlock kernel:  process_one_work+0x1f8/0x4a0
Aug 16 09:51:35 Grimlock kernel:  worker_thread+0x50/0x480
Aug 16 09:51:35 Grimlock kernel:  kthread+0x140/0x158
Aug 16 09:51:35 Grimlock kernel:  ret_from_fork+0x10/0x20
Aug 16 09:51:35 Grimlock kernel: Mem-Info:
Aug 16 09:51:35 Grimlock kernel: active_anon:10151 inactive_anon:198111 isolated_anon:197
                                  active_file:3504 inactive_file:3475 isolated_file:0
                                  unevictable:2033 dirty:24 writeback:40
                                  slab_reclaimable:6909 slab_unreclaimable:7387
                                  mapped:16629 shmem:9959 pagetables:2709 bounce:0
                                  kernel_misc_reclaimable:0
                                  free:6189 free_pcp:32 free_cma:135
Aug 16 09:51:35 Grimlock kernel: Node 0 active_anon:40604kB inactive_anon:792444kB active_file:14016kB inactive_file:13900kB unevictable:8132kB isolated(anon):788kB isolated(file):0kB mapped:66516kB dirty:96kB writeback:160kB shmem:39836kB writeback_tmp:0kB kernel_stack:4928kB pagetables:10836kB all_unreclaimable? no
Aug 16 09:51:35 Grimlock kernel: DMA free:24756kB min:24576kB low:28672kB high:32768kB reserved_highatomic:0KB active_anon:40604kB inactive_anon:792444kB active_file:14016kB inactive_file:14100kB unevictable:8132kB writepending:256kB present:1021952kB managed:982688kB mlocked:8132kB bounce:0kB free_pcp:128kB local_pcp:56kB free_cma:540kB
Aug 16 09:51:35 Grimlock kernel: lowmem_reserve[]: 0 0 0 0
Aug 16 09:51:35 Grimlock kernel: DMA: 1087*4kB (UMEC) 510*8kB (UMEC) 231*16kB (UMEC) 138*32kB (UEC) 41*64kB (UME) 22*128kB (UME) 9*256kB (UME) 1*512kB (E) 0*1024kB 0*2048kB 0*4096kB = 24796kB
Aug 16 09:51:35 Grimlock kernel: 18164 total pagecache pages
Aug 16 09:51:35 Grimlock kernel: 475 pages in swap cache
Aug 16 09:51:35 Grimlock kernel: Swap cache stats: add 1765, delete 1286, find 0/0
Aug 16 09:51:35 Grimlock kernel: Free swap  = 2089404kB
Aug 16 09:51:35 Grimlock kernel: Total swap = 2097148kB
Aug 16 09:51:35 Grimlock kernel: 255488 pages RAM
Aug 16 09:51:35 Grimlock kernel: 0 pages HighMem/MovableOnly
Aug 16 09:51:35 Grimlock kernel: 9816 pages reserved
Aug 16 09:51:35 Grimlock kernel: 16384 pages cma reserved
Aug 16 09:51:35 Grimlock kernel: SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
Aug 16 09:51:35 Grimlock kernel:   cache: kmalloc-256, object size: 256, buffer size: 256, default order: 0, min order: 0
Aug 16 09:51:35 Grimlock kernel:   node 0: slabs: 130, objs: 2080, free: 0
Aug 16 09:51:35 Grimlock kernel: SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
Aug 16 09:51:35 Grimlock kernel:   cache: kmalloc-256, object size: 256, buffer size: 256, default order: 0, min order: 0
Aug 16 09:51:35 Grimlock kernel:   node 0: slabs: 130, objs: 2080, free: 0
Aug 16 09:51:35 Grimlock kernel: SLUB: Unable to allocate memory on node -1, gfp=0x900(GFP_NOWAIT|__GFP_ZERO)
Aug 16 09:51:35 Grimlock kernel:   cache: kmalloc-256, object size: 256, buffer size: 256, default order: 0, min order: 0
Aug 16 09:51:35 Grimlock kernel:   node 0: slabs: 130, objs: 2080, free: 0

dmesg -wH reports the same.
After rebooting, I cannot find them in the log files, so I guess they’re not saved to disk in time?
Any… idea what it means?
Help would be greatly appreciated :slight_smile:

on a default system, logs are stored on tmpfs and are not persistent. Means, they are lost on reboot. You could try following to enable persistent system logs:

dietpi-software uninstall 103 # uninstalls DIetPi-RAMlog
mkdir /var/log/journal # triggers systemd-journald logs to disk
reboot # required to finalise the RAMlog uninstall

Then you can check system logs via:

journalctl

which will then show as well logs from previous boot sessions. To limit the size, you can additionally e.g. apply the following:

mkdir -p /etc/systemd/journald.conf.d
cat << '_EOF_' > /etc/systemd/journald.conf.d/99-custom.conf
[Journal]
SystemMaxFiles=2
MaxFileSec=7day
_EOF_

This will limit logs to 14 days split across two journal files, so that with rotation you will always have between 7 and 14 days of logs available.


Have a look if you could use an own PSU or powered USB hub for your HDD. USB ports of a RPI might not be able to fully power external HDD/SSD. Especially in peak situation, this could lead to data corruption/issues.

1 Like

Alright, thanks for the options.
I did a bit of testing as well:

I’d forgot to mention persistent logging was already on, even though I followed all your steps, no avail on the logs.
Another issue is the log dump I have in the post might just be a legitimate crash that just so happened to occur at that time.

I hooked up the HDD to external power, so power issues are defiantly sorted.
To see if I could isolate the problem I ran some tests:
Stress Testing the CPU
Stress Testing Disk (SDcard & HDD) & RAM

They passed flawlessly. I ran them multiple times, once for 60+ minutes, not a hitch. I can even achieve a stable overclock.

So now what’s the problem?
Nextcloud loads up the dashboard, notes, calendar, etc. just fine. It’s when I ask it to do disk-intensive tasks (Browse files, Photos, Music) that the Pi freezes.
And there is absolutely nothing in the logs, nor the nextcloud.log

Well perhaps it’s just a disk error?
The disk is doing fine from smartctl -a & samba works just as it’s supposed to.

How do I get to the bottom of this?

Okay, I may have found the culprit: swap.

Currently, I’ve allocated 4G of swap space as recommended by Photoprism on my external HDD though dietpi-drive_manager.
Whenever my memory cannot keep up & needs to swap, it freezes up the entire Pi.

Logically I tried:

  1. Disabling swap. This fixed the freezing issue. Of course, due to memory limitations, functionality was heavily limited as everything kept stopping due to Out-of-Memory issues.

  2. Using ZRAM, this made the situation better, much fewer OOM errors, but it’s not a replacement that I can use.

  3. Swapping on the SDCard, oh boy was this ever painful. The SDCard is pretty slow to begin with & swapping to it made the system slow to a crawl, (not to mention it’ll likely have a significant impact on the lifetime of the card) BUT it works! I have almost all functionality other than the angry logs:

Aug 17 11:09:10 Grimlock mariadbd[738]: 2022-08-17 11:09:10 231 [Warning] Aborted connection 231 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:09:11 Grimlock mariadbd[738]: 2022-08-17 11:09:10 225 [Warning] Aborted connection 225 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:09:11 Grimlock mariadbd[738]: 2022-08-17 11:09:11 223 [Warning] Aborted connection 223 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:09:11 Grimlock mariadbd[738]: 2022-08-17 11:09:11 226 [Warning] Aborted connection 226 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:09:12 Grimlock mariadbd[738]: 2022-08-17 11:09:12 229 [Warning] Aborted connection 229 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:09:12 Grimlock mariadbd[738]: 2022-08-17 11:09:12 227 [Warning] Aborted connection 227 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:09:12 Grimlock mariadbd[738]: 2022-08-17 11:09:12 224 [Warning] Aborted connection 224 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:09:12 Grimlock mariadbd[738]: 2022-08-17 11:09:12 230 [Warning] Aborted connection 230 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:09:12 Grimlock systemd[1]: phpsessionclean.service: Succeeded.
Aug 17 11:09:12 Grimlock systemd[1]: Finished Clean php session files.
Aug 17 11:09:12 Grimlock mariadbd[738]: 2022-08-17 11:09:12 228 [Warning] Aborted connection 228 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:09:12 Grimlock systemd[1]: phpsessionclean.service: Consumed 1.577s CPU time.
Aug 17 11:09:22 Grimlock mariadbd[738]: 2022-08-17 11:09:22 233 [Warning] Aborted connection 233 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:09:23 Grimlock mariadbd[738]: 2022-08-17 11:09:23 234 [Warning] Aborted connection 234 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)

Aug 17 11:09:36 Grimlock mariadbd[738]: 2022-08-17 11:09:36 236 [Warning] Aborted connection 236 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:02 Grimlock CRON[4108]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Aug 17 11:10:26 Grimlock mariadbd[738]: 2022-08-17 11:10:26 239 [Warning] Aborted connection 239 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:26 Grimlock mariadbd[738]: 2022-08-17 11:10:26 254 [Warning] Aborted connection 254 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:26 Grimlock mariadbd[738]: 2022-08-17 11:10:26 256 [Warning] Aborted connection 256 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:26 Grimlock mariadbd[738]: 2022-08-17 11:10:26 250 [Warning] Aborted connection 250 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:26 Grimlock mariadbd[738]: 2022-08-17 11:10:26 245 [Warning] Aborted connection 245 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:26 Grimlock mariadbd[738]: 2022-08-17 11:10:26 258 [Warning] Aborted connection 258 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:26 Grimlock mariadbd[738]: 2022-08-17 11:10:26 242 [Warning] Aborted connection 242 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:26 Grimlock mariadbd[738]: 2022-08-17 11:10:26 249 [Warning] Aborted connection 249 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:26 Grimlock mariadbd[738]: 2022-08-17 11:10:26 246 [Warning] Aborted connection 246 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:30 Grimlock mariadbd[738]: 2022-08-17 11:10:30 259 [Warning] Aborted connection 259 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:33 Grimlock mariadbd[738]: 2022-08-17 11:10:33 268 [Warning] Aborted connection 268 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:47 Grimlock mariadbd[738]: 2022-08-17 11:10:47 262 [Warning] Aborted connection 262 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:52 Grimlock mariadbd[738]: 2022-08-17 11:10:52 252 [Warning] Aborted connection 252 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:10:52 Grimlock mariadbd[738]: 2022-08-17 11:10:52 241 [Warning] Aborted connection 241 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:11:16 Grimlock mariadbd[738]: 2022-08-17 11:11:16 269 [Warning] Aborted connection 269 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:11:34 Grimlock jackett[821]: 08-17 11:11:34 Warn As of "08/17/2022 05:41:04 +00:00", the heartbeat has been running for "00:00:23.8013463" which is longer than "00:00:01". This could be caused by thread pool starvation.
Aug 17 11:11:34 Grimlock jackett[821]: 08-17 11:11:34 Warn As of "08/17/2022 05:41:04 +00:00", the heartbeat has been running for "00:00:23.8013463" which is longer than "00:00:01". This could be caused by thread pool starvation.
Aug 17 11:13:01 Grimlock mariadbd[738]: 2022-08-17 11:12:52 308 [Warning] Aborted connection 308 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:13:01 Grimlock mariadbd[738]: 2022-08-17 11:12:53 307 [Warning] Aborted connection 307 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:13:01 Grimlock mariadbd[738]: 2022-08-17 11:12:53 310 [Warning] Aborted connection 310 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:13:01 Grimlock mariadbd[738]: 2022-08-17 11:12:53 311 [Warning] Aborted connection 311 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:13:01 Grimlock mariadbd[738]: 2022-08-17 11:12:53 309 [Warning] Aborted connection 309 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:13:02 Grimlock systemd[1]: Started Nextcloud cron.php job.
Aug 17 11:13:13 Grimlock jackett[821]: 08-17 11:13:13 Warn As of "08/17/2022 05:42:33 +00:00", the heartbeat has been running for "00:00:02.1393685" which is longer than "00:00:01". This could be caused by thread pool starvation.
Aug 17 11:13:15 Grimlock jackett[821]: 08-17 11:13:15 Warn As of "08/17/2022 05:42:33 +00:00", the heartbeat has been running for "00:00:02.1393685" which is longer than "00:00:01". This could be caused by thread pool starvation.
Aug 17 11:14:24 Grimlock jackett[821]: 08-17 11:14:21 Warn As of "08/17/2022 05:43:44 +00:00", the heartbeat has been running for "00:00:02.3844794" which is longer than "00:00:01". This could be caused by thread pool starvation.
Aug 17 11:14:25 Grimlock jackett[821]: 08-17 11:14:25 Warn As of "08/17/2022 05:43:44 +00:00", the heartbeat has been running for "00:00:02.3844794" which is longer than "00:00:01". This could be caused by thread pool starvation.
Aug 17 11:14:45 Grimlock kernel: [ 1342.264675] perf: interrupt took too long (6219 > 6218), lowering kernel.perf_event_max_sample_rate to 32000
Aug 17 11:15:04 Grimlock CRON[4741]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 17 11:15:06 Grimlock mariadbd[738]: 2022-08-17 11:15:06 312 [Warning] Aborted connection 312 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:15:07 Grimlock mariadbd[738]: 2022-08-17 11:15:07 314 [Warning] Aborted connection 314 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:15:08 Grimlock mariadbd[738]: 2022-08-17 11:15:08 313 [Warning] Aborted connection 313 to db: 'nextcloud' user: 'nextclouduser' host: 'localhost' (Got timeout reading communication packets)
Aug 17 11:15:19 Grimlock php[4581]: PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away in /var/www/nextcloud/3rdparty/doctrine/dbal/src/Driver/PDO/Statement.php:92
Aug 17 11:15:19 Grimlock php[4581]: Stack trace:
Aug 17 11:15:19 Grimlock php[4581]: #0 /var/www/nextcloud/3rdparty/doctrine/dbal/src/Driver/PDO/Statement.php(92): PDOStatement->execute()
Aug 17 11:15:19 Grimlock php[4581]: #1 /var/www/nextcloud/3rdparty/doctrine/dbal/src/Connection.php(1165): Doctrine\DBAL\Driver\PDO\Statement->execute()
Aug 17 11:15:19 Grimlock php[4581]: #2 /var/www/nextcloud/lib/private/DB/Connection.php(294): Doctrine\DBAL\Connection->executeStatement()
Aug 17 11:15:19 Grimlock php[4581]: #3 /var/www/nextcloud/3rdparty/doctrine/dbal/src/Query/QueryBuilder.php(354): OC\DB\Connection->executeStatement()
Aug 17 11:15:19 Grimlock php[4581]: #4 /var/www/nextcloud/lib/private/DB/QueryBuilder/QueryBuilder.php(281): Doctrine\DBAL\Query\QueryBuilder->execute()
Aug 17 11:15:19 Grimlock php[4581]: #5 /var/www/nextcloud/lib/private/AppConfig.php(325): OC\DB\QueryBuilder\QueryBuilder->execute()
Aug 17 11:15:19 Grimlock php[4581]: #6 /var/www/nextcloud/lib/private/AllConfig.php(231): OC\AppConfig->deleteKey()
Aug 17 11:15:19 Grimlock php[4581]: #7 /var/www/nextcloud/lib/base.php(702): OC\AllConfig->deleteAppValue()
Aug 17 11:15:19 Grimlock php[4581]: #8 /var/www/nextcloud/lib/base.php(1104): OC::init()
Aug 17 11:15:19 Grimlock php[4581]: #9 /var/www/nextcloud/cron.php(43): require_once('...')
Aug 17 11:15:19 Grimlock php[4581]: #10 {main}

After that, going back to any size of swap oh the HDD, still freezes the entire system.
But why??
It was working fine until now, nothing’s changed, while I would love to none of the above solutions will work for me.

Yep,
dietpi-drive_manager complete freeze.
dphys-swapfile complete freeze.

dd if=/dev/zero of=/mnt/HDD1/.swapfile bs=1M count=4096
mkswap /mnt/HDD1/.swapfile
chmod 0600 /mnt/HDD1/.swapfile
swapon /mnt/HDD1/.swapfile

This… survives for a bit longer , but in the end complete system freeze.

Are you able to attach a USB pen stick to test behaviour on it?

Not sure if there is time before the system freeze, but you could try to check kernel error logs

dmesg -l err,crit,alert,emerg

Sure, I tested with a spare Sandisk USB 2.0 Flash drive I had lying around.
Formatted it Ext4, & mounted with dietpi-drive_manager

Using dietpi to create swap - Same results, complete system lock up almost immediately. Ran your command, dmesg -w, tail -f /var/log/syslog before the freeze, nothing.

Using mkswap - This is much the same BUT:
I noticed this both wit the flash drive & HDD, if I removed the load (closing the tab) immediately after the freeze, the Pi might recover.

From there:
Nothing in the logs, no errors, warnings, or anything out of the ordinary.
The swap usage is very low (2-4 M). I tested with swappiness set to 1 & 20.
htop reports the load average as very high. 14,30 is usually what I see if I can recover.
Nothing abnormal in the processes though.

hmm tested on a RPi4B without issues

[ SUB1 ] DietPi-Set_swapfile > Applying 1024 /mnt/pve/local2/.swapfile
[ INFO ] DietPi-Set_swapfile | Disabling and deleting all existing swap files
[  OK  ] DietPi-Set_swapfile | swapoff -a
[  OK  ] DietPi-Set_swapfile | Free space check: path=/mnt/pve/local2/ | available=26353 MiB | required=1024 MiB
[  OK  ] DietPi-Set_swapfile | Generating new swap space
[ INFO ] DietPi-Set_swapfile | Size = 1024 MiB
[ INFO ] DietPi-Set_swapfile | Path = /mnt/pve/local2/.swapfile
[  OK  ] DietPi-Set_swapfile | fallocate -l 1024M /mnt/pve/local2/.swapfile
[  OK  ] DietPi-Set_swapfile | mkswap /mnt/pve/local2/.swapfile
[  OK  ] DietPi-Set_swapfile | swapon /mnt/pve/local2/.swapfile
[  OK  ] DietPi-Set_swapfile | chmod 0600 /mnt/pve/local2/.swapfile
[  OK  ] DietPi-Set_swapfile | eval echo '/mnt/pve/local2/.swapfile none swap sw' >> /etc/fstab
[  OK  ] DietPi-Set_swapfile | Setting in /boot/dietpi.txt adjusted: AUTO_SETUP_SWAPFILE_SIZE=1024
[  OK  ] DietPi-Set_swapfile | Setting in /boot/dietpi.txt adjusted: AUTO_SETUP_SWAPFILE_LOCATION=/mnt/pve/local2/.swapfile
[ INFO ] DietPi-Set_swapfile | Setting /tmp tmpfs size: 4443 MiB
[  OK  ] DietPi-Set_swapfile | mount -o remount /tmp

@MichaIng any ideas why creating SWAP file could be failing?

Creating the swap still works but using it the system freezes on high I/O load, right?

A HDD should be able to handle concurrent swapping and file access, while of course both share the limited bandwidth on RPi 3 USB.

Can you try to max/disable APM:

hdparm -B 254 /dev/sda

Most newer drives do not support it (it shows an info then, meaning it cannot be the culprit), and where it works it disables idle spindown, but worth a test.

@MichaIng
But it was failing on a flash drive as well. Therefore hdparm should not have any effect, Isn’t it?

Ah right, there it shouldn’t have any effect. Hmm, I have no RPi 3B+, but on RPi 2 I use a swap file together with Nextcloud data on USB HDD since a long time without issues :thinking:.

Just to rule it out, you did try different USB ports?

Which filesystem type is the drive formatted with?

Well, just in case:
hdparm -B 254 /dev/sda gives me:

/dev/sda2:
 setting Advanced Power Management level to 0xfe (254)
 APM_level      = 254

But no, it doesn’t fix the problem, but it does seem to help, monitoring htop, I can watch it freeze & unfreeze at the load average climbs up to 7 before permanently freezing, unlike the standard 2-3.

This is a new problem, I’ve been using this HDD for swap just fine over the past year.
I’ve tried different ports on both the Pi & Powered Hub, 3.0 & 2.0
All of my drives are formatted Ext4.

Just in case it’s helpful, I’ve attached a run of sudo atop -w foo.log 1 300 during the freeze.
foo.log.txt (395.7 KB)
(I’ve had to rename it to upload)

Yes, we have set some restrictions on allowed file extension.