Random DietPi crashes -> reboot -> not coming back

psychoquaker · 15 June 2021 05:50

Hi,

I’m seeing strange behaviour with my DietPi running on a Pi Zero (with an external USB adapter): the system crashes randomly after 1-5 days of operation.
In the syslog I can see that a reboot happened, but somehow the system does not really come back and I lose my LAN connection.

Do you have any ideas what could be the reason?
I tried replacing the power supply, but this did not solve the problem.
Attached is the syslog from before the crash.

I saw in this forum that this happened to other users as well, but I’m not sure whether it was solved:

I hope you can help me…
syslog.txt (65.1 KB)

Joulinar · 15 June 2021 08:42

Hi,

There is nothing in the logs indicating a crash. Logging simply stops at 10:40am on Jun 14th, and the system was rebooted this morning around 06:39am.
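Finding where the log goes silent can be automated. The following is a minimal sketch, assuming python3 is installed on the Pi (e.g. via apt install python3) and the classic `Mmm dd hh:mm:ss` syslog timestamp prefix; the script and its name are hypothetical, not a DietPi tool. It reports the largest gap between consecutive timestamped lines, which here would be the silence between Jun 14 10:40 and the reboot the next morning.

```python
#!/usr/bin/env python3
# Hypothetical helper: find the largest gap between consecutive syslog lines,
# i.e. the last entry before the system went silent and the first after reboot.
import sys
from datetime import datetime


def parse_ts(line):
    """Parse the classic syslog prefix 'Jun 14 10:40:01'; return None if absent."""
    try:
        # Syslog omits the year; prepend a dummy leap year so 'Feb 29' also parses.
        return datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def largest_gap(lines):
    """Return (gap, last_line_before_gap, first_line_after_gap), or None."""
    best = None
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a leading timestamp
        if prev_ts is not None:
            gap = ts - prev_ts
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


def main(argv):
    path = argv[0] if argv else "/var/log/syslog"
    try:
        with open(path, errors="replace") as f:
            result = largest_gap(line.rstrip("\n") for line in f)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return
    if result is None:
        print("fewer than two timestamped lines found")
        return
    gap, before, after = result
    print(f"largest gap: {gap}")
    print(f"last line before: {before}")
    print(f"first line after: {after}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be along the lines of `python3 syslog_gap.py /var/log/syslog`. A gap of many hours that ends in boot messages points to a hang rather than a clean shutdown.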

Did you already activate full log mode? You could also check for critical kernel messages, but this will only show messages since the last reboot:

dmesg -l err,crit,alert,emerg

psychoquaker · 15 June 2021 14:23

Hi,

If the system stops at 10:40, I would call that a crash.
The LEDs were still on, but no SSH connection over LAN was possible.
This is why I rebooted it this morning by unplugging the power supply.

Full log mode is active. Do you need a specific file?

MichaIng · 15 June 2021 15:05

/var/log/syslog and /var/log/kern.log should be the most interesting ones.

psychoquaker · 15 June 2021 19:15

The syslog is already attached above.
Here is the kern.log:
kern.log.txt (420 Bytes)

Joulinar · 15 June 2021 22:18

Hmm, there are some messages from the Ethernet adapter. Maybe they lead to a connection loss, making the system unavailable.

psychoquaker · 16 June 2021 05:17

At first I thought so too, but I had a logging tool running and it stopped as well. You can also see that there are no more entries in the syslog. So to me this looks more like a system stop/crash…

Joulinar · 16 June 2021 07:37

Well, there is nothing in the logs indicating why your system stops.

MichaIng · 16 June 2021 09:50

Also check daemon.log and cron.log. Their content might just duplicate the syslog; I’m not sure right now. And pinging the Pi doesn’t work anymore after the crash, right?

Please also run vcgencmd get_throttled from time to time to check whether any voltage- or temperature-related CPU throttling happened. If the output is throttled=0x0, all is fine; otherwise some throttling happened.
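Running the check "from time to time" can be left to a small polling loop. A minimal sketch, assuming python3 is installed and `vcgencmd` is on the PATH; the script name, log path and 60 s default interval are hypothetical. Each sample is appended and fsync'ed, and the log file should sit on persistent storage (if DietPi-RAMlog is in use, /var/log lives in RAM and would not survive the reboot).

```python
#!/usr/bin/env python3
# Hypothetical helper: append a timestamped `vcgencmd get_throttled` sample to a
# log file at a fixed interval. Usage: throttle_log.py LOGFILE [INTERVAL_SECONDS]
import os
import shutil
import subprocess
import sys
import time
from datetime import datetime


def format_entry(when, value):
    """Build one log line, e.g. '2021-06-17 14:55:01 throttled=0x0'."""
    return f"{when:%Y-%m-%d %H:%M:%S} {value}"


def read_throttled():
    """Run vcgencmd and return its stripped output, e.g. 'throttled=0x0'."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def main(argv):
    if not argv:
        print("usage: throttle_log.py LOGFILE [INTERVAL_SECONDS]", file=sys.stderr)
        return
    if shutil.which("vcgencmd") is None:
        print("vcgencmd not found on PATH", file=sys.stderr)
        return
    logfile = argv[0]
    interval = float(argv[1]) if len(argv) > 1 else 60.0
    while True:
        line = format_entry(datetime.now(), read_throttled())
        with open(logfile, "a") as f:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())  # force to storage so the last sample survives a hang
        time.sleep(interval)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Started e.g. with `nohup python3 throttle_log.py /root/throttled.log 60 &`, the last line written also shows roughly when the system stopped responding.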

psychoquaker · 16 June 2021 13:40

Here’s the daemon log. The cron log does not exist.
In the daemon log there is an entry just before the system stopped:

Jun 14 10:39:01 DietPi systemd[1]: Starting Clean php session files...
Jun 14 10:39:02 DietPi systemd[1]: phpsessionclean.service: Succeeded.
Jun 14 10:39:02 DietPi systemd[1]: Started Clean php session files.

Is this something critical?

Regarding vcgencmd get_throttled:
How long is this status valid? Do you have a script that could write a log?
daemon.zip (16.1 KB)

MichaIng · 16 June 2021 21:15

Is the PHP sessions module actually used? ls -l /run/php_sessions
If it has no content, you can disable the service/timer for it: systemctl disable --now phpsessionclean.timer
But I cannot imagine how it would cause a crash.

vcgencmd get_throttled has bits for past events (of the current boot session, of course) and for current events: https://github.com/raspberrypi/documentation/blob/master/raspbian/applications/vcgencmd.md#get_throttled
Those can stack, i.e. several bits can be set at the same time.
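To read a non-zero value, the bitmask can be decoded. A minimal sketch; the bit meanings follow the vcgencmd documentation linked above (bits 0-3 describe the current state, bits 16-19 are the sticky "has occurred since boot" flags), while the script and its name are hypothetical.

```python
#!/usr/bin/env python3
# Hypothetical helper: decode the bitmask printed by `vcgencmd get_throttled`.
import sys

# Bit positions per the Raspberry Pi vcgencmd documentation.
FLAGS = {
    0: "under-voltage detected (now)",
    1: "ARM frequency capped (now)",
    2: "currently throttled",
    3: "soft temperature limit active (now)",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode(output):
    """Turn 'throttled=0x50005' (or just '0x50005') into a list of set flags."""
    value = int(output.strip().split("=", 1)[-1], 16)  # int() accepts the 0x prefix
    return [text for bit, text in sorted(FLAGS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    raw = sys.argv[1] if len(sys.argv) > 1 else "throttled=0x0"
    flags = decode(raw)
    print("\n".join(flags) if flags else "no throttling flags set")
```

For example, `python3 throttled_decode.py throttled=0x50000` lists only "under-voltage has occurred" and "throttling has occurred": both happened earlier in this boot session but are not active at the moment of the query.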

Joulinar · 16 June 2021 21:50

There was still activity a minute later:

Jun 14 10:39:01 DietPi systemd[1]: Starting Clean php session files...
Jun 14 10:39:02 DietPi systemd[1]: phpsessionclean.service: Succeeded.
Jun 14 10:39:02 DietPi systemd[1]: Started Clean php session files.
Jun 14 10:40:01 DietPi CRON[17678]: (www-data) CMD (php /var/www/nextcloud/cron.php)

psychoquaker · 17 June 2021 06:05

Yes, there was one more minute of activity. So the last thing that happened was the Nextcloud cron job.

I will now log the vcgencmd get_throttled output and see if something shows up.
I also attached an oscilloscope to the 5V supply to check for voltage dips.

psychoquaker · 17 June 2021 18:40

Update:

During the day I logged the 5V supply (via an external oscilloscope) and also vcgencmd get_throttled.
Result: neither voltage dips nor any non-zero vcgencmd result.
So it looks like the power supply is not the issue.
Later today I had another crash. Again, the last syslog entry is:

Jun 17 14:55:01 DietPi CRON[9574]: (www-data) CMD (php /var/www/nextcloud/cron.php)

Now I have uninstalled Nextcloud as a test.

Could the RAM usage be an issue? As I said, I’m running a Pi Zero with little RAM.
Is there any command to log when the system is out of memory?

Joulinar · 18 June 2021 09:22

Well, you could use free -m to watch the RAM usage. Or, if you have a spare device and/or VM, we could set up a monitoring solution that collects performance data from the Pi Zero. Why a spare device? Because your system already seems overloaded, and it doesn’t make sense to put additional load on it such as InfluxDB + Grafana.
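Instead of watching free -m by hand, memory usage can be sampled to a file so that the values just before a hang are preserved. A minimal sketch, assuming python3 is installed; it reads MemAvailable and SwapFree from /proc/meminfo (the same source free uses), and the script name, log path and interval are hypothetical.

```python
#!/usr/bin/env python3
# Hypothetical helper: append MemAvailable/SwapFree from /proc/meminfo to a log
# file at a fixed interval. Usage: memlog.py LOGFILE [INTERVAL_SECONDS]
import os
import sys
import time
from datetime import datetime


def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo content (sizes are in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def format_line(when, info):
    """Build one log line with the values converted to MiB."""
    avail = info.get("MemAvailable", 0) // 1024
    swap = info.get("SwapFree", 0) // 1024
    return f"{when:%Y-%m-%d %H:%M:%S} MemAvailable={avail} MiB SwapFree={swap} MiB"


def main(argv):
    if not argv:
        print("usage: memlog.py LOGFILE [INTERVAL_SECONDS]", file=sys.stderr)
        return
    logfile = argv[0]
    interval = float(argv[1]) if len(argv) > 1 else 60.0
    while True:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        with open(logfile, "a") as out:
            out.write(format_line(datetime.now(), info) + "\n")
            out.flush()
            os.fsync(out.fileno())  # force to storage so the last sample survives a hang
        time.sleep(interval)


if __name__ == "__main__":
    main(sys.argv[1:])
```

If MemAvailable and SwapFree both trend towards zero in the last lines before a crash, memory exhaustion becomes a likely suspect; flat values make it unlikely.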

MichaIng · 18 June 2021 10:53

Uninstalling Nextcloud is a bit radical. You could disable the cron job and see if that solves the issue. Then we could try to identify the exact job that causes the crash:

crontab -u www-data -e
Comment out the Nextcloud line by prefixing it with #
Save via CTRL+O and exit via CTRL+X

psychoquaker · 24 June 2021 06:28

Uninstalling Nextcloud did not help.
This morning I had another crash.

See logs attached.

Do you have any more ideas?
crash4.zip (31.3 KB)

Joulinar · 24 June 2021 07:56

These seem to be the last messages in the syslog before the system was restarted:

Jun 24 07:16:09 DietPi rngd[166]: stats: bits received from HRNG source: 2700064
Jun 24 07:16:09 DietPi rngd[166]: stats: bits sent to kernel pool: 2641536
Jun 24 07:16:09 DietPi rngd[166]: stats: entropy added to kernel pool: 2641536
Jun 24 07:16:09 DietPi rngd[166]: stats: FIPS 140-2 successes: 135
Jun 24 07:16:09 DietPi rngd[166]: stats: FIPS 140-2 failures: 0
Jun 24 07:16:09 DietPi rngd[166]: stats: FIPS 140-2(2001-10-10) Monobit: 0
Jun 24 07:16:09 DietPi rngd[166]: stats: FIPS 140-2(2001-10-10) Poker: 0
Jun 24 07:16:09 DietPi rngd[166]: stats: FIPS 140-2(2001-10-10) Runs: 0
Jun 24 07:16:09 DietPi rngd[166]: stats: FIPS 140-2(2001-10-10) Long run: 0
Jun 24 07:16:09 DietPi rngd[166]: stats: FIPS 140-2(2001-10-10) Continuous run: 0
Jun 24 07:16:09 DietPi rngd[166]: stats: HRNG source speed: (min=418.982; avg=867.935; max=1110.109)Kibits/s
Jun 24 07:16:09 DietPi rngd[166]: stats: FIPS tests speed: (min=2.953; avg=6.683; max=8.867)Mibits/s
Jun 24 07:16:09 DietPi rngd[166]: stats: Lowest ready-buffers level: 2
Jun 24 07:16:09 DietPi rngd[166]: stats: Entropy starvations: 0
Jun 24 07:16:09 DietPi rngd[166]: stats: Time spent starving for entropy: (min=0; avg=0.000; max=0)us

But this happens on an hourly basis, and I’m not sure whether it is a trigger.

The very last message is the hourly cron job:

Jun 24 07:17:01 DietPi CRON[10447]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)

MichaIng · 24 June 2021 10:50

To mute those random number generator daemon logs, migrate to the newer rng-tools5 package:

apt install rng-tools5

To rule out cron jobs completely:

systemctl stop cron

Keep in mind that this also disables the hourly clearing of the /var/log RAMlog, in case something logs heavily there and could fill it up to its 50 MiB limit within a few hours. Usually, though, it should survive a few days or more.
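Whether the RAMlog would fill up without the hourly clearing can be estimated from its growth rate. A minimal sketch, assuming /var/log is the RAMlog tmpfs mount (so its filesystem size is the RAMlog limit, 50 MiB by default as mentioned above) and that growth is roughly linear; the script name is hypothetical. For example, 11 MiB used and growing by 1 MiB per hour leaves 39 MiB, i.e. about 39 hours.

```python
#!/usr/bin/env python3
# Hypothetical helper: estimate hours until /var/log (RAMlog tmpfs) is full,
# by sampling its usage twice. Usage: ramlog_eta.py INTERVAL_SECONDS
import shutil
import sys
import time


def hours_until_full(used_before, used_after, interval_s, capacity):
    """Linear extrapolation of the fill time; None if usage did not grow."""
    growth = used_after - used_before
    if growth <= 0:
        return None
    rate_per_s = growth / interval_s
    return (capacity - used_after) / rate_per_s / 3600


def main(argv):
    if not argv:
        print("usage: ramlog_eta.py INTERVAL_SECONDS", file=sys.stderr)
        return
    interval = float(argv[0])
    first = shutil.disk_usage("/var/log")
    time.sleep(interval)
    second = shutil.disk_usage("/var/log")
    eta = hours_until_full(first.used, second.used, interval, second.total)
    used_mib = second.used / 2**20
    total_mib = second.total / 2**20
    if eta is None:
        print(f"{used_mib:.1f}/{total_mib:.1f} MiB used, no growth measured")
    else:
        print(f"{used_mib:.1f}/{total_mib:.1f} MiB used, full in ~{eta:.1f} h at this rate")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it with a longer interval (e.g. `python3 ramlog_eta.py 600`) smooths out bursty logging; an estimate of many days means stopping cron for a while should be safe from this side.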

psychoquaker · 24 June 2021 11:48

Would you really recommend stopping cron completely?

I would expect that this can lead to other unexpected behavior…?