Troubleshooting help - how to identify "Random" reboots and/or non-responsive device

MichaIng
Site Admin
Posts: 2293
Joined: Sat Nov 18, 2017 6:21 pm

Post by MichaIng »

@1activegeek
As mentioned above, another way is to disable RAMlog and enable persistent journald logging, so you can check the last entries written before the reboot.
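
A minimal sketch of how the switch to persistent logging could be checked. It assumes journald runs with its default `Storage=auto`, where logs become persistent as soon as `/var/log/journal` exists. The helper name `journal_storage_mode` is hypothetical, and the software ID 103 for DietPi-RAMlog in the comments is an assumption to verify on your own system.

```shell
#!/bin/sh
# Report whether journald would keep logs across reboots.
# Assumes the default Storage=auto: persistent only if <root>/var/log/journal exists.
# journal_storage_mode is a hypothetical helper; the optional argument is a
# root prefix so the check can be tried against a scratch directory.
journal_storage_mode() {
    root="${1:-}"
    if [ -d "$root/var/log/journal" ]; then
        echo persistent
    else
        echo volatile
    fi
}

# Possible sequence, run as root on the device (not executed here):
#   dietpi-software uninstall 103   # assumed ID of DietPi-RAMlog, verify first
#   mkdir -p /var/log/journal       # lets journald store logs on disk
#   reboot
# After the next unexpected reboot:
#   journalctl --list-boots         # one line per recorded boot
#   journalctl -b -1 -e             # jump to the end of the previous boot's log
```

Calling `journal_storage_mode` without an argument checks the real `/var/log/journal` directory.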

Does it indeed hang on the MOTD, i.e. when accessing the internet to retrieve the current MOTD (although this should only occur once a day)? This has a timeout of 2 seconds (so as not to delay the banner too much), so if it hasn't received everything by then, you end up with an empty MOTD until it gets reset by the daily cron job.

But if it hangs longer on network access, there might be some other issue. I'm not sure whether PoE and network access can conflict/interfere at times, but it is definitely something to check journalctl/dmesg and the persistent logs for.

One last idea, since this has caused several different issues: the device may have run out of entropy. That should not lead to a reboot, but to a hanging boot, network, or services, at least for a while after boot. As we install this on all DietPi systems with v6.25 anyway, you might want to give it a shot already: G_AGI haveged
To check whether entropy is indeed an issue: dmesg | grep 'crng init' should show a timestamp of a few seconds. If it is, let's say, 20 seconds or more, then the above entropy daemon can indeed speed up boot and resolve issues of several different kinds: https://github.com/MichaIng/DietPi/issues/2806
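
The check described above can be sketched as a small script. It assumes the kernel logs a line of the form `[   23.456789] random: crng init done`. The helper names `crng_init_seconds` and `crng_verdict` are hypothetical, and the 20-second limit is only the rough figure from the post, not a hard rule.

```shell
#!/bin/sh
# Print the whole-second kernel timestamp of the first "crng init done" line.
# Reads dmesg-style lines from stdin; prints nothing if no such line is found.
crng_init_seconds() {
    grep 'crng init done' | head -n 1 | sed -E 's/^\[ *([0-9]+)\..*/\1/'
}

# Classify the delay in seconds given as the first argument.
crng_verdict() {
    secs="$1"
    if [ -z "$secs" ]; then
        echo "unknown"
    elif [ "$secs" -ge 20 ]; then
        echo "slow: entropy daemon may help"
    else
        echo "ok"
    fi
}

# On the device (not executed here):
#   crng_verdict "$(dmesg | crng_init_seconds)"
```

An "unknown" result means no matching line was found, for example because the kernel ring buffer has already rotated past the early boot messages.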
1activegeek
Posts: 8
Joined: Sat Jun 15, 2019 8:30 pm

Post by 1activegeek »

Ok, so I managed to switch over to persistent logging. In doing so, I'm really confused by some of what I'm seeing, though I surmise part of it is because I wasn't meant to view the raw log files in the /var/log/journal folders. I tried to, but as far as I can see they're all gibberish with bits and pieces of journal entries.

What is more confusing, though, is that I'm not seeing journal entries for the most recent restart. Logging in right now, the device reports that it has been up for 15 minutes. In the journalctl output, the last log line I see is from 11:59:17 EDT, but the actual current time is 15:13, and yet the device reports the current date as 7/4/19 11:52:05.

So there are log entries dated later than the device's current date/time. Additionally, in the log output I can see some random lines from Jul 2 intermingled right in between lines from Jul 4.

Also, I snagged a screenshot of what I mean with the MOTD. It doesn't show you much other than where the hangup is, and I've not let it sit long enough to see how long it will last, but it's definitely more than the 30 seconds or so that I've waited before.

So a bunch of oddities, and the date problem leads me to believe there is potentially an on-board problem, since the device can't seem to keep time anymore.

root@odyssey:/mnt/dietpi_userdata# date
Thu  4 Jul 11:52:05 EDT 2019
root@odyssey:/mnt/dietpi_userdata# journalctl
-- Logs begin at Thu 2019-07-04 11:17:01 EDT, end at Thu 2019-07-04 11:59:17 EDT. --
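
The gap between the two timestamps in the output above can be computed with a small helper. This is a sketch that assumes GNU `date` (for the `-d` option); `journal_lead_seconds` is a hypothetical name.

```shell
#!/bin/sh
# Seconds by which the newest journal entry is ahead of the system clock.
# A positive result means the journal holds entries dated in the future.
# Both arguments are timestamps that GNU date -d can parse (assumption: GNU coreutils).
journal_lead_seconds() {
    clock_now=$(date -u -d "$1" +%s)
    journal_end=$(date -u -d "$2" +%s)
    echo $((journal_end - clock_now))
}

# With the values from the output above:
#   journal_lead_seconds "2019-07-04 11:52:05" "2019-07-04 11:59:17"   # prints 432
```

A lead of 432 seconds (7 minutes 12 seconds) is consistent with the clock having been set backwards after those entries were written.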
Attachments
Screen Shot 2019-07-04 at 3.00.53 PM.png

Post by MichaIng »

@1activegeek
Sorry for the late reply.

Most probably it was a time sync update that caused the future timestamps. Another reboot should solve it, or simply ignore it ;).

What you describe with the hanging MOTD sounds very much like an entropy issue or one with IPv6.

So I would try to disable IPv6 (dietpi-config > Network Options: Adapters) and install an entropy daemon as mentioned above. Btw better than haveged on RPi is:
apt install rng-tools
This is an alternative that consumes less RAM but does not work on all machines. But on RPi it works and is default on fresh Raspbian as well.
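
Before and after making these changes, the current state can be read from procfs. The sketch below uses the standard Linux paths as defaults; the helper names `ipv6_state` and `entropy_bits` are hypothetical, and what counts as a "low" entropy value is left to the reader.

```shell
#!/bin/sh
# Both helpers take the file to read as an optional argument so they can be
# tried against a scratch file; the defaults are the standard Linux procfs paths.

# Prints "disabled" if IPv6 is switched off for all interfaces, "enabled" if
# not, and "unavailable" if the file is missing (e.g. no IPv6 support loaded).
ipv6_state() {
    f="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    if [ ! -r "$f" ]; then
        echo unavailable
    elif [ "$(cat "$f")" = "1" ]; then
        echo disabled
    else
        echo enabled
    fi
}

# Prints the kernel's current entropy estimate in bits.
entropy_bits() {
    cat "${1:-/proc/sys/kernel/random/entropy_avail}"
}

# On the device (not executed here):
#   ipv6_state
#   entropy_bits
```

Comparing `entropy_bits` shortly after boot with and without the entropy daemon installed gives a rough idea of whether it makes a difference on a given board.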

Post by 1activegeek »

@MichaIng
Sorry to ping you on an old thread, but I wanted to circle back to highlight something in DietPi that must be either incompatible or problematic with the Rock64 board I have. A few things of note to highlight my rationale:
- The issues described here persisted through fresh images, different SD cards, etc - always the same issue. After running for at most a day, the device would keep rebooting randomly, and some reboots would cause the system to hang. This never went away, and I was forced to move my workload off the device.
- I've now come back to play around again and found the same issue. This time I tried the Rock64 default Ubuntu image, and it has now been up and running for almost 5 days straight - same hardware, same SD card, and same network/power supply as when I was running DietPi on it and it was crashing.
- I don't know how to identify whether I have a V2 or V3, but the Rock64 site indicates that DietPi is compatible with the Ver 2 Rock64 only. I'm wondering if I have a Ver 3, and there is something in your image that is incompatible with the newer board.
- My guess is that something may have changed in your code around the timeframe of my initial post here. Prior to my posts noticing that the device was having the reboot/freeze issue, I was running DietPi religiously without issue for quite some time.

All this to say, I'd love to try and help you isolate or find out what could be wrong, but I'm not sure where to start. I'm not a coder, so I'm not likely to be good at digging through the repo to find which major changes could point to the version where the shift happened. But knowing I can successfully run an Ubuntu base image with no issue tells me something in the image must be the cause. I'd rather run DietPi, as I like your setup, menus, custom scripts, and slimmed image, but I can't sacrifice stability for features.

Post by 1activegeek »

As a follow-up, and this leads me to believe the Rock64 documentation is possibly wrong: I believe I have a Ver 2, as I now see that the board actually has "Rock64_VER2.0" printed on it. That said, I do notice in the diagrams on the wiki that the main difference, at a quick glance, is that V3 has an RTC built into the board. I'm wondering if it's possible that something was changed around the clocking to address the Ver 3.0 boards, and/or if something around time was the cause of the original issue.