Linux Mint / LinuxCNC 2.8.0 freezing - SOLVED?

28 Aug 2021 12:26 - 28 Aug 2021 12:26 #218990 by Muzzer
The last freezes were almost two days ago, during and just after an update using Update Manager. Since then it's been running continuously without issue. It's too early to say, but it's possible the update fixed it. Dangerous talk, I know.

I've got Axis Sim running in a continuous loop, plus YouTube streaming endless music videos, Visual Studio Code sitting in the background, Thunderbird email and a VNC server running.

The case has a Sunon fan running all the time, and I haven't been able to get the core temperature above 59°C no matter what I do.
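For anyone wanting to log temperatures during a long soak test without extra tools, a small script can read the kernel's thermal zones directly. This is a minimal sketch assuming the standard Linux sysfs layout under /sys/class/thermal, where each zone reports millidegrees Celsius; which zone is the CPU package (often one with type x86_pkg_temp on Intel machines) varies from board to board.

```python
from pathlib import Path
from typing import Dict


def parse_millidegrees(text: str) -> float:
    """sysfs thermal zones report millidegrees Celsius, e.g. '59000' -> 59.0."""
    return int(text.strip()) / 1000.0


def read_thermal_zones(root: str = "/sys/class/thermal") -> Dict[str, float]:
    """Return {'thermal_zoneN:type': degrees C} for every readable zone under root."""
    temps: Dict[str, float] = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            temps[f"{zone.name}:{kind}"] = parse_millidegrees((zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # some zones are unreadable or report no value
    return temps


if __name__ == "__main__":
    # One reading per zone; repeat it with e.g. `watch -n 10 python3 temps.py`
    # (temps.py being whatever name the script is saved under).
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} C")
```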

I'm about to go for a walk in the hills for a few hours. If it's still running when we get back, I'll be tempted to consider it fixed. Fingers crossed....
Last edit: 28 Aug 2021 12:26 by Muzzer.

01 Sep 2021 15:38 #219347 by Muzzer
I hope this isn't premature, but I'm becoming more confident that I've fixed the issue here. After an unplanned excursion into Linux crash logs and bug fixes, I appear to have a solution. If so, someone else may find this useful.

The error messages in the crash log seemed to feature 2 main themes:
  • PAM adding faulty module: pam_kwallet5.so
  • [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B ... etc.
The first seems to refer to a password wallet (KWallet), so it was easy to disable: forums.linuxmint.com/viewtopic.php?t=286950

The second seems to be caused by an "open bug" in the Intel HD graphics (i915) driver. It's something about a "PSR" (Panel Self Refresh) feature that doesn't appear to be critically required, yet can cause an "Atomic update failure" (as far as I can tell, a display update that doesn't complete within its allotted time window): www.dedoimedo.com/computers/intel-microcode-atomic-update.html. This sounded like a strong candidate.
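To see whether either of the two signatures above keeps turning up around a freeze, a saved log can be scanned and the hits counted. This is a minimal sketch: the patterns are taken from the messages quoted above, real log lines carry extra prefixes and details, and the file names in the comment are hypothetical. Reading the previous boot with `journalctl -b -1` only works if the journal is kept persistently.

```python
import re
import sys
from collections import Counter
from typing import Dict, Iterable

# The two signatures quoted in the post; real log lines carry more detail.
SIGNATURES = {
    "pam_kwallet5": re.compile(r"pam_kwallet5\.so"),
    "i915_atomic_update": re.compile(r"\[i915\]\].*Atomic update failure on pipe [A-Z]"),
}


def count_signatures(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many lines match each known signature."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return dict(counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. `journalctl -b -1 > prev_boot.log`, then `python3 scan_log.py prev_boot.log`
    with open(sys.argv[1], errors="replace") as fh:
        for name, n in count_signatures(fh).items():
            print(f"{name}: {n}")
```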

My first step to fixing the i915 error was to run Update Manager. I was 99% certain I'd done this previously, but when I ran it late last week, I recall it reported that it planned to install the "Intel microcode", which is a firmware update for the processor. Perhaps I hadn't run it previously after all. This almost fixed it, but not quite: I got a dark-screen lockup shortly after the update, which at least represented a change from the previous freeze (a change is as good as a rest?).

The second and hopefully final step was to disable the PSR feature, and after something like 4 days of continuous stress testing, I haven't seen any issues. That could change at any moment, of course, but I seem to have found a plausible root cause and a credible corrective action.
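The post doesn't say exactly how PSR was disabled. One common route, assumed here, is the i915 driver's `enable_psr` module parameter set to 0 on the kernel command line via the `GRUB_CMDLINE_LINUX_DEFAULT` line in /etc/default/grub. This sketch only transforms the file's text, appending the parameter if it isn't already present, so it can be tried on a copy before anything is changed.

```python
import re

PSR_OFF = "i915.enable_psr=0"  # i915 module parameter; 0 disables Panel Self Refresh

_CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_text: str, param: str = PSR_OFF) -> str:
    """Return grub_text with `param` appended to GRUB_CMDLINE_LINUX_DEFAULT (if absent)."""

    def _append(m):
        words = m.group(2).split()
        if param not in words:
            words.append(param)
        return m.group(1) + " ".join(words) + m.group(3)

    return _CMDLINE.sub(_append, grub_text, count=1)


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(sample), end="")
```

The edited text would then be written back to /etc/default/grub as root, followed by `sudo update-grub` and a reboot; afterwards `cat /sys/module/i915/parameters/enable_psr` should read 0.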

If I've spoken too soon, I'll report back....
The following user(s) said Thank You: tommylight

08 Sep 2021 13:29 #219943 by Muzzer
Hmmph. That didn't stick. I started to get freezes again, and in the end they were as little as 10 minutes apart. The memory tests showed no issues. I even did a fresh install of Debian on the same SSD without LinuxCNC, and it froze too. Enough.

I made a completely new install of Mint 20 on a different (Samsung) SSD at the weekend and just installed 2.8.2. So far it seems to be stable. There's always the possibility it's the motherboard, so I can never be 100% certain that a freeze isn't seconds away. Fingers remain tightly crossed again.....

29 Sep 2021 19:46 #221849 by Muzzer
Quick update on this. It's still freezing at random, although it typically runs for several days between freezes. None of the system test tools find anything of concern, and this is a new SSD with a fresh install of Mint 20 and LCNC 2.8.2-39. The mobo and memory were brand new at the start of this recent build.
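With freezes arriving days apart, a heartbeat log can pin down when one happened: a script appends a timestamp every minute, and after a freeze the last line shows roughly when the machine stopped responding, which can then be matched against the previous boot's logs. This is a minimal sketch; the file location, the 60 s interval and the `--run`/`--check` switches are assumptions, not part of any existing tool.

```python
import os
import sys
import time
from datetime import datetime
from pathlib import Path
from typing import List


def format_heartbeat(now: datetime) -> str:
    """One line per tick, e.g. '2021-09-30T10:31:00'."""
    return now.isoformat(timespec="seconds")


def last_heartbeat(lines: List[str]) -> str:
    """The last non-empty line is roughly when the machine stopped responding."""
    stamps = [ln.strip() for ln in lines if ln.strip()]
    return stamps[-1] if stamps else ""


def run(log: Path, interval_s: float = 60.0) -> None:
    """Append a timestamp every interval_s seconds, forever."""
    while True:
        with log.open("a") as fh:
            fh.write(format_heartbeat(datetime.now()) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # force the line to disk so a hard freeze doesn't lose it
        time.sleep(interval_s)


if __name__ == "__main__":
    log_path = Path.home() / "heartbeat.log"  # hypothetical location
    if "--run" in sys.argv:
        run(log_path)
    elif "--check" in sys.argv:
        print("last heartbeat:", last_heartbeat(log_path.read_text().splitlines()))
```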

Perhaps my mistake is expecting it to run for days on end. I'm guessing most people power down after each session, so perhaps the underlying instability isn't confined to my system but simply doesn't become widely apparent. Does anyone else actually leave their PCs running for days at a time? Although I will likely be powering down myself once I'm done fiddling, it feels to me that the underlying OS for a machine tool should be reasonably robust, more so than I'm seeing.

I don't need to be actually running LinuxCNC for this problem to show itself. However, it sounds as if the RT kernel is always running, which may be a key difference between a standard Mint installation and Mint after LinuxCNC / RT has been installed.

I have a life to live and metal to cut, so perhaps I should leave it there and focus on making stuff now!

29 Sep 2021 20:10 #221851 by PCW
I would expect that it should just run. I have had instances of LinuxCNC running for many months without issue. I have had issues with some kernels, and they are very frustrating to debug.

You might try another kernel

29 Sep 2021 20:27 #221853 by Muzzer
Could you point me at some info about how I might do that? I'd give it a go but at the moment I wouldn't have a clue where to start, so I'd need to read up on it.

Many thanks

29 Sep 2021 22:54 #221857 by tommylight

... Does anyone else actually leave their PCs running for days at a time?

This one I'm typing on had a reboot last night after not being powered off for more than two months! :) But it's not running LinuxCNC; it has a Radeon RX 6900 XT in it, so it is mining Ethereum.
PCW is right; try a different kernel. What do you have now? Run:
uname -a

30 Sep 2021 07:04 #221871 by Muzzer
Looks like PREEMPT:

muzzer_linux@LinuxCNC:~$ uname -a
Linux LinuxCNC 4.9.0-13-rt-amd64 #1 SMP PREEMPT RT Debian 4.9.228-1 (2020-07-05) x86_64 x86_64 x86_64 GNU/Linux
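The two facts that matter when comparing kernels are in that line: the release (4.9.0-13-rt-amd64, the "rt" flavour) and "PREEMPT RT" in the version string, which marks a PREEMPT-RT kernel rather than RTAI. A minimal sketch for pulling both out, assuming the usual `uname -a` field order (kernel name, node name, release, version, ...); depending on kernel version the marker may be printed as "PREEMPT RT" or "PREEMPT_RT", and the regex accepts both.

```python
import platform
import re
from typing import NamedTuple


class KernelInfo(NamedTuple):
    release: str      # e.g. 4.9.0-13-rt-amd64
    preempt_rt: bool  # True if the version string advertises PREEMPT RT / PREEMPT_RT


def parse_uname(uname_a: str) -> KernelInfo:
    """Parse `uname -a` output: kernel-name, nodename, release, version..."""
    fields = uname_a.split()
    release = fields[2] if len(fields) > 2 else ""
    preempt_rt = re.search(r"\bPREEMPT[ _]RT\b", uname_a) is not None
    return KernelInfo(release, preempt_rt)


if __name__ == "__main__":
    # Rebuild the same string from the running system instead of pasting it in.
    u = platform.uname()
    print(parse_uname(f"{u.system} {u.node or '-'} {u.release} {u.version}"))
```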

What should I try? I seem to recall my first install (also prone to freezing, top of this thread) used RTAI(?)

30 Sep 2021 09:22 #221876 by tommylight
You can try the 5.10 kernel, but I'm on my phone, so I'll post the link this evening.
It is easy: download, double-click, reboot.
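Installing kernel .deb packages this way typically adds a new vmlinuz-&lt;release&gt; image under /boot, and the older kernel normally stays selectable from GRUB's advanced options if the new one misbehaves. After the reboot, a quick check of what is installed versus what is actually running might look like this minimal sketch; it assumes the Debian/Ubuntu naming convention for kernel images, and the 5.10 release string in the test data is illustrative only, not the package linked in this thread.

```python
import platform
import re
from pathlib import Path
from typing import List, Tuple


def version_key(release: str) -> Tuple[int, ...]:
    """'5.10.0-8-rt-amd64' -> (5, 10, 0, 8): compare only the leading numeric part."""
    m = re.match(r"\d+(?:[.-]\d+)*", release)
    return tuple(int(n) for n in re.split(r"[.-]", m.group(0))) if m else ()


def installed_kernels(boot: str = "/boot") -> List[str]:
    """Kernel releases that have a vmlinuz-<release> image in `boot`, oldest first."""
    releases = [p.name[len("vmlinuz-"):] for p in Path(boot).glob("vmlinuz-*")]
    return sorted(releases, key=version_key)


if __name__ == "__main__":
    running = platform.release()
    for rel in installed_kernels():
        print(("* " if rel == running else "  ") + rel)  # '*' marks the running kernel
```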

30 Sep 2021 10:31 #221884 by Muzzer
Many thanks, I'll look out for that.

The PC has just frozen again, some time in the last hour or so.....
