Version 2.8.2 Debian Buster PREEMPT-RT ISO Freezes

20 Jun 2022 13:17 #245481 by jcbryant
I'm new to LinuxCNC and indeed to Linux in general so please forgive what may seem like silly questions.

I'm putting together a Taig CNC mill and have installed LinuxCNC on a mini industrial PC that I obtained for the purpose. I replaced Wicd with NetworkManager, but otherwise it was a straightforward install.

I have configured everything successfully, and things work as they should, for a while. Then the system simply freezes (it stops responding to both the mouse and the keyboard). So far the freezes have only occurred while LinuxCNC is running. I haven't really tried exercising the machine in other ways, but if I simply do nothing it seems more stable.

I'm using a Mesa 7i76e card, and the PC has a NEX 604 motherboard with an Intel Atom D2550 processor. Latency was terrible until I disabled multithreading, after which it became acceptable.

My problem seems very similar to the one described in a recent topic (linux-mint-cnc-2-8-0-freezing). In that case the solution evidently involved going back to an earlier kernel (presumably with the RT patch applied). Is this something I should consider trying, and if so, how would I go about doing it?

Is it really possible that the current kernel isn't compatible with what seems to be a pretty standard PC? If so, are there known hardware configurations (e.g. processors) that cause problems? I'm also considering obtaining another PC and would like to know if there's anything I should avoid.

I've tried using each of the 2 GB memory modules on its own, and this hasn't changed anything. Applying fresh thermal paste is also on the list of things to try (it is a fanless PC), but the contact between the CPU and the heat sink appears to be pretty good. Is there some kind of "system exerciser" application that I can install and run to see whether the freezes only occur while LinuxCNC is running?

Any help or suggestions would be most appreciated.
 
21 Jun 2022 23:28 #245568 by andypugh
Does LinuxCNC continue to work properly, but with a frozen GUI?

I have had this happen with a dodgy wireless keyboard. It's maybe worth unplugging the dongle if you have one, then seeing if the problem goes away.

23 Jun 2022 12:44 #245675 by jcbryant
As I've yet to get to running G-code, I can't say whether any code that was running would continue to execute. But other evidence indicates that I'm dealing with complete and utter machine failures. The last time around I had a monitor running in another window, and when the freeze occurred the monitor stopped as well.

Since posting, I have answered one of my own questions. I found a computer stress tester (stress, s-tui), installed it, and ran it with all test options enabled for 25 hours before shutting it down. So I don't think I have a dodgy PC that is prone to crashing from time to time. CPU overheating can also be ruled out: the temperature never exceeded 68 °C.
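To check that temperatures stay in the same range while LinuxCNC is running (not just under the stress test), a minimal sketch like the one below could log the peak reading. It assumes the board exposes its sensors through the Linux thermal sysfs interface (`/sys/class/thermal/thermal_zone*/temp`, which reports integer millidegrees Celsius); on some machines the CPU sensor only appears under `/sys/class/hwmon` instead, so the path is an assumption.

```python
import glob
import time
from typing import Optional

# Assumption: this board exposes its sensors as thermal zones at this path.
ZONE_GLOB = "/sys/class/thermal/thermal_zone*/temp"


def millideg_to_celsius(raw: str) -> float:
    """sysfs thermal 'temp' files hold an integer in millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_max_temp(pattern: str = ZONE_GLOB) -> Optional[float]:
    """Return the hottest readable zone in degrees C, or None if none is readable."""
    temps = []
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                temps.append(millideg_to_celsius(f.read()))
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return max(temps) if temps else None


def log_peak(interval_s: float = 10.0) -> None:
    """Print the current and peak temperature until interrupted with Ctrl-C."""
    peak = None
    while True:
        current = read_max_temp()
        if current is not None and (peak is None or current > peak):
            peak = current
        print(f"now={current} C  peak={peak} C", flush=True)
        time.sleep(interval_s)


if __name__ == "__main__":
    log_peak()
```

Running it in a terminal alongside LinuxCNC gives a peak figure to compare against the 68 °C seen under the stress test.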

When I start LinuxCNC, I can be pretty sure that a freeze will occur within an hour, and this is with LinuxCNC just sitting there (I'm not actually doing anything). I make a few axis movements, go away, and come back to find everything frozen.

And I have no wireless dongles anywhere. The mouse and keyboard connect to the PS/2 ports via USB-to-PS/2 adapters.

23 Jun 2022 19:43 #245716 by andypugh
Which kernel are you running? (uname -a will say)

I am afraid I can't offer much advice, as my machines all run the RTAI kernel, which is known to be flaky when LinuxCNC shuts down but stays up without issue for weeks at a time on my test machines.

With RTAI, the exact kernel version and the LinuxCNC build are intimately tied together. I think there is a bit more leeway with PREEMPT-RT, but perhaps it would be worth trying to match them. I am pretty sure that the current Buster packages were compiled on this kernel:
Linux buster-rtpreempt-amd64 4.19.0-20-rt-amd64

23 Jun 2022 20:05 #245717 by jcbryant
The uname command returns:

4.19.0-17-rt-amd64 #1 SMP PREEMPT RT Debian 4.19.194-2 (2021-06-21) x86_64 GNU/Linux

I presume that the 4.19.0-17-rt-amd64 is the kernel and the Debian 4.19.194-2 is the Linux version?
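On Debian, the release string (what `uname -r` prints, here 4.19.0-17-rt-amd64) is the kernel ABI name: upstream series 4.19.0, Debian ABI number 17, flavour rt-amd64. The "Debian 4.19.194-2" inside the version field is the package version: upstream point release 4.19.194, Debian revision 2. The minimal sketch below pulls both apart and compares the running kernel with the 4.19.0-20-rt-amd64 that andypugh believes the Buster packages were built on; that reference string is taken from his post and is not verified.

```python
import platform
import re
from typing import NamedTuple, Optional, Tuple


class DebianKernel(NamedTuple):
    series: str   # upstream series the ABI name is based on, e.g. "4.19.0"
    abi: int      # Debian ABI number, e.g. 17
    flavour: str  # build flavour, e.g. "rt-amd64"


# Debian-style release strings such as "4.19.0-17-rt-amd64".
RELEASE_RE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)-(.+)$")
# Package version embedded in the version field, e.g. "Debian 4.19.194-2".
PKG_RE = re.compile(r"Debian (\d+\.\d+\.\d+)-(\S+)")


def parse_release(release: str) -> Optional[DebianKernel]:
    """Split a Debian kernel release string into series, ABI and flavour."""
    m = RELEASE_RE.match(release.strip())
    if not m:
        return None
    return DebianKernel(m.group(1), int(m.group(2)), m.group(3))


def parse_package_version(version: str) -> Optional[Tuple[str, str]]:
    """Return (upstream_version, debian_revision) from `uname -v` text."""
    m = PKG_RE.search(version)
    return (m.group(1), m.group(2)) if m else None


if __name__ == "__main__":
    running = parse_release(platform.release())
    # Reference taken from the forum post above; treat it as an assumption.
    reference = parse_release("4.19.0-20-rt-amd64")
    print("running kernel:", running)
    print("package version:", parse_package_version(platform.version()))
    if running and reference and running != reference:
        print("ABI differs from reference:", running.abi, "vs", reference.abi)
```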

In any case I definitely didn't do any fancy mixing and matching. I just downloaded the current standard LinuxCNC package.

23 Jun 2022 21:35 #245727 by andypugh
The ISO installs a 4.19.0 kernel:

github.com/LinuxCNC/buster-live-build/blob/master/auto/config

So it looks like you are probably using the one that was provided there.

I am a bit puzzled, all in all. I have not heard of many others having this problem.

It's a bit of a long shot, but you could try the known-unstable RTAI kernel and the matching LinuxCNC version. (On the plus side, latency is likely to be rather better.)

linuxcnc.org/docs/stable/html/getting-st...#cha:Installing-RTAI

Your existing configs should work just the same with this setup.

23 Jun 2022 22:11 #245732 by tommylight
Does the BIOS have TPM? If so, try disabling it.
Also disable all virtualisation options; on some older Dells this will lock the PC solid with any RT kernel, but it works OK with a generic kernel. I'm not sure, but it was something to do with blacklisting the i915 graphics driver.
It would be helpful to clear dmesg, run LinuxCNC, and then, after it locks up and you reboot, copy the dmesg output. Usually there will be some traces of what was going on before the crash.
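One caveat: the kernel message buffer lives in RAM, so after a hard lock-up and reboot, plain `dmesg` normally shows only the new boot. If the systemd journal is configured for persistent storage, `journalctl -k -b -1` shows kernel messages from the previous boot. Otherwise, a minimal sketch like the one below could copy new `dmesg` lines to a file every few seconds and fsync it, so whatever was logged shortly before the freeze has a chance of surviving. The log path is hypothetical, and depending on `kernel.dmesg_restrict` the script may need to run as root.

```python
import os
import subprocess
import time
from typing import List, Optional

# Hypothetical output file; keep it on a real disk, not a tmpfs.
LOG_PATH = "/home/cnc/dmesg-watch.log"


def new_lines(last_seen: Optional[str], snapshot: List[str]) -> List[str]:
    """Return the lines that follow the last line already written.

    If that line is no longer in the buffer (buffer cleared or wrapped),
    the whole snapshot is treated as new.
    """
    if last_seen is not None:
        for i in range(len(snapshot) - 1, -1, -1):
            if snapshot[i] == last_seen:
                return snapshot[i + 1:]
    return snapshot


def read_dmesg() -> List[str]:
    """Run plain `dmesg` and return its output as a list of lines."""
    out = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return out.stdout.splitlines()


def watch(path: str = LOG_PATH, interval_s: float = 5.0) -> None:
    last_seen: Optional[str] = None
    with open(path, "a") as log:
        while True:
            snapshot = read_dmesg()
            for line in new_lines(last_seen, snapshot):
                log.write(line + "\n")
            if snapshot:
                last_seen = snapshot[-1]
            log.flush()
            os.fsync(log.fileno())  # force the data to disk before the next poll
            time.sleep(interval_s)


if __name__ == "__main__":
    watch()
```

Anything logged in the last few seconds before a freeze can still be lost; a shorter interval narrows that window at the cost of more disk writes.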

24 Jun 2022 02:17 #245752 by jcbryant
The catch is that I'm using a Mesa Ethernet card, and my understanding is that the RTAI kernels are definitely not compatible with these cards.

24 Jun 2022 02:24 #245753 by jcbryant
I'll try the dmesg idea and will report back.

The BIOS definitely doesn't support TPM.

I'm not sure what you mean by virtualizations, but I have disabled multi-threading.

The PC uses a NEX 604 motherboard (Intel NM10 chipset + Intel Atom D2550 processor). It's a small fanless unit intended for industrial use. Nothing fancy, and nothing to do with Dell. Seemed perfect for the job....

24 Jun 2022 12:56 #245778 by jcbryant
I cleared dmesg and ran LinuxCNC. Some time later LinuxCNC reported "Unexpected realtime delay on task 0 with period 1000000. Run the latency test and resolve before continuing". This was a bit distressing, as I thought that disabling multithreading had put paid to this issue. Evidently not completely...

After a while I checked (and re-cleared) dmesg. There were five messages, all of the same form but with increasing values:
perf: interrupt took too long (3171 > 3170) lowering kernel.perf
...
perf: interrupt took too long (7834 > 7832) lowering kernel.perf

More evidence of latency issues?
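These messages come from the kernel's perf subsystem rather than from LinuxCNC: each reports a measured interrupt time against the current limit, and the limit creeps up because the kernel lowers the perf sampling rate in response. To see how the values trend over a whole session, a minimal sketch like the one below extracts the two numbers from each such line. It assumes the full kernel wording "perf: interrupt took too long (X > Y)"; the lines quoted above were typed by hand and truncated after "kernel.perf", and the script name in the usage comment is hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

PERF_RE = re.compile(r"perf: interrupt took too long \((\d+) > (\d+)\)")


def perf_warnings(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (measured, limit) integer pairs from perf throttling messages."""
    for line in lines:
        m = PERF_RE.search(line)
        if m:
            yield int(m.group(1)), int(m.group(2))


if __name__ == "__main__":
    # Usage (hypothetical script name): dmesg | python3 perf_trend.py
    pairs = list(perf_warnings(sys.stdin))
    for measured, limit in pairs:
        print(f"measured={measured} limit={limit}")
    if len(pairs) >= 2:
        print("growth factor:", pairs[-1][0] / pairs[0][0])
```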

Ultimately I decided that the system was taking rather longer than usual to fail, and I wondered whether this might be because having the terminal application open was somehow stabilizing things.  So I closed the terminal application and, sure enough, the system froze shortly afterwards.

After rebooting I obtained the attached dmesg file.  Hopefully it will shed some light.