Performance issues after switching from LCNC 2.8.4 Buster to 2.9.3 Bookworm

16 Oct 2024 09:28 - 16 Oct 2024 09:37 #312229 by BaxEDM
Hi,
Just switched from LCNC 2.8.4 Buster to 2.9.3 Bookworm and am running into what seems to be a performance issue. Occasionally I get a random servo following error, which is impossible since I run my steppers open loop. I use EtherCAT stepper drives. These errors stop G-code execution at random moments and turn the machine off, making it unusable. 
I remember that when I first tried out LCNC on a really old laptop I got this as well. I then switched to a dedicated mini PC and never saw those errors again, until now. I'm running the same setup on the same PC, but with LCNC 2.9.3 and Bookworm, and the occasional servo errors have returned.

The mini PC is a Minisforum GK41 with 8 GB of memory and a 4-core Intel Celeron running at 2-2.7 GHz.

My initial thought was to replace the PC with something faster, but that might not be the right approach since the hardware is probably already fast enough. It works like a charm on 2.8.4 + Buster. Could it be that Bookworm is so much more demanding? I would like to hear what the experts on the forum recommend.
Last edit: 16 Oct 2024 09:37 by BaxEDM.

16 Oct 2024 12:25 #312247 by Aciera
(Maybe an admin could move this to the EtherCAT section)

16 Oct 2024 12:56 #312257 by Todd Zuercher
Have you checked to see if your new installation is having latency issues? Just because your PC had acceptable latency with Buster, does not mean that it will with Bookworm (at least not necessarily right out of the box). You may have to tweak some kernel parameters to improve the latency to acceptable levels.

I'm not completely familiar with how LinuxCNC EtherCAT configurations work or should be set up, but at a minimum there must be some form of feedback loop for LinuxCNC. Usually, for open-loop systems, the stepgen sends a position feedback signal back to LinuxCNC. If that feedback signal comes from an external hardware stepgen, latency that delays the feedback reads can produce a following error large enough to raise an alarm, especially if the following error limits are set too tight.
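
To make the delayed-read effect concrete: if an axis moves at constant velocity and the position feedback arrives late, the reported position lags by roughly velocity times delay. The sketch below only illustrates that arithmetic in Python. The function name and the numbers (50 mm/s, one 1 ms servo period of delay) are assumptions for illustration, not values from this machine.

```python
def apparent_following_error(velocity_mm_s: float, delay_s: float) -> float:
    """Position lag (mm) seen when feedback arrives delay_s late at constant velocity."""
    return abs(velocity_mm_s) * delay_s


# Hypothetical example: 50 mm/s feed, feedback read one 1 ms servo period late.
lag_mm = apparent_following_error(50.0, 0.001)
print(f"apparent following error: {lag_mm:.3f} mm")
```

Under this simple model, a following error limit smaller than the computed lag would trip on a single late read, even though the motor itself never lost position.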

Did you reuse the old working machine configuration you created in Buster, or did you start over making a new config? (If you started fresh, is it possible you didn't configure something the same?)

You might need to post copies of your configuration files to get better answers.
The following user(s) said Thank You: Aciera

16 Oct 2024 13:27 #312262 by tommylight

Aciera wrote: (Maybe an admin could move this to the EtherCAT section)

Done.
Thank you.

17 Oct 2024 11:09 - 17 Oct 2024 11:09 #312356 by BaxEDM
I checked out the latencies:

On LCNC 2.8.4 + Buster:
Servo thread (1ms): Max interval 1024713 ns, Max Jitter 25556 ns
Base thread (25us): Max interval 65792 ns, Max Jitter 40792 ns

On LCNC 2.9.3 + Bookworm:
Servo thread (1ms): Max interval 1069862 ns, Max Jitter 69928 ns
Base thread (25us): Max interval 120306 ns, Max Jitter 95306 ns

On LCNC 2.9.3 + Bookworm the latencies are indeed higher, especially for the base thread, where the max interval has almost doubled and the max jitter has more than doubled.
Both machines run the exact same hardware and config. I would like to know more about "tweak some kernel parameters to improve latency to acceptable levels". I'm unfamiliar with that.
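
For anyone who wants the comparison as numbers, here is a small Python sketch that divides each Bookworm figure by the matching Buster figure. The values are copied from the measurements above; the dictionary keys and the helper name are just labels chosen for this example.

```python
# Figures copied from the latency tests above (all in nanoseconds).
buster = {
    "servo max interval": 1024713,
    "servo max jitter": 25556,
    "base max interval": 65792,
    "base max jitter": 40792,
}
bookworm = {
    "servo max interval": 1069862,
    "servo max jitter": 69928,
    "base max interval": 120306,
    "base max jitter": 95306,
}


def ratios(old: dict, new: dict) -> dict:
    """Return new/old for every measurement present in both dicts."""
    return {name: new[name] / old[name] for name in old if name in new}


for name, ratio in ratios(buster, bookworm).items():
    print(f"{name}: {ratio:.2f}x")
```

With these figures the max jitter more than doubles on both threads, while the servo thread's max interval grows by only a few percent.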
Last edit: 17 Oct 2024 11:09 by BaxEDM.

17 Oct 2024 11:10 #312357 by rodw
Your PC should be fine, but mini PCs have some power-saving features.

This document of mine covers a lot of this, but primarily for Mesa cards.
docs.google.com/document/d/1jeV_4VKzVmOI...diY/edit?usp=sharing
I am assuming it will be an Intel NIC, but please check, as Realtek needs a different path.
Start with #21, #22 and #23.
If it is Intel, you may benefit from #27 (which Mesa needs).
Also see the forum scripts for CPU affinity.
My theory is that EtherCAT is not as stringent as Mesa because it has its own NIC drivers, but you might prove me wrong!
The following user(s) said Thank You: Aciera

17 Oct 2024 11:22 #312358 by tommylight
For use with Mesa boards, there is no need for base period, so do latency tests with:
latency-histogram --nobase --sbinsize 1000 --show
Some RealTek network cards have issues with 6.x kernels, Rod has some nice documentation on how to fix it.
Some Intel network cards have coalescing enabled, look for PCW's howto for setting it to 0.
If using EtherCAT, see the wealth of info from Rod and some other members on this forum.
-
Sorry for the vague answers, but I'm on the phone and in a hurry.

17 Oct 2024 11:36 #312362 by Aciera
Thanks Rod. I find it rather unfortunate that somebody like you does not seem to be offered write access to the 'official' install documentation. The pull request process is so sluggish that by the time new install docs actually make it through, they're already outdated.
The following user(s) said Thank You: rodw

18 Oct 2024 07:00 - 18 Oct 2024 08:02 #312472 by BaxEDM
Hi Rod,

I went through your document. My NIC is identified as:

edmuser@edmpc:~$ lspci -k | grep -A 3 -i ethernet
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
Subsystem: IP3 Tech (HK) Limited RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller
Kernel driver in use: r8168
Kernel modules: r8168
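
Since the Realtek and Intel paths differ, it can help to pull the controller and the bound driver out of the `lspci -k` text automatically. The following is a minimal Python sketch, not part of Rod's document; `nic_summary` is a name made up for this example, and the sample text is the output shown above.

```python
import re


def nic_summary(lspci_text: str) -> dict:
    """Pull the controller description and bound driver out of `lspci -k` text."""
    controller = re.search(r"Ethernet controller:\s*(.+)", lspci_text)
    driver = re.search(r"Kernel driver in use:\s*(\S+)", lspci_text)
    description = controller.group(1).strip() if controller else None
    return {
        "controller": description,
        "driver": driver.group(1) if driver else None,
        # Simple vendor check based on the description string only.
        "is_realtek": bool(description and "Realtek" in description),
    }


sample = """02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
Subsystem: IP3 Tech (HK) Limited RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller
Kernel driver in use: r8168
Kernel modules: r8168"""

info = nic_summary(sample)
print(info["driver"], info["is_realtek"])
```

The sketch only looks at the first Ethernet controller it finds, so a machine with several NICs would need the text split per device first.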

I updated the driver according to the document and rebooted. Next, I would like to add the settings r8168.aspm=0 r8168.eee_enable=0 pcie_aspm=off loglevel=3.
If I add r8168.aspm=0 or r8168.eee_enable=0, Grub Customizer does not let me save and throws a "not found" error. I was able to save the pcie_aspm=off, loglevel=3 and quiet isolcpus=2,3 settings.

I will run tests to see whether the new driver and the settings I did manage to save already fix my issue. Since you suspect that Energy Efficient Ethernet might be a root cause of the problem, I really want to turn it off, but can't.
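
One way to confirm after a reboot which of these settings actually reached the kernel is to compare the wanted list against the running kernel command line, which Linux exposes in /proc/cmdline. Below is a minimal Python sketch of that check. The function name is made up for this example, and the example command line is hypothetical (the boot image and root device in particular are placeholders).

```python
def missing_kernel_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the entries of `wanted` that do not appear as whole tokens in cmdline."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


WANTED = [
    "r8168.aspm=0",
    "r8168.eee_enable=0",
    "pcie_aspm=off",
    "loglevel=3",
    "isolcpus=2,3",
]

# Hypothetical command line, standing in for the contents of /proc/cmdline.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet isolcpus=2,3 pcie_aspm=off loglevel=3"
print(missing_kernel_params(example, WANTED))
```

On the real machine the first argument would be the text read from /proc/cmdline, for example `open("/proc/cmdline").read()`.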
Last edit: 18 Oct 2024 08:02 by BaxEDM.

18 Oct 2024 11:03 - 18 Oct 2024 11:08 #312487 by BaxEDM
Yes! The problem is gone, even without the r8168.aspm=0 and r8168.eee_enable=0 settings. Earlier I had also tried quiet isolcpus=2,3 on its own, and although the latency improved, it did not resolve the EtherCAT servo following errors.

My conclusion for now is that updating the Realtek drivers according to Rodw's document did the trick.

Some additional info. For my implementation, the EtherCAT reading and writing combined takes 14% of the 1 ms servo loop time.
To check that you can get the number of CPU ticks for each with:
halcmd show param lcec.read-all.tmax
halcmd show param lcec.write-all.tmax
Add those together and divide by the CPU frequency (in Hz). That gives you the total time EtherCAT requires in each servo loop iteration.
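
The conversion above can be sketched in a few lines of Python. This assumes, as described here, that both tmax values are in CPU ticks; if your build reports them in another unit, the conversion changes accordingly. The function name, the two tick counts and the 2.0 GHz clock are hypothetical values, picked only so that the result lands on the 14% mentioned above.

```python
def ethercat_load(read_ticks: int, write_ticks: int, cpu_hz: float,
                  servo_period_s: float = 1e-3):
    """Return (time_in_seconds, fraction_of_servo_period) for read + write combined."""
    seconds = (read_ticks + write_ticks) / cpu_hz
    return seconds, seconds / servo_period_s


# Hypothetical tmax readings and clock rate, not measured values.
seconds, fraction = ethercat_load(read_ticks=180_000, write_ticks=100_000, cpu_hz=2.0e9)
print(f"{seconds * 1e6:.0f} us per cycle, {fraction:.0%} of the servo period")
```

With these example numbers, 280,000 ticks at 2.0 GHz come to 140 us, which is 14% of a 1 ms servo period.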

Now I can finally write a detailed update/installation procedure for my customers and release my new EDM components.
Many thanks again! Great way to start the weekend :-)
Last edit: 18 Oct 2024 11:08 by BaxEDM.
The following user(s) said Thank You: tommylight
