EtherCAT Communication Issues, Lost Frames, Interruptions in Program Execution

16 Sep 2024 08:50 - 16 Sep 2024 09:01 #310239 by stenly
Hello again everyone. Thank you all for your responses and for letting me know I shouldn't waste my time with RTAI.

I believe I have managed to solve the latency errors and am now at the next roadblock.

I had previously passed isolcpus=1 to grub as a kernel parameter, however I read up more and realized that was incorrect. LinuxCNC uses the last core for realtime processes from what I understand. I changed that to isolcpus=2,3 (extra core helped even more) and the latency is in my opinion very good now. <10k on idle and no more than 55k even with 20 instances of glxgears open (I should have tried glxspheres too, but I'm not sure how to install it on Debian). I let it run for quite a bit and there were also no random spikes to speak of.
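For anyone wanting to double-check which cores are actually isolated after editing grub, here is a minimal Python sketch that parses the `isolcpus=` entry out of a kernel command line. The function name and the example command line are made up for illustration; on a real machine the string would come from `/proc/cmdline`.

```python
def parse_isolcpus(cmdline: str) -> set[int]:
    """Return the CPU indices named by isolcpus= in a kernel command line."""
    cpus: set[int] = set()
    for token in cmdline.split():
        if not token.startswith("isolcpus="):
            continue
        for part in token.split("=", 1)[1].split(","):
            if "-" in part:
                # A range such as 2-3 covers both ends.
                lo, hi = part.split("-", 1)
                cpus.update(range(int(lo), int(hi) + 1))
            elif part.isdigit():
                # A single CPU index such as 2.
                cpus.add(int(part))
            # Anything else (non-numeric flags) is ignored in this sketch.
    return cpus


# Hypothetical command line; replace with open("/proc/cmdline").read().
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet isolcpus=2,3"
print(sorted(parse_isolcpus(example)))
```

If the printed list does not match what was put in grub, the parameter probably never reached the running kernel (for example, because the grub configuration was not regenerated).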

Now the issue I will look into is a random "joint X following error" or a "joint X amplifier fault" (a seemingly random joint from 0-4 each time) at odd times. I had been getting this error previously as well, but I neglected to mention it because I thought it was all due to the latency. Sometimes it occurs while my test program is running, but the weird thing is that it also happens while the machine is idle. Even more strangely, one time the program completed successfully with no issues, and the joint following error appeared immediately after it ended. All of this leads me to rule out latency as the problem now that I have improved it - I doubt the latency would "wait" for the program to finish before it caught up.

I also tested the same machine on another small fairly modern i3 >3GHz PC (the onboard one is around 1.2GHz) just to be sure. Same situation.

Only now that I look into it, do I think to focus on the word "following" in the following error. The .ini file does have an FERROR variable which the previous colleague has set to 0.5 all across and MIN_FERROR to 0.05.
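As a rough illustration of how these two values are usually described as working together: the allowed following error ramps up with commanded velocity, reaching FERROR at the joint's maximum velocity, and never drops below MIN_FERROR. The sketch below is only a model of that rule as the LinuxCNC INI documentation describes it, not code taken from LinuxCNC. The function name is hypothetical, and the 100 mm/s maximum velocity assumes the joint limit matches the 6000 mm/min mentioned in the PS.

```python
def ferror_limit(vel_cmd: float, max_vel: float,
                 ferror: float, min_ferror: float) -> float:
    """Allowed following error (machine units) at a commanded velocity."""
    if max_vel <= 0:
        # No usable velocity limit: fall back to the floor value.
        return min_ferror
    # Scale FERROR by the fraction of maximum velocity being commanded.
    scaled = ferror * abs(vel_cmd) / max_vel
    # MIN_FERROR acts as a floor at low speed and at standstill.
    return max(scaled, min_ferror)


# Values from the post: FERROR = 0.5, MIN_FERROR = 0.05.
# Assumed joint maximum velocity: 100 mm/s (6000 mm/min).
for vel in (0.0, 5.0, 50.0, 100.0):
    print(vel, ferror_limit(vel, 100.0, 0.5, 0.05))
```

Under this model an idle joint is only allowed MIN_FERROR of deviation, so a fault at standstill means the reported position moved by more than that floor with no motion commanded.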

I am going to spend the day reading the wiki page on the "Following Error" and making sense of it. In the first paragraph it says we can be almost certain this error implies a mistake in the .ini. I thought I'd post this beforehand to give an update and to let you kind folks know not to bother with the RTAI line of thinking anymore.

I am attaching my .ini once again just in case someone may have the time to point me in the right direction.

Also, since my issues have moved rather out of scope of the EtherCAT forum subsection, do let me know if I should make another post in a different section.

PS. Neglected to mention that the maximum velocity of the configuration is 6000mm/min. Interestingly, I have the impression that it works much better at lower velocities in the 2-3k range, which should additionally point to the FERROR not being configured properly. I can't conclude it for certain since my test program is rather long and I haven't had the time to extensively test it at even lower speeds.
Last edit: 16 Sep 2024 09:01 by stenly. Reason: Added MIN_FERROR value

16 Sep 2024 13:24 #310261 by stenly
Some reading later and now I am investigating for DC Sync misconfigurations or problems.

Firstly, I noticed the BASE_THREAD in the .ini was set to 0. Why that was the case? I have no idea. I set it to 60000 as that was what the latency test reported and then decided to give it more headroom and set it to 100000. I'm still testing if this helped or not. I couldn't find any documentation on this value for servos, just for step motors. I'm not sure if the step motor logic can be applied to servos in this case. Could that be why the value was set to 0 (as in, is it irrelevant for servos somehow)?
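To make the units concrete, here is a small Python sketch that reads the thread periods from an INI file and converts them from nanoseconds to a frequency. It assumes that "BASE_THREAD" above refers to the `BASE_PERIOD` entry in the `[EMCMOT]` section; the sample INI text, including the `SERVO_PERIOD` value, is invented for illustration and is not the attached config.

```python
import configparser

# Hypothetical INI fragment; the values are placeholders.
SAMPLE_INI = """
[EMCMOT]
EMCMOT = motmod
BASE_PERIOD = 100000
SERVO_PERIOD = 1000000
"""


def thread_periods(ini_text: str) -> dict[str, int]:
    """Return the BASE_PERIOD / SERVO_PERIOD entries (in ns) that are present."""
    # strict=False tolerates repeated keys, which LinuxCNC INI files may contain.
    cfg = configparser.ConfigParser(strict=False, interpolation=None)
    cfg.read_string(ini_text)
    periods: dict[str, int] = {}
    for key in ("BASE_PERIOD", "SERVO_PERIOD"):
        if cfg.has_option("EMCMOT", key):
            periods[key] = int(cfg.get("EMCMOT", key))
    return periods


for name, ns in thread_periods(SAMPLE_INI).items():
    if ns > 0:
        print(f"{name}: {ns} ns, about {1e9 / ns:.0f} Hz")
    else:
        print(f"{name}: {ns} (zero, so no frequency can be derived)")
```

A key that is missing from the file simply does not show up in the result, which makes it easy to see whether a base period is configured at all.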

I can definitely confirm that changing refClockSyncCycles in the .xml affects the machine behavior greatly. Initially I had a wrong understanding of what that value does, so I set it to 60000 as well... That led to extremely violent noises from the machine. I then set it back to 4, as my previous colleague had left it. Definite improvement, but there are still weird vibrations and noises sometimes, especially during homing. I then tried 1, 0 and -1, and -1 seems to give the best result; however, the amplifier faults and following errors are still there.

From what I read, refClockSyncCycles="-1" used to be something only available if you additionally patched LinuxCNC, but from 2.9 onward that patch has been merged into the version that comes standard with the ISO? Could someone explain to me in more detail what this does? From what I get this is the so called "Slave Synchronization Mode" which is dependent on the master appTimePeriod and the PID values? However, my configuration uses digital servos, which should not require a PID config, correct?

Another thing I did was to reduce the axis acceleration rates drastically in the hopes that that would catch the error, but it doesn't seem it did yet. Now it does not reach more than 3200 velocity, even if the max is set to 6000mm/min.

A different approach I read about is the sync0Shift value of each slave, however I read that it must be the same as the BASE_THREAD. And that is already the case in my config.
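To compare the values discussed above side by side, here is a minimal Python sketch that pulls `appTimePeriod`, `refClockSyncCycles` and each slave's `sync0Shift` out of an EtherCAT config file. The sample XML is a made-up fragment in the layout I believe the linuxcnc-ethercat config uses (a `master` element containing `slave` elements with an optional `dcConf` child); the slave name, type and all values are placeholders, not the actual config from this machine.

```python
import xml.etree.ElementTree as ET

# Hypothetical config fragment for illustration only.
SAMPLE_XML = """
<masters>
  <master idx="0" appTimePeriod="1000000" refClockSyncCycles="-1">
    <slave idx="0" type="generic" vid="00000000" pid="00000000" name="drive0">
      <dcConf assignActivate="300" sync0Cycle="*1" sync0Shift="0"/>
    </slave>
  </master>
</masters>
"""


def _int_or_none(value):
    """Convert an attribute string to int, keeping None for a missing attribute."""
    return None if value is None else int(value)


def dc_settings(xml_text: str) -> list[dict]:
    """List the distributed-clock related settings for every slave."""
    root = ET.fromstring(xml_text)
    rows = []
    for master in root.iter("master"):
        for slave in master.iter("slave"):
            dc = slave.find("dcConf")  # None when the slave has no DC config
            rows.append({
                "slave": slave.get("name", slave.get("idx")),
                "appTimePeriod": _int_or_none(master.get("appTimePeriod")),
                "refClockSyncCycles": _int_or_none(master.get("refClockSyncCycles")),
                "sync0Shift": None if dc is None else _int_or_none(dc.get("sync0Shift")),
            })
    return rows


for row in dc_settings(SAMPLE_XML):
    print(row)
```

Printing one row per slave makes it easier to spot a drive whose `sync0Shift` differs from the others, or a master period that does not match the servo thread period in the .ini.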

I have not yet tried playing around with the FERROR values. I will try that tomorrow. I am seeing conflicting reports on them, though. The docs state that 0.0 is the minimum and 1.0 is the maximum, right? But even then, across the forums I see people with various configurations that have the value set as high as 5.0 or even 10.0?

16 Sep 2024 16:34 #310269 by tommylight
I do not think you need a base period for EtherCAT, so if that is the case it would be good to remove it.
Ferror values for metric machines are usually 0.1 and 1; maybe someone with a working machine can chime in on what those values should be, as they can cause joint errors.
An amplifier fault is most probably caused by interference, if those pins are used in HAL and attached to the drives, but it can also come from the drive faulting, so check the lights on the drive when it happens to be sure.
The following user(s) said Thank You: stenly

17 Sep 2024 09:31 - 17 Sep 2024 09:32 #310295 by stenly
Thank you, Tommy, happy to say that removing the base period and adjusting the FERROR values to 0.1-1.0 has yielded results. The machine has been running my very long test program for quite a few hours now with no issues yet. I will continue testing and gradually increasing the acceleration and velocity.

Hopefully this will be the end of this saga, I will report again after a day or three of thorough testing.
Last edit: 17 Sep 2024 09:32 by stenly.
The following user(s) said Thank You: tommylight

17 Sep 2024 10:36 #310300 by tommylight
Glad it worked out and thank you for reporting back.
The following user(s) said Thank You: stenly

17 Sep 2024 20:19 #310344 by bkt

I am pretty sure that RTAI is not supported by the EtherCAT debs. For starters, the debs use DKMS, so the kernel modules are compiled on your computer, and I suspect that requires a mainstream kernel to fetch the linux-headers source code.
 

It seems that, as of today, DKMS may not work at all under kernel 6.10.9-1. Updating to these headers uninstalled all the DKMS files, and I did not see them installed again during the update process. Check this if you can; it may be a problem. It also seems we may lose libc6 and other necessary libraries. I don't have a machine to test whether the new kernel is fully compatible with LinuxCNC and EtherCAT, or whether the right libc6 and DKMS files can be installed again.
 

18 Sep 2024 09:27 #310359 by rodw
I think you are doing it the hard way.... Use the linuxcnc iso and follow the sticky...

18 Sep 2024 18:07 #310394 by bkt

I think you are doing it the hard way.... Use the linuxcnc iso and follow the sticky...
 

I'm not using RTAI. I'm just asking if you've checked the new kernel 6.10.9-1. LinuxCNC works fine on my 6.10.1 kernel installation; it's not LinuxCNC that worries me.
Regards

19 Sep 2024 06:32 - 19 Sep 2024 06:57 #310423 by stenly
Hi again.

After some testing, I can say I have had no more program interruptions. I've been playing around with the max velocities and accelerations of each axis and joint to find a sweet spot where the machine does not produce those unpleasant vibrations and knocking noises. There is, however, something still weird about it.

When running LinuxCNC and even some light app like a file manager or something, everything is in order. This may sound silly, but whenever there is a terminal open and I am interacting with it, the vibrations happen. So much so that when I have for example my .hal open in vim in the terminal and I use the mouse scroll wheel to navigate the file, the vibrations are in tandem with my scrolling of the wheel. Extremely weird in my opinion, yet I can definitely say it's not a joke.

Any idea what could be causing this? Granted, I am not too worried, as I am not planning on having any terminal (or anything for that matter) be opened alongside LinuxCNC when the machine is in production.

PS. Could anyone help me out in making Axis GUI not set the max velocity to the maximum allowed when I start LinuxCNC? This behavior could be dangerous if someone were to run a poorly written program, so I'd like to have it set to a smaller value by default on startup.
Last edit: 19 Sep 2024 06:57 by stenly.

19 Sep 2024 14:25 #310456 by tommylight
Try a different mouse. If the mouse cable is nicked or worn, it may cause huge delays in communications, or it might be interference or a bad USB chip in the mouse. Mice are cheap.
Also, is the mouse a PS/2 one or USB?
The following user(s) said Thank You: stenly
