Systematic approach to tracking down latency issue
06 Mar 2019 01:42 #127847
by tommylight
Replied by tommylight on topic Systematic approach to tracking down latency issue
Quote: "And btw: I stocked up on Core 2 Duo based Dell Optiplexes and Fujitsus while they were still available. They have enough CPU horsepower for LinuxCNC with Axis and perform much better latency-wise than anything more modern I tried."
I did the same thing with Dell, but that stock is wearing thin lately!
Oh yeah, stay away from new Nvidia cards. New Radeon cards work fine... until you try to install the 3D drivers; after that, a reinstall is inevitable! I have an RX580 in the main computer that I use daily.
06 Mar 2019 19:12 #127906
by RichJordan
Replied by RichJordan on topic Systematic approach to tracking down latency issue
Thanks for your reply, hase.
I'm going to check memory usage and the hard drive's SMART capability.
I use Mesa FPGA boards (the 5i25 specifically) at work to run automation prototypes, with no problems. I think I'll buy one. They run on a 1 ms servo thread, right? I left my machine running the latency profiler for about 9-10 hours and found a maximum latency spike of 2.5 ms, which could be a problem with the FPGA boards. Any thoughts on how they would behave with that much latency?
Anyway, thanks again for all your thoughts.
Richard
07 Mar 2019 09:33 #127961
by hase
Replied by hase on topic Systematic approach to tracking down latency issue
The servo thread is scheduled at 1ms intervals, indeed.
This is the thread where LinuxCNC tracks the position of the tool (in the coordinate space) and calculates the next moves.
Basically, the output is the movement speed of the tool in space (and, derived from that, the speed of each joint) for the next interval.
If this is delayed by 2.5 periods (1 ms schedule and 2.5 ms lag in your case), that *will* result in problems: no matter how smart the servo thread is, it cannot compensate for this lag, I think. I'm not entirely sure.
To be honest: I use LinuxCNC only in my hobby shop and tend to be lazy about it. I have not analyzed the servo thread and how smart it is at handling lag, i.e. the difference between "I should have run at time x, but the current time is x + delta".
Still: a 2.5ms lag is hard or impossible to correct for.
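To make the "x + delta" idea concrete, here is a minimal Python sketch that takes the start times of a periodic thread and reports the worst deviation of the interval between consecutive starts from the nominal period, roughly the kind of number a latency test calls jitter. The function name `jitter_stats` and the input format (start times in nanoseconds) are assumptions for illustration; this is not how the LinuxCNC latency test is actually implemented.

```python
def jitter_stats(start_times_ns, period_ns):
    """Return (max_jitter_ns, overruns) for a periodic thread.

    max_jitter_ns: largest |interval - period| between consecutive starts.
    overruns: intervals of at least two periods, i.e. at least one
    scheduled run effectively went missing.
    """
    if len(start_times_ns) < 2:
        raise ValueError("need at least two start times")
    intervals = [b - a for a, b in zip(start_times_ns, start_times_ns[1:])]
    max_jitter_ns = max(abs(iv - period_ns) for iv in intervals)
    overruns = sum(1 for iv in intervals if iv >= 2 * period_ns)
    return max_jitter_ns, overruns


# 1 ms servo thread with one run delayed by 2.5 ms (hypothetical timestamps)
starts = [0, 1_000_000, 2_000_000, 5_500_000, 6_500_000]
print(jitter_stats(starts, 1_000_000))  # (2500000, 1)
```

Looking at intervals rather than absolute lateness keeps the sketch simple; a separate test program measured this way can still disagree with what a real LinuxCNC thread setup experiences.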
When I had the problem with the SMART readout from the hard drive, the lags at the peak were even longer (like 10 times longer), but IIRC the 2.5 ms could indicate a relation.
But back to the servo thread: the Mesa card runs on its own internal crystal oscillator, not on the 1 ms servo schedule.
The servo thread, however, does run on that schedule (see above), and it sends the commands (the speed of each joint) to the card, which then generates step pulses corresponding to that desired speed.
And again: that speed will be the wrong one if the calculation stalled for too long (and 2.5 ms is too long).
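A minimal Python sketch of this point, assuming an idealized step generator that integrates the last velocity command it received on its own clock, and a servo thread that computes a new velocity once per period. `stepgen_travel`, `missed_slots`, `ramp` and `cruise` are hypothetical names; the model deliberately ignores the closed-loop correction LinuxCNC applies afterwards, so it only shows the raw error a stale command leaves behind.

```python
def stepgen_travel(velocity_at, period_s, n_periods, missed_slots=frozenset(), substeps=100):
    """Distance (inches) covered by an idealized hardware step generator.

    velocity_at(t) is the velocity (in/s) the servo thread would command at time t.
    In slots listed in missed_slots the servo thread is late, so the card keeps
    running at the previous (stale) command.
    """
    dt = period_s / substeps              # the card's own, finer clock
    pos = 0.0
    vel_cmd = 0.0
    for k in range(n_periods):
        if k not in missed_slots:
            vel_cmd = velocity_at(k * period_s)   # fresh command from the servo thread
        for _ in range(substeps):
            pos += vel_cmd * dt                   # card integrates whatever it has
    return pos


def ramp(t):
    return 100.0 * t      # accelerating at 100 in/s^2 (roughly 1/4 G)


def cruise(t):
    return 1.0            # constant 1 in/s


# Two servo updates lost while accelerating: about 0.0003 in behind (3 * a * T^2).
print(stepgen_travel(ramp, 0.001, 10) - stepgen_travel(ramp, 0.001, 10, {2, 3}))
# The same loss at constant velocity makes no difference.
print(stepgen_travel(cruise, 0.001, 10) - stepgen_travel(cruise, 0.001, 10, {2, 3}))  # 0.0
```

In this toy model a late update only costs position while the commanded velocity is changing; at constant velocity the stale command happens to be the right one.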
My problem with the latency test in 2014 was this: it showed me a measured value and a peak.
But even on a system where the latency test showed no problems, actual LinuxCNC had them. Axis reports this as a popup.
This is easily missed, and once you click it away it never appears again, because all future errors are ignored. This is a bit dangerous (but I support the design decision behind it, btw).
And I have at least one system where LinuxCNC runs fine, but the latency test shows large latency peaks.
That, btw, was my reason for starting this thread back in the day: the latency test obviously uses a different method to measure the delay between thread invocations, or a different thread setup, or differs in some other way from an actual LinuxCNC thread setup.
So I followed all the advice in the wiki, such as:
- disabling CPU power-saving and sleep states (since I do not care about saving a watt there)
- binding interrupt handling to a core (did that; no effect actually, but no harm either)
and in the BIOS settings I disabled everything I thought could influence RT performance: all unnecessary peripherals, virtualization, etc.
A really unsystematic, shotgun-type approach.
So, the short moral: you must test with an actual machine setup and see if Axis reports errors.
I ended up running the actual machine (with the drives disabled, to conserve power and spare my nerves the noise) for a day or so: just the actual machine config and a repeating G-code sequence resembling actual parts (I created a file with actual parts in it in CAM and edited it into an endless loop). I let that run as an air cut (no tool), then disabled the drive power and let the controller run over the weekend.
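A soak-test file like that can also be generated. The Python sketch below is one assumed way to do it: instead of an endless loop, it repeats the body of an existing CAM program a fixed number of times, strips program-end words (M2/M30) so the run does not stop after the first pass, and appends a single M2. The function name and the regular expression are illustrative assumptions; as with hase's test, air-cut the result first. A very large repeat count makes the file (and its Axis preview) correspondingly large.

```python
import re

# Matches program-end words (M2, M02, M30), optionally preceded by an N line number.
_PROGRAM_END = re.compile(r"^\s*(N\d+\s*)?M0*(2|30)\b", re.IGNORECASE)


def make_soak_program(body_lines, repeats):
    """Return G-code text that runs body_lines `repeats` times, then ends with M2."""
    kept = [ln for ln in body_lines
            if ln.strip() != "%" and not _PROGRAM_END.match(ln)]
    out = ["(soak test: body repeated %d times)" % repeats]
    for i in range(repeats):
        out.append("(pass %d)" % (i + 1))
        out.extend(kept)
    out.append("M2")
    return "\n".join(out) + "\n"


# Hypothetical CAM output for a small part
part = ["G21", "G90", "G0 X0 Y0", "G1 X10 Y5 F500", "G0 X0 Y0", "M30"]
print(make_soak_program(part, 2))
```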
And when the machine worked fine within my parameters for "fine", I invoked the cardinal rule of IT ("never change a running system") and let laziness win over the engineer's desire for a systematic approach to the problem.
The following user(s) said Thank You: RichJordan
07 Mar 2019 16:55 #128001
by Mike_Eitel
Replied by Mike_Eitel on topic Systematic approach to tracking down latency issue
@PCW
I wonder what actually happens in a Mesa stepper configuration when two or three servo cycles are lost to latency.
Are the corresponding steps lost, or does that produce something like an "input" jump with a corresponding following error (ferror)?
Mike
07 Mar 2019 21:08 #128031
by Todd Zuercher
Replied by Todd Zuercher on topic Systematic approach to tracking down latency issue
It will react very similarly to how a servo would in the same situation. What happens depends on whether there was a velocity command change during the latency overrun and how large that change was. If the latency overrun is large enough, the watchdog will bite and shut down the system (more than 4 cycles will probably cause a shutdown). Basically, any commanded change in pulse rate will be delayed by the bad timing, and then the system will correct if any correction is needed. (The actual creation of pulses is closed loop, like on a real servo.)
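The watchdog behaviour can be illustrated with a toy model. The Python sketch below assumes a watchdog that is re-armed by every servo-thread write and latches ("bites") if the gap between writes ever exceeds its timeout. The 5 ms timeout is an assumption chosen to match "more than 4 cycles" at a 1 ms servo period, not a claim about any Mesa default, and `ToyWatchdog`/`survives` are hypothetical names, not the hostmot2 driver.

```python
class ToyWatchdog:
    """Re-armed by each servo write; latches once a gap exceeds the timeout.

    A real hardware watchdog fires as soon as its timer expires; checking the
    gap at the next write gives the same bite/no-bite verdict for every gap
    between two writes.
    """

    def __init__(self, timeout_s, start_s=0.0):
        self.timeout_s = timeout_s
        self.last_write_s = start_s
        self.has_bitten = False

    def write(self, now_s):
        if now_s - self.last_write_s > self.timeout_s:
            self.has_bitten = True        # latched: stays set until explicitly cleared
        self.last_write_s = now_s


def survives(write_times_s, timeout_s=0.005):
    """True if no gap between consecutive writes exceeds the timeout."""
    wd = ToyWatchdog(timeout_s, start_s=write_times_s[0])
    for t in write_times_s:
        wd.write(t)
    return not wd.has_bitten


# 1 ms servo thread with one write 2.5 ms late: tolerated.
print(survives([0.000, 0.001, 0.002, 0.0055, 0.0065]))   # True
# A 6 ms stall: the watchdog bites.
print(survives([0.000, 0.001, 0.007, 0.008]))            # False
```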
07 Mar 2019 21:54 #128032
by Mike_Eitel
Replied by Mike_Eitel on topic Systematic approach to tracking down latency issue
Yes, maybe...
But I have never read about any kind of regulator (position/speed) inside the Mesa card.
I guess it is similar to position regulation: the PC reads the actual position, compares it against its internal model, and then sends the commanded position to the card.
That's why I would love for PCW to explain how it is implemented.
Mike
07 Mar 2019 21:56 #128033
by PCW
Replied by PCW on topic Systematic approach to tracking down latency issue
In addition to what Todd stated: if you are moving at a constant velocity, nothing would happen, except that the control loop in LinuxCNC will actually make a bogus correction, because it will have new position command data but stale position feedback data.
You can avoid this bogus correction with some HAL plumbing, so that an occasional dropped or timed-out update causes minimal disruption.
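The bogus correction comes from comparing a fresh position command with a stale position feedback sample. The minimal Python sketch below shows the size of that phantom error (velocity times one servo period) and one hypothetical way to suppress it: when the feedback is known to be stale, extrapolate it forward by one period at the current velocity (simple dead reckoning) before computing the error. The function and its `fb_known_stale` flag are assumptions for illustration; PCW does not say which HAL components he has in mind, and this is not that configuration.

```python
def following_error(cmd_pos, fb_pos, velocity, period_s, fb_known_stale=False):
    """Position error (inches) a servo loop would act on.

    If the feedback sample is known to be one cycle old, move it forward by
    one period at the current velocity before comparing.
    """
    if fb_known_stale:
        fb_pos += velocity * period_s
    return cmd_pos - fb_pos


v, T = 1.0, 0.001              # cruising at 1 in/s with a 1 ms servo thread
cmd = 5 * v * T                # new command for cycle 5
stale_fb = 4 * v * T           # feedback still from cycle 4 (the read was late)

print(following_error(cmd, stale_fb, v, T))                       # ~0.001 in of phantom error
print(following_error(cmd, stale_fb, v, T, fb_known_stale=True))  # ~0.0
```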
The actual (transient) error caused by a late velocity update is fairly small.
On machines with nominal acceleration, say you have a CNC machine with 1/4 G
acceleration and a 1 ms servo thread:
1/4 G = ~100 IPS/S, and 100 IPS/S = 0.1 IPS/ms,
so a velocity error of 0.1 IPS for 1 ms = 0.0001"
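These back-of-the-envelope numbers can be reproduced with a few lines of Python. The sketch uses the same simplification as the calculation above: a late update leaves a velocity error equal to acceleration times the delay, held for that delay, so the position error is roughly acceleration × delay². Plugging in Richard's 2.5 ms spike is an extrapolation of that rule of thumb, not a figure from the thread.

```python
STANDARD_GRAVITY_IN_S2 = 9.80665 / 0.0254   # 1 G expressed in in/s^2 (about 386)


def late_update_error_in(accel_g, delay_s):
    """Rough position error (inches) from one late velocity update."""
    accel = accel_g * STANDARD_GRAVITY_IN_S2     # in/s^2 (1/4 G is ~100 IPS/S)
    velocity_error = accel * delay_s             # in/s missed by the late update
    return velocity_error * delay_s              # held for the whole delay, in inches


print(f'{late_update_error_in(0.25, 0.001):.4f}"')   # 0.0001"  (1 ms late)
print(f'{late_update_error_in(0.25, 0.0025):.4f}"')  # 0.0006"  (a 2.5 ms spike)
```

Because the error grows with the square of the delay, a 2.5 ms spike costs roughly six times as much as a 1 ms one under this approximation, but is still well under a thousandth of an inch at 1/4 G.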
The following user(s) said Thank You: Mike_Eitel, RichJordan
07 Mar 2019 22:14 #128034
by Mike_Eitel
Replied by Mike_Eitel on topic Systematic approach to tracking down latency issue
THX
I think this clarifies once and for all that occasionally losing a few milliseconds to latency will not be noticed on normal machines.
I know why I like your products.
And to be even more clear:
I cannot understand why some people, for good reason, spend good money on good mechanics, and then try to save a few dollars with cheap parallel port solutions. Especially when software "pulsing" can never be as smooth/precise as hardware generation.
m5c
Mike
08 Mar 2019 01:12 #128052
by tommylight
Replied by tommylight on topic Systematic approach to tracking down latency issue
I have to agree with you on that: "I cannot understand why some people, for good reason, spend good money on good mechanics, and then try to save a few dollars with cheap parallel port solutions."
The parallel port is ... was a mighty device. I used it to copy files from one PC to another some 20-odd years ago, as it was much faster than any floppy at the time. And it never fails to work.
And I have a lot of Mesa boards, a lot!
It all depends on a lot of factors.
08 Mar 2019 06:48 #128060
by Mike_Eitel
Replied by Mike_Eitel on topic Systematic approach to tracking down latency issue
Yes. And for light, small, or low-effort machines, or "just to have fun", etc., the LinuxCNC parport solution is a super solution. Good for both worlds. Big THX to the developers.