FAQ - The Latency Problem
08 Apr 2012 10:03 - 22 Apr 2012 13:27 #19022
by ArcEye
The Latency Problem
*******************
Latency must be the single most vexed and most frequently discussed subject on this section of the forum when it comes to running Linuxcnc.
From the outset it has to be acknowledged that some computers cannot, and never will, run Linuxcnc properly.
This is down to their design, chipsets etc.; they are just inherently unsuited to the task.
The whole ethos of Linuxcnc is to be able to run real-time NC machines on low-spec, cheap computer hardware.
If you find you have a lemon, don't waste time on it; get something else that works.
(wiki.linuxcnc.org/cgi-bin/wiki.pl?Latency-Test has a list of computers that do work and some that definitely don't.)
What is latency
***************
A computer cannot act on something instantaneously, and the amount of waiting time between an input and its output is called latency.
For the delay between input and output to be perceived as non-existent (in other words, for a computer to "react in real time"), the latency must be low.
Latency must also be consistent, without fluctuations or spikes.
Having a low base figure with occasional much longer delays is actually far more disruptive to smooth running than a higher base figure which is constant.
If you know you can only get xxx pulses per second, which will limit your top velocity to yyy, you can live with that and work within the limits of the hardware.
If, however, you think you can get XXX+ pulses but occasionally only get xx-, and this causes missed steps and following errors, wrecking a workpiece... well, you get the picture.
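To put rough numbers on the xxx and yyy above, here is a minimal Python sketch of the arithmetic. It assumes software step generation where each step takes two base thread periods (one with the step line high, one with it low). That factor, the function names and the example figures (a 25000 ns base period, 400 steps per mm) are illustrative assumptions, not values from any particular machine.

```python
def max_step_rate_hz(base_period_ns, periods_per_step=2):
    """Highest step rate (steps/second) the base thread can generate.

    periods_per_step=2 is an assumption: one period with the step
    signal high and one with it low.
    """
    return 1e9 / (base_period_ns * periods_per_step)


def top_velocity(base_period_ns, steps_per_unit, periods_per_step=2):
    """Top axis velocity in machine units per second."""
    return max_step_rate_hz(base_period_ns, periods_per_step) / steps_per_unit


if __name__ == "__main__":
    base_period_ns = 25000   # hypothetical base thread period
    steps_per_mm = 400       # hypothetical axis scale

    rate = max_step_rate_hz(base_period_ns)
    vel = top_velocity(base_period_ns, steps_per_mm)
    print(f"max step rate: {rate:.0f} steps/s")   # 20000 steps/s
    print(f"top velocity:  {vel:.1f} mm/s")       # 50.0 mm/s
```

The point of the calculation is that a longer but reliable base period gives a lower, predictable ceiling, whereas a short period that the hardware cannot always honour gives a ceiling you cannot trust.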
What is special about a real-time Linux kernel
**********************************************
The default behaviour of a real-time kernel is to slant process priority towards the real-time processes which require the lowest latency.
In theory it can trap all interrupts and prevent userspace processes from running to the detriment of real-time ones.
Only specific processes are designed to request high-priority scheduling. Each process is given (or asks for) a priority number, and the real-time kernel will always give processing time to the process with the highest priority number, even if that process uses up all of the available processing time.
This enables real-time processes to function efficiently, but low-priority processes just have to fit in where they can.
If you set your base thread period too low, you can see the extreme symptoms of this: the thread will poll so quickly that Axis, which is userspace, will not be able to load and render graphics etc. in the time between polls.
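The starvation effect can be illustrated with a deliberately simplified model. The Python sketch below assumes a single CPU, one real-time thread with a fixed execution time per invocation, and no scheduling overhead. The function name and all the figures are hypothetical; a real system has interrupts, several threads and possibly several cores.

```python
def userspace_share(period_ns, rt_exec_ns):
    """Fraction of CPU time left for userspace (e.g. the GUI).

    Simplified model: the real-time thread always runs first and
    takes rt_exec_ns out of every period_ns.
    """
    if period_ns <= 0:
        raise ValueError("period_ns must be positive")
    free_ns = max(0, period_ns - rt_exec_ns)
    return free_ns / period_ns


if __name__ == "__main__":
    rt_exec_ns = 5000  # hypothetical time the base thread needs per run
    for period_ns in (50000, 25000, 10000, 5000):
        share = userspace_share(period_ns, rt_exec_ns)
        print(f"period {period_ns:>6} ns -> {share:.0%} left for userspace")
```

As the period shrinks towards the time the thread itself needs, the share left for everything else falls to nothing, which is when the GUI stops responding.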
What affects latency
********************
You would think, having read about how the real-time kernel schedules, that latency was sorted.
You would be reckoning without computer manufacturers, for a start.
With processors getting faster and running hotter, and with demand for extended battery life, lower power consumption etc., motherboard (MB) makers devised lots of methods to monitor the system, vary fan speed, decrease power consumption and so on, often at the BIOS level, unaffected by whatever operating system sits above it.
Many graphics chip and board makers produced their hardware and drivers with the sole aim of speed of rendering, to try to capture the largely windoze-based gaming market.
Some of their 'down and dirty' techniques (seizing system memory for paging, BIOS-level access etc.), in the best tradition of windoze, are undocumented and jealously guarded, and can have a profound effect on other processes.
Others took the opposite tack: the 'all in one' MB with on-board graphics and audio shared, and often hogged, resources to the detriment of latency.
So the list goes on; certainly no-one produced a MB with Linuxcnc users in mind!
Do you have a latency problem?
******************************
The only way to find out is to run a long test, and be sure to load the system while it runs: GLX apps, big file moves, networking, dragging windows to force repaints etc.
The link for this basic test is below.
wiki.linuxcnc.org/cgi-bin/wiki.pl?Latency-Test
If you have a problem, it can greatly help to run a further test to try to identify the cause, e.g. is the latency high from the outset, or are you getting periodic spikes?
Execute /usr/realtime-(kernelversion)-rtai/testsuite/user/latency/run inside a terminal
This will run a latency test with a per second printout.
*
* Type ^C to stop this application.
*
## RTAI latency calibration tool ##
# period = 100000 (ns)
# avrgtime = 1 (s)
# do not use the FPU
# start the timer
# timer_mode is oneshot
RTAI Testsuite - KERNEL latency (all data in nanoseconds)
RTH| lat min| ovl min| lat avg| lat max| ovl max| overruns
RTD| -1571| -1571| 1622| 8446| 8446| 0
RTD| -1558| -1571| 1607| 7704| 8446| 0
RTD| -1568| -1571| 1640| 7359| 8446| 0
RTD| -1568| -1571| 1653| 7594| 8446| 0
RTD| -1568| -1571| 1640| 10636| 10636| 0
RTD| -1568| -1571| 1640| 10636| 10636| 0
There should be no overruns, and "lat max" is the figure to watch.
If you get spikes at regular intervals, this might indicate the SMI (System Management Interrupt) problem; see the upcoming Latency_solutions FAQ.
If a spike coincides with networking activity, window repainting or file access, that again might assist in nailing down what is wrecking latency.
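If you capture the printout to a file, a few lines of Python can pick out the worst "lat max", any overruns, and the seconds in which spikes occurred. This is only a sketch: it assumes the column layout shown in the sample output above, and the function names and the 10000 ns spike threshold are my own choices for illustration.

```python
SAMPLE = """\
RTH| lat min| ovl min| lat avg| lat max| ovl max| overruns
RTD| -1571| -1571| 1622| 8446| 8446| 0
RTD| -1558| -1571| 1607| 7704| 8446| 0
RTD| -1568| -1571| 1640| 7359| 8446| 0
RTD| -1568| -1571| 1653| 7594| 8446| 0
RTD| -1568| -1571| 1640| 10636| 10636| 0
RTD| -1568| -1571| 1640| 10636| 10636| 0
"""

COLUMNS = ("lat_min", "ovl_min", "lat_avg", "lat_max", "ovl_max", "overruns")


def parse_rtd_lines(text):
    """Return one dict per 'RTD|' data line; other lines are ignored."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("RTD|"):
            continue
        fields = [f.strip() for f in line.split("|")[1:]]
        if len(fields) != len(COLUMNS):
            continue  # skip anything that does not match the layout
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def worst_lat_max(rows):
    """Largest 'lat max' seen in any one-second sample (ns)."""
    return max(r["lat_max"] for r in rows)


def has_overruns(rows):
    """True if any sample reports a non-zero overrun count."""
    return any(r["overruns"] > 0 for r in rows)


def spike_seconds(rows, threshold_ns):
    """Indices (seconds since start) where 'lat max' exceeds the threshold."""
    return [i for i, r in enumerate(rows) if r["lat_max"] > threshold_ns]


if __name__ == "__main__":
    rows = parse_rtd_lines(SAMPLE)
    print("worst lat max:", worst_lat_max(rows), "ns")
    print("overruns seen:", has_overruns(rows))
    print("spikes at seconds:", spike_seconds(rows, threshold_ns=10000))
```

Comparing the spike indices against what you were doing at the time (or checking whether they are evenly spaced) is what helps separate load-related spikes from periodic ones.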
It is worth mentioning now, as prompted by Rick G, that not all real-time kernels are equal, and that it is well worth testing with both the 8.04 and 10.04 (Ubuntu-based) releases before taking any drastic remedial action.
A lot of the older hardware from the P3/P4 era will run the kernel supplied with 8.04 quite happily, but produce bad figures with the newer 2.6.32-122-rtai kernel.
A good proportion of the results quoted in wiki.linuxcnc.org/cgi-bin/wiki.pl?Latency-Test will have been obtained with 8.04 and may vary considerably under 10.04, if it will install at all.
More of this in Part 2.
Links
******
wiki.linuxcnc.org/cgi-bin/wiki.pl?Latency-Test
cvs.gna.org/cvsweb/magma/base/arch/i386/...2Fplain;cvsroot=rtai
wiki.linuxcnc.org/cgi-bin/wiki.pl?TroubleShooting
wiki.linuxcnc.org/cgi-bin/wiki.pl?RealTime
Last edit: 22 Apr 2012 13:27 by ArcEye.