FAQ - Some Latency Solutions
11 Apr 2012 09:51 - 19 Jul 2013 22:31 #19125
by ArcEye
Solving Latency Problems
************************
Start with the troubleshooting guide and the links therein (chiefly sections 4 - 9, plus 17):
wiki.linuxcnc.org/cgi-bin/wiki.pl?TroubleShooting
This FAQ is far from a comprehensive or in-depth treatise; it just seeks to put some more flesh on the bones in some cases.
Graphics cards and drivers:
****************************
Problems with video cards and drivers can manifest as poor video rendering, ragged windows, the lower part of the Axis plot window showing just a series of horizontal lines, jerky movements, etc.
They can also appear to run fairly well, but latency takes a huge hit whilst they are doing so.
Do not confuse this with just getting a blank black square or dot where the Axis plot should be.
That is a classic symptom of an OpenGL problem, which can usually be fixed quite simply by switching to software libraries
(wiki.linuxcnc.org/cgi-bin/wiki.pl?TroubleShooting section 5.2).
Nvidia is a particular favourite for causing problems, but far from the only one to do so.
The standard first advice is to try substituting the open source driver for the proprietary one.
If that doesn't improve matters, try the bog-standard vesa driver and/or use an old PCI card like a Matrox or an older ATI.
However, these cards are becoming harder to get hold of and may not actually support the screens that are increasingly in use.
To quote a contributor to the user lists regarding vesa:
"This does generally help considerably. Unfortunately, the vesa driver, specs written back in 4/3 CRT display days, has not kept pace with the available monitors, which are all only available in 16/9 formats, at least here in the states, so while the vesa driver will work at its maximum 1024x768 resolution, on modern monitors you no longer have 'square pixels', meaning round objects aren't round, but oval.
Occasionally a 3rd party driver will work well. On the box I just retired because the motherboard caps were failing, I had an ATI 9200SE card in it, and switching to the ati driver as above (in xorg.conf) , which in turn loads the correct ATI driver for that card, I was amazed to find that latencies had not taken a big hit (unlike the nvidia drivers which are truly horrible to the latency figures) and that I once again had square pixels on a 16/9 monitor.
In fact it worked amazingly well."
Not all on-board graphics are bad. Some Intel chipsets work well and cause few or no problems.
I have two P4 2.8GHz Fujitsu-Siemens PCs with Intel mini ATX boards and on-board video (Intel 82865G video chipset) that return a consistent max jitter of 9,500.
The popular mini ITX Intel Atom D525 board likewise does not seem problematic and returns sub-10K readings.
SMI
***
This problem is peculiar to a particular group of Intel chipsets.
It tends to manifest itself as very regular spikes in latency, with intervals of 32 or 64 seconds being the most commonly mentioned.
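If you have logged latency readings over time, the regularity is easy to check for. The following is only a minimal sketch: it assumes you have your own list of (time in seconds, latency in nanoseconds) pairs, and the function names, the threshold and the one-second tolerance are all made up for illustration rather than taken from any LinuxCNC tool.

```python
from statistics import median


def spike_intervals(samples, threshold_ns):
    """Return the gaps in seconds between successive readings above threshold_ns.

    samples: (time_s, latency_ns) pairs in time order (a hypothetical log format).
    """
    spike_times = [t for t, latency in samples if latency > threshold_ns]
    return [later - earlier for earlier, later in zip(spike_times, spike_times[1:])]


def looks_periodic(intervals, tolerance_s=1.0):
    """True if there are at least two gaps and all lie within tolerance_s of their median."""
    if len(intervals) < 2:
        return False
    mid = median(intervals)
    return all(abs(gap - mid) <= tolerance_s for gap in intervals)
```

Gaps that all cluster around 32 or 64 seconds point towards SMI (or something in BIOS behaving like it) rather than random load-related jitter.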
An explanation of the problem from the magma (real time kernel development) team:
"When using RTAI with certain Intel-based motherboards, some people have experienced seemingly unexplainable latencies (many 100's of microseconds) that are unacceptable for the real-time performance they need.
In the past it was believed that this problem was being caused by "buslocking" and that the only way around it was to use different hardware.
While buslocking may indeed be a problem in multiprocessor systems, if you are using a single processor system the more likely culprit is System Management Interrupts (SMI) being generated by the power management hardware on the board.
In a real time environment, SMIs are evil. First off, they can last for hundreds of microseconds, which for many RT applications causes unacceptable jitter. Second, they are the highest priority interrupt in the system (even higher than the NMI). Third, you can't intercept the SMI because it doesn't have a vector in the CPU.
Instead, when the CPU gets an SMI it goes into a special mode and jumps to a hard-wired location in a special SMM address space (which is probably in BIOS ROM). This is why RTAI can't help -- SMI interrupts are essentially "invisible" to it.
The good news is that in many (maybe even all) cases, SMIs can be disabled by twiddling a few bits in the system controller on the motherboard."
The advice on SMI, and on installing the module that deactivates it, is here:
wiki.linuxcnc.org/cgi-bin/wiki.pl?FixingSMIIssues
The rtai_smi module only works with specific chipsets. These are:
Intel MB chips
82801AA_0
82801AB_0
82801BA_0
82801BA_10
82801E_0
82801CA_0
82801CA_12
82801DB_0
82801DB_12
82801EB_0
ICH6_0
ICH6_1
ICH6_2
ICH7_0
ICH7_1
ICH8_4
Running lspci -vv gives a listing which should show you whether one of these chips is present.
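Matching by eye can be awkward because lspci prints marketing names rather than the identifiers in the list above. As a rough sketch, the check can be done on numeric PCI IDs instead, using the output of lspci -nn (which prints [vendor:device] pairs). Note the assumptions: the device IDs below are the values I believe the Linux kernel's pci_ids.h gives for those names, not something taken from the rtai_smi documentation, so verify them against your own headers; the function name is hypothetical.

```python
import re

# ASSUMPTION: Intel (vendor 8086) device IDs as defined for these names in the
# Linux kernel's pci_ids.h. Check them against your own kernel headers.
SUPPORTED = {
    "2410": "82801AA_0", "2420": "82801AB_0", "2440": "82801BA_0",
    "244c": "82801BA_10", "2450": "82801E_0", "2480": "82801CA_0",
    "248c": "82801CA_12", "24c0": "82801DB_0", "24cc": "82801DB_12",
    "24d0": "82801EB_0", "2640": "ICH6_0", "2641": "ICH6_1",
    "2642": "ICH6_2", "27b8": "ICH7_0", "27b9": "ICH7_1", "2815": "ICH8_4",
}

# Matches the [vendor:device] pair that `lspci -nn` appends to each device line.
_ID_PAIR = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def find_supported_chips(lspci_nn_text):
    """Return (name, line) for every Intel device whose ID is in SUPPORTED."""
    found = []
    for line in lspci_nn_text.splitlines():
        for vendor, device in _ID_PAIR.findall(line):
            name = SUPPORTED.get(device.lower())
            if vendor.lower() == "8086" and name:
                found.append((name, line.strip()))
    return found
```

To use it, capture the output of lspci -nn (for example with subprocess.run(["lspci", "-nn"], capture_output=True, text=True).stdout) and pass the text in. An empty result suggests the module will not help on that board.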
I tried hacking the module to get it to accept the rest of the ICH8 family of chips. This worked insofar as the module loaded happily, but latency became far worse!
Doubtless the register bits which rtai_smi changes have moved or changed meaning in these chips.
So it is probably best to stick with this list!
NB.
The chipset 82801EB/ER revision 02 (ICH5/ICH5R) seems to work without any SMI problems; I have two computers with it, and latency is 9653 loaded, with no spikes.
(See the next post for details of the apparent SMI spikes, which can in fact be removed via BIOS settings.)
The latest magma CVS has ICH10_1 listed in the source (smi-module.c), together with the pci_ids definition in case it is not already in pci_ids.h.
If you have an SMI problem with this chip, it looks like you should be able to compile the new code and catch that one too.
Disk access
****************
Whilst disk access itself should not be a problem with LinuxCNC up and loaded, it can become one as a consequence of a shortage of system memory.
If you are trying to run LinuxCNC on a very old machine with very limited system memory, the swap partition is liable to come into play.
When system memory is full and more data needs to be loaded, the kernel 'swaps out' memory pages not currently being accessed to the hard disk.
When it finds it needs that data again, it reads it back from disk and swaps out something else to make room for it.
This gives rise to the characteristic constant churning of the hard drive and slow response of old systems.
Sometimes another stick of memory can have miraculous effects.
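A quick way to see whether swap is actually in play is to compare the SwapTotal and SwapFree fields in /proc/meminfo while LinuxCNC is running. The sketch below is illustrative only; the function names are made up, and it assumes the usual "Name:   value kB" layout of that file.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_in_use_kb(meminfo):
    """Swap currently in use, in kB, from a dict produced by parse_meminfo()."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_swap_in_use_kb(path="/proc/meminfo"):
    """Convenience wrapper that reads the live figures from the running system."""
    with open(path) as handle:
        return swap_in_use_kb(parse_meminfo(handle.read()))
```

If read_swap_in_use_kb() keeps climbing while the machine is running a job, more memory is likely to help more than any latency tweak.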
Which Kernel?
*************
This is an area which does not get discussed much in the base documentation but comes up a lot on the forum in terms of practical advice.
As alluded to in the FAQ on latency problems, the base Ubuntu kernel version, and the LinuxCNC kernel version based upon it, can have a profound effect, not only upon latency but also upon whether you can even install from the Live CD and, if you can, whether it will run.
The kernel/Ubuntu version and the LinuxCNC version are not the same thing. You can run the latest 2.5 LinuxCNC version on either Ubuntu release (8.04 or 10.04), so your primary consideration should be which kernel runs best and gives the best latency on your hardware.
As a general rule of thumb, older P3 and P4 single-processor boards (and AMD equivalents etc.) are more likely to run better on 8.04.
Version 10.04 may not install at all on these machines, and where it does, it can give horrible latency figures.
On the other hand, the newer multi-core systems will often refuse to even install 8.04, so you may have no option as to which you use.
This does not mean that multi-core machines will run well on 10.04 without additional changes, however.
LinuxCNC seems to run best on one single-core processor, with a kernel to suit.
To the chagrin of people with 8-core, fire-breathing, bleeding-edge computers, a humble P4 will very often have far better latency.
Why is this?
At the root of this is probably SMP and its effect on real-time processing.
"Symmetric MultiProcessing (SMP) involves a multiprocessor computer hardware architecture where two or more identical processors are connected to a single shared main memory and are controlled by a single OS instance.
In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors."
An explanation of what goes on was postulated by two of the developers, in relation to the use of a 'CPU Hog' on a dual-core machine.
The CPU Hog is a do-nothing program that consumes all spare CPU time with an endless while() loop.
It was found that running this considerably improved latency on this machine.
"In my experience, the "cpu hog" is able to reduce latencies from 10-20 microseconds down to perhaps 5-7uS.
My own theory ........ about why the cpu hog works is related to cache. The hog uses very little memory, and since it keeps one CPU busy, that CPU never runs any other code. So the RT code doesn't get flushed out of cache, and doesn't have to get fetched back into cache later."
"The trick is to make sure the RT tasks never hop from one CPU to another. Probably having a process always ready to run on one CPU makes the choice easy for the RT scheduler to pick the same CPU always for the RT threads.
The CPU hog doesn't get complete use of a CPU, it has to share the CPU with all other user-mode processes.
What it does is guarantee that that CPU always has a process ready to run."
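For illustration, a hog can be as small as the sketch below. This is a Python stand-in, not the program the developers used (which is described above only as an endless while() loop); the optional CPU pinning and the duration_s parameter are my own additions so that it can be stopped and tried out safely.

```python
import os
import time


def hog(cpu=None, duration_s=None):
    """Spin in a do-nothing loop; with duration_s=None it never returns.

    cpu: optional CPU number to pin this process to (Linux only).
    Returns the number of loop iterations once duration_s has elapsed.
    """
    if cpu is not None and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})  # 0 means "the calling process"
    deadline = None if duration_s is None else time.monotonic() + duration_s
    spins = 0
    while deadline is None or time.monotonic() < deadline:
        spins += 1
    return spins
```

Calling hog(cpu=0) from a small script, started in the background before LinuxCNC, keeps that CPU permanently supplied with a runnable user-mode process, which is the effect the second quote describes. Run the latency test with and without it to see whether it makes any difference on your hardware.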
So, returning to the LinuxCNC kernels:
The 2.6.24 based kernel in 8.04 was non-SMP. In other words it was designed for single processors.
The kernel scheduler only had one choice of processor, and if the polling thread was fast enough, other processes didn't get much of a look-in and most of what the RT threads needed was in cache all the time, minimising delays and reducing latency.
The 2.6.32-122-rtai kernel in 10.04 is SMP enabled, but whilst it will run on newer hardware, it brings its own problems.
You may be able to 'turn off' SMP in BIOS by disabling multiple cores. I have done this in experiments with compiling alternative rtai kernels and it does lower latency figures. As time goes on this option may not remain in BIOS, as manufacturers will not be able to envisage anyone not wanting to run their latest offering at 'full potential'.
Instead of running a CPU Hog, a far more elegant solution can be to use isolcpus:
wiki.linuxcnc.org/cgi-bin/wiki.pl?The_Is..._Parameter_And_GRUB2
Here, by use of a kernel parameter at boot time, the highest-numbered processor core is isolated from the general scheduler and left to the real-time processes, so you are effectively back to a single processor dominated by them again.
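After editing the GRUB configuration it is worth confirming that the parameter really reached the running kernel, which can be read back from /proc/cmdline. A small sketch follows; the function name is hypothetical, and it only understands the plain list and range forms (such as 1 or 1,2-3), skipping anything else.

```python
def isolated_cpus(cmdline):
    """Return the sorted CPU numbers named by isolcpus= in a kernel command line.

    Returns an empty list if the parameter is absent.
    """
    prefix = "isolcpus="
    for token in cmdline.split():
        if token.startswith(prefix):
            cpus = set()
            for part in token[len(prefix):].split(","):
                first, dash, last = part.partition("-")
                # Accept "N" or "N-M"; ignore anything that is not numeric.
                if first.isdigit() and (not dash or last.isdigit()):
                    cpus.update(range(int(first), int(last or first) + 1))
            return sorted(cpus)
    return []
```

For example, isolated_cpus(open("/proc/cmdline").read()) on a dual-core machine booted with isolcpus=1 should return [1]; an empty list means the parameter did not make it onto the kernel command line.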
The only other thing that may need doing is to disable hyper-threading, if that option exists in BIOS.
"Hyper threading is a technology that allows multithreaded software applications to run threads in parallel on a single multi-core processor. This is instead of processing threads in a linear way. Older processors took advantage of dual processing threading in software applications by splitting their instructions into multiple streams so that more than one processor could act upon them at the same time."
(Beyond this, you are into re-compiling the kernel and RT system, tailoring it specifically to your system to extract the optimum latency. This works, and I have a kernel running on my test machine that far outperforms the stock one on latency, but this is all outside the scope of a general FAQ or the needs of the normal user.)
Links
*****
wiki.linuxcnc.org/cgi-bin/wiki.pl?TroubleShooting
wiki.linuxcnc.org/cgi-bin/wiki.pl?EMC_With_Custom_Kernel (SMP)
help.ubuntu.com/8.04/installation-guide/...dware-supported.html (8.04 standard is non smp)
wiki.linuxcnc.org/cgi-bin/wiki.pl?The_Is..._Parameter_And_GRUB2
cvs.gna.org/cvsweb/magma/base/arch/i386/...2Fplain;cvsroot=rtai
Last edit: 19 Jul 2013 22:31 by ArcEye.
The topic has been locked.
19 Jul 2013 22:25 #36807
by ArcEye
An addendum to the previous post regarding one particular Intel chipset.
The 82801EB/ER Rev 2 (82801EB_02) chipset was used on some of the later Intel Pentium P4 boards with 2.4GHz and 2.8GHz processors.
It is NOT one of the chipsets with which the SMI fix works, yet it presents initially as having a classic SMI problem, with spikes of 120 - 200 K spaced 64 seconds apart on my board, which normally ran unloaded at 5K and heavily loaded at 9K.
I only discovered this when starting a machine that had been unused for 6 months.
I got an immediate RTAI error on starting LinuxCNC and did a latency check, which to my horror returned 230K.
It dawned on me that there had been a BIOS fault message on boot-up, which I had paid little attention to.
The BIOS battery was flat at start-up and the BIOS had defaulted to its base settings.
After changing the battery and restoring the BIOS settings, the machine went back to a rock solid 5 - 9K max on the base thread.
The settings in question are all in BIOS:
Fan speed - Enhanced / Full (stops it checking what the processor temp is)
All power management off - as in everything off and shown as disabled
Hyper-threading off
The moral of the story: if it looks like SMI but the chip is not covered by the SMI fix, double check that the BIOS is not doing the same things as SMI does, by turning off everything that could generate an interrupt (temperature checking being a classic).