Random read errors on Mesa 7i92
16 Jul 2016 01:14 #77539
by PCW
Replied by PCW on topic Random read errors on Mesa 7i92
Well, that indicates two things:
1. I think the read timeout does not work on all systems (I've seen this as well).
2. The Liva may not be a good enough Preempt-RT network host for 1 kHz operation, so you may have to run at, say, 500 Hz.
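The thread period that goes with a given update rate is simply the reciprocal of the frequency. A minimal sketch of that arithmetic in Python, assuming the period is expressed in nanoseconds, as LinuxCNC's SERVO_PERIOD setting expects (the function name is hypothetical):

```python
def servo_period_ns(rate_hz: float) -> int:
    """Return the thread period in nanoseconds for a given update rate in Hz."""
    if rate_hz <= 0:
        raise ValueError("rate must be positive")
    return round(1_000_000_000 / rate_hz)


# Compare the two rates mentioned above.
for rate in (1000, 500):
    print(f"{rate} Hz -> {servo_period_ns(rate)} ns")
```

Dropping from 1 kHz to 500 Hz doubles the period, which gives each network round trip to the card twice as long to complete.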
20 Jul 2016 14:57 - 20 Jul 2016 15:00 #77724
by 10K
I made some more progress.
First, I found that my network chip was causing the random delay. It's an RTL8111G. Linux uses the r8169 driver for it, and it's been documented that this driver takes a smoke break after several hours. I downloaded the r8168 driver from Realtek, and the random delays went away!
I did more experiments on the latency. Surprisingly enough, I got good results by using the default UEFI settings, with only one change to grub: adding the maxcpus=1 kernel parameter. I tried all sorts of other things, and they didn't help.
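After a grub change like this, it is worth confirming that the parameter actually reached the kernel. A small sketch in Python of how that check might look; the sample command line below is hypothetical, and on a live system the string would come from /proc/cmdline:

```python
def kernel_args(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    args = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        args[name] = value
    return args


# Hypothetical example line; on a running system, read /proc/cmdline instead.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet maxcpus=1"
print("maxcpus =", kernel_args(sample).get("maxcpus"))
```

If the lookup returns None, the parameter was not passed and the grub configuration probably needs to be regenerated.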
I ran a Latency Test for 9 hours and got a max jitter of 24,981 on the base thread and 22,473 on the servo thread. I was able to get lower results with some UEFI changes and grub tweaks, but those caused me trouble inside LinuxCNC itself.
I ran another test for about 14 hours using the Latency Plot. In this test, I surfed the internet, ran glxgears, pinged the Mesa card, installed a program, and moved and resized windows. Nothing made any spikes in the plot. The odd thing is that the latency stays below 5,000 for most of the test. The maximum values are seen in the first 30 seconds to 1 minute, and after that I don't see any more high values. If I hit the Reset button, I see the spikes again for about 30 seconds, then consistently low numbers again. If anyone has seen or addressed this problem, please let me know!
Here's the plot right after I hit Reset:
Everything looks good here on Latency, and I would seem to have a workable system. On to the behavior of LinuxCNC.
Thanks to PCW, I have a way to quantify the problems I was experiencing inside LinuxCNC. For each change I made above, I ran LinuxCNC and monitored the parameter servo-thread.tmax. There was only a loose correlation between the values I'd see for tmax and the latency I measured. I found that if the value of servo-thread.tmax got above 1.55 million, I'd get an "Unexpected realtime delay" error.
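The check described above could be automated rather than read off by hand. A minimal sketch in Python, assuming halcmd is on the PATH and that `halcmd getp` prints the parameter value on a single line; the 1.55 million threshold is the value observed on this particular machine, not a general limit, and the function names are hypothetical:

```python
import subprocess

TMAX_LIMIT = 1_550_000  # threshold observed on this machine (from the post above)


def parse_tmax(text: str) -> int:
    """Convert the text printed for the parameter into an integer."""
    return int(text.strip())


def exceeds_limit(tmax: int, limit: int = TMAX_LIMIT) -> bool:
    """Return True once tmax has passed the level where the error appeared."""
    return tmax > limit


def read_tmax(param: str = "servo-thread.tmax") -> int:
    """Query a running LinuxCNC session (assumes halcmd is available)."""
    result = subprocess.run(
        ["halcmd", "getp", param],
        capture_output=True, text=True, check=True,
    )
    return parse_tmax(result.stdout)
```

Calling `exceeds_limit(read_tmax())` from a second terminal while LinuxCNC is running would give a yes/no answer at each check.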
This particular test ran for over 16 hours.
The value of servo-thread.time was generally about 200,000-250,000. The servo-thread.tmax value crept up slowly during the test. It started at about 600,000. At the 13 hour mark, it was about 1.1 million. By the 16 hour mark it was at 2.6 million, and it caused the realtime delay error. I checked it intermittently during the test, and each time the maximum was slightly higher. Has anyone else seen this behavior?
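Because tmax records the worst case seen so far, it can only stay level or rise. To see how quickly it rose in each stretch of the test, the three readings quoted above can be turned into average rates. A rough sketch in Python, assuming the first reading was taken at hour 0 and treating the approximate figures as exact:

```python
# (hours into the test, servo-thread.tmax) as quoted in the post
samples = [(0, 600_000), (13, 1_100_000), (16, 2_600_000)]


def growth_rates(points):
    """Average increase per hour between consecutive readings."""
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates


rates = growth_rates(samples)
for (t0, _), (t1, _), rate in zip(samples, samples[1:], rates):
    print(f"hours {t0}-{t1}: about {rate:,.0f} per hour")
```

On these numbers the average rise over the last three hours is roughly thirteen times that of the first thirteen, so most of the increase happened late in the run.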
In any case, I seem to have a usable system. I'm unlikely to need to machine anything that will take more than 13 hours. The ECS Liva X is a tiny computer, about the size of four cell phones stacked on top of each other. It mounts easily on the back of my touchscreen monitor. It's all-in-one and fairly inexpensive. It has three USB ports, wireless and Bluetooth. It runs on DC voltage. The only disadvantage is how long it's taken me to get where I am now!
21 Jul 2016 11:15 - 21 Jul 2016 11:21 #77763
by vre
Use the stock r8169 driver and run the following in a terminal:
sudo aptitude update
sudo aptitude install firmware-realtek
Have you tried Jessie with the 4.6.0-0.bpo.1-rt-686-pae kernel?
I have this kernel from the repo
deb http://ftp.debian.org/debian jessie-backports main
with Jessie, and no problems with r8169.
From my PC:
$ dmesg
..............................
[ 5.050182] r8169 0000:03:00.0: firmware: direct-loading firmware rtl_nic/rtl8168e-3.fw
..............................
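To check for the same firmware message on another machine, the dmesg output can be filtered for lines of this shape. A small sketch in Python; the regular expression is written against the single line quoted above and is an assumption about the general format, and the function name is hypothetical:

```python
import re

# Matches lines like: "[ 5.050182] r8169 0000:03:00.0: firmware: direct-loading firmware <file>"
FIRMWARE_RE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+(?P<driver>\S+)\s+(?P<device>\S+):\s+"
    r"firmware: direct-loading firmware (?P<file>\S+)$"
)


def parse_firmware_line(line: str):
    """Return driver, device and firmware file from a matching line, else None."""
    match = FIRMWARE_RE.match(line.strip())
    return match.groupdict() if match else None


line = "[ 5.050182] r8169 0000:03:00.0: firmware: direct-loading firmware rtl_nic/rtl8168e-3.fw"
info = parse_firmware_line(line)
print(info["driver"], "loaded", info["file"])
```

A None result for every line would suggest the firmware file was not loaded, for example because firmware-realtek is not installed.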
01 Aug 2016 15:21 #78238
by 10K
I did some more work and testing.
I installed Debian Jessie and the 4.6.0-0.bpo.1-rt-686-pae kernel. One nice thing about this kernel is that it has a built-in driver for the ELO touchscreen.
In the set-up and testing of this new installation, I found out that it was the WiFi driver that was giving me bad latencies. I was using the WiFi to give me a second LAN connection. I used the internal port with the Mesa, using the r8168 driver downloaded from Realtek. Here are the results of the testing:
LAN 2 | Chip   | Driver               | Tmax @ 30 minutes
none  | n/a    | n/a                  | 332,000
USB   | dm9601 | stock                | 555,000
WiFi  | 80211  | rtl8723be (lwfinger) | 1,937,000 + latency error message
I blacklisted rtl8723be for the "none" and "USB" cases.
For reference, I got latency numbers of ~20,000 max and ~2,500 average for the "none" case.
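To put the three Tmax readings on a common footing, each can be expressed relative to the "none" baseline. A short sketch in Python using only the figures from the table (the dictionary keys and function name are just labels chosen for this example):

```python
# Tmax after 30 minutes, as listed in the table
tmax_at_30_min = {
    "none": 332_000,
    "USB (dm9601)": 555_000,
    "WiFi (rtl8723be)": 1_937_000,
}


def relative_to_baseline(results, baseline_key="none"):
    """Express each reading as a multiple of the baseline reading."""
    base = results[baseline_key]
    return {name: value / base for name, value in results.items()}


ratios = relative_to_baseline(tmax_at_30_min)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x baseline")
```

On these figures the USB adapter costs well under a doubling of Tmax, while the WiFi driver pushes it to nearly six times the baseline.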
In my previous testing under Wheezy, I had slightly better results using the WiFi with the 80211 driver from ECS / Liva X. It supports both the WiFi and the Bluetooth, while the lwfinger version supports only the WiFi. Unfortunately, I got compile errors in Jessie when I tried to use this driver.
I tried some of the switches available in the rtl8723be driver using modprobe. None of them had an effect.
So, unless someone has some suggestions on things to try with the WiFi, I'll stick to using the USB LAN. The only disadvantage to this set-up is that I don't have any free USB ports to plug in a keyboard. Fortunately, I don't need one very often with the touchscreen.
01 Aug 2016 18:02 #78247
by PCW
I know I had trouble with Broadcom Mini PCIe WiFi cards and terrible latency.
I'm generally suspicious of WiFi cards since they run binary blobs of what is probably very real-time-unfriendly code.
Some USB WiFi dongles seem OK though.