Is my 7i76e about to expire?

29 Aug 2022 14:35 #250686 by charleyann
I've been getting read errors more and more often.

Lately the hm2_7i76e.0.read.tmax has been hitting 15,000,000.
The machine will be running commands or just sitting there with hm2_7i76e.0.read.tmax around 400,000,
and then, bam, it jumps up and errors out.

I have tried 3 different computers, all with good latency numbers around 25,000 ns.
All ran Debian Bullseye. Mate, Xfce and LxQt have all been used with QtDragon and Probe Basic.

The machine has been operational for almost 2 years without any issues like this.
The errors started early this summer and have been getting more frequent.
I thought it might be heat related but it happened this morning right after a cold start.

I've read through many threads about troubleshooting this, but nothing I've tried has had any effect. I'm thinking it's the 7i76e.

Any thoughts?

29 Aug 2022 15:00 #250687 by PCW
If you get a read error, does the /INIT LED light on the 7I76E?

What are the ping times?

(run for a minute or so while stressing the PC)

29 Aug 2022 16:37 #250695 by arvidb
A long shot, but have you tried a different cable?

I know any Ethernet cable should work, but while fooling around running 'ping -i 0.002 <Mesa-7i96S-ip>' using an old CAT5E cable (3 m long) I got a noticeable amount of lost packets. After changing to a new CAT6 cable I got none. Not sure how significant it is (or if it would even show up as large tmax times) - but yeah, maybe worth a try?

29 Aug 2022 16:43 #250697 by charleyann
Ping times are good:
cncv1@cncv1-1:~$ ping 10.10.10.10
PING 10.10.10.10 (10.10.10.10) 56(84) bytes of data.
64 bytes from 10.10.10.10: icmp_seq=1 ttl=64 time=0.133 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=64 time=0.071 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=64 time=0.069 ms
64 bytes from 10.10.10.10: icmp_seq=4 ttl=64 time=0.069 ms
64 bytes from 10.10.10.10: icmp_seq=5 ttl=64 time=0.071 ms
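
For the longer run PCW asked for (a minute or so while the PC is under load), reading the replies by eye gets tedious. A minimal sketch, assuming the Linux iputils `ping` reply format shown above; the function name `summarize_ping` is made up for illustration, and sequence-number wraparound on very long runs is ignored:

```python
import re

# Matches the tail of an iputils ping reply line, e.g.
# "icmp_seq=3 ttl=64 time=0.069 ms"
REPLY = re.compile(r"icmp_seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")

def summarize_ping(text):
    """Return reply count, missing sequence numbers and worst round-trip time."""
    seqs, times = [], []
    for m in REPLY.finditer(text):
        seqs.append(int(m.group(1)))
        times.append(float(m.group(2)))
    if not seqs:
        return {"replies": 0, "missing": [], "max_ms": None}
    seen = set(seqs)
    # A gap in icmp_seq between the first and last reply means a lost packet.
    missing = [s for s in range(min(seqs), max(seqs) + 1) if s not in seen]
    return {"replies": len(seqs), "missing": missing, "max_ms": max(times)}

# The five replies posted above.
SAMPLE = """\
64 bytes from 10.10.10.10: icmp_seq=1 ttl=64 time=0.133 ms
64 bytes from 10.10.10.10: icmp_seq=2 ttl=64 time=0.071 ms
64 bytes from 10.10.10.10: icmp_seq=3 ttl=64 time=0.069 ms
64 bytes from 10.10.10.10: icmp_seq=4 ttl=64 time=0.069 ms
64 bytes from 10.10.10.10: icmp_seq=5 ttl=64 time=0.071 ms
"""

if __name__ == "__main__":
    print(summarize_ping(SAMPLE))
```

Saving the ping output to a file and passing its contents to the function gives the worst-case time and any dropped sequence numbers in one line, which is the part that matters for an intermittent fault.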

The INIT LED has been on in the past.

This is so frustrating! I had half a dozen read errors in the first half hour this morning. Now it's been over two hours and I can't get it to fail.
This is what I have been seeing for the last couple of months: many errors in a short time, mixed with long run times with no errors!

Looking at tmax I can see it's right on the edge:
cncv1@cncv1-1:~$ halcmd show param *.tmax  
Parameters:
Owner   Type  Dir         Value  Name
    42  s32   RW          85799  classicladder.0.refresh.tmax
    71  s32   RW          47605  dbmist.tmax
    68  s32   RW          68668  debounce.0.tmax
    36  s32   RW              0  hm2_7i76e.0.read-request.tmax
    36  s32   RW        2794658  hm2_7i76e.0.read.tmax
    36  s32   RW         383016  hm2_7i76e.0.write.tmax

Generally anything over 3000000 will cause the error.
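
Checking the table by hand after every run is easy to get wrong. A minimal sketch, assuming the column layout of the `halcmd show param *.tmax` output above; the helper names `parse_tmax` and `over_limit` are made up, and the 3,000,000 default is only the poster's observed figure, not a documented threshold:

```python
import re

# One data row: owner, type, direction, value, name.
ROW = re.compile(r"^\s*(\d+)\s+(\w+)\s+(RW|RO)\s+(-?\d+)\s+(\S+)\s*$")

def parse_tmax(text):
    """Return {param_name: value} for every row whose name ends in .tmax."""
    params = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m and m.group(5).endswith(".tmax"):
            params[m.group(5)] = int(m.group(4))
    return params

def over_limit(params, limit=3_000_000):
    """Return (name, value) pairs above limit, largest first."""
    hits = [(name, value) for name, value in params.items() if value > limit]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# A few rows copied from the output above.
SAMPLE = """\
Owner   Type  Dir         Value  Name
    42  s32   RW          85799  classicladder.0.refresh.tmax
    36  s32   RW        2794658  hm2_7i76e.0.read.tmax
    36  s32   RW         383016  hm2_7i76e.0.write.tmax
"""

if __name__ == "__main__":
    params = parse_tmax(SAMPLE)
    # Use a lower limit here to see which values are getting close.
    for name, value in over_limit(params, limit=2_000_000):
        print(f"{name}: {value}")
```

Running this periodically against fresh `halcmd` output would show whether only the read function is creeping up or whether other functions rise with it.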



 

29 Aug 2022 17:10 #250701 by PCW
If the /INIT light is on, it suggests more of a host or at least
Ethernet communications issue.

Does dmesg show a link up/down transition?
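
One way to answer this without scrolling through the whole log is to filter it. A minimal sketch, assuming the NIC driver reports transitions with the phrase "Link is Up" / "Link is Down" (several common drivers do, but the exact wording depends on the driver); the function name and the sample lines are invented for illustration and are not from the poster's machine:

```python
import re

# Assumed wording; adjust the pattern if the driver logs something else.
LINK = re.compile(r"\bLink is (Up|Down)\b", re.IGNORECASE)

def link_transitions(dmesg_text):
    """Return (state, full_line) for every link up/down message, in order."""
    events = []
    for line in dmesg_text.splitlines():
        m = LINK.search(line)
        if m:
            events.append((m.group(1).capitalize(), line.strip()))
    return events

# Invented sample log, for illustration only.
SAMPLE = """\
[    4.812345] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   10.000001] some unrelated kernel message
[ 1203.456789] e1000e 0000:00:1f.6 eth0: NIC Link is Down
[ 1205.987654] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
"""

if __name__ == "__main__":
    for state, line in link_transitions(SAMPLE):
        print(state, "|", line)
```

A single "Up" at boot is normal; a "Down" followed by "Up" at about the time of a read error would point at the cable, the NIC or the link partner rather than at realtime latency.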

The write tmax is also quite bad, which points to a host issue.

29 Aug 2022 17:11 #250702 by tommylight
First change the cable. I had the same issue as arvidb with a 5 m cable: it would lose connection at random.
Then pay attention to the other equipment in the shop: what is turned on when the connection gets flaky?

29 Aug 2022 17:47 #250708 by charleyann
OK, more testing.

I have backups of two LinuxCNC installations, one with Debian 11 Mate and the other with Debian 11 LxQt.
Using Clonezilla I can switch back and forth in about 10 minutes.

Both installations use the same 5.10.0-17 rt kernel

With LxQt, when I start up Probe Basic I get the read error immediately:
cncv1@cncv1-1:~$ halcmd show param *.tmax
Parameters:
Owner   Type  Dir         Value  Name
    42  s32   RW          36994  classicladder.0.refresh.tmax
    71  s32   RW          29561  dbmist.tmax
    68  s32   RW          16164  debounce.0.tmax
    36  s32   RW              0  hm2_7i76e.0.read-request.tmax
    36  s32   RW       15480633  hm2_7i76e.0.read.tmax
    36  s32   RW          77653  hm2_7i76e.0.write.tmax
    
With Mate I get the results I've described earlier.

Out of desperation I added everything I've tried in the past to the boot parameters, and now the Mate installation seems stable (for how long I don't know).
I have tried all of these in the past, but never all combined.


GRUB as of now:
GRUB_CMDLINE_LINUX_DEFAULT="splash isolcpus=1 intel_idle.max_cstate=0 processor.max_cstate=0 acpi_irq_nobalance noirqbalance resume=UUID=5ec67429-5af1-4b03-9800-a8b2fc0ec9dd"
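
An edit to the GRUB defaults only matters if the running kernel actually received the parameters (on Debian that normally means running `update-grub` and rebooting). A minimal sketch of a check against the running kernel's command line, which Linux exposes in `/proc/cmdline`; the helper name `missing_params` and the sample command line are made up for illustration:

```python
# Realtime-related parameters from the GRUB line above.
EXPECTED = [
    "isolcpus=1",
    "intel_idle.max_cstate=0",
    "processor.max_cstate=0",
    "acpi_irq_nobalance",
    "noirqbalance",
]

def missing_params(cmdline, expected):
    """Return the expected parameters that do not appear in cmdline."""
    present = set(cmdline.split())
    return [p for p in expected if p not in present]

def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()

# Invented example of what /proc/cmdline might contain.
SAMPLE = ("BOOT_IMAGE=/vmlinuz ro splash isolcpus=1 intel_idle.max_cstate=0 "
          "processor.max_cstate=0 acpi_irq_nobalance noirqbalance")

if __name__ == "__main__":
    # On the real machine, replace SAMPLE with read_cmdline().
    print("missing:", missing_params(SAMPLE, EXPECTED))
```

Running the same check on both the Mate and LxQt images would confirm whether they really boot with identical parameters before comparing their tmax numbers.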

I have changed the cable in the past, but I'll try another one.

Interesting that there is so much difference between the two OS installs with the same kernel.
 

29 Aug 2022 18:37 #250717 by tommylight
Now I can add: check the PC power supply, or just try another one, since checking it properly requires a scope.
Do you have another PC to test with?
No need to install anything, just yank the drive and put it in the other PC.

29 Aug 2022 18:48 #250720 by charleyann
Interesting, all of my testing has been with 3 motherboards but the same case and power supply.
I'll swap out the power supply and see what happens.

29 Aug 2022 20:25 #250729 by charleyann
Swapped out the power supply and it got worse: hm2_7i76e.0.read.tmax went to 1505820.
Tried another and I'm back to about what I had with the original. I only have the two extras to test with.
This is what I have now:
cncv1@cncv1-1:~$ halcmd show param *.tmax
Parameters:
Owner   Type  Dir         Value  Name
    42  s32   RW          56029  classicladder.0.refresh.tmax
    71  s32   RW          30192  dbmist.tmax
    68  s32   RW          39133  debounce.0.tmax
    36  s32   RW              0  hm2_7i76e.0.read-request.tmax
    36  s32   RW         499782  hm2_7i76e.0.read.tmax
    36  s32   RW         127248  hm2_7i76e.0.write.tmax
    77  s32   RW          48908  hold1and.tmax
    80  s32   RW          30413  hold1or.tmax
    77  s32   RW          37965  hold2and.tmax
    80  s32   RW          17408  hold2or.tmax
    71  s32   RW          17121  holdtgl.tmax
    74  s32   RW          30097  holdtgln.tmax
    86  s32   RW          44482  ilowpass.0.tmax
    29  s32   RW          50874  motion-command-handler.tmax
    29  s32   RW          90288  motion-controller.tmax
    83  s32   RW          30428  mpgincr.tmax
    39  s32   RW          44037  pid.a.do-pid-calcs.tmax
    39  s32   RW          45234  pid.s.do-pid-calcs.tmax
    39  s32   RW          46969  pid.x.do-pid-calcs.tmax
    39  s32   RW          38577  pid.y.do-pid-calcs.tmax
    39  s32   RW          43093  pid.z.do-pid-calcs.tmax
    89  s32   RW          16823  probe-sel.tmax
    30  s32   RW         683116  servo-thread.tmax
    77  s32   RW          29899  start1and.tmax
    77  s32   RW          29883  start2and.tmax
    74  s32   RW          39159  tglmist.tmax

I also swapped out the cable and didn't see any change.
I'll let it run the rest of the day and see what happens.
 

