5i25 Deterioration?
22 May 2019 16:26 #134587
by RobTC
5i25 Deterioration? was created by RobTC
I'm a diagnose-via-Google-Fu kinda guy, so I've lurked through most of the relevant threads/mailing lists/etc already, but at this point I'm against a wall and have a sinking feeling the only solution, if there is one, is JTAG. I'm gonna try to relate what I've already done in as much excruciating detail as I remember in case it's relevant. The whole thing is just odd because as I narrowed down the problem scope, things seemed to get worse.
I built/converted the machine in March and it was originally essentially working, albeit with not enough I/O and extremely limited researchable documentation, with a 5i25, SainSmart ST-V3 5-axis BOB on P3 and DB-25 terminal breakout on P2 (for a pendant/MPG and general I/O) running the 5i25_5ABOBx2 firmware. Motors are NEMA 34 closed-loop steppers, and other than an odd issue with z-offsets that I hadn't yet gotten to diagnosing, the motion was great and it made a couple simple test parts.
The computer is a pretty old AMD-based 3GB RAM machine I was given that had a dead Vista install (is there any other type?!) that I threw a cheap GPU into for better latency (~11,000ns) and installed LinuxCNC 2.7 - 32-bit, because it's much less hassle by all accounts.
Because just that one pendant basically took up all of the useable I/O on P2 and I still needed probing, toolchanging, etc, I needed a new plan. Enter the 7i76, a fairly obvious choice. I added the 7i85 on a whim at checkout for glass scales on all axes (which I already owned and were part of the larger plan anyway) but haven't yet tried to use it beyond plugging it into my P2 breakout connector (both cables are Mesa M-M IEEE-1284s).
So a couple of days ago, I tore out the old BOBs, put the new boards in their place, flashed the 5i25 with 5i25_7i76_7i85.bit firmware- which needed two attempts because the first just didn't do anything, which I thought odd at the time but may not be unusual. The firmware tested good and --readhmid gave all the appropriate details. However, it wouldn't let me --reload, claiming the "firmware is too old to support it", a message I've gotten several times since when attempting a random refresh "just in case", despite me using the latest version of everything and it allowing me to --reload with the original 5ABOBx2 firmware flashing.
I wire up the 7i76, with only TB2 1/3/5-positive STP/DIR wires and grounds and ENA X/Y/Z into TB6 output 0,1,2, and fire up PNCConf. All of this early testing was done pretty much exclusively with the "test X axis motor" tool. When that didn't work, I realised the board didn't have power, so I moved the jumper on the 7i76- still nothing. Well, I didn't buy everything together, duh. Pulled the 5i25, moved the jumper to supply cable 5V, and I got the (first) light on the 7i76. Still nothing, of course. I was listening for the "chunk" of the steppers when I hit "Enable".
More reading later, I wired up 12V to TB1 1/8. W1 jumper was already left to supply VIN. I get my second power light, and try PNCConf again. Still no movement! Realising the ramifications of "isolated", I first try putting ENA lines into the fourth axis block of TB2 with a StepGen=3 config, and when that also failed (which was my first clue that it was probably a software issue), I rewired the drivers to have separate STP/DIR and ENA grounds. Still nothing. PNCConf is using "7i76x2 with one 7i76" which I'm assuming it interprets as the best possibility from the onboard bitfile. I checked with HAL oscilloscope that enable was definitely being fired, which it was, but a multimeter on the 7i76 outputs showed no voltage changes.
Lots of reading and minor prodding later, nothing was happening. However, when I opened LinuxCNC itself, it closed with an actual text error, which could tell me something! Looking at the dmesg output, the error always happened here:
[ 1246.723476] hm2_pci: discovered 5i25 at 0000:03:0a.0
[ 1247.735413] hm2/hm2_5i25.0: hm2_sserial_waitfor: Timeout (1007mS) waiting for addr 5a00 &mask ffffffff val 1
[ 1247.735418] hm2/hm2_5i25.0: DATA addr 5b00 after timeout: 0
[ 1247.735421] hm2/hm2_5i25.0: failed to parse Module Descriptor 4
[ 1247.735429] hm2_5i25.0: board fails HM2 registration
[ 1247.735467] hm2_pci: probe of 0000:03:0a.0 failed with error -22
(This is a snip from a later verbose echo (attached at bottom), but other than the system time number, it was always identical)
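For anyone comparing their own logs against this one, here is a minimal sketch of pulling the hm2 lines out of dmesg text and flagging the sserial timeout. The function name `summarize_hm2`, the two regexes and the `SAMPLE` constant are illustrative assumptions built only from the lines quoted above; none of this is part of LinuxCNC or mesaflash.

```python
import re

# Sample text copied from the dmesg snippet quoted in this post.
SAMPLE = """\
[ 1246.723476] hm2_pci: discovered 5i25 at 0000:03:0a.0
[ 1247.735413] hm2/hm2_5i25.0: hm2_sserial_waitfor: Timeout (1007mS) waiting for addr 5a00 &mask ffffffff val 1
[ 1247.735418] hm2/hm2_5i25.0: DATA addr 5b00 after timeout: 0
[ 1247.735421] hm2/hm2_5i25.0: failed to parse Module Descriptor 4
[ 1247.735429] hm2_5i25.0: board fails HM2 registration
[ 1247.735467] hm2_pci: probe of 0000:03:0a.0 failed with error -22
"""

# "[ timestamp] source: message", where source starts with "hm2" (assumed format).
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<src>hm2[\w/.]*):\s+(?P<msg>.*)$")
# The sserial wait timeout message as it appears in the log above.
TIMEOUT_RE = re.compile(r"Timeout \((?P<ms>\d+)mS\) waiting for addr (?P<addr>[0-9a-fA-F]+)")


def summarize_hm2(dmesg_text):
    """Return (hm2_lines, timeouts) extracted from raw dmesg text.

    hm2_lines is a list of (timestamp, source, message) tuples;
    timeouts is a list of (milliseconds, register_address) tuples.
    """
    hm2_lines = []
    timeouts = []
    for raw in dmesg_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # not an hm2 line
        hm2_lines.append((float(match["ts"]), match["src"], match["msg"]))
        hit = TIMEOUT_RE.search(match["msg"])
        if hit:
            timeouts.append((int(hit["ms"]), hit["addr"]))
    return hm2_lines, timeouts


hm2_lines, timeouts = summarize_hm2(SAMPLE)
```

On a live machine the same function could be fed the text of `dmesg` captured to a file, which makes it easy to check whether the timeout address and duration really are identical from run to run.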
Cue several hours of failing to figure out why the serial was timing out.
That brings us to this morning!
More of the same Googling brought up fairly little though it seemed to be more of a PCI issue than a serial issue; I cleaned the PCI slot and 5i25 contacts with IPA, I tried different PCI slots, still nothing. I was mostly running the 5i25 by itself with nothing connected at this point, since it clearly seemed to be an issue there.
Since PNCConf clearly wasn't going to work due to the issue that was afflicting the realtime thread started by LinuxCNC proper, I finally resorted to step-by-step terminal commands. So a fairly standard...
~$ halrun
halcmd: loadrt threads name1=th period1=1000000
halcmd: loadrt hostmot2
halcmd: loadrt hm2_pci
No matter the presence or absence of a config argument, or trying different sserial_port_0 values, the result was always the same as I'd had the previous two days:
Error: could not insert module /usr/realtime-3.4.9-rtai-686-pae/modules/linuxcnc/hm2_pci.ko: Invalid parameters
<stdin>:3: exit value: 1
<stdin>:3: insmod for hm2_pci failed, returned -1
Finally, I even reinstalled LinuxCNC from the original liveUSB in case some bad kernel or hidden config or something weird was hanging out like some results had suggested. After PNCConf decided it wasn't going to play ball for some reason, I sudo apt-get installed mesaflash and checked if it was still happy with a --readhmid. I get the (spoiler: completely empty) config attached below. What?! It already had the 7i76_7i85 bitfile on it when I reinstalled Linux, how could it get wiped by something that should have nothing to do with it?
I tried to write both the 7i76_7i85 and 7i76x2 bitfiles to it, but both failed with a "bad bootsector" or similar error, I didn't take a picture of that screen. When I power cycled it, only one LED (the rightmost, nearest the corner) flashed, and of course when I tried to load it into a realtime thread, I got an invalid cookie error with a "FPGA failed to initialize, or unexpected firmware?". So no firmware, no way of loading any, completely unusable.
That was when I started typing this screed. Since I've been typing and double-checking error texts as I go, it seems to have... Somewhat recovered? I went down to check the "bad bootsector" error wording, but it actually flashed the firmware! It still wouldn't let me --reload, but upon power cycle, both LEDs flashed. I still get the same 1007ms timeout and an exit error of -22, but it does successfully verify the bitfile, so I seem to be back where I've been since yesterday morning.
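A note on the -22: kernel drivers report failures as negated errno values, and errno 22 is EINVAL. The "Invalid parameters" text from insmod earlier is most likely the same EINVAL being reported at module load, though that link is an assumption on my part rather than something the logs confirm. A small sketch for decoding such a code, with `KERNEL_RETURN` as an illustrative name:

```python
import errno
import os

KERNEL_RETURN = -22  # value reported by the hm2_pci probe in dmesg

code = -KERNEL_RETURN             # kernel return values are negated errno numbers
name = errno.errorcode.get(code)  # symbolic name for that errno
text = os.strerror(code)          # this platform's description of the errno

print(f"{KERNEL_RETURN} -> {name}: {text}")
```

So the -22 is a generic "the driver rejected what it found" result. It does not add information beyond the timeout and the failed module descriptor parse that precede it.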
Any sense whatsoever to anyone?! I'd appreciate some kind of light in the dark here.
22 May 2019 17:05 #134591
by PCW
Replied by PCW on topic 5i25 Deterioration?
Sounds like either a bad 5I25 or a bad motherboard that wrote garbage to the 5I25...
22 May 2019 19:32 #134602
by RobTC
Replied by RobTC on topic 5i25 Deterioration?
Thanks for the fast response. According to dmidecode it's an "ASUSTek NODUSM3 v1.05", and I haven't seen any other issues with it. That's about the extent of what I can say about it, I'm not sure if there'd be a feasible way of diagnosing a possibly-existent mysterious issue either.
My other machines are both Windows 10 with ASUS Z87-PRO and X99-DELUXE II motherboards, and therefore only have PCI-Express slots. I managed to run Mesaflash from the supplied utilities via cmd, but without a way to install the card to test, I'm not sure what to do about it.
Seems like my options currently are:
1) Buy a PCIe-PCI adapter for this single use and hope it's one of the ones that works and doesn't take 10 weeks to arrive.
2) Buy a different machine with a PCI slot to test the existing 5i25; I may still have to get a new card regardless.
3) Bite the bullet and buy a new 5i25. Or RMA this one? Might be similarly expensive depending on the problem.
4) Buy a 6i25 and turf the GPU, since the fast thread is on the FPGA anyway.
I'm not sure which seems like the most practical option!
How feasible is that last option, do you think? Latency test sans GPU was in the 63,000ns range as I recall, which is still plenty usable even for a modest fast thread, never mind the 1ms servo thread it would actually be running. A 6i25 would also be usable/testable/flashable by all my other computers, and it still programs with the same 5i25 binaries so I wouldn't need to change anything.
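To put those latency numbers in proportion, here is a rough worked calculation of worst-case latency as a share of the 1ms servo period. The helper name `jitter_fraction` is purely illustrative, and the two latency figures are the approximate ones quoted in this thread.

```python
def jitter_fraction(latency_ns, period_ns):
    """Worst-case latency expressed as a fraction of the thread period."""
    return latency_ns / period_ns


SERVO_PERIOD_NS = 1_000_000  # 1 ms servo thread

with_gpu = jitter_fraction(11_000, SERVO_PERIOD_NS)     # ~11,000 ns with the add-in GPU
without_gpu = jitter_fraction(63_000, SERVO_PERIOD_NS)  # ~63,000 ns recalled without it

print(f"with GPU:    {with_gpu:.1%} of the servo period")     # about 1.1%
print(f"without GPU: {without_gpu:.1%} of the servo period")  # about 6.3%
```

Even the worse figure is a small slice of a 1ms period, which is the reasoning behind dropping the GPU when step generation happens on the FPGA.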
If it is the motherboard, it's probably easier to acquire a PCIe 1x/4x-only unit these days, and recycle this one into a USB-based bCNC/grbl project.
22 May 2019 20:05 #134604
by PCW
Replied by PCW on topic 5i25 Deterioration?
Without another PCI motherboard to try, it's pretty hard to diagnose.
One possibility, assuming you are in the US, is to send the card back for testing/verification.
Alternatively you could try another PC. Around here discarded PCs are a dime a dozen and it would be cheaper to acquire one than ship the card back.
22 May 2019 22:32 #134627
by RobTC
Replied by RobTC on topic 5i25 Deterioration?
Here in the boondocks of VA it's probably cheaper to acquire a junk car than a junk computer, despite our proximity to Tech. Cheapest option outside of a lucky Goodwill find seems to be around $50 or so. Considering the (pretty reasonable) price of your cards, I'm not sure that's a particularly viable number for a diagnosis and potentially more delay and expense. I'm more interested in production than hobby tinkering. It's kind of irritating considering the card was only a month old and I've been extremely cautious about power and static when dealing with it.
Sending the card back is certainly an option, what all would that entail? And what could the potential outcomes be? Though if shipping would be as expensive as acquiring another machine (and I suppose it is $15-20 each way, at least, plus whatever you charge for actual work done), maybe that's less of an option.
Outside of that, I'm more inclined to try a 6i25 and take the minor graphical performance hit from integrated graphics, but leaving my options open for diagnostics and a future mini-ITX integration into the main CNC enclosure.
22 May 2019 23:12 #134629
by PCW
Replied by PCW on topic 5i25 Deterioration?
The card is still under warranty
(and cards that have been abused typically either have a crowbarred
FPGA = hot or bad I/O bits) So I would send it back for evaluation.
If it passes all tests it suggests a problem with your hardware,
but if there is anything wrong we will repair/replace it
The following user(s) said Thank You: RobTC
23 May 2019 10:25 #134683
by RobTC
Replied by RobTC on topic 5i25 Deterioration?
That sounds good. Any idea on an approximate turnaround time? Debating whether to buy a second/backup 6i25 to use in the meantime (which would transfer to a future 5-axis gantry router build, so I'm less concerned with that cost). If it's likely a week or less then there's probably not much point though.
What should I do to get the ball rolling on the evaluation?
24 May 2019 20:58 #134847
by RobTC
Replied by RobTC on topic 5i25 Deterioration?
I grabbed the 6i25 just to be on the safe side, should arrive Monday.
I also found a refurb i3 Dell for $38 shipped on eBay, so I nabbed it. That should take care of any amd64 issues (which, while apparently not reproducible, I have seen brought up elsewhere). I also threw in one of the ASMedia 1083-chipset-based PCIe-PCI risers mentioned in the other thread since apparently it works at least well enough for testing, maybe production. I'll have to deal with form factor and mounting shield issues at some point, but for now it should all at least go together.
If the 5i25 is still not liking either Windows or x64 (which is still "amd64", but...y'know) architecture, then I'll email the sales address on the Mesa store.
24 May 2019 20:59 #134848
by RobTC
Replied by RobTC on topic 5i25 Deterioration?
(Or Tuesday, rather, I don't keep up with all these holidays...)
01 Jun 2019 22:28 #135590
by Bari
Replied by Bari on topic 5i25 Deterioration?
I haven't ever used Intel for Linuxcnc in 15+ years. What amd64 issues are there with LCNC? Don't mean to hijack the thread or argue. Just wondering.