Suggestions for debugging?

19 Oct 2020 02:53 #186544 by quarklark
Hi everyone,

My shop recently got a Precix Mesa CNC mill running, and I ran into these errors while running a test job:



The job was running just fine until the machine halted, but the computer seemed to think that it was still running. I've not been able to reliably replicate the issue, but it has happened three times. Twice, it halted about an hour into a long drag engrave job, and once it halted while jogging the machine after restarting from the previous crash.

Any suggestions on what to investigate? Or how to replicate the issue?

Our machine uses a Mesa PCI card with the 7i76 Daughter Card. I'm currently working on collecting the configuration files and more details on the machine itself. (Let me know if anything would be particularly helpful.)

I'm personally new to linuxcnc and just learning the details of our machine, so any pointers would be greatly appreciated!

Thanks!

19 Oct 2020 03:40 - 19 Oct 2020 03:41 #186548 by PCW
The cascade of sserial errors suggests that you have lost communication with the 7I76.

Possible causes include:

Cabling problems between the 7I76 and the 5I25

Loss of 5V at the 7I76

Loss of field power at the 7I76
Last edit: 19 Oct 2020 03:41 by PCW.


19 Oct 2020 17:51 #186598 by quarklark
Thanks for the quick reply!

I'm surprised there's no feedback from the 7I76 to the 5I25, and the program kept running. Is there a way to have linuxcnc pause the program if communication is interrupted? (Rather than continuing on?)

Regarding cabling, we were thinking that our cable run may be a bit too long, and that noise may be a factor. Were you thinking along those lines, or just general damage / connection issues?

To determine if we're losing power at the 7I76, I guess we'll need to wait until the problem occurs again, then probe to see if power is where we'd expect it to be?

Also, does linuxcnc have a centralized logging system? I happened to take that picture of the errors, but I imagine it would be much easier to look at a log file to see what's going on.


19 Oct 2020 18:29 #186600 by PCW

> Thanks for the quick reply!
>
> I'm surprised there's no feedback from the 7I76 to the 5I25, and the program kept running. Is there a way to have linuxcnc pause the program if communication is interrupted? (Rather than continuing on?)


There is feedback, but you would need to connect it in HAL:

hm2_5i25.0.sserial.port-0.run

reports the status of the sserial port
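
As a rough illustration of what acting on that status could look like, here is a minimal sketch in plain Python. It only models the logic (latch a fault once the status bit has been false for a few consecutive samples) and does not talk to LinuxCNC or HAL. The class name `LinkWatchdog` and its `limit` parameter are hypothetical; in a real configuration the status would be wired up in the HAL file instead.

```python
from dataclasses import dataclass


@dataclass
class LinkWatchdog:
    """Hypothetical helper: latch a fault after `limit` consecutive bad samples."""

    limit: int = 3          # assumed threshold, not a LinuxCNC setting
    low_count: int = 0      # consecutive samples with the status bit false
    faulted: bool = False   # stays True until reset() is called

    def update(self, link_ok: bool) -> bool:
        """Feed one sample of the status bit; return the latched fault state."""
        if link_ok:
            self.low_count = 0
        else:
            self.low_count += 1
            if self.low_count >= self.limit:
                self.faulted = True
        return self.faulted

    def reset(self) -> None:
        """Clear the latched fault, e.g. after the operator has checked power."""
        self.low_count = 0
        self.faulted = False
```

Latching matters here: once the fault is set, a status bit that flickers back to true does not clear it, so a program would stay stopped until someone resets it deliberately.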

> Regarding cabling, we were thinking that our cable run may be a bit too long, and that noise may be a factor. Were you thinking along those lines, or just general damage / connection issues?


Noise issues will generally show up as CRC errors. You got an sserial break error,
which would mean a noise impulse lasting a whole character time, and that is very unlikely.
Loss of 5V power at the 7I76 is my best guess as to what happened.

> To determine if we're losing power at the 7I76, I guess we'll need to wait until the problem occurs again, then probe to see if power is where we'd expect it to be?


Yes, and check that it's OK in normal operation. Marginal supplies or external shorts in, say, encoder
or stepgen +5V wiring can cause this kind of issue.

> Also, does linuxcnc have a centralized logging system? I happened to take that picture of the errors, but I imagine it would be much easier to look at a log file to see what's going on.


For RTAI-based systems, logging is in the kernel log (view with dmesg).
For Preempt-RT systems, the log is dumped to the console device,
so if you start linuxcnc from the command line you can watch the log there.
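
Either way the output can be long, so it may help to narrow it down. The following is a minimal sketch, assuming the captured log is available as lines of text and that the relevant messages mention `hm2` or `sserial`. The function name and the keyword list are assumptions for illustration, not part of LinuxCNC.

```python
# Hypothetical keyword list; adjust it to match what your system actually prints.
KEYWORDS = ("hm2", "sserial")


def filter_log_lines(lines, keywords=KEYWORDS):
    """Return the lines that mention any keyword, ignoring case."""
    wanted = tuple(k.lower() for k in keywords)
    return [
        line.rstrip("\n")
        for line in lines
        if any(k in line.lower() for k in wanted)
    ]
```

For example, the console output could be saved to a file and passed in as an open file object, or the text from dmesg could be split into lines and filtered the same way.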


19 Oct 2020 19:11 #186607 by quarklark
Perfect! Thanks so much. I'll investigate and give an update on what I find.

I'll see if I can hook up the feedback in the HAL, as you mentioned.


21 Oct 2020 03:08 - 21 Oct 2020 03:16 #186785 by quarklark
I found that the errors in the initial report can be replicated by (accidentally) powering off the control box while running the program. The program continues, but the errors appear exactly as before.

However, after re-running the program, I encountered the same machine stop problem, but this time there were no errors. It appears as though the 7I76 is still working, but the motor stepgen controllers seem to have lost power. I confirmed that 12V is still being supplied to them, and restarting the system brought everything back online.

For reference, I've posted the config files here: github.com/quarklark/GSS-CNC/tree/master/configs
Last edit: 21 Oct 2020 03:16 by quarklark.

