Python Interface: stat.g5x_index returns 0 at startup
- fletch
- Topic Author
- Offline
- Premium Member
- Posts: 131
- Thank you received: 69
23 Jul 2023 11:06 #276066
by fletch
SOLVED (maybe): Python Interface: stat.g5x_index returns 0 at startup was created by fletch
This is a follow-up forum post to the GitHub issue I raised here: #2590. First, thanks to Chris and Krzysztof for helping.
A brief recap: the Python Interface would return the g5x_index as 0 (rather than 1) at startup - this affected gmoccapy, qtdragon and the Manualmatic pendant. However, it would only happen on a restart of LinuxCNC, so I could comment out the call to the Manualmatic component and the bug would still exist in a wholly vanilla sim. On qtdragon the WCS would be (wrongly) displayed as G59.3, and on both qtdragon and gmoccapy selecting G54 would not change the g5x_index reported by the Python Interface unless you selected a different WCS first. So the issue was affecting *something* within LinuxCNC (or at least, the Python Interface).
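One possible explanation for the G59.3 display (an assumption on my part, not checked against the qtdragon source): if a GUI converts the 1-based g5x_index to a name with a list lookup at index - 1, an index of 0 becomes -1, which Python treats as "last element". A minimal sketch, where WCS_NAMES and both helper functions are hypothetical names used only for illustration:

```python
# Names of the nine work coordinate systems, in g5x_index order (G54 = 1).
WCS_NAMES = ("G54", "G55", "G56", "G57", "G58", "G59",
             "G59.1", "G59.2", "G59.3")


def wcs_name_naive(g5x_index):
    """Look up the name with no range check (hypothetical GUI behaviour)."""
    # With g5x_index == 0 this indexes WCS_NAMES[-1], i.e. the last entry.
    return WCS_NAMES[g5x_index - 1]


def wcs_name_checked(g5x_index):
    """Look up the name, rejecting anything outside 1..9."""
    if not 1 <= g5x_index <= len(WCS_NAMES):
        raise ValueError(f"unexpected g5x_index: {g5x_index!r}")
    return WCS_NAMES[g5x_index - 1]


if __name__ == "__main__":
    print(wcs_name_naive(1))   # the normal startup value
    print(wcs_name_naive(0))   # the bad startup value wraps to the last name
```

A range check like the one in wcs_name_checked would at least make a bad index visible instead of silently showing the wrong WCS.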
As it turned out, the issue was triggered by the Manualmatic component, despite the component only ever reading g5x_index and (at startup only) g5x_offsets, and despite it not being loaded on a restart. The Manualmatic class doesn't do any magic or trickery - it just uses the provided Python Interface as intended.
The 'fix' was equally curious: adding a time.sleep(0.1) between the creation of the lambdas (which ultimately, but not immediately, read g5x_index and g5x_offsets) appeared to solve the problem. How? To me that makes absolutely no sense but it is consistently repeatable. How can an external Python class definition (the time.sleep(0.1) is in the lambda creation, not in the call to stat.g5x_index) affect the LinuxCNC state?
I have now moved the lambda creation to the __init__ of the Manualmatic Python class (without the time.sleep(0.1)) and this also appears to resolve the issue (apart from on one very fast stop/restart - and I can't get it to happen again).
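To make the distinction concrete, here is a rough sketch of the two places a lambda can be created: in the class body (runs once, when the module is imported) or in __init__ (runs when an instance is constructed). It does not reproduce the bug and is not the Manualmatic code; FakeStat, ClassLevelReaders and InstanceLevelReaders are hypothetical names, and FakeStat stands in for linuxcnc.stat() so the snippet runs without LinuxCNC:

```python
class FakeStat:
    """Hypothetical stand-in for linuxcnc.stat() that counts reads."""

    def __init__(self):
        self.reads = 0

    @property
    def g5x_index(self):
        self.reads += 1
        return 1


class ClassLevelReaders:
    # This dict and its lambda are built when the class body executes,
    # which happens at import time, before any instance exists.
    readers = {"g5x_index": lambda stat: stat.g5x_index}


class InstanceLevelReaders:
    def __init__(self, stat):
        self.stat = stat
        # Built only when an instance is constructed.
        self.readers = {"g5x_index": lambda: self.stat.g5x_index}


stat = FakeStat()
pendant = InstanceLevelReaders(stat)
reads_before_call = stat.reads             # creating a lambda reads nothing
value = pendant.readers["g5x_index"]()     # the read happens here
```

In both variants the attribute is only read when the lambda is called, which is why a sleep between lambda definitions looks like it should have no effect on LinuxCNC state.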
So why am I logging this here? Mainly I'd like to understand how a fairly noddy Python class triggered the issue - I don't quite trust my 'fix' without understanding the cause - but also because this took me a lot of hours to track down and I don't want to inflict that on anyone else. I don't know the innards of .pyc creation (nor LinuxCNC for that matter!) but is it possible this is how the issue is 'forwarded' to the next startup?
I don't expect this issue to affect anyone else unless perhaps you are writing a Python component. If though your interest is piqued, the version of Manualmatic.py that triggers the issue is from this commit (and prior). I will be refactoring, cleaning up and documenting the class after this commit:
github.com/Stutchbury/Manualmatic-Pendan...bcd22056cd536c98f921
I'm on Debian 12 with LinuxCNC 2.9.0-pre1-1011-gebf3cb2c6 (from the buildbot debs on ~20 Jul 2023).
I've posted this to the general topic but could we please have a 'Python Interface' topic under 'User Interfaces', as it is a core part of LinuxCNC (i.e. it is not an 'Other' user interface)?
24 Jul 2023 09:56 #276141
by fletch
Replied by fletch on topic Python Interface: stat.g5x_index returns 0 at startup
At about 2am last night I believe I found the root cause. Attached is a very small test component - a distillation of the linuxcnc.command call that triggers the issue.
On the Manualmatic, the estop switch is monitored and if necessary the estop state is changed from STATE_ESTOP to STATE_ESTOP_RESET. There is no doubt an argument to be made that there is a better way to handle estops and I'm happy to hear it (the implementation is a 'backstop' if no other estop handler is customised) but the Python Interface allows this command and should handle it without getting a brain full of wasps.
Setting estop_reset = True in the attached test_component file will trigger the issue - you may get lucky and it will be triggered on the first attempt or it may take 4 or 5 restarts but it will eventually trigger. If you're really lucky you may also get the shmem (shared memory?) rash all over your output. I've attached sample outputs for each of the above scenarios.
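For readers without the attachment, the kind of call being described looks roughly like the sketch below. This is not the attached test_component; maybe_reset_estop and the Fake* objects are hypothetical, and the fakes exist only so the snippet runs without LinuxCNC installed. The calls it relies on (stat.poll(), stat.task_state, command.state() and the STATE_ESTOP / STATE_ESTOP_RESET constants) are from the LinuxCNC Python Interface:

```python
from types import SimpleNamespace


def maybe_reset_estop(lcnc, stat, command, estop_reset):
    """Send STATE_ESTOP_RESET if enabled and the task is in STATE_ESTOP.

    `lcnc` is the linuxcnc module (or a stand-in exposing the same
    constants). Returns True if the command was sent.
    """
    stat.poll()  # refresh the status snapshot before reading it
    if estop_reset and stat.task_state == lcnc.STATE_ESTOP:
        command.state(lcnc.STATE_ESTOP_RESET)
        return True
    return False


class FakeCommand:
    """Hypothetical stand-in for linuxcnc.command(); records what is sent."""

    def __init__(self):
        self.sent = []

    def state(self, value):
        self.sent.append(value)


# Placeholder constants and status object, for illustration only.
fake_lcnc = SimpleNamespace(STATE_ESTOP=1, STATE_ESTOP_RESET=2)
fake_stat = SimpleNamespace(task_state=fake_lcnc.STATE_ESTOP,
                            poll=lambda: None)
fake_command = FakeCommand()

sent = maybe_reset_estop(fake_lcnc, fake_stat, fake_command, estop_reset=True)

# On a real machine or sim the call would instead be:
#   import linuxcnc
#   maybe_reset_estop(linuxcnc, linuxcnc.stat(), linuxcnc.command(), True)
```

With estop_reset set to False the function only polls and sends nothing, which matches the description of the flag as the switch that triggers the issue.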
Regardless of the particular use case (unless I've missed something fundamental, in which case I'll blush & apologise profusely claiming I'm tired & emotional) I believe this is a 'bug' in the Python Interface and the issue #2590 should definitely, maybe, possibly be reopened.