KDE (Plasma) desktop freezes, but machine remains operable

12 Nov 2020 11:16 #189100 by seuchato
Sorry to read that. It was not my intention to make things go from bad to worse.
Kindly tell us what machine you are on.
My PCs/laptops that did strange things, erratically at first and then more and more often, turned out to have hardware issues. Hints I can give:
  • memtest
  • If it has a discrete GPU (nouveau and nv50 point to an NVIDIA GPU), I'd swap it out
  • As RodW pointed out: remove and reinsert all PCI/PCI Express cards plus USB devices. I'd remove everything not needed for milling anyway
  • What about disk integrity? smartctl /dev/sdX -a should give you some clue about the disks' health status. The file system's integrity can be checked using fsck, with options appropriate to your setup
  • What you observe is consistent with a dying machine
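As a rough sketch of which parts of the smartctl output to look at, the Python helper below pulls the overall verdict and a few NVMe health counters out of `smartctl -a` text. It assumes the field names smartctl 6.6 prints for an NVMe drive (as in the dump posted later in this thread); the function names and the selection of fields are illustrative, not part of smartmontools.

```python
import subprocess

# Fields in the "SMART/Health Information" section of `smartctl -a` output for
# an NVMe drive that are worth a look (an illustrative selection, not exhaustive).
WATCH_FIELDS = (
    "Critical Warning",
    "Available Spare",
    "Percentage Used",
    "Media and Data Integrity Errors",
    "Error Information Log Entries",
    "Unsafe Shutdowns",
)


def parse_smartctl(text):
    """Return (overall_passed, {field: raw value}) parsed from `smartctl -a` text."""
    passed = "self-assessment test result: PASSED" in text
    fields = {}
    for line in text.splitlines():
        # Data lines look like "Name:   value"; split on the first colon only,
        # so values that contain colons (e.g. times) stay intact.
        name, sep, value = line.partition(":")
        if sep and name.strip() in WATCH_FIELDS:
            fields[name.strip()] = value.strip()
    return passed, fields


def check_device(device):
    """Run smartctl on `device` (needs root and smartmontools installed)."""
    # smartctl's exit status is a bit mask, so a non-zero code does not by
    # itself mean the run failed; parse whatever it printed.
    result = subprocess.run(["smartctl", "-a", device],
                            capture_output=True, text=True)
    return parse_smartctl(result.stdout)
```

Run as root, e.g. `passed, fields = check_device("/dev/nvme0")`. Applied to the dump later in this thread it would report PASSED, with 45 unsafe shutdowns and one error-log entry as the only counters that stand out.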

hth
chris

14 Nov 2020 05:51 #189298 by JetForMe
Probably coincidental, but as I was installing memtest on the computer, with the mill active and homed but not running anything, LinuxCNC got this error:
hm2/hm2_7i76e.0: error finishing read! iter=18471737
hm2/hm2_7i76e.0: error finishing read! iter=18471737

Unexpected realtime delay on task 0 with period 1000000
This Message will only display once per session.
Run the Latency Test and resolve before continuing.
Unexpected realtime delay on task 0 with period 1000000
This Message will only display once per session.
Run the Latency Test and resolve before continuing.

I've installed tons of stuff with LinuxCNC running before and never ran into that (on Debian 9, though).

Anyway, that's got me worried. I vaguely recall running a latency test when I was first getting started with LinuxCNC.

I'll try the other things you suggest. I already re-seated everything.

The computer is a fairly cheap thing I built myself a couple of years ago from parts I bought at Fry's. Hardwired Ethernet (no Wi-Fi), NVIDIA GT 710 graphics, 8 GB RAM.
# dmidecode -t 2
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.1.1 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
	Manufacturer: Gigabyte Technology Co., Ltd.
	Product Name: Z370N WIFI-CF
	Version: x.x
	Serial Number: Default string
	Asset Tag: Default string
	Features:
		Board is a hosting board
		Board is replaceable
	Location In Chassis: Default string
	Chassis Handle: 0x0003
	Type: Motherboard
	Contained Object Handles: 0

Graphics:
01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 710] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] GK208B [GeForce GT 710]
	Flags: bus master, fast devsel, latency 0, IRQ 136
	Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e8000000 (64-bit, prefetchable) [size=128M]
	Memory at f0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: nouveau
	Kernel modules: nouveau

Disk:
# smartctl /dev/nvme0 -a
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-12-rt-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Patriot Scorch M2
Serial Number:                      CD8E0787049E92033077
Firmware Version:                   E8FM11.5
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Total NVM Capacity:                 128,035,676,160 [128 GB]
Unallocated NVM Capacity:           0
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          128,035,676,160 [128 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            6479a7 0b74628135
Local Time is:                      Fri Nov 13 21:48:24 2020 PST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     94 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.00W       -        -    0  0  0  0        0       0
 1 +     2.00W       -        -    1  1  1  1        0       0
 2 +     2.00W       -        -    2  2  2  2        0       0
 3 -   0.1000W       -        -    3  3  3  3     1000    1000
 4 -   0.0050W       -        -    4  4  4  4   400000   90000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         1
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        45 Celsius
Available Spare:                    100%
Available Spare Threshold:          50%
Percentage Used:                    0%
Data Units Read:                    194,187 [99.4 GB]
Data Units Written:                 2,251,423 [1.15 TB]
Host Read Commands:                 4,660,555
Host Write Commands:                5,156,882
Controller Busy Time:               58
Power Cycles:                       99
Power On Hours:                     2,391
Unsafe Shutdowns:                   45
Media and Data Integrity Errors:    0
Error Information Log Entries:      1
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 2:               45 Celsius

Error Information (NVMe Log 0x01, max 16 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          1     0  0x000c  0x021b      -   2147483776     0     -

14 Nov 2020 06:03 #189299 by JetForMe
I can't actually get memtest86+ to work. I'll have to install it on a thumb drive and try that.

14 Nov 2020 06:13 #189300 by JetForMe
So, I ran latency-test, opened up a couple of YouTube videos, ran glxgears, and spent some time just resizing one of the YouTube windows. The machine froze. I rebooted and played the YouTube videos again without latency-test running, but couldn't get it to freeze again. Then I started latency-test, abused the machine some more, got some numbers, and took a screenshot. I killed latency-test and ran latency-plot; it printed a message similar to the one LinuxCNC gave before (Unexpected realtime delay on task 0 with period 25000), and the screen froze again (I can still ssh in).

The latency-test showed:

Servo thread (1 ms): Max Interval 1503322 ns, Max Jitter 503322 ns
Base thread (25 µs): Max Interval 381956 ns, Max Jitter 356956 ns

Not sure what to make of those numbers.
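One way to read them, assuming (as latency-test reports) that all figures are in nanoseconds: in both rows Max Jitter works out to exactly Max Interval minus the nominal thread period, i.e. how much later than scheduled the worst wake-up came. A small Python sketch of that arithmetic, using the figures above (the helper name is made up for illustration):

```python
# Figures reported by latency-test in this thread, all in nanoseconds.
THREADS = {
    # name: (nominal period, max interval, reported max jitter)
    "servo": (1_000_000, 1_503_322, 503_322),
    "base": (25_000, 381_956, 356_956),
}


def excess_over_period(period_ns, max_interval_ns):
    """Worst-case lateness: how far the longest interval overshot the period."""
    return max_interval_ns - period_ns


for name, (period, max_int, reported) in THREADS.items():
    late = excess_over_period(period, max_int)
    print(f"{name}: {late} ns late ({late / 1000:.1f} us), "
          f"{100 * late / period:.0f}% of the period, "
          f"matches reported jitter: {late == reported}")
```

For the servo thread that is about half a millisecond, roughly 50% of its 1 ms period; for the 25 µs base thread it is more than 14 times the period. Delays of that size are consistent with the "Unexpected realtime delay" messages above.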

I think the weird window-server-freezing thing only ever happens if LinuxCNC stuff is running. But it's almost always running, so it's hard to be sure.

14 Nov 2020 09:52 #189310 by tommylight
Check the processor cooler: remove it, clean it, apply fresh thermal paste, then reseat it, and check that the fan spins freely.
I had similar issues with the Cinnamon version of Mint, which would freeze for no reason; the PCs are all fine, and MATE never freezes. So it might be something with the DE.
You should change the graphics card; NVIDIA cards are not a good choice for LinuxCNC because of the many power-saving features that cannot be disabled.

14 Nov 2020 21:56 #189388 by JetForMe
It seems unlikely to me that it's anything like the processor or RAM failing, or the video card entering some kind of power-saving mode, because it's always exactly the same thing: the display simply freezes. Otherwise the machine keeps working perfectly: I can log in, G-code that is running keeps running, etc.

I can have this happen while I'm dragging a window, so I doubt the graphics card is sleeping. I suppose it's possible the graphics card itself is bad and just stops updating the frame buffer. The machine has on-board graphics I can try.

14 Nov 2020 22:23 #189393 by tommylight
That actually sounds like the DE (desktop environment, here KDE) is freezing.

14 Nov 2020 22:58 - 14 Nov 2020 22:59 #189400 by JetForMe
Agreed.

However: removing the NVIDIA card may have solved all my problems. Not only can I no longer make it freeze (running latency-plot and playing a video used to trigger it pretty reliably), I also get much better latency figures now:

[Attachment: latency-test screenshot]

So, thanks for suggesting that!
Last edit: 14 Nov 2020 22:59 by JetForMe.

14 Nov 2020 23:10 #189405 by tommylight
The following user(s) said Thank You: JetForMe

15 Nov 2020 03:16 #189424 by seuchato
Good to see you got past the problems.
You said you built the system yourself. Nothing wrong with that, on the contrary. As tommylight suggested: the CPU may be overheating and consequently throttling. I had an HP 8300 SFF doing well and an 8300 MT doing badly, with the same CPU, RAM, etc. Then I replaced the thermal paste on the MT, and all was good again.

To monitor that, you could run the script here.
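The linked script isn't reproduced here. As an independent sketch (not that script): on Linux the kernel exposes temperatures in /sys/class/thermal/thermal_zone*/temp (millidegrees Celsius, with a matching "type" file naming the sensor) and the current CPU clock in /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq (kHz). Polling both while the machine is under load shows whether the clocks drop as the temperature climbs. Which thermal zones exist, and whether cpufreq is present at all, depends on the hardware and kernel, so treat these paths as assumptions to verify on your machine; the function names are hypothetical.

```python
import glob
import os
import time


def read_int(path):
    """Read a sysfs file holding a single integer; None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def sample(root="/"):
    """Return ({zone: deg C}, {cpu: MHz}) read from sysfs under `root`."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(root, "sys/class/thermal/thermal_zone*"))):
        millideg = read_int(os.path.join(zone, "temp"))
        if millideg is None:
            continue
        try:
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
        except OSError:
            kind = "unknown"
        temps[f"{os.path.basename(zone)}:{kind}"] = millideg / 1000.0
    freqs = {}
    pattern = os.path.join(root, "sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq")
    for path in sorted(glob.glob(pattern)):
        khz = read_int(path)
        if khz is not None:
            cpu = path.split(os.sep)[-3]  # ".../cpu0/cpufreq/scaling_cur_freq" -> "cpu0"
            freqs[cpu] = khz / 1000.0
    return temps, freqs


def monitor(interval_s=2.0, count=10):
    """Print temperatures and clocks periodically; falling MHz at high C suggests throttling."""
    for _ in range(count):
        temps, freqs = sample()
        t = ", ".join(f"{k} {v:.0f}C" for k, v in temps.items())
        f = ", ".join(f"{k} {v:.0f}MHz" for k, v in freqs.items())
        print(f"{time.strftime('%H:%M:%S')}  {t}  |  {f}")
        time.sleep(interval_s)
```

For example, call monitor(count=60) in one terminal while latency-test is loading the machine in another.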

Greez
chris
The following user(s) said Thank You: tommylight, JetForMe
