Custom laptop with Ubuntu crashes unexpectedly

rbarriuso

Member
Hi,

I got a custom-configured laptop from PCSpecialist a few months ago and installed Ubuntu 18.04. It works quite well, but for some reason it hangs from time to time (not often, around 3 times a week). It just freezes for no apparent reason (I haven't found a way to reproduce it) and leaves nothing to investigate afterwards (e.g. nothing in "/var/crash").

This is my configuration:
- Chassis: Lafité aluminum 15.6" matte Full HD IPS LED (1920x1080)
- CPU: Intel® Core™ i7 8565U (1.80 GHz, 4.6 GHz Turbo)
- RAM: 16 GB Corsair 2400 MHz SODIMM DDR4 (1 x 16 GB)
- Graphics: Intel® HD Graphics (integrated in the CPU), up to 1.7 GB of shared DDR4 video memory
- Main drive: M.2 SSD - 1 TB ADATA SX6000 Pro PCIe M.2 2280 (2100 MB/s read, 1500 MB/s write)
- Wireless: Gigabit LAN & Intel® AC-9260 M.2 wireless (1.73 Gbps, 802.11ac) + BT 5.0
- Ports: 1 x USB 3.1 (Type-C) + 2 x USB 3.0 + 1 x USB 2.0
- Keyboard: Spanish layout, backlit
- OS: Ubuntu 18.04.3 LTS

Please find attached extended hardware information (hwinfo, lshw, lspci, lsusb)

Has anybody experienced similar issues?

Thanks in advance.

Regards,
Rafael.
 

Attachments

  • hwdata.zip

ubuysa

The BSOD Doctor
I don't speak Linux (well, not very well) but hangs and freezes on any OS are much more likely to be driver issues than hardware problems.
 

astiak

Member
I can't promise results, but these options may be worth exploring to see whether we can get to the bottom of it.

What desktop environment or window manager are you using?

Is it freezing indefinitely or briefly? When it does freeze, have you tried switching to another TTY (e.g. Ctrl+Alt+F2)?

Any errors in your Xorg log?
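A minimal sketch for that check, assuming GNOME runs on Xorg (the 18.04 default). With rootless Xorg the log is usually ~/.local/share/xorg/Xorg.0.log, otherwise /var/log/Xorg.0.log; Xorg marks errors with (EE) and warnings with (WW). The function name is hypothetical.

```shell
# Print error (EE) and warning (WW) lines from an Xorg log.
# Default locations are an assumption: rootless Xorg log first, then the system-wide one.
xorg_problems() {
  local log="${1:-}" candidate
  if [ -z "$log" ]; then
    for candidate in "$HOME/.local/share/xorg/Xorg.0.log" /var/log/Xorg.0.log; do
      if [ -r "$candidate" ]; then log="$candidate"; break; fi
    done
  fi
  [ -r "$log" ] || { echo "no readable Xorg log found" >&2; return 1; }
  grep -F -e '(EE)' -e '(WW)' "$log"
}

# Usage:
#   xorg_problems
#   xorg_problems ~/.local/share/xorg/Xorg.0.log.old   # previous session, i.e. the one that froze
```

Xorg renames the previous session's log to Xorg.0.log.old at startup, so after a forced reboot the .old file is the one that covers the freeze.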

Can you post the output of your log files?

Code:
journalctl -S "10 days ago" -p 4

The above command lists every entry from the last 10 days with priority warning or more severe (priorities 0-4: emerg, alert, crit, err, warning). You can of course change 10 to a different number if you remember when it last occurred, or add -U "3 days ago" if you know it happened between 3 and 10 days ago, for example.
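A small sketch wrapping the same journalctl options; the two function names are made up for illustration. It also uses -b -1, which limits output to the previous boot, so after a forced power-off you can see the last things logged before the freeze (this assumes the journal is stored on disk rather than only in memory).

```shell
# Hypothetical helpers around journalctl; only standard journalctl options are used.

# Priority 4 (warning) and more severe, between two times,
# e.g. warnings_between "10 days ago" "3 days ago"
warnings_between() {
  journalctl --no-pager -p 4 -S "$1" -U "$2"
}

# Last N messages of an earlier boot (-1 = the boot before the current one).
# After a hard freeze, this is where any final messages would be.
tail_of_boot() {
  local boot="${1:--1}" lines="${2:-100}"
  journalctl --no-pager -b "$boot" -n "$lines"
}

# Usage:
#   journalctl --list-boots   # find the boot that ended in the freeze
#   tail_of_boot -1 200
```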

There are also other hardware checks you can do. I noticed you're using an NVMe SSD, so take a look at smartmontools. Are you using anything to monitor CPU temperatures?
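For a quick look at temperatures without installing anything, the kernel exposes thermal zones under /sys/class/thermal, reporting millidegrees Celsius. A rough sketch; the function name is hypothetical and zone names such as x86_pkg_temp or acpitz vary between machines. The sensors command from the lm-sensors package is the more common tool.

```shell
# Print each thermal zone's type and temperature in degrees Celsius.
# The kernel reports millidegrees (45500 means 45.5 C); assumes non-negative values.
zone_temps() {
  local base="${1:-/sys/class/thermal}" zone name milli
  for zone in "$base"/thermal_zone*; do
    [ -r "$zone/temp" ] || continue        # also skips the literal pattern if nothing matched
    name=$(cat "$zone/type" 2>/dev/null || echo unknown)
    milli=$(cat "$zone/temp")
    printf '%s %d.%d C\n' "$name" $(( milli / 1000 )) $(( (milli % 1000) / 100 ))
  done
}

# Usage:
#   zone_temps
```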

EDIT: Formatting of tags
 

rbarriuso

Member
Thanks @astiak for your response. It had gone several days without a crash, but today I had a bad one. See the attached journalctl log and screenshots.
In addition, here are the answers to your questions:

Q) What desktop environment or window manager are you using?
A) I'm using GNOME, the default Ubuntu 18.04 desktop.

Q) Is it freezing indefinitely or briefly? When it does freeze, have you tried switching tty?
A) It freezes completely: even the music gets stuck looping a fraction of a second, and there's nothing I can do except force a power-off by holding the power button.

Q) Can you post the output of your log files?
A) Attached. Please note the lines at Sep 02 17:34:07 and Sep 02 17:46:29. Between those times the computer crashed with the messages shown in the attached screenshots. After rebooting (Ctrl+Alt+Del) around 5 times, Ubuntu finally started and cleaned up some orphaned inodes on the SSD.

Q) There are also other hardware checks you can do, I noticed you're using an NVME SSD, take a look into Smartmontools. Are you using anything to monitor CPU temps?
A) Thanks, I'll try smartmontools. No, I'm not using a CPU temperature monitor.

Any further help will be appreciated. I update the Ubuntu packages (including the kernel) quite often, but the crash still happens.
 

Attachments

  • journalctl-02-09-2019.zip
  • IMG_20190902_174342073.jpg
  • IMG_20190902_174136646.jpg


astiak

Member
Hi, apologies for the late reply.

So, I am no expert! I couldn't see anything out of the ordinary in the log file; even just before the reboot, everything looks fairly normal. Most of those warnings and errors are negligible and, as far as I can see, shouldn't cause system freezes. That points towards a possible hardware issue, but it may also be worth asking on a Linux forum (e.g. r/linux or LinuxQuestions) where people with more knowledge can look for potential software issues.

I'd definitely recommend checking S.M.A.R.T. for errors. Use the following resources:
https://ownyourbits.com/2018/11/10/monitor-your-hard-drive-health-with-smart-daemon/
https://wiki.archlinux.org/index.php/S.M.A.R.T.
It would also be nice to see the output.
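The full smartctl report is long; a filter like the one below keeps the health-related lines. The field names are the ones smartctl 6.x prints for NVMe drives (as in the output posted later in this thread); the function name is hypothetical.

```shell
# Keep only the health-related lines of `smartctl -a` output for an NVMe drive.
# Field names match smartctl 6.x NVMe output; adjust the pattern for other versions.
smart_summary() {
  grep -E '^(SMART overall-health|Critical Warning|Available Spare:|Percentage Used|Media and Data Integrity Errors|Error Information Log Entries|Unsafe Shutdowns)'
}

# Usage (needs root and the smartmontools package):
#   sudo apt install smartmontools
#   sudo smartctl -a /dev/nvme0 | smart_summary
```

A non-zero Unsafe Shutdowns count mostly reflects forced power-offs like the ones described in this thread, not a drive fault by itself.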

Furthermore, when you do get freezes, I'd recommend using Magic SysRq (if you aren't already) to avoid corrupting your drive. If you can't switch TTY by pressing Ctrl + Alt + F2, perform the following actions, waiting a few seconds between each:
1. Press Alt+SysRq+R - Regain keyboard control from X
2. Try switching TTY again
3. Press Alt+SysRq+E - Send SIGTERM to all processes, except init
4. Press Alt+SysRq+I - Send SIGKILL to all processes, except init
5. Press Alt+SysRq+S - Attempt to sync all mounted filesystems
6. Press Alt+SysRq+U - Attempt to remount all mounted filesystems read-only
7. Press Alt+SysRq+B - Immediately reboot the system without syncing or unmounting your disks

If you do manage to get into the TTY at step 2, that's a good moment to look at the logs, or to check whether any rogue processes are using all your RAM/CPU. Stephen may be right that a piece of software with a memory leak could be freezing your system.
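From a TTY, a quick way to spot a runaway process is to sort ps output by memory or CPU. A sketch using procps ps options (-eo, --sort); the function names are hypothetical.

```shell
# Show the N processes using the most memory (default 10).
top_mem() {
  local n="${1:-10}"
  ps -eo pid,comm,%mem,%cpu --sort=-%mem | head -n "$(( n + 1 ))"   # +1 for the header line
}

# Same, sorted by CPU usage instead.
top_cpu() {
  local n="${1:-10}"
  ps -eo pid,comm,%mem,%cpu --sort=-%cpu | head -n "$(( n + 1 ))"
}

# Usage from a TTY:
#   top_mem; top_cpu; free -h
```

Note that ps reports %CPU averaged over each process's lifetime, so top gives a better live view of a process that has only just started spinning.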

As I'm not very familiar with Ubuntu, you'll need to check whether Magic SysRq is enabled. At the very least, reboot using Alt+SysRq+S, then U, then B. My laptop keyboard has no visible SysRq key; on most full-size keyboards it shares the Print Screen key. In my case, I press Fn + Alt + Home, release Fn, then press the letters REISUB in sequence (remembered as BUSIER backwards).
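To check what is enabled, read /proc/sys/kernel/sysrq: 0 disables SysRq, 1 enables everything, and any other value is a bitmask (per the kernel's sysrq documentation). REISUB needs 4 (R), 64 (E, I), 16 (S), 32 (U) and 128 (B), i.e. 244 in total. On a stock Ubuntu install the value is usually 176, which only allows sync, remount read-only and reboot. A sketch that decodes the value; the function and the short names are made up.

```shell
# Decode the kernel.sysrq value: 0 = disabled, 1 = everything, otherwise a bitmask.
# Bit meanings follow the kernel's sysrq documentation; the short names are made up.
decode_sysrq() {
  local mask="${1:-$(cat /proc/sys/kernel/sysrq)}"
  if [ "$mask" -eq 0 ]; then echo "disabled"; return 0; fi
  if [ "$mask" -eq 1 ]; then echo "all"; return 0; fi
  local out="" pair bit name
  for pair in 2:loglevel 4:keyboard 8:dumps 16:sync 32:remount-ro \
              64:signal-processes 128:reboot 256:nice-rt; do
    bit="${pair%%:*}"
    name="${pair#*:}"
    if [ $(( mask & bit )) -ne 0 ]; then
      out="${out:+$out }$name"
    fi
  done
  echo "$out"
}

# Usage:
#   decode_sysrq          # reads /proc/sys/kernel/sysrq
#   decode_sysrq 176      # -> sync remount-ro reboot
```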
 

rbarriuso

Member
@astiak Thanks for your response. I've been monitoring the issue over the past few days, and this morning it happened again. I tried Alt+SysRq+REISUB while it was frozen, but the computer didn't respond at all (it does reboot if I do it in a "normal" state). What does this mean?

Regarding the SMART state of the SSD, this is the output I get:

Code:
sudo smartctl -a /dev/nvme0
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-5.0.0-27-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       ADATA SX6000PNP
Serial Number:                      2J1520105184
Firmware Version:                   V9001b31
PCI Vendor/Subsystem ID:            0x10ec
IEEE OUI Identifier:                0x00e04c
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1.024.209.543.168 [1,02 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Sep  9 11:15:17 2019 CEST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x0014):     DS_Mngmt Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     118 Celsius
Critical Comp. Temp. Threshold:     150 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    50.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        34 Celsius
Available Spare:                    100%
Available Spare Threshold:          32%
Percentage Used:                    0%
Data Units Read:                    11.286.610 [5,77 TB]
Data Units Written:                 14.486.514 [7,41 TB]
Host Read Commands:                 87.958.410
Host Write Commands:                86.189.730
Controller Busy Time:               0
Power Cycles:                       192
Power On Hours:                     480
Unsafe Shutdowns:                   21
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, max 8 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          1     0  0x0000  0x0000  0x000            0     0     -
  6 1219368206019409729     0  0x0000  0x0000  0x000            0     0     -

But I don't see any problem.

Do you have any more ideas on how to identify the source of the issue?
Thanks again
 

rbarriuso

Member
I've changed to the latest Ubuntu 18 GA kernel (4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux) to see if the freeze still happens.
 

astiak

Member
The SMART output looks fine, so you can probably rule out a failing SSD.

With regard to Magic SysRq, you may need to look into the Ubuntu docs to see how to enable it. Some distros disable most SysRq functions by default, because they can be used to kill lock screens and thereby gain access to the system.
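One way to enable it persistently, sketched under the assumption that Ubuntu's default is set in /etc/sysctl.d/10-magic-sysrq.conf: files in /etc/sysctl.d are applied in lexical order, so a file that sorts later overrides it. The name 99-sysrq.conf and the helper function are arbitrary choices, and enabling everything (1) carries the lock-screen risk mentioned above.

```shell
# Write a sysctl drop-in that sets kernel.sysrq. Defaults: /etc/sysctl.d and value 1 (all functions).
# 99-sysrq.conf is an arbitrary name chosen to sort after 10-magic-sysrq.conf.
write_sysrq_conf() {
  local dir="${1:-/etc/sysctl.d}" value="${2:-1}"
  printf 'kernel.sysrq = %s\n' "$value" > "$dir/99-sysrq.conf"
}

# Usage:
#   sudo sysctl -w kernel.sysrq=1                            # immediate, lost on reboot
#   sudo bash -c "$(declare -f write_sysrq_conf); write_sysrq_conf"
#   sudo sysctl --system                                     # reload all sysctl files
```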

It could well be a software issue; it will be interesting to see whether the updated kernel helps. I would personally opt for a 5.x kernel, simply because you can expect better hardware compatibility. From some brief googling, on 18.04 that is the HWE (Hardware Enablement) kernel. The GA kernel stays on the original 4.15 series for the whole life of the release, which suits servers where stability matters more than support for newer hardware.
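To see which kernel series is running and switch between the two flavours, something like the following. linux-generic (GA) and linux-generic-hwe-18.04 (HWE) are Ubuntu's meta-packages for 18.04; the helper function is just for illustration.

```shell
# Print the major.minor series of a kernel release, e.g. 4.15.0-60-generic -> 4.15.
# Defaults to the running kernel.
kernel_series() {
  local release="${1:-$(uname -r)}"
  printf '%s\n' "$release" | cut -d. -f1,2
}

# Usage:
#   kernel_series                                                   # running kernel
#   sudo apt install --install-recommends linux-generic-hwe-18.04   # HWE (5.x) kernel
#   sudo apt install linux-generic                                  # GA (4.15) kernel
```

Both kernels can be installed side by side; GRUB boots the newest one by default, and the others are listed under "Advanced options for Ubuntu".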
 

astiak

Member
Glad to hear the change of kernel has helped.

Just a shame the crash was so sporadic. It would be interesting to see whether the next iteration of the 5.x kernel for your distribution has the same issue. Happy that it's solved for now.
 

quicky

Member

Hi @rbarriuso, I'm experiencing the same issue on my Lafité III with Ubuntu 18.04.3 LTS.
I just downgraded my kernel from 5.x to 4.15.0-70-generic, but after restarting the laptop I no longer have WiFi.
Was it the same for you? If so, what did you do to get it working?

Thanks in advance
 

rbarriuso

Member
Hi @quicky ,

In my case I had no issues with the WiFi interface. I'm using /lib/modules/4.15.0-70-generic/kernel/drivers/net/wireless/intel/iwlwifi/iwlwifi.ko and it works well.

You should probably check your WiFi hardware and find the appropriate kernel module (driver).
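A quick way to check whether a given kernel version actually has the driver on disk, sketched with a hypothetical function; modinfo -k can also query the modules of a kernel other than the running one.

```shell
# Look for a kernel module file (e.g. iwlwifi) under /lib/modules/<version>.
# Module files may be compressed on some systems, hence the "*" after .ko.
find_module() {
  local version="$1" name="$2" base="${3:-/lib/modules}"
  find "$base/$version" -name "$name.ko*" 2>/dev/null
}

# Usage:
#   find_module 4.15.0-70-generic iwlwifi
#   modinfo -k 4.15.0-70-generic iwlwifi   # errors out if the module is missing
```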

Regards,
Rafa.
 

quicky

Member
Strange, I have the same WiFi chipset...
In /lib/modules/4.15.0-70-generic/kernel/drivers/net/ I don't have any wireless directory.
To switch to this kernel I did the following:
Code:
sudo apt-get install linux-headers-4.15.0-70-generic linux-headers-4.15.0-70 linux-image-4.15.0-70-generic
and then I rebooted and chose this kernel under "Advanced options" in the boot menu.
How did you do it?
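One possible cause, offered as an assumption rather than something confirmed in this thread: on Ubuntu, many drivers (wireless ones such as iwlwifi among them) are packaged separately in linux-modules-extra-<version>, and the apt-get line above only installs the image and headers. A hypothetical helper that builds the package name, with the commands to check and install it:

```shell
# Name of Ubuntu's extra-modules package for a kernel release (hypothetical helper).
# Defaults to the running kernel.
modules_extra_pkg() {
  printf 'linux-modules-extra-%s\n' "${1:-$(uname -r)}"
}

# Usage:
#   dpkg -l "$(modules_extra_pkg 4.15.0-70-generic)"          # is it installed?
#   sudo apt-get install "$(modules_extra_pkg 4.15.0-70-generic)"
#   # or the meta-package, which tracks the latest 4.15 kernel together with its extra modules:
#   sudo apt-get install linux-generic
```

If that package turns out to be missing, installing it and rebooting into 4.15.0-70-generic would be expected to bring back the wireless directory under /lib/modules.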
 