Extracting various useful information from a crash dump (crashinfo)

The crashinfo tool can be used to quickly fetch various details from a crash dump, such as per-CPU tasks, errors and call traces in the kernel message buffer, the reason for panic, the status of various workqueues, in-flight I/Os, and more.

Options provided by 'crashinfo':

crash> crashinfo -h
Usage: crashinfo [options]

Options:
  -h, --help            show this help message and exit
  -v                    verbose output
  -q                    quiet mode - print warnings only
  --fast                Fast mode - do not run potentially slow tests
  --sysctl              Print sysctl info.
  --ext3                Print EXT3 info.
  --blkreq              Print Block I/O requests
  --blkdevs             Print Block Devices Info
  --filelock            Print filelock info.
  --stacksummary        Print stacks (bt) categorized summary.
  --findstacks=FINDSTACKS
                        Print stacks (bt) containing functions that match the provided pattern
  --checkstacks         Check stacks of all threads for corruption
  --decodesyscalls=DECODESYSCALLS
                        Decode Syscalls on the Stack
  --keventd_wq          Decode keventd_wq
  --kblockd_wq          Decode kblockd_workqueue
  --lws                 Print Locks, Wait queues, and Semaphores
  --devmapper           Print DeviceMapper Tables
  --runq                Print Run Queues
  --semaphore=SEMA      Print 'struct semaphore' info
  --rwsemaphore=RWSEMA  Print 'struct rw_semaphore' info
  --mutex=MUTEX         Print Mutex info
  --umem                Print Userspace Memory Usage
  --ls=LS               Emulate 'ls'. You can specify either dentry address or full pathname
  --workqueues          Print Workqueues - just for some kernels
  --radix_tree_element=root offset
                        Find and print a radix tree element
  --pci                 Print PCI Info
  --version             Print program version and exit

 ** Execution took   0.31s (real)   0.31s (CPU)
crash>

Note: While we have tried to explain the usage of each of the above options with example outputs, it is difficult to find meaningful output for every option in a single vmcore dump file. We have therefore used multiple vmcores, collected from different hang/panic scenarios, to demonstrate the usage of these options.

Getting a summary of crash/panic

By default, when crashinfo is invoked without any options, it prints some vital statistics and runs a number of tests. These tests can emit WARNING messages, which are usually (but not always) followed by more details.

For example, below is the full output of the crashinfo command without any of the above options:

crash> crashinfo
                         +==========================+
                         | *** Crashinfo v1.3.7 *** |
                         +==========================+

+++WARNING+++ PARTIAL DUMP with size(vmcore) < 25% size(RAM)
      KERNEL: 3.10.0-693.17.1.el7.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Sat Jan 19 23:35:05 2019
      UPTIME: 00:19:32
LOAD AVERAGE: 20.56, 15.12, 7.54
       TASKS: 466
    NODENAME: system2.mpg.testlab.com
     RELEASE: 3.10.0-693.17.1.el7.x86_64
     VERSION: #1 SMP Sun Jan 14 10:36:03 EST 2018
     MACHINE: x86_64  (3492 Mhz)
      MEMORY: 7.8 GB
       PANIC: "SysRq : Trigger a crash"

                         +--------------------------+
>------------------------| Per-cpu Stacks ('bt -a') |------------------------<
                         +--------------------------+

      -- CPU#0 --
PID=0  CPU=0 CMD=swapper/0
  #0   crash_nmi_callback+0x31
  #1   nmi_handle+0x87
  #2   do_nmi+0x15d
  #3   end_repeat_nmi+0x1e
  #-1  intel_idle+0x0, 507 bytes of data
  #4   intel_idle+0xf4
  #5   cpuidle_enter_state+0x40
  #6   cpuidle_idle_call+0xd8
  #7   arch_cpu_idle+0xe
  #8   cpu_startup_entry+0x14a
  #9   rest_init+0x77
  #10  start_kernel+0x43e
  #11  x86_64_start_reservations+0x24
  #12  x86_64_start_kernel+0x14f

      -- CPU#1 --
PID=6868  CPU=1 CMD=bash
  #0   machine_kexec+0x1fb
  #1   __crash_kexec+0x72
  #2   crash_kexec+0x30
  #3   oops_end+0xa8
  #4   no_context+0x280
  #5   __bad_area_nosemaphore+0x73
  #6   bad_area+0x43
  #7   __do_page_fault+0x3dc
  #8   do_page_fault+0x35
  #9   page_fault+0x28
  #-1  sysrq_handle_crash+0x0, 477 bytes of data
  #10  __handle_sysrq+0x107
  #11  write_sysrq_trigger+0x2f
  #12  proc_reg_write+0x3d
  #13  vfs_write+0xbd
  #14  sys_write+0x7f
  #15  system_call_fastpath+0x16, 477 bytes of data

      -- CPU#2 --
PID=0  CPU=2 CMD=swapper/2
  #0   crash_nmi_callback+0x31
  #1   nmi_handle+0x87
  #2   do_nmi+0x15d
  #3   end_repeat_nmi+0x1e
  #-1  intel_idle+0x0, 507 bytes of data
  #4   intel_idle+0xf4
  #5   cpuidle_enter_state+0x40
  #6   cpuidle_idle_call+0xd8
  #7   arch_cpu_idle+0xe
  #8   cpu_startup_entry+0x14a
  #9   start_secondary+0x1d6

      -- CPU#3 --
PID=0  CPU=3 CMD=swapper/3
  #0   crash_nmi_callback+0x31
  #1   nmi_handle+0x87
  #2   do_nmi+0x15d
  #3   end_repeat_nmi+0x1e
  #-1  intel_idle+0x0, 507 bytes of data
  #4   intel_idle+0xf4
  #5   cpuidle_enter_state+0x40
  #6   cpuidle_idle_call+0xd8
  #7   arch_cpu_idle+0xe
  #8   cpu_startup_entry+0x14a
  #9   start_secondary+0x1d6

                      +--------------------------------+
>---------------------| How This Dump Has Been Created |---------------------<
                      +--------------------------------+

   *** Dump has been initiated: with sysrq ***
     programmatically (via sysrq-trigger)

                               +---------------+
>------------------------------| Tasks Summary |------------------------------<
                               +---------------+

Number of Threads That Ran Recently
-----------------------------------
   last second      27
   last     5s      59
   last    60s      71

 ----- Total Numbers of Threads per State ------
  TASK_INTERRUPTIBLE                         439
  TASK_RUNNING                                 3
  TASK_UNINTERRUPTIBLE                        21

+++WARNING+++ There are 8 threads running in their own namespaces
    Use 'taskinfo --ns' to get more details

                           +-----------------------+
>--------------------------| 5 Most Recent Threads |--------------------------<
                           +-----------------------+

  PID  CMD                Age    ARGS
-----  --------------   ------  ----------------------------
 6868 bash                0 ms  (no user stack)
 7119 kworker/3:0         0 ms  (no user stack)
 4069 kworker/1:1H        0 ms  (no user stack)
  874 kworker/2:1H        0 ms  (no user stack)
    5 kworker/0:0H        0 ms  (no user stack)

                          +------------------------+
>-------------------------| Memory Usage (kmem -i) |-------------------------<
                          +------------------------+

                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  1947245       7.4 GB         ----
         FREE  1294938       4.9 GB   66% of TOTAL MEM
         USED   652307       2.5 GB   33% of TOTAL MEM
       SHARED   337257       1.3 GB   17% of TOTAL MEM
      BUFFERS    98331     384.1 MB    5% of TOTAL MEM
       CACHED   306536       1.2 GB   15% of TOTAL MEM
         SLAB   151205     590.6 MB    7% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP  1998847       7.6 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE  1998847       7.6 GB  100% of TOTAL SWAP

 COMMIT LIMIT  2972469      11.3 GB         ----
    COMMITTED   372838       1.4 GB   12% of TOTAL LIMIT

 -- Request Queues Analysis:     Count=734, in_flight=540
 -- Requests from SLAB Analysis: Count=202, STARTED=120 WRITE=163
      -- Since started: newest    85.66s,  oldest   172.84s
+++WARNING+++ there are outstanding blk_dev requests
+++WARNING+++ 7 processes in UNINTERRUPTIBLE state are committing journal
+++three oldest UNINTERRUPTIBLE threads
   ... ran 168s ago

PID=11451  CPU=0 CMD=rm
  #0   __schedule+0x3dc
  #1   schedule+0x29
  #2   schedule_timeout+0x239
  #3   io_schedule_timeout+0xad
  #4   io_schedule+0x18
  #5   bit_wait_io+0x11
  #6   __wait_on_bit+0x65
  #7   out_of_line_wait_on_bit+0x81
  #8   do_get_write_access+0x285
  #9   jbd2_journal_get_write_access+0x27
  #10  __ext4_journal_get_write_access+0x3b
  #11  ext4_reserve_inode_write+0x70
  #12  ext4_mark_inode_dirty+0x53
  #13  ext4_evict_inode+0x1f3
  #14  evict+0xa9
  #15  iput+0xf9
  #16  do_unlinkat+0x1ae
  #17  sys_unlinkat+0x1b
  #18  system_call_fastpath+0x16, 477 bytes of data
   ... ran 168s ago

PID=11446  CPU=2 CMD=rm
  #0   __schedule+0x3dc
  #1   schedule+0x29
  #2   schedule_timeout+0x239
  #3   io_schedule_timeout+0xad
  #4   io_schedule+0x18
  #5   bit_wait_io+0x11
  #6   __wait_on_bit+0x65
  #7   wait_on_page_bit+0x81
  #8   truncate_inode_pages_range+0x3bb
  #9   truncate_inode_pages_final+0x4f
  #10  ext4_evict_inode+0x11c
  #11  evict+0xa9
  #12  iput+0xf9
  #13  do_unlinkat+0x1ae
  #14  sys_unlinkat+0x1b
  #15  system_call_fastpath+0x16, 477 bytes of data
   ... ran 172s ago

PID=11452  CPU=3 CMD=rm
  #0   __schedule+0x3dc
  #1   schedule+0x29
  #2   schedule_timeout+0x239
  #3   io_schedule_timeout+0xad
  #4   io_schedule+0x18
  #5   bit_wait_io+0x11
  #6   __wait_on_bit+0x65
  #7   wait_on_page_bit+0x81
  #8   truncate_inode_pages_range+0x3bb
  #9   truncate_inode_pages_final+0x4f
  #10  ext4_evict_inode+0x11c
  #11  evict+0xa9
  #12  iput+0xf9
  #13  do_unlinkat+0x1ae
  #14  sys_unlinkat+0x1b
  #15  system_call_fastpath+0x16, 477 bytes of data

                       +-------------------------------+
>----------------------| Scheduler Runqueues (per CPU) |----------------------<
                       +-------------------------------+

  ---+ CPU=0 <struct rq 0xffff88021ea18a00> ----
     | CURRENT TASK <struct task_struct 0xffffffff81a02480>, CMD=swapper/0
  ---+ CPU=1 <struct rq 0xffff88021ea98a00> ----
     | CURRENT TASK <struct task_struct 0xffff8800c5fc2f70>, CMD=bash
    7598 kworker/1:0 0.32910
  ---+ CPU=2 <struct rq 0xffff88021eb18a00> ----
     | CURRENT TASK <struct task_struct 0xffff88017ce7af70>, CMD=swapper/2
  ---+ CPU=3 <struct rq 0xffff88021eb98a00> ----
     | CURRENT TASK <struct task_struct 0xffff88017ce7bf40>, CMD=swapper/3

                          +------------------------+
>-------------------------| Network Status Summary |-------------------------<
                          +------------------------+

TCP Connection Info
-------------------
        ESTABLISHED      6
             LISTEN      9
                    NAGLE disabled (TCP_NODELAY):     6
                    user_data set (NFS etc.):         1

UDP Connection Info
-------------------
  9 UDP sockets, 0 in ESTABLISHED

Unix Connection Info
------------------------
        ESTABLISHED    244
              CLOSE     48
             LISTEN     59

Raw sockets info
--------------------
        ESTABLISHED      1

Interfaces Info
---------------
  How long ago (in seconds) interfaces transmitted/received?
        Name        RX        TX
        ----    ------    ------
          lo     872.4    1171.8
      enp3s0     872.4       0.0
      virbr0     872.4     872.4
  virbr0-nic     872.4    1126.1


RSS_TOTAL=520628 pages, %mem=    5.1
+++WARNING+++ Possible hang
+++WARNING+++    Run 'hanginfo' to get more details

                                +------------+
>-------------------------------| Mounted FS |-------------------------------<
                                +------------+

     MOUNT           SUPERBLK     TYPE   DEVNAME   DIRNAME
ffff88017cd38180 ffff88017cd48800 rootfs rootfs    /
ffff88021e5d6300 ffff8802127db000 sysfs  sysfs     /sys
ffff88021e5d6480 ffff88017cd4b000 proc   proc      /proc
ffff88021e5d6600 ffff88017cfb0800 devtmpfs devtmpfs /dev
ffff88021e5d6780 ffff88021276c800 securityfs securityfs /sys/kernel/security
ffff88021e5d6900 ffff8802127db800 tmpfs  tmpfs     /dev/shm
ffff88021e5d6a80 ffff88017cf21800 devpts devpts    /dev/pts
ffff88021e5d6c00 ffff8802127dc000 tmpfs  tmpfs     /run
ffff88021e5d6d80 ffff8802127dc800 tmpfs  tmpfs     /sys/fs/cgroup
ffff88021e5d6f00 ffff8802127dd000 cgroup cgroup    /sys/fs/cgroup/systemd
ffff88021e5d7080 ffff8802127dd800 pstore pstore    /sys/fs/pstore
ffff88021e5d7200 ffff8802127df800 cgroup cgroup    /sys/fs/cgroup/net_cls,net_prio
ffff88021e5d7380 ffff8802127df000 cgroup cgroup    /sys/fs/cgroup/devices
ffff88021e5d7500 ffff8802127de800 cgroup cgroup    /sys/fs/cgroup/pids
ffff88021e5d7680 ffff8802127de000 cgroup cgroup    /sys/fs/cgroup/perf_event
ffff88021e5d7800 ffff880211c98000 cgroup cgroup    /sys/fs/cgroup/blkio
ffff88021e5d7980 ffff880211c98800 cgroup cgroup    /sys/fs/cgroup/freezer
ffff88021e5d7b00 ffff880211c99000 cgroup cgroup    /sys/fs/cgroup/cpuset
ffff88021e5d7c80 ffff880211c99800 cgroup cgroup    /sys/fs/cgroup/cpu,cpuacct
ffff88021e5d7e00 ffff880211c9a000 cgroup cgroup    /sys/fs/cgroup/memory
ffff880211c54000 ffff880211c9a800 cgroup cgroup    /sys/fs/cgroup/hugetlb
ffff880211c54180 ffff88017cfb3000 configfs configfs /sys/kernel/config
ffff88020f79af00 ffff88020ef1c000 xfs    /dev/mapper/rhel00-root /
ffff88020f790180 ffff88020ef1c800 autofs systemd-1 /proc/sys/fs/binfmt_misc
ffff8800355f5c80 ffff88017cd4d800 debugfs debugfs  /sys/kernel/debug
ffff88020e39ca80 ffff88020f223800 hugetlbfs hugetlbfs /dev/hugepages
ffff8800355f4d80 ffff88017cf23000 mqueue mqueue    /dev/mqueue
ffff880211627380 ffff88021276e800 nfsd   nfsd      /proc/fs/nfsd
ffff8800cd016780 ffff8800ce869800 xfs    /dev/mapper/rhel00-home /home
ffff880207a04480 ffff8800cf74f000 xfs    /dev/sda3 /boot
ffff880034509800 ffff8800c6fd7000 rpc_pipefs sunrpc /var/lib/nfs/rpc_pipefs
ffff8801fe774480 ffff8800c5e96800 tmpfs  tmpfs     /run/user/42
ffff8801fbac0f00 ffff8800c2825800 tmpfs  tmpfs     /run/user/0
ffff8800c644be00 ffff8800c5fb1800 ext4   /dev/mapper/prodvg1-lvdata0 /prd_data0
ffff8800c644a780 ffff8800c5fb0800 ext4   /dev/mapper/prodvg1-lvdata1 /prd_data1
ffff8801fbac1500 ffff880034e97800 ext4   /dev/mapper/prodvg1-lvdata2 /prd_data2
ffff8801f969b500 ffff8800c2824000 ext4   /dev/mapper/prodvg1-lvdata3 /prd_data3
ffff8800c644a300 ffff8800c5fb4800 ext4   /dev/mapper/prodvg1-lvdata4 /prd_data4
ffff8800c644bb00 ffff8800c5fb6800 ext4   /dev/mapper/prodvg2-prdbkplv0 /prd_bkp0
ffff8800c644b380 ffff8800cf28a000 ext4   /dev/mapper/prodvg2-prdbkplv1 /prd_bkp1
ffff8802002e8a80 ffff8800cf48c000 ext4   /dev/mapper/appvg-oraapps_vol /apps
ffff8800c2bb3c80 ffff8800c5fb6000 nfs4   172.25.0.11:/nfs_share1 /NFS-share1
ffff8801f969be00 ffff8800c5fb6000 nfs4   172.25.0.11:/nfs_share2 /NFS-share2
ffff88020ca45c80 ffff8800c5fb6000 nfs4   172.25.0.11:/nfs_share3 /NFS-share3

                       +-------------------------------+
>----------------------| Last 40 lines of dmesg buffer |----------------------<
                       +-------------------------------+

[ 1139.788068] qla2xxx [0000:01:00.0]-801c:0: Abort command issued nexus=0:0:2 --  1 2002.
[ 1140.789026] qla2xxx [0000:01:00.0]-801c:0: Abort command issued nexus=0:0:2 --  1 2002.
[ 1147.712623] qla2xxx [0000:01:00.1]-801c:7: Abort command issued nexus=7:0:1 --  1 2002.
[ 1150.788760] qla2xxx [0000:01:00.0]-801c:0: Abort command issued nexus=0:0:10 --  1 2002.
[ 1158.713019] qla2xxx [0000:01:00.1]-801c:7: Abort command issued nexus=7:0:10 --  1 2002.
[ 1161.788834] qla2xxx [0000:01:00.0]-801c:0: Abort command issued nexus=0:0:16 --  1 2002.
[ 1169.713389] qla2xxx [0000:01:00.1]-801c:7: Abort command issued nexus=7:0:11 --  1 2002.
[ 1172.313862] SysRq : Trigger a crash
[ 1172.313886] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1172.313923] IP: [<ffffffff81400816>] sysrq_handle_crash+0x16/0x20
[ 1172.313942] PGD 80000000c2ac1067 PUD c2b49067 PMD 0
[ 1172.313958] Oops: 0002 [#1] SMP
[ 1172.313968] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ext4 mbcache jbd2 xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ppdev mei_wdt intel_powerclamp coretemp intel_rapl kvm_intel kvm irqbypass dm_service_time crc32_pclmul ghash_clmulni_intel aesni_intel lrw pcspkr gf128mul glue_helper ablk_helper sg joydev snd_soc_rt5640 snd_soc_ssm4567 snd_soc_rl6231
[ 1172.314228]  cryptd snd_hda_codec_hdmi snd_soc_core snd_compress regmap_i2c parport_pc parport snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm i2c_designware_platform i2c_designware_core acpi_pad mei_me mei tpm_infineon snd_soc_sst_acpi snd_soc_sst_match snd_timer shpchp snd soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath ip_tables xfs libcrc32c sr_mod sd_mod cdrom i915 lpfc ahci libahci i2c_algo_bit drm_kms_helper qla2xxx crc32c_intel crc_t10dif libata serio_raw syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_generic drm crct10dif_pclmul crct10dif_common r8169 scsi_transport_fc mii scsi_tgt sdhci_acpi iosf_mbi sdhci mmc_core video i2c_hid i2c_core dm_mirror dm_region_hash dm_log dm_mod
[ 1172.314474] CPU: 1 PID: 6868 Comm: bash Not tainted 3.10.0-693.17.1.el7.x86_64 #1
[ 1172.314492] Hardware name: Gigabyte Technology Co., Ltd. H97M-D3H/H97M-D3H, BIOS F6 04/21/2015
[ 1172.314512] task: ffff8800c5fc2f70 ti: ffff8800c5db4000 task.ti: ffff8800c5db4000
[ 1172.314532] RIP: 0010:[<ffffffff81400816>]  [<ffffffff81400816>] sysrq_handle_crash+0x16/0x20
[ 1172.314568] RSP: 0018:ffff8800c5db7e58  EFLAGS: 00010246
[ 1172.314581] RAX: 000000000000000f RBX: ffffffff81ac1140 RCX: 0000000000000000
[ 1172.314598] RDX: 0000000000000000 RSI: ffff88021ea938b8 RDI: 0000000000000063
[ 1172.314614] RBP: ffff8800c5db7e58 R08: ffffffff81d98dfc R09: ffffffff81ddb1cb
[ 1172.314631] R10: 0000000000000ac4 R11: 0000000000000ac3 R12: 0000000000000063
[ 1172.314647] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
[ 1172.314664] FS:  00007faf2543a740(0000) GS:ffff88021ea80000(0000) knlGS:0000000000000000
[ 1172.314683] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1172.314705] CR2: 0000000000000000 CR3: 00000000c2b9e000 CR4: 00000000001607e0
[ 1172.314732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1172.314762] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1172.314793] Call Trace:
[ 1172.314808]  [<ffffffff81401037>] __handle_sysrq+0x107/0x170
[ 1172.314835]  [<ffffffff814014af>] write_sysrq_trigger+0x2f/0x40
[ 1172.314860]  [<ffffffff8127257d>] proc_reg_write+0x3d/0x80
[ 1172.314886]  [<ffffffff81202ced>] vfs_write+0xbd/0x1e0
[ 1172.314909]  [<ffffffff81203aff>] SyS_write+0x7f/0xe0
[ 1172.314929]  [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[ 1172.314943] Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 0f 1f 44 00 00 55 48 89 e5 c7 05 41 e4 62 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 be
[ 1172.315057] RIP  [<ffffffff81400816>] sysrq_handle_crash+0x16/0x20
[ 1172.315074]  RSP <ffff8800c5db7e58>
[ 1172.315082] CR2: 0000000000000000

******************************************************************************
************************ A Summary Of Problems Found *************************
******************************************************************************
-------------------- A list of all +++WARNING+++ messages --------------------
    PARTIAL DUMP with size(vmcore) < 25% size(RAM)
    There are 8 threads running in their own namespaces
    Use 'taskinfo --ns' to get more details
    there are outstanding blk_dev requests
    7 processes in UNINTERRUPTIBLE state are committing journal
    Possible hang
       Run 'hanginfo' to get more details
------------------------------------------------------------------------------

 ** Execution took  12.72s (real)  10.41s (CPU), Child processes:   1.77s
crash>

You can search for the +++WARNING+++ pattern to be sure you did not miss any warnings.

Some tests rely on built-in crash commands that can potentially run for a long time. For example, to check for memory fragmentation we run 'kmem -f'; on some vmcores this can take more than 20 minutes to complete even on a fast host. Because in many cases we want at least partial results quickly, there is a timeout mechanism for such commands, which defaults to 2 minutes. If a command times out, a warning message is printed:

+++WARNING+++ <foreach bt> failed to complete within the timeout period of 120s

You can increase the timeout with the '--timeout=seconds' option. If you are in a hurry, use the '--fast' option; this sets the timeout to 15 seconds and disables some potentially slow tests.
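
Conceptually, the timeout mechanism runs each potentially slow command under a watchdog and abandons it when the deadline passes. The Python sketch below illustrates the idea only; the function name and the use of subprocess are illustrative assumptions, not crashinfo's actual internals:

```python
import subprocess

DEFAULT_TIMEOUT = 120  # seconds; matches the documented 2-minute default

def exec_with_timeout(cmdline, timeout=DEFAULT_TIMEOUT):
    """Run an external command, abandoning it after `timeout` seconds."""
    try:
        done = subprocess.run(cmdline, shell=True, capture_output=True,
                              text=True, timeout=timeout)
        return done.stdout
    except subprocess.TimeoutExpired:
        # mirrors the style of the warning shown above
        print("+++WARNING+++ <%s> failed to complete within the "
              "timeout period of %ds" % (cmdline, timeout))
        return None
```

A command that finishes in time returns its output; one that overruns is cut off and reported, so the rest of the analysis can still proceed with partial results.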

The tests check many conditions that are important when a system seems to be hung, such as load average, memory fragmentation, outstanding block requests, spinlocks held, activity on network cards, and so on. Some tests are heuristic; see the advanced options below for more details. New tests are added regularly based on practical experience.

You can decrease verbosity by using the '-q' (quiet) option, in which case only warnings are printed. Using the '-v' (verbose) option makes crashinfo print additional information.

Check stacks of all threads for corruption (--checkstacks)

<WIP>

Decode Syscalls on the Stack (--decodesyscalls)

The '--decodesyscalls' option decodes the arguments passed to system calls. Because these arguments originate in userspace, it is not always possible to decode them (some pages might be absent from the vmcore). You can specify either a PID or a system call name. For example:

crash> bt
PID: 3060   TASK: ffff88200f118040  CPU: 33  COMMAND: "rsyslogd"
 #0 [ffff881fe4a837b8] schedule at ffffffff81558d6a
 #1 [ffff881fe4a838a0] schedule_hrtimeout_range at ffffffff8155b3a8
 #2 [ffff881fe4a83940] poll_schedule_timeout at ffffffff811b9329
 #3 [ffff881fe4a83960] do_select at ffffffff811b9c45
 #4 [ffff881fe4a83d40] core_sys_select at ffffffff811ba87a
 #5 [ffff881fe4a83ee0] sys_select at ffffffff811bac07
 #6 [ffff881fe4a83f50] system_call_fastpath at ffffffff81564357
    RIP: 00007f5926abb633  RSP: 00007ffe36f7b3c0  RFLAGS: 00010217
    RAX: 0000000000000017  RBX: 000055775aebef54  RCX: 00007f5926abb633
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000001
    RBP: 0000000000015180   R8: 00007ffe36f7b3c0   R9: 0000000000000bf4
    R10: 0000000000000000  R11: 0000000000000293  R12: 000000000000001e
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000001
    ORIG_RAX: 0000000000000017  CS: 0033  SS: 002b

crash> crashinfo --decodesyscalls 3060

PID=3060  CPU=33 CMD=rsyslogd
  #0   schedule+0x45a
  #1   schedule_hrtimeout_range+0xc8
  #2   poll_schedule_timeout+0x39
  #3   do_select+0x5d5
  #4   core_sys_select+0x18a
  #5   sys_select+0x47
  #6   system_call_fastpath+0x35, 477 bytes of data
    ....... Decoding Syscall Args .......
    sys_select (1
    (fd_set *) 0x0
    (fd_set *) 0x0
    (fd_set *) 0x0
    (struct timeval *) 0x7ffe36f7b3c0)
   nfds=1
   timeout=86400 s, 0 usec

 ** Execution took   0.05s (real)   0.05s (CPU)
crash>
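
On x86_64 the syscall number is saved in ORIG_RAX and the first six arguments in RDI, RSI, RDX, R10, R8, and R9, which is how the decoder maps the register dump above to sys_select(1, NULL, NULL, NULL, timeout). A simplified sketch of that mapping follows; the tiny syscall table is illustrative, whereas the real tool resolves names through the kernel's sys_call_table:

```python
# x86_64 syscall calling convention: number in ORIG_RAX, args in these regs
SYSCALL_ARG_REGS = ("RDI", "RSI", "RDX", "R10", "R8", "R9")
SYSCALL_NAMES = {0: "read", 1: "write", 23: "select"}  # partial, illustrative

def decode_syscall(regs):
    """Map a saved register set to (syscall name, argument list)."""
    nr = regs["ORIG_RAX"]
    name = SYSCALL_NAMES.get(nr, "syscall_%d" % nr)
    args = [regs[r] for r in SYSCALL_ARG_REGS]
    return name, args

# register values taken from the rsyslogd bt output above
regs = {"ORIG_RAX": 0x17, "RDI": 0x1, "RSI": 0x0, "RDX": 0x0,
        "R10": 0x0, "R8": 0x7ffe36f7b3c0, "R9": 0xbf4}
name, args = decode_syscall(regs)
print(name, args[:5])  # nfds in RDI, NULL fd_sets, timeout pointer in R8
```

ORIG_RAX=0x17 is 23, the x86_64 select syscall, and RDI=1 matches the decoded nfds=1 in the output above.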

Decode keventd_wq (--keventd_wq)

To view the work items pending in the keventd work queue of each CPU, use '--keventd_wq':

crash> crashinfo --keventd_wq
 ----- CPU  0 <struct cpu_workqueue_struct 0xffff88009e21ba40>
        worklist:
 ----- CPU  1 <struct cpu_workqueue_struct 0xffff88009e25ba40>
        worklist:
            <struct work_struct 0xffff88009e257a40> func=<cache_reap>
            <struct work_struct 0xffff88009e257960> func=<vmstat_update>
 ----- CPU  2 <struct cpu_workqueue_struct 0xffff88009e29ba40>
        worklist:
            <struct work_struct 0xffff88009e297a40> func=<cache_reap>
            <struct work_struct 0xffff88009e297960> func=<vmstat_update>
 ----- CPU  3 <struct cpu_workqueue_struct 0xffff88009e2dba40>
        worklist:
            <struct work_struct 0xffff887103510108> func=<flush_to_ldisc>
            <struct work_struct 0xffff88009e2d7960> func=<vmstat_update>
            <struct work_struct 0xffff88009e2d7a40> func=<cache_reap>
 ----- CPU  4 <struct cpu_workqueue_struct 0xffff88009e31ba40>
        worklist:
 ----- CPU  5 <struct cpu_workqueue_struct 0xffff88009e35ba40>
        worklist:
[...]
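
Under the hood, each per-CPU worklist is an ordinary kernel linked list: pending work_structs are chained through their 'entry' member, and printing an item means resolving its 'func' pointer to a symbol name. The toy Python model below mirrors that traversal; the names and the singly-linked simplification are illustrative, not the real kernel layout:

```python
class ListHead:
    """Minimal stand-in for a kernel list head (sentinel node)."""
    def __init__(self):
        self.next = self          # an empty list points back at itself

class WorkStruct(ListHead):
    """Stand-in for a queued work item; func is a symbol name string."""
    def __init__(self, func):
        super().__init__()
        self.func = func

def list_add_tail(item, head):
    # singly-linked tail insert, enough for the sketch
    node = head
    while node.next is not head:
        node = node.next
    node.next, item.next = item, head

def walk_worklist(head):
    """Yield the func names of all work items queued on this CPU."""
    node = head.next
    while node is not head:
        yield node.func
        node = node.next

wq = ListHead()
for fn in ("cache_reap", "vmstat_update"):
    list_add_tail(WorkStruct(fn), wq)
print(list(walk_worklist(wq)))  # same two items as CPU 1's worklist above
```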

Decode kblockd_workqueue (--kblockd_wq)

To view the work items pending in the kblockd_workqueue of each CPU, use '--kblockd_wq':

crash> crashinfo --kblockd_wq
 ----- CPU  0 <struct cpu_workqueue_struct 0xffff88009e21c200>
        worklist:
 ----- CPU  1 <struct cpu_workqueue_struct 0xffff88009e25c200>
        worklist:
 ----- CPU  2 <struct cpu_workqueue_struct 0xffff88009e29c200>
        worklist:
 ----- CPU  3 <struct cpu_workqueue_struct 0xffff88009e2dc200>
        worklist:
 ----- CPU  4 <struct cpu_workqueue_struct 0xffff88009e31c200>
        worklist:
 ----- CPU  5 <struct cpu_workqueue_struct 0xffff88009e35c200>
        worklist:
 ----- CPU  6 <struct cpu_workqueue_struct 0xffff88009e39c200>
        worklist:
            <struct work_struct 0xffff8820049cbba8> func=<cfq_kick_queue>
            <struct work_struct 0xffff8820098a5ba8> func=<cfq_kick_queue>
            <struct work_struct 0xffff8820070f73a8> func=<cfq_kick_queue>
            <struct work_struct 0xffff882009c0cba8> func=<cfq_kick_queue>
 ----- CPU  7 <struct cpu_workqueue_struct 0xffff88009e3dc200>
        worklist:
 ----- CPU  8 <struct cpu_workqueue_struct 0xffff88009e41c200>
        worklist:
 ----- CPU  9 <struct cpu_workqueue_struct 0xffff88009e45c200>
        worklist:
 ----- CPU  10 <struct cpu_workqueue_struct 0xffff8820b0c1c200>
        worklist:
            <struct work_struct 0xffff880d91fa6ba8> func=<cfq_kick_queue>
            <struct work_struct 0xffff88585d29dba8> func=<cfq_kick_queue>
[...]

Decode semaphores (--semaphore, --rwsemaphore)

Semaphores shown in the '--lws' output, or those found through manual analysis, can be decoded further with these options:

crash> crashinfo --rwsemaphore=0xffffffff81ad2be0
<struct rw_semaphore 0xffffffff81ad2be0>
    Write owner of this rw_semaphore: pid=0 cmd=

 ** Execution took   0.02s (real)   0.02s (CPU)
crash> crashinfo --rwsemaphore=0xffffffff81ad3c80
<struct rw_semaphore 0xffffffff81ad3c80>
   Reader is the owner

 ** Execution took   0.03s (real)   0.03s (CPU)
crash> crashinfo --rwsemaphore=0xffffffff81ad9560
<struct rw_semaphore 0xffffffff81ad9560>
   Reader is the owner

 ** Execution took   0.01s (real)   0.01s (CPU)
crash> crashinfo --rwsemaphore=0xffffffff81aded00
<struct rw_semaphore 0xffffffff81aded00>
   Reader is the owner

 ** Execution took   0.03s (real)   0.03s (CPU)
crash> crashinfo --rwsemaphore=0xffffffffc0470120
<struct rw_semaphore 0xffffffffc0470120>
   Reader is the owner

 ** Execution took   0.04s (real)   0.04s (CPU)
crash>
crash> crashinfo --rwsemaphore=0xffff88256f8d80b0
<struct rw_semaphore 0xffff88256f8d80b0>
    Owner of this rw_semaphore: pid=13318 cmd=kworker/5:5
crash>
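
The tool can distinguish "Reader is the owner" from a write owner because the kernel encodes ownership in the rw_semaphore's owner field; on recent kernels the low bits of the stored task_struct pointer carry flags such as RWSEM_READER_OWNED. The sketch below shows that decoding under the assumption of a 5.x-style layout; the exact encoding varies by kernel version:

```python
# rw_semaphore owner-field flags, 5.x-style layout (assumption)
RWSEM_READER_OWNED = 1 << 0
RWSEM_NONSPINNABLE = 1 << 1
OWNER_FLAG_MASK = RWSEM_READER_OWNED | RWSEM_NONSPINNABLE

def describe_owner(owner):
    """Classify a raw owner field value the way crashinfo's output does."""
    if owner == 0:
        return "no registered owner"
    task = owner & ~OWNER_FLAG_MASK  # strip flags -> task_struct address
    if owner & RWSEM_READER_OWNED:
        return "Reader is the owner (last reader task 0x%x)" % task
    return "Write owner: task_struct 0x%x" % task
```

For readers the stripped pointer is only a hint (the last reader to acquire the lock), which is why the reader case above prints no pid.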

Emulate 'ls' (--ls)

This option produces a long listing of files, similar to 'ls -l' output at a shell prompt.

For example, the process below has various files open in its namespace. The '--ls' option can be used to get more details about these files:

crash> files
PID: 3060   TASK: ffff88200f118040  CPU: 33  COMMAND: "rsyslogd"
ROOT: /    CWD: /
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff884011fd72c0 ffff883ffe854240 ffff884005a4b6c8 SOCK
  1 ffff886b0b2b1ec0 ffff884109bb4740 ffff885a5ed87100 REG  /var/log/messages
  2 ffff886ad3f355c0 ffff885ff9ecbbc0 ffff884109bf1100 REG  /var/log/secure
  3 ffff88400b9e6a80 ffff883ffe854300 ffff88401380ad48 REG  /proc/kmsg
  4 ffff886610ae4b80 ffff8866b017d8c0 ffff8840033ea4e8 REG  /var/log/cron
  5 ffff885ffc3516c0 ffff886005add500 ffff88600ee1c148 SOCK
  6 ffff884b9203cec0 ffff884c7c4e7080 ffff884a8906c8d0 REG  /var/log/errlog
  7 ffff885ef34a4e80 ffff8840139a93c0 ffff88601154bd48 CHR  /dev/ttyS1
  8 ffff886eb1f12cc0 ffff8880043e2500 ffff8861ad09d100 REG  /var/log/maillog
crash>

crash> crashinfo --ls=/var/log/secure
 -rw-------     0     0   15855596 2020-07-25 07:05 /var/log/secure

 ** Execution took   0.01s (real)   0.02s (CPU)
crash> crashinfo --ls=ffff885ff9ecbbc0
 -rw-------     0     0   15855596 2020-07-25 07:05 secure

 ** Execution took   0.01s (real)   0.02s (CPU)
crash>
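
The listing line is assembled from inode fields (mode, uid, gid, size, mtime) and formatted the way 'ls -l' does. A Python sketch of that formatting, using illustrative field values modelled on the /var/log/secure entry above:

```python
import stat
import time

def ls_line(i_mode, i_uid, i_gid, i_size, i_mtime, name):
    """Format inode fields in the style of 'ls -l' (and crashinfo --ls)."""
    perms = stat.filemode(i_mode)  # e.g. '-rw-------'
    mtime = time.strftime("%Y-%m-%d %H:%M", time.gmtime(i_mtime))
    return "%s %5d %5d %10d %s %s" % (perms, i_uid, i_gid, i_size, mtime, name)

# regular file, mode 0600, owned by root:root, size as shown above;
# the timestamp value is an illustrative assumption
print(ls_line(stat.S_IFREG | 0o600, 0, 0, 15855596, 1595660700, "secure"))
```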

Find and print a radix tree element (--radix_tree_element)

When accessing a large file, the page cache may become filled with so many of the file's pages that sequentially scanning all of them would be too time-consuming. To perform page cache lookups efficiently, the Linux kernel uses a large set of search trees, one for each address_space object.

The page_tree field of an address_space object is the root of a radix tree, which contains pointers to the descriptors of the owner's pages. Given a page index denoting the position of the page inside the owner's disk image, the kernel can perform a very fast lookup to determine whether the required page is already in the page cache. When looking up the page, the kernel interprets the index as a path inside the radix tree and quickly reaches the position where the page descriptor is (or should be) stored.

We can use the '--radix_tree_element' option to view the elements at various offsets in such radix trees.
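
The index-as-path idea can be sketched in a few lines: each radix tree node holds 64 slots (RADIX_TREE_MAP_SHIFT is 6 on x86_64), and a lookup consumes the index 6 bits at a time from the most significant chunk down. The toy model below illustrates only the walk; it ignores tags, height encoding, and other details of the real kernel structure:

```python
MAP_SHIFT = 6
MAP_SIZE = 1 << MAP_SHIFT  # 64 slots per node
MAP_MASK = MAP_SIZE - 1

class Node:
    def __init__(self):
        self.slots = [None] * MAP_SIZE

def lookup(root, height, index):
    """Walk `height` levels from root; return the slot for `index`."""
    node = root
    shift = (height - 1) * MAP_SHIFT
    while shift >= 0 and node is not None:
        node = node.slots[(index >> shift) & MAP_MASK]
        shift -= MAP_SHIFT
    return node

# tiny two-level tree: index 65 = (1 << 6) + 1, so slot 1 at each level
root, leaf = Node(), Node()
root.slots[1] = leaf
leaf.slots[1] = "page@65"
print(lookup(root, 2, 65))
```

A missing page simply resolves to an empty slot (None here), which is what makes the "is this page cached?" test cheap.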

For example, in the following output we first get the pointer to the 'radix_tree_root' struct used for the file '/var/log/messages', and then check the elements at different offsets in the radix tree:

PID: 5723   TASK: ffff880208a94f10  CPU: 0   COMMAND: "in:imjournal"
ROOT: /    CWD: /
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff8802116cdb00 ffff88017f827240 ffff88021eaa4850 CHR  /dev/null
  1 ffff880209fbc900 ffff88017f827240 ffff88021eaa4850 CHR  /dev/null
  2 ffff880209fbc900 ffff88017f827240 ffff88021eaa4850 CHR  /dev/null
  3 ffff8800345fab00 ffff880213e3fe40 ffff88020c33f230 SOCK UNIX
  4 ffff880034d93600 ffff880216047840 ffff880034a8f990 REG  /var/log/messages
  5 ffff880207f5fc00 ffff880035843780 ffff880035d3c850 REG  /run/log/journal/cfed3488b827492e8ed80518fe12223a/system.journal
  6 ffff8800cd0d5400 ffff8802160c1180 ffff880034a8ea90 REG  /var/log/secure
  7 ffff88020be44300 ffff8800c8db30c0 ffff88017fbee060 UNKN inotify

crash> p ((struct file *) 0xffff880034d93600)->f_mapping
$11 = (struct address_space *) 0xffff880034a8fae0

crash> p &((struct address_space *) 0xffff880034a8fae0)->page_tree
$13 = (struct radix_tree_root *) 0xffff880034a8fae8

The first argument to the '--radix_tree_element' option is a pointer to the 'radix_tree_root' struct, and the second argument is the numeric offset:

crash> crashinfo --radix_tree_element=0xffff880034a8fae8 0
  node=0xffffea0008253980, slot=0xffffea0008253980

 ** Execution took   0.10s (real)   0.03s (CPU)
crash>
crash> crashinfo --radix_tree_element=0xffff880034a8fae8 1
  node=0xffffea0008253a40, slot=0xffffea0008253a40

 ** Execution took   0.04s (real)   0.05s (CPU)
crash>
crash> crashinfo --radix_tree_element=0xffff880034a8fae8 2
  node=0xffffea0008253a00, slot=0xffffea0008253a00

 ** Execution took   0.04s (real)   0.04s (CPU)
crash>
crash> crashinfo --radix_tree_element=0xffff880034a8fae8 3
  node=0xffffea0008251d40, slot=0xffffea0008251d40

 ** Execution took   0.03s (real)   0.03s (CPU)
crash>
crash> crashinfo --radix_tree_element=0xffff880034a8fae8 4
  node=0xffffea0008251d00, slot=0xffffea0008251d00

 ** Execution took   0.02s (real)   0.02s (CPU)
crash>
crash> crashinfo --radix_tree_element=0xffff880034a8fae8 5
  node=0xffffea000825efc0, slot=0xffffea000825efc0

 ** Execution took   0.04s (real)   0.03s (CPU)
crash>
crash> crashinfo --radix_tree_element=0xffff880034a8fae8 6
  node=0xffffea0008253bc0, slot=0xffffea0008253bc0
crash>