Analyzing process/task details from vmcore (taskinfo)

The taskinfo program available in the PyKdump framework can be used to quickly analyze process/task details. It allows users to list processes by various options (memory usage, last execution time, or start time), run specific crash commands on processes in Running or Uninterruptible state, retrieve process namespace details, etc.

Options provided by 'taskinfo':

crash> taskinfo -h
Usage: taskinfo [options]

  -h, --help            show this help message and exit
  -v                    verbose output
  --summary             Summary
  --times               Task start times
  --lastsched           Last scheduled on each CPU
  --hang                Equivalent to '-r --task=UN' and prints just several newest and several oldest threads
  --maxpids=MAXPIDS     Maximum number of PIDs to print for --hang and --mem
  --pidinfo=PIDINFO     Display details for a given PID. You can specify PID or addr of task_struct
  --taskfilter=TASKFILTER
                        A list of 2-letter task states to print, e.g. UN
  --pstree              Emulate user-space 'pstree' output
  -r, --recent          Reverse order while sorting by ran_ago
  --cmd=CMD             For each listed task, display output of specified command, e.g. '--cmd files'
  --memory              Print a summary of memory usage by tasks
  --ns                  Print info about namespaces
  --version             Print program version and exit

 ** Execution took   0.03s (real)   0.03s (CPU)

Summary (--summary)

The '--summary' option prints a brief summary about recently executed tasks, the number of processes in different states, and threads running in their own namespaces:

crash> taskinfo --summary
Number of Threads That Ran Recently
   last second      27
   last     5s      59
   last    60s      71

 ----- Total Numbers of Threads per State ------
  TASK_INTERRUPTIBLE                         439
  TASK_RUNNING                                 3
  TASK_UNINTERRUPTIBLE                        21

+++WARNING+++ There are 8 threads running in their own namespaces
    Use 'taskinfo --ns' to get more details

************************ A Summary Of Problems Found *************************
-------------------- A list of all +++WARNING+++ messages --------------------
    There are 8 threads running in their own namespaces
    Use 'taskinfo --ns' to get more details

 ** Execution took   0.04s (real)   0.03s (CPU)
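
When comparing several vmcores, it can help to turn the per-state block of the summary into numbers. The following is a minimal sketch that assumes the '--summary' output has been saved as text; the function name 'parse_state_counts' is hypothetical and not part of taskinfo or PyKdump:

```python
import re

# Sample text copied from the '--summary' output shown above.
SUMMARY = """\
 ----- Total Numbers of Threads per State ------
  TASK_INTERRUPTIBLE                         439
  TASK_RUNNING                                 3
  TASK_UNINTERRUPTIBLE                        21
"""

def parse_state_counts(text):
    """Return {state_name: thread_count} for lines like 'TASK_RUNNING   3'."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(TASK_[A-Z_]+)\s+(\d+)\s*$", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts

counts = parse_state_counts(SUMMARY)
total = sum(counts.values())

# Print states from most to least common, with their share of all threads.
for state, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{state:<24}{n:>6}  {100.0 * n / total:5.1f}%")
```

A large share of TASK_UNINTERRUPTIBLE threads is usually a hint to continue with '--hang' or '--taskfilter=UN'.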

Show task start times (--times)

To show the start time of each task, use the '--times' option. Timestamps are printed using the current TZ setting. Tasks are listed from oldest to newest unless '-r' is also used:

crash> taskinfo --times | head -20
=== Tasks in order of start time ===

 PID          CMD         Start Time
-------    ------------  -------------------
      0       swapper/0  2022-07-30 12:06:35               UID=0
      1         systemd  2022-07-30 12:06:35               UID=0
      2        kthreadd  2022-07-30 12:06:35               UID=0
      3          rcu_gp  2022-07-30 12:06:35               UID=0
      4      rcu_par_gp  2022-07-30 12:06:35               UID=0
      6    kworker/0:0H  2022-07-30 12:06:35               UID=0
     10    mm_percpu_wq  2022-07-30 12:06:35               UID=0
     11     ksoftirqd/0  2022-07-30 12:06:35               UID=0
     12       rcu_sched  2022-07-30 12:06:35               UID=0

Show the last scheduled task on each CPU (--lastsched)

To show the last scheduled task on each CPU, use the '--lastsched' option. This will show the most recently scheduled tasks first unless '-r' is also used:

crash> taskinfo --lastsched
 PID          CMD       CPU   Ran ms ago   STATE
--------   ------------  --  ------------- -----
  25046 FlushResourcesH  12              0  RU  (tgid=24749) UID=26061
   2826      rte-sensor   2              0  IN   (tgid=2803) UID=0
> 10741       perfalarm  29              0  RU               UID=0
  26974      PoolThread   3              0  IN  (tgid=24749) UID=26061
> 24806 FlushResourcesT  17             13  RU  (tgid=24749) UID=26061
> 25901      PoolThread  31          59752  RU  (tgid=24408) UID=26061
>     1         systemd  19          59762  RU               UID=0
>    15     rcu_preempt  11         180020  RU               UID=0
>  1863        vmtoolsd  25         180257  RU               UID=0

Maximum number of PIDs to print for --hang and --mem (--maxpids)

The '--hang' option lists tasks in Uninterruptible (UN) state, most recently scheduled first. Use the '--maxpids' option to control how many of these tasks are printed.

For example, '--maxpids=5' prints the 5 most recently scheduled and the 5 least recently scheduled tasks, and reports how many were skipped:

crash> taskinfo --hang --maxpids=5
=== Tasks in reverse order, scheduled recently first (11 tasks skipped) ===
 PID          CMD       CPU   Ran ms ago   STATE
--------   ------------  --  ------------- -----
  11643             dd   0              1  UN               UID=0
  11644             dd   0             10  UN               UID=0
  11646             dd   1             28  UN               UID=0
    308      scsi_eh_0   2            523  UN               UID=0
    355      scsi_eh_7   2           2598  UN               UID=0
  11428             rm   0         168084  UN               UID=0
  11450             rm   2         168106  UN               UID=0
  11451             rm   0         168140  UN               UID=0
  11446             rm   2         168172  UN               UID=0
  11452             rm   3         172493  UN               UID=0

 ** Execution took   0.07s (real)   0.07s (CPU)

Similarly, when the '--maxpids=N' option is used with '--mem', it will restrict the output to the specified number of processes:

crash> taskinfo --mem --maxpids=5
 ==== First 5 Tasks reverse-sorted by RSS+SHM ====
   PID=  6622 CMD=gnome-shell     RSS=0.089 Gb
   PID=  5770 CMD=firewalld       RSS=0.027 Gb
   PID=  6581 CMD=X               RSS=0.023 Gb
   PID=  6685 CMD=gnome-settings- RSS=0.021 Gb
   PID=  4624 CMD=multipathd      RSS=0.019 Gb

 ==== First 5 Tasks Reverse-sorted by RSS only ====
   PID=  6622 CMD=gnome-shell     RSS=0.089 Gb
   PID=  5770 CMD=firewalld       RSS=0.027 Gb
   PID=  6581 CMD=X               RSS=0.023 Gb
   PID=  6685 CMD=gnome-settings- RSS=0.021 Gb
   PID=  4624 CMD=multipathd      RSS=0.019 Gb

 === Total Memory in RSS  0.497 Gb
 === Total Memory in SHM  0.000 Gb

 ** Execution took   0.21s (real)   0.10s (CPU), Child processes:   0.10s
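
To see how much of the total RSS the listed tasks account for, the memory report can be post-processed outside of crash. This is a rough sketch that assumes the line layout shown above and works on saved text; the names 'parse_mem', 'TASK_RE' and 'TOTAL_RE' are hypothetical:

```python
import re

# One section of the '--mem' output shown above, plus the RSS total line.
MEM_SAMPLE = """\
 ==== First 5 Tasks Reverse-sorted by RSS only ====
   PID=  6622 CMD=gnome-shell     RSS=0.089 Gb
   PID=  5770 CMD=firewalld       RSS=0.027 Gb
   PID=  6581 CMD=X               RSS=0.023 Gb
   PID=  6685 CMD=gnome-settings- RSS=0.021 Gb
   PID=  4624 CMD=multipathd      RSS=0.019 Gb

 === Total Memory in RSS  0.497 Gb
"""

TASK_RE = re.compile(r"PID=\s*(\d+)\s+CMD=(\S+)\s+RSS=([\d.]+)\s+Gb")
TOTAL_RE = re.compile(r"Total Memory in RSS\s+([\d.]+)\s+Gb")

def parse_mem(text):
    """Return ([(pid, cmd, rss_gb), ...], total_rss_gb or None)."""
    tasks = [(int(pid), cmd, float(rss)) for pid, cmd, rss in TASK_RE.findall(text)]
    m = TOTAL_RE.search(text)
    total = float(m.group(1)) if m else None
    return tasks, total

tasks, total_rss = parse_mem(MEM_SAMPLE)
listed = sum(rss for _, _, rss in tasks)

if total_rss:
    print(f"listed tasks use {listed:.3f} Gb of {total_rss:.3f} Gb "
          f"({100.0 * listed / total_rss:.1f}% of total RSS)")
```

The sample contains only one of the two sections on purpose; feeding both sections to this sketch would count each task twice.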

Display the details for a given PID (--pidinfo)

To view more detailed information about a particular process, use '--pidinfo'. It prints the address of the 'task_struct' associated with the given PID, 'uid' and 'gid' credentials with which the process was executed, and resource limits:

crash> taskinfo --pidinfo=355
----    355(UN) <struct task_struct 0xffff880211d16eb0> scsi_eh_7
   cpu 2
   -- Parent: 2 kthreadd
   -- Credentials
      uid=18446612133217246980   gid=18446612133217246984
      suid=18446612133217246988  sgid=18446612133217246992
      euid=18446612133217246996  egid=18446612133217247000
      fsuid=18446612133217247004 fsgid=18446612133217247008
     --user_struct <struct user_struct 0xffffffff81a345a0>
      processes=401 files=0 sigpending=0
     --group_info <struct group_info 0xffffffff81a3dd80>
   -- Rlimits:
    03 (RLIMIT_STACK) cur=8388608 max=INFINITY
    04 (RLIMIT_CORE) cur=0 max=INFINITY
    06 (RLIMIT_NPROC) cur=30294 max=30294
    07 (RLIMIT_NOFILE) cur=1024 max=4096
    08 (RLIMIT_MEMLOCK) cur=65536 max=65536
    11 (RLIMIT_SIGPENDING) cur=30294 max=30294
    12 (RLIMIT_MSGQUEUE) cur=819200 max=819200
    13 (RLIMIT_NICE) cur=0 max=0
    14 (RLIMIT_RTPRIO) cur=0 max=0
   --- thread_info <struct thread_info 0xffff880035200000>

  ** Execution took   0.12s (real)   0.12s (CPU)
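
The 'Rlimits' block is easy to turn into a lookup table, for example to check which limits could still be raised by the process. The sketch below assumes the line format shown above and operates on saved text; 'parse_rlimits' and 'to_number' are hypothetical helper names:

```python
import math
import re

# A few lines copied from the '--pidinfo' output shown above.
PIDINFO_SAMPLE = """\
   -- Rlimits:
    03 (RLIMIT_STACK) cur=8388608 max=INFINITY
    04 (RLIMIT_CORE) cur=0 max=INFINITY
    07 (RLIMIT_NOFILE) cur=1024 max=4096
    08 (RLIMIT_MEMLOCK) cur=65536 max=65536
"""

RLIMIT_RE = re.compile(r"^\s*(\d+)\s+\((RLIMIT_\w+)\)\s+cur=(\S+)\s+max=(\S+)\s*$")

def to_number(value):
    """Map the literal 'INFINITY' to math.inf, anything else to int."""
    return math.inf if value == "INFINITY" else int(value)

def parse_rlimits(text):
    """Return {limit_name: (soft_limit, hard_limit)}."""
    limits = {}
    for line in text.splitlines():
        m = RLIMIT_RE.match(line)
        if m:
            limits[m.group(2)] = (to_number(m.group(3)), to_number(m.group(4)))
    return limits

limits = parse_rlimits(PIDINFO_SAMPLE)

# Report limits whose soft value is below the hard value.
for name, (cur, hard) in limits.items():
    if cur < hard:
        print(f"{name}: soft limit {cur} is below hard limit {hard}")
```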

Filter the list of processes by state (--taskfilter)

To get a list of processes filtered by state, use '--taskfilter'.

For example, the following command lists only the processes in Running (RU) state:

crash> taskinfo --taskfilter=RU
=== Tasks in PID order, grouped by Thread Group leader ===
 PID          CMD       CPU   Ran ms ago   STATE
--------   ------------  --  ------------- -----
>     0      swapper/0   0        1172313  RU               UID=0
>  6868           bash   1              0  RU               UID=0
   7598    kworker/1:0   1            340  RU               UID=0

 ** Execution took   0.05s (real)   0.05s (CPU)

To get a list of processes in Uninterruptible (UN) state:

crash> taskinfo --taskfilter=UN
=== Tasks in PID order, grouped by Thread Group leader ===
 PID          CMD       CPU   Ran ms ago   STATE
--------   ------------  --  ------------- -----
    308      scsi_eh_0   2            523  UN               UID=0
    355      scsi_eh_7   2           2598  UN               UID=0
   7015   jbd2/dm-30-8   0         167657  UN               UID=0
   7018   jbd2/dm-31-8   2         162648  UN               UID=0
   7021   jbd2/dm-32-8   1         167641  UN               UID=0
   7024   jbd2/dm-33-8   0         167641  UN               UID=0
   7027   jbd2/dm-34-8   1         167641  UN               UID=0
   7030   jbd2/dm-35-8   1         162649  UN               UID=0
   7033   jbd2/dm-36-8   3         162650  UN               UID=0
   7314   kworker/u8:1   1         167673  UN               UID=0
   7317   kworker/u8:4   3         167642  UN               UID=0
  11428             rm   0         168084  UN               UID=0
  11446             rm   2         168172  UN               UID=0
  11449             rm   0         168080  UN               UID=0
  11450             rm   2         168106  UN               UID=0
  11451             rm   0         168140  UN               UID=0
  11452             rm   3         172493  UN               UID=0
  11459             rm   2         167950  UN               UID=0
  11643             dd   0              1  UN               UID=0
  11644             dd   0             10  UN               UID=0
  11646             dd   1             28  UN               UID=0

 ** Execution took   0.07s (real)   0.07s (CPU)
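
With a long list of UN tasks, sorting by the 'Ran ms ago' column quickly shows which tasks have been blocked the longest. The following sketch parses task table rows from saved output; it assumes the column layout shown above and command names without spaces. The function 'parse_tasks' and the 120000 ms threshold are illustrative choices, not part of taskinfo:

```python
import re

# A few rows copied from the '--taskfilter=UN' output shown above.
SAMPLE = """\
 PID          CMD       CPU   Ran ms ago   STATE
--------   ------------  --  ------------- -----
    308      scsi_eh_0   2            523  UN               UID=0
   7015   jbd2/dm-30-8   0         167657  UN               UID=0
  11452             rm   3         172493  UN               UID=0
  11643             dd   0              1  UN               UID=0
"""

# Optional leading '>' marker, optional '(tgid=N)' column before UID.
ROW_RE = re.compile(
    r"^(?P<marker>>)?\s*(?P<pid>\d+)\s+(?P<cmd>\S+)\s+(?P<cpu>\d+)\s+"
    r"(?P<ran_ms>\d+)\s+(?P<state>[A-Z]{2})\s+"
    r"(?:\(tgid=(?P<tgid>\d+)\)\s+)?UID=(?P<uid>\d+)\s*$"
)

def parse_tasks(text):
    """Return a list of dicts, one per task row; other lines are ignored."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            row = m.groupdict()
            for key in ("pid", "cpu", "ran_ms", "uid"):
                row[key] = int(row[key])
            rows.append(row)
    return rows

tasks = parse_tasks(SAMPLE)
threshold_ms = 120000  # illustrative cut-off: two minutes

# Longest-blocked tasks first.
blocked = sorted(
    (t for t in tasks if t["state"] == "UN" and t["ran_ms"] > threshold_ms),
    key=lambda t: t["ran_ms"],
    reverse=True,
)
for t in blocked:
    print(f"PID {t['pid']:>6} {t['cmd']:<14} blocked for {t['ran_ms'] / 1000:.1f}s")
```

The same row pattern also accepts the '(tgid=...)' column and the leading '>' marker that appear in the '--lastsched' output.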

Emulate a user-space 'pstree' output (--pstree)

The '--pstree' option prints the parent-child relationships between processes, similar to the output of the Linux 'pstree' command:

crash> taskinfo --pstree
           |                      `-2*[{NetworkManager}]
           |                       `-3*[{at-spi-bus-laun}]
           |              |               `-{audispd}
           |              `-{auditd}
           |           |-gdm-session-wor(6593)-+-gnome-session-b(6597)-+-gnome-settings-(6685)---4*[{gnome-settings-}]
           |           |                       |                       |-gnome-shell(6622)-+-ibus-daemon(6666)-+-ibus-dconf(6692)---3*[{ibus-dconf}]
           |           |                       |                       |                   |                   |-ibus-engine-sim(6729)---2*[{ibus-engine-sim}]
           |           |                       |                       |                   |                   `-2*[{ibus-daemon}]
           |           |                       |                       |                   `-6*[{gnome-shell}]
           |           |                       |                       `-3*[{gnome-session-b}]
           |           |                       `-2*[{gdm-session-wor}]
           |           `-3*[{gdm}]
           |              `-qmgr(6508)
           |            |-sshd(6824)---bash(6830)---journalctl(11478)
           |            |-sshd(6862)---bash(6868)
           |            |-sshd(6900)---bash(6906)
           |            `-sshd(6938)---bash(6952)

 ** Execution took   0.05s (real)   0.04s (CPU)

Use reverse order while sorting and printing task details (-r)

By default, 'taskinfo' lists tasks sorted by PID. The '-r' option sorts them by last execution time instead, most recently scheduled first. It can also be used to reverse the sort order of the '--times' and '--lastsched' options:

crash> taskinfo |head -10
=== Tasks in PID order, grouped by Thread Group leader ===
 PID          CMD       CPU   Ran ms ago   STATE
--------   ------------  --  ------------- -----
>     0      swapper/0   0        1172313  RU               UID=0
      1        systemd   0           4303  IN               UID=0
      2       kthreadd   2         176550  IN               UID=0
      3    ksoftirqd/0   0            113  IN               UID=0
      5   kworker/0:0H   0              0  IN               UID=0
      7    migration/0   0           3044  IN               UID=0
      8         rcu_bh   0        1172282  IN               UID=0

crash> taskinfo -r|head -10
=== Tasks in reverse order, scheduled recently first ===
 PID          CMD       CPU   Ran ms ago   STATE
--------   ------------  --  ------------- -----
>  6868           bash   1              0  RU               UID=0
   7119    kworker/3:0   3              0  IN               UID=0
   4069   kworker/1:1H   1              0  IN               UID=0
    874   kworker/2:1H   2              0  IN               UID=0
      5   kworker/0:0H   0              0  IN               UID=0
     23    ksoftirqd/3   3              0  IN               UID=0
   5702           rngd   3              0  IN               UID=0

Run a specific command for each task (--cmd)

While analyzing a vmcore, it is often useful to run a crash command such as 'bt' or 'bt -f' on a list of processes. The '--cmd' option runs the given crash command for each listed task.

For example, using 'taskinfo --cmd bt' will run the 'bt' command on each process in the list:

crash> taskinfo --cmd bt
=== Tasks in PID order, grouped by Thread Group leader ===
 PID          CMD       CPU   Ran ms ago   STATE
--------   ------------  --  ------------- -----
>     0      swapper/0   0        1172313  RU               UID=0

crash> bt 0
PID: 0      TASK: ffffffff81a02480  CPU: 0   COMMAND: "swapper/0"
 #0 [ffff88021ea08e48] crash_nmi_callback at ffffffff8104fd11
 #1 [ffff88021ea08e58] nmi_handle at ffffffff816b0c57
 #2 [ffff88021ea08eb0] do_nmi at ffffffff816b0e8d
 #3 [ffff88021ea08ef0] end_repeat_nmi at ffffffff816b00b9
    [exception RIP: intel_idle+244]
    RIP: ffffffff816adb04  RSP: ffffffff819efe28  RFLAGS: 00000046
    RAX: 0000000000000001  RBX: 0000000000000002  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: ffffffff819effd8  RDI: 0000000000000000
    RBP: ffffffff819efe58   R8: 00000000000003e3   R9: 0000000000000018
    R10: 00000000000003e2  R11: 0000014a0e39c880  R12: ffffffff819effd8
    R13: 0000000000000002  R14: 0000000000000001  R15: ffffffff81ab8a28
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffffffff819efe28] intel_idle at ffffffff816adb04
 #5 [ffffffff819efe60] cpuidle_enter_state at ffffffff81529e30
 #6 [ffffffff819efe98] cpuidle_idle_call at ffffffff81529f88
 #7 [ffffffff819efed8] arch_cpu_idle at ffffffff81034eee
 #8 [ffffffff819efee8] cpu_startup_entry at ffffffff810e9aba
 #9 [ffffffff819eff30] rest_init at ffffffff81694f17
#10 [ffffffff819eff40] start_kernel at ffffffff81b500e1
#11 [ffffffff819eff88] x86_64_start_reservations at ffffffff81b4f66b
#12 [ffffffff819eff98] x86_64_start_kernel at ffffffff81b4f7bc

PID: 0      TASK: ffff88017ce79fa0  CPU: 1   COMMAND: "swapper/1"
 #0 [ffff88017ceafe48] __schedule at ffffffff816ab2ac
 #1 [ffff88017ceafed0] schedule_preempt_disabled at ffffffff816ac7c9
 #2 [ffff88017ceafee0] cpu_startup_entry at ffffffff810e9afa
 #3 [ffff88017ceaff28] start_secondary at ffffffff81051b96

PID: 0      TASK: ffff88017ce7af70  CPU: 2   COMMAND: "swapper/2"
 #0 [ffff88021eb08e48] crash_nmi_callback at ffffffff8104fd11
 #1 [ffff88021eb08e58] nmi_handle at ffffffff816b0c57
 #2 [ffff88021eb08eb0] do_nmi at ffffffff816b0e8d
 #3 [ffff88021eb08ef0] end_repeat_nmi at ffffffff816b00b9
    [exception RIP: intel_idle+244]
    RIP: ffffffff816adb04  RSP: ffff88017ceb3e20  RFLAGS: 00000046
    RAX: 0000000000000001  RBX: 0000000000000002  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: ffff88017ceb3fd8  RDI: 0000000000000002
    RBP: ffff88017ceb3e50   R8: 00000000000003d4   R9: 0000000000000020
    R10: 00000000000007cd  R11: 0000014a0e2a8640  R12: ffff88017ceb3fd8
    R13: 0000000000000002  R14: 0000000000000001  R15: ffffffff81ab8a28
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff88017ceb3e20] intel_idle at ffffffff816adb04
 #5 [ffff88017ceb3e58] cpuidle_enter_state at ffffffff81529e30
 #6 [ffff88017ceb3e90] cpuidle_idle_call at ffffffff81529f88
 #7 [ffff88017ceb3ed0] arch_cpu_idle at ffffffff81034eee
 #8 [ffff88017ceb3ee0] cpu_startup_entry at ffffffff810e9aba
 #9 [ffff88017ceb3f28] start_secondary at ffffffff81051b96

PID: 0      TASK: ffff88017ce7bf40  CPU: 3   COMMAND: "swapper/3"
 #0 [ffff88021eb88e48] crash_nmi_callback at ffffffff8104fd11
 #1 [ffff88021eb88e58] nmi_handle at ffffffff816b0c57
 #2 [ffff88021eb88eb0] do_nmi at ffffffff816b0e8d
 #3 [ffff88021eb88ef0] end_repeat_nmi at ffffffff816b00b9
    [exception RIP: intel_idle+244]
    RIP: ffffffff816adb04  RSP: ffff88017ceb7e20  RFLAGS: 00000046
    RAX: 0000000000000000  RBX: 0000000000000002  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: ffff88017ceb7fd8  RDI: 0000000000000003
    RBP: ffff88017ceb7e50   R8: 0000000000000137   R9: ffff88021eb97a80
    R10: 7fffffffffffffff  R11: 7fffffffffffffff  R12: ffff88017ceb7fd8
    R13: 0000000000000001  R14: 0000000000000000  R15: ffffffff81ab89d0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff88017ceb7e20] intel_idle at ffffffff816adb04
 #5 [ffff88017ceb7e58] cpuidle_enter_state at ffffffff81529e30
 #6 [ffff88017ceb7e90] cpuidle_idle_call at ffffffff81529f88
 #7 [ffff88017ceb7ed0] arch_cpu_idle at ffffffff81034eee
 #8 [ffff88017ceb7ee0] cpu_startup_entry at ffffffff810e9aba
 #9 [ffff88017ceb7f28] start_secondary at ffffffff81051b96
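
When '--cmd bt' produces many backtraces, grouping them by function names helps spot tasks stuck in the same code path. The sketch below splits saved 'bt' output into one record per task and counts the innermost frame; it assumes the 'PID: ... COMMAND:' header and '#N [addr] function at addr' frame formats shown above. The name 'split_backtraces' is hypothetical:

```python
import re
from collections import Counter

# Two backtraces copied from the output above (most register lines omitted).
BT_SAMPLE = """\
PID: 0      TASK: ffff88017ce79fa0  CPU: 1   COMMAND: "swapper/1"
 #0 [ffff88017ceafe48] __schedule at ffffffff816ab2ac
 #1 [ffff88017ceafed0] schedule_preempt_disabled at ffffffff816ac7c9
 #2 [ffff88017ceafee0] cpu_startup_entry at ffffffff810e9afa
 #3 [ffff88017ceaff28] start_secondary at ffffffff81051b96

PID: 0      TASK: ffff88017ce7af70  CPU: 2   COMMAND: "swapper/2"
 #0 [ffff88021eb08e48] crash_nmi_callback at ffffffff8104fd11
 #1 [ffff88021eb08e58] nmi_handle at ffffffff816b0c57
 #2 [ffff88021eb08eb0] do_nmi at ffffffff816b0e8d
 #3 [ffff88021eb08ef0] end_repeat_nmi at ffffffff816b00b9
    [exception RIP: intel_idle+244]
    RIP: ffffffff816adb04  RSP: ffff88017ceb3e20  RFLAGS: 00000046
--- <NMI exception stack> ---
 #4 [ffff88017ceb3e20] intel_idle at ffffffff816adb04
 #5 [ffff88017ceb3e58] cpuidle_enter_state at ffffffff81529e30
 #6 [ffff88017ceb3e90] cpuidle_idle_call at ffffffff81529f88
 #7 [ffff88017ceb3ed0] arch_cpu_idle at ffffffff81034eee
 #8 [ffff88017ceb3ee0] cpu_startup_entry at ffffffff810e9aba
 #9 [ffff88017ceb3f28] start_secondary at ffffffff81051b96
"""

HEADER_RE = re.compile(
    r'^PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+CPU:\s*(\d+)\s+COMMAND:\s*"(.*)"'
)
FRAME_RE = re.compile(r"^\s*#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")

def split_backtraces(text):
    """Return one dict per task with its list of frame function names."""
    traces = []
    for line in text.splitlines():
        h = HEADER_RE.match(line)
        if h:
            traces.append({"pid": int(h.group(1)), "task": h.group(2),
                           "cpu": int(h.group(3)), "comm": h.group(4),
                           "frames": []})
            continue
        f = FRAME_RE.match(line)
        if f and traces:
            # Register dumps and exception markers do not match and are skipped.
            traces[-1]["frames"].append(f.group(2))
    return traces

traces = split_backtraces(BT_SAMPLE)

# Count how many tasks share the same innermost (frame #0) function.
top_frames = Counter(t["frames"][0] for t in traces if t["frames"])
for func, n in top_frames.most_common():
    print(f"{n:>4}  {func}")
```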

This option can also be combined with '--taskfilter' to run the crash command only on processes in a given state, such as Running or Uninterruptible:

crash> taskinfo --cmd "bt -f" --taskfilter=RU
=== Tasks in PID order, grouped by Thread Group leader ===
 PID          CMD       CPU   Ran ms ago   STATE
--------   ------------  --  ------------- -----
>     0      swapper/0   0        1172313  RU               UID=0

crash> bt -f 0
PID: 0      TASK: ffffffff81a02480  CPU: 0   COMMAND: "swapper/0"
 #0 [ffff88021ea08e48] crash_nmi_callback at ffffffff8104fd11
    ffff88021ea08e50: ffff88021ea08ea8 ffffffff816b0c57
 #1 [ffff88021ea08e58] nmi_handle at ffffffff816b0c57
    ffff88021ea08e60: 0000000000000000 ffff88021ea08ef8
    ffff88021ea08e70: ffffffff81a1a700 a89424830a090e96
    ffff88021ea08e80: ffff88021ea08ef8 ffffffff819effd8
    ffff88021ea08e90: 0000000000000000 ffffffff819effd8
    ffff88021ea08ea0: 00000000ffffffff ffff88021ea08ee8
    ffff88021ea08eb0: ffffffff816b0e8d
 #2 [ffff88021ea08eb0] do_nmi at ffffffff816b0e8d
    ffff88021ea08eb8: 0000000000000000 0000000000000001
    ffff88021ea08ec8: 00007f451748c000 0000000000000001
    ffff88021ea08ed8: 000000009845a000 ffffffff81ab8a28
    ffff88021ea08ee8: ffffffff819efe58 ffffffff816b00b9
 #3 [ffff88021ea08ef0] end_repeat_nmi at ffffffff816b00b9
    [exception RIP: intel_idle+244]
    RIP: ffffffff816adb04  RSP: ffffffff819efe28  RFLAGS: 00000046
    RAX: 0000000000000001  RBX: 0000000000000002  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: ffffffff819effd8  RDI: 0000000000000000