Getting hung task details (hanginfo)

The hanginfo program, available in the PyKdump framework, quickly gathers information about hung tasks, that is, tasks in the kernel's uninterruptible (UN) state. It groups the stuck processes by the kernel function they are waiting in and reports tasks waiting for mutexes, the lock owners, and so on.
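As a rough illustration of this classification, the Python sketch below groups uninterruptible tasks by the kernel function they are blocked in and computes the "Youngest,oldest" summary that hanginfo prints for each group. It is not PyKdump's implementation. The Task record and its fields (wait_func, ran_ago_ms) are hypothetical stand-ins for data that hanginfo reads from the vmcore. The sample PIDs and ran_ago values come from the example output in this article, plus one made-up interruptible task to show filtering.

```python
from collections import defaultdict
from typing import NamedTuple


class Task(NamedTuple):
    """Hypothetical, simplified task record (not a PyKdump type)."""
    pid: int
    state: str        # state as crash prints it, e.g. "UN" or "IN"
    wait_func: str    # kernel function the task is blocked in (assumed field)
    ran_ago_ms: int   # milliseconds since the task last ran


def classify_hung(tasks):
    """Group uninterruptible (UN) tasks by the function they wait in."""
    groups = defaultdict(list)
    for t in tasks:
        if t.state == "UN":
            groups[t.wait_func].append(t)
    return dict(groups)


def summarize(group):
    """Return (count, youngest_pid, oldest_pid, min_ran_ago, max_ran_ago)."""
    youngest = min(group, key=lambda t: t.ran_ago_ms)
    oldest = max(group, key=lambda t: t.ran_ago_ms)
    return (len(group), youngest.pid, oldest.pid,
            youngest.ran_ago_ms, oldest.ran_ago_ms)


if __name__ == "__main__":
    tasks = [
        Task(308, "UN", "schedule_timeout", 523),
        Task(355, "UN", "schedule_timeout", 2598),
        Task(11643, "UN", "io_schedule", 1),
        Task(11452, "UN", "io_schedule", 172493),
        Task(1, "IN", "ep_poll", 10),  # made-up interruptible task: ignored
    ]
    for func, group in classify_hung(tasks).items():
        n, young, old, r_min, r_max = summarize(group)
        print(f"Waiting in {func}: {n} pids. Youngest,oldest: {young}, {old}  "
              f"Ran ms ago: {r_min}, {r_max}")
```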

Options provided by 'hanginfo':

crash> hanginfo -h
Usage: hanginfo [options]

Options:
  -h, --help         show this help message and exit
  -v                 verbose output
  --version          Print program version and exit
  --maxpids=MAXPIDS  Maximum number of PIDs to print
  --sortbypid        Sort by pid (the default is by ran_ago)
  --syslogger        Print info about hangs on AF_UNIX sockets (such as used by syslogd
  --tree             Print tree of resources owners  (experimental!)
  --saphana          Print recommendations for SAP HANA specific hangs

 ** Execution took   0.02s (real)   0.02s (CPU)
crash>

Maximum number of PIDs to print (--maxpids=MAXPIDS)

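Running hanginfo without any options classifies all of the hung tasks by the kernel function in which they are stuck. For each group it shows the number of PIDs, the youngest and oldest task, and how many milliseconds ago they last ran. By default only 10 PIDs are listed per group, and the remainder are reported as skipped: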

crash> hanginfo
 *** UNINTERRUPTIBLE threads, classified ***

 ================== Waiting in io_schedule ==================
    ... 19 pids. Youngest,oldest: 11643, 11452  Ran ms ago: 1, 172493
        printing 10 out of 19
        sorted by ran_ago, youngest first
      [11643, 11644, 11646, 7018, 7030, <9 skipped>, 11428, 11450,
       11451, 11446, 11452]

 =============== Waiting in schedule_timeout ================
    ... 2 pids. Youngest,oldest: 308, 355  Ran ms ago: 523, 2598
        sorted by ran_ago, youngest first
      [308, 355]

*** System activities other threads are waiting for ***
  --- Doing schedule_timeout ---
{355, 308}
  --- Doing io_schedule ---
{11459, 7314, 11643, 7317, 11644, 11428, 7033, 7015, 7018, 7021, 7024, 7027, 11446, 7030, 11449, 11450, 11451, 11452, 11646}


+++WARNING+++ Possible hang

******************************************************************************
************************ A Summary Of Problems Found *************************
******************************************************************************
-------------------- A list of all +++WARNING+++ messages --------------------
    Possible hang
------------------------------------------------------------------------------

 ** Execution took   0.19s (real)   0.19s (CPU)
crash>

To limit the number of PIDs printed for each group, use the '--maxpids=MAXPIDS' option. For example, '--maxpids=5' prints only 5 PIDs from each sorted list of hung tasks:

crash> hanginfo --maxpids=5
 *** UNINTERRUPTIBLE threads, classified ***

 ================== Waiting in io_schedule ==================
    ... 19 pids. Youngest,oldest: 11643, 11452  Ran ms ago: 1, 172493
        printing 5 out of 19
        sorted by ran_ago, youngest first
      [11643, 11644, <14 skipped>, 11451, 11446, 11452]             <--- Only 5 PIDs are printed from the sorted list of hung tasks

 =============== Waiting in schedule_timeout ================
    ... 2 pids. Youngest,oldest: 308, 355  Ran ms ago: 523, 2598
        sorted by ran_ago, youngest first
      [308, 355]

*** System activities other threads are waiting for ***
  --- Doing schedule_timeout ---
{355, 308}
  --- Doing io_schedule ---
{11459, 7314, 11643, 7317, 11644, 11428, 7033, 7015, 7018, 7021, 7024, 7027, 11446, 7030, 11449, 11450, 11451, 11452, 11646}


+++WARNING+++ Possible hang

******************************************************************************
************************ A Summary Of Problems Found *************************
******************************************************************************
-------------------- A list of all +++WARNING+++ messages --------------------
    Possible hang
------------------------------------------------------------------------------

 ** Execution took   0.19s (real)   0.19s (CPU)
crash>
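The truncated lists in these outputs are consistent with a simple rule: keep the first maxpids//2 PIDs and the last maxpids - maxpids//2 PIDs of the sorted list and replace the rest with '<N skipped>' (5 + 5 for the default of 10, 2 + 3 for '--maxpids=5'). The Python sketch below reproduces that formatting. The rule is inferred from the sample output rather than taken from the PyKdump source, and format_pids is a hypothetical helper.

```python
def format_pids(pids, maxpids=10):
    """Format a sorted PID list the way hanginfo appears to truncate it.

    Assumption (inferred from sample output, not PyKdump source): the first
    maxpids // 2 and the last maxpids - maxpids // 2 PIDs are kept and the
    middle is summarized as '<N skipped>'.
    """
    if maxpids < 1:
        raise ValueError("maxpids must be at least 1")
    pids = list(pids)
    if len(pids) <= maxpids:
        return "[" + ", ".join(str(p) for p in pids) + "]"
    head = maxpids // 2
    tail = maxpids - head
    skipped = len(pids) - maxpids
    parts = [str(p) for p in pids[:head]]
    parts.append(f"<{skipped} skipped>")
    parts.extend(str(p) for p in pids[len(pids) - tail:])
    return "[" + ", ".join(parts) + "]"


if __name__ == "__main__":
    # io_schedule group from the example, sorted by ran_ago. The order of the
    # nine middle PIDs is not visible in the output, so it is a placeholder.
    io_pids = [11643, 11644, 11646, 7018, 7030,
               11459, 7314, 7317, 7033, 7015, 7021, 7024, 7027, 11449,
               11428, 11450, 11451, 11446, 11452]
    print(format_pids(io_pids, maxpids=5))
    print(format_pids(io_pids))
```

Only the per-group lists are truncated; the "System activities other threads are waiting for" section in the same output still prints every PID.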

Sort by pid (--sortbypid)

By default, hanginfo sorts the PIDs in each group by ran_ago, that is, by how long ago each task last ran, youngest (most recently run) first. To sort the PIDs numerically instead, use '--sortbypid':

crash> hanginfo --sortbypid
 *** UNINTERRUPTIBLE threads, classified ***

 ================== Waiting in io_schedule ==================
    ... 19 pids. Youngest,oldest: 11643, 11452  Ran ms ago: 1, 172493
        printing 10 out of 19
        sorted by pid
      [7015, 7018, 7021, 7024, 7027, ..., 11452, 11459, 11643, 11644,   <--- Sorted by PID
       11646]

 =============== Waiting in schedule_timeout ================
    ... 2 pids. Youngest,oldest: 308, 355  Ran ms ago: 523, 2598
        sorted by pid
      [308, 355]

*** System activities other threads are waiting for ***
  --- Doing schedule_timeout ---
{355, 308}
  --- Doing io_schedule ---
{11459, 7314, 11643, 7317, 11644, 11428, 7033, 7015, 7018, 7021, 7024, 7027, 11446, 7030, 11449, 11450, 11451, 11452, 11646}


+++WARNING+++ Possible hang

******************************************************************************
************************ A Summary Of Problems Found *************************
******************************************************************************
-------------------- A list of all +++WARNING+++ messages --------------------
    Possible hang
------------------------------------------------------------------------------

 ** Execution took   0.20s (real)   0.19s (CPU)
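The two orderings can be sketched in Python as follows. Each task is assumed to be a hypothetical (pid, ran_ago_ms) pair, and order_pids is a hypothetical helper, not part of PyKdump. Only the ran_ago values for 11643 (1 ms) and 11452 (172493 ms) come from the example output; the value for 7015 is made up for illustration.

```python
def order_pids(tasks, sortbypid=False):
    """Order (pid, ran_ago_ms) pairs the way hanginfo lists PIDs.

    Default: by ran_ago, youngest (smallest ran_ago) first.
    sortbypid=True: numerically by PID, mirroring '--sortbypid'.
    Tasks with equal ran_ago keep their input order because sorted() is stable.
    """
    if sortbypid:
        return sorted(pid for pid, _ in tasks)
    return [pid for pid, _ in sorted(tasks, key=lambda t: t[1])]


if __name__ == "__main__":
    # 7015's ran_ago value is hypothetical.
    sample = [(11452, 172493), (7015, 5000), (11643, 1)]
    print(order_pids(sample))                  # default ordering
    print(order_pids(sample, sortbypid=True))  # --sortbypid ordering
```

In the '--sortbypid' output above, the truncated list starts with the group's lowest PID (7015) and ends with its highest (11646), which matches truncating the list after it has been sorted.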