Getting hung task details (hanginfo)
====================================

The hanginfo program available in the PyKdump framework can be used to
quickly process various information about hung tasks, i.e. those tasks in
kernel uninterruptible state. It displays the list of processes stuck on
various function calls, tasks waiting for mutex locks, lock owner, etc.

Options provided by 'hanginfo'::

    crash> hanginfo -h
    Usage: hanginfo [options]

    Options:
      -h, --help         show this help message and exit
      -v                 verbose output
      --version          Print program version and exit
      --maxpids=MAXPIDS  Maximum number of PIDs to print
      --sortbypid        Sort by pid (the default is by ran_ago)
      --syslogger        Print info about hangs on AF_UNIX sockets (such as used by syslogd
      --tree             Print tree of resources owners (experimental!)
      --saphana          Print recommendations for SAP HANA specific hangs

     ** Execution took 0.02s (real) 0.02s (CPU)
    crash>

* `Maximum number of PIDs to print (--maxpids=MAXPIDS)`_
* `Sort by pid (--sortbypid)`_
* `Print info about hangs on AF_UNIX sockets (--syslogger)`_
* `Print tree of resource owners (--tree)`_
* `Print recommendations for SAP HANA specific hangs (--saphana)`_

Maximum number of PIDs to print (--maxpids=MAXPIDS)
---------------------------------------------------

Running hanginfo without any options prints information about all of the
hung tasks. These details include the last function call after which the
process was stuck, how long it was in UN state, etc.::

    crash> hanginfo
    *** UNINTERRUPTIBLE threads, classified ***

    ================== Waiting in io_schedule ==================
       ... 19 pids. Youngest,oldest: 11643, 11452 Ran ms ago: 1, 172493
            printing 10 out of 19
           sorted by ran_ago, youngest first
            [11643, 11644, 11646, 7018, 7030, <9 skipped>, 11428, 11450, 11451, 11446, 11452]

    =============== Waiting in schedule_timeout ================
       ... 2 pids. Youngest,oldest: 308, 355 Ran ms ago: 523, 2598
           sorted by ran_ago, youngest first
            [308, 355]

     *** System activities other threads are waiting for ***
     --- Doing schedule_timeout ---
         {355, 308}
     --- Doing io_schedule ---
         {11459, 7314, 11643, 7317, 11644, 11428, 7033, 7015, 7018, 7021, 7024, 7027, 11446, 7030, 11449, 11450, 11451, 11452, 11646}
    +++WARNING+++ Possible hang

    ******************************************************************************
    ************************ A Summary Of Problems Found *************************
    ******************************************************************************
    -------------------- A list of all +++WARNING+++ messages --------------------
        Possible hang
    ------------------------------------------------------------------------------

     ** Execution took 0.19s (real) 0.19s (CPU)
    crash>

Users can choose to limit the number of processes displayed in output by
using the '--maxpids=MAXPIDS' option. For example, use '--maxpids=5' to
limit the sorted list of hung tasks to 5 processes only::

    crash> hanginfo --maxpids=5
    *** UNINTERRUPTIBLE threads, classified ***

    ================== Waiting in io_schedule ==================
       ... 19 pids. Youngest,oldest: 11643, 11452 Ran ms ago: 1, 172493
            printing 5 out of 19
           sorted by ran_ago, youngest first
            [11643, 11644, <14 skipped>, 11451, 11446, 11452]    <--- Only 5 PIDs are printed from
                                                                      the sorted list of hung tasks

    =============== Waiting in schedule_timeout ================
       ... 2 pids. Youngest,oldest: 308, 355 Ran ms ago: 523, 2598
           sorted by ran_ago, youngest first
            [308, 355]

     *** System activities other threads are waiting for ***
     --- Doing schedule_timeout ---
         {355, 308}
     --- Doing io_schedule ---
         {11459, 7314, 11643, 7317, 11644, 11428, 7033, 7015, 7018, 7021, 7024, 7027, 11446, 7030, 11449, 11450, 11451, 11452, 11646}
    +++WARNING+++ Possible hang

    ******************************************************************************
    ************************ A Summary Of Problems Found *************************
    ******************************************************************************
    -------------------- A list of all +++WARNING+++ messages --------------------
        Possible hang
    ------------------------------------------------------------------------------

     ** Execution took 0.19s (real) 0.19s (CPU)
    crash>

Print info about hangs on AF_UNIX sockets (--syslogger)
-------------------------------------------------------

The '--syslogger' option checks for tasks hung on AF_UNIX sockets, in
particular the socket used by the system logging daemon, and displays them
if found::

    crash> hanginfo --syslogger
    +++WARNING+++ A problem with syslog daemon pid=6799 state=UN
        It ran 5878.06s ago and 9 processes are waiting for it
     -- Socket we wait for: /dev/log
       Youngest process with this socket pid=6799(UN) ran 5878.06s ago
       ... 9 tasks waiting for this socket

If '-v' (verbose output) is also used, the command will list the tasks
waiting on the socket::

    crash> hanginfo --syslogger -v
     -- Socket we wait for: /dev/log
       Youngest process with this socket pid=6799(UN) ran 5878.06s ago
       ... 9 tasks waiting for this socket
         pid=  6807 CMD=klogd
         pid= 62979 CMD=cron
         pid= 62999 CMD=cron
         pid= 63009 CMD=cron
         pid= 63022 CMD=cron
         pid= 63034 CMD=cron
         pid= 63045 CMD=cron
         pid= 63055 CMD=cron
         pid= 93764 CMD=sshd

Sort by pid (--sortbypid)
-------------------------

The hanginfo program by default sorts the processes by the amount of time
they were in UN (uninterruptible) state. To sort the processes by their
PIDs, use '--sortbypid'::

    crash> hanginfo --sortbypid
    *** UNINTERRUPTIBLE threads, classified ***

    ================== Waiting in io_schedule ==================
       ... 19 pids. Youngest,oldest: 11643, 11452 Ran ms ago: 1, 172493
            printing 10 out of 19
           sorted by pid
            [7015, 7018, 7021, 7024, 7027, ..., 11452, 11459, 11643, 11644,    <--- Sorted as per the PIDs
             11646]

    =============== Waiting in schedule_timeout ================
       ... 2 pids. Youngest,oldest: 308, 355 Ran ms ago: 523, 2598
           sorted by pid
            [308, 355]

     *** System activities other threads are waiting for ***
     --- Doing schedule_timeout ---
         {355, 308}
     --- Doing io_schedule ---
         {11459, 7314, 11643, 7317, 11644, 11428, 7033, 7015, 7018, 7021, 7024, 7027, 11446, 7030, 11449, 11450, 11451, 11452, 11646}
    +++WARNING+++ Possible hang

    ******************************************************************************
    ************************ A Summary Of Problems Found *************************
    ******************************************************************************
    -------------------- A list of all +++WARNING+++ messages --------------------
        Possible hang
    ------------------------------------------------------------------------------

     ** Execution took 0.20s (real) 0.19s (CPU)

Print tree of resource owners (--tree)
--------------------------------------

The '--tree' option prints a graphical representation of processes waiting
for specific operations::

    crash> hanginfo --tree
    *** UNINTERRUPTIBLE threads, classified ***
    [...]
    +++WARNING+++ Possible hang

    ------------------------------------------------------------------------------
     ┌───────────┐
     │io_schedule│
     └─┬─────────┘
       │ ┌─────────────────────────────┐
       │ │7015,7018,7021,7024,7027     │
       │ │7030,7033,7314,7317,11428    │
       └─┤11446,11449,11450,11451,11452│
         │11459,11643,11644,11646      │
         └─────────────────────────────┘
    ------------------------------------------------------------------------------
     ┌────────────────┐
     │schedule_timeout│
     └─┬──────────────┘
       │ ┌───────┐
       └─┤308,355│
         └───────┘

    ******************************************************************************
    ************************ A Summary Of Problems Found *************************
    ******************************************************************************
    -------------------- A list of all +++WARNING+++ messages --------------------
        Possible hang
    ------------------------------------------------------------------------------

     ** Execution took 0.17s (real) 0.17s (CPU)
    crash>

Print recommendations for SAP HANA specific hangs (--saphana)
-------------------------------------------------------------

WIP (may or may not eventually be implemented)
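
Post-processing hanginfo output (illustrative sketch)
-----------------------------------------------------

The classified output shown in the '--maxpids' and '--sortbypid' sections is regular enough to be summarised by a small script when it has been saved to a file or captured as text. The following is a minimal sketch and not part of PyKdump: the helper name ``parse_hanginfo`` is hypothetical, and the regular expressions assume that the 'Waiting in <function>' banners, the '... N pids.' summary lines and the bracketed PID lists look exactly like the samples in this document, with each PID list on a single line. Other hanginfo versions may print something different.

```python
import re

# Patterns derived from the sample output in this document (an assumption,
# not a documented hanginfo output format).
BANNER_RE = re.compile(r"^=+\s+Waiting in (\S+)\s+=+$")  # section banner
COUNT_RE = re.compile(r"\.\.\.\s+(\d+)\s+pids\.")         # "... 19 pids."
LIST_RE = re.compile(r"\[([^\]]*)\]")                     # "[11643, 11644, ...]"


def parse_hanginfo(text):
    """Hypothetical helper: map each 'Waiting in' function to its PID data.

    Returns {function: {"total": N, "shown": [pid, ...]}}, where "total" is
    the count from the summary line and "shown" holds the PIDs actually
    printed (placeholders such as '<14 skipped>' are ignored).
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("***"):
            # A '*** ... ***' header ends the per-function sections.
            current = None
            continue
        banner = BANNER_RE.match(line)
        if banner:
            current = sections.setdefault(
                banner.group(1), {"total": 0, "shown": []}
            )
            continue
        if current is None:
            continue
        count = COUNT_RE.search(line)
        if count:
            current["total"] = int(count.group(1))
        pid_list = LIST_RE.search(line)
        if pid_list:
            tokens = (tok.strip() for tok in pid_list.group(1).split(","))
            current["shown"].extend(int(tok) for tok in tokens if tok.isdigit())
    return sections


# Sample text copied from the '--maxpids=5' example above.
SAMPLE = """\
*** UNINTERRUPTIBLE threads, classified ***

================== Waiting in io_schedule ==================
   ... 19 pids. Youngest,oldest: 11643, 11452 Ran ms ago: 1, 172493
        printing 5 out of 19
       sorted by ran_ago, youngest first
        [11643, 11644, <14 skipped>, 11451, 11446, 11452]

=============== Waiting in schedule_timeout ================
   ... 2 pids. Youngest,oldest: 308, 355 Ran ms ago: 523, 2598
       sorted by ran_ago, youngest first
        [308, 355]

 *** System activities other threads are waiting for ***
 --- Doing schedule_timeout ---
     {355, 308}
"""

if __name__ == "__main__":
    for func, info in parse_hanginfo(SAMPLE).items():
        print(f"{func}: {info['total']} hung, {len(info['shown'])} listed")
```

Comparing ``total`` with the length of ``shown`` indicates whether the list was truncated by '--maxpids'. A PID list that wraps across lines, as in the '--sortbypid' sample, is not handled by this sketch.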