Using the Watchdog Tool

Watchdog is a stand-alone tool that issues a command to generate a thread dump whenever memory exceeds the configured threshold after a full garbage collection (GC). This tool requires that the Java garbage collection log be turned on at startup.

Watchdog monitors the memory space through the GC log that the PPM Server generates. If the memory used after garbage collection is greater than a set threshold value, the Watchdog issues a command to generate a thread dump, and the thread dump is captured in the server log. You can configure the Watchdog tool to send out email notifications about this event.

The Watchdog tool does not affect PPM functionality. It is platform-dependent because it generates thread dumps with one mechanism on Windows and a different one on UNIX-like platforms.

Note: Watchdog is not currently supported on AIX systems.

The memory used after a full GC is compared with the threshold. The Watchdog tool looks for Full GC records of the following forms in the GC log:

  • With the JVM -server option:

    7.138: [Full GC [PSYoungGen: 3016K->0K(229376K)] [PSOldGen: 0K->2956K(524288K)] 3016K->2956K(753664K) [PSPermGen: 9983K->9983K(20480K)], 0.1605436 secs]

  • Without the -server option:

    147.032: [Full GC 147.032: [Tenured: 30756K->34733K(227584K), 0.2966210 secs] 50507K->34733K(253184K), [Perm : 33487K->33487K(131072K)], 0.2967583 secs]

In the second example (without the -server option), the Watchdog parses the record and extracts the memory used before GC (50507K) and the memory used after GC (34733K). It then compares the memory used after GC, 34733K in this case, with the configured threshold. If the threshold is set to 30, the record triggers a thread dump. If the threshold is set to 35, it does not.
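The parsing step described above can be sketched as follows. This is an illustration only, not the Watchdog's actual implementation: the names parse_full_gc and exceeds_threshold are hypothetical, and the sketch assumes that the threshold is expressed in megabytes (1 MB = 1024 KB), which is consistent with the 30 and 35 examples but is not stated explicitly here.

```python
import re

# Overall heap usage in a Full GC record has the form
# "<before>K-><after>K(<total>K)" and follows a closing bracket,
# which distinguishes it from the per-generation figures.
HEAP_PATTERN = re.compile(r"\]\s+(\d+)K->(\d+)K\((\d+)K\)")


def parse_full_gc(line):
    """Return (before_kb, after_kb) for a Full GC record, else None."""
    if "[Full GC" not in line:
        return None
    match = HEAP_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def exceeds_threshold(after_kb, threshold_mb):
    # Assumption: the threshold is given in megabytes (1 MB = 1024 KB).
    return after_kb > threshold_mb * 1024


record = (
    "147.032: [Full GC 147.032: [Tenured: 30756K->34733K(227584K), "
    "0.2966210 secs] 50507K->34733K(253184K), "
    "[Perm : 33487K->33487K(131072K)], 0.2967583 secs]"
)
before_kb, after_kb = parse_full_gc(record)
print(before_kb, after_kb)              # 50507 34733
print(exceeds_threshold(after_kb, 30))  # True
print(exceeds_threshold(after_kb, 35))  # False
```

The same pattern also matches the -server form of the record, because the overall heap figure there likewise follows the closing bracket of the last per-generation entry.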

When the memory first exceeds the threshold, PPM is considered to be entering a critical condition. A thread dump is triggered and a notification is sent.

If the memory still exceeds the threshold after the next full GC, PPM remains in the critical condition. No further thread dump is generated as long as the memory stays above the threshold.

When the memory used falls below the threshold in subsequent GCs, PPM is considered to be exiting a critical condition. In this case, no thread dump is generated. You can configure the Watchdog tool to send out email notifications about this event.

If, after exiting a critical condition, the memory used again exceeds the set threshold, a new critical condition starts. A thread dump is triggered and a notification is sent (if configured) every time PPM enters the critical condition.
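The enter and exit behavior described above amounts to a small two-state machine. The following is a minimal sketch of that logic under stated assumptions, not the Watchdog's own code: the class name CriticalStateTracker, the event names "enter" and "exit", and the kilobyte threshold are all hypothetical choices made for illustration.

```python
class CriticalStateTracker:
    """Tracks whether memory after full GC is above the threshold."""

    def __init__(self, threshold_kb):
        self.threshold_kb = threshold_kb
        self.critical = False

    def update(self, after_kb):
        """Return "enter", "exit", or None for one full-GC reading."""
        over = after_kb > self.threshold_kb
        if over and not self.critical:
            self.critical = True
            return "enter"  # thread dump and notification
        if not over and self.critical:
            self.critical = False
            return "exit"   # optional notification, no thread dump
        return None         # state unchanged, nothing to do


tracker = CriticalStateTracker(threshold_kb=30 * 1024)
readings = [20000, 34733, 36000, 25000, 40000]
events = [tracker.update(r) for r in readings]
print(events)  # [None, 'enter', None, 'exit', 'enter']
```

In this sketch only the transitions produce an event, which is why the third reading (still above the threshold) yields no second thread dump.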

Tip: To collect thread dumps without relying on a threshold value, do one of the following:

  • Use the Watchdog tool and set memory_threshold to 0.

  • (Recommended) Use the jstack stack trace tool to create thread dumps on any operating system.
    For example, jstack pid >a.log
    The jstack tool is present in the <JDK_HOME>/bin directory.
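If thread dumps need to be collected from a script, the jstack call above can be wrapped as in the following sketch. The function names build_jstack_command and capture_thread_dump are hypothetical, and the JDK location and process ID are values you supply; the only external command used is jstack <pid>, as shown in the tip.

```python
import os
import subprocess


def build_jstack_command(pid, jdk_home):
    """Build the argument list for running jstack from <JDK_HOME>/bin."""
    return [os.path.join(jdk_home, "bin", "jstack"), str(pid)]


def capture_thread_dump(pid, jdk_home, out_path):
    """Run jstack for the given process and write the dump to out_path."""
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if jstack exits with an error.
        subprocess.run(build_jstack_command(pid, jdk_home), stdout=out, check=True)
```

For example, capture_thread_dump(1234, "/opt/jdk", "a.log") would be the scripted equivalent of jstack 1234 >a.log, assuming the JDK is installed under /opt/jdk and 1234 is the PPM Server process ID.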