Debugging application performance with perf4j and logback

Performance bugs can be difficult to diagnose. If the bug is intermittent, it gets even tougher.

Fortunately, Java provides tools like VisualVM for monitoring processes, but things get more complicated when you are dealing with performance issues in a distributed system. Our application consists of many "job processes" that distribute the work of building large reports. During our stress tests, we'd occasionally see a report run with very poor performance. We needed a straightforward way to capture performance information for all of the processes at once.

Timing Events

perf4j provides a simple StopWatch mechanism. Wiring it into our project was straightforward: first, we added the dependency to our pom.xml.

<dependency>
    <groupId>org.perf4j</groupId>
    <artifactId>perf4j</artifactId>
</dependency>

Then we wired it up to log to a file in our logback configuration. The additivity flag on the perf4j logger ensures that these timing messages go only to perf.log, not to the console or our regular application log.
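A minimal sketch of that configuration (the appender name, file path, and pattern here are illustrative; org.perf4j.TimingLogger is perf4j's default logger name):

<appender name="PERF_FILE" class="ch.qos.logback.core.FileAppender">
    <file>perf.log</file>
    <encoder>
        <pattern>%msg%n</pattern>
    </encoder>
</appender>

<logger name="org.perf4j.TimingLogger" additivity="false">
    <level value="INFO"/>
    <appender-ref ref="PERF_FILE"/>
</logger>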

Now at various points in the code, we can start and stop the watch to time events.
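A rough sketch of what that looks like, using the SLF4J-backed stopwatch so the messages flow through logback (the class and tag names are hypothetical, and the sleeps stand in for real work):

import org.perf4j.slf4j.Slf4JStopWatch;

public class ReportSection {

    public void buildSection() throws InterruptedException {
        // Logs through the org.perf4j.TimingLogger logger configured above
        Slf4JStopWatch stopWatch = new Slf4JStopWatch();

        Thread.sleep(250);                 // stand-in for loading the report data
        stopWatch.lap("report.loadData");  // log this step's time, then restart the watch

        Thread.sleep(100);                 // stand-in for rendering the report
        stopWatch.stop("report.render");   // log the time for the final step
    }
}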

The result is log statements formatted as in the example below. The fields are:

  • tag[] - the name of the timed event
  • start[] - the timestamp at which the stopwatch started
  • time[] - the elapsed time for the event, in milliseconds
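An illustrative line in perf.log (the values are made up):

start[1386101516909] time[234] tag[report.loadData]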

Centralized Logging

Now we have performance data in log files, which is a good first step. The perf4j jar comes with a parser that calculates the average, min, and max time for each tag, although that approach doesn't scale once multiple processes are spread across several servers.
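For a single log file, the bundled parser can be run from the command line, printing per-tag statistics such as average, min, and max times (the jar file name and version here are assumptions; adjust them to match your build):

java -jar perf4j-0.9.16.jar perf.log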

Enter logback's DBAppender, which lets us direct the timing logs to a central database. Logback ships DDL scripts for creating the required tables in most popular RDBMSs. For this application we're using Oracle, so the setup looks something like this (note the use of system properties for the connection info):
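A sketch of that configuration, assuming the JDBC URL, user, and password are supplied as system properties (the property names db.url, db.user, and db.password are placeholders) and that the logging_event tables have already been created from logback's Oracle DDL script; the logger keeps the file appender from earlier alongside the new DB appender:

<appender name="DB" class="ch.qos.logback.classic.db.DBAppender">
    <connectionSource class="ch.qos.logback.core.db.DriverManagerConnectionSource">
        <driverClass>oracle.jdbc.OracleDriver</driverClass>
        <url>${db.url}</url>
        <user>${db.user}</user>
        <password>${db.password}</password>
    </connectionSource>
</appender>

<logger name="org.perf4j.TimingLogger" additivity="false">
    <level value="INFO"/>
    <appender-ref ref="PERF_FILE"/>
    <appender-ref ref="DB"/>
</logger>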

Conclusion

With the help of these tools we were able to quickly generate reports to isolate the performance issue.

It's worth noting that logging to a database adds overhead of its own. For our scenario this was acceptable, because we were looking for relative timings inside a few known hotspots.

The logback documentation suggests using pooled connections to reduce that overhead, but in general this is not something you'd want to leave on in production indefinitely. Fortunately, turning it off is straightforward:

<logger name="org.perf4j.TimingLogger" additivity="false">
    <level value="OFF"/>
</logger>
