Monday, May 02, 2005

Reflections On Debugging

This rather pedagogically worded article is a collection of my thoughts on debugging Java software, the programming patterns I have used, some useful API and techniques.

What it is not - it is definitely not complete in terms of information on debugging, its techniques, styles, etc. It is primarily a list of things that have worked for me time and again and a few tools that I keep in my toolkit to use when the situation demands it. I think they will be of use to you as well.

I have been fortunate to work in environments where I touched upon various facets of Java, used various APIs and generally did extremely satisfying work. In all these years, debugging has stood out as an activity that everybody has to perform almost as much as they code or design. I have noticed time and again that being able to debug well is an extremeley useful skill. It can be learnt over time and honed and it is to a large extent the ability to match problem patterns to past issues. People like Rajiv can uncannily pinpoint a problem's cause when they hear its description. This ability comes from years of experience and the intent to learn from every new debugging experience.

Debugging

Debugging is the act of locating and fixing a flaw in software. A flaw can manifest itself in multiple ways. Sometimes it is apparent such as cases when the program crashes or does not do the intended action or does not return the intended result. Sometimes it is hard to say what's wrong when a program does not return or the CPU keeps processing something or when the program does something unexpected in addition to the right action. Debugging, of course, is the action we take post having seen a flaw.

Isolating the problem to code : Identifying where to look for a problem

The problem or flaw appears as a failure of the software to do something it should have. When you encounter a flaw, to debug it, you need to form a mental model of the code to identify where is the code that failed. Debugging largely follows the process of elimination and this process is helped by any symptoms that you can find.

When you have the piece of code that failed you try and find the cause by asking and answering questions - what is occuring? What possible causes could result in this problem? E.g. If something should've happened and it didn't perhaps the code was not reached. Why would the code not be reached? Maybe the if condition under which the method gets called did not evaluate to true or perhaps the if (something != null) check was called when something had the value null.

Another example - If there is an exception then there is additional information about the location of failure and the steps that led to it. The type of Exception will tell you the nature of failure. So if you see a ConcurrentModificationException thrown by an Iterator's next() method you will have to -
a) Find out under what conditions this happens.
b) How could these conditions have been created in your program?
c) Maybe you removed something using list.remove() in your loop, or perhaps you passed reference to the list to some other thread which is modifying it.
Once you have a mental picture of the surroundings of the problem and why it might be occuring, it’s a matter of eliminating the reasons one by one starting from the most likely cause.

Even if you intend to use a debugger, this is a necessary step. You have to always backtrack mentally from the point of failure to locate all possiible causes of failure. A lot of debugging skill relies on this one ability alone.

Reading an Exception

Java Exceptions have a lot of information in them and should be a very well understood to debug problems. Often, I have noticed programmers use the following template for exceptions.
try {
// do stuff here
} catch(Exception ex) {
System.out.println(ex);
}
If you wish to print the exception to know when a problem has occurred then you must consider using ex.printStackTrace(). There are multiple advantages -
A) When using System.out.println(ex), several times there is no message printed other than the class name of the exception that occurred. Imagine having this piece of code in multiple locations, how will you ever know which catch handler printed java.lang.NullPointerException.
B) An exception when printed stands out in a log file or the console. It is several lines long and just the pattern of an exception stack trace print is so different from other message, it is much easier to find than an exception message that looks like other logging statements.
C) Following JDK1.4, chained exceptions get printed as well and you don't have to manage them manually. Root cause gets carried along with the exception.
D) Lastly and most importantly, the stack trace contains a wealth of information that can be used to create a mental picture of what happened.

There have been lots of times when I have looked at a stack trace and said - it should not have come here and been able to trace the problem to a wrong check in an earlier part of the code. You must know how to read an exception. Here's a Java exception printed out.

: Output generated by System.out.println() :
java.lang.ArrayIndexOutOfBoundsException: 0


: Output generated by ex.printStackTrace() :
java.lang.ArrayIndexOutOfBoundsException: 0
at com.sonicsw.tools.test.ThrowException.processArgs(ThrowException.java:32)
at com.sonicsw.tools.test.ThrowException.main(ThrowException.java:21)



1) java.lang.ArrayIndexOutOfBoundsException: 0
The first part of printStackTrace is to do a print out the exception similar to the System.out.println so already you have gotten that for free. This part of the exception stack trace is formatted according to the type of exception and the information printed varies from exception to exception. Some exceptions print nothing more than the class of the exception. Some exceptions (specially custom ones) print a lot of context information that led to this exception.
2) at com.sonicsw.tools.test.ThrowException.processArgs(ThrowException.java:32)
The rest of the exception is the stack trace starting with the location that threw the exception at the top and the caller of the method in which the exception was thrown below it and so on until the executing thread's run method or the main method. The information provided on this line consists of -
the fully qualified class name - com.sonicsw.tools.test.ThrowException
the method - processArgs
the file - ThrowException.java
line number - 32.
Obviously, these are great nuggets of information. In some cases, when compilation does not include debugging information, you can end up with stack traces that do not have line numbers. That’s usually a bummer but atleast you have the stack of methods to locate where the problem occurred.

Other variations for method names are <clinit> for a static initializer - (this is also an example of exception chaining - note the "Caused by:")
java.lang.ExceptionInInitializerError
Caused by: java.lang.IllegalArgumentException
at com.sonicsw.tools.test.ThrowException.<clinit>
(ThrowException.java:21)

<init> for constructors and initializers
java.lang.IllegalArgumentException
at com.sonicsw.tools.test.ThrowException.<init>
(ThrowException.java:20)
at com.sonicsw.tools.test.ThrowException.main(ThrowException.java:24)


and the $number convention for anonymous classes -
java.lang.IllegalArgumentException
at com.sonicsw.tools.test.ThrowException$1.actionPerformed(ThrowException.java:25)
at com.sonicsw.tools.test.ThrowException.
(ThrowException.java:23)
at com.sonicsw.tools.test.ThrowException.main(ThrowException.java:31)


Ofcourse, if the line number is available it makes it a lot easier.

Sometimes, to understand why an exception occurred, you have to understand how control got to that point. The easy exceptions are always the ones which have a localized problem and you can easily catch the problem by the exception. But things can get really hard. E.g. The culprit could lie several methods below and might not even be caused in this thread of execution. A previous event might have generated a bad value which was stored in a field and was picked up by this thread of execution in which the exception occurred. To get to the bottom of these, you need to create a mental back map of methods and events that might have occurred. It always helps to see what methods were called to get to the point of failure.

Trapping Exceptions

Sometimes, something you expected to execute does not occur and there is no information on why it failed. For some reason, control got transferred out of your method. Once you have eliminated any if conditions that are failing it could be an exception getting thrown from somewhere deep inside your code. No exception gets printed though. This usually happens, when you are implementing a piece that fits into a framework. Perhaps, the framework has a logging switch that’s set too low for exceptions to get printed. Or perhaps, the framework is faulty and is not printing exceptions being thrown by overridden methods. Perhaps, the framework is failing because of the unexpected exception being thrown by your method. In such cases, it is best to eliminate this possibility by wrapping the entire method by a try-catch block. For such debugging situations, I prefer to wrap with a try … catch(Throwable t) block because you want to be sure that no Exception or Error is being thrown. Sometimes, when Errors get thrown (such as a NoClassDefFoundError because of a faulty classpath) it will slip through all catch(Exception) blocks.

Debugging is often an exercise in eliminating possibilities and locating the faulty piece of code. This is one technique to achieve that.

When to use Thread.dumpStack

Another neat tool to keep in your toolkit is using Thread.dumpStack() or equivalently new Exception().printStackTrace(). What either of these methods do is to do a printStackTrace() at this line without actually throwing an exception. The usual reason you do this is because you want to know what caused control of execution to come to this point. Problems caused by a method being called unexpectedly can be identified by this.

Using VM Thread dumps and understanding them

My biggest complaint with a lot of experienced Java developers is that they have never heard of the most amazing debugging tool called the VM Thread dump. You can use this technique in innovative ways -
A) To detect deadlocks
B) To diagnose UI hanging problems
C) To diagnose slow UI issues
D) To diagnose spinning/infinite loops
E) For quick and dirty profiling
F) To get an understanding of what the VM is doing at that instant

I cannot do better than this excellent article on this topic available here.

Classpath problems

Another class of problems are classpath issues. There are times when you are not sure if there is another version of a class in the classpath that is getting picked up before yours. Usually a result of a bad environment setup. To eliminate this possibility a simple check is to add a print statement to see if your new code gets picked up. If it isn't getting picked up you need to locate the other class that is getting picked up. One neat API in Java that allows you to locate where a class is being picked up is

Class.getProtectionDomain().getCodeSource().getLocation()

In most cases, depending on the class loader being used you will get the location of the class which is being executed and you can correct your environment setup.

When to use a debugger and when to print

There are times when you should use a debugger and there are times when a print statement is more useful. When you can use what is very well described by the brilliant pieces of work mentioned in the references section. The point I want to make here is that you must know what can be achieved using a debugger. It is an extremely powerful tool and can reduce debugging time quite a bit. It is applicable in many cases but is not suitable for a certain set of problems such as threading issues or issues that show up in long running tasks in which case a log with good print statements is essential.

Good debug printing

Once you have a mental check-list of what the causes might be, you need to eliminate them. Debug prints can tell you quickly if your assumption is right or wrong. Ensure that your debug prints are not causing any side effects inadvertantly. E.g.

System.out.println("value.getCode() = " + value.getCode());
if (value == null)
return;
switch(value.getCode()) {

}

It is important to print sufficient information about the object you are interested in. Perhaps your code is falling through a switch statement without firing any of the case clauses. You will need to print the value of the switch condition. Think a little before you decide to type in any debug print statements. Often when using good debug print statements, the code gets peppered at useful locations with debug statements which can be switched off with a boolean for future use. An easy to use pattern is

private static boolean DEBUG = Boolean.getBoolean("<classname>.debug").booleanValue();

So e.g. if the class is com.test.ArgumentsProcessor then you would write
private static boolean DEBUG = Boolean.getBoolean("com.test.ArgumentsProcessor.debug").booleanValue();

The advantage is that you can switch on debugging for this class without recompiling anything by specifying -Dcom.test.ArgumentsProcessor.debug=true when starting your VM.
E.g. java -Dcom.test.ArgumentsProcessor.debug=true <main-class> <args>

However, the compiler will not remove your debug statements during code optimization in this case.

Using Logging

There is not much that I can add on using Logging that is not already covered by a vast amount of material. You can take a look at the References section below for other information on this subject. The point I would like to make here is that when you are writing print statements that go to a log for debugging purposes, maybe as a patch to a customer to diagnose a particularly tricky problem, think about how the log file might look when it is sent to you. What we think the output would be like when we write log print statements changes dramatically in a live system with multiple threads executing the same log messages. Maybe you need to print the Thread id to bunch all logs or an operation together or perhaps some business data structure id needs to printed with each statement to understand what's happening. You may need to print out the execution path leading to the suspected problem location to know what conditions caused control flow to get there. Another thing to keep in mind is that log files can get so verbose and have so many messages that it becomes very difficult to scan them for problems later.

References

The Art of Programming (Brian Kernighan, Rob Pike) Highly recommended reading
How to be a Programmer (Robert L Read) The first chapter is on debugging and is really well written.