Randoop: Automatic unit test generation for Java

Randoop is an automatic unit test generator for Java (and .NET). It is written in Java, released under the MIT license, and available from either its project page or its GitHub page. As of 2017-12-24, the project seems to be quite alive, although most of the commits are authored by a single developer (the project does accept occasional PRs). Although Randoop appears to be driven by a research group at the University of Washington, the overall quality of the project structure, supporting documentation, build system and other project artifacts is excellent.

Basics

According to its documentation, Randoop generates tests using feedback-directed random test generation. It randomly (but smartly) generates sequences of constructor and method invocations for the input classes. These sequences are executed, and the results are used to create assertions. This means the tests mostly capture the actual behavior of the tested class (possibly for future regression testing) rather than reveal many new bugs. There is an exception to this, though: Randoop can detect when the class under test does not conform to basic Java contracts (Object.equals() and the like), as well as several other likely-buggy behaviors, such as a NullPointerException being thrown when no null values are passed as params to a method. The documentation states that it is possible to add more contracts for checking.
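To make the contract checking more concrete, here is a minimal sketch of the kind of check described above. It is not Randoop's implementation: BrokenPoint and ContractChecks are hypothetical names made up for this illustration, and the three checks are just a simplified subset of the basic Object contracts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical class with a buggy equals(): it is not reflexive. */
final class BrokenPoint {
    private final int x;

    BrokenPoint(int x) {
        this.x = x;
    }

    @Override
    public boolean equals(Object other) {
        // Bug: the strict comparison means an object never equals itself.
        return other instanceof BrokenPoint && ((BrokenPoint) other).x > this.x;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(x);
    }
}

/** Simplified contract checks applied to a value produced by a call sequence. */
final class ContractChecks {
    static List<String> violations(Object o) {
        List<String> found = new ArrayList<>();
        if (!o.equals(o)) {
            found.add("equals is not reflexive");
        }
        if (o.equals(null)) {
            found.add("equals(null) returned true");
        }
        if (o.hashCode() != o.hashCode()) {
            found.add("hashCode is not stable");
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(violations(new BrokenPoint(1))); // prints [equals is not reflexive]
        System.out.println(violations("a string"));         // prints []
    }
}
```

A sequence that ends in a value with a non-empty violation list would be reported as error-revealing; everything else can only become a regression test.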

Installation and usage

I cloned the Git repository and followed the manual to build Randoop from source using Gradle. The build took about five minutes and produced a JAR file. I then ran Randoop on smg, a little library I developed when working on the static analysis of C programs.

I started by generating tests for the simple SMGRegion class. After a little fiddling with params, Randoop ran for a while, generating 9 files of about 2MB each, with 4286 tests (so about 18MB total, which looks a bit excessive for a class about 60 lines long). No “error revealing” tests were generated, just the regression tests. I executed the tests, and they all passed. Their total runtime was 0.105 seconds, which is good. I then introduced a change in the tested class and reran the tests, and 2506 of them failed as a result.

Afterwards, I included all public classes, and the results were about the same: roughly 4200 tests, none of them error-revealing (but then again, Randoop can find only basic Java contract violations).

The generated tests are straightforward (just constructor and method invocation sequences) but quite long, with the usual appearance of generated code (numbered variable names, etc.). I was able to investigate the failures quickly, but of course, the generated code has no real semantic meaning that would hint to the programmer why the bug is there, other than “this worked before”.
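For illustration, here is a hand-written imitation of the shape such a test tends to have. It is not actual Randoop output: the class and variable names are made up, it exercises java.util.ArrayList instead of my library, and the check helper is a stand-in for the JUnit assertions so that the sketch has no dependencies.

```java
import java.util.ArrayList;

/** Hand-written imitation of the shape of a generated regression test. */
final class RegressionTestSketch {
    // Stand-in for a JUnit assertion, so the sketch needs no test framework.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static void test001() {
        // A random but valid sequence of constructor and method calls.
        ArrayList<String> arrayList0 = new ArrayList<String>();
        boolean boolean2 = arrayList0.add("hi!");
        int int3 = arrayList0.size();
        boolean boolean5 = arrayList0.contains("bye");

        // The assertions record whatever was observed at generation time.
        check(boolean2 == true, "'" + boolean2 + "' != 'true'");
        check(int3 == 1, "'" + int3 + "' != '1'");
        check(boolean5 == false, "'" + boolean5 + "' != 'false'");
    }

    public static void main(String[] args) {
        test001();
        System.out.println("test001 passed");
    }
}
```

When one of these assertions fails, all it says is that a previously observed value is now different, which is exactly the “this worked before” level of explanation.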

When would I use it?

Randoop seems quite useful to me. It is mature enough, well-documented and quite easy to use. I also did not encounter any problems with the tool. Its error-revealing mode could be run as part of CI, being basically a simple fuzzer for Java contracts (but I think existing static analyzers could do the same job).

The usefulness of the generated tests is slightly more questionable. They can serve as regression tests, since all they can do is alert you later when you, perhaps mistakenly, change observable behavior. The good thing is that Randoop can indeed create tests that you possibly need but did not write. You could generate a test suite at a particular point in time and keep executing it: this way you would have a nice regression suite, but you would not test any code added after the suite was generated. Regenerating the suite after each change seems too expensive, but has some merit (of course, only if the original suite was run and passed first). Perhaps some discard-few-old, generate-few-new strategy might be employed there (I guess these strategies are probably discussed somewhere in the related scientific papers, such as the authors’ Scaling Up Automated Test Generation ASE’2011 paper).

I can also imagine situations where Randoop generates tests that capture “undefined” behavior, like ordering or specific values that may change between executions. The user manual briefly discusses this, and the tool provides a few techniques that can be applied to prevent such behavior.
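One generic way to guard against such assertions is to repeat an observation several times and only assert on it if the value never changes. The sketch below shows that idea under my own assumptions; it is not a description of what Randoop does internally, and StableObservation and isStable are hypothetical names.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Keeps an observation only if repeating it always yields the same value. */
final class StableObservation {
    static boolean isStable(Supplier<Object> observation, int runs) {
        Object first = observation.get();
        for (int i = 1; i < runs; i++) {
            if (!Objects.equals(first, observation.get())) {
                return false; // value changed between runs: unsafe to assert on
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Deterministic observation: safe to turn into an assertion.
        System.out.println(isStable(() -> "abc".length(), 5)); // prints true

        // Observation that depends on hidden state (a global ID counter).
        AtomicLong nextId = new AtomicLong();
        System.out.println(isStable(() -> nextId.incrementAndGet(), 5)); // prints false
    }
}
```

Repeating within one process will not catch everything (values that differ only between JVM runs or machines slip through), so this is a first line of defense at best.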

Written on December 25, 2017