Friday, March 28, 2008

Test Fixture Strategies

In his book, xUnit Test Patterns, Gerard Meszaros provides an in-depth, analytical discourse on unit testing patterns. Before getting into the patterns themselves, Meszaros covers some prerequisite material, and there is a section on test fixtures that I found particularly useful. He defines a test fixture as everything that we need to exercise the system under test (SUT) - in other words, the pre-conditions of the test. Let's suppose we are testing an XML parser. Our test fixture will include an XML document that will be fed to the parser. The fixture setup is the part of the test logic that is executed to set up the test fixture. Continuing with our parser example, our fixture setup might require reading an XML document from the file system, or it may involve constructing a document in memory at runtime. After defining some terminology, Meszaros goes through common test fixture strategies. These strategies lay the groundwork for the patterns discussed in the book. In fact, it quickly becomes apparent that understanding these strategies plays a big role in getting the most out of those patterns.

Transient Fresh Fixture

A transient fresh fixture exists only in memory and only during the test in which it is used. It does not outlive the test. Fixture tear down is implicit (assuming a language that provides garbage collection). The fixture is created at the start of the test, and it is discarded at the end of the test. Each test creates its own fixture. In other words, the test creates the objects that it needs. Creation of those objects might be delegated to some helper object, but it is the test itself that initiates the creation. The test does not re-use any part of a pre-built fixture or a fixture from another test. If we elect to use a transient fixture for our XML parser, then the test must create the document that will be fed to the parser. The primary disadvantage of a transient fresh fixture is that it must be created for each and every test. In some situations this may lead to performance degradation. Despite this potential drawback, transient fresh fixtures offer the best avenue for keeping fixture logic clear and simple, which in turn gives us tests as documentation. The benefits of not having to deal with tear down logic simply cannot be overstated.
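
As a minimal sketch of this strategy (JUnit 4, with the JDK's DOM parser standing in for the parser under test), the test below builds its entire fixture in memory and never has to clean up after itself:

    import static org.junit.Assert.assertEquals;

    import java.io.ByteArrayInputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.junit.Test;
    import org.w3c.dom.Document;

    public class TransientFixtureTest {

        @Test
        public void testParseShouldReadRootElement() throws Exception {
            // Fixture setup: the document is built in memory by the test itself.
            String xml = "<order id=\"42\"><item>widget</item></order>";

            // Exercise the SUT.
            DocumentBuilder parser =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = parser.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));

            // Verify.
            assertEquals("order", doc.getDocumentElement().getTagName());

            // No explicit tear down: the fixture is simply garbage collected.
        }
    }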

Persistent Fresh Fixture

A persistent fresh fixture lives beyond the test method in which it is used. It requires explicit tear down at the end of each test. We often wind up using this fixture when we are testing objects that are tightly coupled to a database. Let's revisit our parser example. Suppose we need to add a test that verifies that the parser can handle consuming documents from the file system. For the fixture setup, the test creates an XML document and then writes it to disk so that we can exercise our parser for this scenario. So far, our test is pretty similar to one that is using a transient fresh fixture. The difference, however, reveals itself with tear down. The test using a transient fresh fixture does not have to worry about doing any tear down - it is implicit. Our test, on the other hand, must explicitly tear down the fixture. We could implement this easily enough by deleting the document from the file system. It is worth mentioning that this is a pretty straightforward example of tearing down a persistent fresh fixture. Things can quickly get more complicated, particularly when dealing with a database. In these situations, we can easily wind up with obscure tests. Another test smell that is often encountered with persistent fresh fixtures is slow tests. This usually occurs as a result of the fixture having a high-latency dependency. For example, if we have to create our XML document on a remote file system over the network, we will likely experience high latency. High latency is commonly encountered when a database is involved.
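
Here is a rough sketch of what that file-based test might look like in JUnit 4, again with the JDK's DOM parser standing in for the parser under test. The important part is the explicit tear down in the @After method:

    import static org.junit.Assert.assertEquals;

    import java.io.File;
    import java.io.FileWriter;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;
    import org.w3c.dom.Document;

    public class PersistentFixtureTest {

        private File xmlFile;

        @Before
        public void setUp() throws Exception {
            // Fixture setup: the document is written to disk, so it outlives the test.
            xmlFile = File.createTempFile("order", ".xml");
            FileWriter writer = new FileWriter(xmlFile);
            writer.write("<order id=\"42\"><item>widget</item></order>");
            writer.close();
        }

        @Test
        public void testParseShouldReadDocumentFromFileSystem() throws Exception {
            DocumentBuilder parser =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = parser.parse(xmlFile);

            assertEquals("order", doc.getDocumentElement().getTagName());
        }

        @After
        public void tearDown() {
            // Explicit tear down: without this, the file would linger between runs.
            xmlFile.delete();
        }
    }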

Shared Fixture
A shared fixture is deliberately reused across different tests. Let's say that our parser has special requirements for handling very large documents. A shared fixture may seem like a logical approach in this situation. The advantage is improved test execution time, since we cut out a lot of the set up and tear down work. The primary disadvantage of this strategy is that it easily leads to interacting tests. Interacting tests are an anti-pattern in which there is an inter-dependency among tests. Let's suppose that our parser needs to support both reading from and writing to XML documents. We could very quickly wind up with interacting tests: one test modifies the document while another test reads the document. If the document is expected to be in a particular state, then the latter test could easily break as a result of the former test (which modifies the document) running first. When using a shared fixture, a couple of questions should be considered:
  • To what extent should the fixture be shared?
  • How often do we rebuild the fixture?
Should we reuse our XML document across multiple test cases? Across the entire test suite? In general, we want to minimize the extent to which we share our fixture. As for how often we should rebuild the fixture, that may depend on a number of factors. In the case of an immutable fixture, we might very well be able to forgo rebuilding the fixture altogether. Let's revisit the scenario in which we need to test both read and write operations for our parser. If we can guarantee the order of the tests, then we can arrange for all of the read-only tests to run in sequence. For those tests we do not have to worry about rebuilding the fixture between runs.
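
In JUnit 4 terms, the narrowest form of sharing keeps the fixture within a single test class via @BeforeClass and @AfterClass. The sketch below assumes the large-document scenario described above; the file is built once and reused by every test method in the class:

    import static org.junit.Assert.assertTrue;

    import java.io.File;
    import java.io.FileWriter;
    import org.junit.AfterClass;
    import org.junit.BeforeClass;
    import org.junit.Test;

    public class SharedFixtureTest {

        private static File largeDocument;

        @BeforeClass
        public static void createSharedFixture() throws Exception {
            // Built once for the whole class; every test method reuses it.
            largeDocument = File.createTempFile("large", ".xml");
            FileWriter writer = new FileWriter(largeDocument);
            writer.write("<catalog>");
            for (int i = 0; i < 100000; i++) {
                writer.write("<item id=\"" + i + "\"/>");
            }
            writer.write("</catalog>");
            writer.close();
        }

        // Read-only tests are reasonably safe against a shared fixture...
        @Test
        public void testSharedDocumentIsLarge() {
            assertTrue(largeDocument.length() > 1000000);
        }

        // ...but a test that modified largeDocument could break the test above,
        // depending on execution order - the interacting tests smell.

        @AfterClass
        public static void destroySharedFixture() {
            largeDocument.delete();
        }
    }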

Conclusion
In most circumstances, a transient fresh fixture is the best strategy because it simply does not have to deal with the challenges presented by the other fixture strategies, namely fixture tear down. There are times when it is all but impossible to avoid using either a persistent fresh fixture or a shared fixture. Data access tests involving a database are the most prevalent example. Understanding the ramifications of the other fixture strategies is crucial to writing effective tests when they must be used; otherwise, we inevitably fall victim to the anti-patterns presented by Meszaros. Just as an understanding of mainstream patterns like the widely embraced GoF patterns leads to better-designed software, an understanding of the sundry testing patterns leads to more effective tests, which in turn ultimately leads to better software.

Wednesday, February 13, 2008

What to Expect from a Unit Test

What is and what is not a unit test is a hotly debated subject. At one end of the spectrum you have people who argue that a unit test replaces all depended-on objects with mock or fake objects so that the system under test (SUT) is tested in complete isolation. At the other end of the spectrum you have people who contend that anything written with an xUnit framework like JUnit is a unit test. And then we have everything else that falls in between the two ends of that spectrum. Rather than trying to arrive at a universally accepted definition of a unit test, I think that it may be more productive to talk about what we expect from a unit test. If we can agree upon a set of goals that we aim to achieve through the practice of unit testing, then we do not need to concern ourselves with whether or not the test that we are writing is a true unit test. Instead, we can focus on using automated testing to facilitate the development of our software.

Rapid Feedback
A unit test should provide immediate feedback. Unit tests need to execute quickly since we (hopefully) run them over and over during development. While we work on a particular piece of code, we may choose to run a subset of the tests, perhaps through the IDE test runner, and then run the full test suite prior to committing code. The tests for the code we are currently working on need to be fast since we run them frequently as we write the code. Since we want commits to be small and frequent, the full test suite needs to be fast as well if we expect it to be run prior to every commit. Not only should the test suite run fast for a developer build, but it should also run fast during integration builds so that we receive timely feedback there as well.

Defect Localization
When a test fails, we should know exactly what part of the SUT caused the test to fail. There are a couple of things that can be done to promote effective defect localization. The first is that only one condition should be verified per test. To that end, there is a school of thought that a test should contain only a single assert statement; however, the most important thing is that the test is not overly aggressive in trying to check multiple conditions. There are times when verifying a condition may require multiple assert statements. Whether you choose to break out each assert into a separate test or keep all of the asserts for that condition in the same test is really a matter of preference.
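
For example, the following JUnit 4 test (using the JDK's DOM parser as a convenient SUT) verifies a single condition - that the root element is read correctly - even though doing so takes more than one assert:

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNotNull;

    import java.io.ByteArrayInputStream;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.junit.Test;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public class RootElementTest {

        // Several asserts, but they all verify the same logical condition,
        // so a failure still points at a single behavior of the SUT.
        @Test
        public void testParseShouldReadRootElement() throws Exception {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream("<order id=\"42\"/>".getBytes("UTF-8")));

            Element root = doc.getDocumentElement();
            assertNotNull(root);
            assertEquals("order", root.getTagName());
            assertEquals("42", root.getAttribute("id"));
        }
    }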

The second thing that comes into play for adequate defect localization is how well you isolate the SUT. Even if we only verify a single condition, or even if we only use one assert per test, there may be times when it is not immediately obvious what part of the SUT caused the test to fail. This is often a direct result of not sufficiently isolating the SUT. There are plenty of articles, papers, and books that discuss strategies and techniques for isolating the SUT. Some tools, like mock object libraries, allow you to completely isolate an object by mocking all of its neighboring objects. Here is a good rule of thumb to start with when determining an appropriate level of isolation: the cause of a test failure should be determinable without having to rely on a debugger or additional logging statements.
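
As a rough illustration of that kind of isolation, the sketch below uses a hand-rolled fake rather than a mock library such as jMock. TicketService and TicketStore are hypothetical classes invented for the example; the point is that a failure can only implicate the service itself, not a database or some other collaborator:

    import static org.junit.Assert.assertEquals;

    import java.util.ArrayList;
    import java.util.List;
    import org.junit.Test;

    public class TicketServiceTest {

        // Hypothetical collaborator that would normally talk to a database.
        interface TicketStore {
            void saveComment(int ticketId, String comment);
        }

        // Hypothetical SUT: adds a tidied-up comment to a ticket.
        static class TicketService {
            private final TicketStore store;

            TicketService(TicketStore store) {
                this.store = store;
            }

            void updateTicket(int ticketId, String comment) {
                store.saveComment(ticketId, comment.trim());
            }
        }

        @Test
        public void testUpdateTicketShouldAddComment() {
            // Hand-rolled fake: records calls in memory, so the test never touches
            // a database and a failure can only implicate TicketService itself.
            final List<String> savedComments = new ArrayList<String>();
            TicketStore fakeStore = new TicketStore() {
                public void saveComment(int ticketId, String comment) {
                    savedComments.add(ticketId + ":" + comment);
                }
            };

            new TicketService(fakeStore).updateTicket(42, " looks good ");

            assertEquals("42:looks good", savedComments.get(0));
        }
    }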

Executable Documentation
Tests can and should serve as documentation. They can provide a living, executable specification. Tests demonstrate how an object is expected to be used, what conditions must be satisfied before invoking a method on the object, and what kind of output to expect from that object. Because they can be such a powerful form of documentation, tests should be written in a clear, self-documenting style. Intuitive variable and method names should be used to make it obvious what is being tested, and test method names should be intent-revealing. For example, testUpdateTicket() does not reveal intent nearly as well as testUpdateTicketShouldAddComment().

Avoid putting complex set-up or verification code in test methods, as it may obscure the intent of the tests. Instead, complex logic should be relegated to test utility methods and objects. This has a few benefits. First, and most importantly, it keeps the test from becoming littered with complex logic, making it easier for the reader to see what the test is doing. Second, putting the complex logic in a test utility library makes it accessible and easy to reuse in other tests. Last, we can put our test utility objects in their own test harness to ensure that they have been implemented properly.
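
A small sketch of what that separation might look like; parseOrderWithItems is a made-up utility method, the kind that could later be promoted to a shared helper class with its own tests:

    import static org.junit.Assert.assertEquals;

    import java.io.ByteArrayInputStream;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.junit.Test;
    import org.w3c.dom.Document;

    public class OrderDocumentTest {

        @Test
        public void testParseShouldExposeEveryOrderItem() throws Exception {
            // The intent stays visible; the messy construction details live elsewhere.
            Document doc = parseOrderWithItems(3);

            assertEquals(3, doc.getElementsByTagName("item").getLength());
        }

        // Test utility method (hypothetical): once other test classes need it, it
        // can move to a shared helper class that has its own test harness.
        private Document parseOrderWithItems(int itemCount) throws Exception {
            StringBuilder xml = new StringBuilder("<order>");
            for (int i = 0; i < itemCount; i++) {
                xml.append("<item/>");
            }
            xml.append("</order>");
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.toString().getBytes("UTF-8")));
        }
    }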

Regression Safeguard
The primary goal of testing in general is to validate that our software behaves as expected under prescribed conditions. Having a set of automated tests that we can continually run against our software provides a great safety net for catching regressions that are introduced into our code. While unit testing alone is typically not sufficient for validating our code, it provides an excellent first line of defense that should be capable of catching most errors.

Refactoring
Unit tests should enable us to be aggressive with refactoring. Refactoring is the practice of changing the implementation of code while preserving its behavior. Our unit tests should give us the confidence that refactoring will not alter the intended behavior of our code, at least not unexpectedly. If the tests do not instill that confidence, then we need to consider whether or not the tests are reliable, thorough, and effective enough. Code coverage tools can help provide some measure of the effectiveness of a test suite, although a coverage tool alone should not be used to determine the quality and effectiveness of a test suite.

Repeatable and Reliable
What exactly does it mean for a test to be repeatable and reliable? Suppose we run a test and it passes. Then we run it again without making any changes, but this time the test fails. This would be an example of a test that is not repeatable. It could also be a strong indication that the test is using a persistent fixture that is outliving the test. With a persistent fixture, we need to be especially careful about cleaning up before/after the test so that the fixture is in a consistent state for each test run. Now consider a test that starts failing as a result of changes being made to code other than the SUT. This would be an example of an unreliable test. In these situations we need to ensure that we properly isolate the SUT so that external changes do not affect our tests.
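
One simple defensive habit, sketched below, is to clean up a persistent fixture before the test as well as after it, so that even an aborted previous run cannot leave the fixture in an inconsistent state. The export file name and the write in the test body are only stand-ins for exercising a real SUT:

    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    import java.io.File;
    import java.io.FileWriter;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    public class ExportRepeatabilityTest {

        private final File exportFile =
                new File(System.getProperty("java.io.tmpdir"), "export-scratch.xml");

        @Before
        public void ensureCleanSlate() {
            // Guard against leftovers from a previous (possibly aborted) run so
            // that every execution of the test starts from the same state.
            exportFile.delete();
            assertFalse(exportFile.exists());
        }

        @Test
        public void testExportShouldCreateFile() throws Exception {
            // Stand-in for exercising a real exporter under test.
            FileWriter writer = new FileWriter(exportFile);
            writer.write("<export/>");
            writer.close();

            assertTrue(exportFile.exists());
        }

        @After
        public void tearDown() {
            exportFile.delete();
        }
    }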

Easy and Fast to Implement
Unit tests should be relatively easy to implement without adding a significant amount of time and overhead to the overall development effort. Code that is particularly difficult to get under test may indicate a larger design issue, so we should take the opportunity to look for potential design problems. And perhaps the easiest, most effective way to ensure that we design for testability is to write our tests first.

As with any other software, it is imperative to refactor our test code. Using test utility methods and libraries as previously discussed will significantly reduce the amount of code that we have to write for tests, as well as make the tests more reliable since our test utility code can have its own test harness. Testing a single condition per test method will result in smaller tests as well. This in turn leads to a faster turnaround when going back and forth between the main code and the test code.

Conclusion
These are sound, reasonable things to expect from unit tests; however, the exact expectations may vary from team to team. For example, some teams may prefer to make extensive use of mock objects and libraries like jMock, while other teams may prefer not to use mock objects at all. The most important things are that the goals are clearly stated and agreed upon within the team and that the tests aid rather than inhibit development efforts.