Thursday, December 20, 2012

Testing More by Testing Less


Introduction


Recently I have been thinking about unittests and their utility. I've had the opportunity to try a different approach, which I want to talk about in this post.

We are all familiar with unittesting by now. For most of us, that means that for each class we write, a corresponding test is produced, preferably beforehand. However, I find this method brings a drawback: if you refactor that code, you must refactor your unittests as well. But as we all know, refactoring is good and should be encouraged.

Refactoring is needed because getting the design of the code right the first time is notoriously difficult. Sometimes this is due to poor judgement or time constraints. But the fact is that in practice it's hard to solve complex problems and have good design at the same time. And by 'good' I mean that the various classes and their assigned responsibilities are such that the code is easily understood and easily extensible.

The latter is particularly hard to achieve because, while we often think otherwise, we usually don't know what changes will be demanded of our application. That means spending more time refining the design will not help us. What would help is finding ways to change the code quickly and confidently when the need actually arises.

I feel this is not addressed by the unittest-per-class method, because these kinds of changes typically break the original APIs and/or introduce new ones in the form of new classes. That means your unittests are hit as well and need to be updated, slowing you down and introducing more scope for errors. You are also less confident in the modified unittests than you would be if they could be reused unaltered.

That creates a barrier to refactoring code. Not only is there more code to refactor, but you also cannot rely on your unittests to protect you from errors, which is ironic because that protection is a big reason why people write them in the first place. Meanwhile the bad code lingers, and when it must be touched because of defects, you would rather fix only that and get out as soon as possible. And each time you do that, the bad code grows. Wouldn't this be easier if refactoring weren't so risky and didn't require so much effort?

There is another disadvantage: waste. Writing tests for all classes in your codebase is, in my opinion, not a good way to use resources, because unittests are a lot like boilerplate: they don't add real value in themselves. They are only there to help us developers do our job, but they do not fulfil any business needs. In other words, writing them is an activity which distracts us from achieving business value, and should therefore be limited.

Testing more by testing less

I propose that, to address these issues effectively, we test multiple classes collectively instead of writing a unittest per class. This gives freedom in refactoring them, because your test code would not use all classes directly, only the few necessary to input test data and collect output.

I have pictured this below. The squares are classes, and the arrows denote the direction of inter-class dependencies. The blue squares inside the dashed lines are the classes under test, or 'the system'. Squares outside the dashed lines are left out of the test; they effectively define the test boundary.


The yellow classes need to be mocked so that the tested classes have someone to talk to at runtime and won't crash. The mocks also let us provide test data and capture the data sent from the system, so that we can assert its correctness and, by extension, the correctness of the system itself. There are many frameworks designed to help with these tasks, which we can reuse.
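As a rough illustration of both roles of a mock (feeding test data in, capturing output), here is a small Python sketch using the standard library's unittest.mock. The ReportService class and its fetch_entries/show collaborators are hypothetical names invented for this example.

```python
from unittest.mock import Mock


class ReportService:
    """Hypothetical class under test; it talks to two collaborators."""

    def __init__(self, resource, view):
        self._resource = resource
        self._view = view

    def publish(self):
        entries = self._resource.fetch_entries()
        self._view.show(len(entries))


# The resource mock provides the test data...
resource = Mock()
resource.fetch_entries.return_value = ["a", "b"]

# ...and the view mock captures what the system sends out.
view = Mock()

ReportService(resource, view).publish()
view.show.assert_called_once_with(2)
```

The test never inspects ReportService itself; it only sets up one mock and checks the other.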

This is all quite similar to ordinary unittesting. You could even pretend that the classes being tested are just one class. Actually, we can do more than just pretend: we can introduce an intermediate class, which we use in writing tests instead of the real classes. I call this a TestProxy. It should be responsible for wiring the objects under test to the mock objects, and for providing convenient methods that invoke real functionality on the objects under test at a coarse-grained level.

Let's consider a simple example based on the Model-View-Presenter (MVP) pattern. In this example the Presenter uses a BusinessService for accessing business functions. This service in turn accesses a resource service that fetches data from a remote location, such as a webservice.

The View and the Resource class are good candidates for keeping out of the test. The Resource class needs a running webservice to work properly, which makes testing cumbersome. The View is usually not programmatically testable, which is often the reason for using the MVP pattern in the first place.

The picture below shows the classes involved in the test:




Note that I have drawn mocks for the View and Resource. These mocks need to be instantiated and wired to the actual Presenter and BusinessService, which happens in the TestProxy. The TestProxy also makes these mocks accessible to the TesterClass, because that is where the actual test logic is. The TesterClass selects mock data, invokes functions through the TestProxy and does assertions.

That means that instead of talking to the Presenter or BusinessService in your tests, you only talk to mocks. This insulates the test logic completely from the actual object hierarchy, making this logic reusable should the hierarchy change. You should only have to change your test logic if there is some change in the mocks, which would only happen if the data shown on the view or fetched from the webservice changed.
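A minimal Python sketch of such wiring, assuming hand-written mocks rather than a mocking library. Every class and method name in it (fetch_entries, entry_titles, choose_refresh and so on) is made up for illustration; only the roles come from the MVP example above.

```python
class ResourceMock:
    """Stands in for the remote resource; holds entries in memory."""

    def __init__(self):
        self.entries = []

    def fetch_entries(self):
        return list(self.entries)


class OverviewMock:
    """Stands in for the view; records what it was asked to display."""

    def __init__(self):
        self.shown = None
        self.presenter = None  # set by the TestProxy during wiring

    def show(self, data):
        self.shown = data

    def choose_refresh(self):
        # Mirrors a user action by forwarding it to the presenter.
        self.presenter.on_refresh()


class BusinessService:
    def __init__(self, resource):
        self._resource = resource

    def entry_titles(self):
        return sorted(e["title"] for e in self._resource.fetch_entries())


class Presenter:
    def __init__(self, view, service):
        self._view = view
        self._service = service

    def on_refresh(self):
        self._view.show(self._service.entry_titles())


class TestProxy:
    """Wires the real objects to the mocks and exposes only the mocks."""

    __test__ = False  # keeps pytest from collecting this helper as a test

    def __init__(self):
        self.resource_mock = ResourceMock()
        self.overview_mock = OverviewMock()
        service = BusinessService(self.resource_mock)
        self.overview_mock.presenter = Presenter(self.overview_mock, service)


def test_overview_lists_titles():
    proxy = TestProxy()
    proxy.resource_mock.entries = [{"title": "b"}, {"title": "a"}]
    proxy.overview_mock.choose_refresh()
    assert proxy.overview_mock.shown == ["a", "b"]
```

The test function mentions neither Presenter nor BusinessService, so either could be split, merged or renamed without touching it.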

Another advantage is that the TestProxy can provide an API that makes test logic more readable and compact. In this particular case we can achieve this by mirroring in our API the actions a user performs, because we mock views that interact with users directly. For instance, this is what some test pseudo code might look like:


function TestDeleteEntry(){

    // Mock data: what the resource holds and what the overview should show
    startingEntriesList = { ... }
    endEntriesList = { ..... }

    startOverviewData = { ... }
    endOverviewData = { ... }

    TestProxy.ResourceMock.setEntries( startingEntriesList );

    // Start the application; the main menu comes up first
    TestProxy.start();
    TestProxy.MainMenuMock.assertHasFocus();

    // The user opens the overview
    TestProxy.MainMenuMock.chooseOverview();
    TestProxy.OverviewMock.assertHasFocus();
    TestProxy.OverviewMock.assertData( startOverviewData );

    // The user selects an entry and lands on its detail view
    TestProxy.OverviewMock.selectEntry(2);
    TestProxy.DetailViewMock.assertHasFocus();
    TestProxy.DetailViewMock.assertData( startingEntriesList.get(2) );

    // The user deletes the entry and returns to the overview
    TestProxy.DetailViewMock.chooseDelete();
    TestProxy.OverviewMock.assertHasFocus();
    TestProxy.OverviewMock.assertData( endOverviewData );

    // The deletion must also have reached the resource
    TestProxy.ResourceMock.assertEntries( endEntriesList );

}

The exact API doesn't matter, and may well depend on the type of mocking libraries at your disposal. What matters is that the test logic does not depend on the internal wiring of the classes actually being tested. So maybe at first you implemented this using three views but the same presenter. Later you switch to three presenters. Then you decide to refactor the BusinessService, or the Model. Your test remains the same, and can therefore be trusted to catch any errors.
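To make the point about internal wiring concrete, here is a small Python sketch in which one unchanged check runs against two different wirings. In a real project there would be a single TestProxy whose insides change over time; the two proxies are kept side by side here only so that both can be run. All names are hypothetical.

```python
class ResourceMock:
    def __init__(self, entries):
        self.entries = list(entries)


class ViewMock:
    def __init__(self):
        self.shown = None
        self.on_delete = None  # callback installed by whichever presenter is wired

    def choose_delete(self, index):
        self.on_delete(index)


# Wiring before refactoring: one presenter talks to the resource directly.
class SinglePresenter:
    def __init__(self, view, resource):
        self.view, self.resource = view, resource
        view.on_delete = self.delete

    def delete(self, index):
        del self.resource.entries[index]
        self.view.shown = list(self.resource.entries)


# Wiring after refactoring: a service sits between presenter and resource.
class EntryService:
    def __init__(self, resource):
        self.resource = resource

    def remove(self, index):
        self.resource.entries.pop(index)
        return list(self.resource.entries)


class ThinPresenter:
    def __init__(self, view, service):
        self.view, self.service = view, service
        view.on_delete = self.delete

    def delete(self, index):
        self.view.shown = self.service.remove(index)


class ProxyBefore:
    def __init__(self, entries):
        self.resource_mock = ResourceMock(entries)
        self.view_mock = ViewMock()
        SinglePresenter(self.view_mock, self.resource_mock)


class ProxyAfter:
    def __init__(self, entries):
        self.resource_mock = ResourceMock(entries)
        self.view_mock = ViewMock()
        ThinPresenter(self.view_mock, EntryService(self.resource_mock))


def check_delete(proxy_factory):
    """The test logic: it only ever touches the two mocks."""
    proxy = proxy_factory(["a", "b", "c"])
    proxy.view_mock.choose_delete(1)
    assert proxy.view_mock.shown == ["a", "c"]
    assert proxy.resource_mock.entries == ["a", "c"]
    return True
```

Calling check_delete(ProxyBefore) and check_delete(ProxyAfter) exercises both designs with the same assertions.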

It also matters that methods like chooseOverview and selectEntry relate directly to user actions. This makes it easy to write tests based on functional specifications, and to work with testers to design and validate good tests.

We could further tweak this API, for instance by turning the groupings of actions and assertions into methods themselves. But more important is reuse on another level. Suppose that instead of deleting an entry, we would like to edit it. In order to get to the edit page, we would need to repeat some steps, and we might reuse some mock data. By creating methods which accomplish a reusable part of a flow, we can avoid any copy-and-paste. We can likewise structure our mock data into extensible sets.
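One way this reuse could look, sketched in Python. FakeApp is a deliberately trivial, hypothetical stand-in for a TestProxy plus the system behind it, so that the sketch can run by itself; the interesting parts are the shared open_detail step and the data sets derived from a common base.

```python
class FakeApp:
    """Hypothetical stand-in for a TestProxy and the system behind it."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.focus = "main_menu"
        self.selected = None

    def choose_overview(self):
        self.focus = "overview"

    def select_entry(self, index):
        self.selected = index
        self.focus = "detail"

    def choose_delete(self):
        del self.entries[self.selected]
        self.focus = "overview"

    def choose_edit(self, new_value):
        self.entries[self.selected] = new_value
        self.focus = "overview"


def open_detail(app, index):
    """Reusable part of a flow: both scenarios must reach the detail view."""
    app.choose_overview()
    assert app.focus == "overview"
    app.select_entry(index)
    assert app.focus == "detail"


# Mock data as extensible sets: variants are derived from one base set.
BASE_ENTRIES = ["alpha", "beta", "gamma"]
AFTER_DELETE = [e for e in BASE_ENTRIES if e != "gamma"]
AFTER_EDIT = BASE_ENTRIES[:2] + ["delta"]


def test_delete_entry():
    app = FakeApp(BASE_ENTRIES)
    open_detail(app, 2)
    app.choose_delete()
    assert app.entries == AFTER_DELETE


def test_edit_entry():
    app = FakeApp(BASE_ENTRIES)
    open_detail(app, 2)
    app.choose_edit("delta")
    assert app.entries == AFTER_EDIT
```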

Conclusion

I have argued that conventional unittesting methods have some flaws, and described a method for testing groups of classes instead of individual classes in order to address them. While there are some areas to work on, particularly reuse, after actually trying this method certain advantages are already clear to me.

As we have seen in the example, this method makes it possible to write tests on a level much closer to functionality observable to users, business level if you will. You can then work more closely with testers. Besides saving time in thinking of test data and test cases, this should also result in a higher quality of tests. This also means recreating defects found by testers is easier.

You can confidently refactor code with extensive freedom and more speed. Freedom and speed are good, but the confidence is also important. The tests are untouched, so they can be trusted to protect you in case of regression. If you had modified the original tests, you would have to verify that they work first.

I have also found that it gets relatively easier to reach a higher code coverage. This is due to the fact that a given input can trigger a cascade of objects calling each other, instead of you manually writing tests to call methods. But it is also because things like constructors and getters/setters are tested automatically, and you would normally not test those.

This seems somewhat distorting and beside the point. To me, testing constructors and getters/setters does not add much value. But I can think of one case where it might be useful: dynamic languages. With dynamic languages you can't rely on the compiler to check for typing errors, so sloppy programming could easily cause a bug. The way to protect yourself is through a high unittest code coverage, where more of the code is exercised and such errors are exposed in the unittest.
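A tiny Python illustration of the kind of error meant here. The Entry class and its deliberate typo are invented for the example.

```python
class Entry:
    def __init__(self, title):
        self.title = title

    def label(self, shout=False):
        if shout:
            # Deliberate typo: 'titel' instead of 'title'. Nothing complains
            # until this line is actually executed.
            return self.titel.upper()
        return self.title


entry = Entry("groceries")
print(entry.label())  # the common path works fine

try:
    entry.label(shout=True)  # the rarely used path hides the bug
except AttributeError as error:
    print("caught only at runtime:", error)
```

A test that happens to drive the shout branch, even indirectly through a cascade of calls, exposes the typo; a test suite that never reaches it does not.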

The threshold to start testing does get higher, however, because you need to write more code in order to 'get testing' since we need to abstract the class hierarchy. While I think in the long run you need less code to achieve the same coverage, this does require some extra discipline. And as we all know this is often not easy given tight schedules and limited resources.

Finally, I think this method works best when you already have a modular architecture. That creates natural islands of related classes, and introduces loose coupling with other such islands. That means the amount of mocking gets minimized in relation to code tested, and you can truly 'test more by testing less'. You should ideally mock views, or a webservice call: these are more or less stable boundaries that won't change if the internal design of the code is altered.

I believe the test API is not quite ready yet and warrants some more thought, in order to fully exploit the similarities with UC scenarios and promote reuse. Separating the test logic from the test data enables you to play different scenarios by changing data sets. This seems like a good strategy to increase test productivity, so it's worth exploring further as well.
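As a first, tentative sketch of playing different scenarios by changing data sets, the Python below keeps the scenarios in a plain list and runs one piece of test logic over all of them. EntryService, ResourceMock and the scenario layout are assumptions made for this example, not a finished design.

```python
class ResourceMock:
    def __init__(self, entries):
        self.entries = list(entries)


class EntryService:
    """Hypothetical system under test."""

    def __init__(self, resource):
        self._resource = resource

    def delete(self, index):
        del self._resource.entries[index]


# Test data only: each scenario is a starting state, an action and an end state.
SCENARIOS = [
    {"start": ["a", "b", "c"], "delete": 2, "end": ["a", "b"]},
    {"start": ["a", "b", "c"], "delete": 0, "end": ["b", "c"]},
    {"start": ["only"], "delete": 0, "end": []},
]


def run_delete_scenario(scenario):
    """Test logic only: knows how to play a scenario, not what is in it."""
    resource = ResourceMock(scenario["start"])
    EntryService(resource).delete(scenario["delete"])
    return resource.entries


def test_all_scenarios():
    for scenario in SCENARIOS:
        assert run_delete_scenario(scenario) == scenario["end"], scenario
```

Adding a scenario then means adding one line of data, with no new test logic.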