I've been evaluating the JUnit test generation product by Agitar in my spare time and want to spit out a few thoughts on it. Most of this will be criticism, but I'd like to first say that I think it has the potential to be an absolutely great product. If you don't know much about Agitar, they have a product that will automatically generate JUnit tests for your code. You can check out
http://www.agitar.com/ and
http://www.junitfactory.com for more information.
OK, on to the important content.
Many of the tests created were lacking meaning
This is OK, however, because you can provide test helper classes that are read at test creation time to help create more meaningful tests. This is pretty easy during regular development, and it makes sense; the program can't know everything about your code. Unfortunately, though, for large legacy code bases this is damn near impossible.
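To make the helper idea concrete before getting to the legacy problem, here's a minimal sketch of what a test helper might look like. The Invoice class and the helper convention are entirely made up for illustration; Agitar's actual helper contract may differ.

import java.math.BigDecimal;

// Hypothetical domain class, standing in for one of yours.
class Invoice {
    private final String customer;
    private BigDecimal total = BigDecimal.ZERO;

    Invoice(String customer) { this.customer = customer; }

    void addLineItem(int qty, BigDecimal unitPrice) {
        total = total.add(unitPrice.multiply(BigDecimal.valueOf(qty)));
    }

    BigDecimal getTotal() { return total; }
}

// Hypothetical helper: hands the generator a valid, meaningful Invoice
// instead of letting it guess at constructor arguments and setters.
public class InvoiceTestHelper {
    public static Invoice createInvoice() {
        Invoice invoice = new Invoice("ACME Corp");
        invoice.addLineItem(3, new BigDecimal("9.99"));
        return invoice;
    }
}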
For example, let's say you have a code base with 5000 classes. 5000 classes means 5000 test classes. Let's give Agitar the benefit of the doubt (there isn't much doubt, since I've done this many times, but let's give it to them anyway) and say that 80% of the generated test classes are meaningful (again, in practice the number is smaller). 80% meaningful coverage means 1000 test classes that aren't useful. This is a problem because:
- It's likely to be your most important or most complex classes whose tests lack meaning.
- How do you really know which ones are lacking meaning? Do you have to look through 5000 test classes?
- Do you have to write test helpers for every test class?
- What happens if you have no idea what your own classes are doing? (Think this is crazy? It's not. What if you inherit the code? What if you wrote the class four years ago? What if your team is large and people just hack things together?) How could you even write test helpers for this?
- What if writing test helpers requires instantiating dependencies that are difficult to instantiate, which is the very reason you haven't bothered writing tests to begin with? (A sketch of this situation follows this list.)
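Here's a hypothetical example of that last point. The names are invented, but the shape is common in legacy Java: a constructor that reaches straight into the container.

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// Invented legacy class, but a common shape. The constructor reaches
// into JNDI, so neither you nor a test generator can instantiate it
// outside the container, which is exactly why it has no tests.
public class OrderProcessor {
    private final DataSource dataSource;

    public OrderProcessor() throws NamingException {
        // Hard-wired container lookup: blows up in a plain JUnit run.
        InitialContext ctx = new InitialContext();
        this.dataSource = (DataSource) ctx.lookup("java:comp/env/jdbc/OrdersDB");
    }

    public void process(long orderId) {
        // ... queries dataSource, updates rows, and so on ...
    }
}

A test helper for this class would have to fake the whole JNDI environment before it could even call the constructor.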
You see, this quickly becomes an overhead nightmare. This is especially true for a team that has a large untested code base. They have an untested code base because they don't know how to test. The only way to get them to start testing is with very low overhead, and they don't want to wade through thousands of new classes.
Agitar is going to say to this, "But you have 80% test coverage at this point, which is far superior to what you had before." They might even say don't bother looking through the test code (I'm not sure if they'd really say this, but it's possible). And maybe this works. Or maybe it only works for suckers. I'm not sure.
My feeling is that the product is simply not ready for legacy code bases. And in fact, by the nature of these problems, I'm not quite sure a product could EVER solve them. When I think back to when I first heard about the product, that was indeed my first thought. But, after seeing some of the videos I was very optimistic, especially with Kent Beck involved. I approached it with an open mind and was very hopeful. I'm not sure now.
The generated tests were difficult to read
I really should provide some examples here. What I'll likely do is finish this blog, get my examples at a later date and post them in a comment. So don't trash me because I haven't given examples. I don't have them on me.
Some of the code was pretty hard to read. Some of it was nice and easy to read. That's OK; it's what I expected, and it's livable on new projects. If a generated test is hard to read, it's likely that your code isn't as clean as it needs to be.
The problem with this is pretty clear, though. It's likely that most of the tests generated for large legacy code bases will be difficult to read. This isn't the fault of Agitar; garbage in, garbage out. But it just adds fuel to the maintenance fire. Not only do developers have to wade through garbage code, but now they have to decipher difficult-to-read test code that they've never seen before, just to make sure it's a valid test. They might be better off just writing a test themselves, except that they might not know how.
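Since I don't have my real examples on me (see above), here's a hand-written imitation of the flavor. To be clear, this is NOT actual Agitar output; the pad() utility is made up, and the point is the style: magic inputs and assertions that record incidental behavior.

import junit.framework.TestCase;

// Hand-written imitation of a generated-style test; NOT real Agitar output.
public class PadGeneratedStyleTest extends TestCase {

    // Stand-in for a legacy utility under test.
    static String pad(String s, int width, char c) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) {
            sb.append(c);
        }
        return sb.toString();
    }

    public void testPadWithNegativeWidthReturnsInputUnchanged() {
        // Magic inputs with no domain meaning.
        String result = pad("\u0000\u0000", -1, 'A');
        // Records whatever the code happened to return, not what it
        // should return; a reviewer learns nothing about intent.
        assertEquals("\u0000\u0000", result);
    }
}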
Brief Use Case Example
I'm going to give a simple little example that demonstrates how test generation might be used, and some problems with it. Let's say a developer changes some code somewhere in a difficult-to-read class. He then runs the Agitar unit tests and some of them fail. Should he be worried? Should he look into it and fix the problem, or should he ignore it and regenerate the Agitar tests? If he looks into it, he might have to wade through hard-to-read code. If he doesn't, he might be ignoring a real problem. Is it possible that he broke something in an area that he doesn't even seem to be working in? That could be frustrating. Legacy code is frustrating to work with. Period.
There is an approach in line with Fowler's Refactoring that does help a bit, though: only read the tests in the area that you're working in. You might have to read through a couple of hard-to-read test classes ... but it's only a couple. Additionally, it'll give you some examples, and incentive to clean up the code in that area.
If hand-written tests are failing in any area, fix them. If Agitar tests are failing in your current area, read them and try to figure out why; you might have a legitimate bug. After you've looked at the generated tests in your area and fixed up your code, go ahead and regenerate all the tests.
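To make that concrete, here's a made-up sketch of the kind of failure that forces this judgment call. Suppose a generated test had pinned down the literal exception message, and a later cleanup improved that message:

import junit.framework.TestCase;

// All names here are invented; the point is the judgment call, not the API.
public class RegistryGeneratedStyleTest extends TestCase {

    // Stand-in legacy class. The message used to be "bad key"; a recent
    // cleanup changed it to include the offending key.
    static class Registry {
        static Object lookup(String key) {
            throw new IllegalArgumentException("unknown key: " + key);
        }
    }

    public void testLookupThrowsOnUnknownKey() {
        try {
            Registry.lookup("nosuchkey");
            fail("Expected IllegalArgumentException");
        } catch (IllegalArgumentException e) {
            // Now fails. Real regression, or a stale generated assertion?
            // You have to read both the test and the code to know.
            assertEquals("bad key", e.getMessage());
        }
    }
}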
Questions
Their website says something like "Reduce the drag from fragile Java code by 50% or more." What does that really mean? Does that tie into what I was saying before about the 80% meaningful tests? Does it mean you'll get your work done 50% faster? What is code drag? Yes, yes, it's just a marketing slogan, and I couldn't do much better, but what does it mean?
It also says 80% test coverage, guaranteed. That's a bold claim, IMO. I'm very curious to see what kind of awful code bases they've worked with, and what coverage they got. What about something like a hideous EJB2 project that can only be tested in the container?
What about new projects?
Once again, I want to reinforce that this criticism is meant as constructive criticism to make the product better. I'm not trying to bash the product at all. I do think the product could work great with new TDD projects. You write some tests, write some code to make your tests pass, use AgitarOne to generate some extra tests to get some thoughts about your code, refactor, and repeat. I think it's a great supplement to the faulty human mind, which can't see everything. The problems that exist with legacy code simply don't exist with green code. Of course you have to write test helpers, but that's easy when you have a nice clean code base and you're writing new code.
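As a rough sketch of that loop, here's the kind of intention-revealing test you'd write by hand first. Money is a class I made up for illustration; the generated tests would then probe the edges around it (nulls, boundary amounts, and so on) once the code exists.

import junit.framework.TestCase;

// Money is made up; the point is the division of labor. You write the
// intention-revealing test first; the generator probes the edges after.
public class MoneyTest extends TestCase {

    static class Money {
        final int amount;
        final String currency;
        Money(int amount, String currency) {
            this.amount = amount;
            this.currency = currency;
        }
        Money add(Money other) {
            return new Money(amount + other.amount, currency);
        }
    }

    public void testAddingTwoAmounts() {
        Money ten = new Money(5, "USD").add(new Money(5, "USD"));
        assertEquals(10, ten.amount);
    }
}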
I would recommend this product to NYSE for new development, but we don't have much new development. I'd certainly try to use it for any open source project that I'm working on.
Product Ideas
There are a number of small issues I have in the area of new development too.
- Why not use JUnit 4?
- Why not TestNG?
There are also a number of ideas that I have for the product:
- How about annotations in the code that provide useful hints to the test generator?
- How about annotations to tell the test generator not to generate tests for a class or a method?
- How about tying into, or influencing, JSR 305? This JSR is working to define annotations for bug detection, and is primarily being run by the FindBugs and JetBrains folks. Agitar should certainly get involved. For example, @Nullable (or whatever name they settle on) marks a method that might return null. Little hints like this could certainly assist in test generation. (A rough sketch of these ideas follows this list.)
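None of these annotations exist in Agitar's product as far as I know; this is just a sketch of what the ideas above might look like in code. @NoGeneratedTests and @TestHint are names I made up, and the hint text is purely illustrative.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Made-up annotation: opt a class or method out of test generation.
@Retention(RetentionPolicy.CLASS)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface NoGeneratedTests { }

// Made-up annotation: a free-form hint for the generator.
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.METHOD)
@interface TestHint {
    String value();
}

class ReportFormatter {

    @TestHint("width is always >= 1; don't bother probing negatives")
    String rightAlign(String s, int width) {
        return String.format("%" + width + "s", s);
    }

    @NoGeneratedTests // trivial accessor; a generated test adds nothing
    String name() {
        return "formatter";
    }
}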
Debate
I'd really like to talk some more with the Agitar guys on this subject, and with Kent Beck himself.
I openly invite anyone from Agitar to comment on this blog demonstrating how I'm wrong. I want to be wrong here. I want to have 80% meaningful test coverage on bad code. That would make my life so much easier. Please explain how this doesn't turn into an overhead nightmare when working with large legacy code bases. Please tell me how this helps get developers who've unfortunately never written tests on board.