Jack Cough on Software: When To Call a Constructor Part 1

I've it said before (and people a lot smarter than me like Gilad Bracha), and I'll say it again: Constructors are Evil. Unfortunately, in most common situations, they are impossible to avoid. Rather than beating a dead horse, I'm going to take a slightly different approach. In this post I'll focus on when it is ok to call a constructor, and how to do so effectively. This information can be applied to any number of OO languages.

(Note: It is of course sometimes possible to get away from calling constructors by using a DIF like Guice. Sometimes its just not possible. For example, you have a legacy code base that you are maintaining/extending. It may be possible to switch it over to a DIF, but its unlikely to happen all at once and you probably don't want to end up with a code base that is somewhere halfway between. That can make code even more difficult to reason about. Regardless, even if you are using a DIF you still want to call new sometimes, which I'll explain later.)

I'll start with a simple example.


public class PersonCache {

  private final Map<Name, Person> storage;
  
  public PersonCache(){
    storage = new HashMap<Name, Person>();
  }
  
  public void addPerson(Person p){
    storage.put(p.getName(), p);
  }

  public Person getPerson(Name name) {
    return storage.get(name);
  }
  
  public void removePerson(Person p){
    storage.remove(p.getName());
  }
}

This class looks reasonable, and in fact it is. But as you'll see later, simple classes like this lull developers into a false sense of security with the "new" statement. Here, calling new on HashMap is ok because:

It is a trusted/tested class
It has no significant dependencies

It doesn't reference any static state
It doesn't do any IO

Theres a bit of a theme here. If you are going to call a constructor, you need to have a solid understanding of the class you're instantiating. Because it's so well documented, we know HashMap is safe to instantiate. Unfortunately, most code isn't so well documented. Most developers don't understand each class they instantiate.

So why isn't it ok to instantiate an unknown, untrusted class, or a class with dependencies?

To answer that lets first briefly look at the design forces. There are at least four (and probably more) design forces at play here.

Encapsulation
Readability
Static Dependencies
Testability

The forces certainly push and pull on each other a bit. As encapsulation goes up so do readability and static dependencies, while and testability goes down. While we can't always have the best of each, it's important to understand when to choose one over the other.

Consider the following example.


public class PersonCacheWithDatabase {

  private final Database storage;
  
  public PersonCacheWithDatabase(){
    storage = new Database();
  }
  
  public void addPerson(Person p){
    storage.put(p.getName(), p);
  }

  public Person getPerson(Name name) {
    return storage.get(name);
  }
  
  public void removePerson(Person p){
    storage.remove(p.getName());
  }  
}

This certainly doesn't look a whole lot different than the first example. But, its infinitely worse. Doubly do because it cleverly tricks you into thinking that its ok by looking so similar.

First, DatabasePersonCache may appear to be encapsulated, but in reality its not. It forces you to know about the database whether you like it or not. If you're going to call this constructor, you had better have a database set up somewhere. Otherwise, try to use it and you're going to get exceptions left and right. For the exact same reason, its difficult to read and test this code. Reading it alone is simply not enough. You need to understand the database class as well. Additionally, this class is forever statically bound to the Database class. If you somehow want to store your people in a more convenient way, well, you just can't. If you want to test a class that uses PersonCache, good luck.

There is a way around this - pass in a StorageStrategy into PersonCache.


public interface StorageStrategy {

  public void put(Name name, Person p);
 
  public Person get(Name name);
 
  public void remove(Name name);
}


public class PersonCacheWithStorageStrategy {

  private final StorageStrategy storage;
  
  public PersonCacheWithStorageStrategy(StorageStrategy storage){
    this.storage = storage;
  }
  
  public void addPerson(Person p){
    storage.put(p.getName(), p);
  }

  public Person getPerson(Name name) {
    return storage.get(name);
  }
  
  public void removePerson(Person p){
    storage.remove(p.getName());
  }
}

PersonCacheWithStorageStrategy is much better than the PersonCacheWithDatabase. As long as StorageStrategy is an interface, PersonCache is now nice and reusable, it can be used with a HashMap, a Database, anything. It isn't statically bound to any implementation. It's definitely more readable as you are safe assume that the StorageStrategy passed in works fine. It's far more testable on the whole - you don't have to set up a database to test it.

However, even though its better than the second example, it does suffer problems that the original PersonCacheWithHashMap does not suffer - poor encapsulation. You have to know something about StorageStrategy in order to use it. What if you only ever want to use this as a quick in-memory helper object? The first example would be far better. What if you only ever needed the in-memory storage capability while using PersonCache? A client is still forced to create a StorageStrategy. Ick.

This is the point where it would be really nice to have a forth example and say this is how to do it. Unfortunately there isn't one magic scenario that solves every problem. Developers need to understand the design forces and the code objects they are clients of in order to make reasonable decisions about their code. If its not entirely safe to call new, based on the rules above, then you must pass your dependencies in. You trade some encapsulation for another design force: sanity.

Now, you may be thinking, well great, in the example 3 you just deflected the problem of calling new upward, but that doesn't help me much, since I still have to call new in the clients of PersonCache. You would be correct. I haven't addressed that issue just yet. But for that you'll have to stay tuned for part two of this mini series.

Jack Cough on Software

Saturday, May 17, 2008

When To Call a Constructor Part 1

No comments:

Post a Comment