Saturday, May 17, 2008

When To Call a Constructor Part 1

I've it said before (and people a lot smarter than me like Gilad Bracha), and I'll say it again: Constructors are Evil. Unfortunately, in most common situations, they are impossible to avoid. Rather than beating a dead horse, I'm going to take a slightly different approach. In this post I'll focus on when it is ok to call a constructor, and how to do so effectively. This information can be applied to any number of OO languages.

(Note: It is of course sometimes possible to get away from calling constructors by using a DIF like Guice. Sometimes its just not possible. For example, you have a legacy code base that you are maintaining/extending. It may be possible to switch it over to a DIF, but its unlikely to happen all at once and you probably don't want to end up with a code base that is somewhere halfway between. That can make code even more difficult to reason about. Regardless, even if you are using a DIF you still want to call new sometimes, which I'll explain later.)

I'll start with a simple example.

public class PersonCache {

private final Map<Name, Person> storage;

public PersonCache(){
storage = new HashMap<Name, Person>();
}

public void addPerson(Person p){
storage.put(p.getName(), p);
}

public Person getPerson(Name name) {
return storage.get(name);
}

public void removePerson(Person p){
storage.remove(p.getName());
}
}

This class looks reasonable, and in fact it is. But as you'll see later, simple classes like this lull developers into a false sense of security with the "new" statement. Here, calling new on HashMap is ok because:

  1. It is a trusted/tested class
  2. It has no significant dependencies
    1. It doesn't reference any static state
    2. It doesn't do any IO

Theres a bit of a theme here. If you are going to call a constructor, you need to have a solid understanding of the class you're instantiating. Because it's so well documented, we know HashMap is safe to instantiate. Unfortunately, most code isn't so well documented. Most developers don't understand each class they instantiate.

So why isn't it ok to instantiate an unknown, untrusted class, or a class with dependencies?

To answer that lets first briefly look at the design forces. There are at least four (and probably more) design forces at play here.
  • Encapsulation
  • Readability
  • Static Dependencies
  • Testability
The forces certainly push and pull on each other a bit. As encapsulation goes up so do readability and static dependencies, while and testability goes down. While we can't always have the best of each, it's important to understand when to choose one over the other.

Consider the following example.

public class PersonCacheWithDatabase {

private final Database storage;

public PersonCacheWithDatabase(){
storage = new Database();
}

public void addPerson(Person p){
storage.put(p.getName(), p);
}

public Person getPerson(Name name) {
return storage.get(name);
}

public void removePerson(Person p){
storage.remove(p.getName());
}
}

This certainly doesn't look a whole lot different than the first example. But, its infinitely worse. Doubly do because it cleverly tricks you into thinking that its ok by looking so similar.

First, DatabasePersonCache may appear to be encapsulated, but in reality its not. It forces you to know about the database whether you like it or not. If you're going to call this constructor, you had better have a database set up somewhere. Otherwise, try to use it and you're going to get exceptions left and right. For the exact same reason, its difficult to read and test this code. Reading it alone is simply not enough. You need to understand the database class as well. Additionally, this class is forever statically bound to the Database class. If you somehow want to store your people in a more convenient way, well, you just can't. If you want to test a class that uses PersonCache, good luck.

There is a way around this - pass in a StorageStrategy into PersonCache.

public interface StorageStrategy {

public void put(Name name, Person p);

public Person get(Name name);

public void remove(Name name);
}


public class PersonCacheWithStorageStrategy {

private final StorageStrategy storage;

public PersonCacheWithStorageStrategy(StorageStrategy storage){
this.storage = storage;
}

public void addPerson(Person p){
storage.put(p.getName(), p);
}

public Person getPerson(Name name) {
return storage.get(name);
}

public void removePerson(Person p){
storage.remove(p.getName());
}
}

PersonCacheWithStorageStrategy is much better than the PersonCacheWithDatabase. As long as StorageStrategy is an interface, PersonCache is now nice and reusable, it can be used with a HashMap, a Database, anything. It isn't statically bound to any implementation. It's definitely more readable as you are safe assume that the StorageStrategy passed in works fine. It's far more testable on the whole - you don't have to set up a database to test it.

However, even though its better than the second example, it does suffer problems that the original PersonCacheWithHashMap does not suffer - poor encapsulation. You have to know something about StorageStrategy in order to use it. What if you only ever want to use this as a quick in-memory helper object? The first example would be far better. What if you only ever needed the in-memory storage capability while using PersonCache? A client is still forced to create a StorageStrategy. Ick.

This is the point where it would be really nice to have a forth example and say this is how to do it. Unfortunately there isn't one magic scenario that solves every problem. Developers need to understand the design forces and the code objects they are clients of in order to make reasonable decisions about their code. If its not entirely safe to call new, based on the rules above, then you must pass your dependencies in. You trade some encapsulation for another design force: sanity.

Now, you may be thinking, well great, in the example 3 you just deflected the problem of calling new upward, but that doesn't help me much, since I still have to call new in the clients of PersonCache. You would be correct. I haven't addressed that issue just yet. But for that you'll have to stay tuned for part two of this mini series.

My First Scala Presentation

I gave my first talk on Scala today, to my team at NYSE. It was an entirely informal, BYOL (Bring your own lunch) talk that I hadn't prepared for at all (I was hoping someone else would speak, but since no one else ever does, its always me, prepared or not). Anyway, there are some lessons learned from the talk.

The talk didn't really go over that well, and mostly because I didn't hit them hard with a great example up front. Next time I will. I finally won them over when I showed a List example, which I'll show here.

Say you want to create a List of integers containing the elements 1, 2, and 3. In Java there are a few ways to go about it, none of them very easy. I'll start with the most common example.


List<Integer> ints = new ArrayList<Integer>();
ints.add(1);
ints.add(2);
ints.add(3);

Like I said, there may be easier ways to do this, but I don't think many people will argue that this would be by far the most commonly used approach. There are several things wrong with it.

  • It's about a billion characters long.
  • The redundant type information in the first line is so frustrating.
  • The next three lines of code are almost identical.
  • The semi colons are pretty much pointless.

Here's how you do the same thing in Scala.

val ints = List(1,2,3)

Thats it.

  • Its 20 characters total (not including the unneeded spaces). 20 characters vs. A Billion! I pick 20.
  • There is no need whatsoever to put an absurd amount of type information. The compiler is perfectly capable of figuring that out thank you. As are humans; any second year college student could tell that thats a list of integers. Heck, any 7 year old could too.
  • There is absolutely no redundant code here.
  • There are no semi colons.
This example has probably been posted on the internet about a million times by now, and its not the point of this post. The point is this: If you want to give a talk on a language, hit the audience hard with a solid example immediately. Don't dilly-dally and give examples that are only slightly different than their current language and then give them the good stuff. You'll meet too much opposition up front. I thought I was doing them a favor by easing them into Scala but what really happened was quite the opposite. For some terrible reason Java developers are quite territorial. I was providing fuel for them to say, "I'll stick with Java."

Next time, I give the good example up front, then transition to the easy stuff once I've peaked their interest, and then make sure to finish up with a solid example too. And of course next time I'll be quite a bit more prepared.


Ok. Thats the gist of what I was wanting to talk about, now some sideline commentary.

One particularly odd complaint IMO was that most of this was just syntactic sugar. First off, I whole-heartedly disagree, but I can see why some people could incorrectly think that way. My colleague happily responded:

If you think it's just syntactic sugar, then I have a perfect language for you. It only contains 2 characters, 1, and 0. Using anything else, well thats just syntactic sugar.
Of course anyone can think that higher level languages are just prettier syntax, but they would be entirely missing the point. The point is to not have our primitive human minds bombarded with useless information so that we can better and more easily understand the meaning of each line of code. Thats not sugar, thats evolution, baby.