Tuesday, December 14, 2010

Code Styles and Readability

Today at work we went on a quest to make a piece of code as readable as possible. I'm not fully convinced that we've succeeded entirely, but we learned a few things along the way.

The problem we're trying to solve is pretty simple. Given an activation record which represents the last call on the call stack, generate a List of Strings to be used in building the stack trace. This will require going all the way up to the first call on the stack by following parent pointers, grabbing the name of each procedure.

Before we begin, we need to understand a few things about the system.
One could imagine these simple classes:

case class Procedure_V1(name: String)
case class Activation_V1(procedure: Procedure, parent: Activation)

If that's all there was, there probably wouldn't be much to write about. I'll give a small bit more detail. In our world, a Procedure holds a List of Commands that it executes one by one until it is done. Any Command in this list could be a primitive, or a call to another procedure. If the Command is a primitive, it can just be executed on the spot. One example of a primitive is +. If the command is a call to another procedure then upon execution of this Command, a new Activation gets created. So the objects now look like this:

case class Command_V2(name:String)
case class Procedure_V2(name: String, commands: List[Command])
case class Activation_V2(procedure: Procedure, parent: Activation)

There's one minor catch that makes this whole thing interesting: any particular Activation might need to put two different entries into the stack trace. 1) Its own procedures name, and possibly 2) The name of its parent's current command. I wish I had time and space enough to explain why, but that's simply out of scope. For now, I'll just explain some simple rules.

A command should show up in the stack trace only if it calls other code. Activations have a return address which an index into it's parents commands List. It points to currently running command in the parent Activation. These rules allows us to write the final versions of the classes:

case class Command(name:String, callsOtherCode: Boolean)
case class Procedure(name: String, commands: List[Command])
case class Activation(procedure: Procedure, parent: Activation, returnAddress:Int)

Lets also say that these classes are sealed (or that we can't modify them). With these, we can take a stab at the original problem.

def callStack_V1(act: Activation): List[String] = {
val parentCommandMaybe: Option[String] = Option(act.parent).flatMap(p => {
val c = p.procedure.commands(act.returnAddress)
if(c.callsOtherCode) Some(c.name) else None
})
List(Some(act.procedure.name), parentCommandMaybe).flatten :::
callStack_V1(act.parent)
}

This was the first version I wrote. I liked it then, I still like it now. But, I wanted to see if I could make it better. First, lets see if we can clean up that last line. The list creation is a little confusing.

def callStack_V2(act: Activation): List[String] = {
val parentCommandMaybe: Option[String] = Option(act.parent).flatMap(p => {
val c = p.procedure.commands(act.returnAddress)
if(c.callsOtherCode) Some(c.name) else None
})
act.procedure.name :: parentCommandMaybe.toList ::: callStack_V1(act.parent)
}

Only slightly better. Maybe if we remove the recursion things will be nicer.

def callStack_V3(act: Activation): List[String] = {
Iterator.iterate(act)(_.parent).takeWhile(_ != null).toList.flatMap { a =>
val proc = act.procedure.name
val parentCommandMaybe: Option[String] = Option(a.parent).flatMap(p => {
val c = p.procedure.commands(a.returnAddress)
if (c.callsOtherCode) Some(c.name) else None
})
act.procedure.name :: parentCommandMaybe.toList
}
}

That first bit before the first flatMap turns the stack into a List[Activation]. While that idea is simple enough, the code to do it really isn't. And, the remainder of the code is really hard to follow too. I still manage to struggle to fully understand a single flatMap sometimes, let alone two. Sure the type annotations help explain what is happening, but I still felt I could do better. Let's keep the same idea, but refactor a bit and see if that helps.

def unfold(act: Activation) =
Iterator.iterate(act)(_.parent).takeWhile(_ != null).toList

def callStack_V4(act: Activation): List[String] = {
unfold(act).flatMap { a =>
val parentCommandMaybe: Option[String] = Option(a.parent).flatMap(p => {
val c = p.procedure.commands(a.returnAddress)
if (c.callsOtherCode) Some(c.name) else None
})
act.procedure.name :: parentCommandMaybe.toList
}
}

Maybe thats ok, it gets some of that clutter out of the way and let's us think. But lets take a complete step back and try an imperative version. All the flattening is still hurting my head.

def callStackImperative(act: Activation): Seq[String] = {
val buf = collection.mutable.ListBuffer[String]()
for(a <- unfold(act)) {
buf += a.procedure.name
if (a.parent != null) {
val c = a.parent.procedure.commands(a.returnAddress)
if (c.callsOtherCode) buf += c.name
}
}
buf
}

This is a case where an imperative style actually does quite well. I usually try to avoid this style like the plague, but in this case theres little doubt that this code is simple.

I still thought I could do better, so I continued. This time with a Racket friend.

def callStackSchemer(act: Activation): List[String] = {
def callStackInner(act: Activation): List[String] = {
if (act == null) Nil
else {
val rest = callStackInner(act.parent)
def parCmd = act.parent.procedure.commands(act.returnAddress)
act.procedure.name ::
(if (act.parent != null && parCmd.callsOtherCode) parCmd.name :: rest
else rest)
}
}
callStackInner(act)
}

The advantage of this approach is that it could be read by a 10 year old. Ok maybe not 10. But, there are no for comprehensions, options, or flattening bits here. However, I do think there is a little bit of mixing up what is happening with how its happening. That is, the result building is mixed up with the structural recusion on the Activation. For this little function, it almost certainly doesn't matter that much. For something larger, it might.

Let explore a different topic for a bit. Sometimes, even with inner defs, pulling something outside of the body of a function helps clear out the clutter and makes it more readable. We did this already with unfold. There are tradeoffs of course, such as you then have a function in the top level that is only being used in one place. But, lets try it again anyway and see where it gets us..

def parentCommandMaybe_V1(act: Activation): Option[String] =
Option(act.parent).flatMap(p => {
val c = p.procedure.commands(act.returnAddress)
if (c.callsOtherCode) Some(c.name) else None
})

Pulling that out also helps me realize just how complicated that bit is. Maybe we can do better just on this part. Here are two attempts.

def parentCommandMaybe_V2(act: Activation): Option[String] =
if(act.parent == null) None
else{
val command = act.parent.procedure.commands(act.returnAddress)
if(command.callsOtherCode) Some(command.name) else None
}

def parentCommandMaybe_V3(act: Activation): Option[String] =
for{p <- Option(act.parent)
c = p.procedure.commands(act.returnAddress)
if c.callsOtherCode}
yield c.name

To be quite honest, none of these solutions are glorious. I'm tempted to use V2 because any knuckehead could read it, and its of comparable size. I could have easily decided on V1 or V3 because we don't have any knuckeheads on our team. We do however, occasionally have students on the team and we are attempting to open source our code. Is that a good reason to dumb down the code? No, certainly not. But, if a particular function is of comparable size, then maybe it is a good idea. Or maybe not, maybe leaving the more advanced constructs (like for comprehensions) would make code readers learn more about the language that they are using. I'm undecided on this. Given the lack of wanting to make a decision here, I'll just choose V3.

Better solutions to this are most welcome.

Now that we've pulled that out, we can rewrite our function.

def callStack_V7(act: Activation): List[String] = {
unfold(act).flatMap { a =>
act.procedure.name :: parentCommandMaybe_V3(act).toList
}
}

I think this is a really nice version. But...now that we've cleaned up some of the helper methods, lets see what things look like if we put them back in.

def callStack_V8(act: Activation): List[String] = {
def unfold(act: Activation) =
Iterator.iterate(act)(_.parent).takeWhile(_ != null).toList
def parentCommandMaybe_V3(act: Activation): Option[String] =
for{p <- Option(act.parent)
c = p.procedure.commands(act.returnAddress)
if c.callsOtherCode}
yield c.name
unfold(act).flatMap { a =>
a.procedure.name :: parentCommandMaybe(a).toList
}
}

Not bad. It may be a few lines longer than version 1, and it took quite a while to get here, but I do think this version is the best we can do. 'We' being my team. Finally, I'll add in some comments and call it a day. It would be nice if the code was so blatently obvious that this wasn't needed, but it's ok. Comments are helpful.

/**
* Return a List representing the call stack.
* The list includes procedure names and any commands
* that call other code.
* @param act - the final call on the stack.
**/
def callStack(act: Activation): List[String] = {
// unfold turns an activation into
// List(act, act.parent, act.parent.parent ... )
def unfold(act: Activation) =
Iterator.iterate(act)(_.parent).takeWhile(_ != null).toList
// if an activations parent is executing a command that can trigger
// other code, it also needs to be added to the stack trace.
def parentCommandMaybe(act: Activation): Option[String] =
for{p <- Option(act.parent)
c = p.procedure.commands(act.returnAddress)
if c.callsOtherCode}
yield c.name
// flatMap because each activation can result in 1 or 2 entries
unfold(act).flatMap { a =>
a.procedure.name :: parentCommandMaybe(a).toList
}
}


If you can write this better, please do. Send it to me and I'll learn something.