Wednesday, July 22, 2009

What JVM Language Do You Write Tests In?

 
At a recent NYC Scala meetup, I got chatted up and basically was asked the following:

I know you've been writing multi-threaded testing API's in Scala, but we're still writing tests in Java, so do you know any good Java multi-threaded testing API's?


(As a matter of fact I do, but that's not the point of this post.)

I couldn't help but immediately feel very bad. I feel genuinely sorry. Overwhelmingly sorry, in fact. Why? Because people in this world are still writing tests using Java. So, I'm calling on people to help me build an argument that can be used by people still trapped in a cave writing tests in Java. Most of the rest of this post will be a rant to start generating ideas, I'll try to formalize the argument by reading comments and then put it into another post.

Random Ranting To Generate Ideas


Scala and Groovy and JRuby and Jython and Clojure and so many more are by far better alternatives for testing on the JVM, and some of them have been for a long, long time. I started testing using Scala almost two years ago. I submit that if you can't get your organization to switch your testing language, you should leave. It's not even production code!

You can still run your old tests anyway, even if just in your current runner. ScalaTest also gives you many alternatives for running all your old tests and new tests very easily in the same runner.

You might argue that now you have to maintain two different code bases, two different languages for your tests. Honestly, I say Come On!. First, the new code tests will be far easier to maintain than any new tests you would have written in Java. Second, were professional software developers. If you can't manage one extra directory of code, you probably shouldn't be in this business.

I think of more later.

Poll


Also, I'm creating a new poll. What JVM Language Do You Write Tests In? If you are reading this from a feed, you might have to click to get in: http://jackcoughonsoftware.blogspot.com/

Saturday, July 04, 2009

Testing Multi Threaded Scala Code

 
Recently I did a (somewhat brute force) port of MultithreadedTC. All of this work stems directly from it, and I deserve basically no credit. However, I've done a bunch of interesting work around it, and I wanted to talk about it a bit. First I want to mention a few important points.

  1. We all know that the concurrency model used in Java has proven to be far too complex. About 5 people in the world can actually do it right. While this test api can be used to test code written in that style pretty easily, I'm not advocating writing anything in that style.
  2. java.util.concurrent is still very useful. This api can be used to test j.u.c, Scala libraries that wrap it, and Scala libraries that use it.
  3. This code will be in an upcoming release of ScalaTest, and (pending discussions) could make its way into Specs as well.
  4. In the very near future this library should be very useful for testing actor code as well. Discussions and experimental implementations are ongoing. Expect a post about that very soon. Additionally, any Actor testing api will most definitely make its way into ScalaTest.


Onward to an example! I'm going to dump it out, and explain after.

package scala.util.concurrent.locks


class PimpedReadWriteLockTest extends MultiThreadedSuite with
MustMatchers with MustBeSugar {

val lock = new java.util.concurrent.locks.ReentrantReadWriteLock
import Implicits.RichReentrantReadWriteLock

thread("writer thread") {
waitForTick(1)
lock.write {tick mustBe 2}
}

thread("reader thread") {
lock.read { waitForTick(2) } }
}

5.threads("reader thread("+_+")") {
lock.read { waitForTick(2) }
}

10 anonymousThreads {
lock.read { waitForTick(2) }
}
}

There's a huge amount of stuff going on here, and I'll do my best to explain it one line at a time. Also, for the purposes of understanding and clarity, I might intentionally leave out some details that would only confuse. Okay, go.

class PimpedReadWriteLockTest extends MultiThreadedSuite with
MustMatchers with MustBeSugar {

MultiThreadedSuite is a ScalaTest suite that contains all the goodies for multithreaded testing. Those goodies will be covered as we go. The matchers give me my must methods as seen in: tick mustBe 2.

val lock = new java.util.concurrent.locks.ReentrantReadWriteLock
import Implicits.RichReentrantReadWriteLock

Create a read/write lock, and then import the code to Pimp it. The rich version allows you to say lock.read{...}, and lock.write{...}. These do basically what you would imagine, but here's a shallow example for lock.read (its not exact, but you get the idea):

readLock.lock()
f()
readLock.unlock()

Creating a test Thread


thread("writer thread") {
...
}

On this line, a Thread is created with the name "writer thread". The thread method also returns the Thread, so you could assign it to a val and use it later on like so: val writer = thread("writer thread"){ ... } When the thread is started, it will run the code in between the curly brackets. I'll explain that code now.

Wait For Tick


Internally, MultiThreadedSuite maintains a clock. The clocks time starts at 0 and only advances when the following conditions hold:

  1. ALL test threads are blocked.
  2. At least one of those threads is waiting for the clock to tick.

Why the last condition? If all threads are blocked, and no one is waiting for a clock tick, you have a deadlock!

waitForTick(n) blocks the current thread until time n is reached on the clock. If the current time on the clock is greater than n, then the thread does not block.

Revisiting the code:

thread("writer thread") {
waitForTick(1)
lock.write {tick mustBe 2}
}

The writer thread immediately blocks, waiting for tick 1. When time 1 is reached, the thread unblocks, and attempts to acquire the write lock. As we'll find out later shortly, it will be unable to do so just yet. That's the important part of this test.

Next:

thread("reader thread") {
lock.read { waitForTick(2) } }
}

Just as with the writer thread, this code creates a thread that will run during the test. When the thread runs it will acquire the read lock, and then wait for tick 2. This thread is guaranteed to get the read lock before the writer thread gets the write lock. How? Because the write thread immediately blocks waiting for tick 1. Recall that the clock doesn't tick until ALL threads are blocked. The reader thread doesn't block until it says waitForTick(2), at which point it already has the read lock. :)

Now, the curious reader might wonder, why waitForTick(2). Let me explain. The test notices that someone is waiting for a tick, and if all threads are blocked, it advances the clock. The thread waiting for a tick only unblocks if the current time is equal to the time its waiting for, otherwise it stays blocked. The test system sleeps for a little bit, wakes up, realizes that all threads are still blocked and someone is still waiting for a tick, and advances the clock again.

With that explained, you can see that waitForTick(2) is just shorthand for:

waitForTick(1)
waitForTick(2)

Breaking it up this way will help us analyze what is happening in the test, and I'll do that shortly. First though, lets take a look at the rest of the code.

Creating LOTS of threads


5.threads("reader thread("+_+")") {
lock.read { waitForTick(2) }
}

This code simply creates 5 threads. They all happen to be reader threads, and they are given the names "reader thread(1)" ... "reader thread(5)". It's a convenience method for creating many threads that you want to do exactly the same thing.


10 anonymousThreads {
lock.read { waitForTick(2) }
}

Creates 10 anonymous threads that also happen to be reader threads. They are actually given names, because all threads have names. The name is "Thread N", where N corresponds to the number of threads created so far. So these guys will actually get named something like Thread 7 ... Thread 16. I don't really recommend using anonymous threads, but the style is a little bit less verbose, so maybe someone will like it.

Now, these methods return List[Thread]. I've chosen to ignore the return value here, but I can imagine cases where its very useful to have. The code could easily be rewritten:

val readers: List[Thread] = 5.threads("reader thread("+_+")") {
lock.read { waitForTick(2) }
}
val namelessReaders: List[Thread] = 10 anonymousThreads {
lock.read { waitForTick(2) }
}

Step by Step


  1. waitForTick(1) blocks the writer thread until tick 1 happens.
  2. The reader threads all get the read lock.
  3. waitForTick(1) blocks the reader thread until tick 1 happens. The writer thread is also waiting for tick 1. They will all get the read lock, and then all block waiting for tick 1. When they all get to that point, the clock is advanced, and the threads continue.
  4. The write thread will wake up and try to grab the write lock, but it wont be able to, because the read lock is held 16 times over! It will block.
  5. waitForTick(2) causes the reader threads to block waiting for tick 2. The clock can advance because all the reader threads are blocked waiting for tick 2, and the writer thread is blocked waiting for the write lock.
  6. After tick 2, all the reader threads wake up. The writer thread is still blocked, waiting for the read lock.
  7. The reader threads all give up the read lock in concert.
  8. The writer thread finally gets the write lock. When it does, it makes sure that the current tick is 2. Indeed, it is.

Summary


This should be the foundation of some really interesting work to come. I'll keep everyone informed.

Wednesday, June 24, 2009

Pimp vs Just Plain Fix my Library

 
I assume most people familiar with Scala are familiar with Pimp My Library. It's just a fun and useful thing to be able to add a missing method onto an API, or to sometimes be able to treat an object like something else.

As fun as it is (especially with the word Pimp), I kind of want to take the fun out of it a little bit. I want to say that its not just about adding that one great feature. Let me make this boring and annoying assertion: Pimping is most useful for fixing crappy, terrible, miserable API. And while that's cool, and useful, it kind of sucks. There's so much terrible code out there that is a nightmare to work with. I feel like I shouldn't have to be fixing up other people's crap, but at least I can.

Now for an example. Ever dealt with Java's ThreadGroup? Ever want to just get the Threads in a ThreadGroup? Sounds reasonable enough right? Holy Mother.... Couldn't be more wrong. Check out this gem that I lovingly stole straight from the Javadoc:

enumerate

public int enumerate(Thread[] list)
Copies into the specified array every active thread in this
thread group and its subgroups.

First, the checkAccess method of this thread group is
called with no arguments; this may result in a security exception.

An application might use the activeCount method to
get an estimate of how big the array should be, however if the
array is too short to hold all the threads, the extra threads are
silently ignored.
If it is critical to obtain every active
thread in this thread group and its subgroups, the caller should
verify that the returned int value is strictly less than the length
of list.

Due to the inherent race condition in this method, it is recommended
that the method only be used for informational purposes.

Parameters:
list - an array into which to place the list of threads.
Returns:
the number of threads put into the array.


WHAT?!? This API is just plain broken. It desperately needs fixing. All I want to do is get all the Threads in the Group...why should I have to deal with creating my own Array (and guessing the size), having it work entirely off side effects...And does it really return the number of threads it put into the Array? I'm afraid so. Broken. And does it really SILENTLY IGNORE things when there are too many threads to fit in the array!?! Horrible disaster.

Not even that fun to fix to be honest, but the resulting code is far more manageable. But here's a solution.


object PimpedThreadGroup {
implicit def threadGroupToPimpedThreadGroup(tg: ThreadGroup) = {
new PimpedThreadGroup(tg)
}
}

class PimpedThreadGroup(threadGroup: ThreadGroup) {

def getThreads: List[Thread] = getThreads(true)

def getThreads(recursive: Boolean): List[Thread] = {
def getThreads(sizeEstimate: Int): Seq[Thread] = {
val ths = new Array[Thread](sizeEstimate)
if (threadGroup.enumerate(ths, recursive) == sizeEstimate)
getThreads(sizeEstimate +10)
else for (t <- ths; if (t != null)) yield t
}
getThreads(threadGroup.activeCount() + 10).toList
}

def filter(state: State): List[Thread] = {
getThreads.filter(_.getState == state)
}

def exists(state: State): Boolean = {
getThreads.exists(_.getState == state)
}

def any_threads_alive_? = {
getThreads.exists(t => t.getState != NEW && t.getState != TERMINATED)
}

def any_threads_running_? = {
getThreads.exists(_.getState == RUNNABLE)
}

def any_threads_in_timed_waiting_? = {
getThreads.exists(_.getState == TIMED_WAITING)
}
}


Most important: def getThreads: List[Thread]. Now I can simply call threadGroup.getThreads and get back a List[Thread]. That's all I ever wanted. Is that too much to ask?

I can also add something to simply treat a ThreadGroup as a List[Thread], if I want. I'm not sure I like this because it could get me into some trouble (and it always does recursive search), but I do like the power it gives - I can call any method on List directly on the ThreadGroup.


implicit def ThreadGroupToList(tg:ThreadGroup): List[Thread] = {
new PimpedThreadGroup(tg).getThreads
}


By the way, I'm not going to explain it. I'm assuming you already know how it works (I might add a few more examples of how it can be used though, but I'm sure most people already understand). If you don't, you can click the Pimp My Library link above, or Google it. There's plenty out there.

In conclusion, was this a bit of a rant? I guess. Here's what we should take away from this though: Pimp My Library is a very effective tool not just for adding nice things to API's, but fixing broken ones. If it's our duty to refactor our broken code, to always make our code better when we have to modify it, then it's just as much our duty to fix up API's. In this case, we just don't have the original source. Refactoring without source code. It's actually pretty awesome.

Sunday, June 14, 2009

Scala MultithreadedTC Port

I haven't posted in quite a while now, but only because I've been working really hard. I asked Doug Lea if I could help him on his upcoming Scala concurrent data structures endeavor, especially on testing and to my surprise he sounded quite happy about the idea. He's going to get started sometime in July, and I figured from now until then would be a great time to get caught up on testing concurrent data structures, and have a nice framework in place before any code actually gets written. To that end, I've decided to port MultithreadedTC to Scala.

The reasons for doing so are quite simple - we can take advantage of Scala's flexible syntax, HOF's, yada yada...I don't think that that needs to be explained yet again. And, I don't think this particular example demonstrates something like, "but here those features are particularly valuable". The resulting code is definitely nicer than the original Java code, and that's simply all.

Over the next few days I'll probably update this post giving more examples. I'll start with one tonight. Also, you can also learn more about how it all works by clicking the link above.

I'll start with an example from the original Java code from the MultithreadedTC source, and I'll try to give a reasonable explanation of what is going on.


1 class MTCTimedOffer extends MultithreadedTestCase {
2 ArrayBlockingQueue<Object&rt; q;
3
4 @Override public void initialize() {
5 q = new ArrayBlockingQueue<Object>(2);
6 }
7
8 public void thread1() {
9 try {
10 q.put(new Object());
11 q.put(new Object());
12
13 freezeClock();
14 assertFalse(q.offer(new Object(), 25, TimeUnit.MILLISECONDS));
15 unfreezeClock();
16
17 q.offer(new Object(), 2500, TimeUnit.MILLISECONDS);
18 fail("should throw exception");
19 } catch (InterruptedException success){ assertTick(1); }
20 }
21
22 public void thread2() {
23 waitForTick(1);
24 getThread(1).interrupt();
25 }
26}


This code tests the interactions of two threads. The second thread interrupts the first thread (line 24) as its attempting to offer an object to the queue on line 17. One challenge that the library means to solve is - How can we be sure that thread2 calls interrupt at the appropriate time? It does so by maintaining and internal metronome (or clock). The clock ticks forward only when all threads are blocked and at least one thread is waiting for the clock to tick.

On line 23 thread2 waits for the clock to tick. Remember, the clock doesn't tick until all threads are blocked. So when does thread1 become blocked? Well, since the queue can only contain two elements, it becomes blocked on line 14, when it tries to offer a third object. However, in this case, we've frozen the clock, so the clock will not tick, and thread2 will continue to wait.

Finally on line 17 thread1 becomes blocked offering a third object to the queue, and the clock is not frozen. Behind the scenes the framework sees that all threads are blocked, and indeed someone is waiting for the clock to tick (thread2). The clock ticks, and thread2 advances. He then interrupts thread1 using the getThread method. thread1 enters its catch block on line 19, checks to see that the clock has indeed reached 1, and everyone is happy.

Sort of...

There are a few things we can do better. Most importantly, of course - we can get rid of a lot of semicolons! Ok maybe that's not most important, but I really hate semicolons. Here's a more serious list of the deficiencies in the code above (oh and, its really not that bad at all).


  1. The call to getThread is not type safe at all, and break on a refactoring if you decided to rename a thread. How does that work anyway? Answer: the threads are named by the method names. On line 8 we have "public void thread1" and so we have a thread named thread1. This was originally done for a good reason - creating threads in Java is very verbose. This cleans that up a lot, at the minor price of some type safety.
  2. When we freeze the clock we could easily forget to unfreeze it, leading to quite a bit of headache.

There are more, and I will address them here, but let us focus on those two for now. I'll give the Scala equivalent first, then explain how I've addressed those issues.


1 class MTCTimedOffer extends MultiThreadedSuite {
2
3 val q = new ArrayBlockingQueue[String](2)
4
5 val producer = thread{
6 q put "w"
7 q put "x"
8
9 withClockFrozen {
10 q.offer("y", 25, TimeUnit.MILLISECONDS) mustBe false
11 }
12
13 intercept[InterruptedException] {
14 q.offer("z", 2500, TimeUnit.MILLISECONDS)
15 }
16
17 tick mustBe 1
18 }
19
20 thread{
21 waitForTick(1)
22 producer.interrupt()
23 }
24}


Notice that the code isn't much smaller, 24 vs 26 lines. I'm not touting a giant absurd improvement in any way. However, lets look at the issues above.


  1. On line 5 ( val producer = thread{ ) an actual Thread object is returned, and that thread can be referenced by any of the other threads in the system by name, in an intuitive, type safe way. I do this simply, by having the thread method take a HOF and wrapping that HOF in a Thread. Easy. Notice on line 22 the second thread references the producer thread directly.

    Also, If I don't feel like it, I don't have to assign the thread to a val, and I don't have to reference it anywhere. The Thread created in line 20 isn't really needed by anyone else, so it remains anonymous (behind the scenes it actually gets the name thread2, and can still be gotten using the getThread method seen above).
  2. The second issue was a very simple one where we could forget to unfreeze the clock. Simple is ok, because all these simple fixes added up big in the long run, IMO. This issue is solved by using the withClockFrozen which appears on line 9. This is also a method that takes a function. It then does the work of freezing the clock for you, running the function, then unfreezing the clock. Simple, but effective.


Ok, its really late, and I have to work tomorrow. But, expect this post to be updated regularly in the near future. I'll likely be working with Bill Venners and Eric T. on getting this stuff into ScalaTest and/or Specs, and DL on improving this stuff, and hopefully Bill Pugh, the original author. Any ideas, comments, improvements, etc would be greatly appreciated! I'll have the source code somewhere where people can see it very soon. Bye!

Monday, June 01, 2009

Comedy: Scala vs. Ruby vs. Objective C

 
Please submit your favorite (or least favorite) language!

Added: Java, Clojure, JavaScript

Scala


this.computers = Array(
Map("Name" -> "MacBook", "Color"->"White"),
Map("Name" -> "MacBook Pro", "Color"->"Silver"),
Map("Name" -> "iMac", "Color"->"White"))

Ruby


self.computers = [
{:Name=>"MacBook", :Color=>"White"},
{:Name=>"MacBook Pro", :Color=>"Silver"},
{:Name=>"iMac", :Color=>"White"}]

Objective C


NSDictionary *row1 =
[[NSDictionary alloc] initWithObjectsAndKeys:
@"MacBook", @"Name", @"White", @"Color", nil];
NSDictionary *row2 =
[[NSDictionary alloc] initWithObjectsAndKeys:
@"MacBook Pro", @"Name", @"Silver", @"Color", nil];
NSDictionary *row3 =
[[NSDictionary alloc] initWithObjectsAndKeys:
@"iMac", @"Name", @"White", @"Color", nil];

NSArray *array =
[[NSArray alloc] initWithObjects: row1, row2, row3, nil];

self.computers = array;

[row1 release];
[row2 release];
[row3 release];
[array release];

Java


Assuming this field:

Map<String, String>[] computers;

Then:

Map[] maps = {
new HashMap(){{
put("Name", "MacBook");
put("Color", "White");
}},
new HashMap(){{
put("Name", "MacBook Pro");
put("Color", "Silver");
}},
new HashMap(){{
put("Name", "iMac");
put("Color", "White");
}}
};
this.computers = maps;

Notice you can't use generics with the array creation, and, 4 space indenting! Yeah!

Clojure


(def computers [
{:Name "MacBook"}
{:Name "MacBook Pro" :Color "Silver"}
{:Name "iMac" :Color "White"}])

JavaScript


this.computers =
[{'name':'MacBook', 'color':'White'},
{'name':'MacBook Pro', 'color':'Silver'},
{'name':'iMac', 'color':'White'}];

Sunday, May 31, 2009

Good Use Case for Named Arguments

 
I found what I consider a really great use case for using keyword arguments. I'm sure there are tons of them, I just want to list this one.

Here I have some code that I had in CPU Simulator that I was using to test a FlipFlopChain, which is essentially just an N Bit Memory (maybe I'll rename it that).

val data = AllGeneratorNumber(8)
val chain = FlipFlopChain(data, Generator.on)
...

Notice that the second parameter to FlipFlopChain is Generator.on. What is this parameter though? We have to navigate into FlipFlopChain just to find out that it represents the write bit. If we didn't have the source, we might have no idea.

So I changed the code to make it more explicit, by declaring a writeBit val.

val data = AllGeneratorNumber(8)
val writeBit = Generator.on
val chain = FlipFlopChain(data, writeBit)
...

Note that I never use the writeBit again in the test code, its just there to help the reader understand what that second parameter is.

With keyword arguments, there is a much better solution:

val data = AllGeneratorNumber(8)
val chain = FlipFlopChain(data, writeBit=Generator.on)
...

Win!

Saturday, May 30, 2009

Some Simple Scala Refactorings

 
I constantly go back and tinker with my old Scala code, especially in my CPU Simulator project. When I'm in the old code I notice something that I haven't really noticed before - the code is still good. In a previous life when I wrote Java code, I'd go back and look at old code and want to throw up. Scala is so concise that this just doesn't happen. Yes, I do find room for improvement here and there, but overall I'm still really happy with the code (there was an exception, when I didn't know anything about functional programming, and refactored from imperative to functional...but that doesn't count).

Anyway, I have a few refactorings that I wanted to mention.

Refactoring to Case Classes


I refactored all of my LogicGate classes to be case classes instead of regular classes. The reason? The code is just prettier. I'm not a huge fan of the new keyword in general, and I especially don't like it littering up my code.

Old Code:

class XorGate(inputA: PowerSource, inputB: PowerSource)
extends LogicGate {
val output: PowerSource =
new AndGate(
new OrGate(inputA, inputB),
new NandGate(inputA, inputB))
}

New Code:

case class XorGate(inputA: PowerSource, inputB: PowerSource)
extends LogicGate {
val output: PowerSource =
AndGate(OrGate(inputA, inputB), NandGate(inputA, inputB))
}

The difference is small, but I literally had hundreds of new calls littered throughout my code. Now, I could also mention the extra power that case classes give as well - nice toString, pattern matching, hash code and equals... I happened to not really be using most of those things in the CPU simulator, so I don't have an immediate example. Oh well. In my opinion/experience, favor case classes over regular classes.

This does violates encapsulation somewhat though. Previously, inputA and inputB weren't accessible from outside the class. I can get around that easily enough by adding private to my vals:

case class XorGate(private val inputA: PowerSource,
private val inputB: PowerSource)
extends LogicGate {
val output: PowerSource =
AndGate(OrGate(inputA, inputB), NandGate(inputA, inputB))
}

Now I'm not exposing those fields, I still have all the power mentioned above, and I don't have the pesky "new" statements hanging around.

Refactoring to Fewer Files


I noticed something a little unsettling. I had separate files LogicGate.scala, AndGate.scala, OrGate.scala, NandGate.scala NorGate.scala, XorGate.scala. One for each type of gate. Several files, and they were all very very small, less than 10 lines each, all with a common package statement, and similar imports.

So I tried something - putting them all into one file. This is something I normally do by default now. I'm not sure when I started putting lots of classes into one file and not spreading them out...Anyway, in my opinion the result was a lot better. Instead of 6 files roughly 5-10 lines long, I have one file less than 40 lines long. I got rid of the redundant package and import statements. But most importantly, now I can see everything there is to know about my logic gates on the screen at one time.


package com.joshcough.cpu.gates

import electric.{Relay, Inverter, Wire, PowerSource}

object LogicGate{
implicit def logicGateToPowerSource( lg: LogicGate ): PowerSource = lg.output
}

trait LogicGate{
val inputA: PowerSource
val inputB: PowerSource
val output: PowerSource
}

case class AndGate(val inputA: PowerSource,
val inputB: PowerSource) extends LogicGate {
val output: PowerSource = Relay(inputA, Relay(inputB))
}

case class NandGate(val inputA: PowerSource,
val inputB: PowerSource) extends LogicGate{
val output = new Wire
Inverter(inputA)-->output
Inverter(inputB)-->output
}

case class NorGate(val inputA: PowerSource,
val inputB: PowerSource) extends LogicGate{
val output: PowerSource = Inverter(inputB, Inverter(inputA))
}

case class OrGate(val inputA: PowerSource,
val inputB: PowerSource) extends LogicGate{
val output = new Wire
inputA-->output
inputB-->output
}

case class XorGate(val inputA: PowerSource,
val inputB: PowerSource) extends LogicGate {
val output: PowerSource =
AndGate(OrGate(inputA, inputB), NandGate(inputA, inputB))
}


I know some people who have said they will never do this, never having more than one class in a file. I think that's wrong. When you have several small classes, seeing everything at once overrules most arguments.

More


I probably should suck it up and turn my logic gates into actual functions at some point in the near future. That should provide some really interesting material on refactoring. Until then, cya.

Simple Scala Keyword Parameters

Everything here is mostly intuitive, but I'm putting it here for personal reference. Should be helpful to a lot of people though. There's probably a whole ton of interesting cases I'm leaving out, and I'll try to keep this updated when I think of them. For now I'm covering Default Values, No Default Values, and Overloading. (Note, this doesn't come out until 2.8, I'm working off trunk)

Default Values


  • Define a class that takes two keyword args, name and lives, and provide default values.

    scala> case class Cat(name:String="kitty", lives:Int=9)
    defined class Cat

  • Instantiate the cat providing no arguments.

    scala> new Cat
    res1: Cat = Cat(kitty,9)

  • Instantiate a cat providing both params, unnamed.

    scala> Cat("Java", 1)
    res2: Cat = Cat(Java,1)

  • Instantiate a cat providing just the first param, unnamed.

    scala> Cat("Scala")
    res3: Cat = Cat(Scala,9)

  • Instantiate a cat providing both params, named.

    scala> new Cat(name="newspeak", lives=20)
    res4: Cat = Cat(newspeak,20)

  • Instantiate a cat providing both params in reverse order. (Only works if names the argument names are given.)

    scala> new Cat(lives=20, name="newspeak")
    res5: Cat = Cat(newspeak,20)

  • Instantiate a cat providing the first param, named.

    scala> new Cat(name="newspeak")
    res6: Cat = Cat(newspeak,9)

  • Instantiate a cat providing the second param, named.

    scala> Cat(lives=4)
    res7: Cat = Cat(kitty,4)

  • Instantiate a cat providing the first argument unnamed, and the second argument named!

    scala> new Cat("Lua", lives=1)
    res8: Cat = Cat(Lua,1)

  • Attempt to instantiate a cat providing the first argument named, and the second argument unnamed. You can't do it! After you name a parameter, the parameters that follow must be named.

    scala> new Cat(name="Lua", 1)
    :7: error: positional after named argument.
    new Cat(name="Lua", 1)
    ^

No Default Values


  • Redefine class Cat without supplying default values. This means values must be provided when instantiating a cat (more generically, calling a function).

    scala> case class Cat(name:String, lives:Int)
    defined class Cat

  • Instantiate a cat providing both params, unnamed.

    scala> Cat("Java", 1)
    res9: Cat = Cat(Java,1)

  • Instantiate a cat providing both params, named.

    scala> new Cat(name="Douglas", lives=42)
    res10: Cat = Cat(Douglas,42)

  • Attempt to instantiate a cat, not providing values for the arguments.

    :7: error: not enough arguments for constructor Cat:
    (name: String,lives: Int)Cat, unspecified parameters:
    value name, value lives
    new Cat
    ^

Overloading


  • Redefine the class, overloading the constructor.

    scala> case class Cat(name:String="kitty", lives:Int=9){
    | def this(name:String) = this(name, -54)
    | }

  • Instantiate a cat providing just the first argument, unnamed. Since the compiler does find a method with the exact signature (in this case - one String), it calls it.

    scala> new Cat("Martin")
    res11: Cat = Cat(Martin,-54)

  • Here's a couple of other similar, and interesting cases. First, overload the constructor giving the argument a different name than the one defined in the primary constructor.

    scala> case class Cat(name:String="kitty", lives:Int=9){
    | def this(x:Int) = this("hello", x+99)
    | }
    defined class Cat

  • Instantiate a cat, using the keyword 'lives'. Since the overloading method uses the name x, and x!=lives, and the original constructor uses 'lives', that is the method that is invoked.

    scala> new Cat(lives=8)
    res12: Cat = Cat(kitty,8)

  • Instantiate a cat - same as the String overloading case above.

    scala> new Cat(8)
    res13: Cat = Cat(hello,107)

Tuesday, May 19, 2009

Teaching Functional Programming To Kids

I started teaching some of the basic concepts of functional programming to my 8 year old son yesterday, and wanted to write a little about it. The wonderful thing about it is that kids really are ready to learn the concepts at a very young age. I'm not actually teaching him programming, just concepts, but when the time comes, he'll basically already know how to program.

I have what I think is an absolutely perfect example. It's one that all parents and kids can identify with: The Dr. Seuss Star Belly Sneetch machine.




This is a simple machine that takes in a Sneetch without a star on its belly, and spits out a Sneetch with a star on its belly. I'm just going by memory here to say that I think kids can probably understand this concept at the age of 3. In this post I'm going to use this style to represent machines:

-----------
| |
Sneetch ---> | * | ---> Star Bellied Sneetch
| |
-----------

This is the same as the actual picture above, but works for all cases since I don't have pictures for all the concepts I want to represent.

In the Dr. Seuss book there is also the opposite machine that removes the star from a star bellied Sneetch.

-----------
Star Bellied | |
Sneetch -> | -* | ---> Sneetch
| |
-----------

While that seems really simple, it's all we need to start teaching a wide range of concepts to kids. I started with this one, because of its similarity with the machines above (for all the things below, Jacy and I worked them out on a whiteboard. But, paper is just as good):

-----------
| |
10 ---> | +5 | ---> 15
| |
-----------

Here we have a machine that adds five to whatever you put into it. Very simple, very easy for kids to understand. It helps to run a few more inputs through (0, 10, a billion) just to let them know that the box doesn't just take 10 and give 15, it works with all numbers.

After this one I followed up with another very easy one.

-----------
| |
10 ---> | -5 | ---> 5
| |
-----------

At this point he pointed out, "Well it could be a divided by two machine instead." This was unexpected, and impressive, and at some point I'll talk about it further...but not yet. It was great to feel that he was understanding it though.

Now that he was getting it, it was time to change things up just a little bit. I introduced the plus and minus machines, which take two inputs instead of one.

-----------
7 ---> | |
| + | ---> 19
12 ---> | |
-----------

-----------
12 ---> | |
| - | ---> 5
7 ---> | |
-----------

These presented no challenge whatsoever. In fact (I guess rather surprisingly), nothing I taught him presented any sort of challenge. Next I introduced the biggest and smallest machines (which we programmers call max and min).

-----------
7 ---> | |
| Biggest | ---> 12
12 ---> | |
-----------

-----------
7 ---> | |
| Smallest | ---> 7
12 ---> | |
-----------

-----------
10 ---> | |
| Smallest | ---> 10
10 ---> | |
-----------

I guess he was a bit surprised when I showed him the last one. But, it only took showing him the answer once for him to fully understand.

I then added an equals machine that spits out YES! if the two numbers are equal, and NO! if they aren't (true and false, obviously). This is different because now we were no longer working with numbers as the inputs and outputs.

-----------
7 ---> | |
| = | ---> NO!
12 ---> | |
-----------

-----------
7 ---> | |
| = | ---> YES!
7 ---> | |
-----------

Simple, but effective.

Now, Jacy and I have done considerable work with logic gates, and I wanted to show him how logic gates are really just like machines. I also taught him the word Function at this point, but didn't push it. Kids can relate to machines, not functions.

------------
ON ---> | |
| AND GATE | ---> OFF
OFF ---> | |
------------

------------
ON ---> | |
| OR GATE | ---> ON
OFF ---> | |
------------

------------
ON ---> | |
| AND GATE | ---> ON
ON ---> | |
------------

While the logic gate examples seem simple, it tied two worlds that we've been working on together very nicely.

Fun


My son has a really short attention span, and all the while I'm doing this I have to think of different ways to make it fun. If it's not fun, hes just going to go play video games. Rightfully so, video games are fun. There were a few ideas I tinkered with before settling on the Dr. Seuss machine. One was a monster that stuffs stuff into his mouth and then spits out the answer. I thought that one was kind of neat. The point is, if you plan on teaching your child, think of something fun they can relate to.

Combining Machines



I could sense we needed some more fun at this point, and we'd learned enough basic machines that I thought it would be great to start combining them. After trying this, I recommend starting with all alike boxes. We did something a little more complicated and I ended up going too fast, and fell back on this:

-----------
7 --> | | -------
| + | --> 19 --> | |
12 --> | | | |
----------- | |
| + | --> 39
----------- | |
10 --> | | | |
| + | --> 20 --> | |
10 --> | | -------
-----------

After you get this first larger machine done, its pretty easy to add in more complicated machines. However, it might be good to wait until the next day, as Jacy was definitely getting it, but might have been getting a little fried. Here's an example though:

-----------
7 --> | | -------
| + | --> 19 --> | |
12 --> | | | |
----------- | |
| = | --> NO!
----------- | |
10 --> | | | |
| + | --> 20 --> | |
10 --> | | -------
-----------

And obviously, change the 7 to an 8 and get a YES! Different kinds of boxes doing and spitting out different kinds of things. In essence, this is really all we do as programmers.

Types


Being a lover of static type systems, I also talked to him about types, by saying "kind(s) of things". For example, I asked him, "What kind of thing does this machine take in (or what kind of thing do you put into this machine)?"

-----------
| |
10 ---> | +5 | ---> 15
| |
-----------

Answer: Number. I avoided Integer for now. What kind of thing does it spit out? Answer: Number.

I then showed him this next example, which should arguably have a section of its own:

-----------
| |
D ---> | +5 | ---> I
| |
-----------

This machine looks exactly the same as the machine above, except you put Letters into it, and it spits out letters. We also did months. Both are interesting because they have to loop around. I didn't have to teach him that, he just got it.

Then I introduced a formal notation for types:

-----------
7 ---> | |
| + | ---> 19
12 ---> | |
-----------

(Number,Number) -> Number

And introduced machines that change the type (he had seen it already, but only with YES! and NO! This, I think, is a better example):

-----------
| |
5 ---> | To Letter | ---> E
| |
-----------

Number -> Letter


He understands the notation and can write it if I give him slots to fill in like this:

______ -> ______

or

(______, ______) -> _______


Things I don't know how to teach, Yet


I certainly didn't try to teach him anything about machines that take in machines and spit out machines. Also, some of my boxes were polymorphic, but I don't think I know how to explain that to him.

For now, I think Jacy and I will just do this same stuff for a while, reinforcing it. I'm not sure what the best thing to teach him next is. Some of the stuff here I've skimped on writing up, and we actually spent more time on than it seems.

Anyway, this was all really, really fun, for both of us.

Thursday, May 14, 2009

Refactoring in Scala

Another really long post, but why not!

My Lexer is now approaching 500 lines with roughly 350 lines of test code. Maybe that's still trivial, but it doesn't feel trivial. Maybe it doesn't feel trivial because it does a lot. Those 500 lines of code cover so many features. Yesterday night I started to add a bunch more features, and realized a few spots needed some heavy refactoring to achieve what I want. Here's what I wanted to add:

  • New line support. Yes, my previous lexer could only lex one line at a time. Syntax errors indicated the offset, but not the line. A parser should know what line and file the token occurred in.

  • Better handling and testing of syntax errors.

  • Separate Scala lexing from Lexer core, providing a reusable Lexer framework for Lexing any language.


I had a few other goals as well. I wanted to have the tests be much more organized, mostly arranging for testing individual Finders. So far, everything was lumped together in one giant test. I wanted to do a general cleanup of my Lexer trait, and extract the mutability from it. And as usual, I wanted to make sure things were nice and clean and short and readable. I have a few goals on my plate that I didn't get to as well. Comments, Strings, Chars, peeking at the next token without mutating. Some of that stuff might require the Finders to be away of a Context of some sort. I hope not. Also, maybe I could write a preprocessor to replace any comments with white space.

Multi Line Support


Anyway, lets get to the new line support. Like I said, the main goal was to have better syntax error support, knowing the line and the offset, not just the offset. But the parser needs to more information that just the offset, as well. In order to do this, I decided to introduce the idea of a CompilationUnit, which holds all the code. Previously, the Lexer just worked with a single array.

Old code:

trait Lexer extends SyntaxErrorHander {
var chars: FunctionalCharArray
var currentIndex = 0

New code:

trait Lexer extends CompilationUnitWalker with SyntaxErrorHander {
var unit: CompilationUnit

Notice something else important here. The old code held its chars and was responsible for handing its own mutation, and the mutation was a bit cluttered in with the lexing logic. It wasn't horrible, but not great. Now the mutation is all self contained in CompilationUnitWalker, who manages the current line and current character pointers. I'll save the listing for that trait until the end, as its where all the ugliness lives. But that's the good news, its no longer in Lexer. For now though, it helps to see the interface for that trait:

trait CompilationUnitWalkerInterface {
var unit: CompilationUnit
def skipChar: Unit
def skipChars(i: Int): Unit
def skipLine: Unit
def currentLine: FunctionalCharArray
def currentLineNumber: Int
def currentOffset: Int
def eofReached_? : Boolean
}

The finders need the current line, Tokens and errors need currentLineNumber and currentOffset, and the Lexer itself needs to know if it processed the whole file (eofReached_?), and needs commands to move in the array (the skip methods). Since some of this stuff wasn't in the original code, this was a refactoring and a feature add at the same time. No big deal though, just pointing it out.

I'll show the remainder of the Lexer code in the next section.

Syntax Error Handling


We're a little bit closer to getting syntax errors with line numbers in them. I didn't show the SyntaxErrorHandler trait in the last post, but it was trivial. It just took a SyntaxError Token (which I've decided was stupid because SyntaxErrors are not tokens), and it just printed them out. Here's both now:

Old code:

trait SyntaxErrorHander{
def syntaxError(error:SyntaxError):Unit = {
println(error)
}
}

New code:

trait SyntaxErrorHander{
def syntaxError(unit: CompilationUnit,
lineNumber: Int, offset: Int):Unit = {
println("Syntax Error(" + unit + ", line number:" +
lineNumber + ", offset:" + offset + ")")
}
}

Not much different, but now the handler takes the CompilationUnit, and the line number and offset. Those are all we really need to handle the requirement. And I do it like this:

trait AddToCompilationUnitSyntaxErrorHander extends SyntaxErrorHander{
override def syntaxError(unit: CompilationUnit,
lineNumber: Int, offset: Int):Unit = {
super.syntaxError(unit, lineNumber, offset)
unit.syntaxError(lineNumber, offset)
}
}

The code for calling into the syntax error handler in the Lexer actually shaped up nicely as well:

trait Lexer extends CompilationUnitWalker with SyntaxErrorHander {
var unit: CompilationUnit

def finders: List[FunctionalCharArray => Option[Lexeme]]

def nextToken: Token = {

// if we've already lexed the whole file, get the f out.
if (eofReached_?) return EOF(currentLineNumber)

// find all possible Lexemes
val matches =
finders.map(f => f(currentLine)).filter(_.isDefined).map(_.get)

// if we've found no lexemes, syntax error! adjust and try again.
if (matches.isEmpty) {
syntaxError(unit, currentLineNumber, currentOffset)
skipChar
return nextToken
}

// the longest match found should be on top...i think...
val longestMatch = matches.sort(_.size >= _.size).head

// deal with the best lexeme
handleLexeme(longestMatch)
}

def handleLexeme(lex: Lexeme) = lex match {
case NewLine => {
skipLine
nextToken
}
case WhiteSpace(_) => {
skipChar
nextToken
}
case lex => {
val indexOfLexeme = currentOffset
skipChars(lex.data.length)
Token(lex, currentLineNumber, indexOfLexeme)
}
}
}

In the middle of the nextToken we check to see if matches is empty. If it is, we haven't found any Lexeme's and we call the syntaxError method with the current CompilationUnit (which we hold), and the currentLineNumber, currentOffset which we get from the walker. Done!

Also notice the handleLexeme method at the end, because this completes the first requirement. When it creates a new Token, it passes the currentLineNumber as well. I'm still debating putting the CompilationUnit in the Token as well. Certainly a parser will know which unit its working on, but it might be helpful. I guess I'll wait til I start writing my parser to find out.

Refactoring Towards Reuse


Refactoring Towards Reuse was by far the most broad and complicated requirement/idea in the set - I thought. But as it turns out, it only took a few minutes total. All I had to do was move a few files, change a few names, honestly almost nothing.

First, I created a package called scala under lex, created a Scala file called ScalaLexer.scala, and started pulling in anything that looked Scala specific. I ended up with this:

trait ScalaLexer extends Lexer with ScalaFinders with
AddToCompilationUnitSyntaxErrorHander

trait ScalaFinders extends
ScalaCharFinders with NumberFinder with ScalaIdentifinder with
WhiteSpaceFinder with CaseClassFinder with CaseObjectFinder

case object CaseClass extends SimpleLexeme("case class")
case object CaseObject extends SimpleLexeme("case object")

object ScalaIOLBuiler extends IdentifierOnlyLexerBuilder{
def apply(cu: CompilationUnit) = {
new ScalaIOL{ var unit: CompilationUnit = cu }
}
}
trait ScalaIOL extends IdentifierOnlyLexer with ScalaIdentifinder

case class ScalaTwoPartIdentifinder(
w1: String, w2: String, l: Lexeme) extends
TwoPartIdentifinder(ScalaIOLBuiler, w1,w2,l)

trait CaseClassFinder extends LexemeFinder {
override def finders =
super.finders :::
List(ScalaTwoPartIdentifinder("case", "class", CaseClass).find _)
}

trait CaseObjectFinder extends LexemeFinder {
override def finders =
super.finders :::
List(ScalaTwoPartIdentifinder("case", "object", CaseObject).find _)
}

trait ScalaCharFinders extends LexemeFinder {
override def finders = super.finders ::: CharFinders(
Underscore, Comma, Dot, Eq, Semi, Colon, LeftParen,
RightParen, LeftCurly, RightCurly, LeftBrace, RightBrace
)
}

I also moved Identifinder into the Scala package as well, since it's Scala specific. All in all, I didn't have to do much. It's only about 30 lines of code and a lot of it can probably be cleaned up and reduced further.

Now, if I wanted to lex Java I'd probably have to write a similar 30 lines of code, plus a JavaIdentifinder. I should do it, because then I'd find further areas for factoring out common code.

So I guess in my next installment I'll do just that, and, I'll show the testing at that point too. The testing is coming along really nicely. At some point I plan to contrast this code with the real Scala lexer code, and the tests for it (I still have to find those).

For now I'll give a quick snippit of the testing I did for handling multiple lines, but I'll wait to explain it:

trait NewLineTests extends LexerTest{

test("new lines") {
checkCode("x,\ny",
(0,0) -> Identifier("x"),
(0,1) -> Comma,
(1,0) -> Identifier("y"))
}

test("more new lines"){
checkCode("""x,
y,
z""",
(0,0) -> Identifier("x"),
(0,1) -> Comma,
(1,4) -> Identifier("y"),
(1,5) -> Comma,
(2,6) -> Identifier("z"))
}
}

That's all for now. It's late.

Scala over Ruby - My Debate Ends

I've maintained posts about things I like in better about Scala and/or Ruby, but, it's time to put an end to the debate. I've been doing Ruby every day for almost 6 months now, and finally conclude that for me, it just doesn't feel nearly as nice as Scala. It's not ever close, to be blunt.

My main reason? Refactoring. Over the years I've become very good at refactoring; I've actually been called a refactoring machine. I have a lot of experience refactoring really, really terrible code. Yes, this sucks, I have a history of picking the wrong job. Fortunately though, I also have some experience refactoring really nice code as well. The code that I'm working on now in Ruby is fairly new, and quite good. All the Scala code that I write is pretty good (room for improvement, but pretty good).

I can refactor so easily in Scala with huge confidence and I can't do that in Ruby at all. In Ruby:

  • It takes a long time to make major refactorings.

  • I'm never fully confident in my refactorings and almost always get annoyed by runtime errors.

  • Sometimes the stack traces are all messed up and I can't figure out where my actual error is.

  • Most code you run into isn't going to have enough test coverage to help anyway.

  • For all the preachers of TDD and instant feedback - tests untimately/inevitably take (far) longer than the compiler. So I really don't have the instant feedback I need.

  • If you skip refactorings in Ruby because they are hard, your code becomes harder and harder to refactor. This goes for statically typed languages as well, but, it's a lot harder with Ruby, especially considering the points above.

  • Ad infinitum...

In my next post I'm going to cover a bunch of major refactorings I just did to my Lexer, and how easy it was.

Tuesday, May 12, 2009

A Scala Lexer

This is a really long post, but IM(not so)HO, its really fun!

Sunday I decided, "I'm going to write my own Lexer for Scala", simply because I felt like it. I'm only about 8 hours in, but I've got a lot of functionality. And its only about 300 lines of code so far. Now, that 300 lines doesn't cover nearly all of Scala. It's lacking XML handling, and comments, has limited support for numbers, and String literals, Unicode characters and error handling. But, it does have a lot of nice features including the ability to understand most (maybe all) identifiers, and operators.

There are other really nice things about it as well. First, its very functional - it uses HOF's to recognize tokens and it's mostly immutable, with its mutable state isolated to one very small area. Second, the tests for it are simple end elegant. Third, its quite reusable, as I've managed to leverage several components from within itself.

Before I get to the implementation, let me show how to use it. It can be done very simply at the command line. Example:

scala> import com.joshcough.compiler.lex._
import com.joshcough.compiler.lex._

scala> val code = "x"
code: java.lang.String = x

scala> val lexer = new SimpleLexer(code) with Identifinder
lexer: com.joshcough.compiler.lex.SimpleLexer with com.joshcough.compiler.lex.Identifinder = SimpleLexer(FunctionalCharArray([C@b6aea4))

scala> lexer.nextToken
res0: com.joshcough.compiler.lex.Token = Token(Identifier(x),0)

scala> lexer.nextToken
res1: com.joshcough.compiler.lex.Token = EOF(1)

In this code I create a Lexer only capable of recognizing identifiers by mixing in the trait Identifinder (and yes, that trait name is awesome). The "code" that I use here is simply the String x. When I ask the lexer for its tokens, it gives me back exactly what I'd expect, Token(Identifier(x),0), and EOF(1). This means it found the identifier x at offset 0, and found EOF at index 1. Simple, but obviously not yet very useful. Luckily, the identifier recognition is much more powerful. Let's see some more examples (From now on, where appropriate, I'll remove redundant type info from the output or replace it with "...", and simply indent the output):

scala> lexer lexing "-->"

scala> lexer.nextToken
Token(Identifier(-->),0)

scala> lexer.nextToken
EOF(3)

scala> lexer lexing "_ewrk_1212_adf435445_^^^"

scala> lexer.nextToken
Token(Identifier(_ewrk_1212_adf435445_^^^),0)

Much better. But what happens if I pass in something that the lexer shouldn't understand?

scala> lexer lexing "1"

scala> lexer.nextToken
SyntaxError(0)
res4: com.joshcough.compiler.lex.Token = EOF(1)

Not too terrible. "1" isn't a valid identifier and since this lexer only recognizes identifiers, it spits out "SyntaxError(0)", because it did not recognize the input starting at position 0. Now, this is actual System.out output. That's all you get by default. The actual Scala compiler simply adds an error to the Compilation unit with the offset. There really isn't much a lexer can do with errors. Later, I'll show how we can cache them off as well.

The last lexer failed to recognize numbers. That's easy enough to fix. Simply mix in NumberFinder:

scala> val lexer = new SimpleLexer(code) with
Identifinder with NumberFinder

scala> lexer lexing "123"

scala> lexer.nextToken
Token(Number(123),0)

scala> lexer lexing "-->"

scala> lexer.nextToken
Token(Identifier(-->),0)

Now we can recognize identifiers and numbers. Next up we'll start recognizing all of these single characters: [ ] { } ( ) , . : _ = ; This as just as easy as the others - just mix in CharFinders

scala>lexer lexing "(x,y);"

scala> lexer.nextToken
Token(LeftParen,0)

scala> lexer.nextToken
Token(Identifier(x),1)

scala> lexer.nextToken
Token(Comma,2)

scala> lexer.nextToken
Token(Identifier(y),3)

scala> lexer.nextToken
Token(RightParen,4)

scala> lexer.nextToken
Token(Semi,5)

Notice that we haven't seen any white space yet. In fact, the lexer here isn't yet capable of handling white space. To do that - you guessed it - mix in WhiteSpaceFinder.

scala> val lexer = new SimpleLexer(code) with Identifinder with
NumberFinder with CharFinders with WhiteSpaceFinder
lexer lexing "( x , y );"

scala> lexer.nextToken
Token(LeftParen,0)

scala> lexer.nextToken
Token(Identifier(x),2)

scala> lexer.nextToken
Token(Comma,4)

scala> lexer.nextToken
Token(Identifier(y),6)

scala> lexer.nextToken
Token(RightParen,8)

scala> lexer.nextToken
Token(Semi,9)

Notice that like any good lexer, this lexer simply skips over the white space, handing you the next actual token at its right location.

There are a few more, such as RocketFinder (finds =>), CaseClassFinder (finds "case class" and returns it as one Lexeme instead of two seperate identifiers), and CaseObjectFinder (which does exactly the same thing). But, I'm not going to show examples of their usage. I want to get to the implementation.

Lexer Implementation


Earlier I made claim that the Lexer was "mostly immutable". Of course, it has to be somewhat mutable to return something different to the parser on successive calls to nextToken. To do this, the Lexer uses a FunctionCharArray, which is just a very thin wrapper around Array[Char].

case class FunctionalCharArray(chars: Array[Char]) {
def skip(i: Int) = new FunctionalCharArray(chars.drop(i))
def size = chars.size
def nextChar: Option[Char] = get(0)

def get(index: Int): Option[Char] = {
if (chars.size > index) Some(chars(index)) else None
}
}

Taking a brief peek into the Lexer we find that its the FunctionCharArray that is mutated:

trait Lexer extends SyntaxErrorHander {

var chars: FunctionalCharArray

var currentIndex = 0

def finders: List[FunctionalCharArray => Option[Lexeme]]

...

There are a few other interesting bits here. The lexer also mutates its currentIndex. It puts the currentIndex into the Tokens when Lexemes are found. We'll see how that works just a bit later on. Most importantly though, is the fact that almost everything else runs through the Lexer's finders.

The finders are actually quite simple. Given a FunctionalCharArray, the finders return an Option[Lexeme]. They return Some(Lexeme...) if they found what they expected at the very beginning of the array, and None otherwise. The remainder of the array is simply ignored by the finder. A few simple examples should help.
  • The Identifinder returns Some(Identifier(x)) when its given the an array containing 'x' as its only element, and None if the first character in the array is a character that can't legally start an identifier.
  • The NumberFinder returns Some(Number(123)) when given this array: ['1', '2', '3', ' ', '+', ' ', '7']
  • The CharFinders only ever look at the first element in the array.

To fully demonstrate just how simple they are, we need to see some. Have at you!

trait WhiteSpaceFinder extends LexemeFinder {
override def finders = super.finders ::: List(findWhiteSpace _)

def findWhiteSpace(chars: FunctionalCharArray): Option[Lexeme] = {
chars.nextChar match {
case Some(' ') | Some('\t') | Some('\n') => Some(WhiteSpaceLex)
case _ => None
}
}
}

The first thing to notice about this finder is that it immediately adds a finder method to it's super. This is what enabled us to keep mixing in finder after finder in the beginning of this post.

The finder that it adds is its own findWhiteSpace method. As I explained, it takes a FunctionalCharArray and returns an Option[Lexeme]. In this case, it peeks at the top character in the array, and it that char is a space, tab or new line, it returns a Some with the Lexeme, and if its not, it returns None. Simple.

That one is pretty trivial though, it only needs to look at one character. Let's take a look at one that's more involved. Here is a version of CaseClassFinder. It's not the exact implementation, but I've only changed it slightly to help demonstrate.

class IdentifierOnlyLexer(var chars: FunctionalCharArray) extends
Lexer with WhiteSpaceFinder with Identifinder

trait CaseClassFinder extends LexemeFinder {
override def finders = super.finders ::: List(findCaseClass _)

def findCaseClass(chars: FunctionalCharArray) = {
if (chars.size < 10) None
else {
val lexer = new IdentifierOnlyLexer(chars)

(lexer.nextToken, lexer.nextToken) match {
case (Token(Identifier("case"), _),
Token(Identifier("class"), _)) => {
Some(CaseClass)
}
case _ => None
}
}
}
}

I love this example because it's small, but not trivial, and because it reuses components in the lex package. Before I explain, let's think about what a CaseClassFinder should do. If it see's "case", followed by "class" at the beginning of the array, then it should return the Lexeme CaseClass. Otherwise, it should return None. But, its more important to think about that in a slightly different way:

If a CaseClassFinder finds the Identifier "case" followed by the Identifier "class", then it should return CaseClass.

Well, we already know how to find identifiers...it was the first thing I showed in this post! As it turns out, that's exactly how CaseClassFinder does it as well. It creates a new Lexer with its input, an IdentifierOnlyLexer (technically, it could fire up a Lexer with all the finders, but it's overkill). It then asks the Lexer for its first two tokens, and if those Tokens contain the Lexemes Identifier("case"), and Identifier("class") the it knows its found a case class. BAM!

The Indentifinder and NumberFinder traits are too long to show here. I'll post a link to the codebase.

Now, we have a few things to do to wrap up. I still need to show the how the Lexer itself actually uses the finders. And, the astute reader might have already realized that two finders might both return a Lexeme. For example, given the Array [':',':'], the Colon finder would return Colon, and the Identifinder would return Identifier(::). The Lexer handles this case easily. The real value must be the longer of the two. Simple as that. Now lets take a look at all of Lexer.

Lexer Implementation (for real this time!)


1 trait Lexer extends SyntaxErrorHander {
2
3 var chars: FunctionalCharArray
4 var currentIndex = 0
5
6 def finders: List[FunctionalCharArray => Option[Lexeme]]
7
8 def nextToken: Token = {
9
10 def skip(i: Int) {currentIndex += i; chars = chars.skip(i)}
11
12 val token = nextToken_including_whitespace_and_syntaxerror
13
14 token match {
15 case SyntaxError(i) => {
16 syntaxError(SyntaxError(i))
17 skip(1)
18 nextToken
19 }
20 case EOF(_) => token
21 case Token(WhiteSpaceLex, _) => skip(1); nextToken
22 case Token(lex, _) => skip(lex.data.length); token
23 }
24 }
25
26 private def nextToken_including_whitespace_and_syntaxerror = {
27 if (chars.nextChar == None) EOF(currentIndex)
28 else {
29 val lexemesFound: List[Lexeme] = {
30 finders.map(f => f(chars)).filter(_ match {
31 case Some(t) => true
32 case _ => false
33 }).map(_.get)
34 }
35
36 if (lexemesFound.size == 0) return SyntaxError(currentIndex)
37
38 val lexemesSorted =
39 lexemesFound.sort(_.data.size >= _.data.size)
40
41 Token(lexemesSorted(0), currentIndex)
42 }
43 }
44
45 def lexing(s: String): Lexer = {
46 lexing(new FunctionalCharArray(s.toCharArray))
47 }
48
49 def lexing(cs: FunctionalCharArray): Lexer = {
50 chars = cs; currentIndex = 0; this
51 }
52}

First, look at the nextToken_including_whitespace_and_syntaxerror which starts on line 26. This method returns the best Token possible (like :: vs : ), but more importantly, it always returns a Token. It returns WhiteSpace and SyntaxError tokens as well. The calling method, nextToken, is the guy in charge of filtering those out, resetting, and continuing. But, well get to that in just a second. For now, lets enumerate the nextToken_including_whitespace_and_syntaxerror methods steps.

  1. The first thing it does is check if there are any characters left. If there are none, it simply returns EOF. Any more calls to nextToken will continue to return EOF until the Lexer gets new data (via the lexing methods).
  2. It then calls all of its finders with the current array: finders.map(f => f(chars))
  3. It then immediately filters the results, because its only interested in any finders that actually returned Some(Lexeme...), and not none. Of course.
  4. At that point (line 36) it checks to see that someone actually found a Lexeme (if (lexemesFound.size == 0) return SyntaxError(currentIndex)). If no finders found anything, the we must have some unrecognized character in the input. Syntax Error!
  5. On line 38 it sorts the Lexemes by size, aiming to get the largest.
  6. Finally it returns the largest token (ignoring any other matches).

Now I'll explain the nextToken method, and then wrap up.

The first action nextToken takes is to call the nextToken_including_whitespace_and_syntaxerror on line 12. With that token, it must make a decision.

  1. If it receives a SyntaxError, then call the notify the syntax error handler of it, and in an attempt to reset/restart - move ahead one space in the input, and recur.
  2. If it receives an EOF, just hand it over to the Parser, it should know what to do with it.
  3. If it receives WhiteSpace, then also move ahead one space and recur. There might be a better strategy here (and/or for syntax errors), but this one is simple, and works ok.
  4. Finally, if it gets a legitimate token, then move the entire array ahead by the length of the Lexeme that was found, and return the token. The next time nextToken is called, it should start at the new location, one character beyond the end of this token.


Wrapping up, the way I've implemented this, I don't think it will be that difficult at all the add in missing features. XML will probably be a bear, just because. I guess I don't have String literal support at all, so I'll add that next, and Char literals too. They should be mostly straightforward.


If you're actually still reading, awesome! Thanks! I hope you've learned something. You can find all the actual source code here. Not bad for about 8 hours huh?

Oh BTW, my only two references for this were the Dragon Book (2 ed), and the Scala compiler itself.

Bye!

Monday, May 04, 2009

Typed Lambda Calculus For Me (And Maybe You)

I'm reading chapter 10 of Lambda-Calculus and Combinators by J. Roger Hindley and Jonathan P. Seldin, and I wanted to put down some notes. These notes are mostly for myself, much in the spirit of why I started this blog - to measure my own progress. However, since I know I have readers I'll write it in a way that people learn something. If people are interested in this though, since I'm already investing time in it, I'd be happy to do into more detail on the beginners stuff and/or some of the history behind this (I might just do it anyway). Also if anyone smarter than me finds any errors here, or has tips to explain it better, please do. Okay Go.

In the book there is this nugget:

(λx:p->ø->t.
(λy:p->ø.
(λz:p.((x:p->ø->t z:p):ø->t (y:p->ø z:p):ø):t):p->t
):A
):B

You (the reader) are supposed to solve this for A and B.

For those of you unfamiliar with lambda calculus, and/or typed lambda calculus, this is a lot easier than it looks. What I'll do here is try to explain the steps nice and slow, and then at the bottom maybe try to write this in Scala to demonstrate.

Explicit Type Information


Ok, the important thing here is that we are given a lot of information. Here's a list of the things we definitely know about the types in the statement above:
  1. x:p->ø->t

  2. y:p->ø

  3. z:p

In more detail:
  1. x is a function that takes a p and returns a function that takes an ø and returns a t.

  2. y is a function that takes a p and returns an ø

  3. z is simply a p

Or in Scala:
  1. val x:p=>ø=>t

  2. val y:p=>ø

  3. val z:p

How do we know this? Well, that's basic typed lambda calculus. Here are a couple of very basic notes:
  1. The statement (λx:p->ø->t.M) is a function that takes an argument, x, which is of type p->ø->t. Arrows imply function types and hence x is a function that takes a p and returns a function that takes an ø and returns a t.

  2. The return value of the function in this tiny example is M (above is much more complicated, but that's what were getting to).

Function Application


Now, using what we know about the types, and the rest of the statement, what else can we figure out? Hmm, I guess we should talk about function application, by looking at the innermost bit of the statement above.

We see this: (x:p->ø->t z:p):ø->t, and this means we are calling the function x with the argument z. Remember that everything after the colons are simply types. Ignoring the types, we could just write (x z) which is basic function application in lambda calulus. In Scala this would be simply x(z). And what type does calling x yield? Not to beat it into your brain, but remember that x is a function that takes a p and returns a function that takes an ø and returns a t. Calling it with a p will yield a function that takes an ø and returns a t, or more simply ø->t.

In the original statement, we had (x:p->ø->t z:p):ø->t, and this is correct, as I just explained.

That isn't the only function application in the statement either. Right next to it we see (y:p->ø z:p):ø. Does this check out? Remember that y is a function that takes a p and returns an ø, and z is a p. Perfect. Applying y with argument x yields a ø.

Now, that fact that I said "right next to it" is meaningful. Now that we've resolved the first two applications, we end up with another application! The result of the first application ((x:p->ø->t z:p):ø->t) was ø->t, and the result of the second was a ø, and these also match. Applying ø->t to an ø results in a t. And now we've finished this big inner statement: ((x:p->ø->t z:p):ø->t (y:p->ø z:p):ø):t.

Building Functions


Now that we've looked at the internal parts of the statement boiling it all down to a t, we can start to work our way outwords. Recall that λ is used to denote a function. λx:Int."hey" is a function that takes an Int (named x) and returns a String, "hey". Fully typed this would be (λx:Int."hey"):String

Working our way one step outwards from our function applications, we have (λz:p.((x:p->ø->t z:p):ø->t (y:p->ø z:p):ø):t):p->t. Recall that we already determined the inner part to be a t. We can now thing of this as (λz:p.(some arbitrary t)):p->t. And that's correct. This is a function that takes a argument z of type p, and returns a t. Hence its type: p->t.

Solving for A


Lets take a look back to remember what were trying to solve. One thing we need to find is A:

(λy:p->ø.
(λz:p.((x:p->ø->t z:p):ø->t (y:p->ø z:p):ø):t):p->t
):A

We've already determined the inner part to be correct - its p->t. Using what we just learned about creating functions with λ, and what we learned about types earlier, we know that (λy:p->ø. whatever):A builds a function that takes an argument, y:p-ø. That is - an argument y of type function from p to ø.

Additionally, this new function returns an A. But remember that the type of a function is not just its return type, it is (input type -> output type). So A must be the functions input type (which we know to be the type of the argument y, or p->ø) -> the functions return type (which is the return type of the inner statement, which we solved earlier to be p->t).

A is (p->ø)->(p->t).

Solving for B


To Solve for B, we do exactly what we did solving for A.

Recall:
(λx:p->ø->t.
(λy:p->ø.
(λz:p.((x:p->ø->t z:p):ø->t (y:p->ø z:p):ø):t):p->t
):A
):B

Or:
(λx:p->ø->t.whatever:A):B

Clearly the first half of B (the input type) is (p->ø->t). The output type is the type of whatever, which (from looking just above) is A.

So B is (p->ø->t)->A

And fully expanded:

B is (p->ø->t)->(p->ø)->(p->t)

Formal Listing


This came directly from the book. I'm not sure if I can even legally put this all here. If I hear that I can't, I'll immediately take it down. All the information here is explained in detail above.


x:p->o->t z:p y:p->o z:p
----------------- ---------------
(xz):o->t (yz):o
--------------------------------------------
(xy(yz)):t
-----------------------
(λx:p.(xy(yz))t):p->t
-----------------------------------------------
(λy:p->o((λx:p.(xy(yz))t)p->t)): (p->o)->(p->t)
------------------------------------------------
(λx:p->o->t ((λy:p->o((λx:p.(xy(yz))t)p->t)):
(p->o)->(p->t))): (p->o->t)->(p->o)->(p->t)

The Scala Code To Prove It


trait Lambda {
type p
type ø
type t
type A = (p=>ø)=>(p=>t)
type B = (p=>ø=>t)=>A

val f:B = {(x: p => ø => t) => {(y: p => ø) => {(z: p) => {(x(z)(y(z)))}}}}
}

Saturday, April 25, 2009

Scala Testing Presentation Outline

Here's the outline for my upcoming presentation on Scala and Testing. It's a bit ambitious, and I know I won't get through it all, but that's okay. If there's anything glaringly missing, now is the last chance to let me know. It also seems like this could be the start of something bigger, like a short book called something like, "Scala Testing for Java Developers"...maybe. However, the talk is only geared toward Java developers for the first few minutes or so.

Without further ado:

1) The prereqs ... Just Java code
a) "Production" code
- A simple List interface
- An ugly LinkedList implementation
b) Testing Java with Java
- JUnit
- Testng

2) The basics ... Testing Java code with Scala
a) JUnit with Scala
- JUnit3 (test, setup/teardown)
- JUnit4 Annotations
b) TestNG with Scala
- @Test
- @DateProviders
- All Annotations available

3) The basics ... More testing Java with Scala - Intro to ScalaTest
a) Suite
- test methods
- assert with ===
- expect(x){}
- intercept
b) BeforeAndAfter
c) TimedTest (displays time information for each class and test ran)
d) TestNGSuite
- decorate original TestNG test from step 3, "with TestNGSuite"
- run in both runners

4) Scala extending/implementing Java classes/interfaces
a) ScalaLinkedList (simple extention of Java LinkedList class)
b) ScalaMutableList (ugly impementation of simple List interface)

5) Testing Scala with Scala
a) Briefly redo #4 - but by testing the new Scala code.
b) minor intermission - Can we backtrack and test the new Scala code with Java?

6) A nicer, Functional List implemention in Scala

7) Testing Scala with Scala
a) FunSuite
-(including HOF's for those who don't know...maybe?)
-explain how tests methods work

8) Advanced ScalaTest
a) ScalaTest matchers (in a FunSuite)
b) Overview of OOTB matchers
-String, List, Hash, Option matchers, etc.
c) Combining matchers with and/or
d) Extending ScalaTest matchers - Creating your own matchers

9) Specs
a) Specifications
b) Matchers
c) Extending Specs matchers - Creating your own matchers

10) Mocking in Scala (via Specs)
a) Mockito
b) JMock

11) ScalaCheck (via ScalaTest Checkers)
a) Properties
b) Generators
c) Arbitrary Magic

12) Ant
a) ScalaTest
b) JUnit Scala and Java run together
c) Specs runs as JUnit
d) TestNG Scala and Java run together

13) Scala Frameworks Interoperability
a) ScalaTest and Specs
b) ScalaTest and ScalaCheck
b) Specs and ScalaCheck

Sunday, April 19, 2009

IntelliJ Now Supports ScalaTest

IntelliJ 8 now has support for ScalaTest (at least as of build 9805). I believe it comes in via the Scala plugin. It works, but, when I right clicked on my test class I expected to see a run option for ScalaTest and I didn't, which stinks. Not seeing it also made me think it just didn't work. It does.

To run a ScalaTest test in IntelliJ, you have to add a run configuration. Most Intellij users probably know how to do this, but for those who don't, here are the steps.

  1. Click "Edit Configurations" in the Run menu dropdown (right next to the green play arrow).

  2. Click the + button to add a new run configuration. This brings up a dropdown containing different runners.

  3. ScalaTest is in the list, choose it.

  4. Fill in the classname of your test by typing it, or selecting the "..." button to bring up the class finder dialog.

  5. Click ok.

  6. Run!

Thanks IntelliJ Scala guys! Now if you can only get that right click thing working...

Quotes

Quote from Gilad Bracha

"Tangent: Yes, I did spend ten years at Sun. A lot of it was spent arguing that Java was not the final step in the evolution of programming languages. I guess I’m just not very persuasive."

Gilad is one of my heroes. The work he's doing on Newspeak seems very, very promising.

Sun totally failed in ignoring him. I remember when he left he posted an unflattering blog about Sun, about how he felt that he couldn't get anything done there. He was very frustrated. I'm very glad to feel like I have something in common with such a smart guy.

Quote from Anders Hejlsberg

From Masterminds of Programming, which is (so far) an absolutely fabulous book:
"I think to be really good at something, you have to be passionate about it too. That's something you can't learn. That's just something that you have, I think. I got into programming not because I wanted to make lots of money or because someone told me to. I got into it because I just got totally absorbed by it. You just could not stop me. I had to write programs. It was the only thing I wanted to do. I was very, very passionate about it.

You have to have that passion to get really good at something, because that makes you put in the hours, and the hours are the real key. You need to put in a lot of work."

Anyone who knows me well knows that this describes me exactly. I particularly like, "You just could not stop me. I had to write programs, It was the only thing I wanted to do." I guess I have an important quality in common with another of my heroes. For some childish reason, I'm very excited about this.

Quote from Bill Venners

"I charge $1 for each paren, and $598 for knowing where to put them!"

'Nuff said!

Friday, April 17, 2009

Tests and Booleans and Matchers

I was working with the new ScalaTest matchers API recently when I came across something that made me think (I usually don't think). I had something simple like:

Nil.empty must be(true)

This seems harmless enough, but when I made it fail intentionally the error message I got was something reasonably similar to:

Expected true but got false.

This obviously isn't a very good error message, and I was thinking at first the maybe we should provide the user API to provide their own error message. Something like so:

Nil.empty must be(true, "expected Nil to be empty, but it wasn't!")

This also seems harmless enough. However, I talked this over with Bill Venners and in this case there is a way to do it using symbols and reflection that gives better error messages:

Nil must be('empty)

If this fails, it will give a nice error message like:

Nil was not empty

This is ok, but type safety people will scream, and they'd probably be right. And, does this work in all cases and can we always get good error messages? I think no. Let me explain why.

What if we have an object that has a method that takes an argument and returns a boolean like this theoretical surroundedBy method that could exist on String:

def surroundedBy( s: String ) = startsWith(s) && endsWith(s)

We can't use the symbol form here because the method takes a parameter. So what can we do? If we go back to the original form then we are back to our original bad error message.

"htth".surroundedBy( "h" ) must be (true)
"http".surroundedBy( "h" ) must be (false)

Do we have any other options? Yes (with caveats as well). But first lets think about what we'd really like to write here. If we were writing pure English, it would be this:

"htth" must be surrounded by "h"
"http" must not be surrounded by "h"

Matchers


ScalaTest 0.9.5 came out with a new Matchers API that I'll touch on briefly here, and point you to documentation for more. Using matchers, we can get our client code pretty close to this. We'll have to define our custom matcher first:

case class surroundedBy(r:String) extends BeMatcher[String] {
def apply(l: String) =
MatchResult(
l.surroundedBy(r),
l + " was not surrounded by " + r,
l + " was surrounded by " + r
)
}

And our client test code turns out to be pretty close to the English:

"htth" must be(surroundedBy("h"))
"http" must not be surroundedBy("h")

I like this a lot, but, only if its going to be used a lot. Otherwise, if it's a one off test, I could just kick it back way old school and use an assert with an error message:

assert( "htth".surroundedBy("h") === true,
"expected http to be surrounded by h, but it wasnt." )
assert( "http".surroundedBy("h") === false,
"expected http not to be surrounded by h, but it was!!!" )

This is less readable code, but it is less code - about 6 lines less when you include the custom matcher. You get more or less the same error message in this case as well. It is quite ugly though. In this case I'd much rather have an API that just gives me what I originally wanted:

"htth".surroundedBy("h") must be(true,
"expected htth to be surrounded by h, but it wasnt.")

or better:

"htth" must be('surroundedBy, "h")

I could see this one existing easily. It's really nice, but not type safe.

Anyway, maybe there already are better alternatives like the ones I've explained I simply don't know about. Maybe I haven't done my homework and I should be scolded. I really hope not, and I'll happily write up (and more importantly - use) anything I find that helps make this code more readable. Until alternatives surface, I guess I'll somewhat reluctantly stay with these rules:

When testing methods that return booleans you can

  • Use the symbol style ('empty) if the method takes no arguments and you aren't terrible concerned about type safety.

  • Use old school assertions when the method takes arguments but you won't have a lot of repetitive tests for the method.

  • Use a custom matcher otherwise


You'll also find that ScalaTest (and Specs which is unfortunately not covered here) provides many matchers that can use for many common types.

In my next post I'll provide a much deeper overview of how to write your own custom matchers in ScalaTest, starting by explaining what that surroundedBy case class means.

In the meantime, you can look at ScalaTest Matchers and Specs Matchers.

Here's all the custom matcher code:

package org.scalatest.examples

import org.scalatest.matchers.MustMatchers._
import org.scalatest.matchers._


class MatcherTest extends FunSuite with MustMatchers{

implicit def surroundable[T](l:String) = new {
def surroundedBy(r:String) = l.startsWith(r) && l.endsWith(r)
}

case class surroundedBy(r:String) extends BeMatcher[String] {
def apply(l: String) =
MatchResult(
l.surroundedBy(r),
l + " was not surrounded by " + r,
l + " was surrounded by " + r
)
}

test("weve got the place surrounded"){
"htth" must be(surroundedBy("h"))
"http" must not be surroundedBy("t")
}
}

Sunday, April 12, 2009

ScalaTest IDE Support coming soon.

I've written something that allows me to write my tests in ScalaTest as normal (don't have to use any TestNG annotations in my tests), but also run them in all the IDE's, piggybacking off the TestNG IDE plugins. The code isn't in ScalaTest yet, but I'm guessing it likely will get in soon.

Until then, anyone using FunSuite can borrow this code and temporarily use MiniFunSuite instead. This all seems to work quite well.

Here's the client/test code:

class FunSuiteExample extends FunSuiteTestNGAdapter{
test("hey"){ println("hey") }
test("fail"){ assert( 1 === 2 ) }
}

And here's the library code:

import org.scalatest.testng.TestNGSuite
import org.testng.annotations.{DataProvider, Test}

trait MiniFunSuite{
var testMap = Map[String, Function0[Unit]]()
def test(name:String)(f: => Unit) = testMap += (name -> f _)
}

trait FunSuiteTestNGAdapter extends MiniFunSuite with TestNGSuite{

@DataProvider{ val name="tests" }
def tests = testMap.map{ case (s,f) => Array(s,f) }.toList.toArray

@Test{ val dataProvider = "tests" }
def scalaTest(testName:String, f: => Unit) = { f }
}

Wednesday, April 08, 2009

Twitter/Ruby/Scala/Blah

For some reason I feel compelled to chime in on this debate. I like both languages. I like Scala more, but Ruby is fun.

(Reader Beware: This post was written in one pass with little to no editing.)

I don't like how people get all steamed about their languages, as if it's part of them. But, if used properly, this energy can be channeled into creating better languages, so I guess its a necessary evil. The problem is, it not often used properly. Developers seem to just get mad at each other instead of talking through things civilly, and thinking critically and objectively. This is nothing new, and nothing that hasn't been said a thousand times I'm sure, but I think serves well as a nice introduction.

Maybe the problem is that so many of the elements of programming languages are based on feel, as opposed to simple fact. I wish we'd all think more logically and have nice intelligent debates, and I wish we could support our arguments more with fact.

Now for a debate. I was tweeting with Daniel Spiewak who said,

"Obie has once again decided to completely miss the point. Twitter's legacy code is bad *because* of Ruby, not in spite. The experiences described by @stevej and @al3x precisely mirror my own: it isn't impossible to write large, maintainable Ruby...just hard."

I asked him if he could elaborate on "because of", and to that he replied,

"Ruby makes it *hard* to architect a large infrastructure and to keep it maintainable. I think that dynamic typing is part of the problem, but not all of it. Possibly the way the module system works? It isn't impossible to do large apps in Ruby (as Obie and Ola remind us), but it takes more effort and more discipline. Java and Scala both make this sort of large scale code base much easier to create and maintain."

Now, I don't necessarily disagree with that (and recall that I do like Ruby). But I'd like to point out that I've been on a lot of Java projects that were completely unmaintainable. You could easily call them absolute disasters. In fact, the majority of the Java projects that I've been on have been disasters (yes, yes get your laughs in and tell me that its my fault...but its not. I did a lot to help those failing/flailing projects). Maybe that's because I've picked bad projects, and haven't many changes to work with great teams on applications built correctly from the start.

But, I think that emphasizes my point a little. Who has? How many projects are built correctly from the ground up, by talented, disciplined developers? I'd say not many. Most developers are horrible, and the language wouldn't make a shred of difference. And for those developers who are really good, the language probably doesn't make that much difference either. A group of solid developers could easily build something far more maintainable in Ruby than the garbage developers who are writing in Java.

But can those same developers build something more maintainable in Java or Scala? I think it depends on a whole number of things (listed at the end of the post). If there's a good chance of turnover on the project, then you probably want something with type information in the code because it serves so well as documentation. But if there's little chance of turnover it probably doesn't matter.

That said, even with turnover, because Ruby is inherently higher level that Java its probably more easy to maintain. Java is just not a good language anymore, it has not stood the test of time. It came about to fix some problems with C++, and had a syntax that appealed to those developers, but that was 1994. It has not evolved. Ruby is so much more expressive than Java that the lack of type information probably doesn't mean much at all.

Scala on the other hand, probably no contest for large projects. I think Twitter made the right decision. They get the type safety, the type documentation, and the high level expressiveness all together. I do like Ruby a lot, but Scala just feels better to me, and feel is very important, even if its not measurable.

But all that could be wrong. How many large projects have actually been built in Scala? Few. I said in the beginning that I want people to think critically and objectively. It would be nice if we could base some of these debates on fact. Are there any facts out there that large Java projects are more maintainable than Ruby? There may well be, and if so, we should get that information out there. Then again, there may be documentation that supports the opposite.

So I'm not really calling for comments here, though they are always welcome. What I am calling for is experiments, evidence, observations, or facts that support that one language is more maintainable than the other for any sized project.

Here are some questions whose answers would be helpful (in no particular order):

  • What language was used?

  • What was the size of the project:

    1. LOC? Production and Test code.

    2. Number of developers?

    3. Number of testers?

  • How long did the project take?

  • What was the amount turnover?

  • What was the experience level of the developers?

  • The amount of time the team has been together?

  • The amount of time the developers have spent with the language used?

  • Type of project? (Obviously if the language chosen was a horrible choice for the project, it makes a difference)


This information is probably pretty difficult to come by. The lack of it shouldn't impede intelligent debate, but the presence of it could help us as a community.