Monday, December 22, 2008

Fun DSL Query Language for Scala

(Quick Note: If you just want to skip to the code, it's interspersed throughout the post. But, for your convenience, I've also put it all in one place at the bottom of the post.)

I've done it! Sick as a dog yesterday, I refactored the crap out of my Scrabble stuff, to create a really powerful query DSL. The kicker? It's only 24 lines of code! It allows you to do all sorts of crazily powerful queries. Lets take a look. I'm so excited! I can't stop typing exclamation points! I actually think this is the best code I've ever written. 1

I'd love to talk about the code smells of my last post, and how I ended up where I did, but to be honest, the code seems so much different now that it would be difficult. I'll just dive in.

Once again I'll start with the test code, which models the original requirements. Don't worry if you don't understand how it works just yet, because I'll explain later. For this first bit, I'm just going to try to explain what it does.

val dictionary = new Dictionary(getWords())
val tray = "defilnr"

// the 4 original requirements
val all = dictionary find includes_all(tray)
val any = dictionary find includes_any(tray)
val only = dictionary find includes_only(tray)
val all_and_only =
dictionary find (includes_all(tray) and includes_only(tray))

print(all, only, all_and_only)

Let me sum up what's going on here.
  • With "all", I've filtered the dictionary to find words that include all the letters in my tray. This produces a small list of very long words - something like: List(adenoliomyofibroma, antifederal, antifederalism, antifederalist, ... )
  • Similarly with "any", I've filtered the dictionary to find words that include any of the letters in my tray. This produces a gigantic list that I won't even begin to list here.
  • With "only", I've filtered the dictionary to find words that include only of the letters in my tray. This produces
  • a very large list - something like: List(d, d, de, dee, deed, deedeed, deer, defend, defender, defer, ... )
  • Finally, the important query, with "all_and_only", filtered the dictionary to find words that include words that include all the letters in my tray, and only the letters in my tray, just as the code reads. This produces something much more helpful to me in Scrabble: List(flinder, infielder)

A key thing to notice is the "and" on this line:

dictionary.filter(includes_all(tray) and includes_only(tray))

But like I said, I'll explain the hows later. Now I'll introduce a couple of new concepts with this line of code:

val all_or_min_10 = dictionary(includes_all(tray) or max_length(7))

The two new concepts on this line are:
  • "or", which is very similar to "and" and very self explanatory.
  • Notice I'm not calling dictionary "find" some_criteria, I'm just calling dictionary(some_criteria). I'll explain how this works later as well. I'm not actually sure I like it, but I figured I'd give it a shot to see how the code looks.

Time for more:

val xin_or_max_4 =
dictionary search (starts_with("xin") or max_length(4))
val van_or_min_4 =
dictionary search (ends_with("van") and min_length(4))

Some more new concepts on these lines are:
  • The "search" method is also an alias for find. But, the actual aliasing is pretty neat, so it will be cool to show how later.
  • Several new criteria functions - starts_with, ends_with, max_length, min_length, which I think are also very self explanatory in what they do.

If I'm moving quickly its because I'm really excited to get into the next part, where I build up a much more complex query.

Complex Queries


val complex_query =
(
(starts_with("d") and ends_with("ing")) or
(starts_with("b") and ends_with("ed"))
) and max_length(7) and min_length(4)

var result = dictionary find complex_query
print(result)

Yes! This seems to have really put it all together. In this query I'm asking for words that start with 'd' and end with 'ing', OR words that start with 'b' and end with 'ed', and any words found must have a maximum length of 7 and a minimum length of 4. I build up the query first, and then ask the dictionary to execute it for me. This seemingly innocent feature plays an important role in abstraction. Which hopefully we'll get to see in just a bit.

Executing this query gives me something like: List(babied, backed, bagged, baked, balled, banded, bangled, banked, barbed, barred, ... dueling, duffing, dumping, during, dusting, dyeing, dying)

Adding onto existing queries


But, I'm not satisfied with my results. Maybe I can't seen to find a place on the Scrabble board where an "a" can be used, and I don't have one in my tray. I want to re-execute my query, but remove all the a's. Here's how:

val complex_query_without_a = complex_query and includes_none("a")
result = dictionary find complex_query_without_a
print(result)

Fancy Schmancy! Executing this query gives me: List(bebed, bebled, bedded, beedged, beetled, beeweed, beinked, beliked, ... dueling, duffing, dumping, during, dusting, dyeing, dying)

Language Features


Before we go into the hows, lets take a very, very quick theory break. Taken straight from SICP, any language gives you three things:
  1. Primitive Expressions
  2. A means of combination
  3. A means of abstraction

Does this little DSL cover those criteria? Yes (though point 3 is somewhat shaky). Anyway, as we cover the hows, we'll refer back to those points to demonstrate how this DSL fulfills them.

Primitive Expressions


We're finally to the implementation (though I suppose many of you probably just scrolled down).

The query criteria methods are basically our primitive expressions in this DSL. 2 Here is the list of primitives provided, and their implementations. I suspect that other primitives will be added soon.

1 object WordMatchers{
2 def includes_only(s: String) = (t: String) =>
inverse(s).foldLeft(true)((b,c) => b && (! (t contains c)))
3 def includes_none(s: String) = (t: String) =>
s.foldLeft(true)((b,c) => b && (! (t contains c)))
4 def includes_all(s: String) = (t: String) =>
s.foldLeft(true)((b,c) => b && (t contains c))
5 def includes_any(s: String) = (t: String) =>
s.foldLeft(false)((b,c) => b || (t contains c))
6 def starts_with(s: String) = (t: String) => t startsWith s
7 def ends_with(s: String) = (t: String) => t endsWith s
8 def max_length(max: Int) = (s: String) => s.trim.length <= max
9 def min_length(min: Int) = (s: String) => s.trim.length >= min
10 def inverse( s: String ) =
"abcdefghijklmnopqrstuvwxyz".filter(!s.contains(_) )
11 }

Ok...I hope no one kills me when I say I'm going to save the how for this one for the next post. That's the implementation. It's pretty crazy, and can probably only be read by pretty experienced developers. I'm really tired and still not feeling great, so I'm going to wait so I don't mess it up.

Means of Combination


The means of combination in this DSL are very simple (at least from a users standpoint). They are simply and and or.

The implementation of those combinations are pretty complex though.

1 object Criteria{
2 implicit def funcToCriteria[T](f: Function1[T, Boolean]) =
new Criteria(f)
3 }
4 class Criteria[T](val f: Function1[T, Boolean]) {
5 def apply(t: T) = f(t)
6 def && (c: Criteria[T]): Criteria[T] = (t: T) => f(t) && c.f(t)
7 def || (c: Criteria[T]): Criteria[T] = (t: T) => f(t) || c.f(t)
8 }

Criteria is some seriously powerful magic, and is totally awesome. I'm also going to wait, and explain both the primitives, and combinations in the next post, which will probably be just tomorrow night. Maybe I'll just update it right here. Maybe I shouldn't even post this yet until I'm finished, but I know there are people that read my blog and this is too cool to wait on!

Execution


Before we take a look at our means of abstraction, lets take a quick look at the Dictionary class, which is something like the DSL's runtime environment, and VM. It holds the words (the environment), and executes the queries which search over those words (VM).

1 case class Dictionary( words: List[String] ) {
2 def apply(criteria: Criteria[String]) = words.filter(criteria(_))
3 def find = apply _; def filter = apply _; def search = apply _;
4 }

Its fairly simple, simply asking the Criteria to evaluate themselves, for each word in the dictionary. If the word doesn't fit the criteria, it is filtered out.

One thing that I really wanted to get to (which doesn't have a whole lot of importance, but is really cool) is the method aliasing. Check out line 3. I've aliased find, filter, and search all to apply, so that a user can call any of them. Maybe that's confusing and not needed, but it was fun.

Means of Abstraction


This DSL borrows its means of abstraction from the Scala language itself, but so far does a pretty poor job of it.

Let's revisit the very last example:

val complex_query_without_a = complex_query and includes_none("a")
result = dictionary find complex_query_without_a
print(result)

I intentionally left out the declaration of complex_query. Do you remember what it does? I suspect not. I could tell you that its a query "asking for words that start with 'd' and end with 'ing', OR words that start with 'b' and end with 'ed', and any words found must have a maximum length of 7 and a minimum length of 4. But, I shouldn't have to. Somehow the code should provide that information upon reading it. There are ways we could do this, but they aren't great, in my opinion.

We could say:

val starts_with_d_and_ends_with_ing_OR_starts_with_b_and_on_and_on_and_on =
(
(starts_with("d") and ends_with("ing")) or
(starts_with("b") and ends_with("ed"))
) and max_length(7) and min_length(4)

But, that really doesn't work. I mean, not only does the variable name go right off the stinkin' page, but the entire implementation is encoded into it. I think that its possible that this DSL fails at doing a great job of abstraction because the underlying nature of the thing its trying to represent - queries matching specific criteria - doesn't really lend itself to abstraction very well. If anyone has any ideas on how this could be improved, I'd be grateful.

OK....that's all for now. I know this thing sort of deteriorated at the end as I got tired, but I'll fix it up really soon. Hope you love it!

Implementation


As promised, here's all the code in one place:

package com.joshcough.scrabble

import Criteria._
import Dictionary._
import WordMatchers._
import org.testng.annotations.Test
import scala.io.Source

class ScrabbleTest {

val lines = Source.fromFile("/usr/share/dict/words").getLines
val words = lines.toList.map(_.trim.toLowerCase)
val dictionary = Dictionary(words)

@Test def tests {

val tray = "defilnr"

// the 4 original requirements
val all = dictionary find includes_all(tray)
val any = dictionary find includes_any(tray)
val only = dictionary find includes_only(tray)
val all_and_only = dictionary find
(includes_all(tray) and includes_only(tray))
print(all, only, all_and_only)



// additional cool things
// (use of Dictionary.apply and Criteria's "||" method )
val all_max_7 =
dictionary(includes_all(tray) and max_length(7))
val all_or_min_10 =
dictionary(includes_all(tray) or max_length(7))
print(all_max_7, all_or_min_10)



// more things (use of Dictionary.filter,
// starts_with, ends_with, max_length, min_length)
val xin_or_max_4 =
dictionary search (starts_with("xin") or max_length(4))
val van_or_min_4 =
dictionary search (ends_with("van") and min_length(4))
print(xin_or_max_4, max_4_or_van)



// more complex query
val complexQuery =
(
(starts_with("d") and ends_with("ing")) or
(starts_with("b") and ends_with("ed"))
) and max_length(7) and min_length(4)

var result = dictionary find complexQuery
print(result)



// hmmm, but i didnt like my result...
// how about i modify my query...
val complexQueryWithoutAs = complexQuery and includes_none("a")
result = dictionary find complexQueryWithoutAs
print(result)
}

def print[T](lists: List[T]*) = lists foreach println
}

/**
* @author dood
* Date: Dec 21, 2008
* Time: 10:09:24 PM
*/
object Criteria{
implicit def funcToCriteria[T](f: Function1[T, Boolean]) =
new Criteria(f)
}

class Criteria[T](val f: Function1[T, Boolean]) {
def apply(t: T) = f(t)
def and (c: Criteria[T]): Criteria[T] = (t: T) => f(t) && c.f(t)
def or (c: Criteria[T]): Criteria[T] = (t: T) => f(t) || c.f(t)
}

/**
* @author dood
* Date: Dec 21, 2008
* Time: 11:24:22 AM
*/
case class Dictionary( words: List[String] ) {
def apply(criteria: Criteria[String]) = words.filter(criteria(_))
def find = apply _; def filter = apply _; def search = apply _;
}

/**
* @author dood
* Date: Dec 22, 2008
* Time: 3:12:35 PM
*/
object WordMatchers{
def includes_only(s: String) = (t: String) =>
inverse(s).foldLeft(true)((b,c) => b && (! (t contains c)))
def includes_none(s: String) = (t: String) =>
s.foldLeft(true)((b,c) => b && (! (t contains c)))
def includes_all(s: String) = (t: String) =>
s.foldLeft(true)((b,c) => b && (t contains c))
def includes_any(s: String) = (t: String) =>
s.foldLeft(false)((b,c) => b || (t contains c))
def starts_with(s: String) = (t: String) => t startsWith s
def ends_with(s: String) = (t: String) => t endsWith s
def max_length(max: Int) = (s: String) => s.trim.length <= max
def min_length(min: Int) = (s: String) => s.trim.length >= min
def inverse( s: String ) =
"abcdefghijklmnopqrstuvwxyz".filter(! s.contains(_) )
}


(1) - I hate the NYSE for keeping me in an intellectual prison for so long, and hate myself for not leaving sooner.

(2) - However, this is an internal DSL, and so you are free to build up more and more primitives, which makes them seem not really like primitives at all...

No comments:

Post a Comment