Saturday, September 20, 2008

Creating Control Constructs in Scala

When using inner classes in Java, you can either create a named inner class or an anonymous inner class. Named inner classes are useful when you want to reference them in more than one place, and anonymous inner classes are useful for in-line single use.

Similarly, in Scala you can use either named or anonymous functions. Named functions are useful when you want to reference them in more than one place, and anonymous functions (function literals) are useful for in-line single use. In particular, anonymous functions can be used to create new control constructs that look almost identical to Scala's native control constructs.

Some Scala Syntax

For a more complete list of Scala syntax, see my Scala Syntax Primer. Below are the Scala syntax rules that contribute to the ability to make your own control constructs. You can try out these code snippets in the scala interpreter.
  • All statements are expressions. In any place where a single expression is expected, a block of expressions contained in braces can be used. The value of the block is the value of the last expression in the block. The braces also act as parentheses. The expression
    5 * { val a = 2; a + 1 }
    is valid and returns the value 15.
  • A method that takes a single argument can be written after the object instance with no dot and no parentheses. For example, a call to the String.charAt method, which is normally written like this:
    "abc".charAt(1)
    can instead be written like this:
    "abc" charAt 1
    Or, since we can use braces as parentheses, like this:
    "abc" charAt { 1 }
  • A method that takes multiple arguments can be curried, i.e. the arguments can be divided into multiple argument lists. For example, a method that takes two Strings, normally written like this:
    def cat(s1:String, s2:String) = s1+s2
    can instead be written like this:
    def cat(s1:String)(s2:String) = s1+s2
    The syntax for invoking the method matches the definition. Because we can use braces instead of parentheses, we can invoke that second definition like this:
    cat("a"){ "b" }
    or of course in place of "b" we could use any other expression that evaluates to a string.
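
The three rules above can be tried together in one small sketch. The wrapper object SyntaxRules and the value names are only there for illustration; cat is the curried method from the last rule.

```scala
object SyntaxRules {
  // Rule 1: a block is an expression whose value is its last expression.
  val blockValue: Int = 5 * { val a = 2; a + 1 }

  // Rule 2: a one-argument method called without dot or parentheses,
  // first with a bare argument, then with braces acting as parentheses.
  val infixChar: Char = "abc" charAt 1
  val bracedChar: Char = "abc" charAt { 1 }

  // Rule 3: a curried method whose last argument list is written in braces.
  def cat(s1: String)(s2: String): String = s1 + s2
  val joined: String = cat("a") { val suffix = "b"; suffix * 2 }
}
```

Here the braces after cat("a") hold a whole block rather than a single literal, which is the property that the control constructs below rely on.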

Synchronized

An example of a standard Scala method that looks like a control construct is the synchronized method. In Java, the synchronized keyword is a reserved word in the language with a special syntax, but in Scala synchronized is a method on the AnyRef class that takes as its single argument a code block to be executed. Whereas in Java you would write
synchronized(myLock) { /* code to be synchronized */ }
in Scala you would write
myLock synchronized { /* code to be synchronized */ }
or, if you are synchronizing on this, you can just write
synchronized { /* code to be synchronized */ }
The synchronized method has a single argument, so by the Scala syntax rules you can write it without the dot and parentheses. In the last example, where the object is implicitly this, you do have to include either parentheses or braces around the expression or code block to be synchronized.
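
As a minimal sketch of both call styles, here is a small counter guarded by an explicit lock. The names SyncDemo, Counter and parallelCount are made up for this example; only synchronized itself comes from Scala.

```scala
object SyncDemo {
  final class Counter {
    private val lock = new Object
    private var count = 0

    // Infix form: no dot, braces instead of parentheses.
    // The value of the block is the value of the synchronized call.
    def increment(): Int = lock synchronized { count += 1; count }

    // The same method written with a dot and parentheses.
    def current: Int = lock.synchronized(count)
  }

  // Runs `threads` threads that each increment a shared counter `perThread` times.
  def parallelCount(threads: Int, perThread: Int): Int = {
    val counter = new Counter
    val workers = (1 to threads).map { _ =>
      new Thread(() => for (_ <- 1 to perThread) counter.increment())
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    counter.current
  }
}
```

Because synchronized is an ordinary method, its result can be returned directly, as increment does here.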

Your Own Control Constructs

Here is a typical pattern used in Java:
    //Java code
    Connection conn = DriverManager.getConnection(myDbLocation);
    try {
        //SQL commands using conn
        sqlCommand1(conn);
        sqlCommand2(conn);
    } finally {
        if (conn!=null) {
            try {
                conn.close();
            } catch (SQLException ex) {
                System.out.println("Error closing database connection");
            }
        }
    }
This pattern is used when you need to ensure that a resource is closed after use. The SQL commands might be just a couple of lines of in-line code that does an SQL query or update. The surrounding boilerplate code is another 10 or so lines of code, which is often significantly more than the work code. If your boilerplate also handles one or more exceptions in a standard way, it can be much larger. It would be nice to factor out this boilerplate into a method that can be reused. As a functional language, Scala makes this easy to do.

Here is a Scala function that can be used to replace the above Java pattern:
    import java.sql.{Connection, DriverManager, SQLException}

    def withConnection(location:String)(body: (Connection) => Unit): Unit = {
      val conn = DriverManager.getConnection(location)
      try {
        body(conn)
      } finally {
        if (conn!=null) {
          try {
            conn.close()
          } catch {
            //Connection.close throws SQLException, not IOException
            case ex:SQLException => println("Error closing database connection")
          }
        }
      }
    }
The signature of body, (Connection) => Unit, says that it is a function that takes a single argument of type Connection and returns no value (Unit). We would typically call withConnection like this:
    withConnection(myDbLocation) { conn =>
      //SQL commands using conn
      sqlCommand1(conn)
      sqlCommand2(conn)
    }
In the above usage example, we are passing withConnection an anonymous function as the value for withConnection's body parameter. The conn => tells the compiler the name of the one argument to the function (conn), with the stuff after the => (on the following lines up until the matching close brace) being the body of the function.

Alternatively, if we have a named function that takes a Connection as an argument (such as "doSqlCommands" in this example), we can pass it instead like this:
withConnection(myDbLocation)(doSqlCommands)
Our withConnection method is designed specifically for a database connection. We could write a similar method to open a File and ensure that it gets closed, or for any resource for which we want to guarantee cleanup.
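
For example, a file-based counterpart might look like the following sketch. It follows the same shape as withConnection; the wrapper object FileLoan and the helper firstByteOf are hypothetical names used only to make the example self-contained.

```scala
import java.io.{FileInputStream, IOException}

object FileLoan {
  // Opens the file, hands the stream to body, and always closes the stream.
  def withFileInput(location: String)(body: FileInputStream => Unit): Unit = {
    val in = new FileInputStream(location)
    try {
      body(in)
    } finally {
      try in.close()
      catch { case ex: IOException => println("Error closing file input stream") }
    }
  }

  // Usage example: returns the first byte of the file, or -1 if it is empty.
  def firstByteOf(location: String): Int = {
    var result = -1
    withFileInput(location) { in => result = in.read() }
    result
  }
}
```

Since body returns Unit, the caller has to carry results out through a variable, as firstByteOf does.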

We can easily compose two such functions to create a convenience function that opens two resources and ensures both are closed. For example, assume we have the above withConnection function, and we implement a similar withFileInput function with this signature:
def withFileInput(location:String)(body: (FileInputStream) => Unit):Unit
Now we can write a withFileInputAndConnection method that opens one of each:
    def withFileInputAndConnection(fileLocation:String, dbLocation:String)(
        body: (FileInputStream, Connection) => Unit): Unit = {
      withFileInput(fileLocation) { f =>
        withConnection(dbLocation) { conn =>
          body(f, conn)
        }
      }
    }
This code would be called like this:
    withFileInputAndConnection(myFileLocation,myDbLocation) { (file, conn) =>
      doFileCommand(file)
      doSqlCommand(conn)
      doFileAndSqlCommand(file,conn)
    }
Or we can pass in a named function:
withFileInputAndConnection(myFileLocation,myDbLocation)(doFileAndSqlCommand)
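
To see what the nesting buys us, here is a small sketch that uses a made-up Tracked resource in place of a file or connection and records the order of events. Everything in ComposeDemo is hypothetical; the point is only that the inner resource is closed before the outer one, even when the body throws.

```scala
import scala.collection.mutable.ListBuffer

object ComposeDemo {
  val log = ListBuffer.empty[String]

  // A stand-in resource that logs when it is opened and closed.
  final class Tracked(val name: String) {
    log += s"open $name"
    def close(): Unit = log += s"close $name"
  }

  def withTracked(name: String)(body: Tracked => Unit): Unit = {
    val r = new Tracked(name)
    try body(r) finally r.close()
  }

  // Composed exactly like withFileInputAndConnection: one call nested in the other.
  def withTwoTracked(n1: String, n2: String)(body: (Tracked, Tracked) => Unit): Unit =
    withTracked(n1) { a =>
      withTracked(n2) { b =>
        body(a, b)
      }
    }
}
```

A call such as ComposeDemo.withTwoTracked("a", "b") { (x, y) => ... } opens a, then b, runs the body, then closes b and finally a.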

Refactoring

In the above examples I have suggested writing a function, but I have not mentioned where it should live. You might gather the withFileInput, withConnection and withFileInputAndConnection functions together into one utility class, but there is another approach that allows these various withSomething functions to share their structure.

Rather than defining a withConnection function, you can define a WithConnection object with an apply method that does the same thing as our previous function. You can then write the object name as if it were a function name, making its usage look the same as any other function.

Here is WithConnection coded as an object:
    object WithConnection {
      def apply(location:String)(body: (Connection) => Unit): Unit = {
        val conn = DriverManager.getConnection(location)
        try {
          body(conn)
        } finally {
          if (conn!=null) {
            try {
              conn.close()
            } catch {
              //Connection.close throws SQLException, not IOException
              case ex:java.sql.SQLException => println("Error closing database connection")
            }
          }
        }
      }
    }
A call to this function looks identical to a call to the previous withConnection function, except with an initial upper case W character:
    WithConnection(myDbLocation) { conn =>
      //SQL commands using conn
      sqlCommand1(conn)
      sqlCommand2(conn)
    }
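
The apply rule is independent of resources, so it can be tried without a database. In this sketch, Repeat is a hypothetical object invented for the example; writing Repeat(n) { ... } is translated by the compiler into Repeat.apply(n) { ... }.

```scala
import scala.collection.mutable.ListBuffer

object ApplyDemo {
  // An object with a curried apply method, callable as if it were a function.
  object Repeat {
    def apply(times: Int)(body: Int => Unit): Unit =
      for (i <- 0 until times) body(i)
  }

  // Uses the sugared form Repeat(n) { ... }.
  def squares(n: Int): List[Int] = {
    val buf = ListBuffer.empty[Int]
    Repeat(n) { i => buf += i * i }
    buf.toList
  }

  // The same call with apply written out explicitly.
  def squaresExplicit(n: Int): List[Int] = {
    val buf = ListBuffer.empty[Int]
    Repeat.apply(n) { i => buf += i * i }
    buf.toList
  }
}
```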
We know that the WithFileInput object would look very similar to WithConnection, so we can refactor the common code into a class that we can use as a superclass. We will call this base class WithResource. Since we now don't know the type of the resource we are dealing with, we use an abstract type R in our base class and let the subclass fill in the actual type. Likewise for the specifier of the resource, which in both of our examples has been a String but could be something else for some other resource.
    abstract class WithResource {
      type S    //the type of the resource specifier
      type R    //the type of the resource
      val resourceTypeName:String
      def openResource(spec:S):R
      def closeResource(resource:R): Unit
      //everything above here is abstract, subclass must define.

      def apply(spec:S)(body: (R) => Unit): Unit = {
        val resource = openResource(spec)
        try {
          body(resource)
        } finally {
          if (resource!=null) {
            try {
              closeResource(resource)
            } catch {
              case ex:Exception => println("Error closing "+resourceTypeName)
            }
          }
        }
      }
    }
Given the above base class, we can now more succinctly define our WithConnection and WithFileInput objects:
    object WithConnection extends WithResource {
      type S = String
      type R = Connection
      val resourceTypeName = "database connection"
      def openResource(location:String) = DriverManager.getConnection(location)
      def closeResource(conn:Connection) = conn.close()
    }

    object WithFileInput extends WithResource {
      type S = String
      type R = FileInputStream
      val resourceTypeName = "FileInputStream"
      def openResource(location:String) = new FileInputStream(location)
      def closeResource(input:FileInputStream) = input.close()
    }
This call to WithFileInput opens /etc/fstab, prints out all the lines in it, and closes the file:
WithFileInput("/etc/fstab")( scala.io.Source.fromInputStream(_).getLines.foreach(print))
We can make our extending objects even smaller by adding type and constructor parameters to WithResource:
    abstract class WithResource[S,R](
        openResource: (S)=>R,
        closeResource: (R)=>Unit,
        resourceTypeName:String) {

      def apply(spec:S)(body: (R) => Unit): Unit = {
        val resource = openResource(spec)
        try {
          body(resource)
        } finally {
          if (resource!=null) {
            try {
              closeResource(resource)
            } catch {
              case ex:Exception => println("Error closing "+resourceTypeName)
            }
          }
        }
      }
    }

    import java.sql.{Connection,DriverManager}
    object WithConnection extends WithResource[String,Connection](
        DriverManager.getConnection, _.close, "database connection")

    import java.io.{FileInputStream}
    object WithFileInput extends WithResource[String,FileInputStream](
        new FileInputStream(_), _.close, "FileInputStream")
We can do a better job when creating a function to open two resources by creating a generic version and extending that generic version to get our specific version. Here we define a generic class for managing two resources, then create an object to manage two files, with an example use of that object to print out the first three lines of two files:
    class WithTwoResources[S1,R1,S2,R2](
        w1:WithResource[S1,R1], w2:WithResource[S2,R2]) {
      def apply(spec1:S1, spec2:S2)(body: (R1,R2) => Unit): Unit = {
        w1(spec1)(res1=> w2(spec2)(res2=> body(res1,res2)))
      }
    }

    object WithTwoFileInputs extends WithTwoResources(
        WithFileInput,WithFileInput)

    import scala.io.Source
    def test2Files = WithTwoFileInputs("/etc/passwd","/etc/group") {
      (in1, in2) =>
        Source.fromInputStream(in1).getLines.take(3).foreach(print)
        Source.fromInputStream(in2).getLines.take(3).foreach(print)
    }
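
The two resources do not have to be of the same type. The following sketch pairs two in-memory resources with different specifier and resource types so that it can run without touching the file system. The base class is a condensed version of the one in the text (no error reporting), and WithReader, WithBytes, WithReaderAndBytes and firstCharAndByte are hypothetical names chosen for this example.

```scala
import java.io.{ByteArrayInputStream, StringReader}

object TwoResourcesDemo {
  // Condensed WithResource: open, run the body, always close.
  abstract class WithResource[S, R](openResource: S => R, closeResource: R => Unit) {
    def apply(spec: S)(body: R => Unit): Unit = {
      val resource = openResource(spec)
      try body(resource) finally closeResource(resource)
    }
  }

  class WithTwoResources[S1, R1, S2, R2](
      w1: WithResource[S1, R1], w2: WithResource[S2, R2]) {
    def apply(spec1: S1, spec2: S2)(body: (R1, R2) => Unit): Unit =
      w1(spec1)(res1 => w2(spec2)(res2 => body(res1, res2)))
  }

  object WithReader extends WithResource[String, StringReader](
      s => new StringReader(s), _.close())

  object WithBytes extends WithResource[Array[Byte], ByteArrayInputStream](
      b => new ByteArrayInputStream(b), _.close())

  // Type arguments are spelled out here; they could also be left to inference.
  object WithReaderAndBytes
    extends WithTwoResources[String, StringReader, Array[Byte], ByteArrayInputStream](
      WithReader, WithBytes)

  // Reads the first character of the text and the first byte of the array.
  def firstCharAndByte(text: String, bytes: Array[Byte]): (Int, Int) = {
    var result = (-1, -1)
    WithReaderAndBytes(text, bytes) { (reader, in) =>
      result = (reader.read(), in.read())
    }
    result
  }
}
```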

Object Functional

Scala is an Object Functional language: it is both an object-oriented language and a functional language, and it does a good job of integrating those two styles. Every function (when passed as a value) is an instance of a class, and as with any class, a function class can be extended to make subtypes that refine its behavior. The "apply" syntax rule (stating that an object name followed by arguments in parentheses is translated into a call to the apply method) means you can treat any class as a function class by defining the apply method, as we have done above.

Let's see how we can take advantage of Scala's integrated object-functional capabilities to improve our control class. We'll start by adding a mechanism for handling exceptions.
    abstract class WithResource[S,R](
        openResource: (S)=>R,
        closeResource: (R)=>Unit,
        resourceTypeName:String) {

      def apply(spec:S)(body: (R) => Unit): Unit = {
        val resource = openResource(spec)
        try {
          body(resource)
        } catch {
          case ex:Exception => handleException(ex)
        } finally {
          if (resource!=null) {
            try {
              closeResource(resource)
            } catch {
              case ex:Exception => println("Error closing "+resourceTypeName)
            }
          }
        }
      }

      protected def handleException(ex:Exception): Unit = {
        throw(ex)
      }
    }
In the above code, we have added the method handleException with a default implementation that does not change the behavior of the apply method, so subclasses which do not want to bother with implementing any special exception handling can stay as they were. A subclass which does want to implement exception handling can implement its own handleException method. Doing so in effect refines a part of the WithResource function.
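
As an illustration of such a refinement, here is a sketch of a subclass that records the failure instead of rethrowing it. The base class is a condensed copy of the one above; WithQuietReader, its lastError field, and the choice of a StringReader as the resource are assumptions made only for this example.

```scala
import java.io.StringReader

object HandleExceptionDemo {
  // Condensed copy of the WithResource class from the text.
  abstract class WithResource[S, R](
      openResource: S => R,
      closeResource: R => Unit,
      resourceTypeName: String) {

    def apply(spec: S)(body: R => Unit): Unit = {
      val resource = openResource(spec)
      try {
        body(resource)
      } catch {
        case ex: Exception => handleException(ex)
      } finally {
        if (resource != null) {
          try closeResource(resource)
          catch { case ex: Exception => println("Error closing " + resourceTypeName) }
        }
      }
    }

    // Default: behave as if there were no catch clause at all.
    protected def handleException(ex: Exception): Unit = throw ex
  }

  // Hypothetical resource: a StringReader opened from the text it should read.
  object WithQuietReader extends WithResource[String, StringReader](
      text => new StringReader(text), _.close(), "string reader") {

    var lastError: Option[String] = None

    // Record the failure instead of rethrowing it.
    override protected def handleException(ex: Exception): Unit =
      lastError = Some(ex.getMessage)
  }
}
```

With this override, a body that throws no longer propagates the exception to the caller; the resource is still closed by the finally clause.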

Next we will add a try/catch block around the openResource call.
    ...
        val resource = try {
          openResource(spec)
        } catch {
          case ex:Exception => handleOpenException(ex,spec)
          case ex:Throwable => throw(ex)
        }
    ...
      protected def handleOpenException(ex:Exception, spec:S):R = {
        throw(ex)
      }
Here we are taking advantage of the fact that all statements in Scala return a value, including a try/catch statement. We make the handleOpenException method return a value of the same type as the openResource method so that a subclass can implement a method that handles the problem and returns an opened resource with which to continue processing.
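
The same idea in isolation, as a minimal sketch: the value of a try/catch is either the value of the try block or the value of the matching case, so it can initialize a val directly. The names TryExpressionDemo and parsePort are made up for this example.

```scala
object TryExpressionDemo {
  // Returns the parsed number, or the default if the text is not a number.
  def parsePort(text: String, default: Int): Int = {
    val port = try {
      text.trim.toInt
    } catch {
      case _: NumberFormatException => default
    }
    port
  }
}
```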

Lastly we will simplify the subclasses of WithResource by making the assumption that most resources can be closed with a call to a close method.
    abstract class WithResource[S,R](
        openResource: (S)=>R,
        //we have removed the closeResource argument
        resourceTypeName:String) {

      def apply(spec:S)(body: (R) => Unit): Unit = {
        val resource = try {
          openResource(spec)
        } catch {
          case ex:Exception => handleOpenException(ex,spec)
          case ex:Throwable => throw(ex)
        }
        try {
          body(resource)
        } catch {
          case ex:Exception => handleException(ex)
        } finally {
          if (resource!=null) {
            try {
              closeResource(resource)
            } catch {
              case ex:Exception => println("Error closing "+resourceTypeName)
            }
          }
        }
      }

      protected def handleOpenException(ex:Exception, spec:S):R = {
        throw(ex)
      }

      protected def closeResource(resource:R): Unit = {
        type HasCloseMethod = { def close(): Unit }
        resource match {
          case r:java.io.Closeable => r.close()
          //structural types are erased, so this case is not checked at runtime
          case r:HasCloseMethod => r.close()
          case _ => throw new RuntimeException("no close method available")
        }
      }

      protected def handleException(ex:Exception): Unit = {
        throw(ex)
      }
    }
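In our default implementation of closeResource we try to close the resource in two ways: firstly by checking whether it implements the standard Java Closeable interface, and secondly by using Scala's structural typing ("duck typing") to call its close method reflectively. Note that a pattern match on a structural type is not checked at runtime (the compiler warns about this), so the second case accepts any resource that is not a Closeable; a resource without a close method fails at the reflective call rather than reaching the final case, and that failure is reported by the catch clause in apply. Our WithConnection and WithFileInput objects can use our default closeResource method since each of FileInputStream and Connection has a close method.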
    import java.sql.{Connection,DriverManager}
    object WithConnection extends WithResource[String,Connection](
        DriverManager.getConnection, "database connection")

    import java.io.{FileInputStream}
    object WithFileInput extends WithResource[String,FileInputStream](
        new FileInputStream(_), "FileInputStream")
If we want to implement a WithResource object for a resource that has some other way to close it, our extending object can implement its own closeResource method, as in our original implementation of WithFileInput above.
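
As a sketch of that last case, consider a thread pool, which is released with shutdown rather than close. The base class below is a simplified stand-in for the one in the text (it only recognizes AutoCloseable and skips the exception hooks); WithThreadPool and poolIsShutdownAfterUse are hypothetical names for this example.

```scala
import java.util.concurrent.{ExecutorService, Executors}

object CustomCloseDemo {
  // Simplified stand-in for WithResource with an overridable closeResource.
  abstract class WithResource[S, R](openResource: S => R, resourceTypeName: String) {
    def apply(spec: S)(body: R => Unit): Unit = {
      val resource = openResource(spec)
      try body(resource)
      finally {
        if (resource != null) {
          try closeResource(resource)
          catch { case ex: Exception => println("Error closing " + resourceTypeName) }
        }
      }
    }

    protected def closeResource(resource: R): Unit = resource match {
      case r: AutoCloseable => r.close()
      case _ => throw new RuntimeException("no close method available")
    }
  }

  // The specifier is the number of threads; the pool is released with shutdown().
  object WithThreadPool extends WithResource[Int, ExecutorService](
      n => Executors.newFixedThreadPool(n), "thread pool") {
    override protected def closeResource(pool: ExecutorService): Unit = pool.shutdown()
  }

  // Usage example: the pool is live inside the body and shut down afterwards.
  def poolIsShutdownAfterUse(): Boolean = {
    var captured: ExecutorService = null
    var liveInside = false
    WithThreadPool(2) { pool =>
      captured = pool
      liveInside = !pool.isShutdown
    }
    liveInside && captured.isShutdown
  }
}
```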

Conclusion

Here is what we have:
  • We have factored out all of the boilerplate in our original Java code example into our WithResource class and the objects that extend it, which we invoke as functions.
  • We can call our WithResource function for any defined resource type with only one or two lines of code that look very similar to native control constructs.
  • We can define a WithResource function for a new resource type with only two or three lines of code.
  • We can define a function for any two resources with only one or two lines of code.
  • We can enhance the functionality of all of our WithResource functions by modifying our one WithResource base class, as exemplified by our addition of exception handling in the apply method.
We were able to do all this in Scala because of its syntax and capabilities:
  • The "apply" rule allows us to treat any object as a function.
  • The ability to pass a function literal as an argument allows us to pass in code that looks like any other code block.
  • The ability to define a function with a curried argument list allows us to split the arguments into two lists, so that we can put the last argument in a list by itself and take advantage of the following rule.
  • The optional "no period and parentheses when one argument" method rule allows us to call our method in a way that looks like a built-in control construct.
  • The fact that all statements in Scala return a value allows us to use a try/catch clause and assign the result to an immutable val.
  • The ability to use a structural type ("duck typing") allows our WithResource default implementation to call close on any resource with a close method.

3 comments:

Chris Bouzek said...

Nice coverage of this topic, Jim. I've linked to this post from my original stream post.

Chris Bouzek said...

Jim,

This is very useful code. How have you licensed it? I'd like to include it in my picture tagging webapp (currently a personal project that I may someday host somewhere).

Thanks!

Jim McBeath said...

The code in my blog entries is now licensed under LGPLv3, which I have also added as a note in the sidebar. Thanks for bringing up this issue.