SqlJuxt – Speeding up the tests

SqlJuxt is coming along nicely. I have a great test pack in the project that tests through the top level API. I have spoken before about the value of this (it gives you the freedom to change your implementation etc.) so I won’t harp on about it again. The problem is that the tests take 2 minutes 40 seconds to run, which is clearly too long, especially when you consider that that number is going to keep on growing as I add more tests.

The first step in debugging why the tests were taking so long was to turn on timings inside the ReSharper test runner. This gives you a breakdown of how long each test takes. From that dialog I could see that all of the tests that created two databases for comparison were taking around 6 seconds. I then put timings around each line in one of these tests and found that the test itself ran in around 0.3 seconds and the test clean-up took the other 5.7 seconds. This led me to realise that it was the dropping of the database that was causing the slowdown.

To prove this theory I commented out the drop database code (which, if you remember, is in the Dispose method on the DisposableDatabase type). I ran the whole test pack again and it ran in 12 seconds. Wowzer! Clearly this is not a solution though, as each test run would leave behind lots of databases.

I noticed that if you drop a database after the tests have finished it happens pretty much instantaneously, but it takes ages when it happens as the last step of a test. This led me to realise that something in my test was hanging on to the database connection, forcing the drop to rip the connection away. I examined my test code and I definitely close the SqlConnection, so this left me stumped for a little while. Then I got to reading about connection pooling and it twigged that ADO.NET turns on connection pooling by default. What this means is that when you close a SQL connection the connection does not get completely destroyed; instead it gets put into a sleeping state, to speed up getting a connection the next time you want one. In nearly all applications this is what you want, as you are going to be continually connecting to your database and you do not want to keep going through the overhead of the handshake to set up a connection each time.

However, in my scenario this is not what I want. I am making a database connection, running in a script to set up a schema, comparing that schema and then dropping the database, at which point I no longer need the connection. The solution to my issue was simply to turn off connection pooling. This means that when you close the connection it is completely gone, so when you drop the database it drops pretty much instantly as there are no lingering connections to rip away. To turn off connection pooling you simply add “pooling=false;” to the connection string. I did that, ran the tests, and the whole pack now runs in 12 seconds. Quite an improvement, it’s like I found the turbo button!!
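
To make the change concrete, here is a minimal sketch of the kind of connection string I mean (the server and database names are placeholders, not the actual SqlJuxt test settings):

open System.Data.SqlClient

// "pooling=false" means the connection is fully closed when it is disposed,
// so a subsequent DROP DATABASE does not have to rip an idle pooled connection away
let connectionString = "Server=localhost;Database=SqlJuxtTest;Integrated Security=true;pooling=false;"

let openConnection() =
    let connection = new SqlConnection(connectionString)
    connection.Open()
    connection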

I just want to reiterate that I am not advocating turning off connection pooling for everyone. It is defaulted to on for a reason (it’s probably what you want); it’s just that in my specialised scenario it was causing a massive performance overhead.

Finding that is quite a win: now that the whole integration test pack runs in 12 seconds, refactoring is much easier.

Check out the full source code at SqlJuxt GitHub repository.

SqlJuxt – Active patterns for the win

F# Active Patterns are awesome!! I wanted to start this blog post with that statement. I was not truly aware of F# active patterns until I read the article on F Sharp for Fun and Profit when I was looking at a better way to use the .Net Int32.Parse function.

An active pattern is a function that you define which can then be used to match on an expression. For example:

open System

// Partial active pattern: matches any string that parses as an Int32
let (|Integer|_|) (s:string) = 
    match Int32.TryParse(s) with
        | (true, i) -> Some i
        | _ -> None 

To explain what each line is doing: the (| and |) characters are called banana clips (I am not sure why) and here we are defining a partial active pattern. A partial active pattern has to return an option of some type, because by definition it may or may not match. This pattern takes a string and returns Some int if the string parses as an integer, or None if it does not. Once this pattern is defined it can be used in a normal match expression as follows:

let printNumber (str:string) =
    match str with
        | Integer i -> printfn "Integer: %d" i
        | _ -> printfn "%s is not a number" str

This technique allows us to define a really cool active pattern for matching regular expressions and parsing out the groups that are matched:

open System.Text.RegularExpressions

// Partial active pattern: returns the regex's captured groups when the regex matches
let (|ParseRegex|_|) regex str =
    let m = Regex(regex).Match(str)
    match m with 
        | m when m.Success -> Some (List.tail [ for x in m.Groups -> x.Value] )
        | _ -> None

This regex active pattern returns all of the matched groups if the regex matches, or None if it does not. Note that List.tail is used to skip the first element, as that is the fully matched string; we only want the groups.
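
To make that concrete, here is an illustrative use of the pattern (the regex and input string are just examples, not from the SqlJuxt code):

match "Column12" with
    | ParseRegex @"([A-Za-z]+)(\d+)$" [word; number] -> printfn "%s / %s" word number   // prints "Column / 12"
    | _ -> printfn "no match"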

The reason all of this came up is that the getNextAvailableName function that I wrote about in my last blog post is very long-winded. For those who haven’t read my last post, this function generates a unique name by taking a candidate name and a list of names that have already been taken. If the name passed in has been taken then the function adds a number to the end of the name and keeps incrementing it until it finds a name that is not taken. The getNextAvailableName function was defined as:

let rec getNextAvailableName (name:string) (names: string list) =

    let getNumber (chr:char) =
        match Int32.TryParse(chr.ToString()) with
            | (true, i) -> Some i
            | _ -> None

    let grabLastChar (str:string) =
        str.[str.Length-1]

    let pruneLastChar (str:string) =
        str.Substring(0, str.Length - 1)

    let pruneNumber (str:string) i =
        str.Substring(0, str.Length - i.ToString().Length)

    let getNumberFromEndOfString (s:string)  =

        let rec getNumberFromEndOfStringInner (s1:string) (n: int option) =
            match s1 |> String.IsNullOrWhiteSpace with
                | true -> n
                | false -> match s1 |> grabLastChar |> getNumber with
                            | None -> n
                            | Some m ->  let newS = s1 |> pruneLastChar
                                         match n with 
                                            | Some n1 -> let newN = m.ToString() + n1.ToString() |> Convert.ToInt32 |> Some
                                                         getNumberFromEndOfStringInner newS newN
                                            | None -> getNumberFromEndOfStringInner newS (Some m) 
        let num = getNumberFromEndOfStringInner s None
        match num with
            | Some num' -> (s |> pruneNumber <| num', num)
            | None -> (s, num)
        

    let result = names |> List.tryFind(fun x -> x = name)
    match result with
        | Some r -> let (n, r) = getNumberFromEndOfString name
                    match r with 
                        | Some r' -> getNextAvailableName (n + (r'+1).ToString()) names
                        | None -> getNextAvailableName (n + "2") names
                    
        | None -> name

With the Integer and Regex active patterns defined as explained above the new version of the getNextAvailableName function is:

let rec getNextAvailableName (name:string) (names: string list) =

    let result = names |> List.tryFind(fun x -> x = name)
    match result with
        | Some r -> let (n, r) = match name with
                                    | ParseRegex @"(.*)(\d+)$" [s; Integer i] -> (s, Some i)
                                    | _ -> (name, None)
                    match r with 
                        | Some r' -> getNextAvailableName (n + (r'+1).ToString()) names
                        | None -> getNextAvailableName (n + "2") names
                    
        | None -> name

I think it is pretty incredible how much simpler this version of the function is. It does exactly the same job (all of my tests still pass :)). It really shows the power of active patterns and how they can simplify your code. I think they also make the code more readable, as even if you did not know the definition of the ParseRegex active pattern you could guess what it does from the code.
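
To illustrate the behaviour described above, here are a few example calls (the inputs are made up, but the results follow from the function as written):

let a = getNextAvailableName "Table" ["Other"]              // "Table"  - the name is free
let b = getNextAvailableName "Table" ["Table"]              // "Table2" - the first clash gets a 2 appended
let c = getNextAvailableName "Table" ["Table"; "Table2"]    // "Table3" - the number keeps incrementing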

Check out the full source code at SqlJuxt GitHub repository.

SqlJuxt – Defining indexes on a table

I have just finished the first implementation of table indexes: both scripting out an index on a table using the fluent builder syntax and comparing indexes on tables. When writing this feature I had some interesting design decisions to make…

My first design for the types to represent indexes is shown below:

type Constraint = {name: string; columns: (Column * SortDirection)  list; clustering: Clustering}
type Index = {name: string; columns: (Column * SortDirection) list; clustering: Clustering; uniqueness: Uniqueness}

type Table = {schema: string; name: string; columns: Column list; primaryKey: Constraint option; indexes: Index list}

As you can see, the primary key on the table is defined as an option of Constraint and the indexes are defined as an Index list. When I started writing the code to use these types I noticed a lot of duplication. Then I realised that an index and a primary key are really both constraints, just with slightly different properties; the main one being that a primary key is always unique, that is what makes it a key!
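
In other words, the Index type above is really just a Constraint with a uniqueness property. A sketch of what the merged type could look like (the exact field layout here is my assumption; the Uniqueness union is the one already used by the Index type above):

type Constraint = {name: string; columns: (Column * SortDirection) list; clustering: Clustering; uniqueness: Uniqueness}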

So I extended the Constraint type with the uniqueness property, and then it was a simple job of extending the primary key methods to always set the uniqueness to unique. Now the type for a table looks like:

type Table = {schema: string; name: string; columns: Column list; primaryKey: Constraint option; indexes: Constraint list}

So a table has a list of indexes, which could of course be empty, and it may or may not have a primary key, which we can represent using a Constraint option. The other advantage of modelling both primary keys and indexes using the Constraint type is that we can select them out of the database all at the same time when loading up the schema for comparison. We simply have to extend the query to bring back the additional information of whether the constraint is unique and whether it is a primary key, so we know whether to put it on the primary key property of the table.

I did toy with the idea of having all constraints in a single list on the table called constraints. That list would have contained the primary key, if there was one, and all of the indexes for the table. I decided against that approach as it feels a bit clunky to have to go through a list to find the primary key of a table. The other reason I did not like it is that, where possible, you should use the type system and thus the compiler to enforce correctness in your program and make illegal states impossible to model. If I had a list of constraints on a table I would have to manually check that there was only one primary key in the list, whereas if the primary key is defined on the table as a Constraint option then there can only ever be one primary key.

If you want to check out the full source code and delve deeper feel free to check out the SqlJuxt GitHub repository.

SqlJuxt – Using partial function application to improve the API

After I got all of the tests passing for creating and comparing primary keys on a table I looked back at the code and decided that it had a bit of a smell to it.  Consider the following code snippet:

let private withPrimaryKey columns table isClustered =
    let cs = getColumnsByNames columns table
    {table with primaryKey = Some {name = sprintf "PK_%s" table.name; columns = cs; isClustered = isClustered}}
            
let WithClusteredPrimaryKey columns table =
    withPrimaryKey columns table true

let WithNonClusteredPrimaryKey columns table =
    withPrimaryKey columns table false

The isClustered parameter is of type bool. This in itself does not feel quite right. You can see what I mean when you look at the implementation of WithClusteredPrimaryKey or WithNonClusteredPrimaryKey. The line reads “withPrimaryKey columns table false”. It is obvious what all of those parameters are except the bool at the end. What does false mean?

Clearly this would be much better if it were self-describing, which we can easily do in F# using a discriminated union, defined like so:

type Clustering = CLUSTERED | NONCLUSTERED

The other part that was causing the code to smell, but was perhaps less obvious, was the order of the parameters. As described in the excellent F# for fun and profit post on partial function application, the order of your parameters is very important. In the builder functions shown above, table is always placed last; this means you do not need to mention it when using the script builder API, for example:

let rightTable = CreateTable "DifferentKeyTable"
                        |> WithInt "Column1"
                        |> WithInt "Column2" 
                        |> WithInt "Column3" 
                        |> WithClusteredPrimaryKey [("Column1", ASC); ("Column2", ASC)]
                        |> Build 

Notice how the table parameter does not need to be explicitly passed around. You would have to pass it explicitly if it were not the last parameter.

So we know that the order is important. If we look at the primary key functions above we can see that having the isClustered parameter last stops us from using partial application, so when we define the two methods to create a clustered and a non-clustered primary key we have to explicitly pass all of the parameters.

Here is the redesigned code of the WithPrimaryKey methods on the TableBuilder API:

let WithPrimaryKey clustering columns table =
    let cs = getColumnsByNames columns table
    {table with primaryKey = Some {name = sprintf "PK_%s" table.name; columns = cs; clustering = clustering}}
              
let WithClusteredPrimaryKey = WithPrimaryKey CLUSTERED
let WithNonClusteredPrimaryKey = WithPrimaryKey NONCLUSTERED

Notice how much cleaner these three methods are now. By moving the clustering parameter to the start and using a discriminated union instead of a bool we have achieved two things. Firstly, the code is now self-documenting as we are not passing true or false but instead passing CLUSTERED or NONCLUSTERED. Secondly, because the clustering parameter is now first we can use the magic of partial function application to define WithClusteredPrimaryKey and WithNonClusteredPrimaryKey. Those two methods simply bake in whether the key is clustered or not and then leave you to fill in the rest of the parameters.
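
For anyone new to the idea, here is a tiny standalone illustration of partial function application (nothing to do with the SqlJuxt API itself, just the mechanics):

let add x y = x + y
let addFive = add 5      // partially apply: bake in the first argument
let eight = addFive 3    // 8 - the remaining argument is supplied later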

I really love how F# allows you to write beautiful code like this. I am still learning to write functional code and am really enjoying the experience. Any feedback or comments are welcome, so please keep them coming.

If you want to check out the full source code and delve deeper feel free to check out the SqlJuxt GitHub repository.

SqlJuxt – Building primary keys on a table

There were a few interesting design decisions I had to make when designing the code to script a primary key on a table. Before I dive into them I think it is good to see the finished code. Here is a test that uses the TableBuilder to create a clustered primary key on a table:

[<Fact>]
let ``should be able to build a table with a clustered primary key on multiple columns``() =
    CreateTable "RandomTableName"
        |> WithInt "MyKeyColumn"
        |> WithInt "SecondKeyColumn"
        |> WithVarchar "ThirdCol" 50
        |> WithVarchar "ForthCol" 10
        |> WithClusteredPrimaryKey [("MyKeyColumn", ASC); ("SecondKeyColumn", DESC); ("ThirdCol", DESC)]
        |> Build
        |> should equal @"CREATE TABLE [dbo].[RandomTableName]( [MyKeyColumn] [int] NOT NULL, [SecondKeyColumn] [int] NOT NULL, [ThirdCol] [varchar](50) NOT NULL, [ForthCol] [varchar](10) NOT NULL )
GO

ALTER TABLE [dbo].[RandomTableName] ADD CONSTRAINT [PK_RandomTableName] PRIMARY KEY CLUSTERED ([MyKeyColumn] ASC, [SecondKeyColumn] DESC, [ThirdCol] DESC)
GO"

What I like about this code is that just from reading it, it is obvious what it will do. It will produce a create table script with 4 columns and a clustered primary key on 3 of those columns.

I decided to go with the approach of taking the minimal number of arguments needed to define the primary key: the list of column names, and the sort order for each, that you want in the primary key. Originally I thought about allowing the user to specify the primary key name, but actually this just adds noise to the code. A name can be generated using the convention “PK_” + tableName. Why make the user think about the name for the primary key when all we really care about is that there is a primary key and that it has a unique name?

I love the way that in F# you can use discriminated unions to represent states, making the code easy to read and work with. In the above example I could have taken many approaches to specifying the column sort order, such as using a bool to say whether or not the column is ascending. However, if I had gone with that approach then the calling code would have ended up with “columnName true” or “columnName false”. This already feels horrible, as just from reading the code you do not know what the true or false means. By defining a discriminated union of ASC/DESC you can immediately tell what the parameter is and what it is doing.
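
For reference, the sort direction itself is just a simple discriminated union, presumably defined along these lines:

type SortDirection = ASC | DESC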

The primary key is defined as a constraint using the following type:

type Constraint = {name: string; columns: (Column * SortDirection)  list; isClustered: bool}

The table type has then been extended to add a Constraint as a primary key using the option type, as a table may or may not have a primary key. It is nice that we can use option to represent this rather than having to rely on null like we would in an imperative language.
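
So at this stage the table type looks something along these lines (a sketch, assuming the shape from the builder posts before schemas and indexes were added):

type Table = {name: string; columns: Column list; primaryKey: Constraint option}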

The hardest part of making this work is taking the (string * SortDirection) list that the WithClusteredPrimaryKey function takes and turning it into a (Column * SortDirection) list. This is done using the following function:

let private getColumnsByNames (columnNames: (string * SortDirection) list) table =
            columnNames |> List.map(fun (c,d) -> let column = table.columns |> List.tryFind(fun col ->  match col with
                                                                                            | IntColumn i when i.name = c -> true
                                                                                            | VarColumn v when v.name = c -> true
                                                                                            | _ -> false)
                                                 match column with
                                                    | Some col -> (col, d)
                                                    | None -> failwithf "no column named %s exists on table %s" c table.name )

It is great in F# how we can let the types guide us. If you look at the signature of the function above then we can see that it is:

getColumnsByNames : (string * SortDirection) list -> Table -> (Column * SortDirection) list

When you look at the types you can see that there are not too many ways this function could be implemented. Using “what is in the room”, as Erik Meijer would say, we go through the columns on the table and match them up with the column names that were passed in (throwing an exception if a name is passed in that is not on the table), and then return the actual column along with the sort direction.

F# is proving to be an interesting choice in writing the database comparison library. It is a totally different way of thinking but I feel that I’m starting to come to terms with thinking functionally.

If you want to check out the full source code and delve deeper feel free to check out the SqlJuxt GitHub repository.

SqlJuxt – Using disposable in F# to drop a test database

The integration tests in SqlJuxt need to do the following:

  • Create two databases
  • Set the databases up in a certain state (create tables, views etc)
  • Compare them
  • Assert the result of the comparison
  • Clean up the databases at the end (drop them)

Before I dive into how I went about making this work I think it’s good to take a look at how the finished test looks:

[<Fact>]
let ``should return identical when two tables are the same``() =
    use left = createDatabase()
    use right = createDatabase()
     
    let table = CreateTable "TestTable"
                    |> WithNullableInt "Column1"
                    |> Build 

    runScript left table
    runScript right table
        
    loadSchema left.ConnectionString
            |> compareWith right.ConnectionString
            |> should equal IsMatch

I think that test reads really well. In fact you can tell exactly what is going on in the test from the code, which is one of the key ingredients of a good test. The test creates two databases, then creates the same table on both of them. It then compares them using the SqlJuxt library and expects the databases to match. The result of the comparison is a discriminated union which I will talk about more in an upcoming post. For now you can read that it says “should equal IsMatch”, which is really clear.

The astute reader will notice that nowhere in the test does it mention dropping the databases, so you might wonder how this is done. The secret behind this is IDisposable. I got the idea from reading this page on let, use and do on the F# for fun and profit site. Scott talks about using Dispose to stop a timer, so I thought it would be neat to use it to drop the database.

To make this work the createDatabase function returns a class that implements IDisposable. Notice that the variables left and right are declared using the ‘use’ keyword and not the ‘let’ keyword. This means that when the variables go out of scope Dispose is automatically called.

This is how the disposable database type is defined:

type DisposableDatabase(name, connectionString) =

    member this.ConnectionString = connectionString
    member this.Name = name

    interface System.IDisposable with 
        member this.Dispose() = 
            dropDatabase this.Name

The code is a little bit clunky, as we have to declare a class to implement IDisposable, but the cool part is that we do not have to explicitly drop the database in our test: it just happens for us.
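
For completeness, here is a rough sketch of what the createDatabase side could look like under this design (the server address, security settings and naming scheme are assumptions for illustration, not the actual SqlJuxt implementation):

open System
open System.Data.SqlClient

let createDatabase() =
    // generate a unique database name so tests can run side by side
    let name = sprintf "SqlJuxtTestDb_%s" (Guid.NewGuid().ToString("N"))
    use connection = new SqlConnection("Server=localhost;Database=master;Integrated Security=true;pooling=false;")
    connection.Open()
    use command = new SqlCommand(sprintf "CREATE DATABASE [%s]" name, connection)
    command.ExecuteNonQuery() |> ignore
    new DisposableDatabase(name, sprintf "Server=localhost;Database=%s;Integrated Security=true;pooling=false;" name)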

If you want to check out the full source code and delve deeper feel free to check out the SqlJuxt GitHub repository.

SqlJuxt – Implementing the builder pattern in F#

As an imperative programmer by trade a pattern that I often like to use is the builder pattern for building objects. This pattern is especially useful for test code.

The reasons the builder pattern is so useful is because:

  • It means you only have to “new” the object up in a single place, meaning your test code effectively goes through an API (the builder API) to create the object
  • It makes your code really readable so you can understand exactly what it is doing without having to look into how the code works

In my SqlJuxt project I started by coding it in C#. I wanted a builder to make a table create script. This would mean that in my integration tests I could fluently create a script that would create a table and then I could run it in to a real database and then run the comparison. This makes the tests very readable and easy to write. In C# using the table builder looks like:

    var table = Sql.BuildScript()
                   .WithTableNamed("MyTable", t => t.WithColumns(c => c.NullableVarchar("First", 23)
                                                                       .NullableInt("Second")))

I think that is pretty nice code. You can easily read that code and tell that it will create a table named “MyTable” with a nullable varchar column and a nullable int column.

I wanted to achieve the same thing in F#, but the catch is I did not just want to translate the C# into F# (which is possible); I wanted to write proper functional code, which means you should not really create classes or mutable types! The whole way the builder pattern works is that you store state on the builder, and with each method call you change that state and then return yourself. When the Build method is called at the end you use all of the state to build the object (note that an explicit call to the Build method is not needed in the C# code above as I have overridden the implicit conversion operator).
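
To make the contrast concrete, here is a tiny standalone illustration of the functional alternative: instead of mutating builder state, each step returns a new immutable record (purely illustrative, not SqlJuxt code):

    type Score = {points: int}
    let addPoint s = {s with points = s.points + 1}
    let final = {points = 0} |> addPoint |> addPoint   // {points = 2}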

I hunted around for inspiration and found it in the form of how the TopShelf guys had written their fluent API.

This is how using the table builder looks in F#:

    let table = CreateTable "TestTable"
                    |> WithNullableInt "Column1"
                    |> Build 

I think that is pretty sweet! The trick to making this work is to have a type that represents a database table. Obviously the type is immutable. The CreateTable function takes a name and then returns a new instance of the table type with the name set:

    let CreateTable name =
        {name = name; columns = []}

Then each of the functions that create a column takes a column name (plus a size in the case of a varchar column) and the table, and returns a new immutable table instance with the column appended to the list of columns. The trick is to take the table type as the last parameter to the function, which means you do not have to explicitly pass it around when chaining the calls together with the pipe operator, as you can see in the builder usage above. The Build function then takes the table and translates it into a SQL script ready to be run in to the database.
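
As an illustration, a column function could be sketched along these lines (the exact Column representation is an assumption on my part; the real definitions live in the repository):

    let WithNullableInt name table =
        {table with columns = table.columns @ [IntColumn {name = name; isNullable = true}]}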

Here is an example of a complete test to create a table:

    [<Fact>]
    let ``should be able to build a table with a mixture of columns``() =
       CreateTable "MultiColumnTable"
           |> WithVarchar "MyVarchar" 10
           |> WithInt "MyInt"
           |> WithNullableVarchar "NullVarchar" 55
           |> WithNullableInt "NullInt"
           |> Build 
           |> should equal @"CREATE TABLE [dbo].[MultiColumnTable]( [MyVarchar] [varchar](10) NOT NULL, [MyInt] [int] NOT NULL, [NullVarchar] [varchar](55) NULL, [NullInt] [int] NULL )
GO"

I think that test is really readable and explains exactly what it is doing, which is exactly what a test should do.

If you want to follow along with the project then check out the SqlJuxt GitHub repository.