Thursday, October 27, 2011

SQL Azure Federations and Enzo Shard Demo

Attached are my PowerPoint slides and C# demo of the Enzo SQL Shard library for the October KY Dot Net user group meeting. The presentation covers SQL Azure Federations concepts, sharding strategies, and a short demo of the Enzo SQL Shard library running against a SQL Azure database.
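The fan-out queries and round-robin loading that the session covers can be sketched in memory. This is only an illustration under stated assumptions: each "shard" is a plain List<T> standing in for a federation member, and ShardSketch, RoundRobinLoad, and FanOutQuery are hypothetical names, not the Enzo SQL Shard library's API. A real implementation would send the same SQL statement to each member database instead of filtering lists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for federation members (not the Enzo API).
public static class ShardSketch
{
    // Round-robin loading: row n goes to shard n % shardCount,
    // which keeps the shards evenly sized.
    public static List<List<T>> RoundRobinLoad<T>(IEnumerable<T> rows, int shardCount)
    {
        if (shardCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(shardCount));

        var shards = new List<List<T>>();
        for (int s = 0; s < shardCount; s++)
            shards.Add(new List<T>());

        int next = 0;
        foreach (var row in rows)
        {
            shards[next].Add(row);
            next = (next + 1) % shardCount;
        }
        return shards;
    }

    // Fan-out query: run the same predicate against every shard in parallel
    // and merge the partial results. Result order is not guaranteed.
    public static List<T> FanOutQuery<T>(List<List<T>> shards, Func<T, bool> predicate)
    {
        return shards
            .AsParallel()
            .SelectMany(shard => shard.Where(predicate))
            .ToList();
    }
}
```

Round-robin placement balances the members, but because rows are not placed by a key, every lookup has to fan out to all of them; key-based (range) placement lets a query that knows the key go straight to one member.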


Session information:

"Designing enterprise data solutions for today's cloud computing environments can leave the average architect puzzled and confused. Listen in as we discuss how database sharding and the upcoming release of SQL Azure Federations will provide virtually unlimited scalability of data for your enterprise applications. Data federation solutions will be analyzed as we examine some of the current data virtualization vendors and preview some of the new features of SQL Azure Federations. Additionally, a federation-aware CodePlex library will be demonstrated which supports fan-out queries and parallel round robin loading. Don't miss!"


Bio: Patrick Riley is a software developer for the Rawlings Group in LaGrange, KY. He has worked on master data management solutions in the .NET Framework and SQL Server for several years. He enjoys golf, grilling on his Weber grill, and watching technology race to the cloud.

Feel free to send any feedback you might have to patrick@patrickmriley.net.

Enjoy!

PPTX:
Sql Azure Data Federations

Sample Enzo Federations Source Code:
EnzoShardCode_v1.0

Wednesday, August 31, 2011

Bitwise XOR on Byte Array

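Spent the past few hours scrambling for a generic way to XOR two byte arrays. I needed a routine that would combine a match value into a result value (flipping every bit that the match value sets), even when the two arrays have different lengths. Both arrays are treated as big-endian, with the most significant byte at the beginning, so the shorter one is right-aligned against the longer one. You'd think this would be somewhere in the C# libraries by now; BitArray.Xor comes close, but it requires both arrays to be the same length. After an hour of searching, I decided to write my own and ended up with the solution below.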

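// Requires using System; (ArgumentNullException, Math.Max)
static byte[] BitwiseXOR(byte[] result, byte[] matchValue)
{
    if (result == null) throw new ArgumentNullException(nameof(result));
    if (matchValue == null) throw new ArgumentNullException(nameof(matchValue));

    // Nothing to combine with yet: the match value becomes the result
    if (result.Length == 0)
    {
        return matchValue;
    }

    // The output is as long as the longer of the two inputs
    byte[] newResult = new byte[Math.Max(matchValue.Length, result.Length)];

    // Walk both arrays from the least significant (last) byte toward the front,
    // so a shorter array is right-aligned against the longer one
    for (int i = 1; i <= newResult.Length; i++)
    {
        if (i > result.Length)
        {
            // result has run out: copy the remaining high-order bytes of matchValue
            newResult[newResult.Length - i] = matchValue[matchValue.Length - i];
        }
        else if (i > matchValue.Length)
        {
            // matchValue has run out: copy the remaining high-order bytes of result
            newResult[newResult.Length - i] = result[result.Length - i];
        }
        else
        {
            // Both arrays still have a byte at this position: XOR them
            newResult[newResult.Length - i] =
                (byte)(matchValue[matchValue.Length - i] ^ result[result.Length - i]);
        }
    }
    return newResult;
}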
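One way to sanity-check a right-aligned, big-endian XOR like the one above is to compare it with System.Numerics.BigInteger, which already implements ^ for integers of any size. This is a minimal sketch, assuming .NET Core 2.1 or later (for the isUnsigned/isBigEndian overloads) and treating both arrays as unsigned values; XorCheck and XorViaBigInteger are hypothetical names, not framework methods.

```csharp
using System;
using System.Numerics;

public static class XorCheck
{
    // Hypothetical helper: XOR two unsigned, big-endian byte arrays via BigInteger.
    // Reading both arrays as numbers right-aligns the shorter one automatically.
    public static byte[] XorViaBigInteger(byte[] a, byte[] b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));

        int length = Math.Max(a.Length, b.Length);
        if (length == 0)
            return new byte[0];

        var x = new BigInteger(a, isUnsigned: true, isBigEndian: true);
        var y = new BigInteger(b, isUnsigned: true, isBigEndian: true);

        // Minimal big-endian bytes of the XOR; leading zero bytes are dropped
        byte[] minimal = (x ^ y).ToByteArray(isUnsigned: true, isBigEndian: true);

        // Left-pad with zeros so the output is as long as the longer input
        var padded = new byte[length];
        Array.Copy(minimal, 0, padded, length - minimal.Length, minimal.Length);
        return padded;
    }
}
```

Running both versions over the same random inputs and comparing the outputs with Enumerable.SequenceEqual is a quick way to catch off-by-one mistakes in the index math.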

Thursday, August 25, 2011

Windows Azure Marketplace


In an effort to keep my blog current, I am posting the materials from my January talk at the KY .NET user group meeting on consuming cloud services. I will be speaking this October on SQL Azure Federations and will be focusing my efforts in the coming year on cloud-based applications. More to come…
Windows Azure Marketplace: Consuming the Cloud DataMarket
Leveraging the Cloud’s data services can provide real-time feeds that keep your users bustling. Learn how to quickly absorb the DataMarket’s datasets using OData APIs and expose them to client applications on any platform. A sample from Microsoft’s Windows Azure Support team utilizing a Windows Azure service to consume crime data from Data.gov will be discussed. OData, one of the cloud’s increasingly popular and flexible web protocols for querying and updating data, will be examined and reviewed.
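An OData query is a plain HTTP GET: system query options such as $filter, $orderby, and $top are appended to a resource URL. This is a minimal sketch of composing such a URL, assuming a hypothetical service root (https://example.org/odata/) and entity set name ("Crimes"); the DataMarket's real feed addresses and account-key authentication are not shown.

```csharp
using System;
using System.Collections.Generic;

public static class ODataQuerySketch
{
    // Builds a URL such as <serviceRoot>/<entitySet>?$filter=...&$top=10
    // serviceRoot and entitySet are hypothetical placeholders.
    public static string BuildQuery(string serviceRoot, string entitySet,
        string filter = null, string orderBy = null, int? top = null)
    {
        var options = new List<string>();
        if (!string.IsNullOrEmpty(filter))
            options.Add("$filter=" + Uri.EscapeDataString(filter));   // escape spaces and quotes
        if (!string.IsNullOrEmpty(orderBy))
            options.Add("$orderby=" + Uri.EscapeDataString(orderBy));
        if (top.HasValue)
            options.Add("$top=" + top.Value);

        var url = serviceRoot.TrimEnd('/') + "/" + entitySet;
        return options.Count == 0 ? url : url + "?" + string.Join("&", options);
    }
}
```

For example, BuildQuery("https://example.org/odata/", "Crimes", filter: "State eq 'KY'", top: 10) asks the hypothetical service for at most ten matching rows; the same URL works from any platform that can issue an HTTP request.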
Link to slides:
PPTX: Windows_Azure_Marketplace_Consuming_the_Cloud

Tuesday, August 23, 2011

Fuzzy matching: a programmer's view of the Damerau-Levenshtein algorithm


The sheer mechanics of fuzzy matching algorithms are fascinating. Optimized variations power the search engines and optical character recognition (OCR) software on the market today. For a recent patient-matching solution, I used the Damerau-Levenshtein distance algorithm to locate fuzzy matches.

The Levenshtein approach counts how many substitutions, deletions, and insertions are needed to make two strings equal (also known as the edit distance between the strings). Damerau later extended it to include transposition of two adjacent characters (the check that compares string1[im1] with string2[jm2] in the inner loop). Strictly speaking, the variant below is the optimal string alignment distance, which never edits the same substring more than once. This solution is an example of bottom-up dynamic programming: a matrix is filled row by row, and each cell holds the minimum distance between the corresponding prefixes of the two strings. Filling the full matrix takes O(m × n) time for strings of length m and n. With a maximum distance of interest, only cells within K of the diagonal can matter, which brings the work down to roughly K x L (where K is the maximum distance and L is the length of the shorter string). The implementation below takes a simpler route: it uses a maximum distance (the threshold parameter) to stop early once every cell in a row exceeds it, returns int.MaxValue when the strings are further apart than the threshold, and adds a few other simple optimizations. It has been tested to run over 10K string compares/second with two strings under ten characters in length and a maximum threshold of three edits.

public static int DamerauLevenshteinDistanceImproved(string string1,
    string string2, int threshold)
{
    // Trivial case: the strings are equal (the static String.Equals also handles nulls)
    if (String.Equals(string1, string2))
        return 0;

    // Trivial case: one string is null or empty, so the distance is the other's length
    if (String.IsNullOrEmpty(string1) || String.IsNullOrEmpty(string2))
    {
        var length = (string1 ?? "").Length + (string2 ?? "").Length;
        return length > threshold ? int.MaxValue : length;
    }

    // Ensure string2 (inner loop) is the longer string
    if (string1.Length > string2.Length)
    {
        var tmp = string1;
        string1 = string2;
        string2 = tmp;
    }

    // Trivial case: string1 is contained within string2, so only insertions are needed
    if (string2.Contains(string1))
    {
        var difference = string2.Length - string1.Length;
        return difference > threshold ? int.MaxValue : difference;
    }

    var length1 = string1.Length;
    var length2 = string2.Length;

    // d[i, j] = distance between the first i chars of string1 and the first j chars of string2
    var d = new int[length1 + 1, length2 + 1];

    for (var i = 0; i <= length1; i++)
        d[i, 0] = i;

    for (var j = 0; j <= length2; j++)
        d[0, j] = j;

    for (var i = 1; i <= length1; i++)
    {
        var im1 = i - 1;
        var im2 = i - 2;
        // Smallest value in this row; start high so any cell can lower it
        // (starting at threshold meant the early exit below could never fire)
        var minDistance = int.MaxValue;

        for (var j = 1; j <= length2; j++)
        {
            var jm1 = j - 1;
            var jm2 = j - 2;
            var cost = string1[im1] == string2[jm1] ? 0 : 1;

            var del = d[im1, j] + 1;
            var ins = d[i, jm1] + 1;
            var sub = d[im1, jm1] + cost;

            // Inline comparisons instead of nested Math.Min calls
            d[i, j] = del <= ins && del <= sub ? del : ins <= sub ? ins : sub;

            // Damerau's addition: transposition of two adjacent characters
            if (i > 1 && j > 1 && string1[im1] == string2[jm2]
                               && string1[im2] == string2[jm1])
                d[i, j] = Math.Min(d[i, j], d[im2, jm2] + cost);

            if (d[i, j] < minDistance)
                minDistance = d[i, j];
        }

        // Every cell in this row exceeds the threshold, so the final distance will too
        if (minDistance > threshold)
            return int.MaxValue;
    }

    return d[length1, length2] > threshold ? int.MaxValue : d[length1, length2];
}
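The K x L bound mentioned above comes from banding: in any alignment that costs at most K edits, row i can only reach columns within K of i, because every step off the diagonal is an insertion or deletion that costs 1. Cells outside that band can therefore be skipped. The sketch below shows the idea for plain Levenshtein distance (no transpositions, to keep it short) using two rolling rows; BandedEditDistance and BandedLevenshtein are hypothetical names, and, like the function above, it returns int.MaxValue when the distance exceeds the threshold.

```csharp
using System;

public static class BandedEditDistance
{
    // Hypothetical helper: Levenshtein distance limited to threshold k.
    // Only cells with |i - j| <= k are computed; returns int.MaxValue if distance > k.
    public static int BandedLevenshtein(string a, string b, int k)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));

        // Rows walk the shorter string, so the work is about (2k + 1) x shorter length
        if (a.Length > b.Length) { var t = a; a = b; b = t; }
        int n = a.Length, m = b.Length;

        // The length difference alone already needs m - n insertions
        if (m - n > k) return int.MaxValue;

        const int Far = int.MaxValue / 2; // marks "outside the band"; safe to add 1 to
        var prev = new int[m + 1];
        var curr = new int[m + 1];
        for (int j = 0; j <= m; j++)
            prev[j] = j <= k ? j : Far;

        for (int i = 1; i <= n; i++)
        {
            int lo = Math.Max(1, i - k);
            int hi = Math.Min(m, i + k);

            curr[0] = i <= k ? i : Far;
            // Neighbours just outside the band must read as Far, not as stale values
            if (lo > 1) curr[lo - 1] = Far;
            if (hi < m) curr[hi + 1] = Far;

            int rowMin = curr[0];
            for (int j = lo; j <= hi; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int best = Math.Min(prev[j] + 1, curr[j - 1] + 1); // deletion, insertion
                best = Math.Min(best, prev[j - 1] + cost);         // substitution or match
                curr[j] = best;
                if (best < rowMin) rowMin = best;
            }

            // Every cell in the row is over the limit, so the final distance is too
            if (rowMin > k) return int.MaxValue;

            var swap = prev; prev = curr; curr = swap;
        }

        return prev[m] <= k ? prev[m] : int.MaxValue;
    }
}
```

Adding Damerau's transposition check to this banded version would also require keeping the row from two steps back, which is why the sketch leaves it out.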

Saturday, March 5, 2011

How failure can breed success...

Having wrapped up two major projects in my first four months with a new company, one quote comes to mind: "success is overcoming your failures, one at a time". For software developers, this point is easy to overlook. Small hurdles quickly get overshadowed by the next challenge, and reaching one milestone quickly opens discussion about the next.

But what do we hold onto on our journey? How do we learn from constant change?

One small victory opens a Pandora's box of new opportunities, and a bigger gap for failure at the same time. With great risk, failure is inevitable. Those of us who embrace failure will also embrace success.

Thanks for reading.