New and Improved Software Error Continuum

On about the first day of Computer Programming 101, we were taught about a continuum of error types in software, spanning a spectrum from “least problematic” to “most problematic.”  That continuum is as follows:

Classic Error Continuum:

  1. Compilation/Syntax Errors
  2. Runtime Exceptions
  3. Logic Errors

I’ve written a previous post about the importance of taking every opportunity to design the code you write such that possible errors are pushed towards the less problematic end of the continuum, towards the lowest-numbered type possible.

I’d like to revise that continuum to bring it forward a bit, so that it accounts for more modern testing practices, which were omitted back when I attended college due to their near nonexistence at the time.

New And Improved Error Continuum:

  1. Compilation/Syntax Errors (best)
  2. Automated Test Failures
    1. Unit Test Failures
    2. Integration and U/I Test Failures
  3. Runtime Exceptions
    1. Exceptions which occur close to their cause
    2. Exceptions which occur distant from their cause
  4. Logic Errors (worst)

It is unfortunately common for developers and teams to unwittingly design their code in such a way that pushes likely errors in the worst direction rather than in the best direction.

If each developer on your team makes hundreds or even thousands of decisions in code over the months or years of an application’s lifespan, then preserving, validating, and making explicit each little piece of developer intent becomes paramount.

The weakest styles of software validation leave a project prone to the worst type of errors, Logic Errors (#4), which effectively require every developer to hold in mind every piece of every previous developer’s past intent, always and forever, in order to avoid introducing defects in any new work they produce.  This is of course impossible to achieve, and such software becomes littered with far more defects than is necessary.  (I’m looking at you, Anemic Domain Model.)

The sooner you acknowledge that you cannot hold an entire application’s code in your head, the sooner you can begin to account for this fact in the design of your code.  I know a lot of renowned Computer Scientists have long since beaten me to the punch on this realization, but alas I cannot locate their quotations on this matter, so for now I’m left with just this, my own paraphrasing.

Encapsulation and DRY Go Hand in Hand

Avoiding the Primitive Obsession anti-pattern goes hand in hand as well, but doesn’t have quite the same ring to it, so I didn’t try to fit it into the title.

In fact, Avoiding Primitive Obsession is a good place to start. Take for example the following code sample:

public class Product
{
    public string Description { get; set; }
    public decimal Price { get; set; }
    public int QuantityInStock { get; set; }
}

Primitive Obsession is a bit hyperbolically named: it describes code that is “obsessed” with primitives, though it might be more accurate to say it describes code that reaches for language primitives by default, even past the point where a custom type would have been more suitable.  It has nothing to do with cavemen.

The code sample above uses a System.Decimal numeric type (for those of you in the .NET world) to represent a product’s price.  This might seem sensible at first, until you consider that a decimal can represent values ranging from negative 79,228,162,514,264,337,593,543,950,335 to positive 79,228,162,514,264,337,593,543,950,335, and surely these are not reasonable values for product prices.

Similarly with the System.Int32 used for QuantityInStock: an int can represent values from about negative 2 billion to positive 2 billion, hardly a plausible range for product quantities.

It might seem easy at first glance to simply avoid setting these properties to outrageous values, and in many cases this will indeed be the best approach.  But consider that these values might get set from a multitude of places throughout a very large project.  Consider also that there might be a business requirement, as a fraud prevention measure, that no product ever be priced above some amount well in excess of any actual product the business carries, say $10,000.  Now consider the following code snippet:

public class Price
{
    private readonly decimal _price;

    public Price(decimal p)
    {
        if (p < 0)
            throw new ArgumentException("Price must not be less than zero.");
        if (p > 10000)
            throw new ArgumentException("Price exceeds the maximum allowed value.");
        if (p % 0.01m > 0)
            throw new ArgumentException("Price must not contain fractional cents.");

        _price = p;
    }

    public override bool Equals(object other)
    {
        var otherPrice = other as Price;
        if (otherPrice == null)
            return false;
        return otherPrice._price == _price;
    }

    public override int GetHashCode()
    {
        return _price.GetHashCode();
    }

    public static explicit operator decimal(Price price)
    {
        return price._price;
    }
}

If you look at the constructor, you’ll see that it is impossible to initialize an instance of the Price class with an invalid value without an exception being thrown.  This might not seem like a big deal, but many applications let these small validation details leak out all through the remainder of the application code, often repeated, and sometimes even inconsistently.

But with this Price implementation, we have ensured that the knowledge of exactly what constitutes a valid Price within our domain is completely Encapsulated in this one place.
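
As a rough sketch (my own, not from the original post), the Product class shown earlier could then expose the new type directly, so that an invalid price fails at the moment of construction rather than somewhere downstream:

public class Product
{
    public string Description { get; set; }
    public Price Price { get; set; }          // was: decimal
    public int QuantityInStock { get; set; }
}

// Throws immediately, at the source of the bad data:
// var product = new Product { Price = new Price(-5.00m) };  // ArgumentException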

An additional possible benefit is that it is now impossible to inadvertently mix Price decimal values with any other decimal values throughout your project.  Admittedly, I have not seen many bugs in the wild caused by such basic numeric mix-ups, but I have worked with a number of external services (Endeca, AWS Kinesis) which tend to inundate you with a variety of human-unreadable strings that you must not mix up.  Sometimes even the names of the string values (“Shard Iterator,” “Sequence Number,” etc.) were a bit difficult to keep straight following a night of poor sleep.  By encapsulating such values within their own custom types, the compiler itself will ensure that you do not mix the disparate types.
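
As a rough sketch of that idea (the type and method names here are mine, invented for illustration, not from any actual client library), wrapping such strings lets the compiler catch a swap:

public sealed class ShardIterator
{
    private readonly string _value;
    public ShardIterator(string value) { _value = value; }
    public string Value { get { return _value; } }
}

public sealed class SequenceNumber
{
    private readonly string _value;
    public SequenceNumber(string value) { _value = value; }
    public string Value { get { return _value; } }
}

// A method declared as, say, GetRecords(ShardIterator iterator) can no longer be
// handed a sequence number by mistake; with two raw strings, the compiler
// would have accepted the swap silently.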

Still another benefit to this approach is its expressiveness.  The code now contains a Price type which describes everything about a Price in your particular domain.  With the primitive approach, one has to read through every usage of the primitive to deduce the rules surrounding it.  Eventually the entirety of the rules by which your system operates becomes impossible to fully deduce, and your team might start to rely heavily on tribal knowledge, when instead the code itself could enforce the correctness of the data it manipulates.

Naturally, given the roughly half page of code necessary to create the custom Price type, you probably will not want to create custom types for every single primitive in your system.  Rather, it is best to be judicious, and to apply the effort only once you have identified values in your domain which are especially susceptible to either large-scale duplication of validation logic, or more modest duplication where the validation itself is more complex.

If you want nearly zero effort type decorations, I’d highly suggest checking out F#.

Alas, I feel that I have reached the length limit for a reasonable blog post, so I will leave off here to resume in a future post, even though I have not even begun to discuss DRY and have only lightly covered encapsulation.  Perhaps I can roll all of this into a single Grand Unified Programming Principle that covers everything.

For now, let me end with the thought, “What if these same principles were applied to objects which represent almost all of the main concepts in your system…?”


Back to Basics – Compile Time vs. Run Time vs. Logic Errors

I recall being taught at the very beginning of Computer Programming 101 about the distinction among compile time errors, run time errors, and logic errors.  The definitions are easy to memorize by rote and easy to accept at face value; however, throughout my career I have observed code written in a way which suggests that the deeper implications of these error type distinctions might have been lost on the author.  The importance of these distinctions did not sink in for me immediately upon learning them, either.

This might be caused in part by the fact that none of the educational materials I can drum up mention those deeper implications.  Why do we care to define these three categories in the first place?  There must be some reason other than just dividing errors into three arbitrary groups.

The reason we care is that the compile-time/run-time/logic error taxonomy is not flat in terms of the grief each type can give you.  Rather, the error types represent a worsening progression, because each type gives the developer feedback that an error has occurred much later than the type before it does.

Compile-time Errors:

Compile-time (syntax) errors are the easiest to deal with.  While any such errors are present, the software for the most part cannot be built, run, or deployed.  Many modern IDEs will even notify you of such errors as soon as you type them, without the need to manually invoke the compilation step.

Run-time Errors:

Run-time errors are the second-most problematic, but fortunately they tend to make themselves pretty visible when they occur.  The problem with run-time errors is that finding them all might require executing every possible combination of paths through your code, an impossible task with any existing testing strategy for all but the most trivial applications.  The best you can hope for is to exercise your code enough that you gain some sense of certainty or confidence in it.

But it should be obvious that you should seek out every opportunity to move a potential error from being a run-time error to being a compile-time error.  Accomplishing this requires a solid understanding of the language in which you are developing, so that you understand the kinds of checks that your language’s compiler can perform to your benefit.  (I should explain at this point that the bulk of my software development experience is with C#, a statically-typed, compile-time type-checked language.  I don’t have enough experience with any dynamic language to leverage its dynamic nature for design benefit.)

So anyway, time and again I have seen developer after developer pay the Static Typing Ceremony Tax, declaring interfaces and classes and inheritance hierarchies, and yet fail to completely reap the benefit that should be gained by paying it.

Logic Errors:

The final error category is logic errors.  These are errors that occur in your application in such a way that execution continues without interruption.  Logic errors are often the most insidious, because your application is left functioning, but in an unknown state, and it is usually left up to the user to detect that any error has occurred.  In heavily trafficked systems, such as large websites or web services handling requests from many clients, large amounts of incorrect or corrupted data can be created in a very short time, and often no convenient provisions exist for quickly cleaning up this data, so the aftereffects can be very costly and time-consuming.

Examples:

I typically see developers pushing what could be compile-time errors to run-time due to what I must assume is a lack of understanding of how to leverage the compilation process to detect errors much earlier and much more reliably.  Here is a typical example:

public void AssignAName(IAnimal animal, string name)
{
    if (!(animal is Duck))
    {
        throw new ArgumentException();
    }
    // assign a name to the duck
}

Whenever I see code doing this, I can hear Jerry Seinfeld in my head, saying, “You know how to take the reservation, you just don’t know how to hold the reservation,” except changed to, “You know you’re supposed to code against interfaces, you just have no idea why.”

At least the person who created this snippet was thoughtful enough to throw an exception if the type check fails; not everyone is so conscientious.  Fearing that some audiences might shut down if I were to start preaching high-brow concepts such as the Liskov Substitution Principle (or the Open/Closed principle for that matter), in some cases I like to just refer to this as violating the “Sandwich Principle.”  Imagine if I were to tell you that I’m so hungry, I could eat any type of sandwich in the world!  So you kindly offer me a hoagie, which I reject, stating, “I only accept reubens!”  This is exactly what this code example is doing.  Failing this analogy, I will sometimes try simply explaining, “You’ve made the public-facing interface of your class a liar.”  I’ve also jokingly referred to this type of code as fulfilling the Principle of Most Surprise.  But enough poking fun …

Ideally, the author of this snippet would simply have made the method accept a Duck only.  Then, if any caller attempted to provide an argument of any type other than Duck, the compiler would balk with a descriptive error message.  Instead we are left hoping that some form of testing, manual or automated, happens to exercise a code path in which an inappropriate argument is passed to this method.
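
A minimal sketch of that fix (the Name property is my own assumption about the Duck type, purely for illustration):

public void AssignAName(Duck duck, string name)
{
    // Passing anything other than a Duck is now a compile-time error
    // at every call site, so the runtime type check disappears entirely.
    duck.Name = name;
}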

But suppose this method needed to check the type of the IAnimal argument to see whether it is a Duck, a Goose, or a Falcon.  If you own the object model yourself, then it might be time to introduce a new abstraction, an IBird perhaps.  Why fight against an object model that you yourself (or your team) have created?  This seems surprisingly common, however.  Read up on Semantic Coupling to learn more; I know that Code Complete (2nd Edition) has some good information.
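
Here is a sketch of what that abstraction might look like, under my own assumption that naming is something every bird in this model supports:

public interface IBird : IAnimal
{
    string Name { get; set; }
}

public void AssignAName(IBird bird, string name)
{
    // Duck, Goose, and Falcon would each implement IBird, so the method
    // accepts exactly the family of types it can handle, and nothing else.
    bird.Name = name;
}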

Here is a typical example of code that could cause a run-time error, but which has instead been masked, pushing the issue into logic error territory:

public void DoStuff(object obj)
{
    if (obj != null)
    {
        // do stuff
    }
}

This is another unfortunately common practice.  In this case, for the sake of argument, assume that there is no reasonable case in which the method’s argument should ever be null, but some crafty developer has checked the argument anyway, ensuring the method silently returns (and therefore “succeeds”) regardless.  Perhaps at one point the developer encountered a situation in which the argument actually was null and thought, “A-ha!  I know exactly how to solve this,” but never stopped to consider that, since the argument should never legitimately be null, the real problem was elsewhere, even though it is possible to prevent this particular method from blowing up.

Ask yourself this question:  Is it better to disable the “Checkout” button on the shopping cart of your website, or is it better to litter your code with 1,000 null checks everywhere the ShoppingCart object might be used before it has been initialized?

This is but one practice which I put into a category named, “Fixing the line of code on which a problem manifests itself, rather than fixing the broader context of the application in which the actual error originated.”  (The previous code snippet is an example of the same behavior.)  After all, the goal is not to prevent exceptions in our software; the goal is to create software that functions correctly.  Sometimes the best way to accomplish this is to proactively notify ourselves, as developers and during the development phase, that some problem has occurred or some unexpected state has been entered, so that we can correct it.
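
Here is a minimal sketch of the fail-fast alternative to the silent null check shown above (assuming, as stated, that a null argument can only mean a bug elsewhere):

public void DoStuff(object obj)
{
    if (obj == null)
        throw new ArgumentNullException("obj");  // fail fast: surface the real bug near its source

    // do stuff
}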

I have a friend and coworker who makes a hilarious comparison between development practices which push errors in the wrong direction and a Three Stooges episode he once saw (“Cactus Makes Perfect“) in which Stooge #1 falls into a cactus, so Stooge #2 pulls him loose and begins pulling cactus spines out of his butt cheek with a pair of pliers, while Stooge #3 takes a pair of scissors and starts cutting the cactus spines level with the skin on the other butt cheek.  Don’t be the stooge who cuts the cactus spines level with the skin.  Completely remove them with pliers instead!

In many cases, choosing to “fail fast” can lead to much more robust software.  Here is a quote from an excellent article from ThoughtWorks on the subject of failing fast, which I highly recommend that you read:  “Failing fast is a nonintuitive technique: ‘failing immediately and visibly’ sounds like it would make our software more fragile, but it actually makes it more robust.  Bugs are easier to find and fix, so fewer go into production.”

So in conclusion, always bear in mind that run-time errors tend to be more problematic than compile-time errors, and logic errors tend to be more problematic than run-time errors.  The closer to compile-time you can cause possible errors in your code to manifest themselves, the better off you (and perhaps your software) will be.  And always consider that the cost to fix a bug increases rapidly with the amount of time it takes to detect the bug.

Visual Studio “Any CPU” == MSBuild “AnyCPU”

Notice the difference in spacing. You select a build platform named “Any CPU” in Visual Studio 2010 (it is created automatically with your solution), but to target that platform when using msbuild.exe from the command line (or from NAnt, or whatever), be sure to leave the space out and use /p:Platform=AnyCPU. If you are using NAnt, don’t spend an hour trying to figure out the best way to embed quotation marks in your build file so that you can pass the platform target with its space intact, such as

“Any CPU”

because this will not resolve the issue, and you will continue to receive the initial error message:

C:\WINDOWS\Microsoft.NET\Framework\v3.5\Microsoft.Common.targets(539,9): error : The OutputPath property is not set for this project. Please check to make sure that you have specified a valid Configuration/Platform combination. Configuration=’Dev’ Platform=’Any CPU’
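
For reference, an invocation along these lines (the solution, target, and configuration names are illustrative) avoids the error:

msbuild MySolution.sln /t:Build /p:Configuration=Dev /p:Platform=AnyCPU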

Apparently this is a known issue, a bug in Visual Studio that was brought to light too late in the beta testing process to be fixed before release.

Use [TestInitialize] / [Setup] Attributes to Simplify Your Test Setup

If you have a class under test into which you are injecting multiple (mocked) dependencies, you might have struggled with strategies for minimizing the code duplication involved in creating the class under test along with its mocked dependencies:

[TestFixture]
public class MyTest
{
    [Test]
    public void Test1()
    {
        // Arrange
        var myDependency1 = new Mock<IMyDependency1>();
        var myDependency2 = new Mock<IMyDependency2>();
        var myClassToTest = new MyClassToTest(myDependency1.Object,
            myDependency2.Object);

        // Test code follows...
    }

    [Test]
    public void Test2()
    {
        // Arrange
        var myDependency1 = new Mock<IMyDependency1>();
        var myDependency2 = new Mock<IMyDependency2>();
        var myClassToTest = new MyClassToTest(myDependency1.Object,
            myDependency2.Object);

        // More test code follows...
    }
}

As you can see, the “Arrange” portion of your test code might contain substantial duplication. If you change the constructor signature of your class being tested, you might wind up fixing compilation errors all over the place.


One naive approach is to use a Refactor -> Extract Method style of strategy, such as:

[TestFixture]
public class MyTest
{
    [Test]
    public void Test1()
    {
        var myClassToTest = CreateClassToTest();

        // Test code follows...
    }

    [Test]
    public void Test2()
    {
        var myClassToTest = CreateClassToTest();

        // More test code follows...
    }

    private MyClassToTest CreateClassToTest()
    {
        var myDependency1 = new Mock<IMyDependency1>();
        var myDependency2 = new Mock<IMyDependency2>();
        return new MyClassToTest(myDependency1.Object, myDependency2.Object);
    }
}

In this example, code duplication is reduced; however, you’re left with the problem that your mocks are out of scope in your actual test methods, so there is no way to perform any .Setup(x => …) actions on them.


For the longest time I worked around this by changing the members of MyClassToTest that hold references to its dependencies from private to internal, and then using the InternalsVisibleTo assembly attribute to give my test project visibility into the internal members of the project under test. This meant that in a test method, I only needed to create instances of the mocked dependencies which that method itself actually required:
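
For reference, that attribute lives in the project being tested, typically in AssemblyInfo.cs (the test assembly name below is illustrative):

using System.Runtime.CompilerServices;

[assembly: InternalsVisibleTo("MyProject.Tests")]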

[TestFixture]
public class MyTest
{
    [Test]
    public void Test1()
    {
        var myClassToTest = CreateClassToTest();
        var myDependency1 = new Mock<IMyDependency1>();
        myDependency1.Setup(x => x.GetStuff).Returns(new List<Stuff>());
        myClassToTest.MyDependency1 = myDependency1.Object;

        // Test code follows...
    }

    [Test]
    public void Test2()
    {
        var myClassToTest = CreateClassToTest();
        var myDependency2 = new Mock<IMyDependency2>();
        myDependency2.Setup(x => x.GetOtherStuff).Returns(new List<OtherStuff>());
        myClassToTest.MyDependency2 = myDependency2.Object;

        // More test code follows...
    }

    private MyClassToTest CreateClassToTest()
    {
        var myDependency1 = new Mock<IMyDependency1>();
        var myDependency2 = new Mock<IMyDependency2>();
        return new MyClassToTest(myDependency1.Object, myDependency2.Object);
    }
}

Aside from the fact that I was constantly forgetting to assign my mock’s .Object to the property exposed by the class under test, this example can be improved on considerably. It is starting to get cluttered, and I’ve always felt that the whole InternalsVisibleTo trick was a bit of an encapsulation violation. The [SetUp] (NUnit) / [TestInitialize] (MSTest) attributes from this post’s title provide a cleaner solution:


[TestFixture]
public class MyTest
{
    private Mock<IMyDependency1> _myDependency1;
    private Mock<IMyDependency2> _myDependency2;
    private MyClassToTest _myClassToTest;

    [SetUp]
    public void SetUp()
    {
        _myDependency1 = new Mock<IMyDependency1>();
        _myDependency2 = new Mock<IMyDependency2>();
        _myClassToTest = new MyClassToTest(_myDependency1.Object, _myDependency2.Object);
    }

    [Test]
    public void Test1()
    {
        _myDependency1.Setup(x => x.GetStuff).Returns(new List<Stuff>());

        // Test code which operates on _myClassToTest ...
    }

    [Test]
    public void Test2()
    {
        _myDependency2.Setup(x => x.GetOtherStuff).Returns(new List<OtherStuff>());

        // Test code which operates on _myClassToTest ...
    }
}

There are a number of benefits to this approach. You only have to call new on each of your object types once and only once. Your test runner will run your [SetUp] / [TestInitialize] method prior to each test method being executed, which recreates the mocks and wipes out any .Setup(x => …) calls left over from other tests, preserving the independent/isolated nature of your tests (the “I” in the A-TRIP mnemonic). Lines of code are reduced, encapsulation is preserved, and perhaps most importantly (for me), forgetting to assign the mock’s .Object back to the code under test is no longer an issue once the setup/initialize method has been created properly.

This becomes ever more useful as the number of tests you write grows, and as the number of constructor-injected dependencies increases.

Improving Your Experience with the VS2012 Test Runner

Tired of slow, long-running unit/integration test suites? Want to speed up your feedback cycle while coding? Me too.

At my current employer I am working with by far the largest Visual Studio solution I have ever encountered, and some of the tests run very slowly. I like a rapid feedback loop: write a few lines of code, build, run tests, repeat all day. Unfortunately some tests are fast running and some are slow. Fortunately, the team I am on has segregated their automated tests using TestCategory attributes with the names UnitTest and IntegrationTest.
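
For anyone unfamiliar with the attribute, the categorization looks roughly like this (MSTest shown; the test names are invented, but the category names match the ones above):

[TestClass]
public class PricingTests
{
    [TestMethod, TestCategory("UnitTest")]
    public void CalculatesLineItemTotal()
    {
        // fast, isolated test ...
    }

    [TestMethod, TestCategory("IntegrationTest")]
    public void LoadsPricesFromTheDatabase()
    {
        // slower test that touches external resources ...
    }
}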

Although it is a bit unintuitive, you can group your tests according to these attributes within the test runner built into Visual Studio 2012. Simply open the top menu Test->Windows->Test Explorer

… and click the small grouping icon in the Test Explorer toolbar to group the list by Traits.  The key point to remember is that “Traits” equals “test category/attribute names.”

Also, this giant solution at work is fairly 3D/mathematical in nature, and as such many of the unit tests run more slowly than the integration tests.  A further handy tip, then, is to run the tests once, and then group them by Duration for subsequent test runs while you have VS open.  This way I can enter my rapid code-writing/feedback loop, running only the tests which execute quickly.

Then, before committing any code which, if broken, might hinder the efforts of my teammates, I can run the entire suite. Provided that your integration tests can be run on demand without any special setup, it is nice to have the added reassurance that they pass along with your unit tests, even though day to day you stay focused on the vast majority of tests that run very quickly.