Thursday, January 01, 2015

What I Did Over Christmas Break: A Little Fun with Docker, Redis and Python


Over the Christmas break, I revisited Python.  Python is a great language and is one of the first scripting languages I ever played with.  I stopped using it years ago because, more often than not, I work in a Java/JVM environment and I have found that Groovy is a better fit for my scripting needs.

However, as I have started to get into Machine Learning, I have been finding that more and more of the examples I look at are written in Python.  So over the break, I found myself playing with Python again.

Whenever I try a new language, I always use a code kata to exercise some of the basic aspects of the language.  For this exercise, I took a kata I found in a Go tutorial.  That tutorial built a small program that would split a comma-separated list of terms passed in on the command line and perform a lookup against Reddit for topics associated with each term.  The tutorial then printed the results to the terminal window.

I have taken this code kata and modified it into two Python scripts.  The first Python script generates a list of Reddit URLs and pushes them out to a Redis server.  The script then goes into a loop and waits, listening on a Redis list for any results returned.

The second Python script listens to the Redis server, pulls a URL off of a Redis list and executes an HTTP lookup against Reddit.  The resulting JSON response from Reddit is saved as a string back to a list on the Redis server.

Part of the reason why I split the application into two pieces is so that I can run multiple copies of the second script across multiple servers, allowing me to easily scale.

Just to make things interesting, I set up the Redis server to run in a Docker container.

Python and Python Libraries Used

The following versions of Python and Python libraries were used:

  1. Python 3.4.2
  2. httplib2   (installed via pip install httplib2)
  3. redis      (installed via pip install redis)

Setting up Redis to run in Docker

I did not run Redis natively on my box.  Instead, I used a Docker container to run the Redis instance.   I used boot2docker to run my Docker container.  

Note:  Follow the boot2docker installation instructions to install boot2docker.  

Once you have installed boot2docker, follow the instructions below to install Redis.

1.  Start up boot2docker by launching the boot2docker application from the Finder.  You should see a terminal window open.

2.  I then wrote the following Dockerfile.  When built, this file creates an Ubuntu-based image with wget, gcc, make and Redis installed.  The Dockerfile is shown below:

FROM ubuntu:14.10

# Force apt update
RUN apt-get update -q

# Install wget, gcc and make
RUN apt-get install -yq wget
RUN apt-get install -yq gcc
RUN apt-get install -yq make

#Build Redis in the temp directory
RUN cd /tmp && \
    wget http://download.redis.io/redis-stable.tar.gz && \
    tar xvzf redis-stable.tar.gz && \
    cd redis-stable && \
    make install

# Start redis-server
CMD /usr/local/bin/redis-server

# Expose the Redis port
EXPOSE 6379

Note:  This file has to be saved with the name Dockerfile

3.  To actually build the Docker image, issue the following command in the boot2docker shell opened in step 1.   Note:  To run this command, you need to be in the directory where you saved the Dockerfile created above.

docker build --rm -t=redis .

You should see a significant amount of activity scroll through the terminal window.  If everything runs successfully, you should see a success indication in the terminal window.

4.  Once the image has been created, issue the following command in the open shell window to start the container:

docker run -d -p 6379:6379 redis

The above command will start a container from the redis image built above and map port 6379 in the container to port 6379 on the boot2docker VM.

5.   Once the container has been started, you will need to forward a local port to the VM managed by boot2docker.  To do this, open a new terminal window and issue the following boot2docker command:

boot2docker ssh -L6379:localhost:6379

At this point, Redis should be running within a docker container and ready to receive messages from our Python scripts.
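
If you want a quick sanity check before running the scripts, a minimal sketch using redis-py's ping looks like this; it should print True if the container and the port forward are working:

import redis

print(redis.Redis().ping())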

6.  To shut down your boot2docker instance, simply type boot2docker down.

Note:  Once you have created a Docker image, you do not need to rebuild it from scratch.    Instead, you can launch boot2docker and then execute steps 4 and 5 to start your Redis container.


The first script parses the comma-separated list of terms passed in on the command line and pushes them out to a Redis list for processing.  The full code for the script is shown below:

import sys
import redis
import json

conn = redis.Redis()
# build a "term,url" pair for each term (the Reddit listing URL format is assumed here)
results = [conn.rpush('urls', "{0},http://www.reddit.com/r/{0}.json".format(term)) for term in sys.argv[1].split(',')]

while True:
    msg = conn.blpop("results")

    raw = msg[1].decode("utf-8")

    term = raw[:raw.find(",")]
    jsonPayload = raw[raw.find(",")+1:]
    results = json.loads(jsonPayload)

    for children in results['data']['children']:
        print ("{0}--->{1}".format(term,children['data']['title']))

The line conn = redis.Redis() connects to the Redis server we have running in Docker.  Since we passed in no parameters, the connection will by default look for the Redis server on localhost port 6379.

Note:  Remember the port redirect we did earlier in our Docker container setup.
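
For reference, the same connection with redis-py's defaults spelled out would look like the following sketch (the values shown here are just the library defaults, not anything specific to this setup):

conn = redis.Redis(host='localhost', port=6379, db=0)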

After a Redis connection has been established, the code uses a Python comprehension to split apart the values passed in via the command-line argument, format a URL for each term and push the results out to a Redis list called urls.   Each URL sent to Redis is placed at the end of the list via the conn.rpush call.

results = [conn.rpush('urls', "{0},http://www.reddit.com/r/{0}.json".format(term)) for term in sys.argv[1].split(',')]

Originally, I wrote the above code using a map function call and an anonymous function (e.g. a Python lambda), but I really liked the elegance of the comprehension.  It is a very compact mechanism for iterating over a list.
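
For comparison, the map/lambda version would have looked something like the sketch below (this is a reconstruction rather than my original code, and it assumes the same URL format as above; note that in Python 3 the map call has to be wrapped in list() to force it to actually execute):

results = list(map(lambda term: conn.rpush('urls', "{0},http://www.reddit.com/r/{0}.json".format(term)),
                   sys.argv[1].split(',')))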

If you look closely at the code above, I actually send a comma-separated list to Redis.  The first item in the list is the term I am searching for, while the second item is the formatted URL.  

conn.rpush('urls', "{0},http://www.reddit.com/r/{0}.json".format(term))

I do not want to lose track of the term being sent to Redis.  Redis only accepts string and numeric values, and I was too lazy to build a JSON string.

After all of the terms I want to search Reddit for have been sent to Redis, the script enters a while loop and waits for results to be sent back by the second script.

while True:
    msg = conn.blpop("results")

    raw = msg[1].decode("utf-8")

    term = raw[:raw.find(",")]
    jsonPayload = raw[raw.find(",")+1:]
    results = json.loads(jsonPayload)

    for children in results['data']['children']:
        print ("{0}--->{1}".format(term,children['data']['title']))

The first line of code in the above while loop will pop a message off of a Redis list called results.  However, it will block until a message is actually received.

msg = conn.blpop("results")

The message coming off the Redis list will again be a comma-separated string (again, I am being lazy).  The first item will be the term that was searched on.  The second item will be the JSON payload returned from Reddit.  Since there is a chance that the JSON payload will contain commas and other characters that would break a standard CSV parser, I pull the string apart manually using the following code:

term = raw[:raw.find(",")]
jsonPayload = raw[raw.find(",")+1:]
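
For example, with a hypothetical payload the split behaves like this; only the first comma matters, so any commas inside the JSON are left alone:

raw = 'python,{"data": {"children": []}}'
term = raw[:raw.find(",")]            # 'python'
jsonPayload = raw[raw.find(",")+1:]   # '{"data": {"children": []}}'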

Once the term and the JSON payload have been separated, I use the JSON parser included in the base Python libraries to convert the JSON payload to a standard Python dictionary structure. 

for children in results['data']['children']:
    print ("{0}--->{1}".format(term,children['data']['title']))

This allows me to then walk through the returned data using a standard Python for loop.
I did not use a Python comprehension here because I am just printing out the results and do not need to do anything further with the data.
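
Had I used a comprehension anyway, it would have looked something like this sketch (it works in Python 3 because print is a function, but it is arguably less readable than the plain for loop):

[print("{0}--->{1}".format(term, child['data']['title'])) for child in results['data']['children']]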


The second Python script in my code kata is what actually executes the lookup against Reddit for the term being passed in.

import redis
import httplib2

def lookupUrl(url):
    resp, content = httplib2.Http('.cache').request(url, "GET")
    if (resp.status == 200):
        return content.decode("utf-8")
    return "Error"

conn = redis.Redis()

while True:
    msg = conn.blpop("urls")

    values = msg[1].decode('utf-8').split(",")
    term, url = values[0], values[1]
    print("Term {0}".format(term))
    result = lookupUrl(url)

    if result != "Error":
        # send the term and its JSON payload back on the results list
        conn.rpush("results", "{0},{1}".format(term, result))

We are going to start in the middle of the script with the while loop.  This loop does a blocking pop off the Redis list called urls.

msg = conn.blpop("urls")

The message coming off the list is decoded into a UTF-8 string and then split into its component pieces: the term and the URL.

values = msg[1].decode('utf-8').split(",")
term,url = values[0],values[1]

Once we have the URL, we call the lookupUrl function and store the JSON payload in the result variable.

result = lookupUrl(url)

The lookupUrl function uses the httplib2 library to execute the actual GET call against Reddit using the url variable passed into the function.

def lookupUrl(url):
    resp,content=httplib2.Http('.cache').request(url, "GET")

The code then checks the response back from Reddit.   If the HTTP response code is 200, it returns the JSON payload as a string from the lookupUrl function.  Otherwise, it returns a simple string containing the word "Error". 

Once the lookupUrl function returns a result, the while loop pushes the result to a Redis list called results.
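
A minimal version of that push pairs the term with its JSON payload so the first script can split them back apart:

conn.rpush("results", "{0},{1}".format(term, result))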

Some Closing Thoughts

Obviously, the scripts presented here are examples only and are not meant to be picked up and used in production.   The error handling is non-existent and the scripts just run in a loop.  What this exercise did allow me to do, though, is:

1.  Play around with Docker and get a Docker container running.  Docker is an amazingly cool piece of technology.

2.  Kick the tires on Python again.  In particular, I was again struck by how easy it is to pick up
     Python and run with it.   Python is a "low friction" language that allows you to seamlessly blend
     OO and functional programming concepts.  I find that more times than not I can guess at the syntax
     of something in Python because the language just makes sense.

3.  Play with Redis.   Redis is a simple and easy-to-use key-value store, and using it to build a
     simple point-to-point queue was trivial.  I especially like how lightweight it is.

Sunday, March 17, 2013

Getting Started with Lein

Note:   These are just some ramblings as I work with Clojure.  They have not been edited and are not meant for consumption as anything more than working notes.

Started playing with the Leiningen build system for Clojure today.   It is supposed to be a cleaner, simpler alternative to Maven/Gradle.

At this point, it has not been difficult to set up.  I started with the following actions:

  1. Download the lein installation script.  The script can be downloaded from the Leiningen (now referred to as lein) website.

  2. After I downloaded the script, I set up my PATH (I am using OS X) to include the directory where I have lein installed and ran the script for the first time.


  3. This kicked off the installation process.  All files were installed in lein's self-install directory.

  4. Once everything was installed, I was able to start my first Clojure project.  I am currently working on a simple project I call cvssplitter.  This is a test project to kick the tires on Clojure and better understand the LISP programming model.  To get started with my project, I issued the following command:

    lein new cvssplitter

  5. This creates a new directory structure with the following directories and files:

    cvssplitter      -->  Root directory for my project
         src         -->  Source directory for all Clojure source files
         test        -->  Test directory for all Clojure test files
         doc         -->  Directory for all Clojure documentation
         project.clj -->  Clojure file that describes the project and its corresponding jar dependencies
         target      -->  Output directory for all compiled class files
  6. After the project was created, I modified the core.clj file (under src/cvssplitter/).
  7. I added two new functions to the code in the file:
(ns cvssplitter.core
  (:require [clojure.java.io :as io]))        ; <-- imported the io library

(defn foo                                     ; <-- was generated by lein
  "I don't do a whole lot."
  [x]
  (println "Hello, World!" x))

(defn loadfile                                ; <-- this is a new function
  [file-name]
  (with-open [rdr (io/reader file-name)]
    (doseq [line (line-seq rdr)]
      (println line))))

(defn -main [] (foo "Sean"))                  ; <-- added a main method
8.  Once the code was completed, I loaded a repl via lein:

      lein repl

9.  In the repl, I entered the following commands:

     (require 'cvssplitter.core)
     (cvssplitter.core/loadfile "/Users/carnellj/projects/clojure/cvssplitter/resources/2011-gainful-employment.csv")
10.  The small script then loaded the file and printed out the contents of the file.


Saturday, November 13, 2010

Bruce Tate and Seven Languages in 7 Weeks

I was in Milwaukee yesterday attending a one-day seminar taught by Bruce Tate called "7 Programming Languages in 7 Weeks." Bruce based the talk on his new book, "Seven Languages in Seven Weeks."

The seminar was exhausting, but it was a great overall overview of seven different programming languages. The languages covered by Bruce included:

  1. Ruby
  2. Io
  3. Scala
  4. Erlang
  5. Prolog
  6. Clojure
  7. Haskell
The one message that I took away from this class is that different programming languages force you to think about problems in different ways. For many of us in the IT profession, we really have been straight-jacketed into thinking the only way to solve a problem is through an OO-based language like Java or C#.

Bruce took the seven languages listed above and not only showed the basics of each language, but often gave some great examples of how to solve complicated problems using it. My personal favorite was when he devised a Sudoku solver using Prolog in about 20-30 lines of code.

So, even though I don't do this nearly as often as I should, I got a lot out of the class and would encourage anyone who is interested in stepping out of their programming comfort zone to pick up his book and check it out.

Saturday, November 24, 2007

The Seduction of Standardization

One of the driving tenets of Lean is that standardization significantly drives down variation. A reduction in variation means less wasted movement and a reduction in cost.

From a Lean perspective, standardization of process is more important than standardization of tools. Lean practitioners know that you must use the right tool for the job and not necessarily the same tool for every job. What you want is consistency in the process of how a job is executed.

One of the most common mistakes I have seen in IT organizations is the blind insistence on the adoption of a single technology platform. The idea is that by forcing everyone to use the same development language, framework, application server and technology platform an organization can significantly reduce their overall support and training costs.

This is a naive outlook to have because it basically says that every system you develop can be solved in the exact same manner. Every technology has specific constraints and costs associated with it. How you solve a problem is in many ways going to be dictated by the constraints of the technology being used.

Let me give you an example of this phenomenon. J2EE is an enterprise stack that allows you to build high-volume applications that can manage transactionality across multiple data sources. Many organizations adopt J2EE as their standard and then in turn force everyone to build their applications on this stack. (Does the refrain "It's the corporate standard" sound familiar?) Most applications in an organization do not have the complexity that would mandate the use of a J2EE stack and the overhead that comes with it. Over the last several years I have seen many organizations where even small applications that are read-only, with a low volume of activity and a single database, end up getting built using EJBs and deployed to a full-blown J2EE application server.

Most of these applications could have been built using a dynamic scripting language (e.g. JRuby and Groovy) and deployed on Tomcat.

Herein lies the problem. Building applications on a complex but "standard" stack is waste. It is waste in a number of different ways, including:

  1. Defects - Using a complex platform significantly increases the chance that something will go wrong. This usually results in more defects and increased support costs as development resources have to deal with these defects instead of building new functionality.

  2. Unnecessary Lead Time - Because complex platforms require specialized skill sets you are naturally going to have fewer people with the ability to manage these platforms. This makes these individuals the bottleneck in your IT organization as they are the only ones who can identify and resolve issues.

  3. Financial Cost - Many organizations end up buying their application servers. These licensing costs can be extremely expensive. Making development teams use these products when they do not need them is an overallocation of overhead. Projects that should never have used a J2EE application server basically end up subsidizing the projects that do need this complexity.
If organizations truly want to see the benefits of standardization they need to let development teams work with the "right" tools to do the job. For instance, I personally no longer consider J2EE a platform. The Java Virtual Machine (JVM) is the platform and J2EE is an implementation choice. That means that depending on the problem being solved my development teams might use Groovy/Grails for web-based applications, Java/J2EE for components with multi-transactional requirements and a functional programming language like Scala to handle applications that require a high degree of scalability and throughput.

The benefits of standardization come into play when development teams standardize their development processes. Development teams in an organization should agree on a standard mechanism for documentation, how they manage their source control, how they build their code and how they unit test. Most developers can pick up a new language quickly. Standardizing on process allows developers to be moved between teams and begin contributing quickly.

Organizations should be looking at supporting not a single one-size-fits-all language, but rather a toolbox with different options right-sized to the problem at hand. This is counter-intuitive, as most organizations believe that learning new languages and techniques is incredibly expensive. However, software development is in many ways like a machining shop that produces a wide variety of different types of parts. Every part produced can be radically different. Machinists have to set their machines up to produce the right part.

Companies need to give developers flexibility and let them pick the right technology set for the problem at hand. Conversely, developers need to realize that they must constantly re-invest in themselves and learn new technologies and techniques. Training and new skills are not the sole responsibility of your employer.

Ultimately it comes down to this: A machine shop would not lay down a mandate that all of their operators must use only a hacksaw and hammer because using these standardized tools will drive down operation costs. The corresponding time required to manufacture parts and the amount of re-work would go through the roof and quickly drive that shop out of business. Why require the development teams in your organization to do the same?

Sunday, November 12, 2006

Breaking the Chains of Dependency

Managing dependencies between classes is not something a lot of developers like to think about when they write code. After all, when you are in the flow of writing code, if you need an object you simply create and use it. What could be easier?
However, the small decisions we make while in the flow of coding can have a significant impact on the long-term maintainability of the applications you build.

This became evident to me the other day as I was writing a piece of code and thinking about how I wanted to unit test it. The code I was unit testing was rather simple, but I had one problem: I needed two classes (CompareBundleLocations and CompareLocations) that were the responsibility of another developer. This developer was behind schedule and I knew I could write my code and unit test it around him.

Typically the class and method I would have written would have looked like this:

public class MyClass {

    public LocationResults compareTwoLocations(Location pLocationA,
                                               Location pLocationB) {
        if (pLocationA.checkBundling()) {
            CompareBundleLocations cpl = new CompareBundleLocations();
            return cpl.compare(pLocationA, pLocationB);
        }

        CompareLocations cl = new CompareLocations();
        return cl.compare(pLocationA, pLocationB);
    }
}
Functionally, this class works, but I have also made it very difficult to unit test. When I write my unit test I have no way of stubbing out the behaviors of the CompareBundleLocations and CompareLocations classes. The creation of the two objects is hard-coded inside of my compareTwoLocations() method.

One thing I could do is define a "Stub" for each class and change my actual code to use the stub classes. I have two problems with this. First, I am not testing the actual code that would go into production. By hard-coding my stubs into the actual class I want to test, I am changing the base behavior of the class I am testing. Secondly, I always run the risk of forgetting to change my code back from the "Stub" to the actual class after it has been delivered by the developer.

The problem is that by the act of instantiating the objects within my method, I have locked myself into using those classes. I cannot easily "plug in" new functionality.

Let's take a step back and start refactoring the class:

public class MyClass {
    private CompareBundleLocations mCBL = null;
    private CompareLocations mCL = null;

    public MyClass() {
        mCBL = new CompareBundleLocations();
        mCL = new CompareLocations();
    }

    public LocationResults compareTwoLocations(Location pLocationA,
                                               Location pLocationB) {
        if (pLocationA.getCheckBundling()) {
            return getCompareBundleLocations().compare(pLocationA, pLocationB);
        }

        return getCompareLocations().compare(pLocationA, pLocationB);
    }

    protected CompareBundleLocations getCompareBundleLocations() {
        return mCBL;
    }

    protected void setCompareBundleLocations(CompareBundleLocations pCBL) {
        mCBL = pCBL;
    }

    protected CompareLocations getCompareLocations() {
        return mCL;
    }

    protected void setCompareLocations(CompareLocations pCL) {
        mCL = pCL;
    }
}
One of the major differences between this version of MyClass and the earlier version is that I no longer instantiate CompareLocations and CompareBundleLocations directly inside of my compareTwoLocations() method. Instead I instantiate an instance of these classes in the constructor of MyClass. When the compareTwoLocations() method wants to use these two objects, it retrieves them by using the getter methods.

Doing this extra work up front can save a significant amount of effort in a multi-person development environment. Let's say the developer writing the CompareLocations and CompareBundleLocations classes is running behind, and all they have done is provide you with a stub or interface for their classes.

All you want to do is test that your code is behaving the way you expect it to. So what you could do is write a test class that behaves in the following manner:

public class TestMyClass extends TestCase {

    public TestMyClass(String pArg) {
        super(pArg);
    }

    public void testCompare_NonBundled() {
        Location locationA = new Location();
        Location locationB = new Location();

        // populate locationA and locationB as needed for a non-bundled comparison

        CompareLocations compareLocations = new NonBundledCompareSearch();
        MyClass myClass = new MyClass();
        myClass.setCompareLocations( compareLocations );

        LocationResults result = myClass.compareTwoLocations(locationA, locationB);

        assertTrue( result.getLocationMatch() );
    }

    public void testCompare_Bundled() {
        Location locationA = new Location();
        Location locationB = new Location();

        // populate locationA and locationB as needed for a bundled comparison

        CompareBundleLocations compareBundleLocations = new BundledCompareSearch();
        MyClass myClass = new MyClass();
        myClass.setCompareBundleLocations( compareBundleLocations );

        LocationResults result = myClass.compareTwoLocations(locationA, locationB);

        assertFalse( result.getLocationMatch() );
    }

    class NonBundledCompareSearch extends CompareLocations {
        public LocationResults compare(Location pLocationA, Location pLocationB) {
            LocationResults results = new LocationResults();

            results.setLocationMatch( true );

            return results;
        }
    }

    class BundledCompareSearch extends CompareBundleLocations {
        public LocationResults compare(Location pLocationA, Location pLocationB) {
            LocationResults results = new LocationResults();

            results.setLocationMatch( false );

            return results;
        }
    }
}
So what just happened here? By removing the dependencies from the compareTwoLocations() method, I can inject new functionality into a MyClass instance.

CompareLocations compareLocations = new NonBundledCompareSearch();
MyClass myClass = new MyClass();
myClass.setCompareLocations( compareLocations );

This is extremely useful because I need to unit test the code in MyClass and not in the CompareLocations and CompareBundleLocations classes. By using inner classes to extend the CompareLocations and the CompareBundleLocations classes, I can prove my code works and that the appropriate behavior is being exercised.

One of the biggest traps developers fall into when writing their unit tests is that they start worrying about the behavior of other classes. By managing the dependencies between your classes you can focus on the behavior of a single class.

This technique is called Inversion of Control (IOC) or Dependency Injection (DI). By using IOC as a design principle, a developer can modify the behavior of their objects at run-time.
The behavior of the compareTwoLocations() method has been changed by using inner classes that override the behavior of the dependent objects used inside the method.

This same behavior could be duplicated by using Spring and having different Spring configurations for our unit tests. One Spring configuration could contain our "live" code configuration and another Spring configuration could be used strictly for unit testing.

In the end, by thinking about how to test our code and learning to manage object dependencies, you can end up with extremely flexible and maintainable code.

Saturday, November 11, 2006

You Touch the Code, You Break the Code

I have always been a big advocate of test-driven development. One of the primary reasons I like it is that it is a humbling experience and reminds me that I have to be on a constant lookout for even the most basic bone-headed mistakes.

The other day, I needed to write a piece of code that called our geography service to calculate the distance between two lat/long points. It is a simple call that takes four parameters: the latitude and longitude of the first point and the latitude and longitude of the second point. I decided to not clutter up my function with this call and broke it into a small method that looked something like:

protected getLatLongDistance(Location pLocationA, Location pLocationB){
return .... (lat/long call);

As is my habit these days, as soon as I wrote the code I immediately wrote a unit test to prove the method worked. Much to my amazement, the unit test failed. When I reviewed the code, I found two defects. Here is the actual code below (the defects become glaringly obvious):

public getLatLongDistance(Location pLocationA, Location pLocationB){
return getGeoService().calculateDistance(
pLocationB.getLatitude(), pLocation.getLatitude() );

Simple bone-headed mistakes.... That is the point of test-driven development. Test-driven development lets you find your mistakes early. If I had waited to write my unit tests until the end of the coding phase (which many people do), I would have had to wade through a significant amount of code to find my issue. Also, I would not have had the right mind-set. It is always more difficult to find an error in a piece of code you have not been working on recently.

I know my development team hates hearing it, but "Any time you touch code, you break code until you can prove otherwise."

Wednesday, March 09, 2005

The Importance of Daily Builds

The Scenario

I was talking with a colleague of mine today and he mentioned that he is having trouble getting his team to deliver a working build. He has a large development team with over 20 developers, with the majority of them working offshore.

I asked him if his team was doing daily builds. He said no and asked me if my team does daily builds. For the most part we do (sometimes we slack :( ). It has always been a fundamental belief of mine that many of the integration problems that arise during large-scale software development can be exposed by building the entire project every day and running all of the project's unit tests.

My colleague asked me if I could document some of my thoughts on the subject of daily builds. Rather than write them up in an e-mail, I decided to post them in "Monkeys with Keyboards."

John's Thoughts on Builds
  1. Build Daily (if not more often) - Pulling all of your source code from the source control system and building it helps shake out stupid integration issues. Remember, people make mistakes: they forget to check code in, or they get sloppy with their code and do not check to see if it compiles. ("Ohh, this small change won't break anything!") Checking out the code daily and building the entire application finds these issues.

    A few years ago I was working in Boston leading a team of 13 developers. We had a daily build schedule and because of this we were always able to give our QA group a build within 2 hours or less.

    A team right down the hall from us did their builds once every two weeks and would build the entire application on a developer's machine. It took them 2 days to deliver a build, and oftentimes the QA department would find major and blatantly obvious problems with it.

    All of these things could have been avoided if they had done a daily build and then quickly run through all of the screens in their application.

  2. Build in a clean-room environment - When your team does the build, do it on a clean machine and not on a developer's desktop. A developer's desktop is supposed to be a play area and can often have configuration and JAR files on it that hide integration problems. Nothing is more irritating than discovering that an application that built on a developer's desktop fails elsewhere because the desktop environment was configured differently than the machine where the build is actually going to run.

  3. Automate your builds - If builds are not easy to do, developers will find a way to avoid them. Automate your builds to the point where it is as simple as the push of a button. A well-behaved build script will:

    • Check out all of the source code from the source code repository
    • Compile all of the code
    • Run all of the unit tests and code coverage reports
    • Generate all documentation (i.e. Javadoc tasks)
    • JAR/ZIP up any deliverables
    • Push the build out to an integration server
    • Notify the buildmaster that the build is complete
    • Clean up any temporary build files or tasks

  4. If you can, set up your build to use a continuous integration tool like CruiseControl or AntHill. These tools will poll your source control system every X minutes and check to see if new files have been checked in since the last build. If the tool finds new files, it can automatically kick off the build. Remember:
    The success of your build practices will be strongly correlated to how easy and automated the build process is.

  5. Smoke test your builds - After the build has been deployed, the buildmaster should walk through each of the screens and check to make sure the major pieces of functionality are working. This does not mean checking every detail of every screen, but rather seeing whether, after you build your application, the server it runs on falls over. The idea behind "smoke" tests is not new. I believe they were first implemented as a practice at Microsoft and were documented in Steve McConnell's book Code Complete.

    For smoke tests, your team should have a simple checklist of functionality to run through after each build. If all of the items on the checklist pass, then the build has passed the smoke test and is ready for consumption by testers.

  6. Enforce check-in standards - Developers never like to check in their code until it is "done." Done is a relative term. My team's rule of thumb is this:
    Check code in every day. Only check your code in if it compiles.
    Code is one of the truest sources for looking at the health of a project. If a developer does not know what they are doing, wouldn't you rather find the problems right away than wait until the very end and find out that there are serious problems in the code being delivered?

    The enforcement of check-in standards is extremely important in a distributed development environment because the developers do not have the luxury of talking to each other throughout the day. Since problems can take days to weeks to manifest themselves, having the ability to look at code every day is critical.

  7. Everyone is responsible for the build (share the pain) - Everyone on the team should know how to do the build and should take turns doing the builds. This is particularly important when instituting daily builds in a team that has never done them. After a few nights of having to stay late to fix a build problem, developers will very quickly become more conscious of how they check in code.

  8. If a build fails, fix it or re-deploy the old build - This is a big one. If a build fails, it is the buildmaster's responsibility to have the person who broke the build fix the code. If that person is not available, it is the buildmaster's responsibility to fix the problem.

    If the buildmaster cannot fix the problem, they have to roll the development/testing server back to the previous day's build. This is critical because:
    • Developers often forget that QA and business analysts rely on a build being stable for testing and/or demoing. Nothing frustrates them more than going out to the application and finding it just broken. Oftentimes these folks are held accountable for getting through their test cases within X amount of time. Having the test environment unavailable because the build is broken puts them behind schedule. (Believe it or not, developers are not the only ones who have to work weekends.)

    • A build is a testament to where the team is at in the development effort. Not having a new build available every day can end up hiding serious problems until the end of the project.
My final thoughts on this whole subject are that builds must be:

  • Automated - The build runs with minimal effort from the development team.
  • Repeatable - Follow a process to ensure that the same steps are used regardless of who does the build.
  • Dependable - Builds must be available to a QA or business analyst with minimal effort.
  • Transparent - Builds must expose problems quickly and obviously. The quicker a problem is exposed to daylight, the quicker the development team can work to resolve the issue.