Here at Flatirons, we’ve traditionally specialized in document management or information applications. I’ve always found software applications that manage documents and information to be interesting – documents are by definition rich repositories of information. So, with the renewed interest in large and/or complex data (dare I say “Big”), it’s an exciting time to work in information application software development.
We’ve also adopted iterative, test-driven development (TDD) practices in our project work. In TDD, the developer writes a “unit test” for each bit of code. These tests are powerful in a couple of ways. First, they are a great way to document how the code should function. Second, if your code base has thorough unit test coverage, you can boldly refactor and improve your code, confident that your unit tests will quickly expose any bugs a given change introduces.
When developing large and/or complex information applications, I’ve noticed that the code and accompanying unit tests are only part of the story. Often the code will rely on a series of system configurations and information (state) to function correctly in a given environment. This information can include anything from low-level application configuration files to product information, document templates and taxonomies. With the rise of XML databases, key-value stores and JSON-based document databases, it’s becoming increasingly common to have a software application ecosystem with information not just in multiple databases, but in multiple *types* of databases. Add to this multiple development, test and production environments that must all contain the current configurations and information, and you begin to understand how unit testing the code is only part of the solution.
To meet this challenge, you can leverage your existing unit testing framework to make assertions about the information. This type of test is not a pure unit test. Information tests do not set up and tear down their own state, but rather make assertions about the state of an environment. Information tests can straightforwardly assert the existence of expected relationships between, say, products and templates, or information in disparate databases and database types. You can quickly validate the setup of a new environment by running this suite of tests.
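As a rough illustration, here is a minimal sketch of an information test written with Python's standard `unittest` module. Everything named here is hypothetical: `load_products` and `load_templates` stand in for read-only queries against two different stores (say, a relational product database and a document database), and they return hard-coded sample data only so the sketch runs on its own. The point to notice is that the test case has no setup or teardown; it only reads and asserts.

```python
import unittest


def load_products():
    """Hypothetical stand-in for a read-only query against the product database."""
    return [
        {"id": "p-100", "template_id": "t-1"},
        {"id": "p-200", "template_id": "t-2"},
    ]


def load_templates():
    """Hypothetical stand-in for a read-only lookup in a document store."""
    return {"t-1": {"name": "Datasheet"}, "t-2": {"name": "Manual"}}


def missing_template_refs(products, templates):
    """Return the ids of products whose template_id is not a known template."""
    return [p["id"] for p in products if p["template_id"] not in templates]


class ProductTemplateInformationTest(unittest.TestCase):
    # Deliberately no setUp/tearDown: the test asserts on existing state only.
    def test_every_product_points_at_an_existing_template(self):
        missing = missing_template_refs(load_products(), load_templates())
        self.assertEqual(missing, [], f"products with unknown templates: {missing}")
```

In a real environment the two loaders would connect to whatever stores the application uses, and the suite could be run with `python -m unittest` like any other test module.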
This approach is most useful when the information is relatively static system configuration that nonetheless evolves over time. For example, the information may include relationships between taxonomies and products or relationships between a product and a template. This information has expected relationships that are set up ahead of time and relied upon by software applications (e.g. each product has one and only one template). However, this information will evolve, and expectations about its makeup will change over time (e.g. new products are added, or later a product may have more than one template). Tests help you to verify the referential integrity of the information, help to document the configurations and establish a useful baseline as the information evolves.
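The “one and only one template” expectation can be sketched as a small check. This is an illustration under assumed names, not code from our project: `template_count_violations` is a hypothetical helper, and the product and template ids are made up. The allowed range is a parameter so that the check can be loosened when the expectation changes.

```python
from collections import Counter


def template_count_violations(product_ids, product_template_pairs, minimum=1, maximum=1):
    """Map each product whose template count falls outside [minimum, maximum] to that count.

    product_template_pairs is an iterable of (product_id, template_id) tuples,
    for example the rows of a join table read from the environment.
    """
    counts = Counter(product_id for product_id, _ in product_template_pairs)
    return {
        product_id: counts[product_id]  # Counter yields 0 for products with no template
        for product_id in product_ids
        if not minimum <= counts[product_id] <= maximum
    }


# Made-up sample data: p-200 has two templates, p-300 has none.
product_ids = ["p-100", "p-200", "p-300"]
pairs = [("p-100", "t-1"), ("p-200", "t-1"), ("p-200", "t-2")]

strict = template_count_violations(product_ids, pairs)
relaxed = template_count_violations(product_ids, pairs, maximum=2)
```

With the sample data, `strict` flags both `p-200` and `p-300`, while `relaxed` flags only `p-300`. When a product is later allowed more than one template, changing `maximum` updates the test and records the new expectation in the same place.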
In a recent project, we have been setting up information tests to run via Jenkins as part of the regular suite of unit tests we run after each build. This way, we are constantly validating the configurations on each environment (development and test) where we do automated builds. It’s also useful to have a way to run this suite against a production environment. Because these tests do not set up or tear down state, they are less risky to run there than your full unit test suite, which you might not normally run against production at all. The tests are useful for troubleshooting data corruption and for validating that the installation of a new release was successful.
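One way to arrange this, sketched here under assumptions rather than taken from our actual Jenkins setup, is to let an environment variable choose the target environment and to run only the read-only information tests when that target is production. The variable name `INFO_TEST_ENV`, the directory names and the placeholder URLs are all hypothetical.

```python
import os

# Hypothetical per-environment settings; the URLs are placeholders.
ENVIRONMENTS = {
    "development": {"catalog_url": "http://dev.example.invalid/catalog"},
    "test": {"catalog_url": "http://test.example.invalid/catalog"},
    "production": {"catalog_url": "http://prod.example.invalid/catalog"},
}


def resolve_environment(environ=os.environ):
    """Return (name, settings) for the environment named by INFO_TEST_ENV."""
    name = environ.get("INFO_TEST_ENV", "development")
    if name not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {name!r}")
    return name, ENVIRONMENTS[name]


def select_suites(environment_name):
    """Return the test directories to run against the given environment.

    Information tests only read state, so they run everywhere. Unit tests
    that set up and tear down state are skipped for production.
    """
    suites = ["tests/information"]
    if environment_name != "production":
        suites.append("tests/unit")
    return suites
```

A build job could set `INFO_TEST_ENV` for each environment and pass the selected directories to the test runner, so the same information tests double as a post-install check for a new release.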
Just as unit tests help to document code, information tests will help to document the configuration of the system. This documentation is valuable to both developers and QA teams interested in high-quality, maintainable systems. So by testing the information, you help to document it. And in the end, it’s all about the documents.