You won’t get everything right on the first try, and your ability to respond and correct your course will play an outsized role in the success of your business. Whether it’s fixing a bug or making a significant upgrade to your software, you need systems in place that enable rapid, safe, and iterative improvement.
This includes your technical systems and your operational systems. As your business grows, so will your systems. You don’t want to overoptimize processes, but when it comes to building systems of any kind, optimizing your ability to quickly and safely iterate is almost always a safe bet.
When your business is small, your ability to adapt is your sharpest competitive advantage. This is why so many hulking companies have their lunches eaten by small startups. Larger companies have political, procedural, and technical baggage that slows them down and prevents them from making the changes necessary to improve. As a small business, you won’t face any of those limitations. Instead, you have an opportunity to iterate and improve at speeds unimaginable to more bureaucratic or less technically efficient companies, but you have to be deliberate about it.
Nothing enables iteration like automation. When you start a new business, you have a small team that can’t possibly stay on top of all of the facets of your business that matter. Good iteration relies on you having a level of confidence that every release will be successful, and, in those few cases where something goes sideways, you need to be confident that you’ll be alerted right away so you can fix things efficiently.
In a technical sense, your ability to iterate will largely depend on your tooling and configuration. Many of these processes and systems will require significant up-front investment that will slow you down at first. Waiting to implement them, however, will increase your setup and configuration time.
Through everything, your ability to iterate has to remain a constant priority. Processes and systems tend towards entropy; they become slower and more cumbersome over time. If you don’t invest in keeping them lean, you’ll constantly be scrambling to ship faster now at the expense of your ability to ship faster later on.
This book can’t cover the intricacies of implementing the perfect process for your specific project and tools. Instead, we’ll review the high-level components that enable iteration. Once you’re familiar with the concept, finding the tools for your stack is fairly straightforward. The important part is ensuring that your team values iteration, and the tools and processes that enable it.
What are the key pieces? We’ll cover them more extensively in the chapter about development processes, but let’s explore how some of the components work together to enable iteration.
Continuous integration is the core of iteration. Continuous integration is a process that runs automatically in the background and ensures your full codebase is in a well-tested and working state. This way, if new code breaks any old code, you’re alerted right away.
Continuous integration is worthless, however, unless you commit to writing and maintaining thorough tests. We’ll cover testing in depth later, but the simple version is that developers write automated tests in conjunction with any new code. These tests are designed to make sure that the code written works as intended. They also serve as regression tests to prevent future changes from breaking the original intended behavior without warning. This is your foundation, and it’s your insurance against regressions. If you release new features quickly but break old features in the process, that’s not iteration; it’s just treading water. You can’t iterate and move forward if you’re constantly fixing new things that break.
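As a minimal sketch of what “tests written alongside the code” can look like, here is a small example using Python’s built-in `unittest` module. The `slugify` function and both test cases are hypothetical and exist only for illustration; your own code and test framework will differ.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical application code: turn a title into a URL slug."""
    # Replace anything that is not a letter or digit with a space, then
    # join the remaining words with hyphens.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return "-".join(cleaned.split())


class SlugifyTests(unittest.TestCase):
    def test_basic_title(self):
        # Written with the original feature: proves it works as intended.
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_punctuation_is_dropped(self):
        # Imagine this was added with a later bug fix. From then on it acts
        # as a regression test: if a future change reintroduces the bug,
        # continuous integration fails and the team hears about it.
        self.assertEqual(slugify("Ship it, safely!"), "ship-it-safely")
```

A test runner, such as the command-line runner that ships with `unittest`, can execute tests like these on every commit, which is the piece that continuous integration automates.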
In conjunction with continuous integration, you’ll need to set up tools and systems to monitor the quality and consistency of your code. This includes standardizing on linters in your team’s development environment, as well as automated static analysis of the code whenever new code is committed. These tools watch your codebase for common security or syntax mistakes and ensure that your entire team follows the same code standards. This frees up your team members to focus on higher-level improvements during code reviews instead of getting bogged down in syntactic standards. Plus, machines are just better at this kind of work.
One key process that will always require a human touch is code reviews. At the simplest level, code reviews ensure that no code ever goes into production without at least one other person setting eyes on it. Code reviews are great for catching mistakes, but they’re also the best learning opportunity for your team. They take time, and they slow the process down in the short term, but the benefits (in terms of both preventing mistakes and learning) quickly outweigh the extra time needed for the review. In the long run, your team will be able to move more quickly with fewer mistakes.
Continuous integration, linters, and code reviews will help facilitate higher quality code, but you still have to ship that code. You need a rock-solid release process, and it will need to keep improving as your stack grows more complex. At minimum, the yardstick for a good release process is that it requires only a single command. If you can make it a single push of a button, even better.
The more steps in your release process, the more likely it is that something will go wrong. Automating those steps takes time, but rest assured: mistakes will cost much more. So automate your release process, and ensure that resilience is built in for when something goes wrong with a release. If a release requires more than one command, there’s room for improvement.
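To make the single-command idea concrete, here is a rough sketch of a release script that runs every step in order and stops at the first failure. The step names and the placeholder commands (each one only prints a message) are assumptions for illustration; a real release would substitute its own test, build, and deploy commands.

```python
import subprocess
import sys
from typing import Callable, List, Tuple

# A step is a label plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def command(args: List[str]) -> Callable[[], bool]:
    """Wrap an external command as a step that succeeds on exit code 0."""
    return lambda: subprocess.run(args).returncode == 0


def run_release(steps: List[Step]) -> bool:
    """Run each step in order; stop immediately if one fails."""
    for name, step in steps:
        print(f"==> {name}")
        if not step():
            print(f"FAILED: {name}; stopping the release")
            return False
    print("Release complete")
    return True


# Placeholder steps (hypothetical): swap in your real commands.
STEPS: List[Step] = [
    ("run tests", command([sys.executable, "-c", "print('tests pass')"])),
    ("build", command([sys.executable, "-c", "print('built')"])),
    ("deploy", command([sys.executable, "-c", "print('deployed')"])),
]
```

Calling `run_release(STEPS)` from a script’s entry point turns the whole release into one command. Stopping at the first failed step is a deliberate choice here: a half-finished release that halts loudly is easier to recover from than one that keeps going.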
A great release process is only half the battle, though. You also need processes and tools to roll back bad releases, or to troubleshoot, identify problems, and ship updates quickly for problematic releases. A huge part of releasing with confidence is knowing how quickly you can react if something goes wrong. This isn’t an excuse to play fast and loose with releases, but it’s inevitable that something will go wrong. And when your application is offline or broken, you’ll perform much better when you have the tools to put it in a safe state while you resolve the issue. Like backups, the ability to roll back or quickly respond to a problematic release is easily taken for granted if not used regularly. So it’s important to test it from time to time, and ensure it’s still safe and reliable.
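One common way to make rollbacks fast is to keep each release in its own directory and point a `current` symlink at the active one, so rolling back is just repointing the link. The sketch below illustrates that general technique, not any particular tool. It assumes a POSIX filesystem, a `releases/` directory, and release names that sort chronologically (timestamps, for example); all of those names are hypothetical.

```python
import os
from pathlib import Path


def activate(root: Path, release: str) -> None:
    """Point root/current at root/releases/<release> via an atomic rename."""
    target = root / "releases" / release
    if not target.is_dir():
        raise FileNotFoundError(target)
    temporary = root / "current.tmp"
    if temporary.is_symlink():
        temporary.unlink()  # Clean up after an interrupted earlier run.
    temporary.symlink_to(target, target_is_directory=True)
    # Renaming over the old link swaps releases in a single step on POSIX.
    os.replace(temporary, root / "current")


def rollback(root: Path) -> str:
    """Activate the release just before the current one and return its name."""
    releases = sorted(p.name for p in (root / "releases").iterdir() if p.is_dir())
    current = (root / "current").resolve().name
    index = releases.index(current)
    if index == 0:
        raise RuntimeError("no earlier release to roll back to")
    previous = releases[index - 1]
    activate(root, previous)
    return previous
```

Because the previous release is still on disk, `rollback(root)` takes effect almost instantly. It only covers code, though; database changes and other state need their own plan, which is one more reason to rehearse the process regularly.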
No good release process is complete without automated smoke tests. After every release, you should run some basic tests to verify that nothing major broke during the release. This can be as simple as a check to see that your application is available, or it could be as complex as an automated login to test that the application is responding. With Sifter, I used simple and automated web requests to our marketing site and our login page at the end of every release. These checks would display the resulting status on the command line, as well as post to Slack. If anything was off, we knew right away and could investigate and do a quick rollback if necessary. The release process also ran some basic checks on key processes like search and background processing to ensure they were running and available.
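A smoke test along those lines can be quite small. The following sketch uses only Python’s standard library to request a list of URLs and report which ones responded successfully. It is not Sifter’s actual script, and any URLs passed in would be your own. Posting the results to a chat tool is left out because that depends on the service you use.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if no response arrived at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # The server answered, but with an error status.
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, and so on.


def smoke_test(
    urls: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, bool]:
    """Check every URL and print a one-line result for each."""
    results = {}
    for url in urls:
        ok = 200 <= fetch(url) < 300
        results[url] = ok
        print(f"{'OK  ' if ok else 'FAIL'} {url}")
    return results
```

Running `smoke_test([...])` with your marketing and login page URLs as the last step of a release gives an immediate pass or fail on the command line. The `fetch` parameter exists so the checks themselves can be tested without touching the network.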
Once your release process is running smoothly, you can start looking into continuous delivery, which helps ensure your software is always ready to release and, in the most advanced cases, releases it automatically. This isn’t something to approach haphazardly, though. Think of continuous delivery as a level of maturity in your other processes. Continuously delivering broken code doesn’t help anyone, but once all of your other processes and quality control are running reliably, continuous delivery is worth investigating.
With web applications, you also must have tools to constantly monitor uptime and alert you if anything goes wrong. Often, with new releases, things look fine at first, but after some time, new code can create problems that lead to downtime. In those cases, you’ll need to know as quickly as possible.
In addition to monitoring uptime, you’ll also need a trusted third party to check your production application for security holes. Using linters and static analysis can help mitigate these kinds of problems, but there’s always room for other issues in your production application. As these can change with each release, it’s best to have a tool automatically monitoring security in the background. Be ready, though: the first time you set up and run a security monitoring tool, you could be in for some downtime or a lot of new work. The downtime can happen if the security tool floods your site with more traffic than it can handle. The new work will be a result of any issues that it uncovers. In Sifter’s case, it added an extra week of work to validate the issues and fix the genuine problems. Of course, it’s better to discover these problems before hackers do, but it’s still going to take some time.
You should also minimize your external dependencies, and set up monitoring for the dependencies you do have. It’s inevitable that your code will rely on external dependencies, but it’s important to remember that adding external dependencies comes with a cost. Those dependencies could have bugs or security problems, be abandoned by the developer, or create other conflicts within your application. In any case, you’ll need to stay on top of them and make sure they’re regularly updated. The best way to do this is to enlist tools to help monitor for updates and automatically let you know if a problem arises.
At this point, you’ve got all of your bases covered for static analysis and your release process, but there’s one more really important factor to rapid iteration: fixing bugs and refactoring.
When you ship new code, you introduce bugs. Even fixing bugs creates an opportunity to introduce new bugs. In many cases, “fixed” code is more likely to contain errors because the developer fixing it is focused only on the bug at hand. Here’s one of my favorite riddles to help illustrate the idea:
Ryan’s web application has 10 bugs. Ryan fixes 8 of the bugs. How many bugs does Ryan’s application have now? 12. 12 bugs.
Well-written tests and code reviews can help reduce the number of bugs, but no process is perfect, and production environments have a way of introducing new and unexpected scenarios that are great at uncovering bugs nobody could have anticipated. This is where exception handling and alerting come in. It’s imperative that your application has extensive monitoring for and logging of exceptions. If something goes wrong, you need to know quickly. You also need to know if it’s widespread, where it’s happening, who it’s affecting, and every other detail that could help you fix it.
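As a small illustration of logging exceptions with enough context to act on, the sketch below wraps a request handler and records where the error happened, who it affected, and which release was running. The request fields (`path`, `user_id`, `release`), the logger name, and the fallback message are all hypothetical. In practice, an error-tracking service or your framework’s own hooks would usually take over this job.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("app.errors")

FALLBACK = "Something went wrong. We have been notified."


def handle_request(
    handler: Callable[[Dict[str, Any]], str],
    request: Dict[str, Any],
) -> str:
    """Run handler; on any unexpected error, log the context and degrade gracefully."""
    try:
        return handler(request)
    except Exception:
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception(
            "Unhandled error path=%s user=%s release=%s",
            request.get("path"),
            request.get("user_id"),
            request.get("release"),
        )
        return FALLBACK
```

Logging the path, user, and release answers the first questions you’ll have during an incident: where it’s happening, who it’s affecting, and whether it started with the latest release. Sending those log records to an alerting channel is what turns logging into alerting.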
The final piece of a mature iterative process is customer communication. A release isn’t done once it’s live. A release is done when you let customers know and give them clear channels for feedback. Not all of the issues will be problems automated systems can detect. This means systematizing your customer communication. Your customers need to be aware of and able to report those types of issues that your automated systems aren’t capable of detecting.
In the case of large releases with significant customer-facing components, you’ll likely want to start preparing customers for the release well in advance by sharing news and screenshots of the impending changes. You could use social media, newsletters, your blog, a public changelog, or even in-app notifications. Nobody likes being surprised by big changes, and the more notice you give customers before and after changes, the smoother things will go.
Building applications is easy enough. The challenging part is continuing to improve and manage the iterative process without having that process decay into chaos. Steady, reliable, and consistent improvement is the foundation for success, and all of that depends on how quickly and safely your processes enable you to iterate. These tools and processes aren’t simply nice-to-have. They’re the heart and soul of your ability to safely and efficiently improve your business. Invest in them, and they’ll reward you. Ignore them, and you may move quickly in the short term, but you’ll become bogged down in the long term.
Post-Deploy Smoke Tests: Nathaniel Talbott of Spreedly provides a simple example of post-deploy smoke tests and related notifications.