6.6 Parity in Development and Production

We’ve established the importance of having clearly defined build and deployment processes. In a similar vein, we have different application environments: development, production, staging, feature branches, SaaS versus on-premise deployments, and so on. Environments are divergent by definition: we are going to end up with different features in different environments, whether they are debugging facilities, product features, or performance optimizations.

Whenever we incorporate environment-specific feature flags or logic, we need to pay attention to the discrepancies these changes introduce. Could the environment-dependent logic be tightened so that it introduces the bare minimum of divergence? Should we isolate the newly introduced logic fork into a single module that takes care of as many aspects of the divergence as possible? Could the flags that are enabled while we’re developing features for a specific environment inadvertently introduce bugs into other environments where a different set of flags is enabled?
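For instance, we could funnel every environment check through a single module, so that the rest of the codebase consumes flags instead of inspecting the environment directly. Here’s a minimal sketch, assuming a Node.js application where NODE_ENV drives a handful of hypothetical flags.

```js
// env.js — the only module that's allowed to inspect NODE_ENV.
// Every other module asks for a flag, never for the environment itself.
const environment = process.env.NODE_ENV || 'development'

const flags = {
  // Enable hot reloading tooling only while developing.
  hotReloading: environment === 'development',
  // Verbose logging is useful everywhere except production.
  verboseLogging: environment !== 'production',
  // Crash reporting only makes sense against real traffic.
  crashReporting: environment === 'production'
}

module.exports = { environment, flags }
```

Consumers then read flags.hotReloading rather than re-deriving the rule, so when we need to audit or delete a divergence, there’s exactly one place to look.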

As with many things in programming, creating these divergences is relatively easy, whereas deleting them can prove far more challenging. This difficulty arises from the unknown situations that we might not typically run into during development or unit testing, but which are still valid situations in our production environments.

As an example, consider the following scenario. We have a production application that uses Content-Security-Policy rules to mitigate malicious attack vectors. For the development environment, we also add a few extra rules, such as style-src 'unsafe-inline', letting our developer tools manipulate the page so that code and style changes are reloaded without requiring a full page refresh, speeding up development. Our application already has a component that users can leverage to edit programming source code, but we now have a requirement to change that component.
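To make the scenario concrete, here’s a sketch of how such a policy might be assembled, assuming an Express application; the middleware and the exact set of directives are illustrative, not a prescription.

```js
const express = require('express')
const app = express()

const production = process.env.NODE_ENV === 'production'

app.use((req, res, next) => {
  const styleSources = ["'self'"]
  if (!production) {
    // Dev-only relaxation: hot reloading tools inject inline style updates.
    styleSources.push("'unsafe-inline'")
  }
  res.setHeader(
    'Content-Security-Policy',
    `default-src 'self'; style-src ${ styleSources.join(' ') }`
  )
  next()
})

app.get('/', (req, res) => res.send('ok'))
app.listen(3000)
```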

We swap the current component for a new one from our company’s own component framework, so we know it’s battle-tested and works well in other production applications developed in-house. We test things in our local development environment, and everything works as expected. Tests pass. Other developers review our code, test locally in their own environments as well, and find nothing wrong with it. We merge our code and, a couple of weeks later, deploy to production. Before long, we start getting support requests about the code editing feature being broken, and we need to roll back the changeset that introduced the new code editor.

What went wrong? We didn’t notice that the new component doesn’t work unless style-src 'unsafe-inline' is present. Given that we allow inline styles in development, catering to our convenient developer tools, this wasn’t a problem during development or in the local testing performed by our teammates. However, when we deploy to production, which enforces a stricter set of CSP rules, the 'unsafe-inline' source is not served, and the component breaks down.

The problem here is that we had a divergence in parity that prevented us from identifying a limitation in the new component: it uses inline styles to position the text cursor. This is at odds with our strict CSP rules, but the conflict couldn’t be identified earlier, because our development environment is more lax about CSP than production is.

As much as possible, we should strive to keep these kinds of divergences to a minimum; if we don’t, bugs might find their way into production, and a customer might end up being the one who reports them. Merely being aware of discrepancies like this is not enough, because it’s neither practical nor effective to keep these logic gates in your head, mentally going through the motions of how a change would behave differently if your code were running in production instead.
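A better approach is to let the application report divergences on its own. Browsers emit a securitypolicyviolation event whenever a CSP directive blocks something, so a small listener, sketched below, can surface violations as they happen; the /csp-violations endpoint is hypothetical.

```js
// Surfaces CSP violations as soon as they happen, rather than
// waiting for a user to report a broken feature.
document.addEventListener('securitypolicyviolation', event => {
  console.error(
    `CSP violation: ${ event.violatedDirective } blocked ${ event.blockedURI }`
  )
  // Optionally forward the violation to a collection endpoint.
  navigator.sendBeacon('/csp-violations', JSON.stringify({
    directive: event.violatedDirective,
    blockedURI: event.blockedURI,
    sourceFile: event.sourceFile,
    lineNumber: event.lineNumber
  }))
})
```

CSP also supports the report-uri and report-to directives, which can forward violation reports to an endpoint without any client-side code.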

Proper integration testing might catch many of these kinds of mistakes, but that won’t always be the case.
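As a sketch of what such a test could look like, the following example, assuming Puppeteer and a local build served under the production CSP at a hypothetical URL, loads the application and fails if any violations are reported.

```js
const puppeteer = require('puppeteer')

async function assertNoCspViolations(url) {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  const violations = []

  // Attach a violation listener before any page scripts run.
  await page.evaluateOnNewDocument(() => {
    document.addEventListener('securitypolicyviolation', event => {
      console.error(`csp-violation: ${ event.violatedDirective }`)
    })
  })
  // Collect the violations the page reports through the console.
  page.on('console', message => {
    if (message.text().startsWith('csp-violation:')) {
      violations.push(message.text())
    }
  })

  await page.goto(url)
  await browser.close()

  if (violations.length > 0) {
    throw new Error(`CSP violations found:\n${ violations.join('\n') }`)
  }
}

assertNoCspViolations('http://localhost:3000')
  .then(() => console.log('No CSP violations'))
  .catch(error => {
    console.error(error)
    process.exit(1)
  })
```

For a test like this to catch the bug in our scenario, it has to run against the production policy, which is one more reason to keep that policy defined in a single place shared by every environment.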