Real-World Testing

If your company sells video games then your users' goal is playing the game; testing the game is a side-effect. Whether their obscure environments break your client software or they somehow trigger impossible game logic bugs, the sheer number of real-world users and the variation in their interactions are going to find problems you never even imagined were possible. More users also means more unhappy customers and a whole new pile of problems to fix.

For better or for worse, all software projects have some variation of real-world testing and a corresponding feedback loop for fixing the problems it finds. What are the benefits and costs of real-world testing?

Who is it for: Mostly users, since their needs are what they’re testing. Some of the testing can also measure whether the software meets organizational needs. Consider a high-traffic website which makes money by advertising. Changes to the design of the website can have a significant impact on ad revenue, but the only way to truly measure the impact is to go live with the changes.

Strengths: Real-world testing will catch problems that other forms of testing have not. And if done right, real-world testing can mitigate the cost of defects by channeling user unhappiness into productive bug reports, and by redirecting the main testing effort to users who are more willing to take on the cost of finding defects.

Weaknesses: Feedback from real-world testing is slow to non-existent, and for the most part only occurs after problems have impacted users or production systems. This reduces the benefits of finding problems since discovered problems are hard to reproduce and fixing the problem happens late in the development cycle.

Cost: Developing software infrastructure to help users report problems, and to make debugging easier, can take significant resources. Focusing testing on a particular subset of users may be easy or difficult, depending on the particulars of a product.

Voice and exit

You will recall from earlier chapters that unhappy users have two choices: voicing their concerns or exiting, i.e. giving up on your product. Since software products usually have no purpose without users, your organization usually wants your users to stick around. And since your users are after all testing your software for you, hearing their problems means your development team actually has a chance to fix the issues they find. Your goal then is to reduce the costs of defects in your software by encouraging voice and discouraging exit. You can do so by:

  1. Making it easy to report problems.

  2. Actively gathering feedback from users who would otherwise just grumble to themselves, or worse yet have already given up on the product.

  3. Fixing problems quickly, before they hit other users.

The feedback loop for fixing problems requires noticing the problem, communicating it to the development team, figuring out how to address it and then distributing the fix. If you get all of these steps right, you can reduce the negative impact of defects caught by real-world testing and take advantage of the testing your users are providing.

Noticing problems

The first step to dealing with the problems discovered by real-world testing is to notice them: unnoticed problems will take longer to solve, potentially causing far more damage. Crashes are easy to notice, but other problems may be harder to detect. A confusing user interface may irritate your users, but perhaps not consciously enough for them to articulate the problem. Similarly, missing documentation, minor bugs and missing functionality may go unnoticed, as motivated users will find a workaround and less motivated users simply give up. By paying close attention to users' questions, bug reports and feature requests you can often discover related, underlying problems the users were not even aware of. Ongoing usability testing can also help you find these issues. Online applications can also use analytics to discover problems, e.g. a feature that never gets used or a path that users abandon halfway.
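As a rough illustration of the analytics idea, the sketch below counts how many users get through each step of a path and how many are lost along the way. The step names and the event format (pairs of user ID and step name) are assumptions made up for this example, not the output of any particular analytics tool.

```python
from collections import defaultdict

# Hypothetical ordered steps of a sign-up path; the names are illustrative only.
FUNNEL_STEPS = ["landing", "signup_form", "email_confirmed", "first_project"]


def funnel_dropoff(events, steps=FUNNEL_STEPS):
    """Return (step, users_remaining, fraction_lost_since_previous_step) tuples.

    `events` is an iterable of (user_id, step_name) pairs.
    """
    users_at_step = defaultdict(set)
    for user_id, step in events:
        users_at_step[step].add(user_id)

    report = []
    remaining = None  # users who completed every step so far
    for step in steps:
        seen = users_at_step.get(step, set())
        before = len(remaining) if remaining is not None else None
        remaining = seen if remaining is None else remaining & seen
        # No loss is reported for the first step or when nobody was left.
        lost = 1 - len(remaining) / before if before else 0.0
        report.append((step, len(remaining), lost))
    return report
```

A step that loses an unusually large fraction of users is a candidate for closer usability testing, even if nobody has filed a bug report about it.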

You should also be on the lookout for problems that the resiliency of your system hides from users: crashes, data loss in a replicated system, slowdowns. Monitoring of your system is key to discovering these problems before they get bad enough to start affecting users.

Communicating problems

Once your users notice a problem you want them to communicate with the development team. We want this to happen as close as possible in time to when the problem occurred:

  1. To help with diagnosis. User recall varies from decent in the 5 second range to "there was a flying moose on the screen? or maybe a hyena" a week later. Additionally, if we have a two-way communication channel with the user reporting the problem we can ask for more details or further investigation in situ.

  2. To reduce the latency in our problem feedback loop. Faster reporting means faster solutions.

We also want to make it as easy as possible for the user to report the problem. Each additional step, e.g. creating an account in an issue tracker, means another set of less motivated users dropping out of the reporting process.

For example, an unhelpful error page on a website will say something like "An error has occurred, please refresh your browser. If the issue persists please report the problem." The page gives no indication of how or where the user should report the problem. You can do better by:

  • Providing a form for reporting the error.

  • If you’re automatically logging and then analyzing errors, say so and then ask for additional feedback: "The problem has been logged and we will be looking into it within an hour. If you want to add additional feedback about what you were doing please do so in the form below."

  • Even better, provide a link to a status page where the user can track your progress in fixing this issue.

More generally:

  • Every user interface page or screen should have a way to give feedback. Command-line tools never seem to do this. This may be reasonable for an open source project with limited resources, but commercial organizations can and should do better.

  • Your software should report every obvious error to you as automatically as possible. This is commonly done for crashes in desktop applications, where a post-crash dialog asks users if they wish to report the problem. Opt-in is important to ensure users' privacy.

  • Look for other opportunities to explicitly ask for feedback, since some users will never get around to reporting problems they encounter. If a user is looking at your software’s documentation or help pages then maybe they are having a problem you care about. But make sure you don’t annoy your users with constant pestering.

  • If at all possible, gather contact information to allow for clarification and questions about the problem later on.

  • Try to identify users who have chosen exit, e.g. users who haven’t renewed a subscription, and ask them what went wrong.

  • Thank your users for giving you feedback and try to respond as quickly as possible. A note from a human being will be much more effective than an automated response, but the latter is better than silence.

Addressing problems

Once you know about a problem you can then address it… assuming you can understand the underlying cause. If diagnosing problems is impossible, or even just too expensive, you’re not going to do it and the problem will remain. And that means your users will continue to be unhappy.

Let’s look at a common bug report:

Your program crashed and deleted a day’s worth of my hopes and dreams. I hope you’re proud of yourself.

Many bug reports are similarly unhelpful. Even if the bug report is sufficiently detailed it can often be hard to reproduce the problem. Most other forms of testing happen in controlled environments, which makes investigation and reproduction easier. Real-world testing happens out in the wild, and reproducing that elusive crash may well be impossible.

A great way to improve the quality of bug reports is to gather as much information as possible about the application’s state, and then make sure it gets included in the problem report. As always, you should respect your users' privacy and never include information from remote applications without getting explicit opt-in from the user.

  • For crashes, you should be automatically capturing stack traces, what the user was doing, and so on, and as discussed above sending a bug report automatically.

  • More generally, you can add extensive logging to your program and include the relevant logs either automatically in the bug report, or perhaps allow log extraction and uploading via an easy-to-use tool.
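The two suggestions above can be sketched in a few lines of Python: keep the most recent log lines in memory, bundle them with the stack trace when something goes wrong, and send the result only if the user has agreed. This is a minimal sketch, not a complete crash reporter. The names `RecentLogHandler`, `build_crash_report` and `report_if_allowed` are made up for this example, and `send` stands in for whatever upload mechanism your product uses.

```python
import json
import logging
import platform
import traceback
from collections import deque


class RecentLogHandler(logging.Handler):
    """Keep only the most recent log lines so they can accompany a report."""

    def __init__(self, capacity=200):
        super().__init__()
        self.lines = deque(maxlen=capacity)

    def emit(self, record):
        self.lines.append(self.format(record))


def build_crash_report(exc, recent_logs, app_version="unknown"):
    """Collect state useful for diagnosis into a JSON-serializable dict."""
    return {
        "app_version": app_version,
        "python": platform.python_version(),
        "os": platform.platform(),
        "exception": repr(exc),
        "traceback": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "recent_logs": list(recent_logs),
    }


def report_if_allowed(report, user_opted_in, send):
    """Send the report only when the user has explicitly opted in.

    `send` is a hypothetical callable that takes the serialized report.
    """
    if not user_opted_in:
        return False
    send(json.dumps(report))
    return True
```

In use, you would attach a `RecentLogHandler` to your logger at startup, call `build_crash_report` from your top-level exception handler, and pass the user's answer from the post-crash dialog as `user_opted_in`. Logs can themselves contain private data, so it is worth reviewing what you log before shipping them anywhere.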

Releasing and distributing the fix

You or a user have noticed a problem, it has been communicated to the development team, and you have managed to diagnose and fix it. Now all you have to do is get the fixed version into your users' hands… and unless and until you do so all of that effort is pointless.

Releasing software typically involves testing beyond real-world testing, as we’ll discuss in later chapters. The slower and less automated this process, and the release process in general, the more time will elapse between fix and release. Similarly, distribution of software can range from effectively instantaneous (deploying new software to a server) to a manual process that requires complex human intervention (upgrading the firmware on a physical server’s power supply). At the fastest extreme are systems like websites where an appropriate architecture can allow Continuous Deployment (CD): a programmer checks in code to a source code repository, the CD system then runs automated tests, and if they pass it automatically deploys a new version of the software. Even in cases where distribution and release are more difficult, a concerted effort can lower release times by orders of magnitude. The Firefox browser, for example, used to take a month from fix to release, and even then users didn’t always get fixes until long after. More recently the time from important bug fix to users getting an update is less than 24 hours.
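The core of the Continuous Deployment flow described above is a simple gate: run the automated tests, and deploy only if they pass. The sketch below shows that gate in Python. The function name `deploy_if_tests_pass` is an assumption for this example, and the two commands are placeholders for whatever your project actually runs; real CD systems add much more, such as staged environments and rollback.

```python
import subprocess
import sys


def deploy_if_tests_pass(test_command, deploy_command):
    """Run the test suite; deploy only when it exits successfully.

    Both commands are lists of arguments, as accepted by subprocess.run.
    Returns True if the deploy command ran and succeeded.
    """
    tests = subprocess.run(test_command)
    if tests.returncode != 0:
        print("Tests failed; not deploying.", file=sys.stderr)
        return False
    deploy = subprocess.run(deploy_command)
    return deploy.returncode == 0
```

A CD system would call something like this on every check-in, so that the time between a fix being written and users receiving it is limited mainly by how long the tests take to run.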

Active testing

So far we’ve talked about improving passive real-world testing: improving the feedback cycle for defects. In addition to waiting for bug reports to arrive you can also take more active measures: you can use real-world testing to find problems even as you reduce their impact on the majority of your users.

Pre-releases

If you distribute your software to end users as a fixed release, releasing work-in-progress versions of your software can both catch problems earlier and make your users happier. Since pre-releases are opt-in, the problems they contain will only be encountered by more adventurous users. By warning them upfront that they are participating in a testing effort you can reduce the chances of users getting angry when they encounter bugs.

Staged rollout

For live online systems like websites, staged rollout is an equivalent to pre-releases. You ask some percentage of users to opt in to a new version of the code that is running in parallel to the old code. Providing a way to fall back to the old code keeps adventurous users happy if the new version is sufficiently broken, and also provides an avenue for feedback. Over time you can switch more users to the new code, and eventually you can switch the remaining users over automatically without an opt-in.
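One common way to implement the "switch more users over time" part is to hash each user into a stable bucket and compare the bucket against a rollout percentage. The sketch below assumes that approach; the function names and the `opted_in`/`opted_out` collections are hypothetical and would be backed by your own user settings.

```python
import hashlib


def rollout_bucket(user_id, feature):
    """Map a user to a stable bucket in the range 0..99 for the given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def use_new_version(user_id, feature, percent, opted_in=(), opted_out=()):
    """Decide which code path a user gets.

    Explicit choices win for as long as both versions are running: users who
    fell back to the old version stay there, and users who opted in get the
    new one. Everyone else is switched over gradually as `percent` grows
    from 0 to 100.
    """
    if user_id in opted_out:
        return False
    if user_id in opted_in:
        return True
    return rollout_bucket(user_id, feature) < percent
```

Because the bucket depends only on the user and the feature name, a user who has been switched to the new version stays on it as the percentage increases, rather than flipping back and forth between requests.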

A/B testing

Whereas staged rollout is voluntary and can support testing of significant changes to functionality, A/B testing is a way of doing involuntary testing of more minor changes while still limiting the impact of problems. Consider the advertising-driven website which has to worry about even the smallest of design changes decreasing ad revenue. At the end of each day, if revenue is lower than expected, you can roll back any recently deployed changes. Unfortunately decreased revenue doesn’t necessarily tell us which of the changes caused the problem, and rollback happens after ad revenue has decreased.

An improved technique is A/B testing. You show some small percentage of randomly chosen visitors a new design; the rest continue to see the current design. You can then compare the revenue from the current and new designs in isolation from other changes. If the new design improves revenue you can deploy it as the default; if it decreases revenue then the cost of discovering this was low.
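The two halves of an A/B test, random assignment and comparison, can be sketched as follows. This is an illustration under simple assumptions: the 5% fraction, the function names and the (variant, revenue) record format are all made up for this example, and a real experiment would also check whether the difference is statistically significant before acting on it.

```python
import random
from statistics import mean


def assign_variant(rng, new_design_fraction=0.05):
    """Randomly pick which design a visitor sees."""
    return "new" if rng.random() < new_design_fraction else "current"


def revenue_per_visitor(visits):
    """Average revenue per visitor for each variant.

    `visits` is an iterable of (variant, revenue) pairs.
    """
    by_variant = {}
    for variant, revenue in visits:
        by_variant.setdefault(variant, []).append(revenue)
    return {variant: mean(values) for variant, values in by_variant.items()}
```

Comparing revenue per visitor rather than total revenue matters here, since by design far fewer visitors see the new variant than the current one.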

Dogfooding

"Dogfooding", or internal testing, means using one’s own software during development. In cases where the software you are writing is also software you can use in your development team or organization you can test your software by using it. For example, Google uses Gmail internally for email and so can test new features or designs on employees before releasing the software publicly. Members of your development team or organization can be useful initial users insofar as they have the ability and motivation to voice and communicate problems, and are far less likely to choose exit as a solution. You can combine dogfooding with pre-releases, staged rollouts or A/B testing where applicable.

Even when you can apply these techniques, real-world testing still provides slow, costly feedback. As we discussed in previous chapters, other testing techniques can find defects before users start using your product, providing faster feedback and hopefully reduced costs as well.