The story of how I lost a ton of signups to a bug I never saw and could never reproduce

Shortly after launching Tamboo, I saw that our registration page was getting good traffic and that people were signing up for the service — but a fair number of them weren’t completing the onboarding process.
Getting people through the onboarding process was something I was concerned about from the start. Since Tamboo requires users to place a tracking code on their website before they can use the service, I knew I had to make that process as smooth as possible to remove any possible friction. I spent a good amount of time creating an onboarding wizard that would help people through the steps as quickly and as painlessly as possible.
But the numbers suggested that people weren’t following through with onboarding. The thing I had feared most seemed to be rearing its ugly head.
I was determined to find out why these users didn’t go through the onboarding process. I had spent a ton of time refining the process with a number of beta customers and I thought everything was solid based on their feedback. So what was keeping them from going forward?
The first thing I did was email the people who had created an account but hadn’t finished onboarding. I’m a big fan of providing great customer service, and I wish I could say that’s what made the difference here. But surprisingly, no one responded to any of my emails.
Something was definitely amiss.
At this point, my only option was to “go to the tapes” as it were. If you’re unfamiliar with Tamboo, it’s a service that lets you record what your website visitors do and watch those recordings as short videos. And of course I run Tamboo on Tamboo (pretty meta, right?).
When I started watching videos where the person registered but didn’t complete the onboarding process, I was blown away.
What I saw was that people were signing up as normal, but instead of getting redirected to the onboarding screens, they stayed stuck on the registration page and then tried to register again (and again, and again).
Here’s an example of one of those sessions:

You can see from this screenshot that this user hit the registration page five times in a row trying to sign up — and then just left.
Seeing that they were using a mobile device, I immediately whipped out my iPhone and tried to sign up myself. I had just tested this! How could it have broken?!
And of course, when I did it, it just worked.
In case you’re not a software developer, this is what’s known as a non-reproducible bug: you know the bug exists, but you can’t make it happen yourself. And if you can’t reproduce it, you can’t study it to understand what’s wrong, and you can’t confirm that you’ve actually fixed it. It’s the type of thing that gives developers cold sweats and nightmares.
At this point, I realized why no one had emailed me back. They were either pissed off or thought my app was a joke. That realization made my stomach churn.
The next 24 hours were fraught with anxiety.
Because the problem only showed up on certain Android devices running Chrome, I got my hands on every Android device I could and tried to reproduce what I was seeing, to no avail. I looked over the registration code until my eyes were red and could see no obvious reason this should be happening. I tested over and over again with Developer Tools open and saw nothing that even hinted at a cause. All of this led me to conclude that it wasn’t a bug in my code but some odd mobile browser behavior that only happened under certain conditions, and that I most likely couldn’t just “fix it”. Like I said: cold sweats and nightmares.
Finally, I took a step back and thought about the problem differently.
If this were to happen again down the road for any reason, even if it shouldn’t, what could I do to safeguard against it?
And so I put in some logic on the registration page that detected if the user had tried to sign up before, and if so, simply said “It looks like you’re already signed up! Click here to sign in.” with a link to the login page.
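The post doesn’t say how the registration page detected a repeat signup. One plausible approach, sketched below in TypeScript, is to remember in the browser’s `localStorage` that a signup succeeded and check that flag whenever the registration page loads. Everything here (`SIGNUP_FLAG_KEY`, the `/login` path, and the function names) is hypothetical and not taken from Tamboo’s code; a server-side check on the submitted email address would also work.

```typescript
// Hypothetical sketch: SIGNUP_FLAG_KEY, "/login", and all function names are
// illustrative assumptions, not Tamboo's actual implementation.

// The subset of the Web Storage API (e.g. window.localStorage) this sketch uses.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key recording that this browser already completed a signup.
const SIGNUP_FLAG_KEY = "signupCompleted";

// Call once the server confirms the account was created,
// before attempting the redirect to onboarding.
function markSignupCompleted(store: KeyValueStore): void {
  store.setItem(SIGNUP_FLAG_KEY, "true");
}

// What the registration page should render.
type RegistrationView =
  | { kind: "form" }
  | { kind: "alreadySignedUp"; message: string; loginUrl: string };

// Decide, on page load, whether to show the signup form or the
// "already signed up" message with a link to the login page.
function registrationView(store: KeyValueStore, loginUrl = "/login"): RegistrationView {
  if (store.getItem(SIGNUP_FLAG_KEY) === "true") {
    return {
      kind: "alreadySignedUp",
      message: "It looks like you're already signed up! Click here to sign in.",
      loginUrl,
    };
  }
  return { kind: "form" };
}

// In-memory store so the logic can be exercised outside a browser.
class MemoryStore implements KeyValueStore {
  private data: { [key: string]: string } = {};
  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key) ? this.data[key] : null;
  }
  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
}
```

In a browser you would pass `window.localStorage`, which already satisfies `KeyValueStore`, and render a link to `loginUrl` when the view is `alreadySignedUp`. Storing the flag client-side means the check works even if the redirect itself is what fails, though it won’t help a user who switches browsers or clears storage.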
I pushed that code change live and waited, watching videos of new signups to Tamboo to see whether the bug would resurface and whether my “fix” would solve the issue.
After a few hours, my nemesis returned.
Only this time, my workaround prevailed and the user was able to move through the onboarding process (cue the fist pumping!).
The moral of this story? (Aside from another fine example of Murphy’s Law, or that things always behave differently “in the wild”?)
If I hadn’t been able to actually watch how people were going through the registration process, I would have assumed something was wrong with the onboarding process. I would probably have tried a million different tweaks, knob turns, and changes to get the onboarding process “right”, when it was never the problem in the first place.
Instead of wasting hours (and probably days or weeks) guessing what the problem was, I was able to see firsthand what the real issue was. And even though I couldn’t reproduce it myself, I was still able to confirm that my fix did in fact resolve the issue.
