About five years after people first tried to automate my work (long before AI — just by moving data construction into generalized dashboards), I finally managed to articulate why it has always felt fundamentally wrong to me.
And spoiler: no, it’s not because I’m afraid AI will take my job and I’ll end up drinking under a bridge — if anything, that kind of techno-optimism is something we should aspire to!
Here’s my favorite piece of professional pattern recognition: I’ve personally run a few hundred A/B tests and reviewed roughly the same number run by other people. On the surface, many of them call for the same steps: build at least some kind of baseline (retention, conversion to purchase, average check…). If we’re showing visuals, we also look at CTR. In short, we start with the basics. So why not just dump it all into a dashboard that calculates everything by itself while I sip my coffee?
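Just to make the “same steps” concrete, here is roughly what that shared baseline looks like. This is a sketch, not anyone’s production pipeline; the DataFrame and column names are invented for illustration.

```python
import pandas as pd

def baseline(users: pd.DataFrame) -> pd.DataFrame:
    """The boilerplate most tests seem to start with.

    Assumes one row per user with hypothetical columns:
    group ('test'/'control'), retained_d7, purchased, revenue,
    ad_impressions, ad_clicks.
    """
    summary = users.groupby("group").agg(
        retention_d7=("retained_d7", "mean"),   # share of users retained on day 7
        conversion=("purchased", "mean"),        # share of users who paid
        clicks=("ad_clicks", "sum"),
        impressions=("ad_impressions", "sum"),
    )
    summary["ctr"] = summary.pop("clicks") / summary.pop("impressions")
    # average check: revenue per paying user
    summary["avg_check"] = (
        users[users["purchased"] == 1].groupby("group")["revenue"].mean()
    )
    return summary
```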
Well — no.
When I’m faced with a dashboard that immediately shows every metric, my dumb brain instinctively latches onto the one with the biggest difference between test and control. It doesn’t even try to work out whether my test could plausibly be related to that metric. Nope. If I ran a test enabling a new ad placement and the number of whales (the biggest spenders) suddenly went up, my brain sprints straight toward that number.
But if, instead, I first check… say, the number of users who watched at least one ad, and then the number who watched exactly one, that’s when I start asking much more relevant questions. I won’t even glance at the whales who randomly wandered into the test. Yes, I’ll have to write code by hand. I’ll have to think about what I want to focus on and ask the right questions. And because a good product manager is a damn lazy creature, I’ll only write the queries that actually matter!
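For what it’s worth, the hand-written version isn’t much code. A minimal sketch, assuming a hypothetical per-user table with a group label and an ad-view count (the names are mine, not a real schema):

```python
import pandas as pd

def ad_view_shares(users: pd.DataFrame) -> pd.DataFrame:
    """Share of users who watched at least one ad, and exactly one ad, per group.

    Assumes one row per user with hypothetical columns:
    group ('test'/'control') and ad_views (number of ads watched).
    """
    views = users.groupby("group")["ad_views"]
    return pd.DataFrame({
        "watched_at_least_one": views.apply(lambda s: (s >= 1).mean()),
        "watched_exactly_one": views.apply(lambda s: (s == 1).mean()),
    })
```

The point isn’t the code itself; it’s that writing it forces you to name the question before you see any numbers.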
When you’re handed 20 metrics upfront, it’s incredibly easy to invent a plausible explanation after you’ve seen everything. But it should be the exact opposite: first explain — then verify.
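As a sketch of what “first explain, then verify” can look like in practice (the hypothesis text and the counts below are invented, and a two-proportion z-test is just one reasonable choice, not a prescription):

```python
from statsmodels.stats.proportion import proportions_ztest

# Written down BEFORE looking at any results.
hypothesis = (
    "Enabling the new placement increases the share of users "
    "who watch at least one ad."
)

# Only the pre-registered metric gets checked (made-up counts).
watched_at_least_one = [4_210, 3_870]   # test, control
users_in_group = [25_000, 25_000]       # test, control

z_stat, p_value = proportions_ztest(count=watched_at_least_one, nobs=users_in_group)
print(hypothesis)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```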
Yes, sometimes you do end up seeing something completely unexpected. Then you have to explain it, but that’s already in the realm of things a dashboard can’t show you. Take my ad example: for some reason, views increased on all the other ad placements too. Why? Shouldn’t there be cannibalization? And then it turns out that, for once, the SDK wasn’t integrated like complete garbage, so after users saw that first ad they stopped hitting errors on the other placements (sorry, engineers, I know it doesn’t really work like that, but I had to invent an example).
So no — you can’t automate test analysis. The time allocated for analysis exists for a reason. A test should force a person to think, to doubt, and most importantly — to set priorities. You have half a day to a day to make a decision.
When you have 100 charts for that decision, you realize you can’t possibly study them all in that time. You stare at them for 20 minutes and make a decision based on “this is how I see it.”
When you start with nothing and extract everything yourself, you’ll end up with maybe a dozen charts. Only five of them will be useful — but those five will actually tell the story of what happened.