
I was the data and analytics side of a global team at HomeAway while we were struggling to finally release a "free listing / pay per booking" model to catch up with Airbnb. I wired up tracking and a whole bunch of behavioral-analysis instrumentation, including our GA implementation at the time.

Before launch we kept seeing massive drop-off at one step in the onboarding flow, and I kept red-flagging it. Eventually the product engineering team responsible for that step came back with a bunch of Splunk logs saying they couldn't see the drop-off and our analytics must be wrong "because it's JS", which was just an objectively weird take.

For "silo" reasons this splunk logging was used by product and no one else trusted it or did anything actionable with it as far as I could tell other than internally measuring app response times.

I would not unflag the step, and one PM in particular got very upset about this, insisting our implementation was wrong and roping in a couple of senior engineers for support.

I personally started regression testing that page based on our data and almost immediately caught that any image upload over ~1MB failed, and that it didn't work in mobile Safari at all. It turned out they had left their MVP code in place, and it used Flash or something equally dated, so it broke on image size and some browsers just wouldn't work at all.

It was updated a couple of weeks before launch, and the go-live was as good as could be expected.

To this day I have no clue how this particular team had misconfigured their server-side logging so badly that it HID the problem, but you see it all the time: if you don't know what you're doing and don't know how to validate things, your data will actively sabotage you.



You've accidentally described 100% of my experience with Splunk at every org I've worked at: it's so expensive no one is given access to it. It's hard to get logs into it (again, because of expense). And so your experience of it is that the anointed "Splunk team" wants something, but you never see how they're using it, or really the results at all, except when they have an edict they want to hand down because "Splunk says it".


Did they just assume "no errors on the server side, so the problem can't exist"? That's bizarre.


It was odder than no errors: they weren't seeing any funnel drop-off at all.

It wasn't worth investigating and fixing for them. At the time I figured they were excluding traffic incorrectly or didn't know how to properly query for "session" data... it could have been any number of things, though.


A funny pattern I've seen several times is someone querying some data, getting results that don't match their mental model/intuition, then applying a bunch of filters to "reduce noise" until they see the results they expected.

Of course this can easily hide important things.

Made-up example: the funnel metrics record three states: in progress, completed, or abandoned. If a user clicks "cancel" or visits another page, the state is set to abandoned; otherwise it stays in progress until it's completed. Someone notices that a huge percentage of the sessions are in progress, decides there can't possibly be that many things in progress and we only care about completed or abandoned anyway, and then accidentally filters out everyone who just closed the page in frustration.
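A minimal sketch of how that kind of "noise reduction" filter hides the problem (column names and numbers are made up for illustration, pandas assumed):

    import pandas as pd

    # Hypothetical funnel data: one row per session, with a 'state' column.
    sessions = pd.DataFrame({
        "session_id": [1, 2, 3, 4, 5, 6],
        "state": ["completed", "in_progress", "in_progress",
                  "abandoned", "in_progress", "completed"],
    })

    # Naive "noise reduction": keep only the states we think we care about.
    # Sessions where the user silently closed the tab stay 'in_progress'
    # forever, so this filter quietly throws them away.
    filtered = sessions[sessions["state"].isin(["completed", "abandoned"])]

    print((filtered["state"] == "completed").mean())   # 0.67 -- looks fine
    print((sessions["state"] == "completed").mean())   # 0.33 -- the real picture

The filtered number looks healthy precisely because the frustrated users never reach a terminal state, which is exactly the failure mode described above.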


Real example: doing the data analysis for one of the "beyond the Standard Model" physics experiments. For example, there is one where they basically shoot a big laser at a wall and see if anything goes through. Spoiler: it won't.

Such an experiment will usually see nothing and claim an upper bound on the size of some hypothetical effect (thus essentially ruling it out). Such a publication would be reviewed and scrutinized rather haphazardly. Regardless, the results are highly publishable and the scientists working on it are well respected.

Alternatively, the experiment might see something and produce a publication that would shatter modern understanding of physics, which means it would be strongly reviewed and scrutinized and reproduction attempts would happen.

Since the a priori probability of such an experiment finding something is absurdly low, the second case would almost always lead to an error being found and the scientists involved being shamed. Therefore, when you do data analysis for such an experiment, especially if you want your career to move on to a different field or to industry, you quickly find ways to explain away and filter out any observation as noise.

And no, a lot of them don't use data blinding...



