In my experience, the problem has a lot to do with how teams organize around ML.
When you have an engineering team separate from a data science team, you'll inevitably have unproductive conflict & politics. One team might be incentivized for stability and speed (engineering or ops) and the other for model accuracy (data science). The end result can be disastrous... An engineering team that refuses to bend at all to help data scientists get their work into production. Or a data science team that only cares about maximizing accuracy, even if it might destroy prod or be impractical to implement in a performant way.
To hit the sweet spot on accuracy, speed, and stability, you need one team that focuses on the end feature. It needs to be cross-functional and accountable for doing a great job at that feature. And the data scientists arguably need to be more focused on measuring and analyzing the feature's success, rather than just building models for models' sake.
I'd recommend the book Agile IT Organization Design if you're interested in good team design patterns.
This. In my experience across larger enterprises, data science teams rarely hold the keys to production environments and therefore rely heavily on IT to productionize ML. And I completely agree that data scientists need to be focused on measuring and analyzing success as opposed to churning out more and more models.
In many cases, this is not just because production teams don't want data science teams to be able to deploy to production (due to lack of trust or confidence); data science teams often don't want this responsibility either.
There’s also perhaps a syndrome of not wanting to do the organizational work to do ML well. Instead of changing the whole org by integrating ML into every team, a data science team is hired to do god knows what. There’s status assigned to being the “data scientist,” and they work away, siloed, on fun-sounding deep learning models. In this mode, if they produce anything, it’s impractical, divorced from the product realities, and rather hard for the main engineering/product org to maintain or implement.
The reality is there’s more work to embracing ML than hiring data scientists. Everyone needs to understand ML a little, and it needs to be OK to critically question data science work from product and engineering angles.
Another aspect of this I've observed -- personal sense of value (and industry pay feeds into this) contributes to the partitioning of work. If we're charitable, it comes from a belief in comparative advantage; if we're brutally honest about some people, it's because they often feel that "_____ isn't a good use of my time." This is also fed by the "sexiest job of the 21st century" line that's been going around.
We see this in data science and machine learning, where people complain about spending their time cleaning data, etc., when their time should be spent "generating insights/etc." We also see that those insights are interesting but not very useful if they aren't actionable, or are too costly or impractical to implement.
Ultimate value is related to being able to contribute to and achieve the holistic outcome, but the lens of success is often focused on models or insights instead. This is a cultural and organizational problem, rather than a technological one. It also takes a dose of humility to appreciate the true value of the so-called dirty work.
I see this with my own work. I maintain the Elasticsearch Learning to Rank plugin. People assume it's all magic machine learning. The reality is much of the work involves understanding Elasticsearch plugins, informed by machine learning that needs to happen. Oh and 50% of the work is support and fun things like Maven repos :)
Another point is that while we technologists love to marvel at data science and machine learning, it still begs the question of what value it brings to the business. Is the added responsibility of creating all the infrastructure and processes worth it to justify a 5% increase in conversion rates? As you say, even the dirty work has a cost, and that cost may not be worth paying only to find out there's nothing you can do to improve the business. That's why all the massive multi-year central data-warehouse-cleansing projects keep failing without yielding much value. There's just a lack of focus on delivering incremental value with these data projects.
I think it currently creates new possibilities for doing business. Computer vision is at the point where it's more engineering than data science, so adding something like reasonably good object detection is not that hard. NLP is probably at the point where CV was 5 years ago, so we're starting to see very good NLP models.
Definitely. Even when I was an engineer working on CV 10 years ago, the hard part wasn’t object detection but rather the network bandwidth needed to stream incredible amounts of data and process it in real time.
Spoke to an experienced engineer who used to lead NLP at MSFT, and they made the same comment. NLP models are already fantastic, and it isn’t very hard to build a smart chatbot. The implementations these days are just very poor because they aren’t well thought out from a user perspective.
Huge insight and definitely on point. Where I've worked, data science teams are focused on business impact. More responsibility requires larger budgets and, at times, creates a burden. Plus, I have a sense that not a lot of senior executives know how to hire ML engineers in the first place, as they come from a business background and would rather leave it to IT.
Exactly! Frankly, I see so many naive assumptions about quantitatively measuring user behavior (like "CTR means success!"). I wish more time were spent robustly understanding users' behavior rather than jumping straight to optimizing one unquestioned metric with a model.
Optimizing a loss function is far far easier than finding the right loss function(s)
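To make that concrete, here's a minimal sketch with entirely made-up data: the optimizer will happily minimize whatever loss you hand it (least squares even has a closed form), but if the chosen objective (clicks, here) is disconnected from what the business actually cares about (revenue, here), the "optimal" model can be useless. All variable names and numbers below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 1000 items with observed click-through rates
# and per-item revenue that is deliberately uncorrelated with clicks.
clicks = rng.random(1000)
revenue = rng.random(1000)

# Optimizing the given loss is the easy, mechanical part:
# ordinary least squares on a single feature, solved in closed form.
X = np.column_stack([np.ones(1000), rng.random(1000)])
w, *_ = np.linalg.lstsq(X, clicks, rcond=None)  # minimizes squared error on clicks

preds = X @ w
mse = np.mean((preds - clicks) ** 2)  # provably minimal for this loss and data

# ...but ranking items by predicted clicks does nothing for revenue,
# because we optimized a proxy objective, not the thing we care about.
top_by_clicks = np.argsort(preds)[-100:]
print(f"MSE on clicks: {mse:.3f}")
print(f"Avg revenue of top-100 by predicted clicks: {revenue[top_by_clicks].mean():.3f}")
print(f"Avg revenue overall: {revenue.mean():.3f}")
```

The solver's job is done the moment the loss is written down; everything hard here was deciding that `clicks` was the wrong thing to write down in the first place.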
Which is the whole idea behind DevOps: to break down the barriers between development and deployment by focusing on rapid iteration to production by continuously integrating changes into that pipeline.
It's ironic that DevOps has become a specialty in and of itself. The idea is to get rid of separate teams, not create a new one!