Critique of Paper by “Deep Learning Conspiracy” (idsia.ch)
81 points by rndn on June 30, 2015 | hide | past | favorite | 33 comments


So you mean to tell me that rediscovering older work with faster computers and strategically referencing (or omitting references) to make yourselves look like the founding fathers of a resurgent field is a good way to get your university-funded research lab snatched up for millions of dollars by Big Companies?

And to add onto this, another academic is getting his jimmies rustled because he didn't get the money his former PhD students did??

Heavens to Betsy!

---

Everyone self-promotes and acts in their own best interests. There is no clear divide between academic and industrial interests. Maybe in something purer, where truths are evident (e.g. pure math), but not in something like machine learning, where your success and funding depend on armies of grad students fine-tuning things like the number of "hidden units" in some over-complicated model whose ultimate goal is to overfit a training set and generate more hype, etc.

Nothing wrong with this though, progress happens continually, just not linearly: https://en.wikipedia.org/wiki/Hype_cycle


You want to make your way in the CS field? Simple. Calculate rough time of amnesia (hell, 10 years is plenty, probably 10 months is plenty), go to the dusty archives, dig out something fun, and go for it. It's worked for many people, and it can work for you. - Ron Minnich


The NIPS community is more prone than most to the hype cycle and to a herd mentality. (It has its strengths as well, but we should also recognize the weaknesses.) It is very, very good that there is a corrective influence coming from somewhere.

I used to think that the feud over the origins of backprop was an annoying one-off, but it is not. The field has had many examples of lazy citation patterns, including people not actually reading what they cite, and/or ignoring corrective papers like this one.

The example that bothered me most was the "Gaussian processes" boomlet. It played for a couple of years, but it was basically linear prediction theory redone by people who did not adequately cite prior work nor convey its limitations.
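
(To make the "linear prediction theory redone" point concrete, here is a minimal sketch — my own illustration, not part of the original comment: the posterior mean of Gaussian process regression is a fixed linear combination of the observed targets, which is exactly the classical Wiener/kriging linear-prediction form. The RBF kernel, lengthscale, and noise level below are arbitrary example choices.)

    # Minimal sketch (illustration only): GP regression posterior mean
    # is linear in the observed targets y -- the classical
    # Wiener/kriging "linear prediction" form.
    import numpy as np

    def rbf_kernel(a, b, lengthscale=1.0):
        # Squared-exponential covariance between two 1-D input arrays.
        d = a[:, None] - b[None, :]
        return np.exp(-0.5 * (d / lengthscale) ** 2)

    def gp_posterior_mean(x_train, y_train, x_test, noise_var=0.1):
        K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
        K_star = rbf_kernel(x_test, x_train)
        # weights = K_star K^{-1}; the prediction weights @ y_train is
        # a fixed linear combination of the observations.
        weights = K_star @ np.linalg.solve(K, np.eye(len(x_train)))
        return weights @ y_train

    x = np.linspace(0, 5, 20)
    y = np.sin(x) + 0.1 * np.random.randn(20)
    print(gp_posterior_mean(x, y, np.array([2.5])))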


Well, what I'm curious about is how such hype cycles affect our understanding of whether this shit actually works.

As far as I can tell, the "great advance" attributed to deep learning is doing better on benchmarks than other methods, which amounts to doing the same old machine learning tasks more accurately than before.

The main thing we hear about is the more "edgy" stuff like learning to play video games on its own, describing pictures with phrases, or giving answers to "philosophy". But since these applications are easy to cobble together in a half-assed fashion, is there really more here than incremental progress?

I wouldn't know one way or the other, so I'd like to find out.


I disagree with the "great advance" dig. If you took a time machine back 5 years and showed any of the recent advances in deep neural networks (without showing the algorithmic techniques), people would say "this is AI". There are huge fundamental gains happening every day where the rubber meets the road, with tasks that could feasibly be seen in the real world.

We are inventing new "real world" benchmarks to try to counteract this (MS COCO, dialog datasets, the big Flickr datasets, translation generally), but many approaches from the 90s were clearly right mathematically (as Jurgen says) and just needed more data fuel. So it is obvious to go back and find interesting ideas that didn't get their due, as long as proper attribution is given. Profs also have their pet projects that didn't quite pan out, and often want to breathe new life into a cool idea.

These things are only easy to cobble together in a half-assed fashion in hindsight AND/OR if you have expertise - having the knowledge and know-how to input conditional information, interpret deep networks as modeling joint probability distributions, etc. is just as much algorithmic design as any other task in graphical modeling, statistics, etc. Slapping a big deep convnet (or feedforward net) on new datasets IS easy, and usually not interesting scientifically, but that also doesn't get published and is reserved for the blogosphere or bad arXiv papers.

Incremental progress is 0.5% performance gains on major benchmarks like ImageNet etc. - company PR (and university PR as well) will crow about this, but no one in academia really cares unless it is accompanied by interesting scientific ideas or fundamental questions being answered.


So you mean that it's alright that possibly the most prestigious science journal on this planet is used for self-promotion?


Due to publish-or-perish, ALL major journals are used for self-promotion.


Meh. If you define your terms correctly, absolutely nothing is original.


So, for anyone not aware: Schmidhuber is _obsessed_ with this. He wrote an enormous literature review of deep learning [0], basically because he felt that people weren't crediting ideas enough. This isn't a one-off essay for him; he's been banging this drum for quite a while.

Not saying he's wrong, just FYI.

[0] http://arxiv.org/abs/1404.7828


While I do not have anything invested in Deep Learning, I do have a similar reaction because I am familiar with the research from 10-20 years ago, particularly around neural Turing machines. From that perspective, most modern Deep Learning is essentially that older research with the primary novelty being better marketing and much faster computers. I can understand why someone like Schmidhuber would be irritated by the apparent assignment of credit to people who are essentially repackaging old computer science, given how much Schmidhuber has done in the field.

DeepMind is a bit of an exception to this. At least one of the founders was involved in quite a bit of original research way back then.

This phenomenon is common in theoretical computer science. Timing and marketing matter a lot when it comes to getting credit for important inventions. I've seen it many times.


I go on the citation rant every now and again. It isn't so much about the attribution itself, but that it breaks the knowledge graph. By not citing past or similar work, these researchers prevent others from learning, exploring and ingesting knowledge from a field.

A few of the ideas at play here are:

    * Wanting to appear more cutting edge than is actually the case
    * Limiting or strengthening patent applicability
    * Preventing loss of focus via competitors' research
I think researchers should actually get penalized for having a deficient bibliography.


I took machine learning courses from 1999 until late 2001. One of my professors (who had worked with Vapnik back when we didn't know if support vector machines were a good idea) said that he didn't use ANNs much because Hinton was probably the only one who knew how to use them.

I'm telling this anecdote because, even though I agree that we are forgetting to mention a lot of names, that "PR" work that Hinton et al. did was necessary (IMO) to bring ANNs back into the mainstream.


The Canadian research team seems to have kept the work going when others were skeptical: http://www.thestar.com/news/world/2015/04/17/how-a-toronto-p...


Exactly my point... I think Hinton deserves a lot of credit for being "stubborn" enough to find grants for a field that a lot of people had serious doubts about.


I think you mean Hinton, not Hilton.


Oops... Thanks! I was able to fix it just in time :)


This is a good critique: it's important to cite the people who have laid the early groundwork, regardless of how far in the past that work was done.


Maybe every paper should cite Boole, Church & Flowers?

Or maybe not...


You gonna cite Newton and/or Leibniz when you differentiate a function? Use a zero? Don't forget to credit Brahmagupta!


Only if you were going to later cite yourself as a pioneer of calculus.


maybe just #include <standardCitations.h> ?


The author certainly seems accomplished, but his tone and egotism undercut his message. For example, from the front page of his site:

"His formal theory of creativity & curiosity & fun explains art, science, music, and humor."

I've also read papers of his that take completely off-the-wall pot-shots at other researchers.


"Since age 15 or so, Prof. Jürgen Schmidhuber's main scientific ambition has been to build an optimal scientist through self-improving Artificial Intelligence (AI), then retire"...

"His formal theory of creativity & curiosity & fun explains art, science, music, and humor."

Maybe he's a madman, or maybe he just hasn't tweaked his "theory of humor" quite enough to know some people won't get it.


What do you mean, maybe? Schmidhuber is insane, in the best possible way. He's exactly the kind of person for whom academia exists: an incredibly competent, devoted scientist who won't stand for bullshit cooked up by marketing departments.


He is a nice guy, and very smart. And direct. Nothing wrong with that.


Lecture on the topic by the author: https://www.youtube.com/watch?v=JSNZA8jVcm4

Several founders of DeepMind were his PhD students.


One thing which works against the "cite everything" approach is that most of the major conferences have page limits of 8-10 pages, with a 1-page bonus for references. That means if you go over 1 page of references (at least at NIPS), you cut into the meat of the paper, reviewers look on in disdain and give poor marks, etc. So you often have to actively prune for the most recent and directly relevant citations, which sometimes counts out semi-relevant but older work in favor of more relevant recent work.

Much of Dr. Schmidhuber's work is very interesting and especially relevant now that RNNs are really heating up again - but it is sometimes hard to figure out exactly which of his papers to cite because many are partially relevant. And having a full page of only Schmidhuber citations is no good either...

Speaking as a member of the Montreal lab, I am much more up to date with the work that happens here - so it is hard to fight the natural tendency to cite recent papers you know (since they all came from work you know of, because you were there). Notice too that all 3 (Hinton, LeCun, and Bengio) worked directly together at some point, and collaborated often beyond that. So a version of this is in effect, whereas Juergen has been more separated (both geographically and in work focus) than the other 3. NYU, Toronto, and Montreal are all within an 8-hour triangle!

Not to take anything away from his points (I try to cite as many of his papers as possible without seeming ridiculous, generally), but these are the general factors at play. We cannot possibly cite every paper in the field, and shining the light on new works can be more important than citing older work AS LONG AS no one claims, as a pure innovation, work that was already done "in the nineties".

Claiming to improve some technique or take it from a curiosity to something usable is more than fine - but given the recent deep learning hype, even recent papers are getting overshadowed by others claiming some new innovation which already exists in very current literature.

Especially given the work that is coming out of industrial labs (Google, FB, MSR, etc.), it is fairly frequent to see the same model being touted as new (with minor citations if you are lucky) when the exact same technique first appeared 6 months ago. Being well-read is not optional as an academic - it is a requirement! The PR machine of these companies is unfortunately very effective at dominating the airwaves if you have competing or related work, especially if you are not from a school with good press, e.g. MIT or Stanford.


I don't know.

On the one hand, at the conceptual level, unless you are at the cutting edge of CS theory, I'm pretty sure almost anything else that is done in computer science is a mere re-wording of something that was done in the 1970s-1980s. So there is no "holier than thou" at this level.

On the other hand, in terms of practical results in context, there are many important consequences of being able to take old concepts and run them faster, because the hardware has improved and well, generally, the entire world is different.

A big part of "popularizing" a technique is having a good implementation that takes advantage of advances in computing speed. So the author of the article misses the practical value of popularizing.

At the same time, what he says is valuable because he touches on a fundamental choice that we all make: do you want to be a groundwork layer or a popularizer?

The problem, of course, is that groundwork layers are mostly forgotten, with their contributions recognized posthumously - that's how far ahead of your time you have to be to lay any new groundwork, and it's difficult to predict what will be the foundation for the coming centuries.

It's not just deep learning; basically anything that becomes popular enough to be noticed here probably has a long history behind it, and if we are to move forward we need to be in the headspace of those who had the sense back then to form it, not in the space of popularizing or being the tool of the popularizer.


Right or not (I tend to think he is), it's incredibly shortsighted to write such an article, which is bound to make him look bitter and low-status. If you want to do this, you get a third party to do it, come on...


You basically can't make Schmidhuber look "low-status" within the subfield of neural networks. He puts some weird stuff on his personal website, but his work wins contests and his publication record speaks for itself.


Status as in having a Nature article centered around you and your work?

There's no denying that he is a brilliant pioneer, and his accomplishments are indeed incredible, but that does not translate into status, which is probably very upsetting.


There is an informal honour code among scientists that you don't take credit for other people's ideas. While it is forgivable for a PhD student or postdoc to be unaware of older work -- after all, there's sooo much of it -- a senior researcher should not act carelessly with attributions. And where they do, it's useful to remind them that they have transgressed dearly held ethical norms.


It depends on what you mean by status. His papers are cited 13,493 times. And that's what matters to most scientists.



