I've just skimmed the paper, and this new algorithm looks very nice -- the authors call it "LargeVis."
According to the paper, LargeVis improves on Barnes-Hut t-SNE in two ways: first, it uses the idea that "the neighbors of my neighbors are likely my neighbors too" to construct an approximate nearest-neighbor graph in the high-dimensional space in a manner that is computationally much more efficient than the method used by t-SNE. Second, the authors have apparently found a clever way to use SGD to map this graph to two (or three) dimensions with computational cost linear in the number of nodes.
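For the curious, here's a minimal sketch (mine, not the authors') of the first idea - refining a candidate k-NN graph by also considering neighbors of neighbors, so each point is compared against O(k^2) candidates instead of all n points:

    import numpy as np

    def refine_knn(X, knn, k):
        """One neighbor-of-neighbor refinement pass over a candidate k-NN graph.
        X: (n, d) data matrix; knn: (n, k) current candidate neighbor indices."""
        new_knn = np.empty_like(knn)
        for i in range(X.shape[0]):
            # Candidates: my current neighbors plus their neighbors.
            cand = np.unique(np.concatenate([knn[i], knn[knn[i]].ravel()]))
            cand = cand[cand != i]
            # Keep only the k closest candidates.
            d = np.linalg.norm(X[cand] - X[i], axis=1)
            new_knn[i] = cand[np.argsort(d)[:k]]
        return new_knn

In practice you'd seed `knn` with something cheap (random projection trees, per the paper, if I read the summaries right) and repeat a few passes until the graph stops improving.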
If the authors release an open-source implementation, LargeVis looks likely to supplant t-SNE as the go-to algorithm for visualizing high-dimensional data.
The idea that the neighbors of my neighbors are likely to be neighbors is basically an assumption of positive curvature. In large codimension, embedded submanifolds can have very negative curvature, in which case neighbors of neighbors may be much less likely to be neighbors than one might first think.
Firstly - in general it's trivially true that, over the set of all possible high-dimensional data, you have probability 0 of cleanly embedding it into a 2- or 3-dimensional representation - yet interesting data often does have lower-dimensional structure.
Secondly - so what? Can you think of a plausible scenario where this assumption does not hold and it's still possible to generate a low-dimensional embedding? If the data is impossible to embed, then it's not an algorithmic failure when you fail to find an embedding.
Correct. The authors in fact mention this in the paper, and state that this is probably not an issue because in practice most large high-dimensional datasets tend to have points which lie on/near embedded submanifolds of much lower dimension.
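You can poke at that claim with a quick toy experiment (my own sketch, nothing from the paper): measure how often a neighbor-of-a-neighbor is itself among a point's k nearest neighbors, on data with 1-D structure versus an unstructured high-dimensional cloud. I'd expect the hit rate to be noticeably higher in the structured case:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def two_hop_hit_rate(X, k=10):
        # k-NN indices, dropping each point's self-match in column 0
        knn = (NearestNeighbors(n_neighbors=k + 1).fit(X)
               .kneighbors(X, return_distance=False)[:, 1:])
        hits = total = 0
        for i in range(len(X)):
            cand = np.unique(knn[knn[i]].ravel())   # neighbors of neighbors
            cand = cand[cand != i]
            hits += np.isin(cand, knn[i]).sum()     # ...that are true k-NN of i
            total += len(cand)
        return hits / total

    rng = np.random.default_rng(0)
    curve = rng.uniform(size=(2000, 1))    # points on a 1-D manifold
    cloud = rng.normal(size=(2000, 50))    # no low-dimensional structure
    print(two_hop_hit_rate(curve), two_hop_hit_rate(cloud))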
I wonder why they are positioning this as a visualisation technique rather than as general dimensionality reduction, which would be a much bigger deal. Is there something to suggest the technique doesn't work well for reduction to arbitrary dimensions?
Disclaimer: haven't read it, but have read summaries here.
It sounds like it is very fast but not very rigorous. This lets you get a feel for the data, but it doesn't give you the same guarantees other dimensionality-reduction methods do.
I read a paper [1] about visualizing the Deep Q-Network used for playing Atari games. They map the game states to 2D and then colorize them by their Q scores. As a result they could visualize the strategy and observe how it is hierarchically organized into clusters containing sub-clusters. They could show regions associated with initial and termination rules. This method can be used to map out a game strategy and to focus on specific regions.
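In sketch form the recipe is simple (my own stand-in code, not the paper's; random data plays the role of a real agent's hidden-layer activations and Q-values):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    rng = np.random.default_rng(0)
    acts = rng.normal(size=(1000, 512))   # stand-in: last hidden layer per state
    maxq = rng.uniform(size=1000)         # stand-in: max Q-value per state

    emb = TSNE(n_components=2).fit_transform(acts)   # map states to 2-D
    plt.scatter(emb[:, 0], emb[:, 1], c=maxq, s=4)   # colorize by Q score
    plt.colorbar(label="max Q")
    plt.show()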
Dimensionality-reduction work rarely tries to reduce down to only two or three dimensions. The fact that you are ultimately reducing to so few dimensions means a lot of potential errors don't meaningfully impact the result.
Anyone know if it's possible, or makes sense, to use something like t-SNE for dimensionality reduction? If so, could the reduced data set be used to build a classifier?
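You can do it mechanically, with one big caveat: standard t-SNE has no out-of-sample transform, so you can't embed new points without rerunning it (parametric variants exist for that). A quick scikit-learn sketch of the shape of it (mine, just illustrative):

    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    Z = TSNE(n_components=2).fit_transform(X)  # 64-D pixels -> 2-D

    # Classify in the embedded space.  Caveat: t-SNE saw every point
    # (there is no .transform() for held-out data), so this measures
    # separability of the embedding, not honest generalization.
    print(cross_val_score(KNeighborsClassifier(), Z, y, cv=5).mean())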