Topic Analysis of Marc Andreessen’s Tweets

languagehacker · on Jan 16, 2015

Get back to me when the implementation can group the topic words into a decent summary.

The rest of it is pretty bush league.

dev1n · on Jan 17, 2015

So I did some research and found a way to access article body text without having to use Diffbot, so that'll be fun. I found a Python wrapper for boilerpipe [1], so I plan on redoing this analysis. The amount of data I had to work with was pathetic. This time around I'll use jv22222's suggestion on how to get the hyperlinks out of the tweets too. Thanks all! This was fun :)

[1]: https://github.com/misja/python-boilerpipe

hnriot · on Jan 16, 2015

You'll also notice some editorializing of the topics (venture -> venture capital).

The LDA features are already grouped; that's exactly what LDA does. However, translating a group of words into a "summary" (whatever that is) is nontrivial. You'd first need to define what you're looking for. A visual summary, for example, might be a word cloud; another might be ontology tagging, if you consider those a salient summary.
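
To make the gap concrete, here is a minimal sketch of the most naive "summary" you could build from a topic: join its top-weighted words into a label. The function name `label_topic` and the word weights are invented for illustration; they are not taken from the article or from any LDA library.

```python
def label_topic(word_weights, k=3):
    """Build a crude label from the k highest-weighted words of one topic."""
    # Sort (word, weight) pairs from heaviest to lightest.
    ranked = sorted(word_weights.items(), key=lambda kv: kv[1], reverse=True)
    return " / ".join(word for word, _ in ranked[:k])


# Toy weights, made up for this example (not real LDA output).
topic = {"venture": 0.21, "capital": 0.17, "startup": 0.09, "board": 0.04}
print(label_topic(topic))
```

This only restates the grouping LDA already produced. Anything closer to a real summary (merging "venture" and "capital" into one phrase, say) needs a definition of what you're after.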

mahouse · on Jan 16, 2015

So 70% of the article explains the process of getting a list of URLs? Switch to Tweepy, please.

jv22222 · on Jan 17, 2015

Full links are available as entity references in each tweet in the API.
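
A rough sketch of what that looks like, assuming the tweet has already been parsed from the API's JSON into a dict with the usual `entities` -> `urls` -> `expanded_url` layout. The helper name `expanded_urls` and the sample tweet are hypothetical.

```python
def expanded_urls(tweet):
    """Return the full URLs listed in a tweet's entities, in order."""
    urls = tweet.get("entities", {}).get("urls", [])
    # Fall back to the shortened URL if no expanded form is present.
    return [u.get("expanded_url") or u.get("url") for u in urls]


# Hypothetical tweet payload, trimmed to the fields used here.
sample_tweet = {
    "text": "Worth reading: http://t.co/abc123",
    "entities": {
        "urls": [
            {
                "url": "http://t.co/abc123",
                "expanded_url": "http://example.com/article",
            }
        ]
    },
}

print(expanded_urls(sample_tweet))
```

That avoids parsing the tweet text or resolving the shortened links yourself.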