Topic Analysis of Marc Andreessen’s Tweets (dhurley14.github.io)
16 points by dev1n on Jan 16, 2015 | 5 comments


Get back to me when the implementation can group the topic words into a decent summary.

The rest of it is pretty bush league.


So I did some research and found a way to access article body text without having to use Diffbot, so that'll be fun. I found a Python wrapper for boilerpipe [1], so I plan on redoing this analysis. The amount of data I had to work with was pathetic. This time around I'll use jv22222's suggestion on how to get the hyperlinks out of the tweets too. Thanks all! This was fun :)

[1]: https://github.com/misja/python-boilerpipe


You'll also notice some editorializing of the topics (venture -> venture capital).

The LDA features are already grouped; that's exactly what LDA does. However, translating a group of words into a "summary" (whatever that is) is non-trivial. You'd first need to define what you're looking for. A visual summary, for example, might be a word cloud; another might be ontology tagging, if you consider those a salient summary.
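
To make the "already grouped" point concrete, here is a minimal sketch of the crudest possible summary: take each topic's word weights and keep the top few words. The topics and weights below are made up for illustration (they are not from the article), and `top_words` is a hypothetical helper, not part of any LDA library.

```python
def top_words(topic_weights, k=3):
    """Return the k highest-weighted words of one topic, heaviest first."""
    ranked = sorted(topic_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# Hypothetical topic-word weights, standing in for what an LDA model outputs.
topics = {
    0: {"venture": 0.12, "capital": 0.10, "startup": 0.08, "fund": 0.03},
    1: {"software": 0.15, "mobile": 0.09, "cloud": 0.07, "chip": 0.02},
}

for topic_id, weights in topics.items():
    # Still just a word list; turning it into a label like
    # "venture capital" is the editorial step.
    print(topic_id, ", ".join(top_words(weights)))
```

Anything beyond a ranked word list (a label, a sentence, a tag from an ontology) needs a definition of "summary" that the model itself doesn't supply.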


So 70% of the article explains the process of getting a list of URLs? Switch to Tweepy, please.


Full links are available as entity references in each tweet in the API.
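
A rough sketch of what that looks like, assuming the tweet has already been fetched and parsed into a dict in the v1.1-style JSON layout, where `entities.urls` is a list and each entry carries an `expanded_url`. The sample tweet and the `expanded_urls` helper are made up for illustration.

```python
def expanded_urls(tweet):
    """Return the full (un-shortened) URLs listed in one tweet's entities."""
    urls = tweet.get("entities", {}).get("urls", [])
    return [u["expanded_url"] for u in urls if u.get("expanded_url")]


# Hypothetical tweet payload, trimmed to the fields used here.
sample_tweet = {
    "text": "Worth reading: https://t.co/abc123",
    "entities": {
        "urls": [
            {
                "url": "https://t.co/abc123",
                "expanded_url": "https://example.com/article",
                "display_url": "example.com/article",
            }
        ]
    },
}

print(expanded_urls(sample_tweet))
```

No regex over the tweet text and no following t.co redirects; the links come straight out of the response.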



