Hacker News
Common Crawl code contest - with fresh crawl of 3.2 billion web pages (commoncrawl.org)
23 points by Aloisius on July 18, 2012
5 comments
SudarshanP on July 18, 2012
FYI: http://www.worldwidewebsize.com/
bashorama on July 18, 2012
Where does it say they have 3.2 billion pages of fresh data?
trojancjs on July 18, 2012
This is Chris from Common Crawl. You are right - we didn't have stats about the latest crawl posted. We're putting them up today ...
Joyfield on July 19, 2012
Would it be possible to get a torrent with just the text-only part?
Aloisius on July 18, 2012
It is the 2012 data release linked from the first paragraph.