Access to Around the World in 2 Billion Pages!

Thanks to a generous grant from the Mellon Foundation, the Internet Archive completed a 2 billion page web crawl in 2007. This is the largest web crawl the Internet Archive has attempted. The project was designed to take a global snapshot of the Web.

Please browse through the resulting collection.

Special thanks to the memory institutions who contributed URLs to the crawl. The crawl began with 18,000 websites from over 60 countries.

2 Responses to “Access to Around the World in 2 Billion Pages!”

  1. Gojomo Says:

    The main collection at http://www.archive.org/ has over 10 years of data, mostly collected in partnership with Alexa by their proprietary crawler, and accessed through the classic Wayback Machine.

    The ‘Around the World’ collection is a single large crawl that occurred in 2007, using our open source Heritrix web crawler, and is accessible through the new open source Wayback interface, which has some improvements. Also, this crawl started from sites nominated by libraries and memory institutions, which may be less well represented in Alexa crawls.

    Later this year, the ‘Around the World’ collection will be merged into the classic worldwide collection, and it will all be available via the new Wayback interface. But for now, this alternate entry point is the chance to try the new crawl and Wayback.

  2. alexf2000 Says:

    What is the difference between this collection and the main index at http://www.archive.org?

Comments are closed.

