Crawl Data Delivered to Bibliothèque nationale de France


On April 10, 2007, we delivered our third annual contract crawl to the Bibliothèque nationale de France (BnF). The collections included a 2006 crawl of the .fr domain and a historical collection spanning March to June 2005, totaling more than 324 million documents.

New to the 2006 collection was a NutchWAX full-text index of the .fr domain, representing one of the largest deployments of a searchable web archive.

The collections were delivered on a 40-node Petabox storage cluster, complementing BnF’s existing 80-node cluster, which the Web Team installed in 2005 and 2006. With this delivery, BnF now owns and operates the third-largest Petabox installation in the world (after the Internet Archive and the Library of Alexandria).

Petabox racks in the BnF repository

Internet Archive and BnF installation/crawl team

