The Library of Congress has formally announced a collaborative partnership with Internet Archive, the California Digital Library, the University of North Texas Libraries, and the U.S. Government Printing Office to preserve .gov websites during the upcoming presidential transition. The Washington Post has also covered the announcement.
Internet Archive’s role in the project will be to harvest websites in the .gov domain using Heritrix, the open-source web crawler developed at IA. The project will preserve at-risk government websites that are likely to change dramatically from one administration to the next. The resulting collection will be publicly accessible starting in February 2009.
Internet Archive has played a key role, working with the U.S. National Archives, in archiving past transitions: the administrative transition in 2004 and the congressional change in 2006. These past harvests are freely accessible online.