Archive for February 22nd, 2007

Warrick, a tool for recovering websites

February 22, 2007

Anyone who has used the Wayback Machine to recover web material they thought lost will be interested to know about Warrick, a free, open-source tool for reconstructing websites from publicly available caches of old content. From the website:

Warrick is a command-line utility for reconstructing or recovering a website when a back-up is not available. Warrick will search the Internet Archive, Google, MSN, and Yahoo for stored pages and images and will save them to your filesystem. Warrick is most effective at finding cached content in search engines in the first several days after losing the website since the cached versions of pages tend to disappear once the search engine re-crawls your site and can no longer find the pages. Running Warrick multiple times over a period of several days or weeks can increase the number of recovered files because the caches fluctuate daily (especially Yahoo’s). Internet Archive’s repository is at least 6-12 months out of date, and therefore you will only find content from them if your website has been around at least that long. If they don’t have your website archived, you might want to run Warrick again in 6-12 months.

Warrick was created by Frank McCown, a PhD student at Old Dominion University. Thanks, Frank!

If you do lose web material and need Warrick to recover it, here’s an important tip: run Warrick as soon as possible after you notice the loss. It consults a number of search-engine caches, which are likely to be both more recent and more ephemeral than the Archive’s public collection.

Indeed, you should try to run Warrick even before starting to reconstruct your website in place at the original URLs, because as soon as search engines see new content at the same URLs, they’ll start replacing their cached versions with the new content. (It seems that when URLs are responding with ‘404 – not found’ errors, the search engine caches retain the last real content returned, at least for a while.)
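Before or alongside a Warrick run, it can help to check by hand what the Wayback Machine holds for each lost page. The sketch below is not part of Warrick and does not touch the network; it only builds lookup addresses, assuming the Wayback Machine's web.archive.org/web/<timestamp>/<url> address pattern. The function name and the example.com pages are made up for illustration.

```python
from datetime import date

# Assumed public address prefix of the Wayback Machine.
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_lookup_url(page_url, when=None):
    """Build a Wayback Machine address for page_url (hypothetical helper).

    With no date, the '*' wildcard asks for the list of all captures.
    With a date, the YYYYMMDD stamp asks for a capture near that day.
    """
    stamp = "*" if when is None else when.strftime("%Y%m%d")
    return f"{WAYBACK_PREFIX}/{stamp}/{page_url}"


if __name__ == "__main__":
    # Placeholder pages standing in for the URLs of a lost site.
    lost_pages = ["http://example.com/", "http://example.com/about.html"]
    for page in lost_pages:
        print(wayback_lookup_url(page))
```

Opening each printed address in a browser shows whether the Archive has any captures of that page, which gives a rough idea of how much Warrick will have to fetch from the search-engine caches instead.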

Call for papers for the 7th International Web Archiving Workshop

February 22, 2007

Julien Masanès, program chair for the 2007 International Web Archiving Workshop, recently issued the workshop call for papers. From the call:

——————————————————
Objectives:

Main international event in this domain since 2001, IWAW will take place the 3rd week of June, in Canada this year (date and location to be confirmed).
The workshop will provide a cross domain overview on active research and practice in all domains concerned with the acquisition, maintenance and preservation of digital objects for long-term access, with a particular focus on web archiving and studies on effective usage of this type of archives.

——————————————————
Important Dates:

Paper submission: May 1st, 2007
Notification of acceptance: May 15th, 2007
Camera-ready copy due: June 11th, 2007

Details for format and submission will be posted on iwaw.net soon.

——————————————————
Topics:

Case studies:
• Web Archiving Projects,
• Digital Archeology,
• Cyberculture Studies,
• Web Metrics,
• Web Publishing Models.

Data acquisition:
• Harvesting Technology, Focused Crawling,
• Deep Web Capture,
• Site Architecture Migration,
• Authenticity Control of Captured Documents,
• Acquisition of Dynamic Objects,
• Submission Systems,
• Data Ingest,
• Automated Metadata Capture.

Storage Models and Architecture:
• Hierarchical Storage Models,
• Redundant Storage,
• Distributed Storage,
• Storage Media Migration,
• Cost Models,
• Media Life-Time Analysis.

Digital Preservation:
• Conversion/Migration Strategies,
• Emulation Approaches,
• Data Abstraction Technologies,
• Self-Aware Objects,
• Testbeds, File Format Repositories,
• Document Functionality and Behaviour.

Access:
• Access Provision,
• Navigation,
• Web Indexing,
• Collection Analysis,
• Information Retrieval,
• Interface Models.

Policy and Social Issues:
• Economics of Information,
• Intellectual Property Rights,
• Challenges and Caveats of Web Archives,
• Scenarios and Visions,
• Privacy Aspects.

The Internet Archive has been a frequent contributor to past IWAW events and is looking forward to this year’s event — the first time it has been held in North America.