Archive for the ‘Conferences’ Category

Internet Archive at OSCON

July 24, 2008

Tomorrow, at the O’Reilly Open Source Convention in Portland, I’ll be presenting a session about our open source web archiving tools. Full details:

Build Your Own Web Archive: archive.org’s Open Source Tools to Crawl, Access & Search Web Captures
Gordon Mohr (Internet Archive, Web Group)
11:35am Friday, 07/25/2008
Web Applications
Location: E145

The Internet Archive, with support from other libraries around the world, has helped develop a collection of open source tools in Java to support web archiving. These include the Heritrix archival web crawler, “Wayback” for replaying historic web content, and extensions to Nutch for web archive full-text search. This session will explain the design and capabilities of these tools, and quickly demo their use to create a small personal web archive.

Heritrix has been designed for faithful and complete content archiving but has also found use in other web search contexts. Wayback allows URL-based lookup and follow-up browsing of archived web content. Nutch, as applied to archival web crawls, allows Google-style full-text search of web content, including the same content as it changes over time. Together, they provide everything necessary to archive and access accurate historical records of web-published content.

Also: last month James Turner of O’Reilly Media spoke to me in advance of OSCON. You can read or hear the interview at: Gordon Mohr Takes Us Inside the Internet Archives.

Call for papers for the 8th International Web Archiving Workshop

January 14, 2008

Julien Masanès, program chair for the 2008 International Web Archiving Workshop, recently issued the workshop call for papers. From the call:

Objectives:

The main international event in this domain since 2001, IWAW will likely take place on the 18th and 19th of September 2008, in conjunction with ECDL in Aarhus (Denmark) this year.
The workshop will provide a cross-domain overview of active research and practice in all domains concerned with the acquisition, maintenance and preservation of digital objects for long-term access, with a particular focus on web archiving and studies on effective usage of this type of archive.

Important Dates:

Paper submission: May 19th, 2008 (URL for submission coming soon).

Notification of acceptance: June 16th, 2008

Camera-ready copy due: July 14th, 2008

Workshop: September 18th and 19th, 2008

Please prepare submissions using the ACM template.

Topics:

Case studies:
• Web Archiving Projects,
• Digital Archeology,
• Cyberculture Studies,
• Web Metrics,
• Web Publishing Models.

Data acquisition:
• Harvesting Technology, Focused Crawling,
• Deep Web Capture,
• Site Architecture Migration,
• Authenticity Control of Captured Documents,
• Acquisition of Dynamic Objects,
• Submission Systems,
• Data Ingest,
• Automated Metadata Capture.

Storage Models and Architecture:
• Hierarchical Storage Models,
• Redundant Storage,
• Distributed Storage,
• Storage Media Migration,
• Cost Models,
• Media Life-Time Analysis.

Digital Preservation:
• Conversion/Migration Strategies,
• Emulation Approaches,
• Data Abstraction Technologies,
• Self-Aware Objects,
• Testbeds, File Format Repositories,
• Document Functionality and Behaviour.

Access:
• Access Provision,
• Navigation,
• Web Indexing,
• Collection Analysis,
• Information Retrieval,
• Interface Models.

Policy and Social Issues:
• Economics of Information,
• Intellectual Property Rights,
• Challenges and Caveats of Web Archives,
• Scenarios and Visions,
• Privacy Aspects.

Workshop Officials:

Chairs:
Julien Masanès (European Archive; e-mail: julien AT iwaw.net)
Andreas Rauber (Vienna University of Technology, Austria)

See details at http://www.iwaw.net/

Internet Archive at IWAW

June 21, 2007

On June 23rd, the Internet Archive will be presenting at the International Web Archiving Workshop (IWAW) in Vancouver.

Brad will be starting off the day of sessions with a presentation on the Wayback Machine. Here is the abstract from the paper Brad is presenting.

‘Wayback’ is an open-source, Java software package for browser-based access of archived web material, offering a variety of operation modes and opportunities for extension. In its basic, usual configuration it can both list available URL captures by date and offer recursive archive browsing starting from any capture. Advanced configurations offer better performance for challenging archived material and improved navigation.

‘Wayback’ is implemented as a collection of loosely coupled core modules, each with alternate implementations, and an overview of each module is provided. The functionality and implementation are also contrasted with its inspiration and predecessor, the Internet Archive’s classic public Wayback Machine software, and with other ways of accessing archived web material. Finally, future directions for improvement are outlined.
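To make the two access modes in the abstract a little more concrete (listing a URL's captures by date, and browsing onward from a capture), here is a minimal Python sketch of a capture index. It is not Wayback's code or API: `CaptureIndex`, its methods, and the rule of following a link to the target URL's capture nearest in time are hypothetical, and capture timestamps are assumed to be 14-digit `YYYYMMDDhhmmss` strings.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime

# Assumed capture timestamp layout (14 digits: YYYYMMDDhhmmss).
TS_FORMAT = "%Y%m%d%H%M%S"


def _parse(ts):
    return datetime.strptime(ts, TS_FORMAT)


class CaptureIndex:
    """Hypothetical in-memory index of (url, timestamp) captures."""

    def __init__(self):
        # url -> sorted list of capture timestamps; fixed-width digit strings
        # sort lexicographically in chronological order.
        self._captures = defaultdict(list)

    def add(self, url, timestamp):
        caps = self._captures[url]
        i = bisect_left(caps, timestamp)
        if i == len(caps) or caps[i] != timestamp:  # skip duplicate captures
            caps.insert(i, timestamp)

    def list_captures(self, url):
        # Listing mode: every capture of one URL, oldest first.
        return list(self._captures.get(url, []))

    def closest(self, url, timestamp):
        # Browsing mode (assumed policy): when a link is followed from a
        # capture, choose the target URL's capture nearest in real time.
        caps = self._captures.get(url)
        if not caps:
            return None
        i = bisect_left(caps, timestamp)
        candidates = caps[max(i - 1, 0):i + 1]  # neighbours around the insertion point
        target = _parse(timestamp)
        return min(candidates,
                   key=lambda ts: abs((_parse(ts) - target).total_seconds()))
```

Keeping each URL's timestamps sorted turns the listing mode into a simple copy and the nearest-capture lookup into a binary search; parsing to `datetime` before comparing avoids treating the 14-digit strings as plain integers, whose differences do not track elapsed time.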

After 4pm, Gordon will give updates on IA’s tool and format developments.

Please come and introduce yourself if you are attending the workshop!

Conferences, Conferences, Conferences!

April 12, 2007

Members of the web team will be both speaking and attending several conferences in the next few months. Here is where you can find our team members out on the road.

IIPC General Membership Meeting, April 18 – 20, Paris, France
Kris will be presenting on a Pioneers of Web Archiving panel, and Gordon and Igor will lead a half-day Heritrix tutorial.

DigCCurr 2007, April 18 – 20, Chapel Hill, North Carolina
Dan and Molly are attending. While at UNC, they will speak with two classes at the UNC School of Information and Library Science about Archive-It (both classes have been using Archive-It for group projects).

Digitizing in a Material World, April 19, San Jose, California
Kristine will be speaking. This event aims to help those California libraries that are being asked to plan, create and provide access to material in digital collections.

The Challenge: Long-Term Preservation Strategies and Practices of European Partnerships, April 20 – 21, Frankfurt, Germany
Igor will be attending.

Best Practices Exchange 2007, May 2 – 4, Chandler, Arizona
Kristine, Molly and Dan will be presenting in the Technology, Access and Emerging Issues tracks, although the schedule has not been finalized. Tune in soon for more updates. Also, Brewster Kahle will be a guest speaker on Thursday, May 3 at 8:30am.

In June team members will be attending and hopefully presenting at IWAW and JCDL (both in Vancouver, Canada). More details on these conferences to follow.

If you are attending any of these conferences, come find us and say hello!

Call for papers for the 7th International Web Archiving Workshop

February 22, 2007

Julien Masanès, program chair for the 2007 International Web Archiving Workshop, recently issued the workshop call for papers. From the call:

——————————————————
Objectives:

The main international event in this domain since 2001, IWAW will take place in the third week of June, in Canada this year (date and location to be confirmed).
The workshop will provide a cross-domain overview of active research and practice in all domains concerned with the acquisition, maintenance and preservation of digital objects for long-term access, with a particular focus on web archiving and studies on effective usage of this type of archive.

——————————————————
Important Dates:

Paper submission: May 1st, 2007
Notification of acceptance: May 15th, 2007
Camera-ready copy due: June 11th, 2007

Details for format and submission will be posted on iwaw.net soon.

——————————————————
Topics:

Case studies:
• Web Archiving Projects,
• Digital Archeology,
• Cyberculture Studies,
• Web Metrics,
• Web Publishing Models.

Data acquisition:
• Harvesting Technology, Focused Crawling,
• Deep Web Capture,
• Site Architecture Migration,
• Authenticity Control of Captured Documents,
• Acquisition of Dynamic Objects,
• Submission Systems,
• Data Ingest,
• Automated Metadata Capture.

Storage Models and Architecture:
• Hierarchical Storage Models,
• Redundant Storage,
• Distributed Storage,
• Storage Media Migration,
• Cost Models,
• Media Life-Time Analysis.

Digital Preservation:
• Conversion/Migration Strategies,
• Emulation Approaches,
• Data Abstraction Technologies,
• Self-Aware Objects,
• Testbeds, File Format Repositories,
• Document Functionality and Behaviour.

Access:
• Access Provision,
• Navigation,
• Web Indexing,
• Collection Analysis,
• Information Retrieval,
• Interface Models.

Policy and Social Issues:
• Economics of Information,
• Intellectual Property Rights,
• Challenges and Caveats of Web Archives,
• Scenarios and Visions,
• Privacy Aspects.

The Internet Archive has been a frequent contributor to past IWAW events and is looking forward to this year’s event — the first time it has been held in North America.