Wayback Machine & Web Archiving Open Thread, April 2011

What do you want to know about the Wayback Machine and Internet Archive web archive? Do you have problems, concerns, suggestions? This is the place!

If your comment is a question, please check the classic Wayback Machine Frequently Asked Questions (FAQ) or the new Wayback Machine FAQ site before posting, to see whether your question has already been addressed.

A few other things to note before posting:

Everything else? Fire away!

8 Responses to “Wayback Machine & Web Archiving Open Thread, April 2011”

  1. adrolli Says:

    I found no entries for 2010 and 2011 for most of our (well-known) websites. For 2011 there are no entries at all, even for Microsoft.com or Apple.com.

    Is this a temporary issue or the end of the Wayback Machine?

    Regards,
    Alf

  2. Vitaliy Kuzmin Says:

    How can I force the Wayback Machine to archive an entire site and all the files on it?

  3. gokitalo Says:

    I do have a fairly urgent one. I’m not sure how long you intend to have the classic interface around, but there are certain pages and sites that only seem to exist in the classic interface. This message board site, for example:

    http://pub17.ezboard.com/bschoolforgiftedyoungsters

    Which also went by the URL:

    http://p082.ezboard.com/bschoolforgiftedyoungsters

    When I type these two URLs in the classic interface, archived versions of the site appear, as you can see below:

    http://classic-web.archive.org/web/*/http://pub17.ezboard.com/bschoolforgiftedyoungsters

    http://classic-web.archive.org/web/*/http://p082.ezboard.com/bschoolforgiftedyoungsters

    When I use these same URLs with the current interface, however, no archived versions of the site appear. And while the site has changed URLs since then:

    http://schoolforgiftedyoungsters.yuku.com

    … both the classic and current versions of the interface say that no versions of the page have been archived. Frankly, I’m worried that if the classic interface is removed, all the older versions of this site will disappear. While the message board does continue to exist, a lot of old threads were deleted when EZBoard was hacked in 2005. However, a lot of these deleted threads still exist in the classic interface of the Internet Archive.

    If the classic version of the interface is removed, however… I’m worried that all these old message board threads may be lost for good. This is a roleplaying/writing board, and I don’t think anyone who posted there wants to see some of their best work deleted.

  4. kevinff Says:

    Hello,
    I’ve searched everywhere but couldn’t find any decent information.
    We whitelist crawlers. For example, for Googlebot we verify that the reverse DNS hostname of the IP ends with google.com and that this hostname resolves back to the same IP. That lets us keep the site from throwing CAPTCHAs and other stuff at Googlebot, Bingbot, Baidu, Yandex and others, while preventing fake bots from passing through our anti-gathering protection.

    However, it seems that Archive.org/Alexa are using various IPs from Amazon to collect data.
    Is there a list of IPs that we can whitelist? Is there any other way to be sure that some IPs are from Archive.org/Alexa? (I’m not talking only about the user agent, as we’ve found many fake Googlebots.)

    Thanks for the help
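
The whitelisting check described in the comment above is usually called forward-confirmed reverse DNS. A minimal sketch of it follows, using only the Python standard library. The function names and the `allowed_suffixes` parameter are illustrative assumptions, not anything published by the Internet Archive, Alexa, or Google. The check only helps for crawlers whose operators publish matching PTR and forward records; whether that holds for the Archive.org/Alexa crawlers running on Amazon IPs is exactly the open question in this comment.

```python
import socket
from typing import Callable, Iterable, List


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP (raises OSError if there is none)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return every address the hostname resolves to (raises OSError on failure)."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check (hypothetical helper).

    1. Look up the PTR hostname of the client IP.
    2. Require that hostname to be an allowed domain or a subdomain of one.
    3. Resolve the hostname forward and require the original IP in the result.
    """
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record: cannot be verified

    # Match on a label boundary so "evilgoogle.com" does not pass as "google.com".
    suffixes = [s.lower() for s in allowed_suffixes]
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False

    try:
        return ip in forward(hostname)
    except OSError:
        return False  # hostname does not resolve: treat as unverified
```

A caller would pass the client IP and the domains it trusts, for example `is_verified_crawler(client_ip, ["google.com"])`, and skip the CAPTCHA only when the result is true. The `reverse` and `forward` parameters exist so the check can be exercised with stub resolvers instead of live DNS.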

  5. mariko Says:

    I see the Advanced Search is gone. Will it be back? I’m interested in searching for text rather than URLs.

  6. glennp000 Says:

    I didn’t see any advanced filters by date range in your new interface. (I’m interested in entries from the last 12 months, not just the latest one.) And if you discussed this in your FAQs, I wouldn’t know, because the FAQ link doesn’t go anywhere except redirect to the home page.

  7. siplushwguy Says:

    Hello, Internet Archive!

    Could you allow browsing archived versions of http://halflifehq.com/ ? A robots.txt blocking you was placed on this site when it closed and its domain was parked (most domain parking services block crawling to make parked domains unsearchable). Before closing, it contained very valuable information for the Half-Life community (the most valuable content is videos such as bullchicken360.avi, he360.avi, alieng.avi, alienslave.avi, burnacle.avi, tentacle.avi and xenome.avi).

    The original owners did not block you, Internet Archive. You can check this by browsing archived versions of halflifehq.com/robots.txt: it returned a 404 in 2002, which is the year whose archive I want to see.

    And, if you can’t allow browsing the entire site, could you just send us archived versions of the following files?
    http://www.halflifehq.com/files/downloads/avi/bullchicken360.avi !VERY IMPORTANT
    http://www.halflifehq.com/files/downloads/avi/he360.avi !VERY IMPORTANT
    http://www.halflifehq.com/files/downloads/avi/xenome.avi
    http://www.halflifehq.com/files/downloads/avi/alieng.avi
    http://www.halflifehq.com/files/downloads/avi/alienslave.avi
    http://www.halflifehq.com/files/downloads/avi/burnacle.avi or /files/downloads/avi/barnacle.avi
    http://www.halflifehq.com/files/downloads/avi/headcrab.avi
    http://www.halflifehq.com/files/downloads/avi/tentacle.avi

    And maybe the following too (if they’re not from HL: Further Data):

    http://www.halflifehq.com/files/downloads/mp3/half-life1.mp3
    http://www.halflifehq.com/files/downloads/mp3/half-life2.mp3

    Thanks in advance

  8. yahudeejay Says:

    I’m still interested, most of all, in when there will be a chance to see Wayback Machine results for http://www.djsportal.com for June to September 2009.

Comments are closed.

