Updated Wayback Machine in Beta Testing

A new, improved version of the Wayback Machine, with an updated interface and fresher index of archived content, is now available for public testing at:

http://waybackmachine.org

Note that during the beta test period, the availability and functionality of the new service will fluctuate as issues are discovered and addressed.

The classic Wayback Machine will remain in concurrent operation for a period, for comparing functionality, but may not receive any further index updates. (It received its last major update in 2008, with only small piecemeal updates since.) So, please use the new site for accessing material from recent years. For a mixture of technical and policy reasons, most material will still appear 6 months or more after collection.

For more information, see the new Wayback Frequently-Asked-Questions (FAQ) site.

Thank you for your patience while this long-awaited update was under development!

10 Responses to “Updated Wayback Machine in Beta Testing”

  1. inkdroid › xhtml, wayback Says:

    [...] Internet Archive gave the Wayback Machine a facelift back in January. It actually looks really nice, but I noticed something kinda odd. I was looking [...]

  2. edsu Says:

    I ran into some problems with archived XHTML, which I documented here. I’d be interested to hear what you think.

  3. suchmaschinenoptimierungpassau Says:

    I have the same issue: not even one result for my website in the new beta version, at least not from here :-(
    If anybody finds anything, it would be kind of you to mail me via the mail form on my site.
    Is it because we recently changed the layout? That should have nothing to do with it, I think, right?
    The domain is: www. bavaria-internetdienste. de

  4. Mark Says:

    Hi. I love the wayback machine and find myself using it so much. It is amazing as a research tool.

    Of course, once text search is here, it will be IN-CREDIBLE as a research tool.

    Is there any news, or even an ETA, for text search?

    Many thanks.
    Mark

  5. jojowbw Says:

    Hi,

    I am a novice user of the Way Back Machine – but am currently using it to support some work I’m doing. I have a few questions.. (if possible please can you explain in layman terms – I’m a bit of a luddite..!)

    1. I know you have a FAQ on the difference between the classic Wayback Machine and the beta version, but I’d like to understand why they provide different results for the same year. For example, for the site I am looking at, the old version of the machine provides 23 “pages” for the year 2007. The new version provides “18 crawls” for the year 2007. Do pages and crawls mean the same thing (i.e. the date at which a copy of the site was captured)? And are the results slightly different because two different crawlers have been used?

    2. I get that the reason why the two versions will give different results after 2008 may be because the old site hasn’t been updated since 2008. What I don’t get is what it means to update the site. Does it only mean that later dates, ie. dates after the previous last date available, are added? Or can more dates be added within a year for which there were already dates captured. For example, is it possible that when the wayback machine is updated, a new capture could be added for, say, June 2002?

    3. Is there any particular reason why the classic Wayback Machine has 37 captures for the site for 2008, and nothing after 22 August 2008, but the beta version only has 2 results for 2009 (Jan and Feb)? Is this likely to be because it has not been updated, or because the site hasn’t been crawled since early 2009?

    4. When I access the Beta version of the site, I can only do it by running it through the old version, then accessing the test version through the results page (where it says try the new updated version etc etc). When I put the URL in the search box on the home page of the Beta version, I get nothing. Am I potentially missing anything by only being able to access it this way? ie. If I were able to search through the Beta site itself would I get any different results?

    Many thanks for any help!

  6. Nathan Ridley Says:

    Please, please, please offer a simple API! It’ll reduce the load significantly on your service and allow third party services to obtain data quickly and efficiently without causing your service any problems. Make sure you include granular calls such as “what’s the earliest date domain X was indexed”, etc.

  7. manetpre Says:

    I have a strange problem when accessing the archived pages of the company Adtranz, which was bought up by another company, Bombardier Transportation, back in 2001. For example, when clicking this Adtranz page from 9 June 2000:

    http://web.archive.org/web/20000609151500/http://www.adtranz.com/adtranz/e/group/14history.htm

    …the page loads, but then in a few seconds, it redirects to a fairly recent archived version of a Bombardier page, although there is no Bombardier in the new URL:

    http://web.archive.org/web/20110124201632/http://www.adtranz.com/adtranz/index_e.htm?/web/20000609151500/http%253A//www.adtranz.com/adtranz/e/group/14history.htm

    This problem persists with the beta version, too.

    • gojomo Says:

      Typically this is due to archived Javascript or web-redirect responses sending your web browser to new addresses – perhaps several in a row. When these stay within the Wayback Machine, you can still drift in time quite a bit, given that for each request we show the nearest date, and it’s possible for the nearest date to be years away. At times, this effect will also send you back to the real live web, outside the Wayback Machine, because our page-rewriting for archival display is not able to perfectly remove all such active content.

      To see each of the steps is easiest using a web protocol/developer tool such as ‘Firebug’ (in Firefox) or the developer consoles available in other browsers.

      In this case, my guess is that a bit of original Javascript was designed to prevent older pages from being redisplayed/framed elsewhere. When it runs, it tries to bounce the browser to the new homepage, with an automatic extension based on the starting address. This results in a unique new URL which the Wayback does not have – in which case we fall back to making a fresh fetch this very moment (for the future archive) and showing that.

      You might have some success viewing individual older pages without being bounced elsewhere by turning off Javascript in your browser, but note that this also disables much Wayback navigation functionality.
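
The "nearest date" behaviour described in this reply can be illustrated with a small sketch. It is not the Wayback Machine's actual code: the function names are hypothetical, and the only things taken from this thread are the 14-digit timestamp format visible in the archive URLs and the rule that each request is answered with the capture closest in time.

```python
from bisect import bisect_left
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse a 14-digit capture timestamp such as 20000609151500."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def nearest_capture(requested: str, captures: list[str]) -> str:
    """Return the capture timestamp closest in time to the requested one.

    Hypothetical helper: 'captures' stands in for whatever captures exist
    for one URL.
    """
    if not captures:
        raise ValueError("no captures available for this URL")
    # 14-digit timestamps sort chronologically when sorted as strings.
    ordered = sorted(captures)
    i = bisect_left(ordered, requested)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = ordered[max(i - 1, 0): i + 1]
    target = parse_ts(requested)
    return min(candidates, key=lambda ts: abs(parse_ts(ts) - target))


def drift_days(requested: str, served: str) -> int:
    """Number of whole days between the requested and the served capture."""
    return abs(parse_ts(served) - parse_ts(requested)).days
```

If a redirect lands on a URL whose only capture is from 2011, a request dated 20000609151500 is still answered with that 2011 capture, so `drift_days` reports a jump of more than ten years. That matches the two timestamps in the Adtranz URLs quoted above.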

  8. yahudeejay Says:

    I tried as you advised, but it does not work. Can you check the time period 1 June – 1 September 2009, please, and send me pictures of my website?
    I can pay for such a service, of course.
    _______________________________________________________

    How can I get results from your Wayback Machine for my website [URL]? I’m looking particularly for June 30 – September 30, 2009, but nothing is visible.

    The last capture I can see is:
    Aug 22, 2008

    Yahu Pawul
    editor for [URL]

    • gojomo Says:

      If no captures of the URLs of interest in mid-2009 are shown in the new interface, then it is most likely we simply did not collect them. There are a small number of URLs from that period:

      http://waybackmachine.org/200906-200909*/http://www.djsportal.com/*

      Unfortunately I can’t in general tell why a certain site may not have been collected, or collected as thoroughly, in a prior period. There did not appear to be a robots.txt block in place, but other issues (either on the site or with the crawling operations prioritization and resources) may have prevented better collection. While we do still have some past material to catch up on indexing, I don’t think it comes from that era. So while it’s not impossible a few more results could appear, it’s not likely. I’m sorry we don’t have better coverage of your site.
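
The date-range query in this reply follows a visible pattern: a start and end month, an asterisk, then the URL prefix with a trailing asterisk. A minimal sketch of building such a URL follows. The pattern is inferred only from the single example above, so treat it as an assumption and not a documented interface. The function name and its parameters are hypothetical.

```python
def range_query_url(start: str, end: str, url_prefix: str,
                    base: str = "http://waybackmachine.org") -> str:
    """Build a date-range wildcard query URL shaped like the example above.

    start, end: six-digit YYYYMM strings (assumed format, taken from the example).
    url_prefix: the site prefix to match; a trailing '*' is appended.
    """
    for ts in (start, end):
        if not (len(ts) == 6 and ts.isdigit()):
            raise ValueError(f"expected a YYYYMM string, got {ts!r}")
    if start > end:
        raise ValueError("start month must not be after end month")
    return f"{base}/{start}-{end}*/{url_prefix}*"


# Rebuilds the query quoted in the reply above.
print(range_query_url("200906", "200909", "http://www.djsportal.com/"))
```

Whether the beta host accepts other granularities (days, full timestamps) is not stated in this thread, so the sketch validates only the month form shown.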

Comments are closed.

