
Persistent URLs Considered Useful



At 9:47 AM -0800 1/29/98, Jim Gillogly wrote:

>BTW, I went to look this up in the Cyphernomicon (I sorta think it's
>reffed in there), but the first 4 sites I saw on Altavista were all
>dead-end broken links.  The Web's ripping... what's the current
>Preferred URL?

I have no idea. Sometimes I find a site has it, sometimes not. (I'm not
interested in keeping my tens of megs of postings on a Web page. Sosumi.)

The issue of "URL decay" is a very serious one, affecting directly the use
of the Web for footnote citations, legal citations, references, etc.
Scientific or academic articles cannot reliably cite URLs, as they are
likely to decay or vanish or become corrupted over a matter of months, let
alone the years or decades that a technical or academic paper is expected
to last and be read. Call it an archivist's nightmare.

A couple of years ago I floated an idea to a handful of Bay Area
friends and Cypherpunks: an "Eternity.com" service which
would act as a kind of "vanity press" for authors, professors, researchers,
etc., who wanted to know that a document and URL would have a long
persistence, possibly an indefinite persistence.

(At the time I used the "Eternity" name I was not consciously aware of Ross
Anderson's work on his "Eternity" system, though I may have inadvertently
been inspired by his name, which I may have heard on the list or elsewhere.
My notion was a bit different from either his Eternity system or any of the
recent variants, but the name was based on the same idea of perpetual
storage. However, unlike my BlackNet idea, this particular "Eternity" was
not focussed on contraband, illegal, controversial, or black market
information being distributed and preserved. In fact, the expected
customers were mundane academics and corporate users...or vanity
users...anyone, basically, who wanted to know that a paper or document of
theirs could be reliably cited by others and that the citations would not
exhibit the "URL decay" so commonly seen today.)

Like I said, I sent this to a handful of Bay Area friends and Cypherpunks,
mainly to see if they had any comments or interest. I chose at that time
not to send my idea out to the Cypherpunks list, as I had some thoughts
about maybe trying to commercialize the idea....

However, I haven't, so I may as well send this idea along.

Here is the piece I sent out in late '95:

Return-Path: [email protected]
Received: from [205.199.118.202] (tcmay.got.net [205.199.118.202]) by
you.got.net (8.6.9/8.6.9) with SMTP id OAA05317 for <[email protected]>; Fri,
29 Dec 1995 14:30:25 -0800
Date: Fri, 29 Dec 1995 14:30:25 -0800
X-Sender: [email protected]
Message-Id: <[email protected][205.199.118.202]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: [email protected]
From: [email protected] (Timothy C. May)
Subject: Eternity.com announces "Eternal URLs"--version 1.02
X-UIDL: 820276233.000


***Press Release***

Santa Cruz, CA. Eternity, Inc. is pleased to announce the availability of
its permanent Uniform Resource Locator (URL) facility. For a fee, Eternity
will store a file for a year, for several years, or for "eternity."
Eternity will ensure that backups are escrowed in the event of
interruptions at Eternity or even the demise of Eternity as an entity.
Visit us at

http://www.eternity.com/~products/

***End***

Well, not really. But an example makes a point succinctly.

There's a growing need for "eternal URLs," URLs that have enough
persistence to last for a very long time, possibly for several decades.
(And with declining storage costs, going from "several years" to
"eternity" costs only 10% more.)

Here are some reasons:

* Today's URLs are transient, fluid. Accounts go away and the URL no longer
works, e.g., the large number of "Can't access" errors when Web surfing.

(Many of these errors are because a server or account has been deliberately
taken down, for whatever reasons. Often because of overloading, which
is a separable issue from the issue of permanence--overloading can be
handled with access fees, for example.)

* Web citations. Michael Froomkin, and others, have been raising the issue
of using the Web for legal citations. A real problem if the citations don't
persist for very long. (And I started seriously thinking about this Web
transience issue when I wrote a reply to him pointing out the utter
transience of most URLs, the "decay time" of valid URLs and how tough it
would be to rely on the Web for citations which are intended to be valid
for years or even decades.)

* Many links have a "single-point failure" built in. Let me give an example:

- my Cyphernomicon exists in an HTML version (nicely done by J. Rochkind)
at the URL "http://www.oberlin.edu/~brchkind/cyphernomicon/".

- lots of other Web pages have links to this URL. In fact, nearly every
page that mentions the Cyphernomicon points to this URL!

- so, what happens when Jonathan's account--at Oberlin I presume--goes
away? What happens when he gets tired of maintaining it, or graduates, or
whatever?

- answer: all the pages that point to it now come up only with the typical
errors.

- it's unlikely that all or even most of the pages with pointers to this
now-gone URL will update them. I call this "Web decay."

Thus, we see the increasing breakage of links. (I don't know about you
folks, but in my Web surfing I find more and more links broken every
day....)

A market fix might be the following:

- a site operator offers "archival" storage, perhaps/probably for a fee

- files could be placed in this archival storage for some fee for a given
amount of time. (Given the declining cost of storage, a user might be able
to economically buy "permanent" storage, sort of a "discounted future
value" approach. Thus, I might pay $20 to store the Cyphernomicon for a
year, $30 for 3 years, and $50 for "eternity.")

- he may also charge digital cash for access (a separable issue, but worth
mentioning)

- example URL: "http://www.eternity.com/~cypherpunks/cyphernomicon/"

- the site, "eternity.com" in this example, would ensure that the document
remains mounted and available for years, decades, etc. This could be done
by using backup machines, copies of optical disks at services which agree
(for a fee) to keep backups and mount them in the event of
disruption/bankruptcy/etc. of "eternity.com," and so on. (Obviously a fee
structure could include issues of file size, latency of access, policies
for public hits on the site, etc.)

- services like this (and I expect more to appear) may have cross-backup
provisions, or arrangements to take over the good name and good will, and
of course the files, of services which vanish or go bankrupt. (All sorts of
messy details, but familiar to lawyers handling escrow matters.)

- there are obvious similarities with "archival storage" services (the
"data vaults" and salt mine companies). In this case, the archival storage
also has access via the Web attached. (And yes, an obvious wrinkle is to
sell archival storage per se, with access a separate issue: access could be
via passwords/crypto, or via paid access, URLs, etc. All separable. But the
"market focus" on "eternal URLs" is a powerful focus, likely to quickly
generate a fair amount of business.)
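The "discounted future value" pricing in the list above can be sketched as a geometric series. This is only an illustration under stated assumptions: the first-year cost and the yearly decline in storage cost are made-up figures, the function name is hypothetical, and the numbers are not meant to reproduce the $20/$30/$50 example.

```python
def storage_price(first_year_cost, annual_decline, years=None):
    """Total cost of storing a file when the yearly cost shrinks geometrically.

    first_year_cost: cost of the first year of storage (assumed figure)
    annual_decline:  fraction by which the yearly cost falls each year (0 < d < 1)
    years:           number of years to store, or None for "eternity"
    """
    if not 0 < annual_decline < 1:
        raise ValueError("annual_decline must be strictly between 0 and 1")
    ratio = 1.0 - annual_decline  # each year costs this fraction of the last
    if years is None:
        # Sum of the infinite series c + c*r + c*r^2 + ...
        return first_year_cost / annual_decline
    # Sum of the first `years` terms of the same series
    return first_year_cost * (1.0 - ratio ** years) / annual_decline


if __name__ == "__main__":
    # Assumed figures: $10 for the first year, costs falling 40% per year.
    for label, years in [("1 year", 1), ("3 years", 3),
                         ("10 years", 10), ("eternity", None)]:
        print(f"{label:>8}: ${storage_price(10.0, 0.4, years):.2f}")
```

Because the series converges, the "eternity" price is finite, and after enough years it is only slightly above the fixed-term price.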

(Strategies for site-mirroring of URLs, where the same URL actually
involves multiple sites, are another approach. I'm not following that area
too closely, so I won't discuss it here. It may be a workable alternative.)

Another alternative is for services to arise which act as redirectors of
Web accesses. Without actually storing the Cyphernomicon file, for example,
they tell users where it actually is. (Not that much different from
Infoseek, DejaNews, Alta Vista, etc., except that those services merely
index existing pages, which may have huge numbers of corrupt links, while
the service I am proposing would take active steps to ensure valid copies
can be accessed...probably too much work to be economical, which is why I
prefer the "Eternity.com" model: the owner of a URL, or an interested party
who wishes to pay to store a copy, takes active steps to ensure a permanent
copy exists.)
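A redirector of the kind described above might look roughly like the following sketch. The class name, its methods, and the `is_alive` callback are all hypothetical; a real service would have to fetch each location to check it, which is exactly the "active steps" that make this approach expensive.

```python
class Redirector:
    """Maps a stable name to the known locations of a document (a sketch)."""

    def __init__(self):
        self._locations = {}  # stable name -> list of URLs, newest first

    def register(self, name, url):
        """Record a new current location; older ones are kept as fallbacks."""
        urls = self._locations.setdefault(name, [])
        if url in urls:
            urls.remove(url)
        urls.insert(0, url)

    def resolve(self, name, is_alive=lambda url: True):
        """Return the newest registered URL that the is_alive check accepts.

        is_alive is a caller-supplied function (an assumption here); by
        default every URL is treated as reachable.
        """
        for url in self._locations.get(name, []):
            if is_alive(url):
                return url
        return None


if __name__ == "__main__":
    r = Redirector()
    r.register("cyphernomicon",
               "http://www.oberlin.edu/~brchkind/cyphernomicon/")
    r.register("cyphernomicon",
               "http://www.eternity.com/~cypherpunks/cyphernomicon/")
    print(r.resolve("cyphernomicon"))
```

Pages would then link to the stable name rather than to any one account, so a single dead location no longer breaks every citation.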

There are all sorts of wrinkles to this idea, such as:

-- newer versions of a file, e.g., the "Cyphernomicon v.1.5," are either
stored separately, with pointers added to the first file, or are appended.
I favor the "pure archive" approach of always having the earlier versions
stored. ("Once stored, it is never forgotten.") For example:

"http://www.eternity.com/~cypherpunks/cyphernomicon/0.666" (the original)
"http://www.eternity.com/~cypherpunks/cyphernomicon/1.5"
"http://www.eternity.com/~cypherpunks/cyphernomicon/2.0"
...etc....

-- obviously files could be stored in various ways, depending on fees per
storage and fees per access. Older versions might be archived on DAT (or
its 2005 equivalent), more recent versions might be on DVDs, CD-ROMs, etc.,
and heavily-accessed files might be on magnetic disk. All a matter of
pricing, usage, market issues.

-- jukeboxes of CD-ROMs, DATs, and DVDs should make storage of "archived
Web sites" very cheap.

-- such a service could also be a protection against political pressure:
once a file is stored, perhaps in multiple national jurisdictions, or
perhaps even in unknown jurisdictions via Web mixes, the file could not be
removed.

-- digital timestamps and other hashes of the database could be published,
a la Haber and Stornetta's Surety service, to ensure that the database
had not been tampered with.

-- secure data havens, such as Swiss banks (maybe not so secure, but you
get the point) could store copies of the files, perhaps on very slow media.
Enough to ensure eternal storage. Underground vaults, salt mines, the usual
shtick.
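Two of the wrinkles above, the "pure archive" of versions and the published hash of the database, can be sketched together. This is a toy in-memory model under stated assumptions: the class and method names are hypothetical, and the digest is a plain SHA-256 over the stored files, not the actual scheme used by Haber and Stornetta or by Surety.

```python
import hashlib


class PureArchive:
    """Append-only store: a (document, version) pair can be written only once."""

    BASE_URL = "http://www.eternity.com/"  # example host taken from the text

    def __init__(self):
        self._files = {}  # (document path, version) -> file contents as bytes

    def store(self, document, version, content):
        """Archive one version and return its URL; overwriting is refused."""
        key = (document, version)
        if key in self._files:
            raise ValueError(f"{document} version {version} is already archived")
        self._files[key] = content
        return f"{self.BASE_URL}{document}/{version}"

    def fetch(self, document, version):
        """Return the stored bytes for one version (KeyError if absent)."""
        return self._files[(document, version)]

    def digest(self):
        """SHA-256 over every stored file, in sorted order, for publication."""
        h = hashlib.sha256()
        for (document, version), content in sorted(self._files.items()):
            h.update(hashlib.sha256(f"{document}/{version}".encode()).digest())
            h.update(hashlib.sha256(content).digest())
        return h.hexdigest()


if __name__ == "__main__":
    archive = PureArchive()
    print(archive.store("~cypherpunks/cyphernomicon", "0.666", b"original text"))
    print(archive.store("~cypherpunks/cyphernomicon", "1.5", b"revised text"))
    print("publish this digest:", archive.digest())
```

Anyone holding an earlier published digest and a copy of the files as they stood then could recompute it and detect a silent change or removal.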

I'll close for now.

Could "Eternity.com" be the new Westlaw? (For you nonlawyers, Westlaw
publishes books of court cases and rulings, and is the de facto place for
references...they make a tidy profit by licensing their system. Cyberspace
legal thinkers are looking at Web alternatives.)

Food for thought (and grounds for further research, as Dave Emory would say).


--Tim May

We got computers, we're tapping phone lines, we know that that ain't allowed.
---------:---------:---------:---------:---------:---------:---------:----
Timothy C. May              | Crypto Anarchy: encryption, digital money,
[email protected]  408-728-0152 | anonymous networks, digital pseudonyms, zero
W.A.S.T.E.: Corralitos, CA  | knowledge, reputations, information markets,
Higher Power: 2^756839 - 1  | black markets, collapse of governments.
"National borders aren't even speed bumps on the information superhighway."