WWW "remailers" (corrected copy)
This is a re-post of an earlier message where I accidentally wrote "nntp"
in place of "http". I have added some more material, too. Please
ignore the earlier message, and thanks to those who pointed out the
mistake.
We have had some discussions here about privacy of accesses on the World
Wide Web. Presently servers get a variable amount of information about
the people accessing their sites, depending on the particular software
being used and how it is configured. This is potentially harmful to the
privacy of WWW users, since their access information can be recorded
and later put to other uses.
Far from being a hypothetical concern, I believe many companies are
collecting this information and using it to build up possible future
email mailing lists, etc. I spoke recently with someone who is designing
enhanced server software for the web. Their system will keep all kinds
of statistics about who accesses which pages on the server, correlating
that with which people request information on the products being sold.
We have also seen how even too-cool Wired magazine is demanding user
names to allow access to their pages. (Remember: username cypherpunk,
password cypherpunk.)
Here are some things you can do to reduce this problem. First, to see
how bad the problem is for you, try connecting to:
http://www.uiuc.edu/cgi-bin/printenv
This just displays environment variables, which shows what information
about you is being received by servers. Look particularly at the lines
reading HTTP_FROM and REMOTE_HOST. These may contain your user name and
computer address.
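For anyone curious what sits behind a page like that, here is a minimal
sketch of a printenv-style CGI script. It is a hypothetical stand-in, not
the UIUC script; the function name show_identity and the "(not sent)"
placeholder are made up for illustration. The variable names are the
standard CGI ones the server sets for each request.

```shell
#!/bin/sh
# Hypothetical stand-in for a printenv-style CGI script (not the UIUC one).
# Prints the CGI variables that can identify the person making the request.
show_identity() {
    echo "Content-type: text/plain"
    echo ""
    # ${VAR:-(not sent)} prints a placeholder when the variable is unset or empty.
    echo "HTTP_FROM=${HTTP_FROM:-(not sent)}"
    echo "REMOTE_HOST=${REMOTE_HOST:-(not sent)}"
    echo "REMOTE_ADDR=${REMOTE_ADDR:-(not sent)}"
    echo "REMOTE_IDENT=${REMOTE_IDENT:-(not sent)}"
    echo "HTTP_USER_AGENT=${HTTP_USER_AGENT:-(not sent)}"
}
```

Installed as a CGI script, the file would end with a bare call to
show_identity. Anything that does not come back as "(not sent)" is
something the server could just as easily write to a log.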
You may be able to remove your user name information. Some clients,
including, I am told, Netscape and version 2 of Mosaic for Mac/Windows,
allow you to set your email address, which is handy, but then they send
it along to servers, which is harmful to your privacy. You might want to
consider not setting this field and using other programs for sending
mail. Also, if people complain about this, perhaps the makers of this
software will add an option to suppress sending the info.
Even if you don't see your name in HTTP_FROM it still may be possible
for somewhat more sophisticated programs to log your access if the
REMOTE_HOST information is correct and you are running on a Unix system
or something similar. This is done via the identd service if that is
running on your computer. The server can use this service to ask for
your user name once you are connected. One way to see if identd is
running on your computer is to telnet to your own computer on port 113
and see if anything is there (telnet <your-computer-name> 113). If so
then this is potentially another privacy exposure.
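To make the exposure concrete, here is a small sketch of what a server
could do with an identd answer. It assumes the reply format described in
RFC 1413 (port pair, reply type, operating system, user id, separated by
colons); the function name ident_user is made up for illustration, and the
sketch only parses a reply line, it does not contact any machine.

```shell
#!/bin/sh
# Extract the user name from one identd reply line.
# Reply format assumed from RFC 1413:
#   <server-port> , <client-port> : USERID : <opsys> : <user-id>
# Prints nothing for ERROR replies or anything that is not a USERID reply.
ident_user() {
    printf '%s\n' "$1" | tr -d '\r' | awk -F':' '
        {
            kind = $2
            gsub(/[ \t]/, "", kind)          # strip padding around the reply type
            if (kind == "USERID" && NF >= 4) {
                user = $4
                for (i = 5; i <= NF; i++)     # the user id may itself contain colons
                    user = user ":" $i
                sub(/^[ \t]+/, "", user)      # leading blanks trimmed for display
                print user
            }
        }'
}
```

For instance, ident_user '6193, 23 : USERID : UNIX : stjohns' (the sample
reply given in the RFC) prints stjohns, which is exactly the piece of
information you might prefer the remote site not to have.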
I have recently been experimenting with using "proxy servers" to remove
even the REMOTE_HOST information from the server's view. Proxy servers
are servers which basically receive WWW connections and pass them along.
Then when the data comes from the remote site they pass it back to the
originating user's site. Because the proxy server is in the middle the
remote site never sees the host name of the originating user. In this
respect they are somewhat similar to our cypherpunk remailers, hence the
title of this article.
(The purpose of proxy servers has nothing to do with this function; they
are designed to allow easy WWW access from users who are on firewalled
sites. But they happen to serve our purposes as well.)
Interestingly, the standard httpd (http daemon, the master server which
runs on a site which offers web pages) from CERN includes proxying
capability automatically! All you have to do is to add four lines to
the configuration file. (See the URLs below for more info.) If this
idea proves sound, perhaps some cypherpunks running httpd will enable
proxies and serve as "remailer operators of the web".
Normally proxy servers are configured to pass connections only from the
machines they are there to serve (at least, they can be configured that
way; I don't actually know how careful people are about this). But
luckily I have found that the CERN proxy server itself accepts
connections from anybody (at least, it accepts them from me!). So this
is useful for doing experiments.
And, the great part is, almost all web clients are set up now for proxy
support. The way you enable it varies from client to client. I believe
most of the Mac and Windows clients have a preferences box which allows
you to put in the address of your proxy server. On Unix, you can set
environment variables. Here is the suggestion from the web page at
CERN:
#!/bin/sh
http_proxy="http://www.cern.ch:911/"; export http_proxy
ftp_proxy="http://www.cern.ch:911/"; export ftp_proxy
gopher_proxy="http://www.cern.ch:911/"; export gopher_proxy
wais_proxy="http://www.cern.ch:911/"; export wais_proxy
exec Mosaic
This is a little shell script which runs Mosaic, first setting four
environment variables to "http://www.cern.ch:911/", which is the proxy
server I was referring to, the one which accepts connections from the
rest of the world.
For the purpose of the experiment, only http_proxy needs to be set.
Try setting that one and then run Lynx or Mosaic on your Unix
workstation, and connect to the printenv URL above. Compare the
information that is shown with what you got earlier without the
environment variable. Similarly, on other machines, try the printenv
test with and without proxy serving enabled using the CERN proxy.
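One way to run the comparison from a single shell is sketched below. It
sets http_proxy for one command only, so the same session can fetch the
page both ways. The helper name with_proxy and the variable PROXY_URL are
made up for illustration; the proxy address is the one from the CERN
script above and may need to be replaced.

```shell
#!/bin/sh
# Proxy address taken from the CERN script above; substitute another if needed.
PROXY_URL="http://www.cern.ch:911/"

# Run one command with http_proxy set for that command only, so the
# with-proxy and without-proxy runs can be compared from the same shell.
with_proxy() {
    env http_proxy="$PROXY_URL" "$@"
}
```

Usage would then look something like this, assuming lynx is installed:

    lynx http://www.uiuc.edu/cgi-bin/printenv
    with_proxy lynx http://www.uiuc.edu/cgi-bin/printenv

The first run shows what the server sees directly, the second what it
sees through the proxy.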
I find that the proxy server does in fact prevent the remote site from
seeing my computer's address, and without that address identd can't be
used to reveal my name.
This technique has many ramifications. For example, if a US proxy server
were available, ftp could be done via Mosaic to sites which only allowed
connections from American computers. People have been talking about
writing special IP redirectors for this, but here it turns out the
capability has been around all along.
Can anyone supply addresses of additional proxy servers to try? I had an
idea about how to find them. Many web servers log accesses. By
searching those access logs it might be possible to find proxy sites.
The server is also told whether a proxy is being used.
This shows up in the HTTP_USER_AGENT environment variable on the
printenv page. Servers could look for references to proxies in that data
and collect proxy addresses in that way. There is a nice irony in using
server logging to collect data that would allow users to defeat much
server logging.
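A rough sketch of the log search follows. Everything about the log layout
is an assumption: it supposes a hypothetical log with one request per
line, the remote host first, then a space, then the user agent string,
and it supposes that a proxy announces itself with the word "proxy"
somewhere in that string. Real server logs will differ and the pattern
would need adjusting. The function name find_proxy_hosts is made up.

```shell
#!/bin/sh
# Assumed (hypothetical) log format: remote host, one space, then the
# user agent string, one request per line. Real server logs will differ.
# Reads such a log on standard input and prints, once each, every host
# whose agent string mentions a proxy.
find_proxy_hosts() {
    awk '{
        agent = $0
        sub(/^[^ ]+ /, "", agent)        # drop the host field, keep the agent
        if (tolower(agent) ~ /proxy/)    # case-insensitive match on "proxy"
            print $1
    }' | sort -u
}
```

It would be run as find_proxy_hosts < logfile, where logfile stands for
whatever file the server writes in the assumed layout. Each host printed
is a candidate proxy to test with the printenv page.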
I got my information about proxies by reading:
http://info.cern.ch/hypertext/WWW/Proxies/
Specific information on configuring CERN httpd as a proxy server is in:
http://info.cern.ch/hypertext/WWW/Daemon/User/Proxies/Proxies.html
Modifications to the proxy server code would be necessary to provide some
additional features, such as support of encryption between user and proxy
server (via the SHTTP protocol extensions, perhaps; this way you could
get local privacy even when connecting to servers which did not support
encryption), or possibly chaining of proxies. I think this is a fertile
area for discussion and further work.
Hal