[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Tools for Rendering Censorship Firewalls Ineffective



I've been trying to categorize the web censorship techniques available
to governments, such as Singapore and China's Firewall Curtains,
Germany's ISP Threats, etc.  The objective is to make information widely
and conveniently available to subjects of the censoring country
by building tools that will provide multiple paths to data that
are easy to find and hard to block because they're too pervasive.

I'm assuming that email is difficult to block, in-country public websites 
are blockable (through business licenses, lawsuits, rubber-hoses, or 
confiscation), in-country private websites are difficult to block but not 
very relevant, authors can afford and post to foreign web sites without 
effective blocking, and that the real "threat" to the government is foreign 
websites making banned information conveniently available to its subjects. 
I'm also assuming that the government can use humans to discover a moderate
quantity of banned info, but that blocking will mostly be done by bots rather
than human readers.  In particular, I'm ignoring the approach of
in-country hidden websites, since it's too vulnerable in some countries,
and focussing on the web rather than email because it's mass-market and
easier to use, and email is harder to stop, especially given crypto and
remailers.

Notation: "Attack" is the Censors trying to stop data; "defend" and "evade"
are the Good Guys trying to not get censored.

The obvious techniques I can see include
1) Filter on IP address (e.g. German attack on XS4ALL)
2) Filter on DNS Name
3) Filter on Patterns in URL
4) Filter on Patterns in PUT/GET Requests
5) Filter on Patterns in Response.
6) Traffic Analysis on reading patterns

Defenses -
The most important defenses to these attacks depend on ubiquity and volume -
they can block one or two of anything, they might be able to block a thousand,
and it's not possible to block millions of things everywhere.
So building bridges between systems and increasing multiplicity is a win.

1) It's easy to evade the crude version of this attack - use rolling IP
addresses, and use DNS to publish the new ones.  For the German model,
where the government has to tell the ISPs who to block, this wins.
They can counter by blocking your whole IP network, not just a single 
machine, which you can counter by hopping IP networks as well as hosts,
though that's more trouble (and blocking routes is probably easier
than blocking hosts, since you do it at the routers, and harder to
get people to turn back on.)  They can also enforce boycotts on the ISP,
as they did with XS4ALL - blocking most of the traffic to a site
can affect its other traffic enough to be economically annoying.

The attackers can counter by also tracking the address with a bot -
if you change addresses hourly, they can change blocking addresses hourly,
which may workable for somewhere high-tech like Singapore or very focussed
like China, but isn't very effective for somewhere porous like Germany
that has a system of laws that move at the speed of bureaucracy.

A very effective defense against this method is to deploy relay servers,
either anonymizers or simple non-anonymizing cgi scripts that take URLs like
     http://foo.bar.com/cgi-bin/relay.pl/http://banned.site.org/
and fetch and return the real URL (perhaps modifying any URLs in it
to connect through the relay.)  This works if there are lots of easy-to-find
relay servers.  An obvious approach would be to package the relay program with
Apache or other popular web server, so anybody who didn't bother turning it off
would have a relay named "relay.pl"; the attackers can't realistically block
everybody who's got one.

Another effective defense is to use web servers that gateway to AFS 
(Andrew File System) or other distributed file systems.  This lets 
        /stanford.edu/censored-mirrors/banned.html
be accessible from any site supporting AFS, such as
        http://www.cmu.edu/afs/stanford.edu/censored-mirrors/banned.html
This has the great advantage that AFS sites are usually at
major universities, which are important enough that lots of people
would complain if you blocked them.
This works better if there's an easy way to insert things into
the AFS tree - volunteers are fine, but if there are servers that
can automatically import material it becomes easier, either by
copying or by various kinds of indirection.  (On the other hand,
if you allow automated import, attackers can turn your site
into the Child Pornography NarcoTerrorist Bomb Info Mart and ban you...)

Do the major web crawlers index AFS?  Or do the sites use robots.txt
to prevent multiple crawls, e.g. by only allowing searching on the
local file systems and not on the remote ones?

2) Filtering on DNS names is an attack that proxy servers can use -
the HTTP spec [RFC1945] says that requests to proxies need to send
an Absolute URI (method://machine[:port][abs_path]) rather than
just the absolute path (/etc.), so the proxy servers can filter on DNS names,
defeating the rolling-IP-address defense.  Doesn't stop relays.

Non-proxy attacks don't have access to this method.  However, governments 
that use full-scale firewalls and not just http proxies can also restrict 
what DNS queries the National DNS Servers will pass.

How to defend against it for non-proxy attackers?  One way is to make it
easy to find the IP address of a server with banned material -
web indexers can find everything, so putting the IP form of the URL in 
a file that the popular indexers, along with useful keywords, 
is one way to make sure you get found.  Another would be to deploy
DNS servers widely (done:-) and maybe form-based interfaces
for users that will run Dig or whatever.  

3) Filtering on patterns in URLs - as with DNS filters, a proxy server
run by an attacker can block access based on patterns in the URL,
such as relay.pl.  This makes it tougher to use relays as a defense,
because they either need to have different names on different machines
(harder for users to find and for administrators to implement without
having to pay attention), or else to not need a name (either modify
the protocols or at least the servers so that
        http://foo.bar.com/http://banned.site.org/stuff.html
gets handled properly.)  The latter is doable, but probably
requires more administrator support?  

Pattern matching on URLs can make it easier to attack AFS -
you don't need to kill the whole AFS tree, just the banned parts.

Pattern matching on URLs already starts to have heavy volume issues -
can a proxy server take the extra time to search an ever-larger
banned list on every web hit?  Having used overloaded proxy servers
at work (:-), I'd expect the population to start acting like
disgruntled postal workers if the mandatory national firewall
is underpowered.  On the other hand, I suppose a government could
partially solve scale problems by requiring a license fee for
use of the proxy server; a few bucks per user could pay for
increasing numbers of servers as well as tracking who's reading what.

4) Filter on Patterns in PUT/GET Requests
As a defense against blocking URLs by patterns, defenders can
send requests as message-body in PUT/GET requests; this is also
useful for submitting banned material to cooperative sites.
Attackers can filter on this material, though they can't easily filter out
SSL or S-HTTP requests by content, depending on how much the protocols
pass material end-to-end rather than link-by-link where the proxy
can see it.

5) Filter on Patterns in Responses.
Similarly, attackers could just grep for banned material in HTTP responses,
though SSL/S-HTTP both interfere with this.  Defenders can also structure
banned writing in ways that don't trigger patterns easily (e.g. don't
refer to Lee Kwan Yew, just refer to That Bum or Mr. Big, etc.)
Filtering on picture content is obviously difficult, so including
text in graphics can help prevent attacks.  In general, I'd expect
filtering attacks to be extremely susceptible to volume.

On the other hand, attackers don't have to filter in real-time;
they could scan material as a background activity, and go arrest
the people who have received contraband after the fact,
or just block their National Firewall Passports if they're reading
too much.  In general, this attack is probably more useful for
overall study of what their subjects are reading than for real-time.

6) Traffic Analysis - who's reading what?  Where?  Who's reading a
lot of contraband?  What's popular foreign material?  Which of
the attackers' subjects are possible fellow travellers, based on
what they're reading?  This kind of material is useful for marketing
as well as for identifying malcontents - businesses aren't always
pro-privacy either, though they want their own secrets kept secret.
Many of the attacks above can work as after-the-fact analysis
more effectively than they can as real-time blocking, and volume
is less of a problem for crunching a sample of firewall logs
than for active blocking; this may be the hardest attack to counter,
though it's repression rather than censorship.  If you can't police
everybody, you can at least encourage the policeman in everybody's head.
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

What other kinds of attacks are there?  What other defenses?
What kinds of holes are there in these defenses?




#			Thanks;  Bill
# Bill Stewart, +1-415-442-2215 [email protected]
# <A HREF="http://idiom.com/~wcs"> 	
# You can get PGP software outside the US at ftp.ox.ac.uk/pub/crypto