tahoe-lafs/docs/anonymity-configuration.rst

456 lines
18 KiB
ReStructuredText

.. -*- coding: utf-8-with-signature; fill-column: 77 -*-
======================================================
Using Tahoe-LAFS with an anonymizing network: Tor, I2P
======================================================
#. `Overview`_
#. `Use cases`_
#. `Unresolved tickets`_
#. `Software Dependencies`_
#. `Tor`_
#. `I2P`_
#. `Connection configuration`_
#. `Anonymity configuration`_
#. `Client anonymity`_
#. `Server anonymity, manual configuration`_
#. `Server anonymity, automatic configuration`_
#. `Performance and security issues`_
Overview
========
Tor is an anonymizing network used to help hide the identity of internet
clients and servers. Please see the Tor Project's website for more information:
https://www.torproject.org/
I2P is a decentralized anonymizing network that focuses on end-to-end anonymity
between clients and servers. Please see the I2P website for more information:
https://geti2p.net/
Use cases
=========
There are three potential use-cases for Tahoe-LAFS on the client side:
1. User wishes to always use an anonymizing network (Tor, I2P) to protect
their anonymity when connecting to Tahoe-LAFS storage grids (whether or
not the storage servers are anonymous).
2. User does not care to protect their anonymity but they wish to connect to
Tahoe-LAFS storage servers which are accessible only via Tor Hidden Services or I2P.
* Tor is only used if a server connection hint uses ``tor:``. These hints
generally have a ``.onion`` address.
* I2P is only used if a server connection hint uses ``i2p:``. These hints
generally have a ``.i2p`` address.
3. User does not care to protect their anonymity or to connect to anonymous
storage servers. This document is not useful to you... so stop reading.
For Tahoe-LAFS storage servers there are three use-cases:
1. The operator wishes to protect their anonymity by making their Tahoe
server accessible only over I2P, via Tor Hidden Services, or both.
2. The operator does not *require* anonymity for the storage server, but they
want it to be available over both publicly routed TCP/IP and through an
anonymizing network (I2P, Tor Hidden Services). One possible reason to do
this is because being reachable through an anonymizing network is a
convenient way to bypass NAT or firewall that prevents publicly routed
TCP/IP connections to your server (for clients capable of connecting to
such servers). Another is that making your storage server reachable
through an anonymizing network can provide better protection for your
clients who themselves use that anonymizing network to protect their
anonymity.
3. Storage server operator does not care to protect their own anonymity nor
to help the clients protect theirs. Stop reading this document and run
your Tahoe-LAFS storage server using publicly routed TCP/IP.
See this Tor Project page for more information about Tor Hidden Services:
https://www.torproject.org/docs/hidden-services.html.en
See this I2P Project page for more information about I2P:
https://geti2p.net/en/about/intro
Unresolved tickets
==================
Tahoe's anonymity support does not yet include automatic configuration of
servers. This issue is tracked by Tahoe tickets `#2490`_ and `#2773`_: until
those are resolved, anonymous servers (running as Tor Onion services or I2P
servers) must be configured manually, as described below.
See also Tahoe-LAFS Tor related tickets `#1010`_ and `#517`_.
.. _`#2490`: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2490
.. _`#2773`: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2773
.. _`#1010`: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1010
.. _`#517`: https://tahoe-lafs.org/trac/tahoe-lafs/ticket/517
Software Dependencies
=====================
Tor
---
Clients who wish to connect to Tor-based servers must install the following.
* Tor (tor) must be installed. See here:
https://www.torproject.org/docs/installguide.html.en . On Debian/Ubuntu,
use ``apt-get install tor``. You can also install and run the Tor Browser
Bundle.
* Tahoe-LAFS must be installed with the ``[tor]`` "extra" enabled. This will
install ``txtorcon`` ::
pip install tahoe-lafs[tor]
Manually-configured Tor-based servers must install Tor, but do not need
``txtorcon`` or the ``[tor]`` extra. Automatic configuration, when
implemented, will need these, just like clients.
I2P
---
Clients who wish to connect to I2P-based servers must install the following.
As with Tor, manually-configured I2P-based servers need the I2P daemon, but
no special Tahoe-side supporting libraries.
* I2P must be installed. See here:
https://geti2p.net/en/download
* The SAM API must be enabled.
* Start I2P.
* Visit http://127.0.0.1:7657/configclients in your browser.
* Under "Client Configuration", check the "Run at Startup?" box for "SAM
application bridge".
* Click "Save Client Configuration".
* Click the "Start" control for "SAM application bridge", or restart I2P.
* Tahoe-LAFS must be installed with the ``[i2p]`` extra enabled, to get
``txi2p`` ::
pip install tahoe-lafs[i2p]
Both Tor and I2P
----------------
Clients who wish to connect to both Tor- and I2P-based servers must install
all of the above. In particular, Tahoe-LAFS must be installed with both
extras enabled::
pip install tahoe-lafs[tor,i2p]
Connection configuration
========================
See :ref:`configuration` "Client Configuration" for a description of the
``[tor]`` and ``[i2p]`` sections of ``tahoe.cfg``. These control how the
Tahoe client will connect to a Tor/I2P daemon, and thus make connections to
Tor/I2P -based servers.
The ``[tor]`` and ``[i2p]`` sections only need to be modified to use unusual
configurations, or to enable automatic server setup.
The default configuration will attempt to contact a local Tor/I2P daemon
listening on the usual ports (9050/9150 for Tor, 7656 for I2P). As long as
there is a daemon running on the local host, and the necessary support
libraries were installed, clients will be able to use Tor-based servers
without any special configuration.
However note that this default configuration does not improve the client's
anonymity: normal TCP connections will still be made to any server that
offers a regular address (it fulfills the second client use case above, not
the third). To protect their anonymity, users must configure the
``[connections]`` section as follows::
[connections]
tcp = tor
With this in place, the client will use Tor (instead of an
IP-address -revealing direct connection) to reach TCP-based servers.
Anonymity configuration
=======================
Tahoe-LAFS provides a configuration "safety flag" for explicitly stating
whether or not IP-address privacy is required for a node::
[node]
reveal-IP-address = (boolean, optional)
When ``reveal-IP-address = False``, Tahoe-LAFS will refuse to start if any of
the configuration options in ``tahoe.cfg`` would reveal the node's network
location:
* ``[connections] tcp = tor`` is required: otherwise the client would make
direct connections to the Introducer, or any TCP-based servers it learns
from the Introducer, revealing its IP address to those servers and a
network eavesdropper. With this in place, Tahoe-LAFS will only make
outgoing connections through a supported anonymizing network.
* ``tub.location`` must either be disabled, or contain safe values. This
value is advertised to other nodes via the Introducer: it is how a server
advertises it's location so clients can connect to it. In private mode, it
is an error to include a ``tcp:`` hint in ``tub.location``. Private mode
rejects the default value of ``tub.location`` (when the key is missing
entirely), which is ``AUTO``, which uses ``ifconfig`` to guess the node's
external IP address, which would reveal it to the server and other clients.
This option is **critical** to preserving the client's anonymity (client
use-case 3 from `Use cases`_, above). It is also necessary to preserve a
server's anonymity (server use-case 3).
This flag can be set (to False) by providing the ``--hide-ip`` argument to
the ``create-node``, ``create-client``, or ``create-introducer`` commands.
Note that the default value of ``reveal-IP-address`` is True, because
unfortunately hiding the node's IP address requires additional software to be
installed (as described above), and reduces performance.
Client anonymity
----------------
To configure a client node for anonymity, ``tahoe.cfg`` **must** contain the
following configuration flags::
[node]
reveal-IP-address = False
tub.port = disabled
tub.location = disabled
Once the Tahoe-LAFS node has been restarted, it can be used anonymously (client
use-case 3).
Server anonymity, manual configuration
--------------------------------------
To configure a server node to listen on an anonymizing network, we must first
configure Tor to run an "Onion Service", and route inbound connections to the
local Tahoe port. Then we configure Tahoe to advertise the ``.onion`` address
to clients. We also configure Tahoe to not make direct TCP connections.
* Decide on a local listening port number, named PORT. This can be any unused
port from about 1024 up to 65535 (depending upon the host's kernel/network
config). We will tell Tahoe to listen on this port, and we'll tell Tor to
route inbound connections to it.
* Decide on an external port number, named VIRTPORT. This will be used in the
advertised location, and revealed to clients. It can be any number from 1
to 65535. It can be the same as PORT, if you like.
* Decide on a "hidden service directory", usually in ``/var/lib/tor/NAME``.
We'll be asking Tor to save the onion-service state here, and Tor will
write the ``.onion`` address here after it is generated.
Then, do the following:
* Create the Tahoe server node (with ``tahoe create-node``), but do **not**
launch it yet.
* Edit the Tor config file (typically in ``/etc/tor/torrc``). We need to add
a section to define the hidden service. If our PORT is 2000, VIRTPORT is
3000, and we're using ``/var/lib/tor/tahoe`` as the hidden service
directory, the section should look like::
HiddenServiceDir /var/lib/tor/tahoe
HiddenServicePort 3000 127.0.0.1:2000
* Restart Tor, with ``systemctl restart tor``. Wait a few seconds.
* Read the ``hostname`` file in the hidden service directory (e.g.
``/var/lib/tor/tahoe/hostname``). This will be a ``.onion`` address, like
``u33m4y7klhz3b.onion``. Call this ONION.
* Edit ``tahoe.cfg`` to set ``tub.port`` to use
``tcp:PORT:interface=127.0.0.1``, and ``tub.location`` to use
``tor:ONION.onion:VIRTPORT``. Using the examples above, this would be::
[node]
reveal-IP-address = false
tub.port = tcp:2000:interface=127.0.0.1
tub.location = tor:u33m4y7klhz3b.onion:3000
[connections]
tcp = tor
* Launch the Tahoe server with ``tahoe start $NODEDIR``
The ``tub.port`` section will cause the Tahoe server to listen on PORT, but
bind the listening socket to the loopback interface, which is not reachable
from the outside world (but *is* reachable by the local Tor daemon). Then the
``tcp = tor`` section causes Tahoe to use Tor when connecting to the
Introducer, hiding it's IP address. The node will then announce itself to all
clients using ``tub.location``, so clients will know that they must use Tor
to reach this server (and not revealing it's IP address through the
announcement). When clients connect to the onion address, their packets will
flow through the anonymizing network and eventually land on the local Tor
daemon, which will then make a connection to PORT on localhost, which is
where Tahoe is listening for connections.
Follow a similar process to build a Tahoe server that listens on I2P. The
same process can be used to listen on both Tor and I2P (``tub.location =
tor:ONION.onion:VIRTPORT,i2p:ADDR.i2p``). It can also listen on both Tor and
plain TCP (use-case 2), with ``tub.port = tcp:PORT``, ``tub.location =
tcp:HOST:PORT,tor:ONION.onion:VIRTPORT``, and ``anonymous = false`` (and omit
the ``tcp = tor`` setting, as the address is already being broadcast through
the location announcement).
Server anonymity, automatic configuration
-----------------------------------------
(note: this is not yet implemented, see Tahoe tickets `#2490`_ and `#2773`_
for progress)
To configure a server node to listen on an anonymizing network, create the
node with the ``--listen=tor`` option. This requires a Tor configuration that
either launches a new Tor daemon, or has access to the Tor control port (and
enough authority to create a new onion service).
This option will set ``reveal-IP-address = False`` and ``[connections] tcp =
tor``. It will allocate the necessary ports, instruct Tor to create the onion
service (saving the private key somewhere inside NODEDIR/private/), obtain
the ``.onion`` address, and populate ``tub.port`` and ``tub.location``
correctly.
Performance and security issues
===============================
If you are running a server which does not itself need to be
anonymous, should you make it reachable via an anonymizing network or
not? Or should you make it reachable *both* via an anonymizing network
and as a publicly traceable TCP/IP server?
There are several trade-offs effected by this decision.
NAT/Firewall penetration
------------------------
Making a server be reachable via Tor or I2P makes it reachable (by
Tor/I2P-capable clients) even if there are NATs or firewalls preventing
direct TCP/IP connections to the server.
Anonymity
---------
Making a Tahoe-LAFS server accessible *only* via Tor or I2P can be used to
guarantee that the Tahoe-LAFS clients use Tor or I2P to connect
(specifically, the server should only advertise Tor/I2P addresses in the
``tub.location`` config key). This prevents misconfigured clients from
accidentally de-anonymizing themselves by connecting to your server through
the traceable Internet.
Clearly, a server which is available as both a Tor/I2P service *and* a
regular TCP address is not itself anonymous: the .onion address and the real
IP address of the server are easily linkable.
Also, interaction, through Tor, with a Tor Hidden Service may be more
protected from network traffic analysis than interaction, through Tor,
with a publicly traceable TCP/IP server.
**XXX is there a document maintained by Tor developers which substantiates or refutes this belief?
If so we need to link to it. If not, then maybe we should explain more here why we think this?**
Linkability
-----------
As of 1.12.0, the node uses a single persistent Tub key for outbound
connections to the Introducer, and inbound connections to the Storage Server
(and Helper). For clients, a new Tub key is created for each storage server
we learn about, and these keys are *not* persisted (so they will change each
time the client reboots).
Clients traversing directories (from rootcap to subdirectory to filecap) are
likely to request the same storage-indices (SIs) in the same order each time.
A client connected to multiple servers will ask them all for the same SI at
about the same time. And two clients which are sharing files or directories
will visit the same SIs (at various times).
As a result, the following things are linkable, even with ``reveal-IP-address
= false``:
* Storage servers can link recognize multiple connections from the same
not-yet-rebooted client. (Note that the upcoming Accounting feature may
cause clients to present a persistent client-side public key when
connecting, which will be a much stronger linkage).
* Storage servers can probably deduce which client is accessing data, by
looking at the SIs being requested. Multiple servers can collude to
determine that the same client is talking to all of them, even though the
TubIDs are different for each connection.
* Storage servers can deduce when two different clients are sharing data.
* The Introducer could deliver different server information to each
subscribed client, to partition clients into distinct sets according to
which server connections they eventually make. For client+server nodes, it
can also correlate the server announcement with the deduced client
identity.
Performance
-----------
A client connecting to a publicly traceable Tahoe-LAFS server through Tor
incurs substantially higher latency and sometimes worse throughput than the
same client connecting to the same server over a normal traceable TCP/IP
connection. When the server is on a Tor Hidden Service, it incurs even more
latency, and possibly even worse throughput.
Connecting to Tahoe-LAFS servers which are I2P servers incurs higher latency
and worse throughput too.
Positive and negative effects on other Tor users
------------------------------------------------
Sending your Tahoe-LAFS traffic over Tor adds cover traffic for other
Tor users who are also transmitting bulk data. So that is good for
them -- increasing their anonymity.
However, it makes the performance of other Tor users' interactive
sessions -- e.g. ssh sessions -- much worse. This is because Tor
doesn't currently have any prioritization or quality-of-service
features, so someone else's ssh keystrokes may have to wait in line
while your bulk file contents get transmitted. The added delay might
make other people's interactive sessions unusable.
Both of these effects are doubled if you upload or download files to a
Tor Hidden Service, as compared to if you upload or download files
over Tor to a publicly traceable TCP/IP server.
Positive and negative effects on other I2P users
------------------------------------------------
Sending your Tahoe-LAFS traffic over I2P adds cover traffic for other I2P users
who are also transmitting data. So that is good for them -- increasing their
anonymity. It will not directly impair the performance of other I2P users'
interactive sessions, because the I2P network has several congestion control and
quality-of-service features, such as prioritizing smaller packets.
However, if many users are sending Tahoe-LAFS traffic over I2P, and do not have
their I2P routers configured to participate in much traffic, then the I2P
network as a whole will suffer degradation. Each Tahoe-LAFS router using I2P has
their own anonymizing tunnels that their data is sent through. On average, one
Tahoe-LAFS node requires 12 other I2P routers to participate in their tunnels.
It is therefore important that your I2P router is sharing bandwidth with other
routers, so that you can give back as you use I2P. This will never impair the
performance of your Tahoe-LAFS node, because your I2P router will always
prioritize your own traffic.