Filename: 267-tor-consensus-transparency.txt
Title: Tor Consensus Transparency
Author: Linus Nordberg
Created: 2014-06-28
Status: Open

0. Introduction

   This document describes how to provide and use public, append-only,
   verifiable logs containing Tor consensus and vote status documents,
   much like what Certificate Transparency [CT] does for TLS
   certificates, making it possible for log monitors to detect false
   consensuses and votes.

   Tor clients and relays can refuse using a consensus not present in
   a set of logs of their choosing, as well as provide possible
   evidence of misissuance by submitting such a consensus to any
   number of logs.

1. Overview

   Tor status documents, consensuses as well as votes, are stored in
   one or more public, append-only, externally verifiable log using a
   history tree like the one described in [CrosbyWallach].

   Consensus-users, i.e. Tor clients and relays, expect to receive one
   or more "proof of inclusions" with new consensus documents. A proof
   of inclusion is a hash sum representing the tree head of a log,
   signed by the logs private key, and an audit path listing the nodes
   in the tree needed to recreate the tree head. Consensus-users are
   configured to use one or more logs by listing a log address and a
   public key for each log. This is enough for verifying that a given
   consensus document is present in a given log.

   Submission of status documents to a log can be done by anyone with
   an internet connection (and the Tor network, in case of logs only
   on a .onion address). The submitter gets a signed tree head and a
   proof of inclusion in return. Directory authorities are expected to
   submit to one or more logs and include the proofs when serving
   consensus documents. Directory caches and consensus-users receiving
   a consensus not including a proof of inclusion may submit the
   document and use the proof they receive in return.

   Auditing log behaviour and monitoring the contents of logs is
   performed in cooperation between the Tor network and external
   services. Relays act as log auditors with help from Tor clients
   gossiping about what they see. Directory authorities are good
   candidates for monitoring log content since they know what votes
   they have sent and received as well as what consensus documents
   they have issued. Anybody can run both an auditor and a monitor
   though, which is an important property of the proposed system.

2. Motivation

   Popping a handful of boxes (currently five) or factoring the same
   number of RSA keys should not be ruled out as a possible attack
   against a subset of Tor users. An attacker controlling a majority
   of the directory authorities signing keys can, using
   man-in-the-middle or man-on-the-side attacks, serve consensus
   documents listing relays under their control. If mounted on a small
   subset of Tor users on the internet, the chance of detection is
   probably low. Implementation of this proposal increases the cost
   for such an attack by raising the chances of it being detected.

   Note that while the proposed solution gives each individual some
   degree of protection against using a false consensus this is not
   the primary goal but more of a nice side effect. The primary goal
   is to detect correctly signed consensus documents which differ from
   the consensus of the directory authoritites. This raises the risk
   of exposure of an attacker capable of producing a consensus and
   feed it to users.

   The complexity of the proposed solution is motivated by the fact
   that the log key is not just another key on top of the directory
   authority keys since the log doesn't have to be trusted. Another
   value is the decentralisation given -- anybody can run their own
   log and use it. Anybody can audit all existing logs and verify
   their correct behaviour. This empowers people outside the group of
   Tor directory authority operators and the people who trust them for
   one reason or the other.

3. Design

   Communication with logs is done over HTTP using TLS or Tor onion
   services for transport, similar to what is defined in
   [rfc6962-bis-12]. Parameters for POSTs and all responses are
   encoded as name/value pairs in JSON objects [RFC4627].

   Summary of proposed changes to Tor:

   - Configuration is added for listing known logs and for describing
     policy for using them.

   - Directory authorities start submitting newly created consensuses
     to at least one public log.

   - Tor clients and relays receiving a consensus not accompanied by a
     proof of inclusion start submitting that consensus to at least
     one public log.

   - Consensus-users start rejecting consensuses accompanied by an
     invalid proof of inclusion.

   - A new cell type LOG_STH is defined, for clients and relays to
     exchange information about seen tree heads and their validity.

   - Consensus-users send seen tree heads to relays acting as log
     auditors.

   - Relays acting as log auditors validate tree heads (section 3.2.2)
     received from consensus-users and send results back.

   - Consensus-users start rejecting consensuses for which valid
     proofs of inclusion can not be obtained.

   Definitions:

   - Log id: The SHA-256 hash of the log's public key, to be treated
     as an opaque byte string identifying the log.

3.1. Consensus submission

   Logs accept consensus submissions from anyone as long as the
   consensus is signed by a majority of the Tor directory authorities
   of the Tor network that it's logging.

   Consensus documents are POST:ed to a well-known URL as defined in
   section 5.2.

   The output is what we call a proof of inclusion.

3.2. Verification

3.2.1. Log entry membership verification

   Calculate a tree head from the hash of the received consensus and
   the audit path in the accompanying proof. Verify that the
   calculated tree head is identical to the tree head in the
   proof. This can easily be done by consensus-users for each received
   consensus.

   We now know that the consensus is part of a tree which the log
   claims to be The Tree. Whether this tree is the same tree that
   everybody else see is unknown at this point.

3.2.2. Log consistency verification

   Ask the log for a consistency proof between the tree head to verify
   and a previously known good tree head from the pool. Section 5.3
   specifies how to fetch a consistency proof.

   [[TBD require auditors to fetch and store the tree head for the
   empty tree as part of bootstrapping, in order to avoid the case
   where there's no older tree to verify against?]]

   [[TODO description of verification of consistency goes here]]

   Relays acting as auditors cache results to minimise calculations
   and communication with log servers.

   [[TBD have clients verify consistency as well? NOTE: we still want
   relays to see tree heads in order to catch a lying log (the
   split-view attack)]]

   We now know that the verified tree is a superset of a known good
   tree.

3.3. Log auditing

   A log auditor verifies two things:

   - A logs append-only property, i.e. that no entries once accepted
   by a log are ever altered or removed.

   - That a log presents the same view to all of its users [[TODO
   describe the Tor networks role in auditing more than what's found
   in section 3.2.2]]

   A log auditor typically doesn't care about the contents of the log
   entries, other than calculating their hash sums for auditing
   purposes.

   Tor relays should act as log auditors.

3.4. Log monitoring

   A log monitor downloads and investigates each entry in a log
   searching for anomalies according to its monitoring policy.

   This document doesn't define monitoring policies but does outline a
   few strategies for monitoring in section [[TBD]].

   Note that there can be more than one valid consensus documents for
   a given point in time. One reason for this is that the number of
   signatures can differ due to consensus voting timing
   details. [[TODO Are there more reasons?]]

   [[TODO expand on monitoring strategies -- even if this is not part
   of the proposed extensions to the Tor network it's good for
   understanding. a) dirauths can verify consensus documents byte for
   byte; b) anyone can look for diffs larger than D per time T, where
   "diffs" certainly can be smarter than a plain text diff]]

3.5. Consensus-user behaviour

   [[TODO move most of this to section 5]]

   Keep an on-disk cache of consensus documents. Mark them as being in
   one of three states:

   LOG_STATE_UNKNOWN -- don't know whether it's present in enough logs
                        or not
   LOG_STATE_LOGGED -- have seen good proof(s) of inclusion
   LOG_STATE_LOGGED_GOOD -- confident about the tree head representing
                            a good tree

   Newly arrived consensus documents start in UNKNOWN or LOGGED
   depending on whether they are accompanied by enough proofs or
   not. There are two possible state transitions:

   - UNKNOWN --> LOGGED: When enough correctly verifying proofs of
     inclusion (section 3.2.1) have been seen. The number of good
     proofs required is a policy setting in the configuration of the
     consensus-user.

   - LOGGED --> LOGGED_GOOD: When the tree head in enough of the
     inclusion proofs have been verified (section 3.2.2) or enough
     LOG_STH cells vouching for the same tree heads have been
     seen. The number of verifications required is a policy setting in
     the configuration of the consensus-user.

   Consensuses in state UNKNOWN are not used but are instead submitted
   to one or more logs. If the submission succeeds, this will take the
   consensus to state LOGGED.

   Consensuses in state LOGGED are used despite not being fully
   verified with regard to logging. LOG_STH cells containing
   tree heads from received proofs are being sent to relays for
   verification. Clients send to all relays that they have a circuit
   to, i.e. their guard relay(s). Relays send to three random relays
   that they have a circuit to.

3.6. Relay behaviour when acting as an auditor

   In order to verify the append-only property of a log, relays acting
   as log auditors verify the consistency of tree heads received in
   LOG_STH cells. An auditor keeps a copy of 2+N known good tree heads
   in a pool stored on persistent media [[TBD where N is either a
   fixed number in the range 32-128 or is a function of the log
   size]]. Two of them are the oldest and newest tree heads seen,
   respectively. The rest, N, are randomly chosen from the tree heads
   seen.

   [[TODO describe or refer to an algorithm for "randomly chosen",
   hopefully not subjective to flushing attacks (or other attacks)]].

3.7. Notable differences from Certificate Transparency

   - The data logged is "strictly time-stamped", i.e. ordered.

   - Much shorter lifetime of logged data -- a day rather than a
     year. Is the effects of this difference of importance only for
     "one-shot attacks"?

   - Directory authorities have consensus about what they're
     signing -- there are no "web sites knowing better".

   - Submitters are not in the same hurry as CA:s and can wait minutes
     rather than seconds for a proof of inclusion.

4. Security implications

  TODO

5. Specification

5.0. Data structures

   Data structures are defined as described in [RFC5246] section 4,
   i.e. TLS 1.2 presentation language. While it is tempting to try to
   avoid yet another format, the cost of redefining the data
   structures in [rfc6962-bis-12] outweighs this consideration. The
   burden of redefining, reimplementing and testing is extra true for
   those structures which need precise definitions because they are to
   be signed.

5.1. Signed Tree Head (STH)

   An STH is a TransItem structure of type "signed_tree_head" as
   defined in [rfc6962-bis-12] section 5.8.

5.2. Submitting a consensus document to a log

   POST https://<log server>/tct/v1/add-consensus

   Input:

     consensus: A consensus status document as defined in [dir-spec]
       section 3.4.1 [[TBD gziped and base64 encoded to save 50%?]]

   Output:

     sth: A signed tree head as defined in section 5.1 refering to a
     tree in which the submitted document is included.

     inclusion: An inclusion proof as specified for the "inclusion"
     output in [rfc6962-bis-12] section 6.5.

5.3. Getting a consistency proof from a log

   GET https://<log server>/tct/v1/get-sth-consistency

   Input and output as specified in [rfc6962-bis-12] section 6.4.

5.x. LOG_STH cells

   A LOG_STH cell is a variable-length cell with the following
   fields:

     TBDname [TBD octets]
     TBDname [TBD octets]
     TBDname [TBD octets]

6. Compatibility

   TBD

7. Implementation

   TBD

8. Performance and scalability notes

   TBD

A. Open issues / TODOs

   - TODO: Add SCTs from CT, at least as a practical "cookie" (i.e. no
     need to send them around or include them anywhere). Logs should
     be given more time for distributing than we're willing to wait on
     an HTTP response for.

   - TODO: explain why no hash function and signing algorithm agility,
     [[rfc6962-bis-12] section 10

   - TODO: add a blurb about the values of publishing logs as onion
     services

   - TODO: discuss compromise of log keys

B. Acknowledgements

   This proposal leans heavily on [rfc6962-bis-12]. Some definitions
   are copied verbatim from that document. Valuable feedback has been
   received from Ben Laurie, Karsten Loesing and Ximin Luo.

C. References

   [CrosbyWallach] http://static.usenix.org/event/sec09/tech/full_papers/crosby.pdf
   [dir-spec] https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt
   [RFC4627] https://tools.ietf.org/html/rfc4627
   [rfc6962-bis-12] https://datatracker.ietf.org/doc/draft-ietf-trans-rfc6962-bis/12
   [CT] https://https://www.certificate-transparency.org/