Filename: 345-specs-in-mdbook.md
Title: Migrating the tor specifications to mdbook
Author: Nick Mathewson
Created: 2023-10-03
Status: Closed

Introduction

I'm going to propose that we migrate our specifications to a set of markdown files, specifically using the mdbook tool.

This is not a proposal for a bulk rewrite of our specs; it is meant to be a low-cost step forward that will produce better output and make it easier to keep working on our specs.

That said, I think that this change will enable rewrites in the future. I'll explain more below.

What is mdbook?

Mdbook is a tool developed by members of the Rust community to create books with Markdown. Each chapter is a single markdown file; the files are organized into a book using a SUMMARY.md file.
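As a rough illustration of that layout, here is a small Python sketch that renders a SUMMARY.md-style outline from a nested chapter tree. The chapter titles and file names are invented for the example; only the general shape (a nested markdown list of links, one per chapter file) follows mdbook's format.

```python
# Hypothetical chapter tree: (title, source file, children).
# These titles and paths are illustrative, not the real torspec layout.
CHAPTERS = [
    ("Introduction", "intro.md", []),
    ("Tor Protocol", "tor-spec/index.md", [
        ("Preliminaries", "tor-spec/preliminaries.md", []),
        ("Cell formats", "tor-spec/cells.md", []),
    ]),
]


def render_summary(chapters, depth=0):
    """Return the list lines of a SUMMARY.md for a nested chapter tree."""
    lines = []
    for title, path, children in chapters:
        indent = "    " * depth  # nesting is expressed by indentation
        lines.append(f"{indent}- [{title}](./{path})")
        lines.extend(render_summary(children, depth + 1))
    return lines


def summary_text(chapters):
    """Return the full text of a SUMMARY.md file."""
    return "# Summary\n\n" + "\n".join(render_summary(chapters)) + "\n"


print(summary_text(CHAPTERS))
```

Each entry points at one markdown file, so reorganizing the book is mostly a matter of editing this outline.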

Have a look at the mdbook documentation; this is what the output looks like.

Have a look at this source tree: that's the input that produces the output above.

Mdbook is extensible: it can use numerous plugins to enhance the semantics of the markdown input, add diagrams, output in more formats, and so on.

What would using mdbook get us immediately?

There are a bunch of changes that we could get immediately via even the simplest migration to mdbook. These immediate benefits aren't colossal, but they are things we've wanted for quite a while.

  • We'll have a document that's easier to navigate (via the sidebars).

  • We'll finally have good HTML output.

  • We'll have all our specifications organized into a single "document", able to link to one another and cross reference one another.

  • We'll have perma-links to sections.

  • We'll have a built-in text search function. (Go to the mdbook documentation and hit "s" to try it out.)

How will mdbook help us later on as we reorganize?

Many of the benefits of mdbook will come later down the line as we improve our documentation.

  • Reorganizing will become much easier.

    • Our links will no longer be based on section number, so we won't have to worry about renumbering when we add new sections.
    • We'll be able to create redirects from old section filenames to new ones if we need to rename a file completely.
    • It will be far easier to break up our files into smaller files when we find that we need to reorganize material.
  • We will be able to make our documents even easier to navigate.

    • As we improve our documentation, we'll be able to use links to cross-reference our sections.
  • We'll be able to include real diagrams and tables.

  • We'll be able to integrate proposals more easily.

    • New proposals can become new chapters in our specification simply by copying them into a new 'md' file or files; we won't have to decide between integrating them into existing files or creating a new spec.

    • Implemented but unmerged proposals can become additional chapters in an appendix to the spec. We can refer to them with permalinks that will still work when they move to another place in the specs.

How should we do this?

Strategy

My priorities here are:

  • no loss of information,
  • decent-looking output,
  • a quick automated conversion process that won't cost us a bunch of time,
  • a process that we can run experimentally until we are satisfied with the results.

With that in mind, I'm writing a simple set of torspec-converter scripts to convert our old torspec.git repository into its new format. We can tweak the scripts until we like the output that they produce.

After running a recent torspec-converter on a fairly recent torspec.git, here is how the branch looks:

https://gitlab.torproject.org/nickm/torspec/-/tree/spec_conversion?ref_type=heads

And here's the example output when running mdbook on that branch:

https://people.torproject.org/~nickm/volatile/mdbook-specs/index.html

Note: this is not a permanent URL; we won't keep the example output forever. When we actually merge the changes, they will move into whatever final location we provide.

The conversion script isn't perfect. It only recognizes three kinds of content: headings, text, and "other". Content classified as "other" is fenced with ``` to render it verbatim.
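To make the three-way split concrete, here is a minimal Python sketch of that kind of classifier. It is not the actual converter logic: the heading pattern (a section number such as "1." or "2.3." followed by a title) and the indentation threshold for prose are assumptions chosen for the example.

```python
import re

# Assumed heading shape: a section number like "1." or "2.3.", then a title.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.\s+(\S.*)$")
FENCE = "`" * 3  # triple-backtick fence used for verbatim blocks


def classify(block):
    """Classify one blank-line-separated block as heading, text, or other."""
    lines = block.splitlines()
    if len(lines) == 1 and HEADING_RE.match(lines[0]):
        return "heading"
    # Assumption: ordinary prose is indented by at most three spaces.
    if all(len(line) - len(line.lstrip(" ")) <= 3 for line in lines):
        return "text"
    return "other"


def convert(source):
    """Convert a plain-text spec fragment to markdown, block by block."""
    out = []
    for block in re.split(r"\n\s*\n", source.strip("\n")):
        kind = classify(block)
        if kind == "heading":
            number, title = HEADING_RE.match(block).groups()
            level = number.count(".") + 1  # "2.3" becomes a level-2 heading
            out.append("#" * level + " " + title)
        elif kind == "text":
            out.append(" ".join(line.strip() for line in block.splitlines()))
        else:
            # Anything unrecognized is kept verbatim so nothing is lost.
            out.append(FENCE + "\n" + block + "\n" + FENCE)
    return "\n\n".join(out) + "\n"


SAMPLE = "1. Overview\n\nTor is a network.\nIt has relays.\n\n     A <-> B\n     B <-> C\n\n1.1. Details\n"
print(convert(SAMPLE))
```

Falling back to a verbatim block for anything unrecognized trades prettier output for the "no loss of information" priority.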

The choice of which sections to split up and which to keep as a single page is up to us; I made some initial decisions in the file above, but we can change it around as we please. See the configuration section at the end of the grinder.py script for details on how it's set up.

Additional work that will be needed

Assuming that we make this change, we'll want to build an automated CI process to build it as a website, and update the website whenever there is a commit to the specifications.

(This automated CI process might be as simple as git clone && mdbook build && rsync -avz book/ $TARGET.)
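The same three steps could be written as a small Python driver, sketched below. The repository URL and rsync target are placeholders, and the sketch assumes that book.toml sits at the top of the checkout and that mdbook writes to its default book/ output directory.

```python
import shlex
import subprocess


def build_commands(repo_url, target, workdir="torspec"):
    """Return the argv lists for the clone, build, and upload steps, in order."""
    return [
        ["git", "clone", repo_url, workdir],
        ["mdbook", "build", workdir],  # assumes book.toml is in workdir
        ["rsync", "-avz", f"{workdir}/book/", target],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure (like &&)."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Placeholder URL and target; print the plan rather than running it here.
PLAN = build_commands("https://example.org/torspec.git", "user@example.org:/srv/spec/")
for argv in PLAN:
    print(shlex.join(argv))
```

Calling run_all(PLAN) would execute the steps; check=True raises on the first non-zero exit status, so a failed build never reaches the upload step.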

We'll want to go through our other documentation and update links, especially the permalinks in spec.torproject.org.

It might be a good idea to use spec.torproject.org as the new location of this book, assuming weasel (who maintains spec.tpo) also thinks it's reasonable. If we do that, we need to decide on what we want the landing page to look like, and we need very much to get our permalink story correct. Right now I'm generating a .htaccess file as part of the conversion.

Stuff we shouldn't do.

I think we should continue to use the existing torspec.git repository for the new material, and just move the old text specs into a new archival location in torspec. (We could make a new repository entirely, but I don't think that's the best idea. In either case, we shouldn't change the text specifications after the initial conversion.)

We'll want to figure out our practices for keeping links working as we reorganize these documents. Mdbook has decent redirect support, but it's up to us to actually create the redirects as necessary.
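One way to keep that bookkeeping manageable would be to record renames in a table and generate the redirect entries from it. The Python sketch below emits an [output.html.redirect] section for book.toml, which is where mdbook's HTML renderer reads its redirects. The renamed files are hypothetical, and the sketch assumes the book is served from the root of the site, so that absolute paths work as redirect targets.

```python
# Hypothetical renames: old chapter source -> new chapter source.
RENAMES = {
    "tor-spec/cells.md": "tor-spec/cell-packet-format.md",
    "dir-spec.md": "dir-spec/index.md",
}


def to_html(md_path):
    """Map a chapter source path to the path of its rendered HTML page."""
    if not md_path.endswith(".md"):
        raise ValueError(f"not a markdown chapter: {md_path}")
    return "/" + md_path[: -len(".md")] + ".html"


def redirect_table(renames):
    """Render a book.toml redirect section, one entry per renamed chapter."""
    lines = ["[output.html.redirect]"]
    for old, new in sorted(renames.items()):
        # Assumption: the book lives at the site root, so "/..." targets work.
        lines.append(f'"{to_html(old)}" = "{to_html(new)}"')
    return "\n".join(lines) + "\n"


print(redirect_table(RENAMES))
```

Keeping the rename table under version control alongside the specs would let the redirects be regenerated whenever a file moves.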

The transition, in detail

  • Before the transition:

    • Work on the script until it produces output we like.
    • Finalize this proposal and determine where we are hosting everything.
    • Develop the CI process as needed to keep the site up to date.
    • Get approval and comment from necessary stakeholders.
    • Write documentation as needed to support the new way of doing things.
    • Decide on the new layout we want for torspec.git.
  • Staging the transition:

    • Make a branch to try out the transition; explicitly allow force-pushing that branch. (Possibly nickm/torspec.git in a branch called mdbook-demo, or torspec.git in a branch called mdbook-demo assuming it is not protected.)
    • Make a temporary URL to target with the transition (possibly spec-demo.tpo)
    • Once we want to do the transition, shift the scripts to tpo/torspec.git:main and spec.tpo, possibly?
  • The transition:

    • Move existing specs to a new subdirectory in torspec.git.
    • Run the script to produce an mdbook instance in torspec.git with the right layout.
    • Install the CI process to keep the site up to date.
  • Post-transition

    • Update links elsewhere.
    • Continue to improve the specs.

Integrating proposals

We could make all of our proposals into a separate book, like Rust does at https://rust-lang.github.io/rfcs/ . We could also leave them as they are for now.

(I don't currently think we should make all proposals part of the spec automatically.)

Timing

I think the right time to do this, if we decide to move ahead, is before November. That way we have this issue as something people can work on during the docs hackathon.

Alternatives

I've tried experimenting with Docusaurus here, which is even more full-featured and generates pretty react sites like this. (We're likely to use it for managing the Arti documentation and website.)

For the purposes we have here, it seems slightly overkill, but I do think a migration is feasible down the road if we decide we do want to move to Docusaurus. The important thing is the ability to keep our URLs working, and I'm confident we could do that.

The main differences for our purposes here seem to be:

  • The markdown implementation in Docusaurus is extremely picky about stuff that looks like HTML but isn't; it rejects it, rather than passing it on as text. Thus, using it would require a more painstaking conversion process before we could include text like "<state:on>" or "A <-> B" as our specs do in a few places.

  • Instead of organizing our documents in a SUMMARY.md with an MD outline format, we'd have to organize them in a sidebars.js with a JavaScript syntax.

  • Docusaurus seems to be far more flexible and have a lot more features, but also seems trickier to configure.
