Overview and tools needed

At the core of PDF generation process is Prince, which converts HTML/CSS (and even some JS) to print quality PDF with bookmarks, links, page numbers etc. Unfortunately there are no open-source alternatives that operate at the same level as Prince (yet) - however, Prince offers a non-commercial license that is perfectly suited for our use-case. As a requisite a water-mark is displayed on the first page of the PDF and we must place prominent links to the Prince website at the places we intend to serve the PDF.

Prince can be downloaded here. The minimum version for MathJax 3 support is 20210624 which you can find here.

On a Windows environment it makes sense to add the location of the Prince executable, e.g. C:\Program Files (x86)\Prince\engine\bin, to the PATH.

Build process

The bash script ‘pdf-docs.sh’ in root can be used to build the PDF. It goes through the following steps

Build a web target with jekll

First, let’s kill all running instances of jekyll:

kill -9 $(ps aux | grep '[j]ekyll' | awk '{print $2}')

Then build the site locally with jekyll:

bundle exec jekyll serve --detach --config _config.yml,pdfconfigs/config_docs_pdf.yml

The --detach option is not strictly necessary, it detaches jekyll from the terminal. For debugging reasons it might actually be preferable to use two separate shells in parallel.

Note that in a Windows environment the --detach option will likely fail and that the --config option, which specifies a list of config.ymls to be used, needs to be enclosed in ":

bundle exec jekyll serve --config "_config.yml,pdfconfigs/config_docs_pdf.yml"

At this point the site is up and can be accessed at http://localhost:4000/.

In the directory pdfconfigs there is a file called prince-list.txt that looks like this:

{% for entry in sidebar %}
    {% for folder in entry.folders %}
        {% if folder.output contains "pdf" %}
            {% for folderitem in folder.folderitems %}
                {% if folderitem.output contains "pdf" %}

                {{folderitem.url}}

                {% endif %}
            {% endfor %}
        {% endif %}
      {% endfor %}
 {% endfor %}

I.e. we loop through every item in the appropriate sidebar.yml, check whether output is pdf, and if so print the item’s URI. As a result, after running jekyll, prince-list.txt contains a list of URIs for all items in sidebar.yml:

http://localhost:4000/titlepage.html
http://localhost:4000/tocpage.html
http://localhost:4000/docs.html
[...]

Convert to PDF

This list is consumed by Prince and converted into PDF:

prince --javascript --input-list=_site/pdfconfigs/prince-list.txt -o pdf/docs.pdf

The final PDF can be found in pdf/docs.pdf as specified. The --javascript option enables JavaScript support.

Styling

The web target built by

bundle exec jekyll serve --config "_config.yml,pdfconfigs/config_docs_pdf.yml"

is different from a usual build only in the way it specifies a second config.yml namely pdfconfigs/config_docs_pdf.yml. From a styling perspective the latter includes a different default layout of type page_print:

defaults:
  -
    scope:
      path: ""
      type: "pages"
    values:
      layout: "page_print"
      comments: true
      search: true

This layout type includes a separate <head></head> in the form of _includes/head_print.html. Apart from some minor changes, e.g. by referencing resources such as stylesheets, scripts and fonts locally, it also points to css/printstyles.css. This is where styles specific to the PDF are stored.

For further reference consult the documentation of documentation-theme-jekyll.

Contents, title page and table of contents

The PDF will contain every page of type pdf that is referenced in the sidebar pdf_sidebar in pdfconfigs/config_docs_pdf.yml. E.g. if

pdf_sidebar: docs_sidebar

and docs_sidebar.yml is

  - title: "A selection of fruits"
    output: web, pdf
    folderitems:

    - title: Apples
      url: /apples.html
      output: web, pdf

    - title: Oranges
      url: /oranges.html
      output: web

the PDF will contain apples.html but not oranges.html.

Furthermore two more pages have to be included in docs_sidebar.yml:

  - title:
    output: pdf
    type: frontmatter
    folderitems:

    - title:
      url: /titlepage.html
      output: pdf
      type: frontmatter

    - title:
      url: /tocpage.html
      output: pdf
      type: frontmatter

These two pages are located in pdfconfigs/ and govern the layout of the title page as well as the table of contents.

For further information see here.

Troubleshooting and common issues

Warnings by prince

Prince XML is different from a browser in the way it handles HTML/CSS more rigorously and will warn about every CSS property that are not 100% W3C compliant. Bootstrap, say, on the other hand, does make use of CSS hacks deliberately.

Prince also consumes one page at a time and doesn’t cache common resources (stylesheets, scripts, etc) like a browser.

Due to these two points, taken together, Prince can output a long number of warnings in the conversion process.

Make resources available locally

Because Prince consumes HTML pages one at a time, it is convenient to make resources (stylesheets, scripts, fonts etc.) available locally and not have them fetched from a CDN. For this purpose a separate _includes/head_print.html exists. Note that also secondary resources (resources referenced in resources) need to be held locally, e.g. bootstrap.min.css references bootstrap.min.css.map.

Missing glyphs or fonts

If Prince complains about missing glyphs or fonts make sure that the specified fonts are either available as resources (as a *.ttf, *.woff etc) or installed on the local machine. In our case, at the time of writing, this includes

  • Fira Sans Light, Regular, Medium, Bold, Italic (in ./fonts)
  • Fira Mono Regular (in ./fonts)
  • Font Awesome 5 (in ./webfonts)
  • Glyphicons halflings (in ./fonts)

The location of these fonts has to be relative to where they are referenced, e.g. css/fontawesome5.14.0.min.css mentions url(../webfonts/fa-brands-400.woff2).

For further information see the Prince documentation on missing fonts.

Troubleshooting

A useful hack is to modify prince-list.txt to only contain reference to a single html page, that can then be easily troubleshooted.

For further information see the Prince documentation on troubleshooting.

Overriding bootstrap print styles

Bootstrap by default sets color values to black for all elements for @media print. To override this behaviour modify the (local) minified bootstrap.min.css from

@media print{*,:after,:before{color:#000!important;text-shadow:none!important;background:0 0!important;-webkit-box-shadow:none!important;box-shadow:none!important}

to

@media print{*,:after,:before{/*color:#000!important;*/text-shadow:none!important;/*background:0 0!important*/;-webkit-box-shadow:none!important;box-shadow:none!important}

This is considered a hack and needs to be repeated with every update of Bootstrap. For more information see documentation-theme-jekyll docs.

KaTeX and MathJax

While we usually use KaTeX for the rendering of LaTeX syntax to SVG, this type of rendering does not seem to be supported by Prince at present. However, since Prince 20210624 MathJax 3 is supported and for that reason MathJax is used instead of KaTeX for the PDF generation.

A side effect of this change seems to be a noticeable slow-down in the PDF generation process.