.. _Preprocessor:

The Document Processor
======================

The OpenDSA textbook compilation pipeline includes custom
preprocessing of module files into compilable reStructuredText source.
The main motivation for using our own document pre-processor was to
support integration beyond the file level in ways that Sphinx does (or
at least, did) not do.
This includes the ability to number document objects (figures,
tables, and equations) and to display numbered references.
When we started the OpenDSA project, DocUtils did not provide such
features.
Some of the pre-processor features might be added over time to Sphinx,
in which case we might eventually remove them from the pre-processor.
You can view the DocUtils To Do list at
`<http://docutils.sourceforge.net/docs/dev/todo.html>`_.

Overview
--------

The document processor works as a three-pass compiler.
The first two passes are executed on ``rst`` files before running
Sphinx, and the last pass is run against the ``html`` files produced
by Sphinx.
The process produces three files:
two containing document and object numbers, and one used to check
whether a document has been modified.
All global variables are declared in a separate file (``config.py``).

First Pass
----------

INPUT
    Modules as ``rst`` source files.

OUTPUT
    A JSON file (``page_chapter.json``) containing a dictionary of modules
    and their associated chapters.

DESCRIPTION
    During the first pass, the document processor creates a dictionary
    of the highest-level elements in the document (modules).
    The dictionary contains tuples defined as
    ``(module_name, [chapter_name, chapter_number])``.
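
The mapping built by the first pass can be sketched roughly as follows.
This is a minimal illustration, not the code in ``preprocessor.py``:
the function name ``build_page_chapter``, the shape of its input (an
ordered list of chapters with their modules), the sample module names,
and the choice to number chapters from 1 are all assumptions made for
the example.

.. code-block:: python

    import json


    def build_page_chapter(chapters):
        """Map each module name to [chapter_name, chapter_number].

        ``chapters`` is an ordered list of (chapter_name, [module_name, ...])
        pairs. This input shape is hypothetical, chosen for illustration.
        """
        page_chapter = {}
        # Assumption: chapters are numbered from 1 in the order given.
        for chapter_number, (chapter_name, modules) in enumerate(chapters, start=1):
            for module_name in modules:
                page_chapter[module_name] = [chapter_name, chapter_number]
        return page_chapter


    # Hypothetical book layout.
    chapters = [
        ("Introduction", ["Intro", "Glossary"]),
        ("Sorting", ["InsertionSort", "Quicksort"]),
    ]
    page_chapter = build_page_chapter(chapters)

    # The text that would be written to page_chapter.json.
    serialized = json.dumps(page_chapter, indent=2)

Keying the dictionary by module name lets a later pass look up the
chapter of any module directly.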

Second Pass
-----------

INPUT
    Modules as ``rst`` source files.

OUTPUT
    A JSON file (``table.json``) containing a dictionary of all document
    objects and their appearance numbers.

DESCRIPTION
    During the second pass, the document processor creates a
    dictionary of all the objects inside modules.
    The appearance number is the concatenation of ``chapter_number``,
    ``module_number``, and ``object_number``.
    The dictionary contains tuples defined as
    ``(object_name, appearance_number)``.
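
As a rough sketch of how an appearance number might be assembled from
its three parts, consider the following.
The function names, the ``"."`` separator, the sample object names, and
the use of a single per-module counter starting at 1 are assumptions
for illustration; the actual format used by the preprocessor may
differ.

.. code-block:: python

    def appearance_number(chapter_number, module_number, object_number, sep="."):
        """Concatenate the three numbers into one label.

        The separator is an assumption; the source only says the parts
        are concatenated.
        """
        parts = (chapter_number, module_number, object_number)
        return sep.join(str(part) for part in parts)


    def number_objects(chapter_number, module_number, object_names):
        """Assign an appearance number to each object of one module.

        Assumption: objects are counted from 1 in order of appearance.
        """
        return {
            name: appearance_number(chapter_number, module_number, index)
            for index, name in enumerate(object_names, start=1)
        }


    # Hypothetical objects found in module 3 of chapter 2.
    table = number_objects(2, 3, ["FigPartition", "TabSortCosts"])

With these assumptions, the second object in the example receives the
label ``2.3.2``.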

Integration with Sphinx
-----------------------

The ``numref`` (:ref:`numref`) directive takes the numbers that the
document preprocessor assigns to document objects (figures, tables,
and equations) and uses them as hyperlink text for
cross-references.

Third Pass
----------

INPUT
    Modules as ``html`` files generated by Sphinx.

OUTPUT
    Modified ``html`` files with an updated table of contents and
    navigation bar, and section numbers augmented with a chapter
    number prefix.

DESCRIPTION
    During the third pass, the document processor parses the ``html``
    files and replaces headers and section numbers as appropriate, using
    the dictionaries created during the first two passes.
    Since our processor does not modify the Sphinx document tree, we
    have to modify the ``html`` files to replace the raw Sphinx section
    numbers with our own numbering scheme.
    This phase applies only to the table of contents, the navigation
    bar, page headers, and sections.
    The document processor performs the third pass on an ``html`` file
    only if that file has been modified by Sphinx.
    The file ``count.txt`` stores the latest modification times for the
    ``html`` files.
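
The modification check can be sketched as below.
This is only an illustration of the idea: the function names are
hypothetical, and the recorded times are kept in a plain dictionary
because the on-disk format of ``count.txt`` is not described here.

.. code-block:: python

    import os


    def needs_third_pass(html_path, recorded_mtimes):
        """Return True if the file changed since its time was recorded.

        ``recorded_mtimes`` maps a file path to the modification time
        stored after the file was last post-processed (hypothetical
        structure).
        """
        return recorded_mtimes.get(html_path) != os.path.getmtime(html_path)


    def record_processed(html_path, recorded_mtimes):
        """Store the file's modification time once the pass has rewritten it."""
        recorded_mtimes[html_path] = os.path.getmtime(html_path)

Recording the time after the third pass has rewritten the file means
that any later difference must come from Sphinx regenerating it, which
is the case in which the numbering has to be applied again.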

Where things are
----------------

There are many files that affect the eventual HTML output.
Here is a list of places to look if you are trying to make changes.

  OpenDSA/RST/source/_themes/haiku/basic/layout.html

  OpenDSA/RST/source/_themes/haiku/static/haiku.css_t

  OpenDSA/RST/preprocessor.py

  OpenDSA/RST/ODSAextensions

  OpenDSA/tools/configure.py
