Continuous Documentation

Overview of section contents:

Section	Description
Markdown	The Markdown language for lightweight documentation
Documentation as code	NSDF recommends following the “Documentation as code” philosophy as closely as possible
Code as Documentation	Writing code that is readable and self-explanatory, with specific practices for C++ and Python
Documentation tools	Documentation with Jupyter Notebooks and for web service APIs

Say what you mean, simply and directly. Don’t comment bad code; rewrite it. Make sure comments and code agree. (The Elements of Programming Style, by Brian W. Kernighan and P. J. Plauger)

Documenting software is an important development activity, and it is fundamental for software maintenance and knowledge transfer.

However, documentation that is too extensive or too verbose can become a problem in itself. For this reason, NSDF recommends following some of the principles of the “Agile Manifesto” (written by seventeen software developers on February 11-13, 2001, at The Lodge at Snowbird, a ski resort in the Wasatch Mountains of Utah; see Manifesto for Agile Software Development):

We embrace documentation, but not hundreds of pages of never-maintained and rarely-used tomes
[…] While there is value in the comprehensive documentation we value working software more.

Documenting software is always an imperfect compromise: too much documentation would be a waste of time, and developers will rarely trust it anyway because it’s usually out of sync with the actual code. On the other hand, too little documentation is always a source of problems with team communication, learning, and knowledge sharing.

So NSDF’s major recommendation is to “document code efficiently”:

Write only the minimum, useful, accurate documentation
- Make sure documentation is “just barely good enough”. Any document will need to be maintained later on.
  If the documentation is light, it’s easier to comprehend and update.
Write it “just in time” (JIT).
- Wait before documenting.
- Produce documentation when it is needed, not before.
- System overviews and support documentation are best written towards the end of the software development life cycle.
Cut out anything unnecessary
- Documentation is only useful if it’s accessible, and unnecessary content makes the useful parts harder to find.
Follow code changes; have documents that are always shippable
Keep documents in one place and make them accessible online.
- Store your product documentation in a place where all the members and external contributors can find it.
Collaborate. Writing documentation is a collaborative and instructive process
- Every team member should be encouraged to contribute.

Traditional waterfall model for documenting code vs. the Agile approach.

Markdown

Markdown is a lightweight markup language for creating web pages, wikis, and user documentation with minimal effort. A Markdown document reads like plain text and is fully human-readable.

It can also be converted automatically to HTML, LaTeX, PDF, and other formats.

Documentation as Code

There has long been a mindset to treat documentation and code as separate functions. But this thinking is obsolete.

Traditional documentation methods focus on the concept of a printed page. But most documentation today is never printed, and documentation created with page-oriented methods does not adapt well to the variety of always-online electronic devices.

It is instructive to look at how large IT companies have recently tackled the problem of “documenting code efficiently”:

Twitter: a 2014 talk described how they solved the problem of keeping documentation maintainable; they were probably the first to end up treating their documents like code.
Google: in a 2015 talk (“Documentation, Disrupted: How Two Technical Writers Changed Google Engineering Culture”), Riona Macnamara, a technical writer at Google, confirmed that the major problem was not a lack of documentation, but rather that it was outdated, untrustworthy, and scattered across wikis, Google Sites, Google Docs, etc.
Spotify: a 2019 talk described how they changed their approach to writing internal technical documentation (quote: “We conducted a company-wide productivity survey. The third-largest problem according to all our engineers? Not being able to find the technical information they needed to do their work.”) and introduced TechDocs, their docs-like-code solution, which is now part of Backstage, an open-source Cloud Native Computing Foundation (CNCF) project.

Documentation as code addresses the need for multiple formats and ease of maintenance.

It makes documentation part of the Continuous Integration (CI) pipeline and lets developers apply the same methods and tools they already use for code, such as:

Issue Trackers
Version Control (Git)
Plain Text Markup (Markdown, reStructuredText, Asciidoc)
Code Reviews
Automated Tests

Therefore, NSDF recommends following the “Documentation as code” philosophy as closely as possible:

Use plain text files (e.g. Markdown file format). This way the documentation can be consumed on any device.
Use open-source static site generators to build the files locally (e.g. Sphinx, Jekyll, Hugo)
Work with files through a text editor (e.g. Visual Studio Code, Sublime Text, etc.)
Store documents in a version control repository (e.g. GitHub) or a collaborative documentation environment (e.g. Slite, Docs in ClickUp™, etc.).
- Online content makes documents easy to consume.
- Content exists in one place but can be pulled into other documents as needed.
- Content is searchable within and across documents.
- Content is kept up to date with code changes.
Collaborate using version control to branch, merge, push, and pull updates.
Add validation tests to check for broken links, improper terms or styles, and formatting errors.
- As an example, https://slateci.io/ is the website of a successful NSF project that is using the approach described here.
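As a minimal sketch of such a validation test, assuming the documentation is kept as Markdown files under a `docs/` folder and uses relative inline links such as `[text](page.md)`, the Python script below reports links whose target file does not exist. The function names `find_broken_links` and `main` and the folder layout are hypothetical; external URLs are simply skipped here, and a real pipeline might rely on a dedicated link-checking tool instead.

```python
"""Hypothetical CI check: report relative Markdown links whose target file is missing."""
from __future__ import annotations

import re
from pathlib import Path

# Matches inline links such as [text](target) or [text](target "title").
LINK_PATTERN = re.compile(r'\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)')


def find_broken_links(docs_root: Path) -> list[tuple[Path, str]]:
    """Return (markdown_file, link_target) pairs whose local target does not exist."""
    broken = []
    for md_file in sorted(docs_root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_PATTERN.findall(text):
            # External URLs and in-page anchors are out of scope for this sketch.
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue
            file_part = target.split("#", 1)[0]  # drop a trailing "#section" anchor
            if not (md_file.parent / file_part).exists():
                broken.append((md_file, target))
    return broken


def main(argv: list[str]) -> int:
    """Print broken links under argv[1] (default: docs/) and return a CI exit code."""
    root = Path(argv[1]) if len(argv) > 1 else Path("docs")
    problems = find_broken_links(root)
    for md_file, target in problems:
        print(f"{md_file}: broken link -> {target}")
    return 1 if problems else 0
```

A CI job could run it through a one-line entry point such as `sys.exit(main(sys.argv))`, so that a non-zero exit status fails the build whenever a link breaks.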

This approach will also help build and maintain the NSDF website: Jekyll templates can be stored in a central GitHub repository and, on every git push, a workflow runs and automatically rebuilds the NSDF website (see the GitHub Pages project for more details).

For example, https://slateci.io/, the website of another NSF-funded project for “Federated Operation of Science Platforms”, is produced with Jekyll, and collaborating on it or modifying it is just a matter of editing Markdown documents on GitHub.

Similarly, the Netflix Open Source Software Center (OSS) is built with GitHub Pages (https://github.com/Netflix/netflix.github.com).

Code as Documentation

Code as documentation is a principle that advocates making code more readable and self-explanatory. It does not mean that the code should not be documented, or that the code is the only source of documentation.

On this matter, we quote a well-known post by Martin Fowler (see https://martinfowler.com/bliki/CodeAsDocumentation.html):

One of the common elements of agile methods is that they raise programming to a central role in software development - one much greater than the software engineering community usually does. Part of this is classifying the code as a major, if not the primary documentation of a software system. […]
I think part of the reason that code is often so hard to read is that people aren’t taking it seriously as documentation. […] So the first step to clear code is to accept that code is documentation, and then put the effort in to make it clear. […] We as a whole industry need to put much more emphasis on valuing the clarity of code.

There are several ways to ensure that our code is clean and easy to understand (for more extensive descriptions, see Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship).

Use intention-revealing names

Give meaningful names to variables, methods, and classes, even if the names become long. If a class or method name describes what it does, and a field name describes what it holds, no comment is needed to explain them: reading the name should be enough to understand its purpose. The two examples below show the same variable declaration written in two different ways.

Bad code:

```cpp
int d;  // elapsed time in days
```

Good code:

```cpp
int elapsedTimeInDays;
```

Refactor long blocks

Long methods are hard to read and understand, especially when they have many responsibilities. Write small methods: each method should have a single responsibility, and its name should describe it.

If a method grows too large, extract each piece of functionality into a smaller method. When writing code, keep in mind that it may be reused in another part of the same system or in other systems.
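A small, hypothetical Python illustration of this advice: instead of one function that parses, filters, and averages sensor readings, each responsibility is extracted into its own function whose name says what it does. The comma-separated input format and all function names are invented for this sketch.

```python
from __future__ import annotations


def parse_readings(raw: str) -> list[float]:
    """Convert a comma-separated string such as "1.5, 2.0" into floats."""
    return [float(token) for token in raw.split(",") if token.strip()]


def discard_negative_readings(readings: list[float]) -> list[float]:
    """Drop physically impossible (negative) values."""
    return [value for value in readings if value >= 0.0]


def average(values: list[float]) -> float:
    """Return the arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def average_valid_reading(raw: str) -> float:
    """Top-level function: reads like a summary of the steps it delegates to."""
    readings = parse_readings(raw)
    valid_readings = discard_negative_readings(readings)
    return average(valid_readings)
```

Each small function can now be tested on its own and reused elsewhere, for instance `parse_readings` in another part of the system.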

Use informative comments

Clean code tells you what it does, but it does not always show your intention or why it was written that way. Comments are an important resource for filling this gap and complementing the understanding of the code.

Special comments such as TODO and FIXME record reminders about future improvements and corrective tasks. Use them when code is incomplete, incorrect, or could be improved but there is no time to do it right now.

Avoid redundant, misleading, and noisy comments.
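The hypothetical Python snippet below contrasts the kinds of comments described above: a TODO that records a known improvement, a redundant comment that merely restates the code (shown only as something to avoid), and a "why" comment that explains a non-obvious choice. The retry scenario, the throttling behaviour it mentions, and all names are invented for illustration.

```python
import time


def fetch_with_retry(fetch, attempts: int = 3, base_delay_s: float = 0.5):
    """Call ``fetch()`` until it succeeds, making at most ``attempts`` calls."""
    # TODO: make the set of retried exception types configurable
    # (today only ConnectionError is retried).
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Redundant comment to avoid: "loop over the attempts" (the code already says so).
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the original error
            # Why comment (useful): the hypothetical remote service throttles
            # clients that retry immediately, so the delay doubles after each
            # failure (exponential backoff) instead of staying fixed.
            time.sleep(base_delay_s * 2 ** attempt)
```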

NSDF suggests reading the post “Writing system software: code comments”, which distinguishes between useful forms of commenting (Function, Design, Why, Teacher, Checklist, Guide) and somewhat questionable ones (Trivial, Debt, Backup), and offers a reasonable point of view:

Many comments don’t explain what the code is doing. They explain what you can’t understand just from what the code does. Often this missing information is why the code is doing a certain action, or why it’s doing something that is clear instead of something else that would feel more natural.

While it is not generally useful to document, line by line, what the code is doing because it is understandable just by reading it, a key goal in writing readable code is to lower the amount of effort and the number of details the reader should take into her or his head while reading some code. So comments can be, for me, a tool for lowering the cognitive load of the reader.

Use class-level documentation

Class-level documentation should describe the purpose of the class as a unit of work and how to use it. Each programming language has its own conventions and best practices for this; make sure you follow those of the language you are using.

Use method-level documentation

Method documentation describes the purpose of the method and is more specific than the class documentation. Here too, follow the documentation conventions and best practices of your programming language.
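For Python, the usual convention is PEP 257 docstrings. The sketch below shows a class-level docstring (purpose and usage) and method-level docstrings written with the reStructuredText field lists (`:param:`, `:returns:`, `:raises:`) that Sphinx understands. The `TemperatureSeries` class is a hypothetical example, not part of any NSDF code.

```python
from __future__ import annotations


class TemperatureSeries:
    """A time-ordered series of temperature measurements from one sensor.

    Use :meth:`add` to append measurements and :meth:`mean` to summarize them.
    Values are stored in degrees Celsius.

    :param sensor_id: Identifier of the sensor that produced the measurements.
    """

    def __init__(self, sensor_id: str) -> None:
        self.sensor_id = sensor_id
        self._values: list[float] = []

    def add(self, celsius: float) -> None:
        """Append one measurement.

        :param celsius: Temperature in degrees Celsius.
        :raises ValueError: If the value is below absolute zero (-273.15 °C).
        """
        if celsius < -273.15:
            raise ValueError(f"{celsius} °C is below absolute zero")
        self._values.append(celsius)

    def mean(self) -> float:
        """Return the mean temperature in degrees Celsius.

        :returns: The arithmetic mean of all measurements added so far.
        :raises ValueError: If no measurements have been added.
        """
        if not self._values:
            raise ValueError("no measurements in series")
        return sum(self._values) / len(self._values)
```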

Use proper formatting

Take care that your code is nicely formatted. Few people will want to read badly formatted code, and poor formatting is a clear sign of a poorly maintained project. NSDF developers should agree on a single set of simple formatting rules, and all members should apply them consistently.

Use error handling

Keep error handling clean and short: it shows what can go wrong. Where possible, use exceptions rather than return codes. Write informative, self-contained error messages that include some context (e.g. in C++, the `__FILE__` and `__LINE__` macros), and classify errors by type and severity. Use a robust, production-ready logging system to keep track of errors and to support post-mortem debugging.
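A minimal Python sketch of these guidelines, built around a hypothetical configuration loader: a small exception hierarchy classifies errors, each message carries context (the file involved and, for parse errors, the line and column), the original exception is chained with `from`, and the standard `logging` module records the failure. The class names, the logger name, and the JSON format are assumptions for illustration. In Python, `logger.exception` covers much of what `__FILE__`/`__LINE__` provide in C++, because the logged traceback includes file names and line numbers.

```python
import json
import logging

logger = logging.getLogger("nsdf.config")  # hypothetical logger name


class NsdfError(Exception):
    """Base class for all errors raised by this (hypothetical) component."""


class ConfigError(NsdfError):
    """The configuration is missing or malformed; the user can fix it."""


def load_config(path: str) -> dict:
    """Load a JSON object from a configuration file.

    :raises ConfigError: If the file is missing, is not valid JSON, or is not an object.
    """
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except FileNotFoundError as err:
        # Self-contained message: says what failed and on which file.
        raise ConfigError(f"configuration file not found: {path}") from err
    except json.JSONDecodeError as err:
        raise ConfigError(
            f"invalid JSON in {path} (line {err.lineno}, column {err.colno}): {err.msg}"
        ) from err
    if not isinstance(data, dict):
        raise ConfigError(f"{path} must contain a JSON object, got {type(data).__name__}")
    return data


def start_service(path: str) -> bool:
    """Start-up entry point: log failures with their traceback instead of hiding them."""
    try:
        config = load_config(path)
    except ConfigError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("service start-up aborted")
        return False
    logger.info("loaded %d configuration keys", len(config))
    return True
```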

C++ Documentation

Doxygen is the most widely used C++ documentation tool.

The generated documentation makes it easier to navigate and understand the code as it may contain all public functions, classes, namespaces, enumerations, side notes, and code examples.

Doxygen:

It supports a variety of output formats, including HTML and PDF.
It can extract the code structure from undocumented source files.
It can visualize the relations between the various elements using include dependency graphs, inheritance diagrams, and collaboration diagrams.
It supports multiple languages (C/C++, Fortran, Objective-C, C#, PHP, Python, etc.).

Several well-known C++ libraries use Doxygen for their documentation (e.g. Apache Portable Runtime, CppUnit, FreeImage, GNU Standard C++ Library, KDE, LLVM, OGRE, VTK; see the full list at https://www.doxygen.nl/projects.html).

One pain point, however, is that the generated documents tend to be visually noisy, with a style that struggles to represent complex template-based APIs. The Doxygen markup language also has some limitations.

To address these issues, part of the C++ community has recently been moving to Sphinx, the most widely used Python documentation tool, which can be adapted to use Doxygen's parser (see https://devblogs.microsoft.com/cppblog/clear-functional-c-documentation-with-sphinx-breathe-doxygen-cmake/):

The Sphinx extension Breathe parses Doxygen's XML output and produces Sphinx documentation.
Breathe-based documentation can be published online with Read the Docs (tagline: “Technical documentation lives here”).
It supports a hybrid syntax, i.e., using reStructuredText inside Doxygen markup.

The Sphinx-generated documents look more modern and minimal, and it is much easier to switch to a different theme and modify the page layout.

NSDF recommends using a combination of Doxygen and Sphinx (via Breathe) to document its C++ code.

Visual comparison of Doxygen and Sphinx output
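As a sketch of how Doxygen and Sphinx are wired together, the Sphinx `conf.py` below (Sphinx configuration files are plain Python) enables the Breathe extension and points it at Doxygen's XML output. It assumes Doxygen has already been run with `GENERATE_XML = YES` and writes its XML to `build/doxygen/xml`; the project title, the Breathe project name `nsdf_core`, and the path are hypothetical.

```python
# conf.py: Sphinx configuration for C++ API docs via Breathe (hypothetical names/paths).
project = "NSDF core library"  # hypothetical project title

extensions = [
    "breathe",  # reads Doxygen XML and exposes it to Sphinx
]

# Map a Breathe project name to the directory holding Doxygen's XML output.
breathe_projects = {
    "nsdf_core": "build/doxygen/xml",  # hypothetical path
}
breathe_default_project = "nsdf_core"

html_theme = "alabaster"  # Sphinx's default theme; another theme can be swapped in here
```

The C++ API can then be pulled into reStructuredText pages with Breathe directives such as `.. doxygenclass:: MyClass` (where `MyClass` stands for any documented class) or `.. doxygenindex::`.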

Python Documentation

There are several ways NSDF can produce Python documentation (see https://wiki.python.org/moin/DocumentationTools for an extensive list).

Key factors that should guide our choice are:

Visual appeal and ease of use (where case studies and/or screenshots are available)
Potential dependency fragility (most importantly, which Python versions are supported)
Community size/engagement and availability of tool support
Run-time introspection vs static analysis

Sphinx is the most widely used and comprehensive Python documentation generator. It supports reStructuredText in docstrings and produces HTML output with a very clean visual style.

Many Python projects use the Sphinx documentation generator (e.g. Flask, Django, PyCUDA, OpenCV, PyQt5, Matplotlib, Pandas, Conda, Pip, Pillow, PyPy, NumPy, SciPy), and Python itself uses Sphinx. The full list is available at http://www.sphinx-doc.org/en/master/examples.html.

The main features of Sphinx are:

Supports a variety of output formats, including HTML and PDF.
Easy cross-referencing via semantic markup and automatic links for functions, classes, citations.
Simple hierarchical structure of the documentation tree, with automatic links to siblings, parents, and children.
Automatic indices.
Extensible.
Easy configuration, mostly automatic.

Setting it up requires a bit of configuration. For a quick tutorial look here.

Python 3.x and NumPy use Sphinx for documenting their classes
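A minimal sketch of that configuration, assuming a hypothetical layout in which the package `nsdf_tools` lives in `src/` next to a `docs/` folder: `docs/conf.py` adds the source directory to `sys.path` so that the `sphinx.ext.autodoc` extension can import the modules and pull in their docstrings.

```python
# docs/conf.py: minimal Sphinx configuration (hypothetical layout: ../src/nsdf_tools).
import os
import sys

# Make the package importable so autodoc can introspect its docstrings.
sys.path.insert(0, os.path.abspath(os.path.join("..", "src")))

project = "nsdf_tools"  # hypothetical package name
author = "NSDF developers"

extensions = [
    "sphinx.ext.autodoc",   # generate API pages from docstrings
    "sphinx.ext.viewcode",  # link documented objects to highlighted source code
]

# Document all public members unless a directive says otherwise.
autodoc_default_options = {"members": True}

html_theme = "alabaster"
```

With this in place, an `.. automodule:: nsdf_tools` directive in a reStructuredText page renders the module's docstrings, and `sphinx-build -b html docs docs/_build/html` builds the HTML site; `sphinx-quickstart` can generate a similar skeleton interactively.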

NSDF recommends using Sphinx to generate Python documentation, but the following alternatives were also considered:

pdoc
- is probably the second-most popular Python-exclusive doc tool
- its code is a fraction of Sphinx’s complexity, though its output is not quite as polished.
- it works with zero configuration in a single step.
- It also supports docstrings for variables through source code parsing. Otherwise, it uses introspection.
- It is worth checking out if Sphinx is too complicated for the NSDF use case.
pydoctor
- is an API documentation generator that works by static analysis.
- The main benefit is that it traces inheritances particularly well, even for multiple interfaces.
- It can pass the resulting object model to Sphinx if you prefer its output style.
Doxygen
- is not a Python-exclusive documentation generator.
- It can generate documentation from undocumented source code (mostly inheritances and dependencies).
- Many teams already know this tool from its wide use in multiple languages, particularly C++.
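The pdoc entry above mentions docstrings for variables obtained through source code parsing. Python does not keep such strings at run time (a string literal placed right after an assignment is simply discarded), so a tool has to read the source. The hypothetical helper below sketches that idea with the standard `ast` module; it illustrates the technique only and is not pdoc's actual implementation.

```python
from __future__ import annotations

import ast

SOURCE = '''
MAX_RETRIES = 3
"""How many times a failed transfer is retried."""

TIMEOUT_S = 30.0
"""Network timeout, in seconds."""

UNDOCUMENTED = True
'''


def variable_docstrings(source: str) -> dict[str, str]:
    """Map module-level variable names to the string literal that directly follows them."""
    body = ast.parse(source).body
    docs = {}
    for node, next_node in zip(body, body[1:]):
        is_simple_assignment = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        is_string_literal = (
            isinstance(next_node, ast.Expr)
            and isinstance(next_node.value, ast.Constant)
            and isinstance(next_node.value.value, str)
        )
        if is_simple_assignment and is_string_literal:
            docs[node.targets[0].id] = next_node.value.value
    return docs
```

Calling `variable_docstrings(SOURCE)` returns entries for `MAX_RETRIES` and `TIMEOUT_S` only, since `UNDOCUMENTED` has no string after it. Annotated assignments (`x: int = 1`) are deliberately ignored to keep the sketch short.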

Jupyter Notebooks Documentation

A Jupyter notebook is a document that supports mixing executable code, equations, visualizations, and narrative text. Jupyter notebooks allow users to “bring together data, code, and prose, to tell an interactive, computational story”.

Jupyter Notebook combines code and explanations with the interactivity of an application. This makes it a handy tool for streamlining end-to-end data science workflows.

Jupyter Notebooks have played an essential role in the democratization of data science, making it more accessible by lowering the barriers to entry for data scientists.

Code in a notebook is not static: users can edit cells and re-run them, and the ipywidgets package adds interactive controls for “live interactions with code”.

Jupyter Notebook makes it easy to explain code step by step, with feedback (the output of each cell) attached along the way. Users can add interactivity alongside explanations, while the code remains fully functional.

For all the above reasons, Jupyter Notebooks will be a very important source of documentation for the NSDF software stack and an excellent instrument for documenting code and algorithms.

Web Services API Documentation

NSDF web services will follow a “microservice architecture”, in which an application is structured as a set of loosely coupled, collaborating services.

The benefits of using such architecture include:

Microservices are small, loosely coupled.
- Each has its own codebase, so a smaller team is needed to manage it.
Microservices can be deployed independently.
- This also means that services can be scaled independently as needed.
Microservices do not need to share the same technology stack.
- One service could be written in one language (e.g. C++); another could be written in a different language (e.g. Python).

To implement microservices, NSDF recommends RESTful APIs.

Representational State Transfer (RESTful) API

One of the most popular types of APIs for building microservices applications is known as “RESTful API” or “REST API.”

REST APIs are popular among developers because they use HTTP methods, which most developers already know and find easy to use.

Here are the defining characteristics of RESTful API:

An API that follows the REST (Representational State Transfer) architectural style.
Relies on HTTP, which is familiar to web developers.
Can be secured with Transport Layer Security (TLS, the successor to Secure Sockets Layer, SSL) encryption by serving it over HTTPS.
It is language-agnostic: it connects apps and microservices written in different programming languages.
It simplifies the creation of web applications through Create, Retrieve, Update, Delete (CRUD) operations.

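The HTTP methods, or “verbs”, commonly used in REST APIs include GET, POST, PUT, PATCH, and DELETE. They typically map to CRUD operations as POST (create), GET (retrieve), PUT and PATCH (update), and DELETE (delete).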

Developers use these RESTful API commands to perform actions on different “resources” within an application or service. RESTful APIs use standard URLs to locate resources.
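To make the mapping between HTTP verbs, resource URLs, and CRUD operations concrete, the sketch below dispatches (method, path) pairs to operations on an in-memory collection. It deliberately leaves out networking. The `/resources` URL scheme, the `ResourceStore` class, and the status codes chosen (201 for creation, 204 for deletion, 404 and 405 for errors) are assumptions based on common REST practice, not an NSDF design.

```python
from __future__ import annotations

from http import HTTPStatus


class ResourceStore:
    """Hypothetical in-memory resources exposed at /resources and /resources/<id>."""

    def __init__(self) -> None:
        self._items: dict[int, dict] = {}
        self._next_id = 1

    def handle(self, method: str, path: str, body: dict | None = None) -> tuple[HTTPStatus, object]:
        """Map an HTTP verb and a URL to a CRUD operation; return (status, payload)."""
        parts = [part for part in path.split("/") if part]
        if not parts or parts[0] != "resources" or len(parts) > 2:
            return HTTPStatus.NOT_FOUND, {"error": f"unknown URL {path}"}

        if len(parts) == 1:  # collection URL: /resources
            if method == "POST":  # Create
                new_id = self._next_id
                self._next_id += 1
                self._items[new_id] = dict(body or {})
                return HTTPStatus.CREATED, {"id": new_id, **self._items[new_id]}
            if method == "GET":  # Retrieve all resources
                return HTTPStatus.OK, [{"id": i, **item} for i, item in self._items.items()]
            return HTTPStatus.METHOD_NOT_ALLOWED, {"error": f"{method} not allowed on {path}"}

        # Item URL: /resources/<id>
        if not parts[1].isdigit() or int(parts[1]) not in self._items:
            return HTTPStatus.NOT_FOUND, {"error": f"no resource at {path}"}
        item_id = int(parts[1])
        if method == "GET":  # Retrieve one resource
            return HTTPStatus.OK, {"id": item_id, **self._items[item_id]}
        if method == "PUT":  # Update: replace the whole resource
            self._items[item_id] = dict(body or {})
            return HTTPStatus.OK, {"id": item_id, **self._items[item_id]}
        if method == "PATCH":  # Update: change only the given fields
            self._items[item_id].update(body or {})
            return HTTPStatus.OK, {"id": item_id, **self._items[item_id]}
        if method == "DELETE":  # Delete
            del self._items[item_id]
            return HTTPStatus.NO_CONTENT, None
        return HTTPStatus.METHOD_NOT_ALLOWED, {"error": f"{method} not allowed on {path}"}
```

The distinction between PUT (replace the whole resource) and PATCH (modify selected fields) is the usual REST convention; a web framework would provide the HTTP layer around a dispatcher like this one.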

The familiarity and reliability of RESTful API methods, rules, and protocols make it easier to develop applications that integrate with other services exposing an API.

RESTful API will be used by NSDF to make its services available to the community and to integrate with third-party applications.

RESTful documentation

RESTful APIs tend to evolve rapidly during development and release cycles.

Maintaining and updating API documentation for the development team and external users is a difficult but necessary process.

To document NSDF APIs, we recommend the OpenAPI Specification (OAS).

The OpenAPI Specification (formerly known as Swagger Specification) defines a standard and language-agnostic interface to RESTful APIs which allows both humans and computers to discover and understand the capabilities of the service without access to source code, documentation, or through network traffic inspection.

A consumer can understand and interact with the remote service with a minimal amount of implementation logic.

OpenAPI definitions can be written in JSON or YAML. We recommend YAML since it’s also the file format used for Kubernetes deployment.

A simple OpenAPI specification looks like this:

```yaml
openapi: 3.0.0
info:
  version: 1.0.0
  title: NSDF API
  description: A sample API to illustrate an NSDF service
paths:
  /list:
    get:
      description: Returns a list of cloud resources
      responses:
        '200':
          description: Successful response
```
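As a sketch of a service that honours this specification, the standard-library server below answers `GET /list` with a JSON array and status 200, and returns 404 for any path the document does not describe. The resource names in `CLOUD_RESOURCES`, the handler class, and `make_server` are placeholders; a production NSDF service would more likely use a web framework. The point is only to show how the contract in the YAML maps to code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder data: the specification only says "a list of cloud resources".
CLOUD_RESOURCES = ["storage-bucket-a", "compute-pool-b"]


class NsdfApiHandler(BaseHTTPRequestHandler):
    """Implements the single operation described in the OpenAPI document: GET /list."""

    def do_GET(self) -> None:
        if self.path == "/list":
            payload = json.dumps(CLOUD_RESOURCES).encode("utf-8")
            self.send_response(200)  # '200': Successful response
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
        else:
            self.send_error(404, "path not described in the API specification")

    def log_message(self, format: str, *args) -> None:
        """Silence the default per-request logging to stderr."""


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but do not start) a server bound to localhost; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), NsdfApiHandler)
```

Running `make_server().serve_forever()` and then requesting `http://127.0.0.1:8000/list` returns the JSON list; the same URL is what tools such as Swagger UI would call when exploring the specification.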

The adoption of OpenAPI will:

Help members understand the NSDF APIs and agree on their attributes.
Help users experiment with the APIs (for example, using the open-source Swagger UI tool)
Simplify the creation of automatic tests
Accelerate development by automatically generating stub code (i.e., temporary substitutes for yet-to-be-developed code)

Links/Bibliography

List: