Introduction

This document creates and records guidelines, norms, and procedures for the software engineering aspects of the development, evolution, and long-term operation of the NSDF software stack, in particular regarding:

  • Use of repositories, branching, and versioning methodologies

  • Use of programming languages and frameworks

  • System and code documentation, and their continued sustainment

  • Style guidelines for user interfaces and code construction

  • Code review procedures

  • Deployment staging procedures

  • Test requirements: Unit, Regression, Integration, and Top-level validation approaches

  • Development project methodologies (e.g. agile practices)

  • Continuous integration and deployment (CI/CD) practices

  • Package management practices

  • Container management practices

  • Change management recommendations

Authors in alphabetical order:

Name               Email
Daniel Balouek     daniel.balouek@utah.edu
Kevin Coakley      kcoakley@ucsd.edu
Jakob Luettgau     jluettga@utk.edu
Paula Olaya        polaya@vols.utk.edu
Giorgio Scorzelli  scrgiorgio@gmail.com
Glenn Tarcea       gtarcea@umich.edu
Naweiluo Zhou      naweiluo.zhou@utk.edu

This is a guide to software development at the NSDF. It serves both as a source of information on exactly how NSDF works and as a basis for discussing and reaching consensus about how to develop software.

A Software Development Life Cycle (SDLC) is a methodology followed to create high-quality software. By adhering to a standard set of tools, processes, and duties, a software development team can build, design, and develop products that meet or exceed their clients’ expectations.

Source: https://brocoders.com/blog/agile-software-development-life-cycle

The most famous SDLC models are:

  • Waterfall: Follows a sequential model of phases, each of which has its tasks and objectives

  • Cleanroom: A process model that removes defects before they cause serious issues

  • Incremental: Requirements are divided into multiple standalone modules

  • V-Model: Processes are executed sequentially in a V-shape, i.e., each step comes with its own testing phase

  • Prototype: A working replication of the product is used to evaluate developer proposals

  • Big Bang: Requires very little planning and has no formal procedures; however, it’s a high-risk model

  • Agile: Uses cyclical, iterative progression to produce working software

This document specifies the software development procedures for the NSDF project and includes all development procedures between high-level requirements and either software release or the initiation of a DevOps deployment process.

Software checklist

In this section, we provide a short checklist for software projects, and the rest of this document elaborates on the various points in this list.

The bare minimum that every NSDF software project should do is:

  • choose and include an open-source license

  • use version control to enable collaborative development

  • use a publicly accessible version-controlled repository

  • add a README.md file describing the project. This file is targeted towards developers. Keeping basic documentation in README.md can be useful for other developers to track steps and design decisions. Therefore it is convenient to create it from the beginning of the project when initializing a git repository.

NSDF also recommends doing the following, from the start of the project:

  • use code quality tools

  • use testing

  • use standards (protocols, conventions, tools, etc.)

  • release user and developer documentation

  • provide issue trackers

  • make the software citable by adding a DOI

  • release the software to a public registry

  • add a public channel for communication

  • implement and add a code of conduct

  • add a contribution guideline document
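Parts of the checklist above can be verified automatically. The following is a minimal sketch, not an official NSDF tool: it reports which conventional files are missing from a repository checkout. Only README.md is named in this document; the file names LICENSE, CODE_OF_CONDUCT.md, and CONTRIBUTING.md, as well as the function names, are assumptions based on common open-source practice.

```python
from pathlib import Path

# Conventional file names. Only README.md is prescribed by this guide;
# the other names are assumptions based on common open-source practice.
REQUIRED_FILES = ("LICENSE", "README.md")
RECOMMENDED_FILES = ("CODE_OF_CONDUCT.md", "CONTRIBUTING.md")


def missing_files(repo_root, names):
    """Return the entries of names that are not regular files in repo_root."""
    root = Path(repo_root)
    return [name for name in names if not (root / name).is_file()]


def check_repository(repo_root):
    """Report missing required and recommended files for one repository."""
    return {
        "required": missing_files(repo_root, REQUIRED_FILES),
        "recommended": missing_files(repo_root, RECOMMENDED_FILES),
    }


if __name__ == "__main__":
    # Check the current working directory (hypothetical usage).
    report = check_repository(".")
    for level, names in report.items():
        for name in names:
            print(f"missing {level} file: {name}")
```

A check like this could run as a first step in a CI pipeline so that new projects start with the minimum set of files in place.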

Programming languages and conventions

From the beginning of the project, a decision on the code style has to be made and then should be documented.

Not having a documented code style will highly increase the chance of inconsistent style across the codebase, even when only one developer writes code.

The NSDF should have a sane suggestion of coding style for each programming language we use. Coding styles are about consistency and making a choice, and not so much about the superiority of one style over another.

If your programming language supports namespaces, use nsdf.* to clarify the origin of the software.

NSDF wants to limit the development to a few core languages and frameworks.

At the NSDF we prefer C++, Python, Go, and JavaScript.

C/C++

C/C++ is the NSDF programming language for fast and core services such as the visualization and low-level storage of multi-resolution data.

NSDF C/C++ environment is built on:

  • C++ version: C++11 or C++17

  • Visual Studio for Windows, gcc/clang on other platforms

  • Code style: CppCoreGuidelines

  • Minimal self-contained dependencies (e.g. STL, boost, etc.)

  • Cross-platform make tool: CMake

Libraries:

  • Open MPI: to enable parallelism

  • Boost C++ is a popular collection of peer-reviewed, free, open-source C++ libraries.

    • Code is generally very high-quality, is widely portable, and fills many important gaps in the C++ standard library, such as type traits and better binders.

    • Its sometimes excessively “functional” style of programming can hamper readability.

  • JSON for Modern C++

  • hdf5-cpp: The popular HDF5 binary format C++ interface.

  • ZeroMQ: a lower-level, flexible communication library with a unified interface for message passing between threads and processes, but also between separate machines via TCP.

Python

Python is the NSDF dynamic language of choice.

We use it for data analysis and data science projects using the SciPy stack and Jupyter notebooks, and for many other types of projects: workflow management, visualization, web-based tools, etc. It is not the language of maximum performance, although in many cases performance-critical components can be easily replaced by modules written in faster, compiled languages like C/C++ or Cython.

Python is very flexible and one of the most widely used programming languages for scientific applications: a large number of useful frameworks and libraries are written in Python. Python allows easy integration with low-level bindings (e.g., C/C++) if efficiency is critical.
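As a minimal sketch of such a low-level binding, the example below calls the C standard library function strlen from Python through the standard ctypes module. It assumes a POSIX-like system (Linux or macOS) where the C library can be loaded; the wrapper name c_strlen is hypothetical, and actual NSDF bindings may use a different mechanism.

```python
import ctypes
import ctypes.util

# Locate the C standard library. find_library may return None; on POSIX
# systems CDLL(None) then exposes the symbols already loaded in the process.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Return the byte length of the UTF-8 encoding of text, computed by C strlen."""
    return _libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("nsdf"))  # 4 bytes
```

Declaring argtypes and restype explicitly keeps ctypes from guessing the C types, which is the usual source of subtle errors in hand-written bindings.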

NSDF Python environment is built on:

  • Python 3.7+

  • Web applications: Django, Flask

  • Packaging: PyPI, Manager: pip (avoid conda when possible)

  • Other services: Tornado

  • Templating: Jinja

  • Code style: PEP 8
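To illustrate the PEP 8 naming and layout conventions referenced above, here is a short sketch. The module content, the constant DEFAULT_BLOCK_SIZE, and the class BlockCounter are hypothetical and exist only to show the style.

```python
"""Example module laid out according to common PEP 8 conventions."""

import math

# Constants use UPPER_CASE_WITH_UNDERSCORES.
DEFAULT_BLOCK_SIZE = 4096


class BlockCounter:
    """Class names use CapWords."""

    def __init__(self, block_size=DEFAULT_BLOCK_SIZE):
        # Indentation is four spaces per level, never tabs.
        self.block_size = block_size

    def blocks_needed(self, num_bytes):
        """Methods, functions, and variables use lowercase_with_underscores."""
        return math.ceil(num_bytes / self.block_size)


if __name__ == "__main__":
    counter = BlockCounter()
    print(counter.blocks_needed(4097))  # 2 blocks of 4096 bytes
```

Style checkers can enforce most of these conventions automatically, so the documented style does not depend on manual review alone.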

Notebooks:

IDE:

Core scientific packages:

Visualization packages:

  • Matplotlib: the standard in scientific visualization. It supports plotting through the pyplot submodule. It is highly customizable and runs natively on many platforms, making it compatible with all major OSes and environments. It supports most sources of data, including native Python objects, NumPy, and pandas.

  • Seaborn is a Python visualization library based on Matplotlib and aimed towards statistical analysis. It supports NumPy, pandas, SciPy, and statsmodels.

  • Bokeh is an interactive web plotting library for Python.

  • Plotly is a platform for interactive plotting through a web browser, including in Jupyter notebooks.

Parallelization packages:

Go

Go is a statically typed, compiled programming language that is open source and maintained by Google. Go uses a garbage collector to manage memory automatically.

Go is very fast and mostly used for server-side applications.

NSDF Go environment is built on:

  • Go 1.17+ (We recommend upgrading to the latest version whenever it becomes available. Versions are backward compatible. Version 1.18 will introduce generics for Go.)

  • Code style: The Go community has standardized around the “go fmt” tool. All code should be run through the “go fmt” tool to properly format it.

  • Dependencies Management: Use Go Modules for dependencies management.

  • Builds: Even though Go has a toolchain for builds it is recommended that a Makefile be created to hide the options.

  • Background “Daemon” processes: This is an area that is too easy to get wrong. Instead, use supervisord to handle background processes.

  • Web services: There are many different web service frameworks available. We have standardized on Echo (echo.labstack.com). It has less boilerplate than the standard library and decent documentation.

  • Databases: Go has a standard DB library. We recommend using either the “sqlx” package or gorm. Once Go 1.18 is released, many of the packages that currently rely on reflection are expected to see a lot of changes.

JavaScript

JavaScript is the programming language for the World Wide Web, alongside HTML and CSS. All web browsers have a dedicated JavaScript engine to execute the code on users’ devices.

On the server side, there is Node.js, an open-source cross-platform JavaScript runtime environment with an event-driven architecture capable of asynchronous I/O.

NSDF JavaScript environment is built on:

  • ECMAScript 6

  • Packaging: NPM, Resolver: Yarn (faster)/npm

  • Compiler/transpiler: Babel

  • Code style: Airbnb

  • MVC/SPA client-side frameworks: React, Angular, Vue

  • Angular is an application framework by Google written in TypeScript.

  • React is a library that can be used to create interactive User Interfaces by combining components. It is developed by Facebook.

  • Vue.js is an open-source JavaScript framework for building user interfaces.

Security Considerations:

  • Cross-site scripting (XSS)
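A common defense against XSS is to escape user-supplied text before embedding it in HTML. The sketch below shows this on the server side in Python, using the standard html module; the function render_comment and its output format are hypothetical. Template engines such as Jinja can perform this escaping automatically when autoescaping is enabled.

```python
import html


def render_comment(author: str, comment: str) -> str:
    """Return an HTML fragment in which user-supplied text is escaped."""
    # html.escape replaces &, <, > and, by default, both quote characters,
    # so the browser treats the input as text rather than markup.
    safe_author = html.escape(author)
    safe_comment = html.escape(comment)
    return f"<p><b>{safe_author}</b>: {safe_comment}</p>"


if __name__ == "__main__":
    print(render_comment("guest", "<script>alert(1)</script>"))
    # <p><b>guest</b>: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Escaping at output time only covers HTML text content; values placed into URLs, JavaScript, or CSS need context-specific encoding.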

Awesome List

On GitHub, there is the concept of an awesome list, which collects awesome libraries and tools on some topic. For instance, here is a subset: