Some Thoughts on Software Documentation
Software documentation should communicate the necessary aspects of software that cannot be explained more clearly by the code itself. Generally, that means it should explain not how code works, but why a system was written the way it was. If it seems necessary to explain how code works, it would be better to refactor the code until it is sufficiently clear on its own. Documentation can accomplish several purposes:
- communicate what the project is for
- explain the main use-cases of the project, its configuration options, what specific commands or steps are necessary to meet those use-cases
- how to get set up to begin development
- the high level system architecture
- Explain the overall design of specific modules or, in the case of in-line comments, explain
Documentation should not do the following:
- Document remaining work in the form of TODOs, except over very short periods, i.e. they should be resolved before official code review. Work should be planned and documented through work tracking software.
- Document manual steps which could be automated instead. For example, configuring a work laptop need not have a 15-step Confluence page; rather, a single shell (or other language) script should be created.
  1. Another example is project configuration; to the greatest extent possible, configuration should be automated, either by using nix-shell or by including all dependencies in the node_modules folder. An alternative approach is to use containerization for local development, in which case project setup should involve three steps: 1) install Docker, 2) clone the repo, 3) run `docker compose up --build`.
  2. The result is that workstation and project setup should generally be a 1-2 step process: 1) clone the repo, 2) execute the setup script.
  3. A caveat is that documentation comes in many forms; while a markdown file should not explain detailed setup steps, it is nonetheless useful to know what all of the setup steps are. The steps should therefore be documented as part of the output of the setup script.
  4. If you find yourself documenting many manual steps, the real problem might not be one of documentation, but of project architecture.
- Document long and comprehensive configuration options or countless use cases in a single README.md file.
  1. Coders and users need to be able to find the specific information they require quickly. If it is necessary to include a great deal of information, then it should be organized well and broken down into pieces, with the existence and location of the pieces indicated by the main README. Any piece of significant length should also include a table of contents.
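As a rough illustration of the point above about documenting setup steps as the output of the setup script, here is a minimal TypeScript sketch. The `SetupStep` type, the `runSetup` function, and the step descriptions are all hypothetical, and the steps are empty placeholders; a real script would run actual commands inside each one.

```typescript
// Hypothetical sketch: each setup step carries its own description,
// so running the script also documents what it does.
type SetupStep = {
  description: string; // printed before the step runs
  run: () => void;     // the action itself (a placeholder in this sketch)
};

function runSetup(
  steps: SetupStep[],
  log: (line: string) => void = console.log
): string[] {
  const completed: string[] = [];
  steps.forEach((step, index) => {
    // The progress line doubles as the documentation of the step.
    log(`[${index + 1}/${steps.length}] ${step.description}`);
    step.run();
    completed.push(step.description);
  });
  log("Setup complete.");
  return completed;
}

// Illustrative steps only; the descriptions are assumptions, not a real project's setup.
const exampleSteps: SetupStep[] = [
  { description: "Check that required tools are installed", run: () => {} },
  { description: "Install project dependencies", run: () => {} },
  { description: "Create local configuration file", run: () => {} },
];

runSetup(exampleSteps);
```

Because the descriptions live next to the actions, the list of steps cannot drift out of date the way a separate wiki page can.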
Useful practices
- Use tooling and standards to improve the utility and accuracy of your documentation
- For example, use LLMs to verify that the codebase works the way your documentation says it works
- When building a REST API, always create (or generate) an OpenAPI specification. This specification can then be provided to tooling to display the API in a web app
- Similarly, some languages provide tooling to help visualize documentation. For example, Rust provides `cargo doc`, which generates files documenting the modules, structs, functions, etc. based on the docstrings in the project. The generated documentation files can then be served and accessed as a web app. If your language provides such a tool by default, or if a compatible tool is available, then use it!
- Although the phrase “code is self-documenting” is sometimes used to incorrectly reject the need for all in-line documentation, it is nonetheless a fact that clear design of deep modules, combined with descriptive class, function, and variable names, makes code much easier to understand. If some code seems to require documentation to explain how it works, consider whether it could be refactored instead.
- There are also cases where using better technology makes self-documenting code much easier. For example, in a JavaScript project it’s important to write and read documentation about what arguments a function expects. When using TypeScript, on the other hand, it is preferable to create a Props type with descriptive field names. The shape of the arguments is then checked at compile time. As a result, TypeScript functions tend to need much less documentation nearby.
- Another TypeScript example is the use of branded types. If a string is documented as a widget ID, either through a variable name or a comment, the type can carry that information instead: define `type WidgetID = string & { __type: "widget" }` and write `let id = "1234" as WidgetID`. This construct comes at no runtime cost, since TypeScript types are erased at compile time; no runtime value can be both a string and an object, so the brand exists only in the type system. At the same time, if a function exists with signature `getWidgetOrder(id1: WidgetID, id2: OrderID): void`, then that function’s params cannot be mixed up. In Rust this could be done with newtypes, e.g. `struct WidgetID(UUID)`. Similar to the TS example, this is a zero-cost abstraction. In this way, type-driven development can be used to statically guarantee the correct use of a function which would otherwise need to be documented.
- Another example of self-documenting code is test-driven development, in which the tests themselves constitute a form of documentation. Tests-as-documentation should be kept in mind when writing test names and failure messages, so that failing tests don’t just indicate that the code is broken, but also, to the extent possible, how it is broken and how to fix it. Related to this are the messages given by the Rust compiler, which are extremely descriptive and often helpful in fixing the problem, giving rise to the phrase “compiler-driven development”. Contrast this with many other languages, in which compiler errors are often less than helpful.
- All of this is to say that documentation exists to explain what can’t be explained in code. By selecting tooling and adopting practices that reduce how much the code needs to be explained, we still serve the main purpose of documentation: making it easier to use and reason about software. We also ensure that the documentation we do write is less cluttered.
- Find a way to record discussions about and solutions to common bugs. While the obvious better solution to bugs is to eliminate their occurrence, individually and as a class, this is not always possible, or it may be that the required time would be prohibitive. In that case, it is desirable to ensure that conversations are available and easily searchable by the team. That is, conversations about bugs should not occur as DMs between two developers, since others will not be aware of and cannot benefit in the future from this conversation.
- Break down documentation into README.md, FAQs, ARCHITECTURE.md, CONTRIBUTING.md files, etc. Something should go in the README only if everyone should read it.
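To make the earlier points about Props types and branded types more concrete, here is a small TypeScript sketch. The helper names `toWidgetID` and `toOrderID`, the `WidgetOrderProps` type, and the `describeWidgetOrder` function are hypothetical and exist only for illustration; this is one possible way to apply the technique, not a prescribed pattern.

```typescript
// Branded ID types: the brand exists only in the type system.
type WidgetID = string & { readonly __type: "widget" };
type OrderID = string & { readonly __type: "order" };

// Hypothetical constructors; a real project might validate the format here.
const toWidgetID = (raw: string): WidgetID => raw as WidgetID;
const toOrderID = (raw: string): OrderID => raw as OrderID;

// A Props type with descriptive field names takes the place of a comment
// describing which arguments the function expects.
type WidgetOrderProps = {
  widgetId: WidgetID;
  orderId: OrderID;
};

function describeWidgetOrder({ widgetId, orderId }: WidgetOrderProps): string {
  return `widget ${widgetId} in order ${orderId}`;
}

const widget = toWidgetID("1234");
const order = toOrderID("5678");

console.log(describeWidgetOrder({ widgetId: widget, orderId: order }));
// prints "widget 1234 in order 5678"

// Swapping the two IDs is rejected by the compiler, so this line is left commented out:
// describeWidgetOrder({ widgetId: order, orderId: widget });

// At runtime the brand is gone and the value is a plain string.
console.log(typeof widget); // prints "string"
```

The function signature now states what a comment would otherwise have to say, and the compiler enforces it on every call.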