Natural language linters are command line tools that analyze written human language and check whether it complies with standard or custom writing rules. This blog post explains why natural language linters are useful and introduces two state-of-the-art natural language linting tools.
Introduction to linters
If you’re a programmer, you are most likely used to static code analysis tools (also known as linters), which automatically check some qualitative aspects of your code.
Here’s an example of a linting error for a disallowed word:
Code linters are similar to the spelling and grammar checkers in text processing software (as in Microsoft Word, below):
The two kinds of language checkers differ in two ways:
- The code linter analyzes formal language, such as a programming language, whereas the Microsoft Word feature analyzes human (natural) language.
- Most code linters use Unix-style open source technology that is plugged into an IDE or text editor. The analysis feature of Microsoft Word is proprietary and tied to Word and other Microsoft products.
A natural language linter has the technical properties of a code linter, but lints human language instead of program source code. Below you see a natural language linter that checks a semi-fictional job description for sensationalistic language.
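To make the idea concrete, here is a minimal sketch of such a linter in Python. The rule set and the sample job-ad text are made up for illustration; real tools like Vale are far more capable, but the core loop — scan each line, flag rule violations, report them in `file:line:column` format — looks roughly like this:

```python
import re

# Hypothetical rule set: "sensationalistic" words to flag, with suggested
# replacements (these tokens are invented for this example).
DISALLOWED = {
    "rockstar": "experienced developer",
    "ninja": "expert",
    "guru": "specialist",
}

def lint(text: str, filename: str = "<stdin>") -> list[str]:
    """Return linter-style messages (file:line:col) for each disallowed word."""
    messages = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word, suggestion in DISALLOWED.items():
            for match in re.finditer(rf"\b{re.escape(word)}\b", line, re.IGNORECASE):
                messages.append(
                    f"{filename}:{line_no}:{match.start() + 1} "
                    f"error: avoid '{match.group()}', consider '{suggestion}'"
                )
    return messages

if __name__ == "__main__":
    sample = "We are hiring a JavaScript rockstar!\nBecome our testing ninja."
    for msg in lint(sample, "job-ad.md"):
        print(msg)
```

Because the output follows the conventional `file:line:column` shape, an editor or CI system can jump straight to each offending word — exactly what makes code linters so convenient.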
You might ask yourself why people need tools like natural language linters if basic language-checking functionality comes with most text processing apps. The answers are: configurability and testability.
Natural language linters don’t only allow you to check text documents for compliance with a predefined set of rules; you can also define rules yourself. The snippet below is an (incomplete) excerpt of a configuration that makes the Vale linter check whether a text complies with E-Prime, a subset of the English language that doesn’t use any form of *to be*:
```yaml
extends: existence
message: "As a form of 'to be', '%s' doesn't comply with E-Prime."
ignorecase: true
level: error
tokens:
  - be
  - being
  - been
  - am
  - is
  - isn't
  - are
  # more tokens here…
```
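In Vale, a rule file like the one above lives in a style directory, which you then enable in a `.vale.ini` at the root of your project. The sketch below assumes the rule is stored under `styles/EPrime/`; the paths and style name are placeholders:

```ini
# Minimal .vale.ini sketch (StylesPath and style name are placeholders)
StylesPath = styles
MinAlertLevel = warning

[*.md]
BasedOnStyles = EPrime
```

With this in place, running `vale` on any Markdown file in the project applies every rule in the `EPrime` style.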
For example, you can use custom rules to ensure that:
- you removed all occurrences of outdated product names,
- domain-specific terms are spelled correctly,
- you use language consistently (*click the button* vs. *click on the button*).
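The consistency example above can be expressed with Vale's `substitution` extension point, which flags one phrase and suggests another. The file path and rule name below are hypothetical:

```yaml
# styles/MyStyle/ButtonWording.yml — hypothetical rule name
extends: substitution
message: "Use '%s' instead of '%s'."
level: warning
ignorecase: true
swap:
  click on the button: click the button
```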
Most linters run on any popular desktop or server operating system and provide machine-readable output. This makes them great tools for automated testing. Setting up a system that continuously checks your Markdown-based website content against your configured linter rules only requires standard test automation skills. Still, a detailed explanation of how to do this might be worth its own blog post ;-)
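As a small taste of what such a test setup could look like: Vale can emit its results as JSON (via its `--output=JSON` flag), which a CI script can parse to decide whether the build should fail. The sample below is a sketch; the exact field names in Vale's JSON output are an assumption here, so verify them against your Vale version:

```python
import json

# Sample output in the shape Vale produces with --output=JSON: a map from
# file names to lists of alerts. (Field names are assumptions based on
# Vale's JSON output; check them against your Vale version.)
SAMPLE_OUTPUT = """
{
  "docs/intro.md": [
    {"Line": 3, "Severity": "error", "Check": "EPrime.Existence",
     "Message": "As a form of 'to be', 'is' doesn't comply with E-Prime."},
    {"Line": 7, "Severity": "warning", "Check": "MyStyle.ButtonWording",
     "Message": "Use 'click the button' instead of 'click on the button'."}
  ]
}
"""

def count_errors(vale_json: str) -> int:
    """Count alerts with severity 'error' across all linted files."""
    alerts = json.loads(vale_json)
    return sum(
        1
        for file_alerts in alerts.values()
        for alert in file_alerts
        if alert["Severity"] == "error"
    )

if __name__ == "__main__":
    errors = count_errors(SAMPLE_OUTPUT)
    print(f"{errors} error(s) found")
    # A real CI script would now exit non-zero when errors > 0,
    # causing the pipeline to fail the build.
```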
While natural language linters are still a niche phenomenon, a couple of them have gained some traction in recent years. I took a closer look at two of them:
Vale is (at the time of writing) a relatively new tool and has recently gained some supporters in the technical writing community, most notably for its easy-to-use YAML-based configuration files (as shown above). While Vale isn't attracting many contributors (the average techie seems to shy away from writing Go), its core contributor is constantly improving the tool and even makes use of advanced natural language processing techniques such as sentence breaking. This gives me hope that Vale will evolve into a powerful tool for automated content quality checks.
Conclusion: bleeding edge technology for automated content quality checks
Natural language linters allow you to configure custom formalized writing rules for text documents and set up automated tests to ensure the documents actually comply with these rules. Natural language linters are still limited in their ability to conduct advanced text analysis. Nevertheless, they are already useful for enforcing basic writing rules in document sets like web pages generated with static site generators or manuals that are maintained with a docs-as-code approach.