Improving Code Quality in Python Codebases
A living guide to improving code quality in Python codebases
NOTE: This is a living document.
Last Updated: March 03, 2021
Code formatting
Black
Black is an opinionated Python code formatter. Sit back and let Black do its thing. It’s also nice that…
Black makes code review faster by producing the smallest diffs possible (Black).
Black has extensive code editor and IDE support. See Black Editor Integration for how to integrate Black into your preferred editor or IDE.
Why use an opinionated code formatter?
- Code uniformity across your codebase
- No time wasted bikeshedding. Code formatting is no longer the programmer’s responsibility.
- Set it and forget it. Add the code formatter to your editor/IDE and/or put it in a pre-commit hook and forget about it.
Code linting
Pylint
Pylint is a Python static code analysis tool which looks for programming errors, helps enforcing a coding standard, sniffs for code smells and offers simple refactoring suggestions (Pylint).
Pylint is a highly configurable tool, but you can get a lot of its benefits without ever configuring it.
Pylint has extensive code editor and IDE support. See Pylint Editor and IDE Integration for how to integrate Pylint into your preferred editor or IDE.
Why lint?
- Ship code that conforms to a standard and adheres to best practices
- Catch potential bugs
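A mutable default argument is a small example of the kind of bug Pylint catches. Pylint reports it as dangerous-default-value (W0102): the default list is created once, when the function is defined, and is then shared by every call that omits the argument. The function names below are hypothetical. The sketch shows the flagged pattern and the usual fix.

```python
from typing import List, Optional


def append_item_buggy(item: str, items: List[str] = []) -> List[str]:
    # Pylint flags this signature: the default list is shared by every
    # call that omits the items argument.
    items.append(item)
    return items


def append_item(item: str, items: Optional[List[str]] = None) -> List[str]:
    # Usual fix: use None as the sentinel and create a fresh list per call.
    if items is None:
        items = []
    items.append(item)
    return items


print(append_item_buggy("a"), append_item_buggy("b"))  # ['a', 'b'] ['a', 'b']
print(append_item("a"), append_item("b"))  # ['a'] ['b']
```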
Type checking
Why use types?
- Types help programmers write more robust code (e.g. fewer NoneType errors at runtime).
- Types provide programmers with very useful information to help them interpret code more quickly and accurately.
- The type checker plays the role of a refactoring companion, minimizing the risk of introducing regressions due to refactors that involve changing the types (implicit or explicit) of functions and variables.
- Types increase the visibility of domain objects and their structures through the use of type aliases and new types.
- Type hinting is part of the Python language’s syntax in Python 3.6+ as opposed to an external add-on.
- Any Python class can be used as a type. None is a special case: it is a value, but in a type hint it stands for its own type, NoneType.
- Using types eliminates a whole class of possible runtime bugs through type checking and reduces the need for tests that essentially check the types of function and method inputs and outputs.
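The point about type aliases and new types can be sketched with the typing module. A type alias is just another name for an existing type, while typing.NewType creates a distinct type that the type checker will not confuse with its base type (at runtime it simply returns its argument). The domain names Scores and UserId are hypothetical.

```python
from typing import Dict, List, NewType

# Type alias: Scores is interchangeable with Dict[str, List[int]].
Scores = Dict[str, List[int]]

# NewType: UserId is an int at runtime, but a distinct type to the checker.
UserId = NewType("UserId", int)


def best_score(scores: Scores, player: str) -> int:
    return max(scores[player])


def lookup_name(user_id: UserId) -> str:
    names: Dict[UserId, str] = {UserId(1): "ada", UserId(2): "grace"}
    return names[user_id]


scores: Scores = {"ada": [3, 9, 4]}
print(best_score(scores, "ada"))  # 9
print(lookup_name(UserId(2)))  # grace
# lookup_name(2) would run, but MyPy reports an incompatible argument type,
# because a plain int is not a UserId.
```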
How to adopt types in an existing untyped codebase and stay sane
Introducing types and a type checker into an existing codebase can get very noisy and prove more difficult than necessary if not done intentionally. The last thing you want is for programmers to spend more time than necessary fighting with the type checker. Here are some recommendations for leveraging the gradual typing capability of Python type checkers like MyPy.
- Isolate the use of types and the type checker initially. Find a part of your codebase that is already self-contained and start there (e.g. a standalone cloud function such as an AWS Lambda).
- Type hint and check new code, and back-type old code gradually. Type checkers like MyPy support configuration options that silence the noise from untyped code.
- Don’t type hint everything. Type checkers like MyPy also do type inference, so you only need to give them enough type hints to do their job. The rule of thumb is to type hint function/method definitions, and to annotate variables only when the type checker asks you to.
- Disable type checking for third-party libraries that don’t have type stubs. The type checker will let you know which ones those are.
- Add a type checking step to your CI/CD pipeline. If you don’t, you haven’t really committed.
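The recommendations about gradual typing and not annotating everything can be sketched as follows: new, typed code calls into old, untyped code, and only the new function's signature carries annotations. Both functions are hypothetical.

```python
from typing import List, Tuple


def legacy_parse(raw):
    # Hypothetical untyped legacy code. By default MyPy does not type check
    # the body of an unannotated function and treats its return value as Any.
    return [float(part) for part in raw.split(",")]


def mean_and_count(values: List[float]) -> Tuple[float, int]:
    # New code: only the signature is annotated. MyPy infers from the
    # initial values that total is a float and count is an int.
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    mean = total / count if count else 0.0
    return mean, count


print(mean_and_count(legacy_parse("1,2,4.5")))  # (2.5, 3)
```

As more of the old code gets back-typed, legacy_parse would gain a signature such as (raw: str) -> List[float], and the type checker would start verifying its body and its callers.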
Choosing a type checker
There are multiple options, but MyPy is the most mature and is probably the easiest to start with.
Frequently used types
- Built-in types such as None, int, float, bool, str, etc.
- typing.List
- typing.Dict
- typing.TypedDict (Python 3.8+)
- typing.Optional: this is the workhorse for helping Python programmers eliminate errors caused by not handling the None type in a disciplined way.
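A short sketch of how these types fit together, assuming Python 3.8+ for typing.TypedDict. The Company structure and the find_company and headcount functions are hypothetical. The Optional return type forces callers to handle the None case: MyPy reports an error if you index the result without first checking it against None.

```python
from typing import Dict, List, Optional, TypedDict


class Company(TypedDict):
    # A TypedDict gives each key of a plain dict its own value type.
    name: str
    employee_count: int


def find_company(companies: List[Company], name: str) -> Optional[Company]:
    # Returns None when there is no match; the return type makes that explicit.
    for company in companies:
        if company["name"] == name:
            return company
    return None


def headcount(companies: List[Company], name: str) -> int:
    company = find_company(companies, name)
    # Without this check, MyPy would flag company["employee_count"],
    # because company may be None.
    if company is None:
        return 0
    return company["employee_count"]


companies: List[Company] = [{"name": "Acme", "employee_count": 12}]
counts: Dict[str, int] = {
    name: headcount(companies, name) for name in ["Acme", "Initech"]
}
print(counts)  # {'Acme': 12, 'Initech': 0}
```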
Escape hatches — because you’ll need them in an environment with gradual typing
- Use the Any type (but really try not to)
- Use the typing.cast function to help the type checker where necessary
from typing import cast

# Without an annotation, MyPy infers the type of this structure from the
# literal. Because the inner values mix str and int, it infers
# Dict[str, Dict[str, object]]. Annotating it as Dict[str, Dict[str, Any]]
# would silence the error below, but would also switch off type checking
# for every value in the structure.
max_values = {
    "companies": {"group_id": "", "company_count": 0},
    "employees": {"group_id": "", "employee_count": 0},
}

comp_count = 12  # hypothetical example counts
emp_count = 340

# Given the max_values nested dictionary, you may want to compare the counts
# before updating the max values, but the type checker will complain because
# you're trying to compare an int (comp_count and emp_count) with an object
# (the inferred type of the counts in the dictionary):
#     comp_count > max_values["companies"]["company_count"]  # MyPy error
#     emp_count > max_values["employees"]["employee_count"]  # MyPy error

# Since we, the programmer, know better, we can give the type checker
# some help.
if comp_count > cast(int, max_values["companies"]["company_count"]):
    max_values["companies"]["company_count"] = comp_count
if emp_count > cast(int, max_values["employees"]["employee_count"]):
    max_values["employees"]["employee_count"] = emp_count
- Configure the type checker using its config file. Here’s a minimal example of a mypy.ini file:
[mypy]
no_implicit_optional = True
check_untyped_defs = True

# Replace thirdpartylib with the name of an untyped third-party package
[mypy-thirdpartylib.*]
ignore_missing_imports = True
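To see what the no_implicit_optional option in the config above changes, consider a parameter that defaults to None. With the option enabled, MyPy rejects a plain int annotation for such a parameter and requires the Optional to be spelled out (newer MyPy releases make this the default). The greet function is hypothetical.

```python
from typing import Optional


# With no_implicit_optional = True, MyPy rejects this signature, because the
# default None is not an int:
#     def greet(name: str, times: int = None) -> str: ...
# Spell out the Optional instead and handle the None case explicitly.
def greet(name: str, times: Optional[int] = None) -> str:
    repeat = 1 if times is None else times
    return " ".join([f"Hello, {name}!"] * repeat)


print(greet("Ada"))  # Hello, Ada!
print(greet("Ada", 2))  # Hello, Ada! Hello, Ada!
```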
- Use in-line pragmas such as MyPy’s # type: ignore pragma to tell the type checker to ignore a particular line of code:
mypy_type_error('but the programmer knows better')  # type: ignore
Some not-so-obvious typing situations
Typing *args and **kwargs
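A minimal sketch of the usual approach: the annotation on *args or **kwargs describes the type of each individual argument, not of the whole collection. Inside the function, MyPy treats args as Tuple[int, ...] and kwargs as Dict[str, str]. The describe function is hypothetical.

```python
from typing import Dict, Tuple


def describe(*args: int, **kwargs: str) -> str:
    # Each positional argument must be an int, so args is Tuple[int, ...].
    numbers: Tuple[int, ...] = args
    # Each keyword argument value must be a str, so kwargs is Dict[str, str].
    labels: Dict[str, str] = kwargs
    parts = [str(sum(numbers))]
    parts.extend(f"{key}={value}" for key, value in sorted(labels.items()))
    return " ".join(parts)


print(describe(1, 2, 3, unit="items", source="db"))  # 6 source=db unit=items
# MyPy flags describe(1, "2") because "2" is not an int; at runtime that
# call would fail inside sum().
```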
Testing
Pytest
… makes it easy to write small tests, yet scales to support complex functional testing for applications and libraries (Pytest).
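As a minimal sketch: pytest collects functions whose names start with test from files named test_*.py or *_test.py, and plain assert statements are all you need, since pytest rewrites them to report the compared values on failure. The slugify function and the file name test_slugify.py are hypothetical; run the tests with the pytest command from the project root.

```python
# test_slugify.py (hypothetical file name)


def slugify(title: str) -> str:
    # Code under test: lower-case the title and join its words with hyphens.
    return "-".join(title.lower().split())


def test_slugify_joins_words_with_hyphens() -> None:
    assert slugify("Improving Code Quality") == "improving-code-quality"


def test_slugify_collapses_extra_whitespace() -> None:
    assert slugify("  Black   and  Pylint ") == "black-and-pylint"
```

In a real project, slugify would live in the application code and be imported into the test module; it is defined inline here only to keep the sketch self-contained.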