Choosing the Structure of a Python Package
September 30, 2021
Overview
Structuring a Python package correctly helps ensure that your code works as intended and is easy to maintain. There are two common directory layouts for Python packages: the flat layout and the src layout. The layout you choose affects how your package is installed and imported during development.
Package Installations
In Python, it is common practice to install dependencies inside a virtual environment. Virtual environments isolate package installations from the system Python and from each other, so each project can maintain its own set of dependencies.
When a virtual environment is created and activated, dependencies are installed into .venv/Lib/site-packages on Windows, or .venv/lib/pythonX.Y/site-packages on Linux and macOS (assuming you named the environment .venv). In addition to the modules themselves, an installer that installs a package from a distribution also creates a .dist-info directory in site-packages that records the package's metadata. On Windows, this looks as follows:
.
└── /.venv
    ├── /Include
    ├── /Lib
    │   └── /site-packages
    │       ├── /package
    │       └── /package-0.0.1.dist-info
    ├── /Scripts
    │   └── ...
    └── pyvenv.cfg
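To check where the active interpreter installs packages, and which version a .dist-info directory records, you can query the standard library directly. The sketch below assumes Python 3.8 or later (for importlib.metadata); the helper names are illustrative and not part of any library.

```python
import sys
import sysconfig
from importlib import metadata
from typing import Optional


def in_virtual_environment() -> bool:
    """True inside a venv: its prefix differs from the base installation's."""
    return sys.prefix != sys.base_prefix


def site_packages_dir() -> str:
    """Directory where pure-Python packages are installed for this interpreter."""
    return sysconfig.get_paths()["purelib"]


def installed_version(distribution: str) -> Optional[str]:
    """Version recorded in the distribution's .dist-info metadata, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("inside a virtual environment:", in_virtual_environment())
print("site-packages:", site_packages_dir())
print("pip version:", installed_version("pip"))
```

Run from an activated .venv, the site-packages line should point into the environment's own directory rather than the system installation.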
Package Imports
Once a package has been installed, you will want to import it and use it in your code. When the Python interpreter executes an import statement, it searches for the module in the directories listed in sys.path, which is assembled from the following sources:
- the directory of the script being run, or the current (root) directory when Python is started interactively, with -c, or with -m
- the directories listed in your PYTHONPATH environment variable
- installation-dependent default directories configured when Python is installed, including the active environment's site-packages
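You can inspect the list the interpreter actually searches; its exact entries depend on how Python was started and on your installation. The function name below is illustrative.

```python
import sys
from typing import List


def describe_search_path() -> List[str]:
    """Describe each sys.path entry in the order imports will search them."""
    lines = []
    for position, entry in enumerate(sys.path):
        # An empty string stands for the current working directory.
        label = entry if entry else "<current working directory>"
        lines.append(f"{position}: {label}")
    return lines


print("\n".join(describe_search_path()))
```

Imports walk this list front to back and use the first directory that contains a matching module or package, which is why the order of the entries matters in the layouts below.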
With this high-level understanding of how package installations and imports work, we will now look at the differences between the flat and src layouts.
The Flat Layout
A minimal example of a package using the flat layout:
.
├── /package
│   ├── __init__.py
│   └── ...
├── /tests
│   ├── __init__.py
│   └── ...
└── pyproject.toml
The main problem with the flat layout is that the current (root) directory is implicitly included in sys.path, and because the package directory sits directly inside it, the package itself is importable from there. That entry comes before site-packages, so when imports are resolved, the Python interpreter imports the package from the current (root) directory and not from site-packages inside the virtual environment.
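One way to see which copy of a package Python will pick up is to ask the import system where it would load it from, without importing it. In the sketch below, import_origin and is_from_site_packages are hypothetical helpers built on the standard importlib.util.find_spec, and "package" is the placeholder name from the trees in this article. The site-packages check looks for that literal path component, which fits virtual environments but not, for example, Debian's dist-packages.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def import_origin(module_name: str) -> Optional[Path]:
    """Return the file a top-level module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None  # not importable, or a namespace package with no single file
    return Path(spec.origin)


def is_from_site_packages(module_name: str) -> bool:
    """True if the module resolves to a file inside a site-packages directory."""
    origin = import_origin(module_name)
    return origin is not None and "site-packages" in origin.parts


# Replace "package" with your own package name.
print(import_origin("package"))
print(is_from_site_packages("package"))
```

Run from the root of a flat-layout project, the origin points into the project directory even when the same package is also installed in the virtual environment.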
This makes your development environment diverge from your users', for example:
- local changes to your package may work when imported from the project directory but break once the package is built and installed
- local tests run against the package in the current (root) directory and not against the installed package in site-packages
  - this prevents you from testing the code that people will actually use (i.e. the installed package) and whether the built distribution works at all
Furthermore, the flat layout does not separate code from non-code files. Because the project root is on sys.path, top-level configuration and test modules are importable alongside your package, and a loosely configured build can end up including them in your distribution.
In summary, your users will never have the same working directory as you, so you should develop and test against the installed package that they will install, not the package in your current (root) directory.
The Src Layout
A minimal example of a package using the src layout:
.
├── /src
│   └── /package
│       ├── __init__.py
│       └── ...
├── /tests
│   ├── __init__.py
│   └── ...
└── pyproject.toml
The main benefit of the src layout is that your package is separated from the current (root) directory. The root may still be on sys.path, but the package now sits one level down in src, so it is not implicitly importable from there. Thus, when developing locally, you are forced to be explicit about how your package is installed and used.
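The difference between the two layouts can be demonstrated with a throwaway project. This sketch builds a minimal flat or src project in a temporary directory and tries to import it from the project root in a fresh interpreter. It assumes that no package named demo_pkg (a hypothetical name) is already installed and that Python's safe-path mode (PYTHONSAFEPATH or -P, Python 3.11+) is off.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def importable_from_root(layout: str, package: str = "demo_pkg") -> bool:
    """Create a throwaway project using `layout` ("flat" or "src") and report
    whether `import <package>` succeeds when Python runs from the project root."""
    if layout not in ("flat", "src"):
        raise ValueError(f"unknown layout: {layout!r}")
    with tempfile.TemporaryDirectory() as root:
        # flat: <root>/<package>/    src: <root>/src/<package>/
        parent = Path(root) / "src" if layout == "src" else Path(root)
        package_dir = parent / package
        package_dir.mkdir(parents=True)
        (package_dir / "__init__.py").write_text("")
        # With -c, sys.path[0] is the current working directory, i.e. the project root.
        result = subprocess.run(
            [sys.executable, "-c", f"import {package}"],
            cwd=root,
            capture_output=True,
        )
        return result.returncode == 0


print("flat layout importable from root:", importable_from_root("flat"))  # True
print("src layout importable from root:", importable_from_root("src"))  # False
```

The src result is the "error" the src layout deliberately produces: nothing can be imported until you make the package available explicitly.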
Benefits associated with using the src layout:
- makes navigating the repository intuitive, since all package logic is contained within the src directory
- ensures you are explicit about making your package available on sys.path
- import parity
  - the package you develop and test against is the same package that users will install
- separates code and non-code files
Troubleshooting the Src Layout
When using the src layout, you may see errors saying that your package or its modules cannot be found (for example, ModuleNotFoundError). This is because your package is no longer located in the current (root) directory and is therefore no longer implicitly importable via sys.path.
Therefore, you must make your package importable explicitly, either by installing it locally in editable mode or by adding the src directory (the directory that contains your package, not the package directory itself) to the PYTHONPATH environment variable.
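If you launch tools from Python, the PYTHONPATH approach can be scripted. This is a minimal sketch: env_with_src_on_pythonpath is a hypothetical helper that copies an environment and prepends the project's src directory, so the local source takes precedence over any existing PYTHONPATH entries.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def env_with_src_on_pythonpath(
    project_root: Path, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Copy an environment and prepend <project_root>/src to its PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    # Add the directory that CONTAINS the package, not the package directory itself.
    src_dir = str(project_root / "src")
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = os.pathsep.join([src_dir, existing]) if existing else src_dir
    return env
```

For example, assuming pytest is installed, subprocess.run([sys.executable, "-m", "pytest"], env=env_with_src_on_pythonpath(Path.cwd())) run from the project root would test against the code in src. Unlike an editable install, this affects only the processes that receive the modified environment.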
The directory name src should never appear in the import statements within your package (i.e. don't import a module as src.package.module; import it as package.module instead).
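A rule like this is easy to check mechanically. The sketch below uses the standard ast module to list absolute imports that go through a top-level src package; src_imports is a hypothetical helper, not an existing linter rule, and relative imports such as from . import module are ignored.

```python
import ast
from typing import List


def src_imports(source: str) -> List[str]:
    """List absolute imports in `source` that go through a top-level `src` package."""
    offenders: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an import, or a relative import such as `from . import x`
        offenders.extend(name for name in names if name == "src" or name.startswith("src."))
    return offenders


example = """
from src.package.module import helper   # wrong: depends on the repository layout
from package.module import helper       # right: the installed package name
"""
print(src_imports(example))  # ['src.package.module']
```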
Editable Mode
It is common to install your package locally while working on it, so that it is both installed and editable. To install your package in editable mode, run the following from the project root directory (where pyproject.toml lives):
python -m pip install -e .
When the package is installed in editable mode, the build backend typically places a path configuration (.pth) file in site-packages whose single line is the path to the directory containing your package (here, src); think of this as a symlink. Some backends use an import hook instead of a plain path entry. Because site-packages is already on Python's path, the site module reads this .pth file at startup and appends the listed directory to sys.path.
Because the package installed in your virtual environment points back to the source you are developing, changes to modules in your local package take effect the next time Python imports them, without reinstalling. Changes to packaging metadata in pyproject.toml, such as dependencies or entry points, still require reinstalling.
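The .pth mechanism can be reproduced with the standard site module: site.addsitedir adds a directory to sys.path and processes the .pth files in it, the same way Python treats site-packages at startup. The sketch below only imitates what an editable install produces; it does not call pip, and the temporary "site-packages" directory and the demo_pkg package are hypothetical stand-ins.

```python
import importlib
import site
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # A hypothetical src-layout project: <tmp>/project/src/demo_pkg/__init__.py
    src_dir = tmp_path / "project" / "src"
    (src_dir / "demo_pkg").mkdir(parents=True)
    (src_dir / "demo_pkg" / "__init__.py").write_text("VERSION = '0.0.1'\n")

    # A stand-in for site-packages holding a .pth file whose single line
    # is the directory that contains the package.
    fake_site_packages = tmp_path / "site-packages"
    fake_site_packages.mkdir()
    (fake_site_packages / "demo_pkg.pth").write_text(f"{src_dir}\n")

    # addsitedir appends the directory to sys.path and processes its .pth files,
    # which appends src_dir to sys.path as well.
    site.addsitedir(str(fake_site_packages))
    importlib.invalidate_caches()

    demo_pkg = importlib.import_module("demo_pkg")
    print(demo_pkg.VERSION)  # 0.0.1
    print(demo_pkg.__file__)  # a path inside project/src, not inside site-packages
```

Because sys.path points at the source directory itself, the module Python loads is the file you edit, which is what makes local changes visible without reinstalling.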
The Python Packaging Authority
For reference, the Python Packaging Authority (PyPA) recommends the following:
- single-module packages (for example, scripts) should use the flat layout
- anything beyond a single module should use the src layout