Choosing the Structure of a Python Package

September 30, 2021

Overview

Correctly structuring a Python package is an important way to ensure that your code functions as intended and is easy to maintain. There are two common directory layouts for Python packages: the flat layout and the src layout. The choice of layout affects how packages are installed and imported during development.

Package Installations

In Python, it is common practice to install dependencies inside a virtual environment. Virtual environments isolate package installations from the system Python and from other virtual environments, so each project can maintain its own set of dependencies.
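As a minimal sketch, the standard-library venv module can create the same kind of environment that `python -m venv .venv` creates on the command line. The helper names `create_environment` and `demo` are hypothetical, and pip is deliberately not bootstrapped (`with_pip=False`) so the example stays fast and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir: Path) -> Path:
    """Create a virtual environment at env_dir and return the path of its pyvenv.cfg."""
    # Roughly what `python -m venv .venv` does; with_pip=False skips installing pip,
    # which a real project environment would normally want.
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


def demo() -> bool:
    """Build a throwaway environment and check that it looks like a venv."""
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_environment(Path(tmp) / ".venv")
        # pyvenv.cfg marks the directory as a virtual environment;
        # its "home" key points at the base interpreter it was created from.
        return cfg.is_file() and "home" in cfg.read_text()
```

For a real project you would run the command-line form once in the project root and then activate the environment before installing anything.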

When a virtual environment is created and activated, dependencies are installed into its site-packages directory: .venv/Lib/site-packages on Windows, or .venv/lib/pythonX.Y/site-packages on Linux and macOS (assuming you named the environment .venv). Furthermore, when a package is installed from a distribution, the installer places a .dist-info directory containing the package's metadata in site-packages alongside the package's modules. On Windows, this looks as follows:

.
└── /.venv
    ├── /Include
    ├── /Lib
    │   └── /site-packages
    │       ├── /package
    │       └── /package-0.0.1.dist-info
    ├── /Scripts
    │   └── ...
    └── pyvenv.cfg
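To see where packages land for the interpreter you are actually running, the standard library can report the site-packages directory directly, which avoids hard-coding the Windows or POSIX path. This is a minimal sketch: the helper names are hypothetical, and it assumes pure-Python packages (the "purelib" install location).

```python
import sys
import sysconfig
from pathlib import Path


def in_virtual_environment() -> bool:
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def site_packages_dir() -> Path:
    # "purelib" is where pure-Python distributions are installed for this interpreter.
    return Path(sysconfig.get_paths()["purelib"])


def installed_dist_info(site_packages: Path) -> list[str]:
    # Each installed distribution leaves a <name>-<version>.dist-info metadata directory.
    return sorted(p.name for p in site_packages.glob("*.dist-info"))


if __name__ == "__main__":
    sp = site_packages_dir()
    print("virtual environment:", in_virtual_environment())
    print("site-packages:", sp)
    for name in installed_dist_info(sp):
        print("  ", name)
```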

Package Imports

Once a package has been installed, you will want to import it and use it in your code. When the Python interpreter executes an import statement, it searches for the module in the list of directories stored in sys.path, which is assembled from the following sources:

  • the directory of the script being run, or the current (root) directory when Python is started with -m, -c, or interactively
  • directories listed in your PYTHONPATH environment variable
  • installation-dependent default directories configured when Python is installed, including site-packages
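The resulting search order can be inspected directly. A minimal sketch (the helper names are hypothetical) that lists sys.path in the order the import system searches it, and separately lists the entries that came from PYTHONPATH:

```python
import os
import sys


def pythonpath_entries() -> list[str]:
    """Return the non-empty directories listed in PYTHONPATH, in order."""
    raw = os.environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def search_path() -> list[str]:
    """Return sys.path, showing the empty-string entry explicitly."""
    # With -c or the interactive interpreter, sys.path[0] is "" (the current directory).
    return [entry if entry else "<current directory>" for entry in sys.path]


if __name__ == "__main__":
    for index, entry in enumerate(search_path()):
        print(index, entry)
    print("from PYTHONPATH:", pythonpath_entries())
```

Running this from a script, with `python -c`, and with `python -m` from different directories shows how the first entry changes while the site-packages entries stay the same.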

With this high-level understanding of how package installations and imports work, we will now look at the differences between the flat and src layouts.

The Flat Layout

A minimal example of a package using the flat layout:

.
├── /package
│   ├── __init__.py
│   └── ...
├── /tests
│   ├── __init__.py
│   └── ...
└── pyproject.toml

The main problem with the flat layout is that the current (root) directory is often implicitly included in sys.path (for example, when running python -m pytest or an interactive session from the project root), and therefore so is the package itself. Because sys.path is searched in order and this directory comes first, the Python interpreter imports the package from the current (root) directory rather than from site-packages inside the virtual environment.

This makes development inconsistent and hard to reason about, for example:

  • local changes to your package may appear to work, but may not be included or behave as expected once your package is built and installed
  • local tests run against the package in the current (root) directory, not the installed package in site-packages
    • this prevents testing the code that people will actually use (i.e., the installed package) and whether the built distribution works at all
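A quick way to check which copy of a package the interpreter actually resolves is to ask importlib where the module would be loaded from. This is a minimal sketch: `import_origin` and `imported_from_site_packages` are hypothetical helper names, and `package` is the placeholder name from the directory trees, to be replaced with your own.

```python
import importlib.util
import sysconfig
from pathlib import Path
from typing import Optional


def import_origin(module_name: str) -> Optional[Path]:
    """Return the file a module would be loaded from, or None if it has no file."""
    spec = importlib.util.find_spec(module_name)
    # Missing, built-in, frozen, and namespace modules have no single file on disk.
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def imported_from_site_packages(module_name: str) -> bool:
    """True if the module resolves to a file inside this interpreter's site-packages."""
    origin = import_origin(module_name)
    if origin is None:
        return False
    site_packages = Path(sysconfig.get_paths()["purelib"]).resolve()
    return site_packages in origin.parents


if __name__ == "__main__":
    name = "package"  # placeholder from the trees above; use your own package name
    print(name, "->", import_origin(name))
    print("from site-packages:", imported_from_site_packages(name))
```

Run from the root of a flat-layout project, the origin typically points at ./package/__init__.py even when the package is also installed; run from another directory, it points into site-packages. With an editable install the origin points into your source tree, which is expected.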

Furthermore, the flat layout has no separation between code and non-code files. This can pollute your package installation by making configuration or test files importable.

In summary, users of your package will never have the same working directory as you, so development and testing should target the installed package that users would install, not the package in your current (root) directory.

The Src Layout

A minimal example of a package using the src layout:

.
├── /src
│   └── /package
│       ├── __init__.py
│       └── ...
├── /tests
│   ├── __init__.py
│   └── ...
└── pyproject.toml

The main benefit of the src layout is that your package is separated from the current (root) directory and therefore is not implicitly included in sys.path. Thus, when developing locally, you are forced to be explicit about how your package is installed and used.

Benefits associated with using the src layout:

  • navigating the repository is intuitive, since all package logic is contained within the src directory
  • you must be explicit about making your package available on sys.path
  • import parity
    • the package you develop and test against is the same package that users will install
  • separates code and non-code files

Troubleshooting the Src Layout

When using the src layout, you may see errors (such as ModuleNotFoundError) saying that modules within your package don't exist or are not importable. This is because your package is no longer located in the current (root) directory and is therefore no longer implicitly added to sys.path.

Therefore, you must explicitly make your package available on sys.path, either by installing it locally in editable mode or by adding the src directory (the directory that contains your package) to the PYTHONPATH environment variable.
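The following self-contained sketch reproduces the problem and the fix in miniature. It builds a throwaway src-layout project in a temporary directory (the package name `demo_src_layout_pkg` and the helper names are hypothetical), then checks whether the package is importable with only the project root on sys.path, and then with the src directory on sys.path, which is what adding src to PYTHONPATH or an editable install achieves.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE = "demo_src_layout_pkg"  # hypothetical package name for this demo


def make_src_project(root: Path) -> Path:
    """Create root/src/PACKAGE/__init__.py and return the src directory."""
    package_dir = root / "src" / PACKAGE
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("VALUE = 42\n")
    return root / "src"


def importable_with(entry: Path) -> bool:
    """Temporarily put one directory at the front of sys.path and try the import."""
    sys.path.insert(0, str(entry))
    importlib.invalidate_caches()
    try:
        importlib.import_module(PACKAGE)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        # Undo the change so the check leaves no trace behind.
        sys.path.remove(str(entry))
        sys.modules.pop(PACKAGE, None)


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src_dir = make_src_project(root)
        # Only the project root on sys.path (what a flat layout relies on): not found.
        from_root = importable_with(root)
        # The src directory on sys.path: found.
        from_src = importable_with(src_dir)
    return from_root, from_src
```

Calling `demo()` returns `(False, True)`: having the project root on sys.path is no longer enough, and the package only becomes importable once the src directory itself is on the path.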

The directory name src should never appear in import statements within your package (i.e., don't import a module as src.package.module).

Editable Mode

It is common to locally install your package when working on it, allowing the package to be both installed and editable. To install your package in editable mode (from within the project root directory):

python -m pip install -e .

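When the package is installed in editable mode, the build backend typically installs a path configuration (.pth) file into site-packages instead of copying your modules. For a src layout, this file usually contains a single line: the path to your src directory (think of it as a link back to your source tree). Because the site module processes .pth files in site-packages at interpreter startup, that path is appended to sys.path at runtime. Some build backends use an import hook instead of a plain path, but the effect is similar.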
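The .pth mechanism can be reproduced without installing anything: `site.addsitedir()` adds a directory to sys.path and processes the .pth files inside it, as the site module does for site-packages at startup. In this minimal sketch the fake site-packages directory, the project path, and the file name `demo_pkg.pth` are hypothetical stand-ins; real build backends choose their own file names and contents.

```python
import site
import sys
import tempfile
from pathlib import Path


def pth_adds_path() -> bool:
    """Simulate an editable install's .pth file and report whether its path reaches sys.path."""
    saved = list(sys.path)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            fake_site_packages = Path(tmp) / "site-packages"
            fake_src = Path(tmp) / "project" / "src"
            fake_site_packages.mkdir()
            fake_src.mkdir(parents=True)
            # One line per path: here, the (hypothetical) project's src directory.
            (fake_site_packages / "demo_pkg.pth").write_text(f"{fake_src}\n")
            # addsitedir() appends existing paths listed in the directory's .pth files.
            site.addsitedir(str(fake_site_packages))
            return str(fake_src) in sys.path
    finally:
        # Restore sys.path so the simulation has no lasting effect.
        sys.path[:] = saved
```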

Because the package installed in your virtual environment points back to the package you are developing locally, changes to modules inside your local package are picked up the next time they are imported, without reinstalling. Changes to packaging metadata in pyproject.toml, such as dependencies or entry points, still require reinstalling.

The Python Packaging Authority

For a more objective view, the Python Packaging Authority (PyPA) recommends the following:

  • single-module packages (for example, scripts) should use the flat layout
  • anything beyond a single module should use the src layout