Create main developer guide for Python (#11235)
This PR adds a primary developer guide for Python. It provides a more complete and informative landing page for new developers. When #11217, #11199, and #11122 are merged, they will all be linked from this page to provide a complete set of developer documentation.

There is one main point of discussion that I would like reviewer comments on, and that is the section on directory and file organization. How do we want that aspect of cuDF to look?

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)
- Lawrence Mitchell (https://github.com/wence-)
- Ashwin Srinath (https://github.com/shwina)

URL: #11235
Showing 5 changed files with 140 additions and 11 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
# Contributing Guide | ||
|
||
This document focuses on a high-level overview of best practices in cuDF. | ||
|
||
## Directory structure and file naming | ||
|
||
cuDF generally presents the same importable modules and subpackages as pandas. | ||
All Cython code is contained in `python/cudf/cudf/_lib`. | ||
|
||
## Code style | ||
|
||
cuDF employs a number of linters to ensure consistent style across the code base. | ||
We manage our linters using [`pre-commit`](https://pre-commit.com/). | ||
Developers are strongly encouraged to set up `pre-commit` prior to any development. | ||
The `.pre-commit-config.yaml` file at the root of the repo is the primary source of truth for linting. | ||
Specifically, cuDF uses the following tools: | ||
|
||
- [`flake8`](https://github.com/pycqa/flake8) checks for general code formatting compliance. | ||
- [`black`](https://github.com/psf/black) is an automatic code formatter. | ||
- [`isort`](https://pycqa.github.io/isort/) ensures imports are sorted consistently. | ||
- [`mypy`](http://mypy-lang.org/) performs static type checking. | ||
In conjunction with [type hints](https://docs.python.org/3/library/typing.html), | ||
`mypy` can help catch various bugs that are otherwise difficult to find. | ||
- [`pydocstyle`](https://github.com/PyCQA/pydocstyle/) lints docstring style. | ||
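
As a rough illustration of the kind of bug that `mypy` and type hints can surface, consider the sketch below. The function `first_word` is hypothetical and not part of cuDF. Because the parameter is annotated as `Optional[str]`, `mypy` would flag a call to `text.split()` on a possibly-`None` value if the `None` guard were removed.

```python
from typing import Optional


def first_word(text: Optional[str]) -> str:
    """Return the first whitespace-separated word, or "" if there is none.

    Hypothetical helper used only to illustrate type checking.
    """
    # Without this guard, mypy reports that the value may be None
    # and therefore may not have a `split` attribute.
    if text is None:
        return ""
    words = text.split()
    return words[0] if words else ""
```

At runtime the unguarded version would only fail when `None` is actually passed, which is why static checking is useful here.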
|
||
Linter config data is stored in a number of files. | ||
We generally prefer `pyproject.toml` over `setup.cfg` and root-level files over project-specific ones (e.g. `setup.cfg` rather than `python/cudf/setup.cfg`). | ||
However, differences between tools and the different packages in the repo result in the following caveats: | ||
|
||
- `flake8` has no plans to support `pyproject.toml`, so its configuration must live in `setup.cfg`. | ||
- `isort` must be configured per project to set which project is the "first party" project. | ||
|
||
Additionally, our use of `versioneer` means that each project must have a `setup.cfg`. | ||
As a result, we currently maintain both root and project-level `pyproject.toml` and `setup.cfg` files. | ||
|
||
For more information on how to use pre-commit hooks, see the code formatting section of the | ||
[overall contributing guide](https://github.com/rapidsai/cudf/blob/main/CONTRIBUTING.md#python--pre-commit-hooks). | ||
|
||
## Deprecating and removing code | ||
|
||
cuDF follows the policy of deprecating code for one release prior to removal. | ||
For example, if we decide to remove an API during the 22.08 release cycle, | ||
it will be marked as deprecated in the 22.08 release and removed in the 22.10 release. | ||
All internal usage of deprecated APIs in cuDF should be removed when the API is deprecated. | ||
This prevents users from encountering unexpected deprecation warnings when using other (non-deprecated) APIs. | ||
The documentation for the API should also be updated to reflect its deprecation. | ||
When the time comes to remove a deprecated API, make sure to remove all of its associated tests and documentation. | ||
|
||
Deprecation messages should: | ||
- emit a FutureWarning; | ||
- consist of a single line with no newline characters; | ||
- indicate replacement APIs, if any exist | ||
(deprecation messages are an opportunity to show users better ways to do things); | ||
- not specify a version when removal will occur (this gives us more flexibility). | ||
|
||
For example: | ||
```python | ||
import warnings | ||
warnings.warn( | ||
    "`Series.foo` is deprecated and will be removed in a future version of cudf. " | ||
    "Use `Series.new_foo` instead.", | ||
    FutureWarning, | ||
) | ||
``` | ||
|
||
```{warning} | ||
Deprecations should be signaled using a `FutureWarning`, **not a `DeprecationWarning`**! | ||
`DeprecationWarning` is hidden by default except in code run in the `__main__` module. | ||
``` | ||
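
The guidelines above can be sketched end to end as follows. This is a minimal illustration, not cuDF code: `foo`, `new_foo`, and `check_foo_warns` are hypothetical names, and the use of `stacklevel=2` (so the warning points at the caller) is an assumption rather than a stated cuDF requirement. The check records warnings explicitly so that it does not depend on the interpreter's global warning filters.

```python
import warnings


def new_foo(values):
    """Hypothetical replacement API."""
    return sorted(values)


def foo(values):
    """Hypothetical deprecated API that forwards to its replacement."""
    warnings.warn(
        "`foo` is deprecated and will be removed in a future version of cudf. "
        "Use `new_foo` instead.",
        FutureWarning,
        stacklevel=2,  # assumption: attribute the warning to the caller
    )
    return new_foo(values)


def check_foo_warns():
    """Call the deprecated API and return its result plus recorded warnings."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning regardless of the active filters.
        warnings.simplefilter("always")
        result = foo([3, 1, 2])
    return result, caught
```

A test for a deprecated API could then assert that exactly one `FutureWarning` is recorded and that its message is a single line naming the replacement.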
|
||
## `pandas` compatibility | ||
|
||
Maintaining compatibility with the [pandas API](https://pandas.pydata.org/docs/reference/index.html) is a primary goal of cuDF. | ||
Developers should always look at pandas APIs when adding a new feature to cuDF. | ||
When introducing a new cuDF API with a pandas analog, we should match pandas as much as possible. | ||
Since we try to maintain compatibility even with various edge cases (such as null handling), | ||
new pandas releases sometimes require changes that break compatibility with old versions. | ||
As a result, our compatibility target is the latest pandas version. | ||
|
||
However, there are occasionally good reasons to deviate from pandas behavior. | ||
The most common reasons center around performance. | ||
Some APIs cannot match pandas behavior exactly without incurring exorbitant runtime costs. | ||
Others may require using additional memory, which is always at a premium in GPU workflows. | ||
If you are developing a feature and believe that perfect pandas compatibility is infeasible or undesirable, | ||
you should consult with other members of the team to assess how to proceed. | ||
|
||
When such a deviation from pandas behavior is necessary, it should be documented. | ||
For more information on how to do that, see [our documentation on pandas comparison](./documentation.md#comparing-to-pandas). | ||
|
||
## Python vs Cython | ||
|
||
cuDF makes substantial use of [Cython](https://cython.org/). | ||
Cython is a powerful tool, but it is less user-friendly than pure Python. | ||
It is also more difficult to debug or profile. | ||
Therefore, developers should generally prefer Python code over Cython where possible. | ||
|
||
The primary use-case for Cython in cuDF is to expose libcudf C++ APIs to Python. | ||
This Cython usage is generally composed of two parts: | ||
1. A `pxd` file declaring C++ APIs so that they may be used in Cython, and | ||
2. A `pyx` file containing Cython functions that wrap those C++ APIs so that they can be called from Python. | ||
|
||
The latter wrappers should generally be kept as thin as possible to minimize Cython usage. | ||
For more information see [our Cython layer design documentation](./library_design.md#the-cython-layer). | ||
|
||
In some rare cases we may actually benefit from writing pure Cython code to speed up particular code paths. | ||
Given that most numerical computations in cuDF actually happen in libcudf, however, | ||
such use cases are quite rare. | ||
Any attempt to write pure Cython code for this purpose should be justified with benchmarks. | ||
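
One way such a justification might be gathered is sketched below using the standard library's `timeit` module. The two functions are hypothetical pure-Python stand-ins for an existing code path and a proposed faster one; in practice the candidate would be the compiled Cython function, and the project's own benchmarking tooling may be more appropriate.

```python
import timeit


def baseline(n):
    """Hypothetical existing implementation."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def candidate(n):
    """Hypothetical proposed implementation (stand-in for a Cython version)."""
    return sum(i * i for i in range(n))


def compare(n=10_000, repeat=5, number=20):
    """Return the best observed per-call time in seconds for each function."""
    results = {}
    for func in (baseline, candidate):
        timings = timeit.repeat(lambda: func(n), repeat=repeat, number=number)
        # The minimum is the measurement least affected by system noise.
        results[func.__name__] = min(timings) / number
    return results
```

Calling `compare()` returns a dictionary keyed by function name, which can be pasted into a pull request alongside the input sizes used.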
|
||
## Exception handling | ||
|
||
This section is under development; see https://github.com/rapidsai/cudf/pull/7917. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,26 @@ | ||
# Developer Guide | ||
|
||
```{note} | ||
At present, this guide only covers the main cuDF library. | ||
In the future, it may be expanded to also cover dask_cudf, cudf_kafka, and custreamz. | ||
``` | ||
|
||
cuDF is a GPU-accelerated, [pandas-like](https://pandas.pydata.org/) DataFrame library. | ||
Under the hood, all of cuDF's functionality relies on the CUDA-accelerated `libcudf` C++ library. | ||
Thus, cuDF's internals are designed to efficiently and robustly map pandas APIs to `libcudf` functions. | ||
For more information about the `libcudf` library, a good starting point is the | ||
[developer guide](https://github.com/rapidsai/cudf/blob/main/cpp/docs/DEVELOPER_GUIDE.md). | ||
|
||
This document assumes familiarity with the | ||
[overall contributing guide](https://github.com/rapidsai/cudf/blob/main/CONTRIBUTING.md). | ||
The goal of this document is to provide more specific guidance for Python developers. | ||
It covers the structure of the Python code and discusses best practices. | ||
Additionally, it includes longer sections on more specific topics like testing and benchmarking. | ||
|
||
```{toctree} | ||
:maxdepth: 2 | ||
library_design | ||
documentation | ||
options | ||
``` |