# Make sure you are a member of the khiops-python GitHub project and that you have set up the SSH keys
# for Git
# Clone the repo
git clone ssh://github.com/khiopsml/khiops-python
# Set up the commit message template
git config commit.template .git_commit_template
# Install pre-commit and the pre-commit scripts
pip install -U pre-commit
pre-commit install
# To run all tests
python -m unittest
# To run all test suites in a particular test file, for instance:
python -m unittest tests/test_core.py
# To run a particular test suite, for instance:
python -m unittest tests.test_sklearn.KhiopsSklearnParameterPassingTests
We code in English (comments included).
The coding style of the Khiops Python library is almost PEP8 compliant. To enforce these guidelines we use three tools:
We strongly recommend to automate the execution of these tools with pre-commit which setups Git
hooks that run before each git commit
. To create the hook scripts in your local repository copy
execute at the root of the repo:
pip install pre-commit
pre-commit install
The pre-commit
configuration is located at ./.pre-commit-config.yaml.
If either black
or isort
fail the commit will be aborted and you must execute a git add
on the
files corrected by these tools and then re-commit. As for pylint
, it is configured with a score
threshold so that only code with serious errors will abort the commit.
It is useful to manually run Black as the layout of the formatted code sometimes suggests
improvements. Note that Black does not solve all issues (eg. PEP8 variable names, docstring line
length), so you should check the issues reported by pylint
and address all errors (code E).
You can address other less serious issues reported by a full run of Pylint only if you have the time. The important point is not to be a slave of the linter.
Black will shorten the lines to 88 chars, except for long literal strings. Since we enforce the
line-length with pylint
you must fix this by hand or it will pass neither the pre-commit nor the
CI/CD workflows. To quickly find the lines to fix execute:
pylint --disable=all --enable=line-too-long
We encourage a programming style that increases readability at the cost of some verbosity. The idea is to create paragraphs of code, which consist in a header and body:
- The body is the code to be executed.
- The header is a comment describing what the code does at a high level of abstraction.
Each paragraph is separated by a single empty line, except when it starts an indented code snippet
(notably if
expressions) or when it follows a docstring that has the same indentation level.
Most of the code should go into paragraphs; exceptions are:
- return statements
- loop variable assignments (usually on
while
loops) - very short and obvious methods: the docstring suffices as header
Additionally, the number of paragraphs should be kept as small as possible, because technically commenting every line conforms to this style.
The advantages of this approach are:
- It allows to quickly skim through the code while understanding the big picture
- It forces to code coherent blocks of code
- It forces to understand what you are coding
and the disadvantages:
- Comments need maintenance, especially when refactoring
- Sometimes single lines of obvious code must be commented
def value_count(values):
"""Prints the counts of each unique value in an array"""
# Initialize the counts dictionary
counts = {}
# Count the unique occurrences in values
for value in values:
# If the value count exists: update it
if value in counts:
counts[value] += 1
# Otherwise create a new count of 1
else:
counts[value] = 1
# Print the counts
for value, count in counts.items():
print(f"{value}: {count}")
Note that the first paragraph could be fused into the second. These kinds of decisions are rather subjective and code review may help to resolve them.
See the documentation practices and tools here.
The commit history must be the cleanest possible
The
dev
branch must pass all tests
The commit message title should answer the question What do these changes do? Respect the following format when writing it:
- use English in imperative form (eg. "Add new feature", "Fix old bug")
- start with upper-case
- respect the git commit message length if possible
- put no periods at the end and avoid any other punctuation
- do not use Markdown
The optional commit description should answer the questions How are these changes made? and Why are these changes made?. Use preferably bullet points to write this part. The description can eventually answer more thoroughly What do these changes do? but try create commits that are "atomic" enough so that their title suffices to answer this question.
Rewriting commit history is only allowed in feature branches and dev
. Rewriting the history in
dev
is very limited: it should be only done to serious problems in the history. It is forbidden to
rebase before the last merge commit.
A usual application of history rewriting is to eliminate commits named "Fix previous commit". In
this case use rebase
to rewrite the history as follows:
git rebase -i <ROOT-REF> # Usually ROOT-REF is either "dev", "HEAD~2" or a specific hash
to interactively move and squash fix commits with the fixup
or squash
operator. Usually this
operation will make your feature branch diverge from origin
(GitHub repo), so a git push --force
is necessary to update it.
More generally, rewriting allows to obtain a commit history that is the cleanest possible. See Git Rewriting History for more details.
Warning: Do not rebase on a commit before the branching point of your feature branch as it may screw up your local repository by erasing merge commits.
We use feature branches that branch/merge from/to dev
. The main
branch is only used for final
releases.
The rules for the branches are:
- Feature branches: You can commit and rewrite the branch at will. It should pass the CI/CD workflows
before merging to
dev
. Note that only short tests are run in feature branches. dev
branch: You should avoid directly commit and rewrite this branch (and only maintainers can do it). This branch may not succeed all the CI/CD workflows since short and long tests are run. You may release beta versions from this branch.main
branch: Only merges fromdev
are accepted. All CI/CD workflows must succeed (this includes the long tests).
As for the use of feature branches, GitHub makes it easy to implement the following workflow:
- Create an issue for the particular feature/fix/etc.
- Create a branch for the issue by clicking the "Create a branch" link in the issue's right column.
- Develop, commit changes locally. Don't hesitate to rewrite the branch history if it is clearer.
This will create a feature branch from
dev
with a name starting with the issue number followed by the issue title. - Once you have some commits (
WIP
s or otherwise), you may push your branch and create a pull request (PR) for the issue. - pull
dev
and rebase on it if there are new changes (see below) - Push your feature branch to GitHub (if necessary, use
push --force
to rewrite the feature branch history) and ensure that it passes the CI/CD jobs. - Ask the PR be reviewed by another team member:
- The reviewer creates comment threads with the issues.
- Discuss the issues and/or make fixes to your code to address them; push the new code to GitHub.
- The reviewer closes the thread once the issue is settled.
- Merge once all review threads are closed and the reviewer LGTM-ed the PR.
Use the automatic branch names that GitHub creates from an issue. Name issues succinctly so the branches have short names. If you name branches by yourself try informative and succinct.
This project's GitHub repository is configured to create merge commits for PR and to remove the feature branches by default after merging. This strategy has the following pros and cons:
Pros:
- The merges of feature branches are tracked in the history.
- The repository branch set is small and clean with no stale ones.
Cons:
- Extra commits in history.
- Non-linear
dev
andmain
history.
It may happen that the dev
branch was updated while you are developing your feature branch.
To avoid extra merge commits do the following to update your local copy of your feature branch:
# on my-feat-branch
git stash # only when you have non-committed changes
git switch dev
git pull
git switch my-feat-branch
git rebase dev
# fix any conflicts you may have
git stash pop # only when you have non-committed changes
We should strive to minimize external package dependencies to minimize installation problems. The current dependency policy is:
khiops.core
should only depend on python built-in modules.khiops.sklearn
should only depend on python built-in modules and the following mainstream data-science packages:
Note that these 4 packages already have a sizable number of dependencies. We should discuss thoroughly the pros and cons of any new external package dependency before adding it.
For development dependencies (eg. black
, isort
, sphinx
, wrapt
, furo
) we can be more
carefree while still trying to not add too many dependencies.
Our Python support policy is the following:
- We start supporting a new Python version within 3 months of its official release.
- We support an old Python version up to 6 months from its official end of support by the Python developers.
By supporting a Python version we mean that we pass all our test batteries in the specified version.
We follow a non-standard MAJOR.MINOR.PATCH.INCREMENT[-PRE_RELEASE]
versioning convention. The
first three numbers MAJOR.MINOR.PATCH
are the latest Khiops version that is compatible with the
package. The number INCREMENT
indicates the evolution of the Khiops Python library followed by an
optional [-PRE_RELEASE]
version for alpha, beta and release candidate releases (eg. b.2
).
Important Note: For the versions of packages we remove the dash -
before the PRE_RELEASE
atom to be compatible with the Python version specifiers, which is required
by pip
and conda
.
When tagging a revision the CI will create the packages and upload them to the khiops-dev
channel.
Prefer to augment the pre-release revision number to re-create a tag because the CI overwrites
packages with the same tag in the khiops-dev
channel. Do not forget to clean any temporary
pre-releases from khiops-dev
and the releases GitHub page.
Checklist:
- Release issue and its related PR
- Update the API Docs if necessary
- Update
CHANGELOG.md
- Update the default
khiops-core
version in [.github/workflows/conda.yml] - Update the default value for
version
in thedownload_datasets
function in [khiops/tools.py]
- Git manipulations
- Update your local repo and save your work:
git stash # if necessary
git fetch --tags --prune --prune-tags
git switch dev
git pull
git switch main
git pull
- Merge the
dev
branch intomain
git switch main
git merge dev
- Tag the merge commit with the release version (see Versioning above)
git switch main
git tag 10.3.0.1 # Just an example
- Make
dev
point to the merge commit just created inmain
- This is necessary to include the merge commit into master to calculate intermediary versions with Versioneer.
- Steps:
git switch dev
git reset --hard main
git push dev
(you need to remove the protections ofdev
for this step)
- Update your local repo and save your work:
- Workflows
- Execute the
Conda Package
workflow specifying:- The release tag
khiops
as the release channel
- Execute the
API Docs
workflow specifying "Deploy GH Pages".
- Execute the
To make a public release, you must execute the Conda Package
CI workflow manually on a tag and
specify the khiops
anaconda channel for upload. These uploads do not overwrite any packages in
this channel, so you must correct any mistake manually.