diff --git a/CONTRIBUTE.md b/CONTRIBUTE.md
index d7e483511..3f43a2060 100644
--- a/CONTRIBUTE.md
+++ b/CONTRIBUTE.md
@@ -11,7 +11,7 @@ Everyone is welcome to contribute to Chronon. We value all forms of contribution
 - Test cases to make the codebase more robust
 - Tutorials, blog posts, talks that promote the project.
 - Functionality extensions, new features, etc.
-- Optimizations
+- Optimizations
 - Support for new aggregations and data types
 - Support for connectors to different storage systems and event buses
@@ -22,11 +22,11 @@ In the interest of keeping Chronon a stable platform for users, some changes are
 - Changes that could break online fetching flows, including changing the timestamp watermarking or processing in the lambda architecture, or Serde logic.
 - Changes that would interfere with existing Airflow DAGs, for example changing the default schedule in a way that would cause breakage on recent versions of Airflow.
-There are exceptions to these general rules, however, please be sure to follow the “major change” guidelines if you wish to make such a change.
+There are exceptions to these general rules, however, please be sure to follow the “major change” guidelines if you wish to make such a change.
 ## General Development Process
-Everyone in the community is welcome to send patches, documents, and propose new features to the project.
+Everyone in the community is welcome to send patches, documents, and propose new features to the project.
 Code changes require a stamp of approval from Chronon contributors to be merged, as outlined in the project bylaws.
@@ -38,9 +38,9 @@ The process for reporting bugs and requesting smaller features is also outlined
 Pull Requests (PRs) should follow these guidelines as much as possible:
-**Code Guidelines**
+### Code Guidelines
-- Follow our (code style guidelines)[docs/source/Code_Guidelines.md]
+- Follow our [code style guidelines](docs/source/Code_Guidelines.md)
 - Well scoped (avoid multiple unrelated changes in the same PR)
 - Code should be rebased on the latest version of the latest version of the master branch
 - All lint checks and test cases should pass
@@ -56,18 +56,17 @@ Although these guidelines apply essentially to the PRs’ title and body message
 The rules below will help to achieve uniformity that has several benefits, both for review and for the code base maintenance as a whole, helping you to write commit messages with a good quality suitable for the Chronon project, allowing fast log searches, bisecting, and so on.
-**PR title**
+#### PR title
-Guarantee a title exists
-Don’t use Github usernames in the title, like @username (enforced);
-Include tags as a hint about what component(s) of the code the PRs / commits “touch”. For example [BugFix], [CI], [Streaming], [Spark], etc. If more than one tag exist, multiple brackets should be used, like [BugFix][CI]
+- Guarantee a title exists
+- Don’t use Github usernames in the title, like @username (enforced)
+- Include tags as a hint about what component(s) of the code the PRs / commits “touch”. For example [BugFix], [CI], [Streaming], [Spark], etc. If more than one tag exist, multiple brackets should be used, like [BugFix][CI]
-**PR body**
-
-Guarantee a body exists
-Include a simple and clear explanation of the purpose of the change
-Include any relevant information about how it was tested
+#### PR body
+- Guarantee a body exists
+- Include a simple and clear explanation of the purpose of the change
+- Include any relevant information about how it was tested
 ## Release Guidelines
@@ -83,8 +82,8 @@ Issues need to contain all relevant information based on the type of the issue.
 - Summary of what the user was trying to achieve
 - Sample data - Inputs, Expected Outputs (by the user) and Current Output
- - Configuration - StagingQuery / GroupBy or Join
-- Repro steps
+ - Configuration - StagingQuery / GroupBy or Join
+- Repro steps
 - What commands were run and what was the full output of the command
 - PR guidelines
 - Includes a failing test case based on sample data
@@ -92,14 +91,15 @@ Issues need to contain all relevant information based on the type of the issue.
 ### Crash Reports
 - Summary of what the user was trying to achieve
- - Sample data - Inputs, Expected Outputs (by the user)
- - Configuration - StagingQuery / GroupBy or Join
-- Repro steps
+ - Sample data - Inputs, Expected Outputs (by the user)
+ - Configuration - StagingQuery / GroupBy or Join
+- Repro steps
 - What commands were run and the output along with the error stack trace
 - PR guidelines
 - Includes a test case for the crash
 ## Feature requests and Optimization Requests
+
 We expect the proposer to create a CHIP / Chronon Improvement Proposal document as detailed below
 # Chronon Improvement Proposal (CHIP)
@@ -147,7 +147,6 @@ For the most part monitoring, command line tool changes, and configs are added w
 ## What should be included in a CHIP?
-
 A CHIP should contain the following sections:
 - Motivation: describe the problem to be solved
@@ -163,13 +162,13 @@ Anyone can initiate a CHIP but you shouldn't do it unless you have an intention
 ## Process
 Here is the process for making a CHIP:
-1. Create a PR in chronon/proposals with a single markdown file.Take the next available CHIP number and create a file “CHIP-42 Monoid caching for online & real-time feature fetches”. This is the document that you will iterate on.
-2. Fill in the sections as described above and file a PR. These proposal document PRs are reviewed by the committer who is on-call. They usually get merged once there is enough detail and clarity.
+
+1. Create a PR in chronon/proposals with a single markdown file.Take the next available CHIP number and create a file “CHIP-42 Monoid caching for online & real-time feature fetches”. This is the document that you will iterate on.
+2. Fill in the sections as described above and file a PR. These proposal document PRs are reviewed by the committer who is on-call. They usually get merged once there is enough detail and clarity.
 3. Start a [DISCUSS] issue on github. Please ensure that the subject of the thread is of the format [DISCUSS] CHIP-{your CHIP number} {your CHIP heading}. In the process of the discussion you may update the proposal. You should let people know the changes you are making.
-4. Once the proposal is finalized, tag the issue with the “voting-due” label. These proposals are more serious than code changes and more serious even than release votes. In the weekly committee meetings we will vote for/against the CHIP - where Yes, Veto-no, Neutral are the choices. The criteria for acceptance is 3+ “yes” vote count by the members of the committee without a veto-no. Veto-no votes require in-depth technical justifications to be provided on the github issue
+4. Once the proposal is finalized, tag the issue with the “voting-due” label. These proposals are more serious than code changes and more serious even than release votes. In the weekly committee meetings we will vote for/against the CHIP - where Yes, Veto-no, Neutral are the choices. The criteria for acceptance is 3+ “yes” vote count by the members of the committee without a veto-no. Veto-no votes require in-depth technical justifications to be provided on the github issue.
 5. Please update the CHIP markdown doc to reflect the current stage of the CHIP after a vote. This acts as the permanent record indicating the result of the CHIP (e.g., Accepted or Rejected). Also report the result of the CHIP vote to the github issue thread.
-
 It's not unusual for a CHIP proposal to take long discussions to be finalized. Below are some general suggestions on driving CHIP towards consensus. Notice that these are hints rather than rules. Contributors should make pragmatic decisions in accordance with individual situations.
 - The progress of a CHIP should not be long blocked on an unresponsive reviewer. A reviewer who blocks a CHIP with dissenting opinions should try to respond to the subsequent replies timely, or at least provide a reasonable estimated time to respond.
@@ -180,40 +179,48 @@ It's not unusual for a CHIP proposal to take long discussions to be finalized. B
 # Resources
 Below is a list of resources that can be useful for development and debugging.
-## Docs
-(Docsite)[https://chronon.ai]
-(doc directory)[https://github.com/airbnb/chronon/tree/main/docs/source]
-(Code of conduct)[TODO]
+## Docs
-## Links:
+[Docsite](https://chronon.ai)\
+[doc directory](https://github.com/airbnb/chronon/tree/main/docs/source)\
+[Code of conduct](TODO)
-(pip project)[https://pypi.org/project/chronon-ai/]
-(maven central)[https://mvnrepository.com/artifact/ai.chronon/]: (publishing)[https://github.com/airbnb/chronon/blob/main/devnotes.md#publishing-all-the-artifacts-of-chronon]
-(Docsite: publishing)[https://github.com/airbnb/chronon/blob/main/devnotes.md#chronon-artifacts-publish-process]
+## Links
+[pip project](https://pypi.org/project/chronon-ai/)\
+[maven central](https://mvnrepository.com/artifact/ai.chronon/): [publishing](https://github.com/airbnb/chronon/blob/main/devnotes.md#publishing-all-the-artifacts-of-chronon)\
+[Docsite: publishing](https://github.com/airbnb/chronon/blob/main/devnotes.md#chronon-artifacts-publish-process)
 ## Code Pointers
-Api - (Thrift)[https://github.com/airbnb/chronon/blob/main/api/thrift/api.thrift#L180], (Python)[https://github.com/airbnb/chronon/blob/main/api/py/ai/chronon/group_by.py]
-(CLI driver entry point for job launching.)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/Driver.scala]
-
-**Offline flows that produce hive tables or file output**
-(GroupBy)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/GroupBy.scala]
-(Staging Query)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/StagingQuery.scala]
-(Join backfills)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/Join.scala]
-(Metadata Export)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/MetadataExporter.scala]
-Online flows that update and read data & metadata from the kvStore
-(GroupBy window tail upload )[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala]
-(Streaming window head upload)[https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala]
-(Fetching)[https://github.com/airbnb/chronon/blob/main/online/src/main/scala/ai/chronon/online/Fetcher.scala]
-Aggregations
-(time based aggregations)[https://github.com/airbnb/chronon/blob/main/aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala]
-(time independent aggregations)[https://github.com/airbnb/chronon/blob/main/aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala]
-(integration point with rest of chronon)[https://github.com/airbnb/chronon/blob/main/aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala#L223]
-(Windowing)[https://github.com/airbnb/chronon/tree/main/aggregator/src/main/scala/ai/chronon/aggregator/windowing]
-
-**Testing**
-(Testing - sbt commands)[https://github.com/airbnb/chronon/blob/main/devnotes.md#testing]
-(Automated testing - circle-ci pipelines)[https://app.circleci.com/pipelines/github/airbnb/chronon]
-(Dev Setup)[https://github.com/airbnb/chronon/blob/main/devnotes.md#prerequisites]
+### API
+
+[Thrift](https://github.com/airbnb/chronon/blob/main/api/thrift/api.thrift#L180), [Python](https://github.com/airbnb/chronon/blob/main/api/py/ai/chronon/group_by.py)\
+[CLI driver entry point for job launching.](https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/Driver.scala)
+
+### Offline flows that produce hive tables or file output
+
+[GroupBy](https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/GroupBy.scala)\
+[Staging Query](https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/StagingQuery.scala)\
+[Join backfills](https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/Join.scala)\
+[Metadata Export](https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/MetadataExporter.scala)
+
+### Online flows that update and read data & metadata from the kvStore
+
+[GroupBy window tail upload](https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/GroupByUpload.scala)\
+[Streaming window head upload](https://github.com/airbnb/chronon/blob/main/spark/src/main/scala/ai/chronon/spark/streaming/GroupBy.scala)\
+[Fetching](https://github.com/airbnb/chronon/blob/main/online/src/main/scala/ai/chronon/online/Fetcher.scala)
+
+### Aggregations
+
+[time based aggregations](https://github.com/airbnb/chronon/blob/main/aggregator/src/main/scala/ai/chronon/aggregator/base/TimedAggregators.scala)\
+[time independent aggregations](https://github.com/airbnb/chronon/blob/main/aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala)\
+[integration point with rest of chronon](https://github.com/airbnb/chronon/blob/main/aggregator/src/main/scala/ai/chronon/aggregator/row/ColumnAggregator.scala#L223)\
+[Windowing](https://github.com/airbnb/chronon/tree/main/aggregator/src/main/scala/ai/chronon/aggregator/windowing)
+
+### Testing
+
+[Testing - sbt commands](https://github.com/airbnb/chronon/blob/main/devnotes.md#testing)\
+[Automated testing - circle-ci pipelines](https://app.circleci.com/pipelines/github/airbnb/chronon)\
+[Dev Setup](https://github.com/airbnb/chronon/blob/main/devnotes.md#prerequisites)
diff --git a/README.md b/README.md
index 8d4a1f11a..2d4b693c1 100644
--- a/README.md
+++ b/README.md
@@ -417,7 +417,7 @@ With Chronon you can use any data available in your organization, including ever
 # Contributing
-We welcome contributions to the Chronon project! Please read our (CONTRIBUTING.md)[CONTRIBUTING.md] for details.
+We welcome contributions to the Chronon project! Please read [CONTRIBUTE](CONTRIBUTE.md) for details.
 # Support