Merged
71 changes: 3 additions & 68 deletions docs/source/contributor-guide/index.md
@@ -37,14 +37,15 @@ You can find how to setup build and testing environment [here](https://datafusio
## Finding and Creating Issues to Work On

You can find a curated [good-first-issue] list to help you get started.
You can read about how we plan larger projects in the [Roadmap and Improvement Proposals](roadmap.md) section.

[good-first-issue]: https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22

### Open Contribution and Assigning tickets

DataFusion is an open contribution project, and thus there is no particular
project imposed deadline for completing any issue or any restriction on who can
work on an issue, nor how many people can work on an issue at the same time.
project imposed deadline for completing issues or restrictions on who can
Contributor Author
drive by cleanup

work on an issue, nor limits to how many people can work on an issue at the same time.

Contributors drive the project forward based on their own priorities and
interests and thus you are free to work on any issue that interests you.
@@ -62,72 +63,6 @@ unable to make progress you should unassign the issue by using the `unassign me`
link at the top of the issue page (and ask for help if you are stuck) so that
someone else can get involved in the work.

### Discussing New Features
Contributor Author
I just moved this content out from a sub bullet and into a more discoverable place on Roadmap


If you plan to work on a new feature that doesn't have an existing ticket, it is
a good idea to open a ticket to discuss the feature. Discussing in advance often
helps avoid wasted effort by determining early whether the feature is a good fit for
DataFusion before too much time is invested. A ticket also helps gather feedback
from the community and is likely easier to discuss than a 1000-line PR.

If you open a ticket and it doesn't get any response, you can try `@`-mentioning
recently active community members in the ticket to get their attention.

### What Contributions are Good Fits?

DataFusion is designed to be highly extensible, and many features can be
implemented as extensions without changes or additions to the core. Support for
new functions, data formats, and similar functionality can be added using those
extension APIs, and there are already many existing community supported
extensions listed in the [extensions list].

Query engines are complex pieces of software to develop and maintain. Given our
limited maintenance bandwidth, we try to keep the DataFusion core as simple and
focused as possible, while still satisfying the [design goal] of an easy to
start initial experience.

With that in mind, contributions that meet the following criteria are more likely
to be accepted:

1. Bug fixes for existing features
2. Test coverage for existing features
3. Documentation improvements / examples
4. Performance improvements to existing features (with benchmarks)
5. "Small" functional improvements to existing features (if they don't change existing behavior)
6. Additional APIs for extending DataFusion's capabilities
7. CI improvements

Contributions that will likely involve more discussion (see Discussing New
Features above) prior to acceptance include:

1. Major new functionality (even if it is part of the "standard SQL")
2. New functions, especially if they aren't part of "standard SQL"
3. New data sources (e.g. support for Apache ORC)

[extensions list]: ../library-user-guide/extensions.md
[design goal]: https://docs.rs/datafusion/latest/datafusion/index.html#design-goals

### Design Build vs. Big Up Front Design

Typically, the DataFusion community attacks large problems by solving them bit
by bit and refining a solution iteratively on the `main` branch as a series of
Pull Requests. This is different from projects which front-load the effort
with a more comprehensive design process.

By "advancing the front" the community always makes tangible progress, and the strategy is
especially effective in a project that relies on individual contributors who may
not have the time or resources to invest in a large upfront design effort.
However, this "bit by bit" approach doesn't always succeed, and sometimes we get
stuck or go down the wrong path and then have to change direction.

Our process necessarily results in imperfect solutions being the "state of the
code" in some cases, and larger visions are not yet fully realized. However, the
community is good at driving things to completion in the long run. If you see
something that needs improvement or an area that is not yet fully realized,
please consider submitting an issue or PR to improve it. We are always looking
for more contributions.

# Developer's guide

## Pull Request Overview
72 changes: 70 additions & 2 deletions docs/source/contributor-guide/roadmap.md
@@ -17,7 +17,7 @@ specific language governing permissions and limitations
under the License.
-->

# Roadmap
# Roadmap and Improvement Proposals

The [project introduction](../user-guide/introduction) explains the
overview and goals of DataFusion, and our development efforts largely
@@ -44,7 +44,7 @@ make review efficient and avoid surprises.

[The current list of `EPIC`s can be found here](https://github.com/apache/datafusion/issues?q=is%3Aissue+is%3Aopen+epic).

# Quarterly Roadmap
## Quarterly Roadmap

The DataFusion roadmap is driven by the priorities of contributors rather than
any single organization or coordinating committee. We typically discuss our
@@ -56,3 +56,71 @@ For more information:
1. [Search for issues labeled `roadmap`](https://github.com/apache/datafusion/issues?q=is%3Aissue%20%20%20roadmap)
2. [DataFusion Road Map: Q3-Q4 2025](https://github.com/apache/datafusion/issues/15878)
3. [2024 Q4 / 2025 Q1 Roadmap](https://github.com/apache/datafusion/issues/13274)

## Improvement Proposals

### Discussing New Features

If you plan to work on a new feature that doesn't have an existing ticket, it is
a good idea to open a ticket to discuss the feature. Discussing in advance often
helps avoid wasted effort by determining early whether the feature is a good fit for
DataFusion before too much time is invested. A ticket also helps gather feedback
from the community and is likely easier to discuss than a 1000-line PR.

If you open a ticket and it doesn't get any response, you can try `@`-mentioning
recently active community members in the ticket to get their attention.

### What Contributions are Good Fits?

DataFusion is designed to be highly extensible, and many features can be
implemented as extensions without changes or additions to the core. Support for
new functions, data formats, and similar functionality can be added using those
extension APIs, and there are already many existing community supported
extensions listed in the [extensions list].

Query engines are complex pieces of software to develop and maintain. Given our
limited maintenance bandwidth, we try to keep the DataFusion core as simple and
focused as possible, while still satisfying the [design goal] of an easy to
start initial experience.

With that in mind, contributions that meet the following criteria are more likely
to be accepted:

1. Bug fixes for existing features
2. Test coverage for existing features
3. Documentation improvements / examples
4. Performance improvements to existing features (with benchmarks)
5. "Small" functional improvements to existing features (if they don't change existing behavior)
6. Additional APIs for extending DataFusion's capabilities
7. CI improvements

Contributions that will likely involve more discussion (see Discussing New
Features above) prior to acceptance include:

1. Major new functionality (even if it is part of the "standard SQL")
2. New functions, especially if they aren't part of "standard SQL"
3. New data sources (e.g. support for Apache ORC)

[extensions list]: ../library-user-guide/extensions.md
[design goal]: https://docs.rs/datafusion/latest/datafusion/index.html#design-goals

### Design Build vs. Big Up Front Design

Typically, the DataFusion community attacks large problems by solving them bit
by bit and refining a solution iteratively on the `main` branch as a series of
Pull Requests. This is different from projects which front-load the effort
with a more comprehensive design process.

By "advancing the front" the community always makes tangible progress, and the strategy is
especially effective in a project that relies on individual contributors who may
not have the time or resources to invest in a large upfront design effort.
However, this "bit by bit" approach doesn't always succeed, and sometimes we get
stuck or go down the wrong path and then have to change direction.

Our process necessarily results in imperfect solutions being the "state of the
code" in some cases, and larger visions are not yet fully realized. However, the
community is good at driving things to completion in the long run. If you see
something that needs improvement or an area that is not yet fully realized,
please consider submitting an issue or PR to improve it. We are always looking
for more contributions.