Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update docs to include SparkStreamingDataSet and ManagedTableDataSet #2679

Merged
merged 25 commits into from
Jul 3, 2023

Conversation

noklam
Copy link
Contributor

@noklam noklam commented Jun 13, 2023

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

Development notes

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

@noklam noklam marked this pull request as ready for review June 14, 2023 11:01
@noklam noklam requested review from AhdraMeraliQB and removed request for idanov and yetudada June 14, 2023 11:01
@noklam
Copy link
Contributor Author

noklam commented Jun 14, 2023

As usual, never underestimate the time to fix an indentation error with Sphinx, but here we are! Big thanks to @AhdraMeraliQB !

@noklam noklam enabled auto-merge (squash) June 14, 2023 11:02
@noklam noklam disabled auto-merge June 14, 2023 11:02
@noklam noklam enabled auto-merge (squash) June 14, 2023 11:02
Copy link
Contributor

@stichbury stichbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🏆 and yes, Sphinx errors :trollface:

@noklam
Copy link
Contributor Author

noklam commented Jun 14, 2023

It's very hard to debug since the build keep failing, I add a pip list so it's easier to know if it is installing the correction version of kedro-datasets

@noklam
Copy link
Contributor Author

noklam commented Jun 14, 2023

We need to wait until the next kedro-datasets release.

@noklam noklam marked this pull request as draft June 14, 2023 16:05
auto-merge was automatically disabled June 14, 2023 16:05

Pull request was converted to draft

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved when/if the RTD build passes 😄

@astrojuanlu
Copy link
Member

Of course:

sphinx.errors.SphinxWarning: /home/docs/checkouts/readthedocs.org/user_builds/kedro/envs/2679/lib/python3.8/site-packages/kedro_datasets/databricks/managed_table_dataset.py:docstring of kedro_datasets.databricks.managed_table_dataset.ManagedTableDataSet:30:Unexpected indentation.

@stichbury
Copy link
Contributor

@jmholzer @noklam I'm afraid I probably need your help to get this resolved, please. Sphinx is emitting the following:

sphinx.errors.SphinxWarning: /home/docs/checkouts/readthedocs.org/user_builds/kedro/envs/2679/lib/python3.8/site-packages/kedro_datasets/databricks/managed_table_dataset.py:docstring of kedro_datasets.databricks.managed_table_dataset.ManagedTableDataSet:30:Unexpected indentation.

And I can't see a docstring at that location in the latest version of the databricks dataset.

I want to get these docs built into our set because we have a blog post to publish that should point to them. Even if they don't get into release 0.18.11 it would be sensible to have them on latest, which needs a kedro-datasets release to fix any problematic docstrings.

Please could you take a look at the code and see what is causing Sphinx to fail and maybe separately we can review how to avoid this in future by running a suitable linter over each new dataset that we add?

jmholzer added 6 commits June 30, 2023 14:14
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
…dro into fix-spark-streaming-dataset

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
@stichbury
Copy link
Contributor

Thank you @noklam and @jmholzer for getting this building!

@noklam
Copy link
Contributor Author

noklam commented Jun 30, 2023

@stichbury Not sure where is the best place to fit this documentation. I can go through the doc build process with you.

  1. The build process only happens when kedro have a new release
  2. kedro will pull latest kedro-datasets, which means any change in kedro-datasets will require a new release in kedro-datasets before it shows up in our docs.

How do you fix Dataset docs?

Kedro's GitPod is the best place to test it. In our setup.py, it points to the latest kedro-datasets. When you need to test changes on a feature branch, you need to update the reference to point to your kedro-datasets feature branch.

  1. You can either open a PR on kedro or just start GitPod and do make build-docs to check if documentation is build successfully. If you have a local setup you can also use your local environment, but I found GitPod convenient for these short-lived isolated environment.

  2. Update the reference to point to your feature branch

  3. Add new datasets in the relevant .rst file, usually docs/source/kedro_datasets.rst

  4. Check if the build is success

  5. Merge your doc PR in kedro-datasets

  6. (Do a release)

  7. Revert the reference to latest kedro-datasets -

# setup.py
-         # "kedro-datasets[all]~=1.4.1",
+        "kedro-datasets[all]@git+https://github.com/kedro-org/kedro-plugins@fix/rst-indentation-managedtabledataset#subdirectory=kedro-datasets",

@noklam noklam mentioned this pull request Jul 3, 2023
8 tasks
@noklam noklam changed the title Update kedro_datasets.rst Update docs to include SparkStreamingDataSet and ManagedTableDataSet Jul 3, 2023
Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astrojuanlu astrojuanlu marked this pull request as ready for review July 3, 2023 10:42
@noklam noklam enabled auto-merge (squash) July 3, 2023 10:51
@noklam
Copy link
Contributor Author

noklam commented Jul 3, 2023

Let's merge it then we can go with the release

@noklam noklam merged commit 45fee94 into main Jul 3, 2023
@noklam noklam deleted the fix-spark-streaming-dataset branch July 3, 2023 11:27
@noklam noklam mentioned this pull request Aug 1, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants