Add _SINGLE_PROCESS property to CachedDataSet #1905

Merged 2 commits on Oct 7, 2022

Contributor

@carlaprv carlaprv commented Oct 5, 2022

Description

Solves #1888

Development notes

The CachedDataSet cannot be used with the ParallelRunner. This PR adds the _SINGLE_PROCESS property, just like in DeltaTableDataSet.

Before this PR, trying to use CachedDataSet and ParallelRunner together was failing.
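
To illustrate the mechanism, here is a minimal sketch of how a class-level `_SINGLE_PROCESS` flag can let a multiprocess runner reject incompatible datasets before any worker starts. It is not Kedro's actual code: `PlainDataSet`, `CachedLikeDataSet`, and `validate_for_parallel_run` are hypothetical names, and the sketch assumes the runner reads the flag with `getattr` and treats a missing flag as "safe".

```python
class PlainDataSet:
    """Hypothetical dataset with no single-process restriction."""


class CachedLikeDataSet:
    """Hypothetical stand-in for a dataset that keeps an in-memory cache."""

    # Assumption: a truthy class attribute marks the dataset as unusable
    # across multiple processes.
    _SINGLE_PROCESS = True


def validate_for_parallel_run(catalog):
    """Raise AttributeError if any dataset is flagged as single-process."""
    flagged = sorted(
        name
        for name, dataset in catalog.items()
        if getattr(dataset, "_SINGLE_PROCESS", False)
    )
    if flagged:
        raise AttributeError(
            f"These datasets cannot be used with multiprocessing: {flagged}"
        )


catalog = {"raw": PlainDataSet(), "cached": CachedLikeDataSet()}
try:
    validate_for_parallel_run(catalog)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

With a flag like this, the incompatibility surfaces as one clear error naming the offending datasets rather than as a failure inside a worker process.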

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

@carlaprv carlaprv force-pushed the cprv-issue-CachedDataSet branch 4 times, most recently from 83f784e to a47b514 Compare October 5, 2022 21:44
Contributor Author

carlaprv commented Oct 5, 2022

Linting is failing for a file that I didn't modify: docs/source/development/automated_testing.md

image

Fix proposed by the make lint command:

image

@carlaprv carlaprv marked this pull request as ready for review October 5, 2022 21:57
@carlaprv carlaprv requested a review from idanov as a code owner October 5, 2022 21:57
Member

@merelcht merelcht left a comment

Thank you for your contribution! 😄 I've fixed the linter in another branch, so it should pass now.

I've left one minor suggestion and then it can be merged!

Comment on lines 32 to 33
# for parallelism within a Spark pipeline please consider
# ``ThreadRunner`` instead
Member

We can remove these two sentences, because for the CachedDataSet this is not related to Spark in any way.

Contributor Author

Sure! Thanks for the suggestion

Contributor Author

@MerelTheisenQB maybe we could keep the suggestion to use ThreadRunner?

# for parallelism please consider ``ThreadRunner`` instead
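
As a rough illustration of why a thread-based runner can be a workable alternative, the sketch below runs several tasks in threads that all read the same in-process cache, so the data is loaded only once. It uses only the standard library; `cache`, `load_cached`, and the dataset name are hypothetical, and this is not how `ThreadRunner` is implemented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

cache = {}                      # shared by every thread in this process
cache_lock = threading.Lock()   # guards concurrent access to the cache
load_count = 0                  # counts how often the "expensive" load runs


def load_cached(name):
    """Return the cached value for `name`, loading it on first access."""
    global load_count
    with cache_lock:
        if name not in cache:
            load_count += 1
            cache[name] = f"data for {name}"
        return cache[name]


# Eight tasks request the same dataset from four worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(load_cached, ["model_input"] * 8))
```

Separate processes do not share this dictionary, which is the kind of limitation a single-process flag is meant to signal.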

Contributor

@jmholzer jmholzer left a comment

Congratulations on your first PR! 🎉 Great work.

For future reference, working on a branch on kedro-org/kedro repo is totally fine and is the way we normally do it. It simplifies the workflow by quite a bit 🙂.

Contributor Author

carlaprv commented Oct 6, 2022

> Congratulations on your first PR! 🎉 Great work.
>
> For future reference, working on a branch on kedro-org/kedro repo is totally fine and is the way we normally do it. It simplifies the workflow by quite a bit 🙂.

Thanks @jmholzer! I've followed the process in the contribution guidelines. Next time, I'll create a branch directly on the kedro repo.

Contributor

jmholzer commented Oct 6, 2022

> > Congratulations on your first PR! 🎉 Great work.
> > For future reference, working on a branch on kedro-org/kedro repo is totally fine and is the way we normally do it. It simplifies the workflow by quite a bit 🙂.
>
> Thanks @jmholzer! I've followed the process in the contribution guidelines. Next time, I'll create a branch directly on the kedro repo.

Ohh I see 😃 I did the same for my first PR. Thanks for reminding us, let me see about creating an issue to update our contributor guidelines.

@carlaprv carlaprv requested review from merelcht and removed request for idanov October 6, 2022 17:52
@carlaprv carlaprv force-pushed the cprv-issue-CachedDataSet branch from a341e89 to 05ffe6a Compare October 6, 2022 18:00
@carlaprv carlaprv requested a review from yetudada as a code owner October 6, 2022 18:00
Signed-off-by: Carla Vieira <carlaprv@hotmail.com>

Signed-off-by: carlaprv <carlaprv@hotmail.com>
Signed-off-by: carlaprv <carlaprv@hotmail.com>
@carlaprv carlaprv force-pushed the cprv-issue-CachedDataSet branch from 05ffe6a to 28f9ee8 Compare October 6, 2022 18:07
Member

@merelcht merelcht left a comment

Thanks again for the contribution! ⭐ ⭐ ⭐

@merelcht merelcht merged commit e1da30f into kedro-org:main Oct 7, 2022
AhdraMeraliQB pushed a commit that referenced this pull request Oct 21, 2022
Signed-off-by: Carla Vieira <carlaprv@hotmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
nickolasrm pushed a commit to ProjetaAi/kedro that referenced this pull request Oct 26, 2022
Signed-off-by: Carla Vieira <carlaprv@hotmail.com>
Signed-off-by: nickolasrm <nickolasrochamachado@gmail.com>
AhdraMeraliQB added a commit that referenced this pull request Nov 9, 2022
* Release/0.18.3 (#1856)

* Update release version and release notes

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Update missing release notes

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* update vresion

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* update release notes

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove comment from code example

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove more comments

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add YAML formatting

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add missing import

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove even more comments

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove more even more comments

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add pickle requirement to extras_require

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Try fix YAML docs

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Try fix YAML docs pt 2

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Fix code snippets in docs (#1876)

* Fix code snippets

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Separate code blocks

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Lint

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Fix issue with specifying format for SparkHiveDataSet (#1857)

Signed-off-by: jstammers <jimmy.stammers@cgastrategy.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update RELEASE.md (#1883)

* Update RELEASE.md

* fix broken link

* Update RELEASE.md

Co-authored-by: Merel Theisen <49397448+MerelTheisenQB@users.noreply.github.com>

Co-authored-by: Merel Theisen <49397448+MerelTheisenQB@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Deprecate `kedro test` and `kedro lint` (#1873)

* Deprecating `kedro test` and `kedro lint`

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Deprecate commands

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Make kedro looks prettier

* Update Linting

Signed-off-by: Nok <nok_lam_chan@mckinsey.com>

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Fix micro package pull from PyPI (#1848)

Signed-off-by: Florian Gaudin-Delrieu <florian.gaudindelrieu@gmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update Error message for `VersionNotFoundError` to handle Permission related issues better (#1881)

* Update message for VersionNotFoundError

Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>

* Add test for VersionNotFoundError for cloud protocols

* Update test_data_catalog.py

Update NoVersionFoundError test

* minor linting update

* update docs link + styling changes

* Revert "update docs link + styling changes"

This reverts commit 6088e00.

* Update test with styling changes

* Update RELEASE.md

Signed-off-by: ankatiyar <ankitakatiyar2401@gmail.com>

Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Signed-off-by: ankatiyar <ankitakatiyar2401@gmail.com>
Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update experiment tracking documentation with working examples (#1893)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add NHS AI Lab and ReSpo.Vision to companies list (#1878)

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Document how users can use pytest instead of kedro test (#1879)

* Add best_practices.md with introductory sections

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add pytest and pytest-cov sections

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add pytest-cov coverage report

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add sections on pytest-cov

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add automated_testing to index.rst

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Reformat third-party library names and clean grammar.

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add link to virtual environment docs

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add example of good test naming

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Improve link accessibility

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Improve pytest docs link accessibility

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add reminder link to virtual environment docs

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Fix formatting in link to coverage docs

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Remove reference to /src under 'Run your tests'

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Modify references to <project_name> to <package_name>

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Fix sentence structure

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Fix broken databricks doc link

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Capitalise Kedro-Viz in the "Visualize layers" section (#1899)

* Capitalised kedro-viz

Signed-off-by: yash6318 <yash.agrawal.cse21@iitbhu.ac.in>

* capitalised Kedro viz

Signed-off-by: yash6318 <yash.agrawal.cse21@iitbhu.ac.in>

* Updated set_up_experiment_tracking.md

Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: yash6318 <yash.agrawal.cse21@iitbhu.ac.in>

Signed-off-by: yash6318 <yash.agrawal.cse21@iitbhu.ac.in>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Fix linting on autmated test page (#1906)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add _SINGLE_PROCESS property to CachedDataSet (#1905)

Signed-off-by: Carla Vieira <carlaprv@hotmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update the tutorial of "Visualise pipelines" (#1913)

* Change a file extention to match the previous article

Signed-off-by: dinotuku <kuan.tung@epfl.ch>

* Add a missing import

Signed-off-by: dinotuku <kuan.tung@epfl.ch>

* Change both preprocessed datasets to parquet files

Signed-off-by: dinotuku <kuan.tung@epfl.ch>

* Change data type to ParquetDataSet for parquet files

Signed-off-by: dinotuku <kuan.tung@epfl.ch>

* Add a note for installing seaborn if it is not installed

Signed-off-by: dinotuku <kuan.tung@epfl.ch>

Signed-off-by: dinotuku <kuan.tung@epfl.ch>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Document how users can use linting tools instead of `kedro lint` (#1904)

* Add documentation for linting tools

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Revert changes to commands_reference.md

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Update linting docs with suggestions

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Update linting doc

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Make core config accessible in dict get way  (#1870)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Create dependabot.yml configuration file for version updates (#1862)

* Create dependabot.yml configuration file

* Update dependabot.yml

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* add target-branch

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* Update dependabot.yml

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* limit dependabot to just dependency folder

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* Update test_requirements.txt

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* Update MANIFEST.in

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* fix e2e

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* Update continue_config.yml

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* Update requirements.txt

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* Update requirements.txt

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* fix link

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* revert

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* Delete requirements.txt

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update dependabot config (#1928)

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update robots.txt (#1929)

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* fix broken link (#1950)

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update dependabot.yml config  (#1938)

* Update dependabot.yml

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* pin jupyterlab_services to requirments

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

* lint

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update setup.py Jinja2 dependencies (#1954)

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update pip-tools requirement from ~=6.5 to ~=6.9 in /dependency (#1957)

Updates the requirements on [pip-tools](https://github.com/jazzband/pip-tools) to permit the latest version.
- [Release notes](https://github.com/jazzband/pip-tools/releases)
- [Changelog](https://github.com/jazzband/pip-tools/blob/master/CHANGELOG.md)
- [Commits](jazzband/pip-tools@6.5.0...6.9.0)

---
updated-dependencies:
- dependency-name: pip-tools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Update toposort requirement from ~=1.5 to ~=1.7 in /dependency (#1956)

Updates the requirements on [toposort]() to permit the latest version.

---
updated-dependencies:
- dependency-name: toposort
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Add deprecation warning to package_name argument in session create() (#1953)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Remove redundant `resolve_load_version` call (#1911)

* remove a redundant function call

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Remove redundant resolove_load_version & fix test

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Fix HoloviewWriter tests with more specific error message pattern & Lint

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Rename tests

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Make docstring in test starter match real starters (#1916)

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>

* Try to fix formatting error

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>

* Specify pickle import

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: jstammers <jimmy.stammers@cgastrategy.com>
Signed-off-by: Nok <nok_lam_chan@mckinsey.com>
Signed-off-by: Florian Gaudin-Delrieu <florian.gaudindelrieu@gmail.com>
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Signed-off-by: ankatiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: yash6318 <yash.agrawal.cse21@iitbhu.ac.in>
Signed-off-by: Carla Vieira <carlaprv@hotmail.com>
Signed-off-by: dinotuku <kuan.tung@epfl.ch>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Nok <mediumnok@gmail.com>
Co-authored-by: Jimmy Stammers <jimmy.stammers@gmail.com>
Co-authored-by: Merel Theisen <49397448+MerelTheisenQB@users.noreply.github.com>
Co-authored-by: Florian Gaudin-Delrieu <9217921+FlorianGD@users.noreply.github.com>
Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Co-authored-by: Yetunde Dada <43755008+yetudada@users.noreply.github.com>
Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Co-authored-by: Yash Agrawal <96697569+yash6318@users.noreply.github.com>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Carla Vieira <carlaprv@hotmail.com>
Co-authored-by: Kuan Tung <kuan.tung@epfl.ch>
Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Co-authored-by: Merel Theisen <merel.theisen@quantumblack.com>

3 participants