
Validate code snippets in datasets documentation - Part 1 #1962

Merged (43 commits) on Nov 9, 2022
Commits (43 total; the diff below shows changes from 38 commits):
14afd5d
Release/0.18.3 (#1856)
noklam Sep 20, 2022
570c574
Remove comment from code example
Oct 21, 2022
981dcfe
Remove more comments
Oct 21, 2022
c60aef6
Add YAML formatting
Oct 21, 2022
07159ff
Add missing import
Oct 21, 2022
09b0d46
Remove even more comments
Oct 21, 2022
48ba471
Remove more even more comments
Oct 21, 2022
770e3b9
Add pickle requirement to extras_require
Oct 21, 2022
66efe0d
Try fix YAML docs
Oct 21, 2022
80151f8
Try fix YAML docs pt 2
Oct 21, 2022
6e52ba6
Fix code snippets in docs (#1876)
AhdraMeraliQB Sep 27, 2022
4e3e7b4
Fix issue with specifying format for SparkHiveDataSet (#1857)
jstammers Sep 28, 2022
d955135
Update RELEASE.md (#1883)
noklam Sep 28, 2022
10972d4
Deprecate `kedro test` and `kedro lint` (#1873)
noklam Sep 29, 2022
57384cc
Fix micro package pull from PyPI (#1848)
FlorianGD Sep 30, 2022
d6feaac
Update Error message for `VersionNotFoundError` to handle Permission …
ankatiyar Oct 3, 2022
ce070f5
Update experiment tracking documentation with working examples (#1893)
merelcht Oct 4, 2022
4f45905
Add NHS AI Lab and ReSpo.Vision to companies list (#1878)
yetudada Oct 4, 2022
1108411
Document how users can use pytest instead of kedro test (#1879)
jmholzer Oct 4, 2022
f0a6f9f
Capitalise Kedro-Viz in the "Visualize layers" section (#1899)
yash6318 Oct 6, 2022
6fa2048
Fix linting on automated test page (#1906)
merelcht Oct 6, 2022
ed06e8d
Add _SINGLE_PROCESS property to CachedDataSet (#1905)
carlaprv Oct 7, 2022
6de1e01
Update the tutorial of "Visualise pipelines" (#1913)
dinotuku Oct 7, 2022
5de89d5
Document how users can use linting tools instead of `kedro lint` (#1904)
ankatiyar Oct 10, 2022
72d9b96
Make core config accessible in dict get way (#1870)
merelcht Oct 11, 2022
ed37d70
Create dependabot.yml configuration file for version updates (#1862)
SajidAlamQB Oct 12, 2022
5d8bd9b
Update dependabot config (#1928)
SajidAlamQB Oct 12, 2022
81cf5a4
Update robots.txt (#1929)
SajidAlamQB Oct 12, 2022
0803e74
fix broken link (#1950)
noklam Oct 18, 2022
4dfb14e
Update dependabot.yml config (#1938)
SajidAlamQB Oct 19, 2022
3ce5dcb
Update setup.py Jinja2 dependencies (#1954)
noklam Oct 19, 2022
5051e1c
Update pip-tools requirement from ~=6.5 to ~=6.9 in /dependency (#1957)
dependabot[bot] Oct 19, 2022
252fb3a
Update toposort requirement from ~=1.5 to ~=1.7 in /dependency (#1956)
dependabot[bot] Oct 19, 2022
0760011
Add deprecation warning to package_name argument in session create() …
merelcht Oct 19, 2022
35d5c28
Remove redundant `resolve_load_version` call (#1911)
noklam Oct 20, 2022
011b5bb
Make docstring in test starter match real starters (#1916)
deepyaman Oct 20, 2022
14fbc85
Merge branch 'main' into fix/check-code-snippets
AhdraMeraliQB Oct 21, 2022
46d1c30
Merge branch 'main' into fix/check-code-snippets
AhdraMeraliQB Oct 21, 2022
571866d
Merge branch 'main' into fix/check-code-snippets
merelcht Oct 28, 2022
9b1a9e4
Merge branch 'main' into fix/check-code-snippets
AhdraMeraliQB Nov 7, 2022
2ec5c20
Try to fix formatting error
merelcht Nov 8, 2022
84ce260
Merge branch 'main' into fix/check-code-snippets
AhdraMeraliQB Nov 8, 2022
d0fa348
Specify pickle import
AhdraMeraliQB Nov 9, 2022
1 change: 0 additions & 1 deletion kedro/extras/datasets/email/message_dataset.py
@@ -46,7 +46,6 @@ class EmailMessageDataSet(
>>> msg["From"] = '"sin studly17"'
>>> msg["To"] = '"strong bad"'
>>>
>>> # data_set = EmailMessageDataSet(filepath="gcs://bucket/test")
>>> data_set = EmailMessageDataSet(filepath="test")
>>> data_set.save(msg)
>>> reloaded = data_set.load()
5 changes: 1 addition & 4 deletions kedro/extras/datasets/geopandas/geojson_dataset.py
@@ -37,10 +37,7 @@ class GeoJSONDataSet(
>>>
>>> data = gpd.GeoDataFrame({'col1': [1, 2], 'col2': [4, 5],
>>> 'col3': [5, 6]}, geometry=[Point(1,1), Point(2,4)])
>>> # data_set = GeoJSONDataSet(filepath="gcs://bucket/test.geojson",
>>> save_args=None)
>>> data_set = GeoJSONDataSet(filepath="test.geojson",
>>> save_args=None)
>>> data_set = GeoJSONDataSet(filepath="test.geojson", save_args=None)
>>> data_set.save(data)
>>> reloaded = data_set.load()
>>>
5 changes: 0 additions & 5 deletions kedro/extras/datasets/json/json_dataset.py
@@ -28,17 +28,13 @@ class JSONDataSet(AbstractVersionedDataSet[Any, Any]):
>>> json_dataset:
>>> type: json.JSONDataSet
>>> filepath: data/01_raw/location.json
>>> load_args:

Member: Great catch!

>>> lines: True
>>>
>>> cars:
>>> type: json.JSONDataSet
>>> filepath: gcs://your_bucket/cars.json
>>> fs_args:
>>> project: my-project
>>> credentials: my_gcp_credentials
>>> load_args:
>>> lines: True

Example using Python API:
::
@@ -47,7 +43,6 @@ class JSONDataSet(AbstractVersionedDataSet[Any, Any]):
>>>
>>> data = {'col1': [1, 2], 'col2': [4, 5], 'col3': [5, 6]}
>>>
>>> # data_set = JSONDataSet(filepath="gcs://bucket/test.json")
>>> data_set = JSONDataSet(filepath="test.json")
>>> data_set.save(data)
>>> reloaded = data_set.load()
1 change: 0 additions & 1 deletion kedro/extras/datasets/pandas/csv_dataset.py
@@ -60,7 +60,6 @@ class CSVDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]):
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
>>> 'col3': [5, 6]})
>>>
>>> # data_set = CSVDataSet(filepath="gcs://bucket/test.csv")
>>> data_set = CSVDataSet(filepath="test.csv")
>>> data_set.save(data)
>>> reloaded = data_set.load()
1 change: 0 additions & 1 deletion kedro/extras/datasets/pandas/excel_dataset.py
@@ -59,7 +59,6 @@ class ExcelDataSet(
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
>>> 'col3': [5, 6]})
>>>
>>> # data_set = ExcelDataSet(filepath="gcs://bucket/test.xlsx")
>>> data_set = ExcelDataSet(filepath="test.xlsx")
>>> data_set.save(data)
>>> reloaded = data_set.load()
1 change: 0 additions & 1 deletion kedro/extras/datasets/pandas/feather_dataset.py
@@ -37,7 +37,6 @@ class FeatherDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]):
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
>>> 'col3': [5, 6]})
>>>
>>> # data_set = FeatherDataSet(filepath="gcs://bucket/test.feather")
>>> data_set = FeatherDataSet(filepath="test.feather")
>>>
>>> data_set.save(data)
1 change: 0 additions & 1 deletion kedro/extras/datasets/pandas/generic_dataset.py
@@ -73,7 +73,6 @@ class GenericDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]):
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
>>> 'col3': [5, 6]})
>>>
>>> # data_set = GenericDataSet(filepath="s3://test.csv", file_format='csv')
>>> data_set = GenericDataSet(filepath="test.csv", file_format='csv')
>>> data_set.save(data)
>>> reloaded = data_set.load()
1 change: 0 additions & 1 deletion kedro/extras/datasets/pandas/hdf_dataset.py
@@ -45,7 +45,6 @@ class HDFDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]):
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
>>> 'col3': [5, 6]})
>>>
>>> # data_set = HDFDataSet(filepath="gcs://bucket/test.hdf", key='data')
>>> data_set = HDFDataSet(filepath="test.h5", key='data')
>>> data_set.save(data)
>>> reloaded = data_set.load()
1 change: 0 additions & 1 deletion kedro/extras/datasets/pandas/json_dataset.py
@@ -52,7 +52,6 @@ class JSONDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]):
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
>>> 'col3': [5, 6]})
>>>
>>> # data_set = JSONDataSet(filepath="gcs://bucket/test.json")
>>> data_set = JSONDataSet(filepath="test.json")
>>> data_set.save(data)
>>> reloaded = data_set.load()
1 change: 0 additions & 1 deletion kedro/extras/datasets/pandas/parquet_dataset.py
@@ -64,7 +64,6 @@ class ParquetDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]):
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
>>> 'col3': [5, 6]})
>>>
>>> # data_set = ParquetDataSet(filepath="gcs://bucket/test.parquet")
>>> data_set = ParquetDataSet(filepath="test.parquet")
>>> data_set.save(data)
>>> reloaded = data_set.load()
1 change: 0 additions & 1 deletion kedro/extras/datasets/pandas/xml_dataset.py
@@ -35,7 +35,6 @@ class XMLDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame]):
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
>>> 'col3': [5, 6]})
>>>
>>> # data_set = XMLDataSet(filepath="gcs://bucket/test.xml")
>>> data_set = XMLDataSet(filepath="test.xml")
>>> data_set.save(data)
>>> reloaded = data_set.load()
2 changes: 0 additions & 2 deletions kedro/extras/datasets/pickle/pickle_dataset.py
@@ -55,13 +55,11 @@ class PickleDataSet(AbstractVersionedDataSet[Any, Any]):
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
>>> 'col3': [5, 6]})
>>>
>>> # data_set = PickleDataSet(filepath="gcs://bucket/test.pkl")
>>> data_set = PickleDataSet(filepath="test.pkl", backend="pickle")
>>> data_set.save(data)
>>> reloaded = data_set.load()
>>> assert data.equals(reloaded)
>>>
>>> # Add "compress_pickle[lz4]" to requirements.txt
>>> data_set = PickleDataSet(filepath="test.pickle.lz4",
>>> backend="compress_pickle",
>>> load_args={"compression":"lz4"},
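The compress_pickle example above is truncated by the collapsed diff. A self-contained sketch of how that snippet presumably continues is shown below; it assumes lz4 compression is passed symmetrically to `load_args` and `save_args`, and that compress-pickle with its lz4 extra is installed.

```python
import pandas as pd
from kedro.extras.datasets.pickle import PickleDataSet

data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})

# Assumes the compress-pickle backend (with the lz4 extra) is available;
# the compression codec is specified for both loading and saving.
data_set = PickleDataSet(
    filepath="test.pickle.lz4",
    backend="compress_pickle",
    load_args={"compression": "lz4"},
    save_args={"compression": "lz4"},
)
data_set.save(data)
reloaded = data_set.load()
assert data.equals(reloaded)
```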
1 change: 0 additions & 1 deletion kedro/extras/datasets/pillow/image_dataset.py
@@ -26,7 +26,6 @@ class ImageDataSet(AbstractVersionedDataSet[Image.Image, Image.Image]):

>>> from kedro.extras.datasets.pillow import ImageDataSet
>>>
>>> # data_set = ImageDataSet(filepath="gcs://bucket/test.png")
>>> data_set = ImageDataSet(filepath="test.png")
>>> image = data_set.load()
>>> image.show()
26 changes: 13 additions & 13 deletions kedro/extras/datasets/plotly/plotly_dataset.py
@@ -23,21 +23,21 @@ class PlotlyDataSet(JSONDataSet):
the JSON file directly from a pandas DataFrame through ``plotly_args``.

Example configuration for a PlotlyDataSet in the catalog:
::
.. code-block:: yaml

>>> bar_plot:
>>> type: plotly.PlotlyDataSet
>>> filepath: data/08_reporting/bar_plot.json
>>> plotly_args:
>>> type: bar
>>> fig:
>>> x: features
>>> y: importance
>>> orientation: h
>>> layout:
>>> xaxis_title: x
>>> yaxis_title: y
>>> title: Test
>>> type: plotly.PlotlyDataSet
>>> filepath: data/08_reporting/bar_plot.json
>>> plotly_args:
>>> type: bar
>>> fig:
>>> x: features
>>> y: importance
>>> orientation: h
>>> layout:
>>> xaxis_title: x
>>> yaxis_title: y
>>> title: Title
"""

# pylint: disable=too-many-arguments
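For readers cross-checking the YAML above against the Python API, a rough sketch of equivalent usage follows. It assumes the constructor accepts the same `plotly_args` mapping shown in the catalog entry and that `save` takes a pandas DataFrame, as the docstring describes; the DataFrame contents are invented for illustration.

```python
import pandas as pd
from kedro.extras.datasets.plotly import PlotlyDataSet

# Hypothetical feature-importance frame; column names match the fig mapping
# in the catalog example above.
df = pd.DataFrame({"features": ["f1", "f2"], "importance": [0.7, 0.3]})

data_set = PlotlyDataSet(
    filepath="bar_plot.json",
    plotly_args={
        "type": "bar",
        "fig": {"x": "features", "y": "importance", "orientation": "h"},
        "layout": {"xaxis_title": "x", "yaxis_title": "y", "title": "Title"},
    },
)
data_set.save(df)           # builds the bar chart and writes it as Plotly JSON
reloaded = data_set.load()  # should give back a Plotly figure object
```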
1 change: 1 addition & 0 deletions kedro/extras/datasets/redis/redis_dataset.py
@@ -43,6 +43,7 @@ class PickleDataSet(AbstractDataSet[Any, Any]):
::

>>> from kedro.extras.datasets.redis import PickleDataSet
>>> import pandas as pd
>>>
>>> data = pd.DataFrame({'col1': [1, 2], 'col2': [4, 5],
>>> 'col3': [5, 6]})
1 change: 0 additions & 1 deletion kedro/extras/datasets/text/text_dataset.py
@@ -27,7 +27,6 @@ class TextDataSet(AbstractVersionedDataSet[str, str]):
>>>
>>> string_to_write = "This will go in a file."
>>>
>>> # data_set = TextDataSet(filepath="gcs://bucket/test.md")
>>> data_set = TextDataSet(filepath="test.md")
>>> data_set.save(string_to_write)
>>> reloaded = data_set.load()
1 change: 0 additions & 1 deletion kedro/extras/datasets/tracking/json_dataset.py
@@ -21,7 +21,6 @@ class JSONDataSet(JDS):
>>>
>>> data = {'col1': 1, 'col2': 0.23, 'col3': 0.002}
>>>
>>> # data_set = JSONDataSet(filepath="gcs://bucket/test.json")
>>> data_set = JSONDataSet(filepath="test.json")
>>> data_set.save(data)

1 change: 0 additions & 1 deletion kedro/extras/datasets/tracking/metrics_dataset.py
@@ -23,7 +23,6 @@ class MetricsDataSet(JSONDataSet):
>>>
>>> data = {'col1': 1, 'col2': 0.23, 'col3': 0.002}
>>>
>>> # data_set = MetricsDataSet(filepath="gcs://bucket/test.json")
>>> data_set = MetricsDataSet(filepath="test.json")
>>> data_set.save(data)

1 change: 0 additions & 1 deletion kedro/extras/datasets/yaml/yaml_dataset.py
@@ -28,7 +28,6 @@ class YAMLDataSet(AbstractVersionedDataSet[Dict, Dict]):
>>>
>>> data = {'col1': [1, 2], 'col2': [4, 5], 'col3': [5, 6]}
>>>
>>> # data_set = YAMLDataSet(filepath="gcs://bucket/test.yaml")
>>> data_set = YAMLDataSet(filepath="test.yaml")
>>> data_set.save(data)
>>> reloaded = data_set.load()
3 changes: 3 additions & 0 deletions setup.py
@@ -78,6 +78,7 @@ def _collect_requirements(requires):
"pandas.XMLDataSet": [PANDAS, "lxml~=4.6"],
"pandas.GenericDataSet": [PANDAS],
}
pickle_require = {"pickle.PickleDataSet": ["compress-pickle~=2.1.0"]}

Member: The original comment said `# Add "compress_pickle[lz4]" to requirements.txt`. Do we need to specify [lz4] for this requirement?

Contributor Author: Good catch - this is what `pip list` showed after manually installing compress_pickle[lz4]; I'll make the change to follow the dask example further up.

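If the `[lz4]` extra is carried over as discussed, the entry would presumably change along these lines (a hypothetical sketch of the follow-up, not part of the diff shown here):

```python
# Hypothetical follow-up: include the lz4 extra in the specifier so the
# optional compression backend is pulled in together with compress-pickle.
pickle_require = {"pickle.PickleDataSet": ["compress-pickle[lz4]~=2.1.0"]}
```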

pillow_require = {"pillow.ImageDataSet": ["Pillow~=9.0"]}
plotly_require = {
"plotly.PlotlyDataSet": [PANDAS, "plotly>=4.8.0, <6.0"],
@@ -121,6 +122,7 @@ def _collect_requirements(requires):
"holoviews": _collect_requirements(holoviews_require),
"networkx": _collect_requirements(networkx_require),
"pandas": _collect_requirements(pandas_require),
"pickle": _collect_requirements(pickle_require),
"pillow": _collect_requirements(pillow_require),
"plotly": _collect_requirements(plotly_require),
"redis": _collect_requirements(redis_require),
@@ -135,6 +137,7 @@ def _collect_requirements(requires):
**holoviews_require,
**networkx_require,
**pandas_require,
**pickle_require,
**pillow_require,
**plotly_require,
**spark_require,