feat(datasets): support Polars lazy evaluation #350
Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
You are killing it with this rate of PRs 🔥
And just as I created this PR, Polars releases a new version with native support to read parquet files from AWS/GCP/Azure... From what I can see, it should also work for the lazy API. So, this brings us to the next question: how do we proceed with this PR?
Maybe wait a couple of releases to see if it goes from experimental to mature? If we are going to throw this code away in a couple of months, I don't think it makes a lot of sense to continue working on it. They also said that the first release candidate of Polars 1.0 is coming before the end of the year (pola-rs/polars#6616 (comment)), so maybe things will stabilize soon :)
I gave it a little thought and, if anyone is willing to review, I would like to finish it despite Polars’ native support for reading/writing to object stores. The reason is simple: if this functionality stabilises on Polars’ side, we only have to change (read: simplify) our load/save methods! The way I see it, moving forward we need 2 datasets: one for eager loading and one for lazy loading. So I would keep the … Curious to hear other opinions!
There are two options:
as @noklam suggested in #224 (comment). Since the return type is going to be completely different, I'd rather have a different dataset indeed.
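To make the "different return type" point concrete, here is a minimal, stdlib-only sketch (no Polars involved) of two dataset classes reading the same file: the eager one's `load()` returns materialised rows, while the lazy one's `load()` returns a deferred plan that only touches the file on `collect()`. `EagerCsvDataset`, `LazyCsvDataset` and `LazyFrameSketch` are hypothetical names chosen for illustration; they only mimic the shape of the Polars `DataFrame`/`LazyFrame` split.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Op = Callable[[List[Row]], List[Row]]


@dataclass(frozen=True)
class LazyFrameSketch:
    """A file path plus pending operations; nothing is read until collect()."""

    path: Path
    ops: Tuple[Op, ...] = ()

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyFrameSketch":
        # Record the operation instead of running it, returning a new plan.
        def op(rows: List[Row]) -> List[Row]:
            return [r for r in rows if predicate(r)]

        return LazyFrameSketch(self.path, self.ops + (op,))

    def collect(self) -> List[Row]:
        # Only now is the file opened and the recorded plan executed.
        with self.path.open(newline="") as fh:
            rows = list(csv.DictReader(fh))
        for op in self.ops:
            rows = op(rows)
        return rows


class EagerCsvDataset:
    """Hypothetical eager dataset: load() returns materialised rows."""

    def __init__(self, filepath: str):
        self._path = Path(filepath)

    def load(self) -> List[Row]:
        with self._path.open(newline="") as fh:
            return list(csv.DictReader(fh))


class LazyCsvDataset:
    """Hypothetical lazy dataset: load() returns a deferred plan."""

    def __init__(self, filepath: str):
        self._path = Path(filepath)

    def load(self) -> LazyFrameSketch:
        # No I/O happens here; the file may not even exist yet.
        return LazyFrameSketch(self._path)
```

Because the two `load()` methods return unrelated types, downstream nodes have to know which one they receive, which is the argument for two separately named datasets rather than a flag on one.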
Is it ok if I:
To make it clear that one uses the eager API while the other uses the lazy API. Shouldn't we then also consider removing (or marking as deprecated) the
Add PolarsDataSet as an alias for PolarsDataset with deprecation warning. Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
This is a discussion worth having, but it has some implications for the docs, starters, etc. Could you open a separate issue about it?
Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
Corrected PolarsDataSet to PolarsDataset in the pattern to match in test_load_missing_file Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
Thanks a lot for this contribution @MatthiasRoels! I've left some comments, the main one being about naming (classes ending in DataSet are old; any new ones should end in Dataset).
In terms of naming the datasets, how about:
LazyPolarsDataset
EagerPolarsDataset
No worries, I am always happy to contribute!
@merelcht That's a nice suggestion! But it would introduce a breaking change (renaming GenericDataSet to EagerPolarsDataset). Should I already rename it while keeping the old names with deprecation warnings (like we did for DataSet vs Dataset)?
Remove reference to PolarsDataSet as this is not required for new dataset implementations. Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
Yes, that sounds good! And then we can remove the alias in the next breaking kedro-datasets release.
Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
d2409df to 1193218
I haven't tried running this, but the code looks good! Can you also add this change to the release notes and mention that polars.GenericDataSet will be deprecated and replaced by polars.EagerPolarsDataset?
kedro-datasets/setup.py (outdated)
@@ -59,6 +59,10 @@ def _collect_requirements(requires):
[
POLARS, "pyarrow>=4.0", "xlsx2csv>=0.8.0", "deltalake >= 0.6.2"
],
"polars.LazyPolarsDataset":
Should EagerPolarsDataset be added here as well? If someone uses it directly, I'm guessing its requirements would not be picked up?
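A sketch of what listing both names could look like, assuming the per-dataset extras are plain dicts of requirement lists. The `_collect_requirements` body here is a guess in the spirit of the helper named in the diff header (its real body isn't shown), the `POLARS` pin is hypothetical, and the `polars.GenericDataSet` key and the `LazyPolarsDataset` requirement list are illustrative only.

```python
from itertools import chain
from typing import Dict, List

POLARS = "polars>=0.18.0"  # hypothetical pin, for illustration only


def _collect_requirements(requires: Dict[str, List[str]]) -> List[str]:
    # Union of every per-dataset requirement list, deduplicated and sorted.
    return sorted(set(chain.from_iterable(requires.values())))


_EAGER_REQS = [POLARS, "pyarrow>=4.0", "xlsx2csv>=0.8.0", "deltalake >= 0.6.2"]

polars_require = {
    "polars.GenericDataSet": _EAGER_REQS,  # deprecated name, kept for now
    "polars.EagerPolarsDataset": _EAGER_REQS,  # new name needs the same extras
    "polars.LazyPolarsDataset": [POLARS, "pyarrow>=4.0", "deltalake >= 0.6.2"],  # assumed
}

extras_require = {
    **polars_require,
    # Umbrella extra that installs everything any Polars dataset needs.
    "polars": _collect_requirements(polars_require),
}
```

In this sketch, keeping the deprecated key next to the new one means an install command written against the old extra name would keep resolving during the deprecation period.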
Signed-off-by: Matthias Roels <mroels2@its.jnj.com>
The Read the Docs build failed because of a dependency conflict with Dask between the test and docs requirements. Weird that this does not occur in other PRs (or does it?)...
We are aware of this issue; it has nothing to do with your PR!
Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Is there an open issue for this already?
It got fixed already in #396 ✨
Tried with a local CSV and it works, but the same CSV in a MinIO bucket failed:
In [1]: import polars as pl
In [2]: from kedro_datasets.polars import LazyPolarsDataset
In [3]: ds = LazyPolarsDataset(
...: filepath="s3://temp-openrepair/OpenRepairData_v0.3_aggregate_202210.csv",
...: file_format="csv",
...: load_args=dict(dtypes=dict(product_age=pl.Float64, group_identifier=pl.Utf8), try_parse_dates=True),
...: )
In [4]: df_l = ds.load()
In [5]: df_l
Out[5]: <LazyFrame [14 cols, {"id": Utf8 … "problem": Utf8}] at 0x107F1ABB0>
In [6]: df_l.collect().head()
---------------------------------------------------------------------------
ComputeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 df_l.collect().head()
File ~/.micromamba/envs/kedro38-dev2/lib/python3.8/site-packages/polars/utils/deprecation.py:96, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
91 @wraps(function)
92 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
93 _rename_keyword_argument(
94 old_name, new_name, kwargs, function.__name__, version
95 )
---> 96 return function(*args, **kwargs)
File ~/.micromamba/envs/kedro38-dev2/lib/python3.8/site-packages/polars/lazyframe/frame.py:1787, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, _eager)
1774 comm_subplan_elim = False
1776 ldf = self._ldf.optimization_toggle(
1777 type_coercion,
1778 predicate_pushdown,
(...)
1785 _eager,
1786 )
-> 1787 return wrap_df(ldf.collect())
ComputeError: ArrowInvalid: In CSV column #11: Row #9889: CSV conversion error to int64: invalid value 'Fixit Clinic'
Notice that it's trying to use int64 for the group_identifier column, despite pl.Utf8 having been specified. Maybe load_args is not being passed properly for fsspec files?
The EagerPolarsDataset doesn't have this problem:
In [16]: ds = EagerPolarsDataset(
...: filepath="s3://temp-openrepair/OpenRepairData_v0.3_aggregate_202210.csv",
...: file_format="csv",
...: load_args=dict(dtypes=dict(product_age=pl.Float64, group_identifier=pl.Utf8), try_parse_dates=True),
...: )
In [17]: ds.load().head()
Out[17]:
shape: (5, 14)
┌─────────────────┬───────────────┬─────────┬─────────────────┬───┬─────────────────┬─────────────────┬────────────┬─────────────────┐
│ id ┆ data_provider ┆ country ┆ partner_product ┆ … ┆ repair_barrier_ ┆ group_identifie ┆ event_date ┆ problem │
│ --- ┆ --- ┆ --- ┆ _category ┆ ┆ if_end_of_life ┆ r ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ --- ┆ ┆ --- ┆ --- ┆ date ┆ str │
│ ┆ ┆ ┆ str ┆ ┆ str ┆ str ┆ ┆ │
╞═════════════════╪═══════════════╪═════════╪═════════════════╪═══╪═════════════════╪═════════════════╪════════════╪═════════════════╡
│ anstiftung_2749 ┆ anstiftung ┆ DEU ┆ Elektro divers ┆ … ┆ null ┆ 5073 ┆ 2012-06-20 ┆ Funktionierte │
│ ┆ ┆ ┆ ~ Nähmaschine ┆ ┆ ┆ ┆ ┆ nicht mehr. │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ Fehler… │
│ anstiftung_2750 ┆ anstiftung ┆ DEU ┆ Computer ~ ┆ … ┆ null ┆ 5073 ┆ 2012-06-20 ┆ Wurde schnell │
│ ┆ ┆ ┆ Laptop ┆ ┆ ┆ ┆ ┆ heiß. Der │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ Lüfter w… │
│ anstiftung_2746 ┆ anstiftung ┆ DEU ┆ Computer ~ ┆ … ┆ null ┆ 5073 ┆ 2012-06-20 ┆ Funktionierte │
│ ┆ ┆ ┆ Drucker ┆ ┆ ┆ ┆ ┆ nicht mehr. │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ Fehler… │
│ anstiftung_2747 ┆ anstiftung ┆ DEU ┆ Unterhaltungsel ┆ … ┆ null ┆ 5073 ┆ 2012-06-20 ┆ Funktionierte │
│ ┆ ┆ ┆ ektronik ~ ┆ ┆ ┆ ┆ ┆ nicht mehr. │
│ ┆ ┆ ┆ Kopfhö… ┆ ┆ ┆ ┆ ┆ Fehler… │
│ anstiftung_2742 ┆ anstiftung ┆ DEU ┆ Haushaltsgeräte ┆ … ┆ null ┆ 5073 ┆ 2012-09-19 ┆ Die Beine der │
│ ┆ ┆ ┆ ~ Spielzeug ┆ ┆ ┆ ┆ ┆ Puppe waren ab. │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ Si… │
└─────────────────┴───────────────┴─────────┴─────────────────┴───┴─────────────────┴─────────────────┴────────────┴─────────────────┘
This is using https://github.com/astrojuanlu/workshop-jupyter-kedro/tree/workshop-steps/data by the way
Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
@astrojuanlu: You were correct that the load args were not properly passed when loading files from object stores. However, this will not fix the issue you had, because for object stores we leverage Arrow to load the dataset. This means that you actually have to pass an Arrow schema (instead of polars … So you should be able to do the following now:
import polars as pl
import pyarrow as pa
from kedro_datasets.polars import LazyPolarsDataset
pa_schema = pa.schema(
[
("id", pa.string()),
("data_provider", pa.string()),
("country", pa.string()),
("partner_product_category", pa.string()),
("product_category", pa.string()),
("product_category_id", pa.int64()),
("brand", pa.string()),
("year_of_manufacture", pa.int64()),
("product_age", pa.float64()),
("repair_status", pa.string()),
("repair_barrier_if_end_of_life", pa.string()),
("group_identifier", pa.string()),
("event_date", pa.date32()),
("problem", pa.string()),
]
)
ds = LazyPolarsDataset(
filepath="s3://temp-openrepair/OpenRepairData_v0.3_aggregate_202210.csv",
file_format="csv",
load_args={"schema": pa_schema},
)
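The underlying bug (load_args silently dropped on the object-store branch) is a common shape when a loader dispatches on the path's protocol. Here is a minimal stdlib sketch of the fixed pattern, assuming injected reader callables in place of the real Polars and Arrow readers; `LazyDatasetSketch` is a hypothetical name, and protocol detection via `urllib.parse.urlsplit` is a simplification (a Windows drive letter such as `C:` would be misread as a scheme).

```python
from typing import Any, Callable, Dict, Optional
from urllib.parse import urlsplit

Reader = Callable[..., Any]

# Simplified: an empty scheme or "file" is treated as a local path.
LOCAL_PROTOCOLS = {"", "file"}


class LazyDatasetSketch:
    """Hypothetical loader that dispatches on the filepath protocol."""

    def __init__(
        self,
        filepath: str,
        local_reader: Reader,
        remote_reader: Reader,
        load_args: Optional[Dict[str, Any]] = None,
    ):
        self._filepath = filepath
        self._protocol = urlsplit(filepath).scheme
        self._local_reader = local_reader
        self._remote_reader = remote_reader
        # Copy so later mutation of the caller's dict cannot leak in.
        self._load_args = dict(load_args or {})

    def load(self) -> Any:
        if self._protocol in LOCAL_PROTOCOLS:
            return self._local_reader(self._filepath, **self._load_args)
        # The bug pattern would be calling self._remote_reader(self._filepath)
        # here, silently ignoring load_args; forward them on this branch too.
        return self._remote_reader(self._filepath, **self._load_args)
```

Note that forwarding alone does not make the two backends accept the same keywords, which is why the remote case above still needs an Arrow `schema` rather than Polars dtypes.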
I see, thanks! It's a bit annoying to have to specify the full schema for remote filepaths. Do you think this is reason enough to embrace the new method, even if it's in beta?
@astrojuanlu: I think there are pros and cons to both approaches.
Current approach
Cons:
Approach when we force a newer version of Polars
Cons:
Given the pros and cons of each, we need to decide how to proceed. I think it comes down to stability vs convenience, no?
Good assessment, thanks for the write-up. I keep going back and forth on this, because I'm pretty confident the new approach will stabilise, but to be honest there's no 100 % guarantee. Let's proceed with what you have created here, and we can revisit when Polars 1.0 is out 👍🏽
🚀
Tiny comment, but otherwise LGTM! Thank you so much for your contribution @MatthiasRoels it's truly awesome ⭐ 😄
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
6ea5ac5 to d3f7e5c
Congrats on your first contribution @MatthiasRoels! 🎉
* feat(datasets) add PolarsDataset to support Polars's Lazy API
  Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
* Fix(datasets): rename PolarsDataSet to PolarsDataset
  Add PolarsDataSet as an alias for PolarsDataset with deprecation warning.
  Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
* Fix(datasets): apply ruff linting rules
  Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
* Fix(datasets): Correct pattern matching when Raising exceptions
  Corrected PolarsDataSet to PolarsDataset in the pattern to match in test_load_missing_file
  Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
* fix(datasets): clean up PolarsDataset related code
  Remove reference to PolarsDataSet as this is not required for new dataset implementations.
  Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
* feat(datasets): Rename Polars Datasets to better describe their intent
  Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
* feat(datasets): clean up LazyPolarsDataset
  Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
* fix(datasets): increase test coverage for PolarsDataset classes
  Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
* docs(datasets): add renamed Polars datasets to docs
  Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
* docs(datasets): Add new polars datasets to release notes
  Signed-off-by: Matthias Roels <mroels2@its.jnj.com>
* fix(datasets): load_args not properly passed to LazyPolarsDataset.load
  Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
* docs(datasets): fix spelling error in release notes
  Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
  Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
---------
Signed-off-by: Matthias Roels <matthias.roels21@gmail.com>
Signed-off-by: Matthias Roels <mroels2@its.jnj.com>
Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Co-authored-by: Matthias Roels <mroels2@its.jnj.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Signed-off-by: tgoelles <thomas.goelles@gmail.com>
* refactor(datasets): deprecate "DataSet" type names (#328) * refactor(datasets): deprecate "DataSet" type names (api) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (biosequence) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (dask) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (databricks) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (email) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (geopandas) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (holoviews) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (json) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (matplotlib) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (networkx) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.csv_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.deltatable_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.excel_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.feather_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate 
"DataSet" type names (pandas.gbq_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.generic_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.hdf_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.json_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.parquet_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.sql_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.xml_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pickle) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pillow) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (plotly) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (polars) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (redis) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (snowflake) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (spark) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (svmlight) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (tensorflow) Signed-off-by: Deepyaman Datta 
<deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (text) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (tracking) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (video) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (yaml) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): ignore TensorFlow coverage issues Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added basic code for geotiff Signed-off-by: tgoelles <thomas.goelles@gmail.com> * renamed to xarray Signed-off-by: tgoelles <thomas.goelles@gmail.com> * renamed to xarray Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added load and self args Signed-off-by: tgoelles <thomas.goelles@gmail.com> * only local files Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added empty test Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added test data Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added rioxarray requirements Signed-off-by: tgoelles <thomas.goelles@gmail.com> * reformat with black Signed-off-by: tgoelles <thomas.goelles@gmail.com> * rioxarray 0.14 Signed-off-by: tgoelles <thomas.goelles@gmail.com> * rioxarray 0.15 Signed-off-by: tgoelles <thomas.goelles@gmail.com> * rioxarray 0.12 Signed-off-by: tgoelles <thomas.goelles@gmail.com> * rioxarray 0.9 Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fixed dataset typo Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fixed docstring for sphinx Signed-off-by: tgoelles <thomas.goelles@gmail.com> * run black Signed-off-by: tgoelles <thomas.goelles@gmail.com> * sort imports Signed-off-by: tgoelles <thomas.goelles@gmail.com> * class docstring Signed-off-by: 
tgoelles <thomas.goelles@gmail.com> * black Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fixed pylint Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added release notes Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added yaml example Signed-off-by: tgoelles <thomas.goelles@gmail.com> * improve testing WIP Signed-off-by: tgoelles <thomas.goelles@gmail.com> * basic test success Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test reloaded Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test exists Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added version Signed-off-by: tgoelles <thomas.goelles@gmail.com> * basic test suite Signed-off-by: tgoelles <thomas.goelles@gmail.com> * run black Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added example and test it Signed-off-by: tgoelles <thomas.goelles@gmail.com> * deleted duplications Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fixed position of example Signed-off-by: tgoelles <thomas.goelles@gmail.com> * black Signed-off-by: tgoelles <thomas.goelles@gmail.com> * style: Introduce `ruff` for linting in all plugins. 
(#354) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * feat(datasets): create custom `DeprecationWarning` (#356) * feat(datasets): create custom `DeprecationWarning` Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * feat(datasets): use the custom deprecation warning Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): show Kedro's deprecation warnings Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * fix(datasets): remove unused imports in test files Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): add note about DataSet deprecation (#357) Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): skip `tensorflow` tests on Windows (#363) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci: Pin `tables` version (#370) * Pin tables version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Also fix kedro-airflow Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Revert trying to fix airflow Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(datasets): Release `1.7.1` (#378) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs: Update CONTRIBUTING.md and add one for `kedro-datasets` (#379) Update CONTRIBUTING.md + add one for kedro-datasets Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(datasets): Run tensorflow tests separately from other dataset tests (#377) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat: Kedro-Airflow 
convert all pipelines option (#335) * feat: kedro airflow convert --all option Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * docs: release docs Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> --------- Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): blacken code in rst literal blocks (#362) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs: cloudpickle is an interesting extension of the pickle functionality (#361) Signed-off-by: H. Felix Wittmann <hfwittmann@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Fix secret scan entropy error (#383) Fix secret scan entropy error Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * style: Rename mentions of `DataSet` to `Dataset` in `kedro-airflow` and `kedro-telemetry` (#384) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Migrated `PartitionedDataSet` and `IncrementalDataSet` from main repository to kedro-datasets (#253) Signed-off-by: Peter Bludau <ptrbld.dev@gmail.com> Co-authored-by: Merel Theisen <merel.theisen@quantumblack.com> * fix: backwards compatibility for `kedro-airflow` (#381) Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added metadata Signed-off-by: tgoelles <thomas.goelles@gmail.com> * after linting Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ignore ruff PLR0913 Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Don't warn for SparkDataset on Databricks when using s3 (#341) Signed-off-by: Alistair McKelvie <alistair.mckelvie@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Hot fix for RTD due to bad pip version (#396) fix RTD Signed-off-by: Nok <nok.lam.chan@quantumblack.com> 
Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Pin pip version temporarily (#398) * Pin pip version temporarily Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Hive support failures Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Also pin pip on lint Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Temporary ignore databricks spark tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * perf(datasets): don't create connection until need (#281) * perf(datasets): delay `Engine` creation until need Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore: don't check coverage in TYPE_CHECKING block Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * perf(datasets): don't connect in `__init__` method Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): fix tests to touch `create_engine` Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * perf(datasets): don't connect in `__init__` method Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * style(datasets): exec Ruff on sql_dataset.py :dog: Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Undo changes to `engines` values type (for Sphinx) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Patch Sphinx build by removing `Engine` references * perf(datasets): don't connect in `__init__` method Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): don't require coverage for import * chore(datasets): del unused `TYPE_CHECKING` import * docs(datasets): document lazy connection in README * perf(datasets): remove create in `SQLQueryDataset` Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): do not return the created conn Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- 
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore: Drop Python 3.7 support for kedro-plugins (#392) * Remove references to Python 3.7 Signed-off-by: lrcouto <laurarccouto@gmail.com> * Revert kedro-dataset changes Signed-off-by: lrcouto <laurarccouto@gmail.com> * Revert kedro-dataset changes Signed-off-by: lrcouto <laurarccouto@gmail.com> * Add information to release docs Signed-off-by: lrcouto <laurarccouto@gmail.com> --------- Signed-off-by: lrcouto <laurarccouto@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): support Polars lazy evaluation (#350) * feat(datasets) add PolarsDataset to support Polars's Lazy API Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * Fix(datasets): rename PolarsDataSet to PolarsDataSet Add PolarsDataSet as an alias for PolarsDataset with deprecation warning. Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * Fix(datasets): apply ruff linting rules Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * Fix(datasets): Correct pattern matching when Raising exceptions Corrected PolarsDataSet to PolarsDataset in the pattern to match in test_load_missing_file Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * fix(datasets): clean up PolarsDataset related code Remove reference to PolarsDataSet as this is not required for new dataset implementations. 
Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * feat(datasets): Rename Polars Datasets to better describe their intent Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * feat(datasets): clean up LazyPolarsDataset Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * fix(datasets): increase test coverage for PolarsDataset classes Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * docs(datasets): add renamed Polars datasets to docs Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * docs(datasets): Add new polars datasets to release notes Signed-off-by: Matthias Roels <mroels2@its.jnj.com> * fix(datasets): load_args not properly passed to LazyPolarsDataset.load Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * docs(datasets): fix spelling error in release notes Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> --------- Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> Signed-off-by: Matthias Roels <mroels2@its.jnj.com> Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Co-authored-by: Matthias Roels <mroels2@its.jnj.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(datasets): Release `1.8.0` (#406) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(airflow): Release 0.7.0 (#407) * bump version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update release notes Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(telemetry): Release 0.3.0 (#408) Bump version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Ankita Katiyar 
<110245118+ankatiyar@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(docker): Release 0.4.0 (#409) Bump version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * style(airflow): blacken README.md of Kedro-Airflow (#418) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Fix missing jQuery (#414) Fix missing jQuery Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Fix Lazy Polars dataset to use the new-style base class (#413) * Fix Lazy Polars dataset to use the new-style base class Fix gh-412 Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Update release notes Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Revert "Update release notes" This reverts commit 92ceea6d8fa412abf3d8abd28a2f0a22353867ed. 
--------- Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): lazily load `partitions` classes (#411) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): fix code blocks and `data_set` use (#417) * chore(datasets): lazily load `partitions` classes Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): run doctests to check examples run Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): keep running tests amidst failures Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): format ManagedTableDataset example Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): ignore breaking mods for doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * style(airflow): black code in Kedro-Airflow README Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): fix example syntax, and autoformat Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `kedro.extras.datasets` ref Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `>>> ` prefix for YAML code Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `kedro.extras.datasets` ref Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): replace `data_set`s with `dataset`s Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): undo changes for running doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * revert(datasets): undo lazily load `partitions` classes Refs: 
3fdc5a8efa034fa9a18b7683a942415915b42fb5 Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * revert(airflow): undo black code in Kedro-Airflow README Refs: dc3476ea36bac98e2adcc0b52a11b0f90001e31d Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix: TF model load failure when model is saved as a TensorFlow Saved Model format (#410) * fixes TF model load failure when model is saved as a TensorFlow Saved Model format. When a model is saved in the TensorFlow SavedModel format ("tf" default option in tf.save_model when using TF 2.x) via the catalog.yml file, the subsequent loading of that model for further use in a subsequent node fails. The issue is linked to the fact that the model files don't get copied into the temporary folder, presumably because the _fs.get function "thinks" that the provided path is a file and not a folder. Adding a terminating "/" to the path fixes the issue. Signed-off-by: Edouard59 <68538605+Edouard59@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Drop support for Python 3.7 on kedro-datasets (#419) * Drop support for Python 3.7 on kedro-datasets Signed-off-by: lrcouto <laurarccouto@gmail.com> * Remove redundant 3.8 markers Signed-off-by: lrcouto <laurarccouto@gmail.com> --------- Signed-off-by: lrcouto <laurarccouto@gmail.com> Signed-off-by: L. R.
Couto <57910428+lrcouto@users.noreply.github.com> Signed-off-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> * test(datasets): run doctests to check examples run (#416) * chore(datasets): lazily load `partitions` classes Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): run doctests to check examples run Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): keep running tests amidst failures Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): format ManagedTableDataset example Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): ignore breaking mods for doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * style(airflow): black code in Kedro-Airflow README Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): fix example syntax, and autoformat Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `kedro.extras.datasets` ref Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `>>> ` prefix for YAML code Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `kedro.extras.datasets` ref Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): replace `data_set`s with `dataset`s Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): run doctests separately Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * separate dataset-doctests Signed-off-by: Nok <nok.lam.chan@quantumblack.com> * chore(datasets): ignore non-passing tests to make CI pass Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): fix comment location Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): fix .py.py Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * 
chore(datasets): don't measure coverage on doctest run Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * build(datasets): fix windows and snowflake stuff in Makefile Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: Nok <nok.lam.chan@quantumblack.com> Co-authored-by: Nok <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Add support for `databricks-connect>=13.0` (#352) Signed-off-by: Miguel Rodriguez Gutierrez <miguel7r@hotmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(telemetry): remove double execution by moving to after catalog created hook (#422) * remove double execution by moving to after catalog created hook Signed-off-by: Florian Roessler <roessler.fd@gmail.com> * update release notes Signed-off-by: Florian Roessler <roessler.fd@gmail.com> * fix tests Signed-off-by: Florian Roessler <roessler.fd@gmail.com> * remove unused fixture Signed-off-by: Florian Roessler <roessler.fd@gmail.com> --------- Signed-off-by: Florian Roessler <roessler.fd@gmail.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs: Add python version support policy to plugin `README.md`s (#425) * Add python version support policy to plugin readmes Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * Temporarily pin connexion Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> --------- Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(airflow): Use new docs link (#393) Use new docs link Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * style: Add shared CSS and meganav to datasets docs (#400) * Add shared CSS and meganav
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * Add end of file Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * Add new heap data source Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * adjust heap parameter Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * Remove nav_version next to Kedro logo in top left; add Kedro logo * Revise project name and author name Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * Use full kedro icon and type for logo * Add close btn to mobile nav Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Add css for mobile nav logo image Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Update close button for mobile nav Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Add open button to mobile nav Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Delete kedro-datasets/docs/source/kedro-horizontal-color-on-light.svg Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Update conf.py Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Update layout.html Add links to subprojects Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * Remove svg from docs -- not needed?? 
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * linter error fix Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> --------- Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> Co-authored-by: Tynan DeBold <thdebold@gmail.com> Co-authored-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Add Hugging Face datasets (#344) * Add HuggingFace datasets Co-authored-by: Danny Farah <danny_farah@mckinsey.com> Co-authored-by: Kevin Koga <Kevin_Koga@mckinsey.com> Co-authored-by: Mate Scharnitzky <Mate_Scharnitzky@mckinsey.com> Co-authored-by: Tomer Shor <Tomer_Shor@mckinsey.com> Co-authored-by: Pierre-Yves Mousset <Pierre-Yves_Mousset@mckinsey.com> Co-authored-by: Bela Chupal <Bela_chuphal@mckinsey.com> Co-authored-by: Khangjrakpam Arjun <Khangjrakpam_Arjun@mckinsey.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Apply suggestions from code review Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Joel <35801847+datajoely@users.noreply.github.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> * Typo Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Fix docstring Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add docstring for HFTransformerPipelineDataset Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Use intersphinx for cross references in Hugging Face docstrings Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add docstring for HFDataset Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add missing test dependencies Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add tests for huggingface datasets 
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Fix HFDataset.save Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add test for HFDataset.list_datasets Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Use new name Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Consolidate imports Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> --------- Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Danny Farah <danny_farah@mckinsey.com> Co-authored-by: Kevin Koga <Kevin_Koga@mckinsey.com> Co-authored-by: Mate Scharnitzky <Mate_Scharnitzky@mckinsey.com> Co-authored-by: Tomer Shor <Tomer_Shor@mckinsey.com> Co-authored-by: Pierre-Yves Mousset <Pierre-Yves_Mousset@mckinsey.com> Co-authored-by: Bela Chupal <Bela_chuphal@mckinsey.com> Co-authored-by: Khangjrakpam Arjun <Khangjrakpam_Arjun@mckinsey.com> Co-authored-by: Joel <35801847+datajoely@users.noreply.github.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): fix `dask.ParquetDataset` doctests (#439) * test(datasets): fix `dask.ParquetDataset` doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): use `tmp_path` fixture in doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): simplify by not passing the schema Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): ignore conftest for doctests cover Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Create MANIFEST.in Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * refactor: Remove `DataSet` aliases and mentions (#440) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * chore(datasets): replace "Pyspark" 
with "PySpark" (#423) Consistently write "PySpark" rather than "Pyspark" Also, fix list formatting Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): make `api.APIDataset` doctests run (#448) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): Fix `pandas.GenericDataset` doctest (#445) Fix pandas.GenericDataset doctest Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): make datasets arguments keywords only (#358) * feat(datasets): make `APIDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `BioSequenceDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `ParquetDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `EmailMessageDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `GeoJSONDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `HoloviewsWriter.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `JSONDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `MatplotlibWriter.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `GMLDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `GraphMLDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make NetworkX `JSONDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `PickleDataset.__init__` keyword only 
Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `ImageDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make plotly `JSONDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `PlotlyDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make polars `CSVDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make polars `GenericDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make redis `PickleDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SnowparkTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SVMLightDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `TensorFlowModelDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `TextDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `YAMLDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `ManagedTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `VideoDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `CSVDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `DeltaTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `ExcelDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `FeatherDataset.__init__` keyword only Signed-off-by: 
Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `GBQTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `GenericDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make pandas `JSONDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make pandas `ParquetDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SQLTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `XMLDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `HDFDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `DeltaTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SparkDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SparkHiveDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SparkJDBCDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SparkStreamingDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `IncrementalDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `LazyPolarsDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * docs(datasets): update doctests for HoloviewsWriter Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * Update release notes Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> --------- Signed-off-by: Felix Scherz <felixwscherz@gmail.com> Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Co-authored-by: Felix Scherz <felixwscherz@gmail.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Co-authored-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Drop support for python 3.8 on kedro-datasets (#442) * Drop support for python 3.8 on kedro-datasets --------- Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com> Signed-off-by: Dmitry Sorokin <40151847+DimedS@users.noreply.github.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): add outputs to matplotlib doctests (#449) * test(datasets): add outputs to matplotlib doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Update Makefile Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Reformat code example, line length is short enough * Update kedro-datasets/kedro_datasets/matplotlib/matplotlib_writer.py Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): Fix more doctest issues (#451) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): fix failing doctests in Windows CI (#457) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): fix accidental reference to NumPy (#450) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): don't pollute dev env in doctests (#452) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat: Add tools to heap event (#430) * Add add-on data to heap event Signed-off-by: lrcouto 
<laurarccouto@gmail.com> * Move addons logic to _get_project_property Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Add condition for pyproject.toml Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Fix tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Fix tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * add tools to mock Signed-off-by: lrcouto <laurarccouto@gmail.com> * lint Signed-off-by: lrcouto <laurarccouto@gmail.com> * Update tools test Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Add after_context_created tools test Signed-off-by: lrcouto <laurarccouto@gmail.com> * Update rename to tools Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update kedro-telemetry/tests/test_plugin.py Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> --------- Signed-off-by: lrcouto <laurarccouto@gmail.com> Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Co-authored-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(datasets): install deps in single `pip install` (#454) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(datasets): Bump s3fs (#463) * Use mocking for AWS responses Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * Add change to release notes Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * Update release notes Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * Use pytest xfail instead of commenting out test Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> 
--------- Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): make SQL dataset examples runnable (#455) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): correct pandas-gbq as py311 dependency (#460) * update pandas-gbq dependency declaration Signed-off-by: Onur Kuru <kuru.onur1@gmail.com> * fix fmt Signed-off-by: Onur Kuru <kuru.onur1@gmail.com> --------- Signed-off-by: Onur Kuru <kuru.onur1@gmail.com> Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): Document `IncrementalDataset` (#468) Document IncrementalDataset Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Update datasets to be arguments keyword only (#466) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Clean up code for old dataset syntax compatibility (#465) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Update scikit-learn version (#469) Update scikit-learn version Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): support versioning data partitions (#447) * feat(datasets): support versioning data partitions Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Remove unused import Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): use keyword arguments when needed Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Apply suggestions from code review Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Update kedro-datasets/kedro_datasets/partitions/partitioned_dataset.py Signed-off-by: Deepyaman 
Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): Improve documentation index (#428) Rework documentation index Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): update wrong docstring about `con` (#461) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(datasets): Release `2.0.0` (#472) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(telemetry): Pin `PyYAML` (#474) Pin PyYaml Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(telemetry): Release 0.3.1 (#475) Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): Fix broken links in README (#477) Fix broken links in README Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): replace more "data_set" instances (#476) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): Fix doctests (#488) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): Fix delta + incremental dataset docstrings (#489) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(airflow): Post 0.19 cleanup (#478) * bump version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Unbump version and clean test Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update e2e tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> 
* Update e2e tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update e2e tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update e2e tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Split big test into smaller tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update conftest Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update conftest Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Fix coverage Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Try unpin airflow Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * remove datacatalog step Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Change node Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * update tasks test step Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Revert to older airflow and constraint pendulum Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update template Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update message in e2e step Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Final cleanup Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update kedro-airflow/pyproject.toml Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> * Pin apache-airflow again Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(airflow): Release 0.8.0 (#491) Bump version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix: telemetry metadata (#495) --------- Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix: Update tests on 
kedro-docker for 0.5.0 release (#496) * bump version to 0.5.0 Signed-off-by: lrcouto <laurarccouto@gmail.com> * bump version to 0.5.0 Signed-off-by: lrcouto <laurarccouto@gmail.com> * update e2e tests to use new starters Signed-off-by: lrcouto <laurarccouto@gmail.com> * Lint Signed-off-by: lrcouto <laurarccouto@gmail.com> * update e2e tests to use new starters Signed-off-by: lrcouto <laurarccouto@gmail.com> * fix test path for e2e tests Signed-off-by: lrcouto <laurarccouto@gmail.com> * fix requirements path on dockerfiles Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * Remove redundant test Signed-off-by: lrcouto <laurarccouto@gmail.com> * Alter test for custom GID and UID Signed-off-by: lrcouto <laurarccouto@gmail.com> * Update release notes Signed-off-by: lrcouto <laurarccouto@gmail.com> * Revert version bump to put it in a separate PR Signed-off-by: lrcouto <laurarccouto@gmail.com> --------- Signed-off-by: lrcouto <laurarccouto@gmail.com> Signed-off-by: L. R.
Couto <57910428+lrcouto@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build: Release kedro-docker 0.5.0 (#497) * bump version to 0.5.0 Signed-off-by: lrcouto <laurarccouto@gmail.com> * bump version to 0.5.0 Signed-off-by: lrcouto <laurarccouto@gmail.com> * update e2e tests to use new starters Signed-off-by: lrcouto <laurarccouto@gmail.com> * Lint Signed-off-by: lrcouto <laurarccouto@gmail.com> * update e2e tests to use new starters Signed-off-by: lrcouto <laurarccouto@gmail.com> * fix test path for e2e tests Signed-off-by: lrcouto <laurarccouto@gmail.com> * fix requirements path on dockerfiles Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * Remove redundant test Signed-off-by: lrcouto <laurarccouto@gmail.com> * Alter test for custom GID and UID Signed-off-by: lrcouto <laurarccouto@gmail.com> * Update release notes Signed-off-by: lrcouto <laurarccouto@gmail.com> * Revert version bump to put it in a separate PR Signed-off-by: lrcouto <laurarccouto@gmail.com> * Bump kedro-docker to 0.5.0 Signed-off-by: lrcouto <laurarccouto@gmail.com> * Add release notes Signed-off-by: lrcouto <laurarccouto@gmail.com> * Update kedro-docker/RELEASE.md Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com> --------- Signed-off-by: lrcouto <laurarccouto@gmail.com> Signed-off-by: L. R.
Couto <57910428+lrcouto@users.noreply.github.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): Update partitioned dataset docstring (#502) Update partitioned dataset docstring Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * Fix GeotiffDataset import + casing Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * Fix lint Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Relax pandas.HDFDataSet dependencies which are broken on Windows (#426) * Relax pandas.HDFDataSet dependencies which are broken on Windows (#402) Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com> * Update RELEASE.md Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com> * Apply suggestions from code review Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> * Update kedro-datasets/setup.py Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> --------- Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com> Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix: airflow metadata (#498) * Add example pipeline entry to metadata declaration Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Fix entry Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Make entries consistent Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Add tools to config Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * fix: telemetry metadata (#495) --------- Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com> Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Revert "Add tools to config" This reverts
commit 14732d772a3c2f4787063071a68fdf1512c93488. Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Quick fix Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Lint Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Remove outdated config key Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Use kedro new instead of cookiecutter Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> --------- Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com> Co-authored-by: Dmitry Sorokin <40151847+DimedS@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(airflow): Bump `apache-airflow` version (#511) * Bump apache airflow Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Change starter Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update e2e test steps Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update e2e test steps Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(datasets): Unpin dask (#522) * Unpin dask Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update doctest Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update doctest Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update kedro-datasets/setup.py Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Add `MatlabDataset` to `kedro-datasets` (#515) * Refork and commit kedro matlab datasets Signed-off-by: 
samuelleeshemen <samuel_lee_sj@aiap.sg> * Fix lint, add to docs Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Try fixing docstring Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Try fixing save Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Try fixing doctest Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Fix unit tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update release notes Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Don't hardcode load mode Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: samuelleeshemen <samuel_lee_sj@aiap.sg> Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Co-authored-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(airflow): Pin `Flask-Session` version (#521) * Restrict pendulum version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update airflow init step Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Remove pendulum pin Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update create connections step Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Pin flask session Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Add comment Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat: `kedro-airflow` group in memory nodes (#241) * feat: option to group in-memory nodes Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * fix: MemoryDataset Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * Update kedro-airflow/README.md Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/README.md Co-authored-by: Merel Theisen 
<49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/README.md Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/RELEASE.md Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/kedro_airflow/grouping.py Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/kedro_airflow/plugin.py Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/tests/test_node_grouping.py Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/tests/test_node_grouping.py Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/kedro_airflow/grouping.py Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/kedro_airflow/grouping.py Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * fix: tests Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * Bump minimum kedro version Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * fixes Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> --------- Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> Co-authored-by: Merel Theisen 
<49397448+merelcht@users.noreply.github.com> Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(datasets): Update pyproject.toml to pin Kedro 0.19 for kedro-datasets (#526) Update pyproject.toml Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(airflow): include environment name in DAG filename (#492) * feat: include environment name in DAG file Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * doc: add update to release notes Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> --------- Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Enable search-as-you type on Kedro-datasets docs (#532) * done Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix lint Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> --------- Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Debug and fix `kedro-datasets` nightly build failures (#541) * pin deltalake * Update kedro-datasets/setup.py Signed-off-by: Juan Luis Cano Rodríguez <hello@juanlu.space> * Update setup.py Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * sort order and compare * Update setup.py * lint * pin deltalake * add comment to pin --------- Signed-off-by: Juan Luis Cano Rodríguez <hello@juanlu.space> Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Dataset Preview Refactor (#504) * test * done * change from _preview to preview * fix lint and tests * added docstrings * rtd fix * rtd fix * fix rtd Signed-off-by: 
rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix rtd Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix rtd - pls Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * add nitpick ignore Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * test again Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * move tracking datasets to constant Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * remove comma Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * remove NewType from json_dataset Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * pls work Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * confirm rtd works locally Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * juanlu's fix Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix tests Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * remove unnecessary stuff from conf.py Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fixes based on review Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * changes based on review Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix tests Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * add suffix Preview Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * change img return type to bytes Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix tests Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * update release note * fix lint --------- Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> Co-authored-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com> Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Drop pyarrow constraint when using snowpark (#538) * Free pyarrow req 
Signed-off-by: Felipe Monroy <felipe.m02@gmail.com> * Free pyarrow req Signed-off-by: Felipe Monroy <felipe.m02@gmail.com> --------- Signed-off-by: Felipe Monroy <felipe.m02@gmail.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs: Update kedro-telemetry docs on which data is collected (#546) * Update data being collected --------- Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com> Signed-off-by: Dmitry Sorokin <40151847+DimedS@users.noreply.github.com> Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(docker): Trying to fix e2e tests (#548) * Pin psutil Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Add no capture to test Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update pip version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * U…
* refactor(datasets): deprecate "DataSet" type names (#328) * refactor(datasets): deprecate "DataSet" type names (api) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (biosequence) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (dask) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (databricks) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (email) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (geopandas) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (holoviews) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (json) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (matplotlib) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (networkx) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.csv_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.deltatable_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.excel_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.feather_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate 
"DataSet" type names (pandas.gbq_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.generic_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.hdf_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.json_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.parquet_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.sql_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pandas.xml_dataset) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pickle) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (pillow) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (plotly) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (polars) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (redis) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (snowflake) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (spark) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (svmlight) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (tensorflow) Signed-off-by: Deepyaman Datta 
<deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (text) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (tracking) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (video) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): deprecate "DataSet" type names (yaml) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): ignore TensorFlow coverage issues Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added basic code for geotiff Signed-off-by: tgoelles <thomas.goelles@gmail.com> * renamed to xarray Signed-off-by: tgoelles <thomas.goelles@gmail.com> * renamed to xarray Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added load and self args Signed-off-by: tgoelles <thomas.goelles@gmail.com> * only local files Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added empty test Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added test data Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added rioxarray requirements Signed-off-by: tgoelles <thomas.goelles@gmail.com> * reformat with black Signed-off-by: tgoelles <thomas.goelles@gmail.com> * rioxarray 0.14 Signed-off-by: tgoelles <thomas.goelles@gmail.com> * rioxarray 0.15 Signed-off-by: tgoelles <thomas.goelles@gmail.com> * rioxarray 0.12 Signed-off-by: tgoelles <thomas.goelles@gmail.com> * rioxarray 0.9 Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fixed dataset typo Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fixed docstring for sphinx Signed-off-by: tgoelles <thomas.goelles@gmail.com> * run black Signed-off-by: tgoelles <thomas.goelles@gmail.com> * sort imports Signed-off-by: tgoelles <thomas.goelles@gmail.com> * class docstring Signed-off-by: 
tgoelles <thomas.goelles@gmail.com> * black Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fixed pylint Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added release notes Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added yaml example Signed-off-by: tgoelles <thomas.goelles@gmail.com> * improve testing WIP Signed-off-by: tgoelles <thomas.goelles@gmail.com> * basic test success Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test reloaded Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test exists Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added version Signed-off-by: tgoelles <thomas.goelles@gmail.com> * basic test suite Signed-off-by: tgoelles <thomas.goelles@gmail.com> * run black Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added example and test it Signed-off-by: tgoelles <thomas.goelles@gmail.com> * deleted duplications Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fixed position of example Signed-off-by: tgoelles <thomas.goelles@gmail.com> * black Signed-off-by: tgoelles <thomas.goelles@gmail.com> * style: Introduce `ruff` for linting in all plugins. 
(#354) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * feat(datasets): create custom `DeprecationWarning` (#356) * feat(datasets): create custom `DeprecationWarning` Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * feat(datasets): use the custom deprecation warning Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): show Kedro's deprecation warnings Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * fix(datasets): remove unused imports in test files Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): add note about DataSet deprecation (#357) Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): skip `tensorflow` tests on Windows (#363) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci: Pin `tables` version (#370) * Pin tables version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Also fix kedro-airflow Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Revert trying to fix airflow Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(datasets): Release `1.7.1` (#378) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs: Update CONTRIBUTING.md and add one for `kedro-datasets` (#379) Update CONTRIBUTING.md + add one for kedro-datasets Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(datasets): Run tensorflow tests separately from other dataset tests (#377) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat: Kedro-Airflow 
convert all pipelines option (#335) * feat: kedro airflow convert --all option Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * docs: release docs Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> --------- Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): blacken code in rst literal blocks (#362) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs: cloudpickle is an interesting extension of the pickle functionality (#361) Signed-off-by: H. Felix Wittmann <hfwittmann@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Fix secret scan entropy error (#383) Fix secret scan entropy error Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * style: Rename mentions of `DataSet` to `Dataset` in `kedro-airflow` and `kedro-telemetry` (#384) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Migrated `PartitionedDataSet` and `IncrementalDataSet` from main repository to kedro-datasets (#253) Signed-off-by: Peter Bludau <ptrbld.dev@gmail.com> Co-authored-by: Merel Theisen <merel.theisen@quantumblack.com> * fix: backwards compatibility for `kedro-airflow` (#381) Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * added metadata Signed-off-by: tgoelles <thomas.goelles@gmail.com> * after linting Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ignore ruff PLR0913 Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Don't warn for SparkDataset on Databricks when using s3 (#341) Signed-off-by: Alistair McKelvie <alistair.mckelvie@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Hot fix for RTD due to bad pip version (#396) fix RTD Signed-off-by: Nok <nok.lam.chan@quantumblack.com> 
Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Pin pip version temporarily (#398) * Pin pip version temporarily Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Hive support failures Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Also pin pip on lint Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Temporarily ignore databricks spark tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * perf(datasets): don't create connection until needed (#281) * perf(datasets): delay `Engine` creation until needed Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore: don't check coverage in TYPE_CHECKING block Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * perf(datasets): don't connect in `__init__` method Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): fix tests to touch `create_engine` Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * perf(datasets): don't connect in `__init__` method Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * style(datasets): exec Ruff on sql_dataset.py :dog: Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Undo changes to `engines` values type (for Sphinx) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Patch Sphinx build by removing `Engine` references * perf(datasets): don't connect in `__init__` method Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): don't require coverage for import * chore(datasets): del unused `TYPE_CHECKING` import * docs(datasets): document lazy connection in README * perf(datasets): remove create in `SQLQueryDataset` Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): do not return the created conn Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- 
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore: Drop Python 3.7 support for kedro-plugins (#392) * Remove references to Python 3.7 Signed-off-by: lrcouto <laurarccouto@gmail.com> * Revert kedro-dataset changes Signed-off-by: lrcouto <laurarccouto@gmail.com> * Revert kedro-dataset changes Signed-off-by: lrcouto <laurarccouto@gmail.com> * Add information to release docs Signed-off-by: lrcouto <laurarccouto@gmail.com> --------- Signed-off-by: lrcouto <laurarccouto@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): support Polars lazy evaluation (#350) * feat(datasets): add PolarsDataset to support Polars's Lazy API Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * Fix(datasets): rename PolarsDataSet to PolarsDataset Add PolarsDataSet as an alias for PolarsDataset with deprecation warning. Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * Fix(datasets): apply ruff linting rules Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * Fix(datasets): Correct pattern matching when raising exceptions Corrected PolarsDataSet to PolarsDataset in the pattern to match in test_load_missing_file Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * fix(datasets): clean up PolarsDataset related code Remove reference to PolarsDataSet as this is not required for new dataset implementations. 
Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * feat(datasets): Rename Polars Datasets to better describe their intent Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * feat(datasets): clean up LazyPolarsDataset Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * fix(datasets): increase test coverage for PolarsDataset classes Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * docs(datasets): add renamed Polars datasets to docs Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * docs(datasets): Add new polars datasets to release notes Signed-off-by: Matthias Roels <mroels2@its.jnj.com> * fix(datasets): load_args not properly passed to LazyPolarsDataset.load Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> * docs(datasets): fix spelling error in release notes Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> --------- Signed-off-by: Matthias Roels <matthias.roels21@gmail.com> Signed-off-by: Matthias Roels <mroels2@its.jnj.com> Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Co-authored-by: Matthias Roels <mroels2@its.jnj.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(datasets): Release `1.8.0` (#406) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(airflow): Release 0.7.0 (#407) * bump version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update release notes Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(telemetry): Release 0.3.0 (#408) Bump version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Ankita Katiyar 
<110245118+ankatiyar@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(docker): Release 0.4.0 (#409) Bump version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * style(airflow): blacken README.md of Kedro-Airflow (#418) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Fix missing jQuery (#414) Fix missing jQuery Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Fix Lazy Polars dataset to use the new-style base class (#413) * Fix Lazy Polars dataset to use the new-style base class Fix gh-412 Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Update release notes Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Revert "Update release notes" This reverts commit 92ceea6d8fa412abf3d8abd28a2f0a22353867ed. 
--------- Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): lazily load `partitions` classes (#411) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): fix code blocks and `data_set` use (#417) * chore(datasets): lazily load `partitions` classes Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): run doctests to check examples run Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): keep running tests amidst failures Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): format ManagedTableDataset example Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): ignore breaking mods for doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * style(airflow): black code in Kedro-Airflow README Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): fix example syntax, and autoformat Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `kedro.extras.datasets` ref Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `>>> ` prefix for YAML code Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `kedro.extras.datasets` ref Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): replace `data_set`s with `dataset`s Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): undo changes for running doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * revert(datasets): undo lazily load `partitions` classes Refs: 
3fdc5a8efa034fa9a18b7683a942415915b42fb5 Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * revert(airflow): undo black code in Kedro-Airflow README Refs: dc3476ea36bac98e2adcc0b52a11b0f90001e31d Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix: TF model load failure when model is saved as a TensorFlow Saved Model format (#410) * fixes TF model load failure when model is saved as a TensorFlow Saved Model format. When a model is saved in the TensorFlow SavedModel format (the default "tf" option of tf.keras.models.save_model when using TF 2.x) via the catalog.yml file, loading that model in a subsequent node fails. The issue is that the model files don't get copied into the temporary folder, presumably because the _fs.get function "thinks" that the provided path is a file and not a folder. Adding a terminating "/" to the path fixes the issue. Signed-off-by: Edouard59 <68538605+Edouard59@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Drop support for Python 3.7 on kedro-datasets (#419) * Drop support for Python 3.7 on kedro-datasets Signed-off-by: lrcouto <laurarccouto@gmail.com> * Remove redundant 3.8 markers Signed-off-by: lrcouto <laurarccouto@gmail.com> --------- Signed-off-by: lrcouto <laurarccouto@gmail.com> Signed-off-by: L. R. 
Couto <57910428+lrcouto@users.noreply.github.com> Signed-off-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> * test(datasets): run doctests to check examples run (#416) * chore(datasets): lazily load `partitions` classes Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): run doctests to check examples run Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): keep running tests amidst failures Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): format ManagedTableDataset example Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): ignore breaking mods for doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * style(airflow): black code in Kedro-Airflow README Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): fix example syntax, and autoformat Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `kedro.extras.datasets` ref Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `>>> ` prefix for YAML code Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): remove `kedro.extras.datasets` ref Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * docs(datasets): replace `data_set`s with `dataset`s Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * refactor(datasets): run doctests separately Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * separate dataset-doctests Signed-off-by: Nok <nok.lam.chan@quantumblack.com> * chore(datasets): ignore non-passing tests to make CI pass Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): fix comment location Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): fix .py.py Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * 
chore(datasets): don't measure coverage on doctest run Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * build(datasets): fix windows and snowflake stuff in Makefile Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: Nok <nok.lam.chan@quantumblack.com> Co-authored-by: Nok <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Add support for `databricks-connect>=13.0` (#352) Signed-off-by: Miguel Rodriguez Gutierrez <miguel7r@hotmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(telemetry): remove double execution by moving to after catalog created hook (#422) * remove double execution by moving to after catalog created hook Signed-off-by: Florian Roessler <roessler.fd@gmail.com> * update release notes Signed-off-by: Florian Roessler <roessler.fd@gmail.com> * fix tests Signed-off-by: Florian Roessler <roessler.fd@gmail.com> * remove unused fixture Signed-off-by: Florian Roessler <roessler.fd@gmail.com> --------- Signed-off-by: Florian Roessler <roessler.fd@gmail.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs: Add python version support policy to plugin `README.md`s (#425) * Add python version support policy to plugin readmes Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * Temporarily pin connexion Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> --------- Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(airflow): Use new docs link (#393) Use new docs link Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * style: Add shared CSS and meganav to datasets docs (#400) * Add shared CSS and meganav 
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * Add end of file Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * Add new heap data source Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * adjust heap parameter Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * Remove nav_version next to Kedro logo in top left; add Kedro logo * Revise project name and author name Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * Use full kedro icon and type for logo * Add close btn to mobile nav Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Add css for mobile nav logo image Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Update close button for mobile nav Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Add open button to mobile nav Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Delete kedro-datasets/docs/source/kedro-horizontal-color-on-light.svg Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Update conf.py Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> * Update layout.html Add links to subprojects Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * Remove svg from docs -- not needed?? 
Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> * linter error fix Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> --------- Signed-off-by: Jo Stichbury <jo_stichbury@mckinsey.com> Signed-off-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> Co-authored-by: Tynan DeBold <thdebold@gmail.com> Co-authored-by: vladimir-mck <106236933+vladimir-mck@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Add Hugging Face datasets (#344) * Add HuggingFace datasets Co-authored-by: Danny Farah <danny_farah@mckinsey.com> Co-authored-by: Kevin Koga <Kevin_Koga@mckinsey.com> Co-authored-by: Mate Scharnitzky <Mate_Scharnitzky@mckinsey.com> Co-authored-by: Tomer Shor <Tomer_Shor@mckinsey.com> Co-authored-by: Pierre-Yves Mousset <Pierre-Yves_Mousset@mckinsey.com> Co-authored-by: Bela Chupal <Bela_chuphal@mckinsey.com> Co-authored-by: Khangjrakpam Arjun <Khangjrakpam_Arjun@mckinsey.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Apply suggestions from code review Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Joel <35801847+datajoely@users.noreply.github.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> * Typo Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Fix docstring Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add docstring for HFTransformerPipelineDataset Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Use intersphinx for cross references in Hugging Face docstrings Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add docstring for HFDataset Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add missing test dependencies Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add tests for huggingface datasets 
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Fix HFDataset.save Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Add test for HFDataset.list_datasets Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Use new name Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Consolidate imports Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> --------- Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Danny Farah <danny_farah@mckinsey.com> Co-authored-by: Kevin Koga <Kevin_Koga@mckinsey.com> Co-authored-by: Mate Scharnitzky <Mate_Scharnitzky@mckinsey.com> Co-authored-by: Tomer Shor <Tomer_Shor@mckinsey.com> Co-authored-by: Pierre-Yves Mousset <Pierre-Yves_Mousset@mckinsey.com> Co-authored-by: Bela Chupal <Bela_chuphal@mckinsey.com> Co-authored-by: Khangjrakpam Arjun <Khangjrakpam_Arjun@mckinsey.com> Co-authored-by: Joel <35801847+datajoely@users.noreply.github.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): fix `dask.ParquetDataset` doctests (#439) * test(datasets): fix `dask.ParquetDataset` doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): use `tmp_path` fixture in doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): simplify by not passing the schema Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * test(datasets): ignore conftest for doctests cover Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Create MANIFEST.in Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * refactor: Remove `DataSet` aliases and mentions (#440) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * chore(datasets): replace "Pyspark" 
with "PySpark" (#423) Consistently write "PySpark" rather than "Pyspark" Also, fix list formatting Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): make `api.APIDataset` doctests run (#448) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): Fix `pandas.GenericDataset` doctest (#445) Fix pandas.GenericDataset doctest Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): make datasets arguments keywords only (#358) * feat(datasets): make `APIDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `BioSequenceDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `ParquetDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `EmailMessageDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `GeoJSONDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `HoloviewsWriter.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `JSONDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `MatplotlibWriter.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `GMLDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `GraphMLDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make NetworkX `JSONDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `PickleDataset.__init__` keyword only 
Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `ImageDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make plotly `JSONDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `PlotlyDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make polars `CSVDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make polars `GenericDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make redis `PickleDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SnowparkTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SVMLightDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `TensorFlowModelDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `TextDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `YAMLDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `ManagedTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `VideoDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `CSVDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `DeltaTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `ExcelDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `FeatherDataset.__init__` keyword only Signed-off-by: 
Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `GBQTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `GenericDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make pandas `JSONDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make pandas `ParquetDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SQLTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `XMLDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `HDFDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `DeltaTableDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SparkDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SparkHiveDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SparkJDBCDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `SparkStreamingDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `IncrementalDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * feat(datasets): make `LazyPolarsDataset.__init__` keyword only Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * docs(datasets): update doctests for HoloviewsWriter Signed-off-by: Felix Scherz <felixwscherz@gmail.com> * Update release notes Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> --------- Signed-off-by: Felix Scherz <felixwscherz@gmail.com> Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> 
Co-authored-by: Felix Scherz <felixwscherz@gmail.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Co-authored-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Drop support for python 3.8 on kedro-datasets (#442) * Drop support for python 3.8 on kedro-datasets --------- Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com> Signed-off-by: Dmitry Sorokin <40151847+DimedS@users.noreply.github.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): add outputs to matplotlib doctests (#449) * test(datasets): add outputs to matplotlib doctests Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Update Makefile Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Reformat code example, line length is short enough * Update kedro-datasets/kedro_datasets/matplotlib/matplotlib_writer.py Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): Fix more doctest issues (#451) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): fix failing doctests in Windows CI (#457) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): fix accidental reference to NumPy (#450) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): don't pollute dev env in doctests (#452) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat: Add tools to heap event (#430) * Add add-on data to heap event Signed-off-by: lrcouto 
<laurarccouto@gmail.com> * Move addons logic to _get_project_property Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Add condition for pyproject.toml Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Fix tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Fix tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * add tools to mock Signed-off-by: lrcouto <laurarccouto@gmail.com> * lint Signed-off-by: lrcouto <laurarccouto@gmail.com> * Update tools test Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Add after_context_created tools test Signed-off-by: lrcouto <laurarccouto@gmail.com> * Update rename to tools Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update kedro-telemetry/tests/test_plugin.py Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> --------- Signed-off-by: lrcouto <laurarccouto@gmail.com> Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Co-authored-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(datasets): install deps in single `pip install` (#454) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(datasets): Bump s3fs (#463) * Use mocking for AWS responses Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * Add change to release notes Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * Update release notes Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * Use pytest xfail instead of commenting out test Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> 
--------- Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * test(datasets): make SQL dataset examples runnable (#455) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): correct pandas-gbq as py311 dependency (#460) * update pandas-gbq dependency declaration Signed-off-by: Onur Kuru <kuru.onur1@gmail.com> * fix fmt Signed-off-by: Onur Kuru <kuru.onur1@gmail.com> --------- Signed-off-by: Onur Kuru <kuru.onur1@gmail.com> Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): Document `IncrementalDataset` (#468) Document IncrementalDataset Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Update datasets to be arguments keyword only (#466) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Clean up code for old dataset syntax compatibility (#465) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore: Update scikit-learn version (#469) Update scikit-learn version Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): support versioning data partitions (#447) * feat(datasets): support versioning data partitions Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Remove unused import Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * chore(datasets): use keyword arguments when needed Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Apply suggestions from code review Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> * Update kedro-datasets/kedro_datasets/partitions/partitioned_dataset.py Signed-off-by: Deepyaman 
Datta <deepyaman.datta@utexas.edu> --------- Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): Improve documentation index (#428) Rework documentation index Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): update wrong docstring about `con` (#461) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(datasets): Release `2.0.0` (#472) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(telemetry): Pin `PyYAML` (#474) Pin PyYaml Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(telemetry): Release 0.3.1 (#475) Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(datasets): Fix broken links in README (#477) Fix broken links in README Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): replace more "data_set" instances (#476) Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): Fix doctests (#488) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): Fix delta + incremental dataset docstrings (#489) Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(airflow): Post 0.19 cleanup (#478) * bump version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Unbump version and clean test Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update e2e tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> 
* Update e2e tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update e2e tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update e2e tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Split big test into smaller tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update conftest Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update conftest Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Fix coverage Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Try unpin airflow Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * remove datacatalog step Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Change node Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * update tasks test step Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Revert to older airflow and constraint pendulum Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update template Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update message in e2e step Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Final cleanup Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update kedro-airflow/pyproject.toml Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> * Pin apache-airflow again Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build(airflow): Release 0.8.0 (#491) Bump version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix: telemetry metadata (#495) --------- Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix: Update tests on 
kedro-docker for 0.5.0 release. (#496) * bump version to 0.5.0 Signed-off-by: lrcouto <laurarccouto@gmail.com> * bump version to 0.5.0 Signed-off-by: lrcouto <laurarccouto@gmail.com> * update e2e tests to use new starters Signed-off-by: lrcouto <laurarccouto@gmail.com> * Lint Signed-off-by: lrcouto <laurarccouto@gmail.com> * update e2e tests to use new starters Signed-off-by: lrcouto <laurarccouto@gmail.com> * fix test path for e2e tests Signed-off-by: lrcouto <laurarccouto@gmail.com> * fix requirements path on dockerfiles Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * Remove redundant test Signed-off-by: lrcouto <laurarccouto@gmail.com> * Alter test for custom GID and UID Signed-off-by: lrcouto <laurarccouto@gmail.com> * Update release notes Signed-off-by: lrcouto <laurarccouto@gmail.com> * Revert version bump to put it in separate PR Signed-off-by: lrcouto <laurarccouto@gmail.com> --------- Signed-off-by: lrcouto <laurarccouto@gmail.com> Signed-off-by: L. R. 
Couto <57910428+lrcouto@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * build: Release kedro-docker 0.5.0 (#497) * bump version to 0.5.0 Signed-off-by: lrcouto <laurarccouto@gmail.com> * bump version to 0.5.0 Signed-off-by: lrcouto <laurarccouto@gmail.com> * update e2e tests to use new starters Signed-off-by: lrcouto <laurarccouto@gmail.com> * Lint Signed-off-by: lrcouto <laurarccouto@gmail.com> * update e2e tests to use new starters Signed-off-by: lrcouto <laurarccouto@gmail.com> * fix test path for e2e tests Signed-off-by: lrcouto <laurarccouto@gmail.com> * fix requirements path on dockerfiles Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * update tests to fit with current log format Signed-off-by: lrcouto <laurarccouto@gmail.com> * Remove redundant test Signed-off-by: lrcouto <laurarccouto@gmail.com> * Alter test for custom GID and UID Signed-off-by: lrcouto <laurarccouto@gmail.com> * Update release notes Signed-off-by: lrcouto <laurarccouto@gmail.com> * Revert version bump to put it in separate PR Signed-off-by: lrcouto <laurarccouto@gmail.com> * Bump kedro-docker to 0.5.0 Signed-off-by: lrcouto <laurarccouto@gmail.com> * Add release notes Signed-off-by: lrcouto <laurarccouto@gmail.com> * Update kedro-docker/RELEASE.md Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com> --------- Signed-off-by: lrcouto <laurarccouto@gmail.com> Signed-off-by: L. R. 
Couto <57910428+lrcouto@users.noreply.github.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(datasets): Update partitioned dataset docstring (#502) Update partitioned dataset docstring Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * Fix GeotiffDataset import + casing Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> * Fix lint Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Relax pandas.HDFDataSet dependencies which are broken on Windows (#426) * Relax pandas.HDFDataSet dependencies which are broken on Windows (#402) Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com> * Update RELEASE.md Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com> * Apply suggestions from code review Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> * Update kedro-datasets/setup.py Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> --------- Signed-off-by: Yolan Honoré-Rougé <yolan.honore.rouge@gmail.com> Signed-off-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix: airflow metadata (#498) * Add example pipeline entry to metadata declaration Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Fix entry Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Make entries consistent Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Add tools to config Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * fix: telemetry metadata (#495) --------- Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com> Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Revert "Add tools to config" This reverts 
commit 14732d772a3c2f4787063071a68fdf1512c93488. Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Quick fix Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Lint Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Remove outdated config key Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> * Use kedro new instead of cookiecutter Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> --------- Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com> Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com> Co-authored-by: Dmitry Sorokin <40151847+DimedS@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * chore(airflow): Bump `apache-airflow` version (#511) * Bump apache airflow Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Change starter Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update e2e test steps Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update e2e test steps Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(datasets): Unpin dask (#522) * Unpin dask Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update doctest Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update doctest Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update kedro-datasets/setup.py Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Add `MatlabDataset` to `kedro-datasets` (#515) * Refork and commit kedro matlab datasets Signed-off-by: 
samuelleeshemen <samuel_lee_sj@aiap.sg> * Fix lint, add to docs Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Try fixing docstring Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Try fixing save Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Try fix doctest Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Fix unit tests Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update release notes: Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Not hardcode load mode Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: samuelleeshemen <samuel_lee_sj@aiap.sg> Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Co-authored-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(airflow): Pin `Flask-Session` version (#521) * Restrict pendulum version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update airflow init step Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Remove pendulum pin Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update create connections step Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Pin flask session Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Add comment Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat: `kedro-airflow` group in memory nodes (#241) * feat: option to group in-memory nodes Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * fix: MemoryDataset Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * Update kedro-airflow/README.md Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/README.md Co-authored-by: Merel Theisen 
<49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/README.md Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/RELEASE.md Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/kedro_airflow/grouping.py Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/kedro_airflow/plugin.py Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/tests/test_node_grouping.py Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/tests/test_node_grouping.py Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/kedro_airflow/grouping.py Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * Update kedro-airflow/kedro_airflow/grouping.py Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> * fix: tests Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * Bump minimum kedro version Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * fixes Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> --------- Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> Signed-off-by: Simon Brugman <sbrugman@users.noreply.github.com> Co-authored-by: Merel Theisen 
<49397448+merelcht@users.noreply.github.com> Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(datasets): Update pyproject.toml to pin Kedro 0.19 for kedro-datasets (#526) Update pyproject.toml Signed-off-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(airflow): include environment name in DAG filename (#492) * feat: include environment name in DAG file Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> * doc: add update to release notes Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> --------- Signed-off-by: Simon Brugman <sfbbrugman@gmail.com> Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Enable search-as-you type on Kedro-datasets docs (#532) * done Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix lint Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> --------- Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Debug and fix `kedro-datasets` nightly build failures (#541) * pin deltalake * Update kedro-datasets/setup.py Signed-off-by: Juan Luis Cano Rodríguez <hello@juanlu.space> * Update setup.py Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * sort order and compare * Update setup.py * lint * pin deltalake * add comment to pin --------- Signed-off-by: Juan Luis Cano Rodríguez <hello@juanlu.space> Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * feat(datasets): Dataset Preview Refactor (#504) * test * done * change from _preview to preview * fix lint and tests * added docstrings * rtd fix * rtd fix * fix rtd Signed-off-by: 
rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix rtd Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix rtd - pls" Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * add nitpick ignore Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * test again Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * move tracking datasets to constant Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * remove comma Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * remove Newtype from json_dataset" Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * pls work Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * confirm rtd works locally Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * juanlu's fix Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix tests Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * remove unnecessary stuff from conf.py Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fixes based on review Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * changes based on review Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix tests Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * add suffix Preview Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * change img return type to bytes Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * fix tests Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> * update release note * fix lint --------- Signed-off-by: rashidakanchwala <rashida_kanchwala@mckinsey.com> Co-authored-by: ravi-kumar-pilla <ravi_kumar_pilla@mckinsey.com> Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix(datasets): Drop pyarrow constraint when using snowpark (#538) * Free pyarrow req 
Signed-off-by: Felipe Monroy <felipe.m02@gmail.com> * Free pyarrow req Signed-off-by: Felipe Monroy <felipe.m02@gmail.com> --------- Signed-off-by: Felipe Monroy <felipe.m02@gmail.com> Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs: Update kedro-telemetry docs on which data is collected (#546) * Update data being collected --------- Signed-off-by: Dmitry Sorokin <dmd40in@gmail.com> Signed-off-by: Dmitry Sorokin <40151847+DimedS@users.noreply.github.com> Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com> Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * ci(docker): Trying to fix e2e tests (#548) * Pin psutil Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Add no capture to test Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update pip version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update call Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Update pip Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * pip ruamel Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * change pip v Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * change pip v Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * show stdout Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * use no cache dir Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * revert extra changes Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * pin pip Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * gitpod Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * pip inside dockerfile Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * pip pip inside dockerfile Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: 
tgoelles <thomas.goelles@gmail.com> * chore: bump actions versions (#539) * Unpin pip and bump actions versions Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * remove version Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Revert unpinning of pip Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * docs(telemetry): Direct readers to Kedro documentation for further information on telemetry (#555) * Direct readers to Kedro documentation for further information on telemetry Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> * Wording improvements Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com> Signed-off-by: Juan Luis Cano Rodríguez <hello@juanlu.space> * Amend README section Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> --------- Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com> Signed-off-by: Juan Luis Cano Rodríguez <hello@juanlu.space> Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com> Signed-off-by: tgoelles <thomas.goelles@gmail.com> * fix: kedro-telemetry masking (#552) * Fix masking Signed-off-by: Dmitr…
Description
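Polars recommends its Lazy API over the eager one where possible, because deferring execution lets its query optimizer apply optimizations such as predicate and projection pushdown before any data is read. It is therefore natural for kedro-datasets to support it too.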
Closes #224
Development notes
Checklist
RELEASE.md file