master merge for 1.3.0 release #1971

Merged
merged 25 commits into master
Oct 22, 2024

rudolfix
Collaborator

Description

master merge for 1.3.0 release

sh-rp and others added 24 commits October 8, 2024 14:30
…1507)

* add simple ibis helper

* start working on dataframe reading interface

* a bit more work

* first simple implementation

* small change

* more work on dataset

* some work on filesystem destination

* add support for parquet files and compression on jsonl files in filesystem dataframe implementation

* fix test after devel merge

* add nice composable pipeline example

* small updates to demo

* enable tests for all bucket providers
remove resource based dataset accessor

* fix tests

* create views in duckdb filesystem accessor

* move to relations based interface

* add generic duckdb interface to filesystem

* move code for accessing frames and tables to the cursor and use duckdb dbapi cursor in filesystem

* add native db api cursor fetching to exposed dataset

* some small changes

* switch dataaccess pandas to pyarrow

* add native bigquery support for df and arrow tables

* change iter functions to always expect chunk size (None will default to full frame/table)

* add native implementation for databricks

* add dremio native implementation for full frames and tables

* fix filesystem test
make filesystem duckdb instance use glob pattern

* add test for evolving filesystem

* fix empty dataframe retrieval

* remove old df test

* clean up interfaces a bit (more to come?)
remove pipeline dependency from dataset

* move dataset creation into destination client and clean up interfaces / reference a bit more

* renames some interfaces and adds brief docstrings

* add filesystem cached duckdb and remove the need to declare needed views for filesystem

* fix tests for snowflake

* make data set a function

* fix db-types dependency for bigquery

* create duckdb based sql client for filesystem

* fix example pipeline

* enable filesystem sql client to work on streamlit

* add comments

* rename sql to query
remove unneeded code

* fix tests that rely on sql client

* post merge cleanups

* move imports around a bit

* exclude abfss buckets from test

* add support for arrow schema creation from known dlt schema

* re-use sqldatabase code for cursors

* fix bug

* add default columns where needed

* add sqlglot to filesystem deps

* store filesystem tables in correct dataset

* move cursor columns location

* fix snowflake and mssql
disable tests with sftp

* clean up compose files a bit

* fix sqlalchemy

* add mysql docker compose file

* fix linting

* prepare hint checking

* disable part of state test

* enable hint check

* add column type support for filesystem json

* rename dataset implementation to DBAPI
remove dataset specific code from destination client

* wrap functions in dbapi readable dataset

* remove example pipeline

* rename test_decimal_name

* make column code a bit clearer and fix mssql again

* rename df methods to pandas

* fix bug in default columns

* fix hints test and columns bug
removes some unneeded code

* catch mysql error if no rows returned

* add exceptions for not implemented bucket and filetypes

* fix docs

* add config section for getting pipeline clients

* set default dataset in filesystem sqlclient

* add config section for sync_destination

* rename readablerelation methods

* use more functions of the duckdb sql client in filesystem version

* update dependencies

* use active pipeline capabilities if available for arrow table

* update types

* rename dataset accessor function

* add test for accessing tables with unqualified table name

* fix sql client

* add duckdb native support for azure, s3 and gcs (via s3)

* some typing

* add dataframes tests back in

* add join table and update view tests for filesystem

* start adding tests for creating views on remote duckdb

* fix snippets

* fix some dependencies and mssql/synapse tests

* fix bigquery dependencies and abfss tests

* add tests for adding view to external dbs and persistent secrets

* add support for delta tables

* add duckdb to read interface tests

* fix delta tests

* make default secret name derived from bucket url

* try fix azure tests again

* fix df access tests

* PR fixes

* correct internal table access

* allow datasets without schema

* skips parametrized queries, skips tables from non-dataset schemas

* move filesystem specific sql_client tests to correct location and test a few more things

* fix sql client tests

* make secret name when dropping optional

* fix gs test

* remove moved filesystem tests from test_read_interfaces

* fix sql client tests again... :)

* clear duckdb secrets

* disable secrets deleting for delta tests

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
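
One bullet above states that the iter functions always expect a chunk size, with `None` defaulting to the full frame/table. A minimal sketch of that convention follows. It assumes a hypothetical `iter_chunks` helper operating on a plain sequence of rows; it is not dlt's cursor or relation code, and the names are illustrative only.

```python
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def iter_chunks(rows: Sequence[T], chunk_size: Optional[int]) -> Iterator[List[T]]:
    """Yield rows in chunks of chunk_size; None yields all rows as one chunk.

    Hypothetical helper for illustration only, not part of dlt.
    """
    if chunk_size is None:
        # None means "no chunking": return the full table in a single chunk
        yield list(rows)
        return
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer or None")
    # Walk the rows in steps of chunk_size; the last chunk may be shorter
    for start in range(0, len(rows), chunk_size):
        yield list(rows[start:start + chunk_size])
```

Making the chunk size a required argument keeps the caller explicit about memory use: passing `None` is a deliberate request for the whole result rather than an accidental default.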
* adds sources registry and factory, allows for late config binding and rename, wraps standalone resources

* converts rest_api to a standard source

* marks secret values with Annotated, allows regular types to be used in configs

* reduces the number of modules imported on initial dlt import

* removes resource rename via AST in dlt init, provides new templates

* replaces hardcoded paths to settings and data with pluggable run context

* fixes init command tests

* adds plugin system and example plugin tests

* uses run context to load secrets / configs

* adds run context name to source reference and uses it to resolve

* fixes module name and wrong SPEC for single resource sources when registering

* adds pluggy

* adds methods to get location of entities to run context

* fixes toml provider to write toml objects, fixes toml writing to not override old documents and preserve comments

* simplifies init command, makes sure it creates files according to run context

* fixes dbt test venv, prepares to use uv

* adds SPEC for callable resources

* fixes wrong SPEC passed to single resource source

* allows mock run context to read from env

* fixes oauth2 auth dataclass

* fixes secrets masking for shorthand auth

* adds rest_api auth secret config injections tests, fixes some others

* fixes docstrings

* allows source references to python modules out of registry

* fixes lock
* allows to pass run_dir to RunContext init

* uses uv to install deps in Venv if found

* removes semver deprecations
* make cli commands pluggable

* make deploy command behave correctly if not available

* add global debug flag

* move plugin interface
* add tests for cli plugin discovery
* allow plugins to overwrite core cli commands

* ensure plugins take precedence over core commands
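
The bullets above say that plugins may overwrite core CLI commands and take precedence over them. The sketch below shows one simple way such precedence could be resolved. It assumes a hypothetical `resolve_commands` function and plain dictionaries keyed by command name; it does not use pluggy and is not dlt's actual plugin interface.

```python
from typing import Callable, Dict

Command = Callable[[], str]


def resolve_commands(
    core: Dict[str, Command], plugins: Dict[str, Command]
) -> Dict[str, Command]:
    """Merge core and plugin commands; a plugin wins on a name clash.

    Hypothetical helper for illustration only, not part of dlt.
    """
    resolved = dict(core)  # copy so the core registry is not mutated
    resolved.update(plugins)  # plugin entries replace core entries of the same name
    return resolved


# Usage: the plugin replaces "deploy" while "init" stays the core command
core_commands = {"init": lambda: "core init", "deploy": lambda: "core deploy"}
plugin_commands = {"deploy": lambda: "plugin deploy"}
commands = resolve_commands(core_commands, plugin_commands)
```

Applying the plugin mapping last is what gives plugins precedence; reversing the order would make core commands win instead.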
#1871)

* adds oauth2_client_credentials to authentication short hands

* adds documentation on auth shorthand `oauth2_client_credentials`

* fixes types

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* uses path normalize for columns in arrow tables

* adds sqlglot to pipeline dev group

* improves normalization tests, improves docstrings
* create first version of dataset factory

* update all destination implementations for getting the newest schema, fixed linter errors, made dataset aware of config types

* test retrieval of schema for all destinations (except custom destination)

* add simple tests for schema selection in dataset tests

* unify filesystem schema behavior with other destinations

* fix gcs delta tests

* try to fix ci errors

* allow athena in a kind of "read only" mode

* fix delta table tests?

* mark dataset factory as private

* change signature and behavior of get_stored_schema

* fix weaviate schema retrieval

* switch back to  properties
* remove unneeded clickhouse config vars

* connect to buckets with https by default and make this configurable

* update docs and add some tests

* make utils test essential
* allows to pass run_dir via plugin hook + arbitrary args

* adds name, data_dir and pipeline deprecation to run_configuration, renames to runtime_configuration

* adds before_add, after_remove and improves add_extra when adding to container, tracks reference to container in context

* merges run context and provider context, exposes init providers via run context

* initializes loggers with run context

* does not use config injection when creating default requests Client

* removes duplicated code for examples and doc snippets

* allows to init requests helper without runtime injection, uses re-entrant locks when injecting context

* disables sentry on CI

* renames config provider context to container, improves telemetry fixtures in tests
* add universal exception wrapper and configurable docs urls

* fix imports

* add simple entries for schema and telemetry commands on cli page

* fix tests

* fix deploy command tests
* SQL Database: Support including NULL cursor values

* Support exclude option

* Test skip import

* Always add exclude condition, import sqlalchemy from common lib
* Add `references` table hint

+ reflect references from foreign keys in sqlalchemy source

* Fix table as boolean

* Merge references

* sqla Ignore other schema and missing referenced tables

* Lint

* Add resolve_foreign_keys option

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
* add simple test and suppress exception on non-existing versions table

* change existing test to verify that dropping unknown tables fails silently

* test delete_schema command for not existing versions table

* simplify check

* undo suppress of schema delete

* fixes drop table tests

---------

Co-authored-by: Marcin Rudolf <rudolfix@rudolfix.org>
…1968)

* supports Motherduck md:? connstr and env variable for token

* supports bigquery partition expiration days

* removes test code
@rudolfix rudolfix added the ci full run the full load tests on pr label Oct 21, 2024
@rudolfix rudolfix self-assigned this Oct 21, 2024

netlify bot commented Oct 21, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 4926d1d
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/671668ef761bf90008e851f7
😎 Deploy Preview https://deploy-preview-1971--dlt-hub-docs.netlify.app

@rudolfix rudolfix merged commit 1893860 into master Oct 22, 2024
74 of 90 checks passed
Labels
ci full run the full load tests on pr
8 participants