build(deps-dev): bump dbldatagen from 0.3.5 to 0.4.0 #637

dependabot · 2024-06-10T08:28:09Z

Bumps dbldatagen from 0.3.5 to 0.4.0.

Release notes

release 0.4.0

This release adds the following new features:

various bug fixes

support for Constraints

support for standard datasets

The new standard dataset feature allows creation of synthetic data sets in just a couple of lines of code, for benchmarking, optimization, and other purposes.
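As an illustration of the "couple of lines" claim, here is a minimal sketch. It assumes an active SparkSession is passed in, that dbldatagen 0.4.0 or later is installed, and that the `basic/user` dataset name is available. The helper name `build_basic_user_dataset` and its default row count are hypothetical and exist only for this example.

```python
def build_basic_user_dataset(spark, rows=100_000):
    """Build a Spark DataFrame from a dbldatagen standard dataset.

    `spark` is assumed to be an active SparkSession. The helper name and
    the default row count are illustrative, not part of dbldatagen.
    """
    # Imported lazily so that this module can be loaded without Spark installed.
    import dbldatagen as dg  # assumes dbldatagen >= 0.4.0

    # "basic/user" is assumed to be one of the bundled standard datasets.
    return dg.Datasets(spark, "basic/user").get(rows=rows).build()
```

In a notebook this would be called as `df = build_basic_user_dataset(spark)`; the result is an ordinary DataFrame that can be counted, filtered, or written out like any other.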

release/v.0.3.6.1

Hot fixes post v0.3.6

Updates to documentation

Updates to enable dbldatagen to work better with Databricks Connect

bumped version

Release v0.3.6

This release includes fixes for use of dbldatagen on the Databricks shared clusters

Changelog

Sourced from dbldatagen's changelog.

Version 0.4.0

Changed

Updated minimum pyspark version to be 3.2.1, compatible with Databricks runtime 10.4 LTS or later

Modified data generator to allow specification of constraints to the data generation process

Updated documentation for generating text data.

Modified data distributions to use abstract base classes

Migrated data distribution tests to use pytest

Additional standard datasets

Added

Added classes for constraints on the data generation via new package dbldatagen.constraints

Added support for standard data sets via the new package dbldatagen.datasets

Version 0.3.6 Post 1

Changed

Updated docs for complex data types / JSON to correct code examples

Updated license file in public docs

Fixed

Fixed scenario where DataAnalyzer is used on dataframe containing a column named summary

Version 0.3.6

Changed

Updated readme to include details on which versions of Databricks runtime support Unity Catalog shared access mode.

Updated code to use default parallelism of 200 when using a shared Spark session

Updated code to use Spark's SQL function element_at instead of array indexing due to incompatibility

Notes

This version changes the minimum supported version of Databricks runtime to 10.4 LTS and later releases.

While there are no known incompatibilities with Databricks 9.1 LTS, we will not test against this release.

Commits

aae8bde Changed release version to be 0.4.0 (#271)
4206b5c Feature standard datasets - part 2 (#286)
da1df6b Feature standard datasets - part 1 (#258)
2d51200 Revert "PR To test updates to process (#278)" (#284)
2863ac7 PR To test updates to process (#278)
8136ccf Feature distribution changes - migrated tests to Pytest, use of abstract base...
82ce5ce Feature constraints (#257)
b28602d Misc doc changes (#268)
02d529e changes to actions (#276)
2482dca Feature hotfixes (#274)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [dbldatagen](https://github.com/databrickslabs/data-generator) from 0.3.5 to 0.4.0.
- [Release notes](https://github.com/databrickslabs/data-generator/releases)
- [Changelog](https://github.com/databrickslabs/dbldatagen/blob/master/CHANGELOG.md)
- [Commits](databrickslabs/dbldatagen@release/v0.3.5...release/v0.4.0)

---
updated-dependencies:
- dependency-name: dbldatagen
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

ireneisdoomed

Constraints are an interesting feature that could be useful in two scenarios we currently have:

When the mock data must satisfy specific conditions at generation time.
- The library already covered examples like this one, where we define a list of possible values, but these additions add much more flexibility.
When the mock data must satisfy specific conditions at the time it is used in a particular unit test.
- For example here, where we need a study locus with an empty ldSet.

From their documentation, this is an example of how it works:

import dbldatagen as dg

# Assumes `spark` is an active SparkSession (e.g. in a Databricks notebook).
data_rows = 10000000

dataspec = dg.DataGenerator(spark, rows=data_rows, partitions=8)

dataspec = (
    dataspec.withColumn("name", "string", template=r"\\w \\w|\\w a. \\w")
    .withColumn(
        "product_sku", "string", minValue=1000000, maxValue=1000000 + 1000, prefix="dr", random=True
    )
    .withColumn("email", "string", template=r"\\w.\\w@\\w.com")
    .withColumn("qty_ordered", "int", minValue=1, maxValue=10, distribution="normal", random=True)
    # unit_price is derived from a hash of product_sku, so each SKU keeps one price.
    .withColumn("unit_price", "float", minValue=1.0, maxValue=30.0, step=0.01, distribution="normal",
                baseColumn="product_sku", baseColumnType="hash")
    .withColumn("order_ts", "timestamp", begin="2020-01-01 01:00:00",
                end="2020-12-31 23:59:00",
                interval="1 minute", random=True)
    # About 10% of rows have no shipping timestamp.
    .withColumn("shipping_ts", "timestamp", begin="2020-01-05 01:00:00",
                end="2020-12-31 23:59:00",
                interval="1 minute", random=True, percentNulls=0.1)
    # Constraint across two columns: an order cannot ship before it is placed.
    .withSqlConstraint("shipping_ts is null or shipping_ts > order_ts")
)
df1 = dataspec.build()

I wouldn't spend time changing what we currently have, but it's worth knowing this exists.
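To make the meaning of the SQL constraint concrete, the predicate can be restated in plain Python. This is only a sketch of the condition itself, assuming the constraint behaves as a filter over generated rows; it is not how dbldatagen implements it, and the `Order` type, the `satisfies_constraint` helper, and the sample rows are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, TypedDict


class Order(TypedDict):
    # Hypothetical row shape with the two columns named in the constraint.
    order_ts: datetime
    shipping_ts: Optional[datetime]


def satisfies_constraint(row: Order) -> bool:
    """Plain-Python restatement of: shipping_ts is null or shipping_ts > order_ts."""
    return row["shipping_ts"] is None or row["shipping_ts"] > row["order_ts"]


# Hand-written sample rows (not generated data).
rows: List[Order] = [
    # Shipped three days after ordering: satisfies the predicate.
    {"order_ts": datetime(2020, 3, 1, 12, 0), "shipping_ts": datetime(2020, 3, 4, 9, 30)},
    # Not shipped yet (null shipping_ts): satisfies the predicate.
    {"order_ts": datetime(2020, 3, 1, 12, 0), "shipping_ts": None},
    # Shipped before it was ordered: violates the predicate.
    {"order_ts": datetime(2020, 6, 1, 8, 0), "shipping_ts": datetime(2020, 2, 1, 8, 0)},
]

# Keep only the rows that satisfy the predicate.
kept = [row for row in rows if satisfies_constraint(row)]
```

Under the filtering assumption, one thing to watch for in unit tests is that a constrained spec may yield fewer rows than the requested count, so assertions on exact row counts would need care.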

Bumps [dbldatagen](https://github.com/databrickslabs/data-generator) from 0.3.5 to 0.4.0.
- [Release notes](https://github.com/databrickslabs/data-generator/releases)
- [Changelog](https://github.com/databrickslabs/dbldatagen/blob/master/CHANGELOG.md)
- [Commits](databrickslabs/dbldatagen@release/v0.3.5...release/v0.4.0)

---
updated-dependencies:
- dependency-name: dbldatagen
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

This reverts commit fd06ae1.

dependabot bot added the dependencies and python labels Jun 10, 2024

dependabot bot force-pushed the dependabot/pip/dbldatagen-0.4.0 branch from 82f3b2e to 5bf615a June 10, 2024 08:41

Merge branch 'dev' into dependabot/pip/dbldatagen-0.4.0

102f077

github-actions bot added the Build and size-XS labels Jun 11, 2024

chore: merge

c85fb56

ireneisdoomed approved these changes Jun 13, 2024

ireneisdoomed merged commit 976ee30 into dev Jun 13, 2024
4 checks passed

ireneisdoomed deleted the dependabot/pip/dbldatagen-0.4.0 branch June 13, 2024 09:29

project-defiant pushed a commit that referenced this pull request Jul 12, 2024

Revert "build(deps-dev): bump dbldatagen from 0.3.5 to 0.4.0 (#637)"

124680a

This reverts commit fd06ae1.
