Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: Updating latest release v1.5.5 in Docs #17992

Merged
merged 2 commits into from
Sep 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 29 additions & 20 deletions openmetadata-docs/content/partials/v1.5/releases/latest.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# 1.5.4 Release 🎉
# 1.5.5 Release 🎉

{% note noteType="Tip" %}
**Sep 13th, 2024**
**Sep 25th, 2024**
{% /note %}

{% inlineCalloutContainer %}
Expand All @@ -10,28 +10,37 @@ color="violet-70"
icon="celebration"
bold="Upgrade OpenMetadata"
href="/deployment/upgrade" %}
Learn how to upgrade your OpenMetadata instance to 1.5.4!
Learn how to upgrade your OpenMetadata instance to 1.5.5!
{% /inlineCallout %}
{% /inlineCalloutContainer %}

You can find the GitHub release [here](https://github.com/open-metadata/OpenMetadata/releases/tag/1.5.4-release).
You can find the GitHub release [here](https://github.com/open-metadata/OpenMetadata/releases/tag/1.5.5-release).

# What's Changed

## OpenMetadata
- Hotfix to the Term Aggregation size on Data Insights
- ES pagination with error handling
- Updated Domain in Docker Compose & Docs
- Fix Classification API returns Table class for restore
- Fix Redshift View Def regex_replace Error
- Make ingestion pipeline APIs public
- Updating the domain PRINCIPAL DOMAIN
- Glossary list selector for bulk import
- Unable to access the import glossary page
- Made the type optional in ES Response.
- Added support for refresh tokens with multiple tabs open.
- Resolved issue of overriding user info after login.
- Updated the custom property entities data model, along with the data product and database schema icons.
- Ensured Teams and Owner fields are correctly passed in the policy API call.
- Enhanced PII logging information.
- Addressed the paginate_es issue in OpenSearch.
- Decrypted JWT internally for system health checks.
- Implemented multithreading in View Lineage Processing.
- Improved search relevancy.
- Resolved issue with owners patch.
- Fixed Snowflake data diff issue.
- Updated Presidio Analyzer version and validated support for legal entities.
- Added validations for Salesforce connection.
- Allowed PII Processor to operate without storing sample data.
- Added seconds to the human-readable format scale for test case graphs.
- Added missing field in glossary term.
- Excluded defaultPersona if not present in personas.
- Resolved team export issue.
- Updated Python lineage SDK to work with UUID and FQN models.
- Fixed LDAP login issue.
- Column sizing of data quality and pipeline widget ${CollateIconWithLinkMD}
- Export with new line in description ${CollateIconWithLinkMD}
- Fix Page entity publicationDate datatype ${CollateIconWithLinkMD}

## Collate
- Fix token limitations using config
- Fix Automator pagination
- Fix MetaPilot push for no constraint

**Full Changelog**: https://github.com/open-metadata/OpenMetadata/compare/1.5.3-release...1.5.4-release
**Full Changelog**: https://github.com/open-metadata/OpenMetadata/compare/1.5.4-release...1.5.5-release
200 changes: 33 additions & 167 deletions openmetadata-docs/content/partials/v1.6/releases/latest.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# 1.5.0 Release 🎉
# 1.5.5 Release 🎉

{% note noteType="Tip" %}
**Aug 26th, 2024**
**Sep 25th, 2024**
{% /note %}

{% inlineCalloutContainer %}
Expand All @@ -10,171 +10,37 @@ color="violet-70"
icon="celebration"
bold="Upgrade OpenMetadata"
href="/deployment/upgrade" %}
Learn how to upgrade your OpenMetadata instance to 1.5.0!
Learn how to upgrade your OpenMetadata instance to 1.5.5!
{% /inlineCallout %}
{% /inlineCalloutContainer %}

You can find the GitHub release [here](https://github.com/open-metadata/OpenMetadata/releases/tag/1.5.0-release).

# Backward Incompatible Changes

## Multi Owners
OpenMetadata allows a single user or a team to be tagged as owners for any data assets. In Release 1.5.0, we allow users to tag multiple individual owners or a single team. This will allow organizations to add ownership to multiple individuals without necessarily needing to create a team around them like previously.

This is a backward incompatible change, if you are using APIs, please make sure the owner field is now changed to “owners”

## Import/Export Format
To support the multi-owner format, we have now changed how we export and import the CSV file in glossary, services, database, schema, table, etc. The new format will be
user:userName;team:TeamName

If you are importing an older file, please make sure to make this change.

## Pydantic V2
The core of OpenMetadata are the JSON Schemas that define the metadata standard. These schemas are automatically translated into Java, Typescript, and Python code with Pydantic classes.

In this release, we have [migrated](https://docs.pydantic.dev/latest/migration/) the codebase from Pydantic V1 to Pydantic V2.

## Deployment Related Changes (OSS only)

`./bootstrap/bootstrap_storage.sh` **removed**

OpenMetadata community has built rolling upgrades to database schema and the data to make upgrades easier. This tool is now called as ./bootstrap/openmetadata-ops.sh and has been part of our releases since 1.3. The `bootstrap_storage.sh` doesn’t support new native schemas in OpenMetadata. Hence, we have deleted this tool from this release.

While upgrading, please refer to our Upgrade Notes in the documentation. Always follow the best practices provided there.

## Database Connection Pooling

OpenMetadata uses Jdbi to handle database-related operations such as read/write/delete. In this release, we introduced additional configs to help with connection pooling, allowing the efficient use of a database with low resources.

Please update the defaults if your cluster is running at a large scale to scale up the connections efficiently.

For the new configuration, please refer to the [doc](https://docs.open-metadata.org/latest/deployment/database-connection-pooling) here

## Data Insights

The Data Insights application is meant to give you a quick glance at your data's state and allow you to take action based on the information you receive. To continue pursuing this objective, the application was completely refactored to allow customizability.

Part of this refactor was making Data Insights an internal application, no longer relying on an external pipeline. This means triggering Data Insights from the Python SDK will no longer be possible.

With this change you will need to run a backfill on the Data Insights for the last couple of days since the Data Assets data changed.

## UI Changes

### New Explore Page

Explore page displays hierarchically organized data assets by grouping them into `services > database > schema > tables/stored procedures`. This helps users organically find the data asset they are looking for based on a known database or schema they were using. This is a new feature and changes the way the Explore page was built in previous releases.

### Connector Schema Changes

In the latest release, several updates and enhancements have been made to the JSON schema across various connectors. These changes aim to improve security, configurability, and expand integration capabilities. Here's a detailed breakdown of the updates:

- **KafkaConnect**: Added `schemaRegistryTopicSuffixName` to enhance topic configuration flexibility for schema registries.
- **GCS Datalake**: Introduced `bucketNames` field, allowing users to specify targeted storage buckets within the Google Cloud Storage environment.
- **OpenLineage**: Added `saslConfig` to enhance security by enabling SASL (Simple Authentication and Security Layer) configuration.
- **Salesforce**: Added sslConfig to strengthen the security layer for Salesforce connections by supporting SSL.
- **DeltaLake**: Updated schema by moving metastoreConnection to a newly created `metastoreConfig.json` file. Additionally, introduced `configSource` to better define source configurations, with new support for `metastoreConfig.json` and `storageConfig.json`.
- **Iceberg RestCatalog**: Removed clientId and `clientSecret` as mandatory fields, making the schema more flexible for different authentication methods.
- **DBT Cloud Pipelines**: Added as a new connector to support cloud-native data transformation workflows using DBT.
- **Looker**: Expanded support to include connections using GitLab integration, offering more flexible and secure version control.
- **Tableau**: Enhanced support by adding capabilities for connecting with `TableauPublishedDatasource` and `TableauEmbeddedDatasource`, providing more granular control over data visualization and reporting.

## Include DDL
During the Database Metadata ingestion, we can optionally pick up the DDL for both tables and views. During the metadata ingestion, we use the view DDLs to generate the View Lineage.

To reduce the processing time for out-of-the-box workflows, we are disabling the include DDL by default, whereas before, it was enabled, which potentially led to long-running workflows.

## Secrets Manager
Starting with the release 1.5.0, the JWT Token for the bots will be sent to the Secrets Manager if you configured one. It won't appear anymore in your dag_generated_configs in Airflow.

## Python SDK
The `metadata insight` command has been removed. Since Data Insights application was moved to be an internal system application instead of relying on external pipelines the SDK command to run the pipeline was removed.

# What's New

## Data Observability with Anomaly Detection (Collate)

OpenMetadata has been driving innovation in Data Quality in Open Source. Many organizations are taking advantage of the following Data Quality features to achieve better-quality data

1. A Native Profiler to understand the shape of the data, freshness, completeness, volume, and ability to add your own metrics, including column level profiler over time-series and dashboards
2. No-code data quality tests, deploy, collect results back to see it in a dashboard all within OpenMetadata
3. Create alerts and get notified of Test results through email, Slack, NSteams, GChat, and Webhook
4. Incident Manager to collaborate around test failures and visibility to downstream consumers of failures from upstream

In 1.5.0, we are bringing in **Anomaly Detection** based on AI to predict when an anomaly happens based on our learning historical data and automatically sending notifications to the owners of the table to warn them of the impending incidents

{% youtube videoId="BPuNC8vPcsw" start="0:00" end="1:01" width="560px" height="315px" /%}

## Enhanced Data Quality Dashboard (Collate)

We also have improved the Table Data quality dashboard to showcase the tests categorized and make it easy for everyone to consume. When there are issues, the new dashboard makes it easier to understand the Data Quality coverage of your tables and the possible impact each test failure has by organizing tests into different groups.

{% youtube videoId="bXcQBtZuyoU" start="0:00" end="2:10" width="560px" height="315px" /%}

## Freshness Data Quality Tests (Collate)
Working with old data can lead to making wrong decisions. With the new Freshness test, you can validate that your data arrives at the right time. Freshness tests are a critical part of any data team's toolset. Bringing these tests together with lineage information and the Incident Manager, your team will be able to quickly detect issues related to missing data or stuck pipelines.

{% youtube videoId="QRcR3m9cCGo" start="0:00" end="1:09" width="560px" height="315px" /%}

## Data Diff Data Quality Tests
Data quality checks are important not only within a single table but also between different tables. These data diff checks can ensure key data remains unchanged after transformation, or conversely, ensure that the transformations were actually performed.

We are introducing the **table difference data quality test** to validate that multiple appearances of the same information remain consistent. Note that the test allows you to specify which column to use as a key and which columns you want to compare, and even add filters in the data to give you more control over multiple use cases.

{% youtube videoId="oxZVS_UGrE4" start="0:00" end="2:22" width="560px" height="315px" /%}

## Domains RBAC & Subdomains
OpenMetadata introduced Domains & Data Products in 1.3.0. Since then, many large organizations have started using Domains & Data Products to achieve better ownership and collaboration around domains that can span multiple teams.

In the 1.5.0 release, we added support for subdomains. This will help teams to organize into multiple subdomains within each domain.

### RBAC for Domains
With the 1.5.0 release, we are adding more stricter controls around Domain. Now, teams, data assets, glossaries, and classification can have domain concepts and can get a policy such that only users within a domain can access the data within a domain. Domain owners can use Data Products to publish data products and showcase publicly available data assets from a specific domain.

This will help large companies to use a single OpenMetadata platform to unify all of their data and teams but also provide more stringent controls to segment the data between domains

{% youtube videoId="r-_HaewjgTQ" start="0:00" end="0:44" width="560px" height="315px" /%}

## Improved Explore Page & Data Asset Widget
OpenMetadata, with its simple UI/UX and data collaboration features, is becoming more attractive to non-technical users as well. Data Governance teams are using OpenMetadata to add glossary terms and policies around metadata. Teams using Collate SaaS product are taking advantage of our Automations feature to gain productivity in their governance tasks.

Our new improved navigation on the Explore page will help users navigate hierarchically and find the data they are looking for. Users will see the data assets now grouped by `service name -> database -> schema -> tables/stored procedures`.

We are also making the discovery of data more accessible for users introducing a data asset widget, which will group the assets by platform type. This will help users find their data if they are working on a specific platform such as Looker or Snowflake they can easily click on that icon and get to the data.

{% youtube videoId="45ekUIRO1Ec" start="0:00" end="1:11" width="560px" height="315px" /%}

## Pipeline Status Widget
We are also adding another widget you can use to customize the Landing Page of the User Personas in your organization.

With the Pipeline Status widget, Data Engineers can easily track the pipelines that are not behaving as expected. This widget, together with the obervability alerts that are already in place, will help your teams jump even faster to solving any issues in the platform.

## API as Data Asset
The Internet runs using APIs, both producing and consuming data. Organizations today run many microservices and REST APIs to capture data from their users and update a transaction database in the backend.

On top of the many supported connectors across Databases, Dashboards, ML Models, etc. We believe that providing support for API Services as data assets will help to get the full picture of how the data is coming through from various services and landing into databases, going to warehouses and BI tools.

In 1.5.0 we are introducing APIs as another first-class entity. Teams can now capture API requests and responses payloads and use our column level lineage to capture the relation between APIs and any other asset in the platform.

{% youtube videoId="b9wrVnM3u80" start="0:00" end="0:33" width="560px" height="315px" /%}

## Glossary Improvements
OpenMetadata supports multiple glossaries, an import/export and review process, and bulk asset tagging with glossary terms. Many teams are taking advantage of these features, and with an amazing open-source community, we are receiving great feedback on improving glossary functionality.

Here are some of the improvements coming in 1.5.0:
1. Glossary Reviewers can be teams
2. Updating a glossary will enforce a re-review
3. Renaming the Glossary Term while it's under review will keep the task associated with it open

## Data Insights (Collate)
The Data Insights application is meant to give you a quick glance of your data's state and allow you to take action based on the information you receive.

To continue pursuing this objective, the application was completely refactored to allow customizability. This is achieved by the possibility of now creating custom dashboards. On this release you can create charts based on your data assets metadata based on your needs.

## Ingestion Connectors
80+ connectors to help teams to centralize metadata. We continue to push the boundaries of this mission, in
- **Apache Flink** as a Pipeline Connector
- **SAP ERP**, after a long and successful collaboration with our community and SAP experts
- **Teradata** as a community contribution from [gbpy](https://github.com/gpby) to broaden the integration capabilities for enterprise-scale analytics and data management.
- **GCS Storage Connector** as a community contribution from [Matt Chamberlin](https://github.com/MChamberlin)

**Full Changelog**: https://github.com/open-metadata/OpenMetadata/compare/1.4.0-release...1.5.0-release
You can find the GitHub release [here](https://github.com/open-metadata/OpenMetadata/releases/tag/1.5.5-release).

# What's Changed

- Made the type optional in ES Response.
- Added support for refresh tokens with multiple tabs open.
- Resolved issue of overriding user info after login.
- Updated the custom property entities data model, along with the data product and database schema icons.
- Ensured Teams and Owner fields are correctly passed in the policy API call.
- Enhanced PII logging information.
- Addressed the paginate_es issue in OpenSearch.
- Decrypted JWT internally for system health checks.
- Implemented multithreading in View Lineage Processing.
- Improved search relevancy.
- Resolved issue with owners patch.
- Fixed Snowflake data diff issue.
- Updated Presidio Analyzer version and validated support for legal entities.
- Added validations for Salesforce connection.
- Allowed PII Processor to operate without storing sample data.
- Added seconds to the human-readable format scale for test case graphs.
- Added missing field in glossary term.
- Excluded defaultPersona if not present in personas.
- Resolved team export issue.
- Updated Python lineage SDK to work with UUID and FQN models.
- Fixed LDAP login issue.
- Column sizing of data quality and pipeline widget ${CollateIconWithLinkMD}
- Export with new line in description ${CollateIconWithLinkMD}
- Fix Page entity publicationDate datatype ${CollateIconWithLinkMD}

**Full Changelog**: https://github.com/open-metadata/OpenMetadata/compare/1.5.4-release...1.5.5-release
2 changes: 2 additions & 0 deletions openmetadata-docs/content/v1.5.x/collate-menu.md
Original file line number Diff line number Diff line change
Expand Up @@ -761,6 +761,8 @@ site_menu:
url: /releases/supported
- category: Releases / All Releases
url: /releases/all-releases
- category: Releases / All Releases / 1.5.4 Release
url: /releases/all-releases/#1.5.4-release
- category: Releases / All Releases / 1.5.3 Release
url: /releases/all-releases/#1.5.3-release
- category: Releases / All Releases / 1.5.2 Release
Expand Down
Loading
Loading