feat: add keywords for SEO #10358

Merged 3 commits on Apr 29, 2024
4 changes: 2 additions & 2 deletions README.md
@@ -22,7 +22,7 @@ HOSTED_DOCS_ONLY-->
</p>
<!-- -->

# DataHub: The Metadata Platform for the Modern Data Stack
# DataHub: The Data Discovery Platform for the Modern Data Stack
## Built with ❤️ by <img src="https://datahubproject.io/img/acryl-logo-light-mark.png" width="25"/> [Acryl Data](https://acryldata.io) and <img src="https://datahubproject.io/img/LI-In-Bug.png" width="25"/> [LinkedIn](https://engineering.linkedin.com)
[![Version](https://img.shields.io/github/v/release/datahub-project/datahub?include_prereleases)](https://github.com/datahub-project/datahub/releases/latest)
[![PyPI version](https://badge.fury.io/py/acryl-datahub.svg)](https://badge.fury.io/py/acryl-datahub)
@@ -61,7 +61,7 @@ HOSTED_DOCS_ONLY-->

## Introduction

DataHub is an open-source metadata platform for the modern data stack. Read about the architectures of different metadata systems and why DataHub excels [here](https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained). Also read our
DataHub is an open-source data catalog for the modern data stack. Read about the architectures of different metadata systems and why DataHub excels [here](https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained). Also read our
[LinkedIn Engineering blog post](https://engineering.linkedin.com/blog/2019/data-hub), check out our [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019) and watch our [Crunch Conference Talk](https://www.youtube.com/watch?v=OB-O0Y6OYDE). You should also visit [DataHub Architecture](docs/architecture/architecture.md) to get a better understanding of how DataHub is implemented.

## Features & Roadmap
4 changes: 2 additions & 2 deletions docs-website/src/pages/_components/CardCTAs/index.js
@@ -12,12 +12,12 @@ const cardsContent = [
},
{
label: "Data Contracts",
title: "End-to-end Reliability in Data",
title: "Data Contracts: End-to-end Reliability in Data",
url: "https://www.acryldata.io/blog/data-contracts-in-datahub-combining-verifiability-with-holistic-data-management?utm_source=datahub&utm_medium=referral&utm_content=blog",
},
{
label: "Shift Left",
title: "Developer-friendly Data Governance",
title: "Data Governance and Lineage Impact Analysis",
url: "https://www.acryldata.io/blog/the-3-must-haves-of-metadata-management-part-2?utm_source=datahub&utm_medium=referral&utm_content=blog",
},
];
2 changes: 1 addition & 1 deletion docs-website/src/pages/_components/Hero/index.js
@@ -31,7 +31,7 @@ const Hero = ({}) => {
<div>
<h1 className="hero__title">The #1 Open Source Metadata Platform</h1>
<p className="hero__subtitle">
DataHub is an extensible metadata platform that enables data discovery, data observability and federated governance to help tame the
DataHub is an extensible data catalog that enables data discovery, data observability and federated governance to help tame the
complexity of your data ecosystem.
</p>
<p className="hero__subtitle">
13 changes: 7 additions & 6 deletions docs-website/src/pages/index.js
@@ -35,7 +35,7 @@ function Home() {
return !siteConfig.customFields.isSaas ? (
<Layout
title={siteConfig.tagline}
description="DataHub is a data discovery application built on an extensible metadata platform that helps you tame the complexity of diverse data ecosystems."
description="DataHub is a data discovery application built on an extensible data catalog that helps you tame the complexity of diverse data ecosystems."
>
<Hero />
<Features />
@@ -70,9 +70,10 @@ function Home() {
</h1>
{/* <hr style={{ border: "2px solid black", width: "20rem" }}></hr> */}
<p style={{ fontSize: "18px" }}>
Explore DataHub's journey from search and discovery tool at
LinkedIn to the #1 open source metadata platform, through the
lens of its founder and some amazing community members.
Explore DataHub's journey from search and data discovery tool at
LinkedIn to the #1 open source metadata management platform,
through the lens of its founder and some amazing community
members.
</p>
</div>
</div>
@@ -143,8 +144,8 @@ function Home() {
</h2>
<p>
DataHub is the one-stop shop for documentation, schemas,
ownership, lineage, pipelines, data quality, usage information,
and more.
ownership, data lineage, pipelines, data quality, usage
information, and more.
</p>
</div>
<div className="col col--6 col--offset-1">
2 changes: 1 addition & 1 deletion docs/act-on-metadata/impact-analysis.md
@@ -71,7 +71,7 @@ Follow these simple steps to understand the full dependency chain of your data e
* [searchAcrossLineage](../../graphql/queries.md#searchacrosslineage)
* [searchAcrossLineageInput](../../graphql/inputObjects.md#searchacrosslineageinput)

Looking for an example of how to use `searchAcrossLineage` to read lineage? Look [here](../api/tutorials/lineage.md#read-lineage)
Looking for an example of how to use `searchAcrossLineage` to read data lineage? Look [here](../api/tutorials/lineage.md#read-lineage)

### DataHub Blog

6 changes: 3 additions & 3 deletions docs/api/tutorials/lineage.md
@@ -1,13 +1,13 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Lineage
# Data Lineage

## Why Would You Use Lineage?

Lineage is used to capture data dependencies within an organization. It allows you to track the inputs from which a data asset is derived, along with the data assets that depend on it downstream.
Data lineage is used to capture data dependencies within an organization. It allows you to track the inputs from which a data asset is derived, along with the data assets that depend on it downstream.

For more information about lineage, refer to [About DataHub Lineage](/docs/generated/lineage/lineage-feature-guide.md).
For more information about data lineage, refer to [About DataHub Lineage](/docs/generated/lineage/lineage-feature-guide.md).

### Goal Of This Guide

4 changes: 2 additions & 2 deletions docs/architecture/architecture.md
@@ -4,7 +4,7 @@ title: "Overview"

# DataHub Architecture Overview

DataHub is a [3rd generation](https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained) Metadata Platform that enables Data Discovery, Collaboration, Governance, and end-to-end Observability
DataHub is a [3rd generation](https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained) data catalog that enables Data Discovery, Collaboration, Governance, and end-to-end Observability
that is built for the Modern Data Stack. DataHub employs a model-first philosophy, with a focus on unlocking interoperability between
disparate tools & systems.

@@ -31,7 +31,7 @@ There are three main highlights of DataHub's architecture.

DataHub's metadata model is described using a [serialization agnostic language](https://linkedin.github.io/rest.li/pdl_schema). Both [REST](../../metadata-service) as well as [GraphQL API-s](../../datahub-web-react/src/graphql) are supported. In addition, DataHub supports an [AVRO-based API](../../metadata-events) over Kafka to communicate metadata changes and subscribe to them. Our [roadmap](../roadmap.md) includes a milestone to support no-code metadata model edits very soon, which will allow for even more ease of use, while retaining all the benefits of a typed API. Read about metadata modeling at [metadata modeling].

### Stream-based Real-time Metadata Platform
### Stream-based Real-time Metadata Management Platform

DataHub's metadata infrastructure is stream-oriented, which allows for changes in metadata to be communicated and reflected within the platform within seconds. You can also subscribe to changes happening in DataHub's metadata, allowing you to build real-time metadata-driven systems. For example, you can build an access-control system that can observe a previously world-readable dataset adding a new schema field which contains PII, and locks down that dataset for access control reviews.

2 changes: 1 addition & 1 deletion docs/dataproducts.md
@@ -132,7 +132,7 @@ There are many more advanced cli commands for managing Data Products as code. Ta

The following features are next on the roadmap for Data Products
- Support for marking data assets in a Data Product as private versus shareable for other teams to consume
- Support for declaring lineage manually to upstream and downstream data products
- Support for declaring data lineage manually to upstream and downstream data products
- Support for declaring logical schema for Data Products
- Support for associating data contracts with Data Products
- Support for semantic versioning of the Data Product entity
4 changes: 2 additions & 2 deletions docs/features.md
@@ -9,8 +9,8 @@ import FeatureCardSection from '@site/src/pages/docs/_components/FeatureCardSect

# What is DataHub?

DataHub is a modern data catalog built to enable end-to-end data discovery, data observability, and data governance.
This extensible metadata platform is built for developers to tame the complexity of their rapidly evolving data ecosystems and for data practitioners to leverage the total value of data within their organization.
DataHub is a modern data catalog designed to streamline metadata management, data discovery, and data governance. It enables users to efficiently explore and understand their data, track data lineage, profile datasets, and establish data contracts.
This extensible metadata management platform is built for developers to tame the complexity of their rapidly evolving data ecosystems and for data practitioners to leverage the total value of data within their organization.

## Quickstart

10 changes: 5 additions & 5 deletions docs/features/feature-guides/ui-lineage.md
@@ -1,11 +1,11 @@
# Managing Lineage via UI
# Managing Data Lineage via UI

## Viewing lineage
The UI shows the latest version of the lineage. The time picker can be used to filter out edges within the latest version to exclude those that were last updated outside of the time window. Selecting time windows in the patch will not show you historical lineages. It will only filter the view of the latest version of the lineage.
## Viewing Data Lineage
The UI shows the latest version of the data lineage. The time picker can be used to filter out edges within the latest version to exclude those that were last updated outside of the time window. Selecting time windows in the patch will not show you historical data lineages. It will only filter the view of the latest version of the lineage.

## Editing from Lineage Graph View

The first place that you can edit lineage for entities is from the Lineage Visualization screen. Click on the "Lineage" button on the top right of an entity's profile to get to this view.
The first place that you can edit data lineage for entities is from the Lineage Visualization screen. Click on the "Lineage" button on the top right of an entity's profile to get to this view.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/lineage-viz-button.png"/>
@@ -25,7 +25,7 @@ If you want to edit upstream lineage for entities downstream of the center node

### Adding Lineage Edges

Once you click "Edit Upstream" or "Edit Downstream," a modal will open that allows you to manage lineage for the selected entity in the chosen direction. In order to add a lineage edge to a new entity, search for it by name in the provided search bar and select it. Once you're satisfied with everything you've added, click "Save Changes." If you change your mind, you can always cancel or exit without saving the changes you've made.
Once you click "Edit Upstream" or "Edit Downstream," a modal will open that allows you to manage data lineage for the selected entity in the chosen direction. In order to add a lineage edge to a new entity, search for it by name in the provided search bar and select it. Once you're satisfied with everything you've added, click "Save Changes." If you change your mind, you can always cancel or exit without saving the changes you've made.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/add-upstream.png"/>
2 changes: 1 addition & 1 deletion docs/managed-datahub/managed-datahub-overview.md
@@ -1,6 +1,6 @@
# How Acryl DataHub compares to DataHub

DataHub is the #1 open source metadata platform for developers.
DataHub is the #1 open source data catalog for developers.

Acryl DataHub takes DataHub to the next level by offering features that allow
you to roll out the product to the entire organization beyond your central data
2 changes: 1 addition & 1 deletion docs/roadmap.md
@@ -128,7 +128,7 @@ Use-Case: Support for free-form global tags for social collaboration and aiding
- [x] UI support for attaching business terms to entities and fields

#### Jobs, Flows / Pipelines
Use case: Search and Discover your Pipelines (e.g. Airflow DAGs) and understand lineage with datasets
Use case: Search and Discover your Pipelines (e.g. Airflow DAGs) and understand data lineage with datasets
- [x] Support for Metadata Models + Backend Implementation
- [x] Metadata Integrations with systems like Airflow.

6 changes: 3 additions & 3 deletions docs/townhall-history.md
@@ -9,7 +9,7 @@ For the Town Hall meetings after June 2023, please refer to our [LinkedIn Live e

- Community & Project Updates - Maggie Hays & Shirshanka Das (Acryl Data)
- Community Case Study: Dataset Joins - Raj Tekal & Bobbie-Jean Nowak (Optum)
- DataHub 201: Column-Level Lineage - Hyejin Yoon (Acryl Data)
- DataHub 201: Column-Level Data Lineage - Hyejin Yoon (Acryl Data)
- Sneak Peek: BigQuery Column-Level Lineage with SQL Parsing - Harshal Sheth (Acryl Data)
- DataHub Performance Tuning – Indy Prentice (Acryl Data)

@@ -196,7 +196,7 @@ November Town Hall (in December!)
- Lineage Impact Analysis - using DataHub to understand the impact of changes on downstream dependencies
- Displaying Data Quality Checks in the UI
- Roadmap update: Schema Version History & Column-Level Lineage
- Community Case Study: Managing Lineage via YAML
- Community Case Study: Managing Data Lineage via YAML

### Jan 2022
[Full YouTube video](https://youtu.be/ShlSR3dMUnE)
@@ -340,7 +340,7 @@ November Town Hall (in December!)
- 0.7.1 Release and callouts (dbt by Gary Lucas)
- Product Analytics design sprint announcement (Maggie Hayes)
- Use-Case: DataHub at DefinedCrowd ([video](https://www.youtube.com/watch?v=qz5Rpmw8I5E)) by Pedro Silva - 15 mins
- Deep Dive + Demo: Lineage! Airflow, Superset integration ([video](https://www.youtube.com/watch?v=3wiaqhb8UR0)) by Harshal Sheth and Gabe Lyons - 10 mins
- Deep Dive + Demo: Data Lineage! Airflow, Superset integration ([video](https://www.youtube.com/watch?v=3wiaqhb8UR0)) by Harshal Sheth and Gabe Lyons - 10 mins
- Use-Case: DataHub Hackathon at Depop ([video](https://www.youtube.com/watch?v=SmOMyFc-9J0)) by John Cragg - 10 mins
- Observability Feedback share out - 5 mins
- General Q&A and closing remarks - 5 mins
2 changes: 1 addition & 1 deletion metadata-ingestion/scripts/docgen.py
@@ -890,7 +890,7 @@ def generate(
f.write("<FeatureAvailability/>\n")

f.write("""
Lineage is used to capture data dependencies within an organization. It allows you to track the inputs from which a data asset is derived, along with the data assets that depend on it downstream.
Data Lineage is used to capture data dependencies within an organization. It allows you to track the inputs from which a data asset is derived, along with the data assets that depend on it downstream.

## Viewing Lineage
