Description
Hi,
I have been exploring OpenLineage's Spark lineage support, looking into the codebase and testing out behaviours, and was wondering whether the following scenario is something that could be implemented but is currently missing, a bug, or a technical limitation of what is possible with the logical plan in Spark.
When creating a table for the first time using saveAsTable, or whenever the overwrite mode is used, the symlink facet linking to the destination Hive table is missing from the execute_insert_into_hadoop_fs_relation_command event. Is this because that specific step doesn't have the catalog details at that point and can't access the earlier step in the plan that contains the table-creation command? I'm not sure whether the columnLineage process, as it runs, is able to pick up the catalog details for the new table at the same time.
df = spark.createDataFrame(
    [(100, "Hyukjin Kwon"), (120, "Hyukjin Kwon"), (140, "Haejoon Lee")],
    schema=["age", "name"],
)

# first write creates the table: event doesn't contain the symlink facet
df.write.mode("append").saveAsTable("a_database.b_table")

# table now exists: event will now contain the symlink facet
df.write.mode("append").saveAsTable("a_database.b_table")

# overwrite: event will again not contain the symlink facet
df.write.mode("overwrite").saveAsTable("a_database.b_table")

insertInto works fine, but I'm guessing that's because the target table needs to already exist.
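For anyone reproducing this, a minimal sketch of how I'm checking for the facet: assuming the emitted run events are captured as JSON (e.g. via the console transport), this hypothetical helper pulls the symlink identifiers out of each output dataset following the OpenLineage event shape (outputs → facets → symlinks → identifiers). The sample event in the usage note is illustrative, not a real capture.

```python
def output_symlink_identifiers(event: dict) -> list:
    """Collect symlink identifiers from every output dataset in a run event.

    Returns an empty list when no output carries the "symlinks" facet,
    which is the situation described above for first writes and overwrites.
    """
    identifiers = []
    for output in event.get("outputs", []):
        facet = output.get("facets", {}).get("symlinks")
        if facet:
            identifiers.extend(facet.get("identifiers", []))
    return identifiers
```

An appended write to an existing table then yields something like one identifier with namespace "hive", name "a_database.b_table", and type "TABLE", while the first write and the overwrite yield an empty list.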
If it turns out to be implementable, or is an actual bug, I'll update this issue to a feature request or bug report respectively.