Skip to content

Conversation

@gatorsmile
Copy link
Member

What changes were proposed in this pull request?

The comment in CatalogTable returned from Hive is always empty. We store it in the table property when creating a table. However, when we try to retrieve the table metadata from Hive metastore, we do not rebuild it. The comment is always empty.

This PR is to fix the issue.

How was this patch tested?

Fixed the test case to verify the change.

@SparkQA
Copy link

SparkQA commented Aug 9, 2016

Test build #63392 has finished for PR 14550 at commit e546af5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

cc @cloud-fan . This is a tiny fix. Could you review this?

val tableMetadata = catalog.getTableMetadata(TableIdentifier(tabName, Some("default")))
val viewMetadata = catalog.getTableMetadata(TableIdentifier(viewName, Some("default")))
assert(tableMetadata.comment == Option("BLABLA"))
assert(tableMetadata.properties.get("comment") == Option("BLABLA"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also remove the comment from table properties, to not surprise users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As explained below, Hive keeps comment in the table properties. Should we keep it too?

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. HiveClient should be symmetrical, the table meta read back should be same with what users saved into. We don't need to follow hive here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I see. Let me fix it now.

@cloud-fan
Copy link
Contributor

do you know why we put comment in table properties instead of table comment field? Some hive tricks?

@gatorsmile
Copy link
Member Author

gatorsmile commented Aug 9, 2016

After an investigation, Hive does not have such a field in org.apache.hadoop.hive.ql.metadata.Table. Thus, they also put it in the table properties/parameters.

For example, this is the Hive output of one table with table comment d

hive> desc formatted tab5;
OK
# col_name              data_type               comment             

id                      int                     b                   

# Detailed Table Information         
Database:               default                  
Owner:                  root                     
CreateTime:             Mon Aug 08 16:19:12 EDT 2016     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://6b68a24121f4:9000/user/hive/warehouse/tab5    
Table Type:             MANAGED_TABLE            
Table Parameters:        
    comment                 d                   
    transient_lastDdlTime   1470687552          

# Storage Information        
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:         
    serialization.format    1     

.map(_.asScala.toMap).orNull
),
properties = properties,
properties = properties.filter(kv => kv._1 != "path"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use filterKeys?

Copy link
Member Author

@gatorsmile gatorsmile Aug 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of filterKeys will create a new object.

override def filterKeys(p: A => Boolean): Map[A, B] = new FilteredKeys(p) with DefaultMap[A, B]

Thus, I got the following error:

org.apache.spark.SparkException: Task not serializable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey, we should filter out comment here, not path

Copy link
Member Author

@gatorsmile gatorsmile Aug 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, sure. Working on multiple PRs at the same time. Will open a new PR tomorrow regarding path...

@SparkQA
Copy link

SparkQA commented Aug 10, 2016

Test build #63508 has finished for PR 14550 at commit e575700.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in bdd5371 Aug 10, 2016
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@SparkQA
Copy link

SparkQA commented Aug 10, 2016

Test build #63517 has finished for PR 14550 at commit b1cbf55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

@cloud-fan To backport #14531, we need to backport this. Do you want me to do it?

asfgit pushed a commit that referenced this pull request Sep 3, 2016
…m Hive Metastore

### What changes were proposed in this pull request?
The `comment` in `CatalogTable` returned from Hive is always empty. We store it in the table property when creating a table. However, when we try to retrieve the table metadata from Hive metastore, we do not rebuild it. The `comment` is always empty.

This PR is to fix the issue.

### How was this patch tested?
Fixed the test case to verify the change.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #14550 from gatorsmile/tableComment.

(cherry picked from commit bdd5371)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Contributor

backported to 2.0

@gatorsmile
Copy link
Member Author

gatorsmile commented Sep 3, 2016

Thank you! Let me fix the test case failure in Branch 2.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants