[SPARK-18700][SQL] Add StripedLock for each table's relation in cache #16135
Conversation
Test build #69634 has finished for PR 16135 at commit
Test build #69635 has finished for PR 16135 at commit
Test build #69660 has finished for PR 16135 at commit
cc @ericl can you take a look at this?
Is it sufficient to lock around the
/** ReadWriteLock for each tables, protect the read and write cached */
protected[hive] val tableLockMap =
  new ConcurrentHashMap[QualifiedTableName, ReentrantReadWriteLock]
Yes, I considered Striped.lazyWeakReadWriteLock here, but I need to handle invalidateAllCache(), which needs all the <K, V> entries, so I ended up creating a HashMap myself.
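For readers following along, here is a minimal, illustrative sketch of the per-table ReadWriteLock map described in this comment. The registry class, the `lockFor` helper, and the `invalidate` callback are hypothetical names introduced for the sketch; the actual PR code may differ.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.locks.ReentrantReadWriteLock

// Hypothetical stand-in for Spark's QualifiedTableName key type.
case class QualifiedTableName(database: String, name: String)

class TableLockRegistry {
  // One ReadWriteLock per table, created lazily and kept for the map's lifetime.
  private val tableLockMap =
    new ConcurrentHashMap[QualifiedTableName, ReentrantReadWriteLock]

  private def lockFor(key: QualifiedTableName): ReentrantReadWriteLock = {
    val existing = tableLockMap.get(key)
    if (existing != null) {
      existing
    } else {
      // putIfAbsent returns the previous value when another thread won the race.
      val fresh = new ReentrantReadWriteLock
      val raced = tableLockMap.putIfAbsent(key, fresh)
      if (raced != null) raced else fresh
    }
  }

  /** Runs `f` under the table's read lock. */
  def readLock[A](key: QualifiedTableName)(f: => A): A = {
    val lock = lockFor(key).readLock()
    lock.lock()
    try f finally lock.unlock()
  }

  /** Runs `f` under the table's write lock. */
  def writeLock[A](key: QualifiedTableName)(f: => A): A = {
    val lock = lockFor(key).writeLock()
    lock.lock()
    try f finally lock.unlock()
  }

  /**
   * Invalidation has to visit every table's lock, which is why the comment above
   * keeps an explicit map instead of Striped.lazyWeakReadWriteLock: the striped
   * variant does not expose the set of keys it has handed out locks for.
   */
  def invalidateAllCache(invalidate: QualifiedTableName => Unit): Unit = {
    val entries = tableLockMap.entrySet().iterator()
    while (entries.hasNext) {
      val entry = entries.next()
      val w = entry.getValue.writeLock()
      w.lock()
      try invalidate(entry.getKey) finally w.unlock()
    }
  }
}
```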
@ericl Thanks for your review.
Yes, this patch is ported from 1.6.2 and I missed the diff here. Fixed in the next patch.
Writes to or invalidations of the table cache happen far less often than reads, and a reader only waits when the same table is being written to the cache. Do you mean we just need a plain lock here, not a RWLock?
I guess the large number of lock sites is confusing me. We only want to prevent concurrent instantiation of a single table, so shouldn't you only need 1 lock for that site? Also, we should have a unit test that tries to concurrently read from a table from many threads, and verifies via the catalog metrics that it is only loaded once (see
val key = metastoreCatalog.getQualifiedTableName(table)
metastoreCatalog.cachedDataSourceTables.getIfPresent(key)
metastoreCatalog.readLock(key,
  metastoreCatalog.cachedDataSourceTables.getIfPresent(key))
Why does this need locking?
def invalidateCache(): Unit = {
  metastoreCatalog.cachedDataSourceTables.invalidateAll()
  metastoreCatalog.invalidateAllCache()
Why this change?
Hi @ericl,
Also changed the description of this PR.
Test build #69903 has finished for PR 16135 at commit
/**
 * Tracks the total number of cachedDataSourceTables hits.
 */
val METRIC_DATASOUCE_TABLE_CACHE_HITS = metricRegistry.counter(
Could we use one of the other metrics, rather than add a new one?
Maybe we can't; only the cache hits metric lets us check the count.
I did the test below:
I added a Thread.sleep(1000) before cachedDataSourceTables.put(tableIdentifier, created) in HiveMetastoreCatalog.scala +265 to make building the table relation slow, and printed all the metrics with and without the lock:
println(HiveCatalogMetrics.METRIC_DATASOUCE_TABLE_CACHE_HITS.getCount())
println(HiveCatalogMetrics.METRIC_FILE_CACHE_HITS.getCount())
println(HiveCatalogMetrics.METRIC_FILES_DISCOVERED.getCount())
println(HiveCatalogMetrics.METRIC_HIVE_CLIENT_CALLS.getCount())
println(HiveCatalogMetrics.METRIC_PARALLEL_LISTING_JOB_COUNT.getCount())
println(HiveCatalogMetrics.METRIC_PARTITIONS_FETCHED.getCount())
The result without the lock:
0
0
5
70
0
0
and the result with the lock:
9
0
5
70
0
0
That's kind of odd, I'd expect the duplicate table building to cause more file accesses or at least cache hits since we are scanning the filesystem multiple times. Is that not the case?
I may have found the reason for the odd scenario; please check my conclusion:
In 2.0 a new config, lazyPruningEnabled, was added; its default value is true, and with it, even when multiple threads do the build at the same time, listLeafFiles is not called.
So I set HIVE_MANAGE_FILESOURCE_PARTITIONS=false and set the partition count larger than PARALLEL_PARTITION_DISCOVERY_THRESHOLD (this causes METRIC_PARALLEL_LISTING_JOB_COUNT to increase by 1). Test results are listed below:
Without the lock:
0
0
550 (50 files * 11: 1 from cache.load() and the other 10 from the 10 threads)
90
11 (also 1 * 11)
1000
and with the lock:
9
0
100 (50 files * 2: 1 from cache.load() and 1 from the first thread)
54
2 (also 1 * 2)
550
So I can delete the added dataSourceTableCacheHits metric and use parallelListingJobCount and filesDiscovered instead. I'll do this in the next patch.
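For illustration, a small hypothetical helper (not part of the PR) that diffs those two counters around a block of work. The HiveCatalogMetrics import path is an assumption here; the counter names and getCount() calls are the ones already used in this thread.

```scala
import org.apache.spark.metrics.source.HiveCatalogMetrics

object ListingMetrics {
  /**
   * Runs `body` and returns the deltas of the two metrics settled on above:
   * (parallel listing jobs started, files discovered).
   */
  def measureListing[A](body: => A): (Long, Long) = {
    val jobsBefore = HiveCatalogMetrics.METRIC_PARALLEL_LISTING_JOB_COUNT.getCount()
    val filesBefore = HiveCatalogMetrics.METRIC_FILES_DISCOVERED.getCount()
    body
    val jobsDelta = HiveCatalogMetrics.METRIC_PARALLEL_LISTING_JOB_COUNT.getCount() - jobsBefore
    val filesDelta = HiveCatalogMetrics.METRIC_FILES_DISCOVERED.getCount() - filesBefore
    (jobsDelta, filesDelta)
  }
}
```

With the lock in place, a concurrent run over the same table should report roughly (2, partitions * 2), as in the numbers above, rather than one listing job per thread.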
import org.apache.spark.sql.types._


nit: extra newline
fix done
tableIdent.table.toLowerCase)
}

/** ReadWriteLock for each tables, protect the read and write cached */
Could you update this comment to say that the reason we lock is to prevent concurrent table instantiation?
Updated, and added more comments at HiveMetastoreCatalog.scala +226.
private val tableLockStripes = Striped.lazyWeakLock(10)

/** Acquires a lock on the table cache for the duration of `f`. */
private def cacheLock[A](tableName: QualifiedTableName, f: => A): A = {
withTableCreationLock
update done
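For context, a minimal sketch of the striped per-table creation lock this thread converges on, using the withTableCreationLock name suggested above. The enclosing object, the QualifiedTableName stand-in, and the stripe count are illustrative (a later comment bumps the count to 100); this is a sketch, not the merged code.

```scala
import com.google.common.util.concurrent.Striped

// Hypothetical stand-in for Spark's QualifiedTableName key type.
case class QualifiedTableName(database: String, name: String)

object TableCreationLocks {
  // Tables hash onto a fixed number of stripes, so the same table always maps to
  // the same lock while different tables rarely contend. lazyWeakLock creates
  // locks on demand and lets unused ones be garbage collected.
  private val tableCreationLocks = Striped.lazyWeakLock(10)

  /** Acquires the table's creation lock for the duration of `f`. */
  def withTableCreationLock[A](tableName: QualifiedTableName, f: => A): A = {
    val lock = tableCreationLocks.get(tableName)
    lock.lock()
    try f finally lock.unlock()
  }
}
```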
executorPool.shutdown()
executorPool.awaitTermination(30, TimeUnit.SECONDS)
// check the cache hit, the cache only load once
assert(HiveCatalogMetrics.METRIC_DATASOUCE_TABLE_CACHE_HITS.getCount() == 9)
Does this test fail without the lock?
It may fail sometimes without the lock, but when I add a Thread.sleep(1000) before cachedDataSourceTables.put(tableIdentifier, created) in HiveMetastoreCatalog.scala +265 to make building the table relation slow, it fails every time. How can I add this hook in a UT? Or how can I make the cache build operation slow without actually creating a big table? :)
Yeah that's fine, as long as it fails some fraction of the time it will eventually show up as a flaky test.
}
}

test("SPARK-18700: add lock for each table's realation in cache")
"table loaded only once even when resolved concurrently"
update done
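A hedged sketch of what the renamed test could look like, following the executor-pool snippet quoted above and the metric-based check settled on earlier. The SparkSession handle, the test_table fixture, the partition count, and the exact expected values are placeholders, and HiveCatalogMetrics.reset() is assumed to be the dec(...) helper quoted further down; the merged test in PartitionedTablePerfStatsSuite may differ.

```scala
import java.util.concurrent.{Executors, TimeUnit}

import org.apache.spark.metrics.source.HiveCatalogMetrics
import org.apache.spark.sql.SparkSession

object ConcurrentResolutionCheck {
  // "table loaded only once even when resolved concurrently"
  def run(spark: SparkSession): Unit = {
    val partitionNum = 50 // assumed fixture: test_table with 50 partitions
    val threadNum = 10

    // Start from clean counters so the assertions only see this run's work.
    HiveCatalogMetrics.reset()

    val executorPool = Executors.newFixedThreadPool(threadNum)
    (1 to threadNum).foreach { _ =>
      executorPool.execute(new Runnable {
        override def run(): Unit = {
          // Every thread resolves the same table; with the per-table lock only
          // one of them actually builds the relation and lists the files.
          spark.sql("select count(*) from test_table").collect()
        }
      })
    }
    executorPool.shutdown()
    executorPool.awaitTermination(30, TimeUnit.SECONDS)

    // Expected values mirror the numbers reported in this thread; they depend on
    // the fixture (a later comment notes extra counts can also come from the
    // DataFrameWriter save() used to create the table).
    assert(HiveCatalogMetrics.METRIC_PARALLEL_LISTING_JOB_COUNT.getCount() == 2)
    assert(HiveCatalogMetrics.METRIC_FILES_DISCOVERED.getCount() == partitionNum * 2)
  }
}
```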
}

/** ReadWriteLock for each tables, protect the read and write cached */
private val tableLockStripes = Striped.lazyWeakLock(10)
Hm, may as well make it 100 if it's a lazy weak lock.
nit: tableCreationLocks.
fix done
Test build #69979 has started for PR 16135 at commit
METRIC_FILE_CACHE_HITS.dec(METRIC_FILE_CACHE_HITS.getCount())
METRIC_HIVE_CLIENT_CALLS.dec(METRIC_HIVE_CLIENT_CALLS.getCount())
METRIC_PARALLEL_LISTING_JOB_COUNT.dec(METRIC_PARALLEL_LISTING_JOB_COUNT.getCount())
METRIC_DATASOUCE_TABLE_CACHE_HITS.dec(METRIC_DATASOUCE_TABLE_CACHE_HITS.getCount())
s/souce/source
Sorry, this newly added metric will be deleted in the next patch, as in the earlier comment.
Test build #69998 has finished for PR 16135 at commit
ericl left a comment
Thanks for making these changes, this lgtm with some nits.
catalog.filterPartitions(Nil) // materialize all the partitions in memory
// Here we should protect all relation get and create operation with lock while big
// table's CatalogFileIndex will take some time, only lock cachedDataSourceTables.put
// will still cause driver memory waste. More detail see SPARK-18700.
nit: probably don't need this comment
Done, deleted it.
}

/** Locks for preventing driver mem waste when concurrent table instantiation */
private val tableCreationLocks = Striped.lazyWeakLock(100)
nit: "These locks guard against multiple attempts to instantiate a table, which wastes memory."
fix done
// check the cache hit, we use the metric of METRIC_FILES_DISCOVERED and
// METRIC_PARALLEL_LISTING_JOB_COUNT to check this, while the lock take effect,
// only one thread can really do the build, so the listing job count is 2, the other
// one is cahce.load func. Also METRIC_FILES_DISCOVERED is $partition_num * 2
s/cahce/cache
… sorry, fixed.
Thanks for the review, @ericl!
Test build #70012 has finished for PR 16135 at commit
cc @rxin, this lgtm
cc @rxin, thanks for checking. :)
retest this please
I am merging this one after a successful test run. Ping me if you object.
Test build #70357 has finished for PR 16135 at commit
Go for it.
## What changes were proposed in this pull request?

As described in [SPARK-18700](https://issues.apache.org/jira/browse/SPARK-18700), when cachedDataSourceTables is invalidated, the next few queries will each fetch all FileStatus entries in the listLeafFiles function. When a table has many partitions, these jobs occupy a lot of driver memory and may eventually cause a driver OOM.

In this patch, add a StripedLock for each table's relation in the cache, rather than for the whole cachedDataSourceTables, and protect each table's cache-load operation with it.

## How was this patch tested?

Added a multi-thread table access test in `PartitionedTablePerfStatsSuite` and checked that the relation is loaded only once, using the metrics in `HiveCatalogMetrics`.

Author: xuanyuanking <xyliyuanjian@gmail.com>

Closes #16135 from xuanyuanking/SPARK-18700.

(cherry picked from commit 2448285)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
Merging to master/2.1. @xuanyuanking, can you open a backport for 2.0 if we also need to merge this to that branch?
@hvanhovell Sure, I opened a new backport for 2.0.
…ation in cache

## What changes were proposed in this pull request?

Backport of #16135 to branch-2.0.

## How was this patch tested?

Because of the diff between branch-2.0 and master/2.1, this adds a multi-thread table access test in `HiveMetadataCacheSuite` and checks that the relation is loaded only once, using the metrics in `HiveCatalogMetrics`.

Author: xuanyuanking <xyliyuanjian@gmail.com>

Closes #16350 from xuanyuanking/SPARK-18700-2.0.
// check the cache hit, we use the metric of METRIC_FILES_DISCOVERED and
// METRIC_PARALLEL_LISTING_JOB_COUNT to check this, while the lock take effect,
// only one thread can really do the build, so the listing job count is 2, the other
// one is cache.load func. Also METRIC_FILES_DISCOVERED is $partition_num * 2
This comment is wrong. The extra counts are from the DataFrameWriter's save() API.
Working on a fix to avoid the useless filesystem scan caused by the save() API.
@gatorsmile Xiao fixed this in #16481
What changes were proposed in this pull request?

As described in SPARK-18700, when cachedDataSourceTables is invalidated, the next few queries will each fetch all FileStatus entries in the listLeafFiles function. When a table has many partitions, these jobs occupy a lot of driver memory and may eventually cause a driver OOM.

In this patch, add a StripedLock for each table's relation in the cache, rather than for the whole cachedDataSourceTables, and protect each table's cache-load operation with it.

How was this patch tested?

Added a multi-thread table access test in `PartitionedTablePerfStatsSuite` and checked that the relation is loaded only once, using the metrics in `HiveCatalogMetrics`.
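To make the description concrete, here is a hedged sketch of the guarded get-or-build pattern the patch describes, with the whole lookup-and-create protected by the per-table lock. The cache construction, the LogicalPlan placeholder, and the getOrBuild wrapper are illustrative names for this sketch, not the PR's exact code; the tableCreationLocks / withTableCreationLock names follow the review discussion above.

```scala
import com.google.common.cache.{Cache, CacheBuilder}
import com.google.common.util.concurrent.Striped

// Illustrative placeholders for Spark's real types.
case class QualifiedTableName(database: String, name: String)
trait LogicalPlan

object CachedRelations {
  // Relation cache keyed by table name, standing in for cachedDataSourceTables.
  private val cachedDataSourceTables: Cache[QualifiedTableName, LogicalPlan] =
    CacheBuilder.newBuilder().maximumSize(1000).build[QualifiedTableName, LogicalPlan]()

  // Per-table creation locks, striped as in the review discussion above.
  private val tableCreationLocks = Striped.lazyWeakLock(100)

  private def withTableCreationLock[A](tableName: QualifiedTableName, f: => A): A = {
    val lock = tableCreationLocks.get(tableName)
    lock.lock()
    try f finally lock.unlock()
  }

  /**
   * The whole get-or-create runs under the table's lock, so concurrent
   * resolutions of the same table serialize: only the first caller pays for the
   * expensive file listing and relation build, and the rest see the cached plan.
   */
  def getOrBuild(key: QualifiedTableName)(build: => LogicalPlan): LogicalPlan = {
    withTableCreationLock(key, {
      val cached = cachedDataSourceTables.getIfPresent(key)
      if (cached != null) {
        cached
      } else {
        val created = build
        cachedDataSourceTables.put(key, created)
        created
      }
    })
  }
}
```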