# [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3 #29331

This PR adds a new storage level, `StorageLevel.DISK_ONLY_3`, updating `DistributedSuite` and the RDD programming guide accordingly.
**DistributedSuite.scala**

```diff
@@ -38,7 +38,7 @@ class DistributedSuite extends SparkFunSuite with Matchers with LocalSparkContex
   // Necessary to make ScalaTest 3.x interrupt a thread on the JVM like ScalaTest 2.2.x
   implicit val defaultSignaler: Signaler = ThreadSignaler
 
-  val clusterUrl = "local-cluster[2,1,1024]"
+  val clusterUrl = "local-cluster[3,1,1024]"
 
   test("task throws not serializable exception") {
     // Ensures that executors do not crash when an exn is not serializable. If executors crash,
@@ -174,7 +174,7 @@ class DistributedSuite extends SparkFunSuite with Matchers with LocalSparkContex
 
   private def testCaching(conf: SparkConf, storageLevel: StorageLevel): Unit = {
     sc = new SparkContext(conf.setMaster(clusterUrl).setAppName("test"))
-    TestUtils.waitUntilExecutorsUp(sc, 2, 60000)
+    TestUtils.waitUntilExecutorsUp(sc, 3, 60000)
     val data = sc.parallelize(1 to 1000, 10)
     val cachedData = data.persist(storageLevel)
     assert(cachedData.count === 1000)
@@ -206,7 +206,8 @@ class DistributedSuite extends SparkFunSuite with Matchers with LocalSparkContex
     "caching on disk" -> StorageLevel.DISK_ONLY,
     "caching in memory, replicated" -> StorageLevel.MEMORY_ONLY_2,
     "caching in memory, serialized, replicated" -> StorageLevel.MEMORY_ONLY_SER_2,
-    "caching on disk, replicated" -> StorageLevel.DISK_ONLY_2,
+    "caching on disk, replicated 2" -> StorageLevel.DISK_ONLY_2,
+    "caching on disk, replicated 3" -> StorageLevel.DISK_ONLY_3,
     "caching in memory and disk, replicated" -> StorageLevel.MEMORY_AND_DISK_2,
     "caching in memory and disk, serialized, replicated" -> StorageLevel.MEMORY_AND_DISK_SER_2
   ).foreach { case (testName, storageLevel) =>
```
> **Contributor:** So what happens if there aren't 3 executors? Do we have a test that needs updating?
>
> **Member (author):** The number of copies becomes 2 and this test case fails, as it should. Yes, this test suite is updated at line 41.
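To make that failure mode concrete, here is a hypothetical check in the spirit of what `testCaching` verifies (not the suite's exact code; it relies on Spark-internal accessors such as `SparkContext.env`, which are package-private, so it only compiles inside Spark's own test tree):

```scala
import org.apache.spark.storage.{RDDBlockId, StorageLevel}

// Ask the block manager master where the first cached partition ended up.
// With only two executors alive, a level requesting 3 replicas can be stored
// in at most two locations, so this assertion fails and flags the problem.
val rdd = sc.parallelize(1 to 1000, 10).persist(StorageLevel.DISK_ONLY_3)
rdd.count() // materialize the cache
val locations = sc.env.blockManager.master.getLocations(RDDBlockId(rdd.id, 0))
assert(locations.size == StorageLevel.DISK_ONLY_3.replication)
```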
**docs — RDD programming guide**

```diff
@@ -1256,7 +1256,7 @@ storage levels is:
 **Note:** *In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library,
 so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`,
-`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, and `DISK_ONLY_2`.*
+`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, `DISK_ONLY_2`, and `DISK_ONLY_3`.*
 
 Spark also automatically persists some intermediate data in shuffle operations (e.g. `reduceByKey`), even without users calling `persist`. This is done to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call `persist` on the resulting RDD if they plan to reuse it.
```
> **Contributor:** It looks like we need to update the table above as well?
>
> **Member (author):** Let me rephrase the request: is there something more I can do, @tgravescs?
>
> **Contributor:** That's it.
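As background for the doc change (my illustration, not part of this PR): `DISK_ONLY_3` is a predefined disk-only level with replication 3, so an equivalent level could already be built through the `StorageLevel` factory:

```scala
import org.apache.spark.storage.StorageLevel

// The factory parameters are (useDisk, useMemory, useOffHeap, deserialized, replication).
val custom = StorageLevel(useDisk = true, useMemory = false, useOffHeap = false,
  deserialized = false, replication = 3)

// StorageLevel equality compares the flags and the replication count,
// so the hand-built level matches the new constant.
assert(custom == StorageLevel.DISK_ONLY_3)
```

On the PySpark side, the analogous construction is `StorageLevel(True, False, False, False, 3)`; the doc change simply adds the new constant to the list of levels available by name in Python.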