[SPARK-1888] enhance MEMORY_AND_DISK mode by dropping blocks in parallel #791
Conversation
|
Can one of the admins verify this patch? |
|
IMO this makes things fragile. |
|
This is thread safe. |
|
Use of dropping is not my safe
|
|
As far as I know, the reasons for a task failure may be: an exception during task execution, an executor being lost and relaunched, or a stage being cancelled by the user. But I'm not sure I've listed all the reasons, and I don't know the details of how Spark relaunches executors and cancels stages, or how to handle these cases when dropping memory blocks. Is a try-catch enough? I want to reset the dropping flag if the task is terminated. |
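One common pattern for the "reset the dropping flag if the task is terminated" concern above is try/finally rather than a plain try-catch, since finally runs on any exit path (exception, interruption, task kill). A minimal sketch, with hypothetical simplified types that are not Spark's actual MemoryStore classes:

```java
// Sketch: clear a block's `dropping` flag if the drop does not complete,
// so the block becomes selectable again instead of staying stuck.
public class SafeDrop {
    static final class Entry {
        volatile boolean dropping = false;
    }

    // `action` stands in for the real disk write, which may throw
    // (disk failure, interruption, task cancellation, ...).
    static void dropSafely(Entry entry, Runnable action) {
        boolean dropped = false;
        try {
            action.run();
            dropped = true;
        } finally {
            // Runs on every exit path: if the drop did not finish,
            // reset the flag so other threads can select this block.
            if (!dropped) entry.dropping = false;
        }
    }
}
```

Note that the exception still propagates to the caller; the finally block only guarantees the flag is reset on the way out.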
|
It should read MT safe - my phone "autocorrected" it, sigh. There could be any number of reasons for dropping a block to fail (including disk issues, etc.). |
|
Can you please create a JIRA for this, and update the title of the PR. |
|
@mridulm @tdas I have created a JIRA for this: https://issues.apache.org/jira/browse/SPARK-1888 |
|
@mridulm Sorry, I may have misunderstood you because of my poor English :(
|
|
As we know, the memory store is used to add, read, and remove blocks. Reading and removing are quite simple, so let's focus on adding. |
|
It is not MT safe because the PR is checking/modifying shared state (like the dropping variable) in an unsafe manner. |
You are modifying entry.dropping here - there is no guarantee this change will be visible to other threads anytime soon.
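The visibility concern raised here is the standard JVM memory-model issue: without volatile or a lock, a write by one thread may never be observed by another. A minimal sketch (hypothetical class, not Spark's actual entry type) of the volatile fix:

```java
// Sketch: making a shared `dropping` flag safely visible across threads.
// Without `volatile` (or a common lock around both the write and the read),
// the JVM gives no guarantee that another thread ever sees the update.
public class DroppingFlag {
    // `volatile` establishes happens-before between a write and any
    // subsequent read of this field from any thread.
    private volatile boolean dropping = false;

    public void markDropping() {
        dropping = true;
    }

    public boolean isDropping() {
        return dropping;
    }

    public static void main(String[] args) throws InterruptedException {
        DroppingFlag entry = new DroppingFlag();
        Thread marker = new Thread(entry::markDropping);
        marker.start();
        marker.join(); // join() also establishes happens-before
        System.out.println(entry.isDropping()); // prints "true"
    }
}
```

An alternative, as discussed later in the thread, is to confine all reads and writes of the flag to the same lock that guards block selection, which gives the same visibility guarantee.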
|
@mridulm Thanks very much for your comment! I think a big difference is: the earlier code called BlockManager#dropFromMemory within putLock, but now we call it in parallel, so we have to check it carefully. |
|
|
Essentially there are a few things here: a) What happens if an existing block is re-added? It looks like this was probably handled earlier as well? b) What happens if the same block is added in parallel by two threads? |
|
@mridulm I checked the code of BlockManager#doPut. BlockManager will create a BlockInfo for the block to be added, and |
|
This seems really promising! However, can you explain whether the following sequence of events is possible or not? Both thread 1 and thread 2 want to insert blocks of 100 bytes. Existing blocks include block A and block B, of 100 bytes each, and the total capacity is 200 bytes. Next,
Is this sequence possible? |
|
@cloud-fan there are multiple calls to memoryStore to directly put a block - not just from external addition. |
|
@tdas there is a dropping flag which prevents this. |
|
@mridulm I may be missing something as well. Are you referring to the new |
|
@tdas yes - thread 1 should set A's dropping to true; so thread 2 should not select it |
|
@mridulm Is that so? Since selection and marking occur in different |
|
@tdas you missed an important thing. |
|
@tdas as @cloud-fan stated, the code relies on the implementation detail that the private method is always called within the context of a tryToPut lock - and not called by anyone else. I don't like the fact that we have locking state spread out like this, but then this is how it already was, I guess ... |
|
@cloud-fan that makes more sense. @tdas, can you also comment on the use cases/flows I mentioned above? |
|
@tdas I think we shouldn't synchronize on |
|
@mridulm I checked all callers of MemoryStore#putValues and putBytes via the IDE; it shows that only BlockManager calls them, and with the block info synchronized. So maybe we don't need to worry about putting the same block in parallel? |
|
@mridulm @tdas I have moved |
instead of 'get', can you rename it to 'find' or some such ?
|
did a manual merge :) |
|
Can one of the admins verify this patch? |
|
@cloud-fan This is now outdated. There have been relatively significant changes that went into |
|
Actually, before you do that, have you looked at #2134, which seems to be doing something really similar on the new code? |
|
Let me raise the same question here that I raised in #2134. If my understanding is correct, by the time |
|
First, |
|
It seems a big change has been made to the memory store; I will digest it and update my PR. |
@cloud-fan @andrewor14
Hi cloud-fan, I think there will be a problem when you don't update currentMemory. Assume there are two threads: the first one takes selectLock, finishes running, and releases the lock. Up to this point currentMemory has not been updated, so when the second thread takes selectLock, it sees the same value of currentMemory as the first thread did. This means freeMemory = maxMemory - currentMemory is used twice by the two threads, so the selectedMemory for the second thread is smaller than it actually requires.
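One way to avoid the double-counting described here is to update the accounting inside the same lock that performs the check, so the second caller sees the first caller's reservation. A minimal sketch with hypothetical names (not Spark's actual accounting code):

```java
// Sketch: check-and-reserve under one lock, so two threads can never
// both be handed the same free space. Contrast with reading
// currentMemory under the lock but updating it later, where both
// threads compute freeMemory = maxMemory - currentMemory from the
// same stale value.
public class MemoryReserver {
    private final long maxMemory;
    private long reserved = 0;

    public MemoryReserver(long maxMemory) {
        this.maxMemory = maxMemory;
    }

    // Atomic check-and-reserve: succeeds only if the space is
    // genuinely still available at the moment of the check.
    public synchronized boolean tryReserve(long amount) {
        if (maxMemory - reserved >= amount) {
            reserved += amount;
            return true;
        }
        return false;
    }

    public synchronized void release(long amount) {
        reserved -= amount;
    }
}
```

With this shape, a second thread that asks for the same space while the first is still writing its block simply gets `false` (or a smaller grant) instead of an over-commitment.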
|
Hi @liyezhang556520 , thanks for pointing this out! I have updated my PR, please review @andrewor14 |
Hi @cloud-fan, you removed accountingLock.synchronized here, so more than one thread can call planFreeSpace to reserve memory, and each thread will ask for memory of size maxUnrollMemory - currentUnrollMemory. I think this logic is not the same as the original intention.
Second question: what if maxUnrollMemory is large (maxMemory * unrollFraction might be dozens of GB), while the requested memory amountToRequest is small (maybe dozens of MB)? Then you use only one thread to free that amount, spaceToEnsure, which doesn't seem to solve the IO issue.
Third, since you lazily drop the to-be-dropped blocks, how can you avoid the OOM that @andrewor14 pointed out (the putting speed being faster than the dropping)?
Do these three problems exist in the current patch? Maybe I missed something.
|
@liyezhang556520 Thanks for your comments. 1) Yes, the logic is not the same as the original intention; I have updated my PR to fix this. 2) The original logic to calculate |
|
This has mostly gone stale so I'd suggest we close this issue and revisit this later. This is a decent idea, but it does complicate things a good amount, and this particular piece of code IMO is already quite complicated. As with any performance change, it would be useful to quantify the performance problems observed as a result of this issue. For instance, has it been observed as a bottleneck in real clusters? Putting information of this type on the JIRA would be useful. |
|
@pwendell, I updated a design doc for SPARK-3000 several days ago, which is also mainly meant to resolve this issue; there might be performance problems in some cases. You can have a look at it. |
It's inefficient to drop memory blocks to disk inside a synchronized block, as IO is slow. As the TODO says, we only need to synchronize selecting the blocks to be dropped. So my implementation is: in
ensureFreeSpace, we iterate over the entries and select blocks to be dropped. But instead of dropping blocks inside ensureFreeSpace, we just mark the selected entries as dropping and return these blocks, letting the caller do the dropping. When other threads call ensureFreeSpace again, they skip entries marked as dropping while iterating. And the caller, tryToPut, does the dropping before putting the new block into the entries. In this way, we can do the dropping in parallel.
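The scheme in the description can be sketched roughly as follows. This is a hypothetical, heavily simplified model (names and types are not Spark's actual MemoryStore), and, as the thread discusses, it deliberately keeps the known caveat that concurrent callers planning from the same accounting can over-commit:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: select victims under the lock, mark them as dropping,
// then perform the slow disk I/O outside the lock so that multiple
// puts can drop blocks in parallel.
public class ParallelDropStore {
    static final class Entry {
        final String id;
        final long size;
        volatile boolean dropping = false;
        Entry(String id, long size) { this.id = id; this.size = size; }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();
    private final long maxMemory;
    private long currentMemory = 0;

    public ParallelDropStore(long maxMemory) { this.maxMemory = maxMemory; }

    // Synchronized: pick blocks to evict and mark them, but do NOT
    // drop them here. Entries already marked by another thread are
    // skipped, so two callers never select the same victim.
    private synchronized List<Entry> planFreeSpace(long needed) {
        List<Entry> selected = new ArrayList<>();
        long freed = 0;
        for (Entry e : entries.values()) {
            if (maxMemory - currentMemory + freed >= needed) break;
            if (!e.dropping) {
                e.dropping = true;
                selected.add(e);
                freed += e.size;
            }
        }
        return selected;
    }

    // The caller plans under the lock, then drops outside it.
    // Caveat from the thread: accounting is only updated after each
    // drop, so concurrent planners can still over-commit free space.
    public void tryToPut(String id, long size) {
        for (Entry e : planFreeSpace(size)) {
            dropToDisk(e); // slow I/O, runs without holding the lock
            synchronized (this) {
                entries.remove(e.id);
                currentMemory -= e.size;
            }
        }
        synchronized (this) {
            entries.put(id, new Entry(id, size));
            currentMemory += size;
        }
    }

    public synchronized boolean contains(String id) { return entries.containsKey(id); }

    public synchronized long usedMemory() { return currentMemory; }

    private void dropToDisk(Entry e) { /* stand-in for the slow disk write */ }
}
```

With a 200-byte store holding A and B (100 bytes each), putting C (100 bytes) marks and drops A (the oldest entry) outside the lock, then inserts C; a concurrent planner would skip the already-marked A.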