SHS-NG M4.3: Port StorageTab to the new backend. #46
Conversation
squito left a comment
I haven't looked at StreamBlocks before, and I have to say I'm totally perplexed even by the current implementation. It looks to me like it's just every block on the executors, and nothing to do with streaming at all (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockStatusListener.scala#L94). I will look at that more. But what you have here seems consistent with that anyway.
I think my previous comment about executor memory remaining with multiple RDDs still applies here: you're only updating it for the one RDD that received an update, but the remaining memory is actually changing for all RDDs cached on this executor.
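To illustrate the point, here is a self-contained sketch (not code from this patch; `LiveExecutor`, `LiveRDD`, and the field names are made-up stand-ins): when any block on an executor changes, the remaining memory has to be recomputed for every RDD distribution on that executor, not only for the RDD that was updated.

```scala
object ExecMemorySketch {
  import scala.collection.mutable

  // Illustrative stand-ins for the live entities tracked by the listener.
  class LiveExecutor(val executorId: String, var maxMemory: Long, var memoryUsed: Long)
  class LiveRDDDistribution(val executorId: String, var memoryUsed: Long, var memoryRemaining: Long)
  class LiveRDD(val id: Int) {
    val distributions = mutable.HashMap[String, LiveRDDDistribution]()
  }

  val liveRDDs = mutable.HashMap[Int, LiveRDD]()

  // After any block update on `exec`, refresh the remaining memory for every
  // RDD that has a distribution on that executor, not just the updated RDD.
  def updateMemoryRemaining(exec: LiveExecutor): Unit = {
    val remaining = exec.maxMemory - exec.memoryUsed
    liveRDDs.values.foreach { rdd =>
      rdd.distributions.get(exec.executorId).foreach { dist =>
        dist.memoryRemaining = remaining
      }
    }
  }
}
```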
Isn't this whole if/else just liveUpdate(rdd)?
The update call needs an extra argument here...
So I don't forget: repeating the comment about adding a test for multiple RDDs on one executor.
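For reference, a rough sketch of what the body of such a test case could look like. The event classes below are existing Spark types; the `listener` under test and the assertions are left as commented placeholders, since that API belongs to the patch itself.

```scala
import org.apache.spark.scheduler.SparkListenerBlockUpdated
import org.apache.spark.storage._

// One executor, two different RDDs cached on it.
val bm = BlockManagerId("1", "localhost", 12345)

def cached(rddId: Int, split: Int, mem: Long): SparkListenerBlockUpdated = {
  SparkListenerBlockUpdated(
    BlockUpdatedInfo(bm, RDDBlockId(rddId, split), StorageLevel.MEMORY_ONLY, mem, 0L))
}

// listener.onBlockUpdated(cached(rddId = 1, split = 0, mem = 100L))
// listener.onBlockUpdated(cached(rddId = 2, split = 0, mem = 200L))
// Then assert that the executor's memory accounting reflects both RDDs, and
// that the remaining memory was refreshed for both (see the earlier comment).
```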
This required adding information about StreamBlockId to the store, which is not available yet via the API. So an internal type was added until there's a need to expose that information in the API.

The UI only lists RDDs that have cached partitions, and that information wasn't being correctly captured in the listener, so that's also fixed, along with some minor (internal) API adjustments so that the UI can get the correct data.

Because of the way partitions are cached, some optimizations w.r.t. how often the data is flushed to the store could not be applied to this code; because of that, some different ways to make the code more performant were added to the data structures tracking RDD blocks, with the goal of avoiding expensive copies when lots of blocks are being updated.
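As an illustration of the first point, a minimal sketch of what a store-only type for stream block data could look like; the class and field names here are assumptions, not necessarily what the patch uses. The idea is that the data is written to the KVStore but not yet surfaced through the public REST API.

```scala
import org.apache.spark.util.kvstore.KVIndex

// Internal-only representation of a stream block; not exposed via the API.
private[spark] class StreamBlockData(
    val name: String,
    val executorId: String,
    val hostPort: String,
    val storageLevel: String,
    val useMemory: Boolean,
    val useDisk: Boolean,
    val deserialized: Boolean,
    val memSize: Long,
    val diskSize: Long) {

  // Composite key: a stream block is unique per (block name, executor).
  @KVIndex
  def key: Array[String] = Array(name, executorId)
}
```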
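And a sketch of the kind of restructuring the last paragraph alludes to, using a hypothetical per-partition tracker (the names are illustrative): keep block state in mutable maps so each block update is a cheap in-place mutation, and only build immutable views when the RDD is flushed to the store.

```scala
import scala.collection.mutable

// Tracks which executors hold a given RDD partition and how much memory each uses.
class RDDPartitionTracker(val blockName: String) {
  private val executors = mutable.HashMap[String, Long]()

  // O(1) mutation per block update; no copies of the whole partition list.
  def update(executorId: String, memSize: Long): Unit = {
    if (memSize > 0) {
      executors(executorId) = memSize
    } else {
      executors.remove(executorId)
    }
  }

  // Built only when writing to the store, not on every block update.
  def executorList: Seq[String] = executors.keys.toSeq.sorted
}
```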