DRILL-5270: Improve loading of profiles listing in the WebUI #1654

Closed
wants to merge 1 commit into from

Conversation

kkhatua
Contributor

@kkhatua kkhatua commented Feb 24, 2019

Note: This PR replaces the old PRs #755 and #1250, which have been closed.

When Drill displays profiles stored on the file system (local or distributed), it loads the entire list of .sys.drill files in the profile directory, then sorts and deserializes them. This can get expensive, since only a single CPU thread does the work.
For example, for a directory of 120K profiles, fetching the list of files alone takes about 6 seconds. After that, the time varies with the number of profiles being rendered. Deserializing a standard profile takes 30 ms on average, which adds about 3 seconds (100 × 30 ms) for rendering the default 100 profiles.

A user-reported issue confirms this:
DRILL-5028 Opening profiles page from web ui gets very slow when a lot of history files have been stored in HDFS or Local FS

Additional JIRAs ask for management of these profiles:
DRILL-2362 Drill should manage Query Profiling archiving
DRILL-2861 enhance drill profile file management

This PR brings the following enhancements to achieve that:

  1. Improve loading times by pinning the deserialized list in memory (in a TreeSet, which keeps the profiles sorted in a memory-efficient way). If no new profiles are detected in the profileStore (i.e. the profile directory) since the last web request to render the profiles, the same listing is re-served without a trip to the filesystem to re-fetch all the profiles.
  2. Leverage Guava Cache to avoid repeated deserialization, since the WebServer deserializes a profile file from disk twice when rendering it.
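
As an illustration of the first point, here is a minimal sketch of a size-bounded TreeSet that keeps only the newest profile names. It is not the code from this PR: the class name `LatestProfiles`, its methods, and the assumption that profile file names sort lexicographically in chronological order are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Keeps at most maxSize profile names, sorted newest first (hypothetical sketch). */
class LatestProfiles {
  private final int maxSize;
  // Reverse natural order: first() is the newest name, last() is the oldest.
  // Assumption: names sort lexicographically in chronological order.
  private final TreeSet<String> names = new TreeSet<>(Comparator.reverseOrder());

  LatestProfiles(int maxSize) {
    if (maxSize <= 0) {
      throw new IllegalArgumentException("maxSize must be positive");
    }
    this.maxSize = maxSize;
  }

  /** Adds a name; if the set overflows, evicts and returns the oldest name, else null. */
  String add(String profileName) {
    names.add(profileName);
    return names.size() > maxSize ? names.pollLast() : null;
  }

  /** Returns a copy of the tracked names, newest first. */
  List<String> newestFirst() {
    return new ArrayList<>(names);
  }
}
```

Because the set never grows past `maxSize`, memory use stays bounded by the number of profiles requested rather than by the size of the profile directory.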

Reload and reconstruction of the profiles in the TreeSet happens when any of the following changes:
i. Modification time of the profile directory
ii. Number of profiles in the profile directory
iii. Number of profiles requested exceeds the number in the currently available list
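
The three conditions above can be sketched as a single staleness check. This is a simplified illustration, not the PR's implementation: `ProfileListingState`, `needsReload`, and `recordReload` are hypothetical names, and the directory's modification time and file count are assumed to be supplied by the caller.

```java
/** Remembers the profile directory state seen at the last rebuild (hypothetical sketch). */
class ProfileListingState {
  private long lastModifiedMillis = -1L; // -1 means "never loaded"
  private int lastFileCount = -1;
  private int cachedCount = 0;

  /** True if any of the three reload conditions holds. */
  boolean needsReload(long dirModifiedMillis, int fileCount, int requested) {
    return dirModifiedMillis != lastModifiedMillis // i. directory modification time changed
        || fileCount != lastFileCount              // ii. number of profiles changed
        || requested > cachedCount;                // iii. more requested than currently cached
  }

  /** Records the directory state after a successful rebuild. */
  void recordReload(long dirModifiedMillis, int fileCount, int cachedCount) {
    this.lastModifiedMillis = dirModifiedMillis;
    this.lastFileCount = fileCount;
    this.cachedCount = cachedCount;
  }
}
```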

Access to the profiles requires a lock, so that at any time only one requesting user can trigger a reconstruction of the TreeSet. If reconstruction is not required, the existing cached listing is used rather than reading the entire profile directory's contents again.
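
A minimal sketch of this locking pattern, assuming a `ReentrantLock` (the source does not say which lock type is used). The class `GuardedListing` and its methods are hypothetical; the staleness check and the rebuild are passed in by the caller.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

/** Serializes rebuilds so only one caller reconstructs the listing (hypothetical sketch). */
class GuardedListing {
  private final ReentrantLock lock = new ReentrantLock();
  private int rebuilds = 0;

  /** Runs rebuild only if stale reports true; the check and the rebuild happen under the lock. */
  void refreshIfStale(BooleanSupplier stale, Runnable rebuild) {
    lock.lock();
    try {
      if (stale.getAsBoolean()) {
        rebuild.run();
        rebuilds++;
      }
      // Otherwise the existing listing is still valid and is reused as-is.
    } finally {
      lock.unlock();
    }
  }

  /** Number of rebuilds performed so far. */
  int rebuildCount() {
    lock.lock();
    try {
      return rebuilds;
    } finally {
      lock.unlock();
    }
  }
}
```

Checking staleness inside the lock means that callers who were waiting while another request rebuilt the listing see the fresh state and skip the rebuild.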

1. Use finite-size TreeSet to track latest profile names
2. Guava Cache for faster profile access (since a rendered profile is deserialized twice)
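
To illustrate why caching helps when a rendered profile is deserialized twice, here is a small memoizing cache built on the JDK's `ConcurrentHashMap`. It is a stand-in for the Guava Cache that the PR uses, not Guava itself, and unlike a size-limited Guava cache it is unbounded. The class `ProfileCache` and the loader function are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Memoizes deserialized profiles by name (JDK stand-in for a Guava cache; unbounded). */
class ProfileCache<V> {
  private final ConcurrentHashMap<String, V> cache = new ConcurrentHashMap<>();
  private final Function<String, V> loader; // e.g. reads and deserializes a profile file
  private final AtomicInteger loads = new AtomicInteger();

  ProfileCache(Function<String, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value, invoking the loader only on the first request for a name. */
  V get(String profileName) {
    return cache.computeIfAbsent(profileName, name -> {
      loads.incrementAndGet();
      return loader.apply(name);
    });
  }

  /** Number of times the loader actually ran. */
  int loadCount() {
    return loads.get();
  }
}
```

With this in place, the second request for the same profile during rendering is served from memory instead of triggering another deserialization.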
@kkhatua
Contributor Author

kkhatua commented Feb 27, 2019

image
