Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Admin Page is Really Slow #380

Open
Soben713 opened this issue Oct 28, 2016 · 14 comments
Open

Admin Page is Really Slow #380

Soben713 opened this issue Oct 28, 2016 · 14 comments

Comments

@Soben713
Copy link
Collaborator

I created this issue to address the performance problem in the admin page. As we all know, the admin page takes a while to load. And even after it is loaded, it is very slow due to the massive data that needs to be loaded and rendered in the page. This needs to be fixed.

Related issue: #343

@Soben713
Copy link
Collaborator Author

Soben713 commented Oct 28, 2016

One of the main causes of this problem is the huge DC map which includes all the labels. There seems to be a way to make the map a little bit faster by using canvas instead of SVG layers. It seems to be in the leaflet's documentation too (note that the example in the documentation is exactly the problem we have — "many thousands of circle markers on the map").

image

I checked this solution and it makes it significantly faster (but it is not perfect yet) on my localhost. However, the image quality is reduced a little bit (my guess is that it only happens in high-density displays though). Here is a screenshot for comparison:

L_PREFER_CANVAS = true --> lower quality, higher performance:
image

L_PREFER_CANVAS = false --> higher quality, lower performance:
image

@jonfroehlich
Copy link
Member

Interesting. I'd like to assign this to @tongning along with #343 but we should discuss this. I'll assign it tentatively to @tongning for now. I think focusing on #343 first should be a priority.

@tongning
Copy link
Collaborator

tongning commented Oct 30, 2016

Two of the tables (labels and users) are quite huge as well. On my computer removing them reduced the page load time to 3-5 seconds from 10 seconds (as measured by Page load time chrome extension). Moving these tables to different pages should help significantly.

@jonfroehlich
Copy link
Member

@tongning, yep, that's all part of redesigning the admin page with a menu/tab interface: #343. Thanks for investigating this. We can use a similar approach to ensure that our pages have fast load times.

@manaswisaha
Copy link
Member

Adding Anthony's recent investigations (in #588):

Even with the smaller test dataset, the server takes around 4-5 seconds to respond with the admin page after the initial request, according to the waterfall in Chrome dev tools. This may potentially be related to the OutOfMemory issues at #508.

I removed various elements of the admin page one at a time, to see which parts are contributing most to the page generation time. Documenting the results below for reference, time in seconds.

Original page:
(5.37, 4.45, 4.05, 4.55, 4.21)

Without the Recent Labels table:
(3.90, 3.61, 3.61, 3.76, 3.49)

Without the Activities table:
(4.19, 3.78, 3.57, 3.79, 3.66)

Without the Registered Users table:
(2.29, 2.28, 2.20, 2.13, 2.15)

The registered users table seems to be contributing a lot, especially with only ~150 users in the test dataset. Improving the table pagination so that the server doesn't have to return all of the data at once might help with this.

@misaugstad
Copy link
Member

The registered users table seems to be contributing a lot, especially with only ~150 users in the test dataset. Improving the table pagination so that the server doesn't have to return all of the data at once might help with this.

Yeah well over 50% off the time of loading the initial admin page is from the user table. I think this is because we are querying for the list of users, then for each user we are running 5 separate queries to get metadata about them. I think if we did this in just a single query (or even in 5 queries) instead of having a number of queries equal to 5 times the number of users, this would be a lot faster. I'm talking about bringing the admin page loading time from 2 minutes to 30 seconds or so, so it still isn't going to be fast at all 😁

@jonfroehlich
Copy link
Member

Reducing the number of queries is important... another strategy is pre-computing data and offloading to another table. Is that easy to do?

@misaugstad
Copy link
Member

It is definitely more complicated than just optimizing the queries themselves, especially if we want to be able to see the most up-to-date information on the admin page (which we often do), b/c that would require real-time updates to those tables.

I think that pre-computed tables are always an option, but are closer to a last resort (especially if it is just for the admin page). We probably all agree that the way to do this is to iteratively look for what part of the page is taking the longest to load and fixing that. So if this part still takes longer to load than anything else, then we should look into pre-computing here.

@misaugstad
Copy link
Member

misaugstad commented Oct 5, 2018

Okay so I found a few cases where adding an index to a table improves performance for certain queries. It isn't a silver bullet, but it can definitely help in some cases. The best part is that it really doesn't have a downside for us. The typical downside is that it makes insertion and deletion queries slower, but that isn't a problem for us at all right now.

It is also nice because adding indices doesn't require adding new code. We can just log into the server and make those updates to the database.

@misaugstad
Copy link
Member

Furthermore, I think that we should make these updates directly to the database ourselves instead of adding them to a Play evolutions file in the code because some of the indices we want to add will take hours to create, and that's not something I think we should have running automatically whenever someone wants to test out old versions of the code in the local dev environment.

@misaugstad
Copy link
Member

And the most time-consuming query we have is the one that computes the amount of time audited by each user. I think that we should pre-compute these values and put them in a table, updating nightly. This query isn't being used on the main admin page, it is only run when you click on the "analytics" tab, but it takes something like 5 minutes to run.

@manaswisaha
Copy link
Member

manaswisaha commented Oct 5, 2018 via email

@misaugstad
Copy link
Member

However, they can be a part of the initial evolutions script when we set up
a new database and the application e.g. for a new city.

Agreed, I'm not exactly sure what the process is for creating a new database yet, but I think the fresh database we make should be set up correctly like this :)

@misaugstad misaugstad assigned misaugstad and unassigned misaugstad and sbower213 Jan 24, 2019
@misaugstad misaugstad removed their assignment Mar 1, 2019
@misaugstad
Copy link
Member

When we get back to improving the load times of the Admin page (pretty important right now, since we can't even see the Admin page in Seattle #3544), it should be noted that the tabbed interface did a lot to help here, and I think that the low hanging fruit has been picked already. I did some profiling on my local environment: The page takes about 50-55 seconds to load, but no single query took more than 2 seconds total, even if it was used multiple times. The remaining methods of speeding up the initial page load that remain:

  1. There are a lot of places where we are running essentially the same query multiple times for different situations. Take the countValidationsByResult() functions. We are running a query to count the number of "Agree" votes, running another to count the number of "Disagree", and a third to count the number of "Unsure". It's even worse, because for each of those we are running a query for each label type as well, for a total of 27 separate queries that could really be run as one query that gives a count by validation result and label type, and then we can fill in the whole table using that output. The reason that we did it this way is that it's much easier to reason about it in the template scala (.scala.html) files; it's much easier to read/understand the HTML this way.

  2. Moving queries out of the template scala (.scala.html) and into new API endpoints that we then call from JS after the initial page load. When we run request this data in the template scala, we are telling the application to run the computation and put the result directly into the HTML before we send along the page's HTML. This can take over a minute, resulting in a proxy error! The most important thing is that the page loads at all. So even if the total load time doesn't decrease, getting the page to load and then sending further requests to the server for more data that needs to be computed would be worthwhile.

    Furthermore, when we upgrade to a newer version of the Play framework (Update versions of languages/libraries/frameworks #1258), we'll need to remove the directly embedded function calls from our template scala anyway! So there's really no reason not to do this. Plus it works hand in hand with point 1: Rather than sending 27 GET requests for data on validation counts, it makes a lot more sense to just send one request and then put data in the correct places using JS.

    When we do this, it would be helpful to add some sort of loading icon in places where we are waiting on data from the server, and then replace those with the data as we get it, or some sort of error icon if the GET request fails.

Here are a few functions I've seen that could benefit from some of this consolidation and removal from the template scala:

  1. countValidations() / countValidationsByResult(result: Int) / countValidationsByResult(result: Int, labelType: String) could be turned into a single query
  2. countValidationUsersContributed / countValidationResearchersContributed / countAllValidationUsersContributed
  3. countLabels() / countLabels(labelType: String)

The list goes on!

When working on this, you may want to try to prioritize which functions you do this to based off of the total time spent across all versions of that function. I just did a quick test of this for the countValidations functions by creating a variable var totalTime: Long = 0 in LabelValidationTable.scala. Then I turned countValidations(labelType: String) from this:

def countValidations(labelType: String): Int = db.withSession { implicit session =>
  val typeID = LabelTypeTable.labelTypeToId(labelType)

  validationLabels.innerJoin(labelsWithoutDeleted).on(_.labelId === _.labelId)
    .filter(_._2.labelTypeId === typeID)
    .length.run
}

to this

def countValidations(labelType: String): Int = db.withSession { implicit session =>
  val startTime: Long = System.nanoTime()
  val typeID = LabelTypeTable.labelTypeToId(labelType)

  val x = validationLabels.innerJoin(labelsWithoutDeleted).on(_.labelId === _.labelId)
    .filter(_._2.labelTypeId === typeID)
    .length.run
  val endTime: Long = System.nanoTime()
  val duration: Long = endTime - startTime
  totalTime += duration
  Logger.info(s"This time: ${duration / 1e9}, total time: ${totalTime / 1e9} seconds")
  x
}

I did this for all three countValidations functions to get the total time spent on them across all of those functions when loading the admin page. It's a little bit hacky, but it was quick and easy to do to get an idea of how much total time was spent on those funcs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants