
Big files: the Web result page is huge #31

Closed · ColinMaudry opened this issue Apr 14, 2020 · 5 comments
Labels: performance, results page (relating to the results page, excluding checks and headlines)

ColinMaudry (Member) commented Apr 14, 2020

Hello!

As we are about to publish French award data, I had to validate it (4 days ago): https://standard.open-contracting.org/review/data/55390859-63fd-453a-989b-21c612d69687

If you click the link above and the validation has not expired:

  1. it will take a little while before you see anything
  2. then your browser may struggle to display the page

That's not surprising: it's trying to display 120,000+ releases.

I don't think that displaying a release table with so many rows is useful, especially when it costs so much on both the client and the server side.

Would it make sense to disable the display of the release table above a certain number of releases? (A sketch of this idea follows below.)

More generally, should the review process be optimized for big files? That could mean changes to cove-ocds, but also the release of a command-line tool that could be run locally.
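
A minimal sketch of the threshold idea above, in Python. The setting name and context keys are hypothetical, not cove-ocds's actual API; it only illustrates skipping the release table once the file exceeds a configurable size:

```python
# Hypothetical sketch only: RELEASE_TABLE_MAX_ROWS and the context keys below
# are illustrative names, not part of cove-ocds.
RELEASE_TABLE_MAX_ROWS = 1000

def build_results_context(releases):
    """Build the results-page context, omitting the release table for very large files."""
    context = {"release_count": len(releases)}
    if len(releases) <= RELEASE_TABLE_MAX_ROWS:
        context["release_table"] = releases
    else:
        # Too many rows to render usefully; the template shows a notice instead.
        context["release_table"] = None
        context["release_table_skipped"] = True
    return context
```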

jpmckinney (Member) commented

Related: OpenDataServices/cove#896

robredpath (Contributor) commented

jpmckinney (Member) commented

#35 deals well with valid files. In one comment:

> This change makes a big difference for valid data, but not so much for invalid data (which has long lists of information about each error).

I don't think a super long list (e.g. 42000 entries, like when submitting one of the files mentioned in #35) is useful to any user.

I think we can have a configurable setting to limit the number of results returned. To address performance issues, we can set a high limit that is still beyond what any user will read, like 1000.

If we want a smaller number like 100, we'll want to randomize the results returned, so that we're not simply reporting, say, the first 100 errors, all caused by old data and none caused by newer data (publishers who are only improving new data are likely to ignore results that seem to pertain only to old data).
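
A minimal sketch of the limit-plus-randomization idea, assuming errors arrive as a flat Python list; the setting name and helper function are hypothetical, not part of cove-ocds:

```python
import random

# Hypothetical sketch: cap the number of reported errors and sample them uniformly
# at random, so the report isn't dominated by the first N (typically oldest) releases.
MAX_REPORTED_ERRORS = 1000  # illustrative setting name, not part of cove-ocds

def sample_errors(errors, limit=MAX_REPORTED_ERRORS, seed=0):
    """Return at most `limit` errors, drawn uniformly at random."""
    if len(errors) <= limit:
        return errors
    rng = random.Random(seed)  # fixed seed keeps the sample stable across re-runs
    return rng.sample(errors, limit)
```

Using a fixed seed keeps the reported subset stable if the same file is re-submitted; dropping the seed would surface a different subset on each run.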

Bjwebb (Contributor) commented Aug 20, 2020

jpmckinney added the performance and results page labels on Sep 2, 2020
jpmckinney (Member) commented

A narrower follow-up issue is #59.
