bug: PHP lex_stats call shouldn't return entire entries list #1775

Closed
rmunn opened this issue Sep 19, 2023 · 3 comments · Fixed by #1779

Comments

@rmunn
Collaborator

rmunn commented Sep 19, 2023

See #1773 (comment) for details. On a project with more than 30,000 entries, lex_stats is trying to return all the entries and running out of memory. It should only be returning an entry count, plus a count of how many entries have pictures and how many have audio, both of which would be easy to get with a simple Mongo query.
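
A minimal sketch of the count-only approach, written in Python for brevity rather than in the project's PHP. It assumes `entries` is a pymongo `Collection` (or anything exposing a compatible `count_documents` method) and that pictures live in a `pictures` array on each sense; the function name, the result keys, and that schema are illustrative assumptions, not the project's actual code. The audio count is left out because it does not reduce to a single filter like this.

```python
def lex_stats(entries):
    """Return only counts, never the entry documents themselves.

    `entries` is assumed to behave like a pymongo Collection.
    """
    # Assumed schema: an entry has a `senses` array, and a sense may
    # carry a `pictures` array. Match entries where at least one sense
    # has a non-empty `pictures` array.
    with_pictures = {
        "senses": {"$elemMatch": {"pictures": {"$exists": True, "$ne": []}}}
    }
    return {
        "num_entries": entries.count_documents({}),
        "num_entries_with_pictures": entries.count_documents(with_pictures),
    }
```

Because the server does the counting, the response size stays constant no matter how many entries the project holds.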

@rmunn
Collaborator Author

rmunn commented Sep 21, 2023

It's rather hard to construct a Mongo query for "entries with audio", because:

  1. We record audio writing systems with -audio at the end of the tag
  2. Our Mongo structure uses writing systems as keys
  3. You can't match a regex against a key in Mongo's query syntax, only against values

To construct a Mongo query for audio, we'll need to first build a list of audio writing systems from the project config, then construct a query that checks the entry, its senses, and their examples for those writing systems (which could, in theory, appear in any field of the entry, sense, or example). That's not a simple Mongo query to construct.
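
To give a sense of what the two steps described above involve, here is a rough sketch in Python. The field paths (`lexeme`, `senses.definition`, `senses.examples.sentence`) and the sample tag are assumptions made for illustration; a real version would have to enumerate every multi-text field in the project config. Since a regex can't be applied to keys, the sketch emits one `$exists` clause per field path and audio tag.

```python
def audio_tags(writing_system_tags):
    """Pick out the writing-system tags that end in -audio."""
    return [tag for tag in writing_system_tags if tag.endswith("-audio")]


def build_audio_query(tags, field_paths):
    """Build a filter matching entries that have any audio tag as a key.

    Returns None when there is nothing to check, because Mongo rejects
    an empty $or array. Note that $exists only proves the key is
    present, not that an audio file is actually recorded under it.
    """
    clauses = [
        {f"{path}.{tag}": {"$exists": True}}
        for path in field_paths
        for tag in tags
    ]
    return {"$or": clauses} if clauses else None


# Hypothetical project data, for illustration only.
tags = audio_tags(["en", "th", "th-Zxxx-x-audio"])
paths = ["lexeme", "senses.definition", "senses.examples.sentence"]
query = build_audio_query(tags, paths)
```

The number of clauses is the number of field paths multiplied by the number of audio tags, which is why the query grows quickly once custom fields are involved.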

@rmunn
Collaborator Author

rmunn commented Sep 21, 2023

It is actually surprisingly difficult to construct a Mongo query for "entries containing audio", and I'm becoming persuaded that it's not worth the effort. I'd rather remove the "entries containing audio" part from the dashboard and replace it with something else, such as "entries with comments" or "entries with TODO status in a comment". The programming effort that the audio stat requires seems to outweigh its value to users.

@megahirt
Collaborator

You're right - I remember this coming up when Billy was working on developing the dashboard. I think the proper way to implement this is through an "audio count" or "hasAudio" property on the entry for the purpose of querying, maintained as a separate property from the writing system. Of course, this would also require changes to the existing API and a data migration.
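
As a rough illustration of the "hasAudio" idea, the flag could be computed whenever an entry is saved, so that the dashboard count becomes a plain filter on `{"hasAudio": True}`. This is a Python sketch under assumptions: the helper names are hypothetical, and it assumes each multi-text field maps a writing-system tag to either a string or a dict with a `value` key.

```python
def _filled(field):
    """True when a writing-system slot actually holds something."""
    if isinstance(field, dict):
        # Assumed shape: {"value": "recording.wav"}
        return bool(field.get("value"))
    return bool(field)


def has_audio(node, tags):
    """Walk an entry (nested dicts and lists) looking for a filled audio tag."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in tags and _filled(value):
                return True
            if has_audio(value, tags):
                return True
        return False
    if isinstance(node, list):
        return any(has_audio(item, tags) for item in node)
    return False


def stamp_has_audio(entry, tags):
    """Set the queryable flag; intended to run just before the entry is saved."""
    entry["hasAudio"] = has_audio(entry, tags)
    return entry
```

The data migration mentioned above would amount to running something like `stamp_has_audio` once over every existing entry.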

The cheap way forward, which avoids changes and fixes problems with projects that encounter the 500 error, is to just remove the "entries with audio" stat as you suggested.
