Skip to content

Conversation

@ChrisChV
Copy link
Contributor

@ChrisChV ChrisChV commented May 9, 2024

Description

  • Adds time elapsed to import log.
  • Fix issues with memory and time when importing a large taxonomy.

Supporting information

It has been tested with taxonomies that take too long or could not finish due to a memory backup. These are their average times after fixing the issue:

  • WGU Instructional Design: K-12 Collection: 5.95 seconds.
  • FlatTaxonomy: 65.55 seconds.
  • Lightcast Open Skills Taxonomy: 76.5 seconds.

Testing instructions

from openedx_tagging.core.tagging.models.import_export import TagImportTask
task = list(TagImportTask.objects.all())[-1]
task.log

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label May 9, 2024
@openedx-webhooks
Copy link

openedx-webhooks commented May 9, 2024

Thanks for the pull request, @ChrisChV! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@ChrisChV ChrisChV marked this pull request as draft May 10, 2024 16:47
@ChrisChV ChrisChV changed the title fix: Issues when importing a large taxonomy [FC-0049] fix: Issues when importing a large taxonomy May 10, 2024
ChrisChV added 4 commits May 14, 2024 11:00
* Refactor execute() of CreateTag() to avoid save values on memory
* Refactor execute() of TagImportPlan() to avoid save logs in every action. This causes slow execution and memory overflow.
@ChrisChV ChrisChV force-pushed the chris/FAL-3721-issues-importing-large-taxonomy branch from 744e5d2 to afa0977 Compare May 14, 2024 16:01
@ChrisChV ChrisChV marked this pull request as ready for review May 14, 2024 17:10
Copy link
Member

@yusuf-musleh yusuf-musleh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 @ChrisChV Great work on this! Its working well, here are the timings I got when importing the different taxonomies:

  • WGU Instructional Design: K-12 Collection: 1.44 seconds.
  • FlatTaxonomy: 14.98 seconds.
  • Lightcast Open Skills Taxonomy: 18.47 seconds.

I just had a few nit questions below.

  • I tested this: I followed the testing instructions in the PR
  • I read through the code
  • I checked for accessibility issues
  • Includes documentation

Comment on lines -247 to 254
"""
parent = None
if self.tag.parent_id:
parent = self.taxonomy.tag_set.get(external_id=self.tag.parent_id)
taxonomy_tag = Tag(
Tag.objects.create(
taxonomy=self.taxonomy,
parent=parent,
parent=self.taxonomy.tag_set.get(external_id=self.tag.parent_id)
if self.tag.parent_id is not None else None,
value=self.tag.value,
external_id=self.tag.id,
)
taxonomy_tag.save()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Just curious if this had an impact on improving the performance or is it mainly making the code cleaner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It improves memory a little because it does not create an instance of Tag, but uses the create function. It was the first thing I tried, but it wasn't the root of the problem. I left it like this because it looks cleaner.

Copy link
Contributor

@bradenmacdonald bradenmacdonald May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tag.objects.create(...) still creates a Tag instance and calls .save() on it for you, and returns it. It may be garbage-collected slightly better if you don't keep the returned Tag instance here, but I wouldn't expect any major difference. I do like the create() version more though - it is cleaner.

Comment on lines -412 to +409
taxonomy_tag = self._get_tag()
taxonomy_tag.delete()
self._get_tag().delete()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: similar to the above, if this had any performance improvement or just code style changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

@bradenmacdonald bradenmacdonald merged commit d229afb into openedx:main May 17, 2024
@openedx-webhooks
Copy link

@ChrisChV 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

@bradenmacdonald bradenmacdonald deleted the chris/FAL-3721-issues-importing-large-taxonomy branch May 17, 2024 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

open-source-contribution PR author is not from Axim or 2U

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants