Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix import channel in Postgresql #12709

Merged

Conversation

jredrejo
Copy link
Member

@jredrejo jredrejo commented Oct 8, 2024

Summary

When importing channel data, psycopg2 execute_values function was used. This function code converts to bytes all the strings in order to have a better performance.
However, converting an utf-8 char to byte results in more than one byte, making some strings unfit in the maximum number of chars of a column limit.

This PR:

  • Replaces the use of execute_values by executemany
  • As sqlite does not applies the column char limit, ensures limit is applied to data before is inserted in Postgresql

Note: I had to cherry-pick the commit from #12466 to fix docs builds in GH

References

Closes: #11780

Reviewer guidance

Do tests pass?
In order to test the fix, this channel was failing and can be used in a kolibri installation using PG:
kolibri manage importchannel network --baseurl=https://studio.learningequality.org 07cd1633691b4473b6fda08caf826253

Testing checklist

  • Contributor has fully tested the PR manually
  • If there are any front-end changes, before/after screenshots are included
  • Critical user journeys are covered by Gherkin stories
  • Critical and brittle code paths are covered by unit tests

PR process

  • PR has the correct target branch and milestone
  • PR has 'needs review' or 'work-in-progress' label
  • If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
  • If this is an important user-facing change, PR or related issue has a 'changelog' label
  • If this includes an internal dependency change, a link to the diff is provided

Reviewer checklist

  • PR is fully functional
  • PR has been tested for accessibility regressions
  • External dependency files were updated if necessary (yarn and pip)
  • Documentation is updated
  • Contributor is in AUTHORS.md

@jredrejo jredrejo added the TODO: needs review Waiting for review label Oct 8, 2024
@jredrejo jredrejo added this to the Kolibri 0.17: Planned Patch 2 milestone Oct 8, 2024
@jredrejo jredrejo requested a review from rtibbles October 8, 2024 17:52
@github-actions github-actions bot added the DEV: backend Python, databases, networking, filesystem... label Oct 8, 2024
…+ add required extension sphinxcontrib.jquery
@jredrejo jredrejo force-pushed the fix_channel_import_in_pg branch from 24e4490 to f1b712a Compare October 8, 2024 18:15
Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that we are properly setting psycopg2 up with unicode handling as described here? https://www.psycopg.org/docs/usage.html#unicode-handling

This might point to a way to handle this in a way that doesn't cause a huge performance regression.

Also, I think we should add a regression test for the specific case we are fixing here - a importing a unicode string for a node title that is at the max length.

kolibri/core/content/utils/channel_import.py Outdated Show resolved Hide resolved
@jredrejo jredrejo requested a review from rtibbles October 10, 2024 16:48
@jredrejo jredrejo force-pushed the fix_channel_import_in_pg branch from dd5dd8a to 130405f Compare October 10, 2024 18:31
@jredrejo
Copy link
Member Author

Also, I think we should add a regression test for the specific case we are fixing here - a importing a unicode string for a node title that is at the max length.

Done

Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of questions to make sure the tests are doing what they ought - if the answer to my second question is yes, then we might need two test cases, one with utf-8 that is short enough but would overflow if it was converted to bytes, and another where the tag is just too long regardless.

I will manually test this to compare speed before and after.

@jredrejo jredrejo requested a review from rtibbles October 17, 2024 18:06
@rtibbles
Copy link
Member

This should be good to go, once I've done the speed test locally just to verify, will do tomorrow.

Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local testing shows no performance regressions with these changes.

@rtibbles rtibbles merged commit d085880 into learningequality:release-v0.17.x Oct 18, 2024
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DEV: backend Python, databases, networking, filesystem... TODO: needs review Waiting for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants