Fix import channel in Postgresql #12709
Conversation
…+ add required extension sphinxcontrib.jquery
I am not sure that we are properly setting up psycopg2 with unicode handling as described here: https://www.psycopg.org/docs/usage.html#unicode-handling
That might point to a way to handle this that doesn't cause a huge performance regression.
Also, I think we should add a regression test for the specific case we are fixing here: importing a unicode string for a node title that is at the maximum length.
…n a lot of data needs to be inserted
Done
Just a couple of questions to make sure the tests are doing what they ought - if the answer to my second question is yes, then we might need two test cases, one with utf-8 that is short enough but would overflow if it was converted to bytes, and another where the tag is just too long regardless.
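The two suggested test inputs could be sketched as follows. This is a hypothetical illustration: `MAX_CHARS` stands in for the real column limit and is not Kolibri's actual constant.

```python
# Hypothetical inputs for the two suggested test cases.
# MAX_CHARS is an assumed column limit, for illustration only.
MAX_CHARS = 30

# Case 1: within the character limit, but over it once encoded to UTF-8 bytes.
fits_as_chars = "\u00fc" * MAX_CHARS  # 30 characters, 60 bytes in UTF-8
assert len(fits_as_chars) == MAX_CHARS
assert len(fits_as_chars.encode("utf-8")) > MAX_CHARS

# Case 2: too long regardless of encoding.
too_long = "a" * (MAX_CHARS + 1)
assert len(too_long) > MAX_CHARS
```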
I will manually test this to compare speed before and after.
This should be good to go once I've done the speed test locally, just to verify; will do tomorrow.
Local testing shows no performance regressions with these changes.
Summary
When importing channel data, psycopg2's execute_values function was used. For better performance, execute_values converts all strings to bytes before building the query.
However, a single UTF-8 character can encode to more than one byte, so some strings that fit within a column's character limit exceeded it once converted to bytes.
This PR replaces execute_values with executemany.
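As a rough illustration of the replacement: with the DB-API's executemany, each row's parameters are handed to the driver as Python str values, keeping character-based length semantics. This sketch uses SQLite purely so it is self-contained and runnable; the PR itself targets psycopg2/PostgreSQL, whose executemany uses %s placeholders instead of ?.

```python
import sqlite3

# executemany passes each row's parameters to the driver individually,
# so Python str values are not pre-converted to one big byte string.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE node (title TEXT)")

rows = [("\u00e9" * 10,), ("ascii title",)]
cur.executemany("INSERT INTO node (title) VALUES (?)", rows)

cur.execute("SELECT title FROM node ORDER BY rowid")
titles = [t for (t,) in cur.fetchall()]
assert titles == ["\u00e9" * 10, "ascii title"]
```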
Note: I had to cherry-pick the commit from #12466 to fix docs builds in GH
References
Closes: #11780
Reviewer guidance
Do tests pass?
To test the fix: importing this channel was previously failing, and it can be used on a Kolibri installation running PostgreSQL:
kolibri manage importchannel network --baseurl=https://studio.learningequality.org 07cd1633691b4473b6fda08caf826253
Testing checklist
PR process
Reviewer checklist
(yarn and pip)