Fix import channel in Postgresql #12709

jredrejo · 2024-10-08T17:52:04Z

Summary

When importing channel data, the psycopg2 execute_values function was used. To improve performance, this function converts all strings to bytes.
However, a single UTF-8 character can encode to more than one byte, so some strings no longer fit within the column's maximum character limit.
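
To illustrate the mismatch between character count and byte count, here is a minimal sketch in plain Python. The sample title and the 10-character limit are made up for the example and are not taken from Kolibri's schema.

```python
# Hypothetical column limit, chosen only for illustration.
MAX_CHARS = 10

# "matemática": 10 characters, one of them (U+00E1) non-ASCII.
title = "matem\u00e1tica"

char_count = len(title)                  # counts Unicode code points
byte_count = len(title.encode("utf-8"))  # counts UTF-8 encoded bytes

# The same value passes a character-based check but fails a byte-based one.
fits_as_text = char_count <= MAX_CHARS
fits_as_bytes = byte_count <= MAX_CHARS

print(char_count, byte_count, fits_as_text, fits_as_bytes)
```

A value that sits exactly at the character limit can therefore be rejected if its length is measured after encoding.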

This PR:

Replaces the use of execute_values with executemany
As SQLite does not enforce the column character limit, ensures the limit is applied to the data before it is inserted into PostgreSQL
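
A minimal sketch of the second point, using only the standard library. It shows SQLite storing a value longer than the declared VARCHAR length without complaint, and one way rows could be clipped to the column limit before being sent to PostgreSQL. The table, the `MAX_LENGTHS` mapping, and `truncate_row` are hypothetical names for illustration, not the code in channel_import.py.

```python
import sqlite3

# Hypothetical per-column character limits (not Kolibri's real schema).
MAX_LENGTHS = {"title": 5}


def truncate_row(row, max_lengths=MAX_LENGTHS):
    """Return a copy of row with string values clipped to their column limit."""
    clipped = {}
    for column, value in row.items():
        limit = max_lengths.get(column)
        if isinstance(value, str) and limit is not None:
            value = value[:limit]  # slices by characters, not bytes
        clipped[column] = value
    return clipped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (title VARCHAR(5))")
# SQLite accepts this even though the value exceeds VARCHAR(5).
conn.execute("INSERT INTO node (title) VALUES (?)", ("toolongtitle",))
stored = conn.execute("SELECT title FROM node").fetchone()[0]
conn.close()

# Clip the value read from SQLite before handing it to PostgreSQL.
row_for_postgres = truncate_row({"title": stored})
```

Rows clipped this way could then be passed to a DB-API call such as `cursor.executemany(sql, rows)`.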

Note: I had to cherry-pick the commit from #12466 to fix docs builds in GH

References

Closes: #11780

Reviewer guidance

Do tests pass?
To test the fix, import the following channel, which was previously failing, in a Kolibri installation that uses PostgreSQL:
kolibri manage importchannel network --baseurl=https://studio.learningequality.org 07cd1633691b4473b6fda08caf826253

Testing checklist

Contributor has fully tested the PR manually
If there are any front-end changes, before/after screenshots are included
Critical user journeys are covered by Gherkin stories
Critical and brittle code paths are covered by unit tests

PR process

PR has the correct target branch and milestone
PR has 'needs review' or 'work-in-progress' label
If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
If this is an important user-facing change, PR or related issue has a 'changelog' label
If this includes an internal dependency change, a link to the diff is provided

Reviewer checklist

PR is fully functional
PR has been tested for accessibility regressions
External dependency files were updated if necessary (yarn and pip)
Documentation is updated
Contributor is in AUTHORS.md

github-actions · 2024-10-08T18:23:46Z

Build Artifacts

Asset type	Download link
PEX file	kolibri-0.17.3a0.dev0_git.12.g1aadc9b9.pex
Windows Installer (EXE)	kolibri-0.17.3a0.dev0+git.12.g1aadc9b9-windows-setup-unsigned.exe
Debian Package	kolibri_0.17.3a0.dev0+git.12.g1aadc9b9-0ubuntu1_all.deb
Mac Installer (DMG)	kolibri-0.17.3a0.dev0+git.12.g1aadc9b9.dmg
Android Package (APK)	kolibri-0.17.3a0.dev0+git.12.g1aadc9b9-0.1.4-debug.apk
TAR file	kolibri-0.17.3a0.dev0+git.12.g1aadc9b9.tar.gz
WHL file	kolibri-0.17.3a0.dev0+git.12.g1aadc9b9-py2.py3-none-any.whl

rtibbles

I am not sure that we are properly setting psycopg2 up with unicode handling as described here? https://www.psycopg.org/docs/usage.html#unicode-handling

This might point to a way to handle this in a way that doesn't cause a huge performance regression.

Also, I think we should add a regression test for the specific case we are fixing here: importing a Unicode string for a node title that is at the max length.

kolibri/core/content/utils/channel_import.py

jredrejo · 2024-10-10T18:32:04Z

Also, I think we should add a regression test for the specific case we are fixing here: importing a Unicode string for a node title that is at the max length.

Done

rtibbles

Just a couple of questions to make sure the tests are doing what they ought - if the answer to my second question is yes, then we might need two test cases, one with utf-8 that is short enough but would overflow if it was converted to bytes, and another where the tag is just too long regardless.
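
The two cases described here could look roughly like the following sketch. `MAX_LENGTH`, `clip`, and the sample strings are assumptions for illustration; the real tests use Kolibri's own fixtures and import code.

```python
MAX_LENGTH = 30  # hypothetical column limit, in characters


def clip(value, max_length=MAX_LENGTH):
    """Clip a string to max_length characters (not bytes)."""
    return value[:max_length]


# Case 1: exactly at the character limit, but longer once UTF-8 encoded.
fits_in_chars = "\u00f1" * MAX_LENGTH
# Case 2: too long regardless of how its length is measured.
too_long = "a" * (MAX_LENGTH + 10)

case1 = clip(fits_in_chars)  # expected to survive unchanged
case2 = clip(too_long)       # expected to be shortened to MAX_LENGTH
```

The first case guards against the byte-conversion regression, while the second checks that the truncation itself is applied.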

I will manually test this to compare speed before and after.

kolibri/core/content/fixtures/longdescriptions_content_data.json

kolibri/core/content/utils/channel_import.py

rtibbles · 2024-10-18T01:25:39Z

This should be good to go once I've run the speed test locally to verify; I will do that tomorrow.

rtibbles

Local testing shows no performance regressions with these changes.

Do not use execute_values to avoid byte conversion

f16551c

jredrejo added the TODO: needs review Waiting for review label Oct 8, 2024

jredrejo added this to the Kolibri 0.17: Planned Patch 2 milestone Oct 8, 2024

jredrejo requested a review from rtibbles October 8, 2024 17:52

github-actions bot added the DEV: backend Python, databases, networking, filesystem... label Oct 8, 2024

Adds loose pinning of dev docs requirements to ensure correct builds + add required extension sphinxcontrib.jquery

f1b712a

jredrejo force-pushed the fix_channel_import_in_pg branch from 24e4490 to f1b712a Compare October 8, 2024 18:15

rtibbles requested changes Oct 9, 2024

kolibri/core/content/utils/channel_import.py (outdated)

executemany produces one insert sql per row what can be very slow when a lot of data needs to be inserted

4e8c330

jredrejo requested a review from rtibbles October 10, 2024 16:48

Added a postgresql test to check long names are shortened to its max length

130405f

jredrejo force-pushed the fix_channel_import_in_pg branch from dd5dd8a to 130405f Compare October 10, 2024 18:31

rtibbles reviewed Oct 11, 2024

kolibri/core/content/fixtures/longdescriptions_content_data.json

kolibri/core/content/utils/channel_import.py

Add test to check byte decoding

34a910b

jredrejo requested a review from rtibbles October 17, 2024 18:06

rtibbles approved these changes Oct 18, 2024

rtibbles merged commit d085880 into learningequality:release-v0.17.x Oct 18, 2024
34 checks passed

rtibbles mentioned this pull request Oct 18, 2024

importchannel command fails in postgresql when channels had strings in non utf-8 format #11780

Closed
