Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User moderation: Delete a user's records when blocking them #1450

Conversation

max-moser
Copy link
Contributor

@max-moser max-moser commented Sep 7, 2023

Fixes inveniosoftware/invenio-app-rdm#2379

Screenshot of a blocked user's deleted record:
image

@max-moser max-moser force-pushed the mm/delete-records-when-blocking-users branch 3 times, most recently from e8a6ff0 to 95f9254 Compare September 7, 2023 13:06
}

# get all the parent records owned by the blocked user
parent_recs = [
Copy link
Contributor

@kpsherva kpsherva Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these can be potentially costly queries to run, is this action done in celery task ? ping @slint @zzacharo @alejandromumo - not sure if that was discussed during design phase

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in the user's service, we kick off a celery task and execute all the actions inside it.

We can leave a comment here (e.g. this is meant to be used ...).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the docstring to reflect this!


# soft-delete all the published records of that user
for rec in recs:
if not rec.deletion_status.is_deleted:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the docstring

"The record {record_id} in not included in the community {community_id}.".format(
record_id=self.record_id, community_id=self.community_id
)
"The record %(rec_id)s in not included in the community %(com_id)s.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @zzacharo @ntarocco @slint just flagging this to have your opinion

I find % operator for string formatting very unreadable and less flexible than .format, as far as I know there is negligible to no difference in performance during the evaluation

Copy link
Contributor Author

@max-moser max-moser Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i completely agree, the % format is not as nice as f-strings by any means!
however, from my understanding, they're required by lazy_gettext()...
if that's not the case, i'm happy to use the the f-string format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't lazy_gettext ok also with .format( ?
I remember f-strings not working with lazy_gettext but .format should be fine (to be verified - don't trust my memory!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've dug a bit, and these are my findings so far:
lazy_gettext(*args, **kwargs) creates a LazyString(gettext, *args, **kwargs).
the gettext in question (the first argument to the LazyString) calls get_domain().gettext(*args, **kwargs).
in my local invenio shell, this get_domain() gives me a flask_babel.Domain object, whose gettext(self, string, **variables) will call the % formatting.
the self.get_translations().ugettext(string) will point to a gettext.GNUTranslations(...).gettext(...) method, but from a quick glance this only seems to concern itself with loading the translations files and loading the translation strings, without variable replacement.

this leads me to the conclusion that unless flask_babel.get_domain() leads us down a different path in a live application (which is of course possible, i didn't check that so far), the f-string syntax won't work.

i think there's two possibilities regarding .format(...):

  1. we're calling .format() inside lazy_gettext (i.e. lazy_gettext(some_string.format(**vars)))
  2. we're calling it outside (i.e. lazy_gettext(some_string).format(**vars))

with my above observations, i think that case (1) will just replace the (f-string) placeholders in the some_string text before lazy_gettext even has a chance to fetch the translation string, causing the translation to fail silently (AFAIK, gettext() just returns the string as-is if it can't find a matching translation).
in this case, the affected translations are simply broken.

in case (2), lazy_gettext will translate the string as expected (as long as there are matching translations), but the call to .format(...) will immediately evaluate the LazyString, negating its purpose.
here, we could just use gettext() instead (which may not always be what we want, i assume).


this being said, all of the above is based on a mixture between static analysis and dynamic analysis in an invenio shell context, which may deviate from contexts given in a running application.
i'll try to check out the behavior of a running application regarding both formatting syntaxes on monday.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above is a correct analysis.

Correct usage:

lazy_gettext("String %(var)s", var="a var")

Incorrect usage:

lazy_gettext("String {var}".format(var="test")) # will try to translate "String "test" but only "String {var}" was extracted.
lazy_gettext("String {var}").format(var="test") # reduces to gettext and translates immediately.

Technically, gettext("String {var}").format(var="test") will yield a correct result, but you rarely know when you are working with lazy_gettext vs gettext, hence for consistency I would never use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lnielsen is not only the fastest merger in the west, but also beat me to this 😄


I just verified the results in a local instance, having the following code:
invenio_rdm_records/requests/user_moderation/actions.py

    with force_locale("de_DE"):
        print(get_locale(), _("User %(user_id)s was blocked", user_id=user_id))

invenio_rdm_records/translations/de/LC_MESSAGES/messages.po (same for de_DE instead of de, and similar (with an empty msgstr in invenio_rdm_records/translations/messages.pot)

#: invenio_rdm_records/requests/user_moderation/actions.py:51
#, python-format
msgid "User %(user_id)s was blocked"
msgstr "Benutzer %(user_id)s wurde blockiert"

invenio.cfg

I18N_LANGUAGES = [
    ('de', _('German')),
    ('de_DE', _('German')),
]

This will print the expected values:

[2023-09-11 13:58:01,421: WARNING/ForkPoolWorker-17] de_DE
[2023-09-11 13:58:01,421: WARNING/ForkPoolWorker-17]
[2023-09-11 13:58:01,432: WARNING/ForkPoolWorker-17] Benutzer 5 wurde blockiert

If I simply replace the %(user_id)s with {user_id} (f-string / format syntax) like so (in the code as well as the translation files):

    with force_locale("de_DE"):
        print(get_locale(), _("User {user_id} was blocked", user_id=user_id))

Then I get the following logs:

[2023-09-11 14:01:09,448: WARNING/ForkPoolWorker-17] de_DE
[2023-09-11 14:01:09,449: WARNING/ForkPoolWorker-17]
[2023-09-11 14:01:09,460: WARNING/ForkPoolWorker-17] Benutzer {user_id} wurde blockiert

Which is also expected, since Flask-Babel uses % formatting and the syntax is different for that.
The text is translated correctly, but the user_id=user_id value is simply discarded.


Calling the .format() "inside" the translation:

    with force_locale("de_DE"):
        print(get_locale(), _("User {user_id} was blocked".format(user_id=user_id)))

Will break the translation:

[2023-09-11 14:02:39,452: WARNING/ForkPoolWorker-17] de_DE
[2023-09-11 14:02:39,452: WARNING/ForkPoolWorker-17]
[2023-09-11 14:02:39,464: WARNING/ForkPoolWorker-17] User 5 was blocked

Lastly, calling it "outside" will work like gettext, but waste a few CPU cycles (due to the construction and evaluation of the LazyString):

    with force_locale("de_DE"):
        print(get_locale(), _("User {user_id} was blocked").format(user_id=user_id))
[2023-09-11 14:06:14,551: WARNING/ForkPoolWorker-17] de_DE
[2023-09-11 14:06:14,551: WARNING/ForkPoolWorker-17]
[2023-09-11 14:06:14,551: WARNING/ForkPoolWorker-17] Benutzer 5 wurde blockiert

Copy link
Contributor Author

@max-moser max-moser Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to @rekt-hard who pointed out to me that i need to compile the translations via setup.py compile_catalog in invenio-rdm-records.
This step is also missing from the documentation.

@@ -188,6 +187,9 @@ def __init__(self, request_id):
def description(self):
"""Exception description."""
if self.request_id:
return _(f"Identical access requests already exist: {self.request_id}")
return _(
"Identical access requests already exist: %(request_id)s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if by any change request id is of numeric type, the formatting will raise an exception

Copy link
Contributor Author

@max-moser max-moser Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should just be cast into a string:

In [4]: "that's %(id)s example" % {"id": 1}
Out[4]: "that's 1 example"

Copy link
Contributor

@kpsherva kpsherva Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't it need to be %(id)d instead of %(id)s for the number to be cast to string ? I haven't used this notation in a long time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no – %(id)d will fail if given anything other than a number (decimal numbers will be truncated), while %(id)s will accept anything that can be converted to a string (str(value), value.__str__()).

In [47]: "%(x)s" % {"x": 1}
Out[47]: '1'

In [48]: "%(x)s" % {"x": "1"}
Out[48]: '1'

In [49]: "%(x)s" % {"x": "a"}
Out[49]: 'a'

In [50]: "%(x)d" % {"x": 1}
Out[50]: '1'

In [51]: "%(x)d" % {"x": "1"}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[51], line 1
----> 1 "%(x)d" % {"x": "1"}

TypeError: %d format: a number is required, not str

In [52]: "%(x)d" % {"x": "a"}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[52], line 1
----> 1 "%(x)d" % {"x": "a"}

TypeError: %d format: a number is required, not str

In [53]:

i guess the benefit of %d over %s is that you can control stuff like their sign relatively easily: https://pyformat.info/#number_sign

@max-moser max-moser force-pushed the mm/delete-records-when-blocking-users branch from 95f9254 to f83d05b Compare September 11, 2023 10:03
@max-moser max-moser force-pushed the mm/delete-records-when-blocking-users branch 2 times, most recently from 3882f7b to 240e084 Compare September 11, 2023 13:32
@max-moser max-moser mentioned this pull request Sep 11, 2023
@zzacharo zzacharo merged commit 69eca6c into inveniosoftware:master Sep 11, 2023
@max-moser max-moser deleted the mm/delete-records-when-blocking-users branch September 11, 2023 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

user moderation: connect record deletion
5 participants