-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User moderation: Delete a user's records when blocking them #1450
User moderation: Delete a user's records when blocking them #1450
Conversation
e8a6ff0
to
95f9254
Compare
} | ||
|
||
# get all the parent records owned by the blocked user | ||
parent_recs = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can be potentially costly queries to run, is this action done in celery task ? ping @slint @zzacharo @alejandromumo - not sure if that was discussed during design phase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in the user's service, we kick off a celery task and execute all the actions inside it.
We can leave a comment here (e.g. this is meant to be used ...
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the docstring to reflect this!
|
||
# soft-delete all the published records of that user | ||
for rec in recs: | ||
if not rec.deletion_status.is_deleted: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the docstring
"The record {record_id} in not included in the community {community_id}.".format( | ||
record_id=self.record_id, community_id=self.community_id | ||
) | ||
"The record %(rec_id)s in not included in the community %(com_id)s.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i completely agree, the %
format is not as nice as f-strings by any means!
however, from my understanding, they're required by lazy_gettext()
...
if that's not the case, i'm happy to use the the f-string format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't lazy_gettext ok also with .format(
?
I remember f-strings not working with lazy_gettext but .format should be fine (to be verified - don't trust my memory!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've dug a bit, and these are my findings so far:
lazy_gettext(*args, **kwargs)
creates a LazyString(gettext, *args, **kwargs)
.
the gettext
in question (the first argument to the LazyString
) calls get_domain().gettext(*args, **kwargs)
.
in my local invenio shell
, this get_domain()
gives me a flask_babel.Domain
object, whose gettext(self, string, **variables)
will call the % formatting.
the self.get_translations().ugettext(string)
will point to a gettext.GNUTranslations(...).gettext(...)
method, but from a quick glance this only seems to concern itself with loading the translations files and loading the translation strings, without variable replacement.
this leads me to the conclusion that unless flask_babel.get_domain()
leads us down a different path in a live application (which is of course possible, i didn't check that so far), the f-string syntax won't work.
i think there's two possibilities regarding .format(...)
:
- we're calling
.format()
insidelazy_gettext
(i.e.lazy_gettext(some_string.format(**vars))
) - we're calling it outside (i.e.
lazy_gettext(some_string).format(**vars)
)
with my above observations, i think that case (1) will just replace the (f-string) placeholders in the some_string
text before lazy_gettext
even has a chance to fetch the translation string, causing the translation to fail silently (AFAIK, gettext()
just returns the string as-is if it can't find a matching translation).
in this case, the affected translations are simply broken.
in case (2), lazy_gettext
will translate the string as expected (as long as there are matching translations), but the call to .format(...)
will immediately evaluate the LazyString
, negating its purpose.
here, we could just use gettext()
instead (which may not always be what we want, i assume).
this being said, all of the above is based on a mixture between static analysis and dynamic analysis in an invenio shell
context, which may deviate from contexts given in a running application.
i'll try to check out the behavior of a running application regarding both formatting syntaxes on monday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above is a correct analysis.
Correct usage:
lazy_gettext("String %(var)s", var="a var")
Incorrect usage:
lazy_gettext("String {var}".format(var="test")) # will try to translate "String "test" but only "String {var}" was extracted.
lazy_gettext("String {var}").format(var="test") # reduces to gettext and translates immediately.
Technically, gettext("String {var}").format(var="test")
will yield a correct result, but you rarely know when you are working with lazy_gettext vs gettext, hence for consistency I would never use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, https://inveniordm.docs.cern.ch/develop/howtos/i18n/ is wrong in this case and I see we're missing a section as well https://inveniordm.docs.cern.ch/develop/howtos/i18n/#lazy-vs-non-lazy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lnielsen is not only the fastest merger in the west, but also beat me to this 😄
I just verified the results in a local instance, having the following code:
invenio_rdm_records/requests/user_moderation/actions.py
with force_locale("de_DE"):
print(get_locale(), _("User %(user_id)s was blocked", user_id=user_id))
invenio_rdm_records/translations/de/LC_MESSAGES/messages.po
(same for de_DE
instead of de
, and similar (with an empty msgstr
in invenio_rdm_records/translations/messages.pot
)
#: invenio_rdm_records/requests/user_moderation/actions.py:51
#, python-format
msgid "User %(user_id)s was blocked"
msgstr "Benutzer %(user_id)s wurde blockiert"
invenio.cfg
I18N_LANGUAGES = [
('de', _('German')),
('de_DE', _('German')),
]
This will print the expected values:
[2023-09-11 13:58:01,421: WARNING/ForkPoolWorker-17] de_DE
[2023-09-11 13:58:01,421: WARNING/ForkPoolWorker-17]
[2023-09-11 13:58:01,432: WARNING/ForkPoolWorker-17] Benutzer 5 wurde blockiert
If I simply replace the %(user_id)s
with {user_id}
(f-string / format syntax) like so (in the code as well as the translation files):
with force_locale("de_DE"):
print(get_locale(), _("User {user_id} was blocked", user_id=user_id))
Then I get the following logs:
[2023-09-11 14:01:09,448: WARNING/ForkPoolWorker-17] de_DE
[2023-09-11 14:01:09,449: WARNING/ForkPoolWorker-17]
[2023-09-11 14:01:09,460: WARNING/ForkPoolWorker-17] Benutzer {user_id} wurde blockiert
Which is also expected, since Flask-Babel
uses %
formatting and the syntax is different for that.
The text is translated correctly, but the user_id=user_id
value is simply discarded.
Calling the .format()
"inside" the translation:
with force_locale("de_DE"):
print(get_locale(), _("User {user_id} was blocked".format(user_id=user_id)))
Will break the translation:
[2023-09-11 14:02:39,452: WARNING/ForkPoolWorker-17] de_DE
[2023-09-11 14:02:39,452: WARNING/ForkPoolWorker-17]
[2023-09-11 14:02:39,464: WARNING/ForkPoolWorker-17] User 5 was blocked
Lastly, calling it "outside" will work like gettext
, but waste a few CPU cycles (due to the construction and evaluation of the LazyString
):
with force_locale("de_DE"):
print(get_locale(), _("User {user_id} was blocked").format(user_id=user_id))
[2023-09-11 14:06:14,551: WARNING/ForkPoolWorker-17] de_DE
[2023-09-11 14:06:14,551: WARNING/ForkPoolWorker-17]
[2023-09-11 14:06:14,551: WARNING/ForkPoolWorker-17] Benutzer 5 wurde blockiert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to @rekt-hard who pointed out to me that i need to compile the translations via setup.py compile_catalog
in invenio-rdm-records
.
This step is also missing from the documentation.
@@ -188,6 +187,9 @@ def __init__(self, request_id): | |||
def description(self): | |||
"""Exception description.""" | |||
if self.request_id: | |||
return _(f"Identical access requests already exist: {self.request_id}") | |||
return _( | |||
"Identical access requests already exist: %(request_id)s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if by any change request id is of numeric type, the formatting will raise an exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should just be cast into a string:
In [4]: "that's %(id)s example" % {"id": 1}
Out[4]: "that's 1 example"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't it need to be %(id)d
instead of %(id)s
for the number to be cast to string ? I haven't used this notation in a long time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no – %(id)d
will fail if given anything other than a number (decimal numbers will be truncated), while %(id)s
will accept anything that can be converted to a string (str(value)
, value.__str__()
).
In [47]: "%(x)s" % {"x": 1}
Out[47]: '1'
In [48]: "%(x)s" % {"x": "1"}
Out[48]: '1'
In [49]: "%(x)s" % {"x": "a"}
Out[49]: 'a'
In [50]: "%(x)d" % {"x": 1}
Out[50]: '1'
In [51]: "%(x)d" % {"x": "1"}
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[51], line 1
----> 1 "%(x)d" % {"x": "1"}
TypeError: %d format: a number is required, not str
In [52]: "%(x)d" % {"x": "a"}
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[52], line 1
----> 1 "%(x)d" % {"x": "a"}
TypeError: %d format: a number is required, not str
In [53]:
i guess the benefit of %d
over %s
is that you can control stuff like their sign relatively easily: https://pyformat.info/#number_sign
95f9254
to
f83d05b
Compare
3882f7b
to
240e084
Compare
Fixes inveniosoftware/invenio-app-rdm#2379
Screenshot of a blocked user's deleted record: