
Dubious ownership of annotations #193

Open
savary opened this issue Jan 14, 2025 · 4 comments

@savary

savary commented Jan 14, 2025

Our FLAT server goes down from time to time, and we are trying to establish why.
One such case happened yesterday evening.
I looked into the Docker container's logs (sudo docker logs -n 1000 flat).
An annotator was working on FLAT; she seems to have logged out and logged back in. Then this error occurred:

fatal: detected dubious ownership in repository at '/data/annotations'
To add an exception for this directory, call:
git config --global --add safe.directory /data/annotations

This is likely due to our versioning of the annotations via an external git repository.
According to the discussion here, the error does not seem to be very problematic.

However, when I checked the access permissions in our annotations directory, they did not look very consistent:

parseme@parseme:~/annotations$ cd ..
parseme@parseme:~$ cd annotations/
parseme@parseme:~/annotations$ ls -lia
total 1752
5505187 drwxrwsr-x 351 parseme parseme 20480 nov. 13 18:49 .
5505059 drwxr-x--- 9 parseme parseme 4096 août 7 11:43 ..
5637212 drwxr-xr-x 2 parseme parseme 4096 janv. 10 2024 Abbas
5636773 drwxr-xr-x 2 parseme parseme 4096 janv. 10 2024 abdelati.hawwari
5637259 drwxr-xr-x 3 parseme parseme 4096 janv. 10 2024 abigail.walsh
5637425 drwxr-xr-x 2 parseme parseme 4096 janv. 10 2024 adela.tocaru
5767548 drwxr-sr-x 2 root parseme 4096 juil. 10 2024 adelina.cerpja
5767563 drwxr-sr-x 2 root parseme 4096 juil. 10 2024 adina.duca
5505488 drwxr-xr-x 4 parseme parseme 4096 janv. 10 2024 agata.savary
5637082 drwxr-xr-x 16 parseme parseme 4096 oct. 8 10:26 agata.savary.annotator
5506634 drwxr-xr-x 2 parseme parseme 4096 janv. 10 2024 agata.savary.test
5767594 drwxr-sr-x 2 root parseme 4096 juil. 5 2024 agata.savary.unidive
5636803 drwxr-xr-x 2 parseme parseme 4096 janv. 10 2024 aggelfoto123
5767576 drwxr-sr-x 2 root parseme 4096 juil. 17 10:38 agute.klints
5636807 drwxr-xr-x 2 parseme parseme 4096 janv. 10 2024 ainara.estarrona
5637002 drwxr-xr-x 2 parseme parseme 4096 janv. 10 2024 aixiu.an.zh
...
parseme@parseme:~/annotations$ ls -l ./yalda.yarandi/final
total 22000
-rw-r--r-- 1 root root 2373753 août 12 16:01 dev.udpipe-2.10-xpos-to-deprel.folia.xml
-rw-r--r-- 1 root root 2001046 août 12 16:01 test.udpipe-2.10-xpos-to-deprel.folia.xml
-rw-r--r-- 1 root root 12274118 août 12 16:01 train.udpipe-2.10-xpos-to-deprel.folia.xml
-rw-r--r-- 1 root root 1073626 août 12 16:01 tree_bank_without_VMWE.folia.xml
-rw-r--r-- 1 root root 4792400 août 12 16:01 tree_bank_with_VMWE.folia.xml
parseme@parseme:~/annotations$ ls -l ./jaka.cibej/
total 328
-rw-r--r-- 1 parseme parseme 333884 janv. 10 2024 parseme_sl_ssj500k_13412_13511_noIDs.parsemetsv.folia.xml

Some directories are owned by root, others by parseme. The root-owned ones have the permissions drwxr-sr-x (the s marks the setgid bit), the parseme-owned ones drwxr-xr-x.
Similarly, some files are owned by root, others by parseme.
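To get an overview of how mixed the ownership is without scrolling through `ls -l`, a small helper like the one below could count entries per type and owner:group pair. This is a minimal sketch assuming GNU `find` (for `-printf`); the function name `ownership_summary` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: count entries under a directory by type and owner:group.
# Requires GNU find for -printf (%y = type: d/f/..., %u = user, %g = group).
ownership_summary() {
    find "$1" -printf '%y %u:%g\n' | sort | uniq -c | sort -rn
}

# Example (on the host): ownership_summary /home/parseme/annotations
```

Each output line is a count followed by a type letter and an owner:group pair, so a directory tree with consistent ownership should show only one pair per type.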

Given that users are always added via the Django interface, what causes the difference? And what are the correct owners and permissions?
Could this be a reason for the instability of the server?

@proycon
Owner

proycon commented Jan 15, 2025

Such inconsistent ownership occurs when the git repository is shared amongst
multiple (unix) users (in this case root and parseme). Note that this does not
refer to FLAT users but unix users.

Inside the FLAT container foliadocserve runs under UID/GID 100. How this maps
on the actual host depends a bit on your configuration. You can check this on the host
with ps aux | grep foliadocserve.
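As a complement to `ps aux | grep foliadocserve`, the numeric IDs can also be read straight from `/proc`. The sketch below is Linux-only and hypothetical (the name `proc_ids` is made up); it prints the PID and the real UID/GID of every process whose command line contains a given string, as seen from wherever it is run (the host, if run on the host).

```sh
#!/bin/sh
# Hypothetical helper (Linux only): print "PID UID GID" for each process whose
# command line contains PATTERN. Reads /proc instead of parsing `ps aux`.
proc_ids() {
    pattern="$1"
    for dir in /proc/[0-9]*; do
        pid="${dir#/proc/}"
        # /proc/PID/cmdline is NUL-separated; the process may exit meanwhile.
        cmd="$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline")" || continue
        case "$cmd" in
            *"$pattern"*) ;;
            *) continue ;;
        esac
        # The first number after "Uid:"/"Gid:" is the real UID/GID.
        uid="$(awk '/^Uid:/ { print $2 }' "$dir/status" 2>/dev/null)"
        gid="$(awk '/^Gid:/ { print $2 }' "$dir/status" 2>/dev/null)"
        if [ -n "$uid" ]; then
            echo "$pid $uid $gid"
        fi
    done
}

# Example (on the host): proc_ids foliadocserve
```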

Ownership can get messed up if the FLAT container wasn't always started
consistently as the same user, or if the git repository at the document root
is pulled by some external process (rather than by foliadocserve) that runs
under another user. If you do a git pull, take care to do it as the same user
foliadocserve runs under.
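One way to enforce that in a script is a guard that refuses to run git unless the current user owns the repository directory. This is roughly the condition behind git's "dubious ownership" error, although git also honours `safe.directory` and treats root running via sudo specially, which this sketch ignores. The function name `require_repo_owner` is hypothetical, and GNU `stat` is assumed.

```sh
#!/bin/sh
# Hypothetical guard: succeed only if the current user owns the repository
# directory, so that git pull/commit do not create files under another UID.
# Assumes GNU stat (-c %u prints the owner's numeric UID).
require_repo_owner() {
    repo="$1"
    owner_uid="$(stat -c %u "$repo")" || return 1
    if [ "$owner_uid" != "$(id -u)" ]; then
        echo "refusing: $repo is owned by UID $owner_uid, not $(id -u)" >&2
        return 1
    fi
}

# Example: require_repo_owner /home/parseme/annotations && git -C /home/parseme/annotations pull
```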

You'll want to reset the ownership to make it consistent again using chown -R on the document root, setting it to the UID/GID foliadocserve runs under (which I guess is
parseme in your case, but make sure to check). Running as root, even within the
container, is not recommended.
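A sketch of such a reset, assuming the target UID/GID have already been checked. It runs `chown -R` on the document root itself rather than on `*`, so hidden entries such as `.git` are included, and then reports anything still owned by someone else. The name `reset_ownership` is hypothetical; GNU `chown` and `find` are assumed, and on the host it will typically have to run as root.

```sh
#!/bin/sh
# Hypothetical helper: recursively give DIR (including dot entries such as .git)
# to OWNER:GROUP, then print any path that still has a different owner or group.
# Usage: reset_ownership DIR OWNER GROUP   (names or numeric IDs)
reset_ownership() {
    dir="$1" owner="$2" group="$3"
    chown -R "$owner:$group" "$dir" || return 1
    # GNU find accepts user/group names or numeric IDs for -user/-group.
    leftovers="$(find "$dir" \( ! -user "$owner" -o ! -group "$group" \) -print)"
    if [ -n "$leftovers" ]; then
        echo "still not owned by $owner:$group:" >&2
        echo "$leftovers" >&2
        return 1
    fi
}

# Example (as root): reset_ownership /home/parseme/annotations parseme parseme
```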

This can definitely be a cause of errors, but those would be permission denied
errors where foliadocserve can't write a file to disk. Actually, thinking back,
it's quite likely this is the cause of #191.

Could this be a reason for the instability of the server?

When you say the FLAT server goes down from time to time, do you really mean the actual physical server goes down?
Or just the FLAT/foliadocserve process? In any case, neither should really happen because of this.

proycon self-assigned this Jan 15, 2025
@savary
Author

savary commented Jan 23, 2025

Such inconsistent ownership occurs when the git repository is shared amongst multiple (unix) users (in this case root and parseme). Note that this does not refer to FLAT users but unix users.

Right.

Inside the FLAT container foliadocserve runs under UID/GID 100. How this maps on the actual host depends a bit on your configuration. You can check this on the host with ps aux | grep foliadocserve.

I'm not sure how to find it from the output of this command:
parseme@parseme:~$ ps aux | grep foliadocserve
root 3849 0.0 0.0 820 0 ? Ss janv.14 0:00 runsv foliadocserve
root 4119 0.0 0.0 2224 516 ? S janv.14 0:01 tee /data/BKP/2025-01-14_13:19:47/foliadocserve.stdout
root 4120 0.0 0.0 2224 544 ? S janv.14 0:00 tee /data/BKP/2025-01-14_13:19:47/foliadocserve.stderr
root 4121 1.3 0.8 178152 134876 ? Sl janv.14 141:18 /usr/bin/python3 /usr/bin/foliadocserve -d /data/annotations --log /data/BKP/2025-01-14_13:19:47/foliadocserve.log -p 8080 --git
parseme 4144594 0.0 0.0 6612 2424 pts/0 S+ 13:57 0:00 grep --color=auto foliadocserve

Note that the parseme user has a UID that does not appear above:
parseme@parseme:~$ id -u parseme
1004
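In the `ps aux` output above, the first column (`USER`) is the account each process runs as, and the foliadocserve process (PID 4121) and its helpers appear there as `root`. To go the other way, from a numeric ID such as the container-side UID 100 to a host user name, a lookup in `/etc/passwd` is enough in simple setups. This is a hypothetical sketch: it reads only the local file (not LDAP or other NSS sources, which `getent passwd` would cover), and it assumes a default Docker setup without user-namespace remapping, where a numeric UID inside the container is the same number on the host.

```sh
#!/bin/sh
# Hypothetical helper: print the local user name for a numeric UID by reading
# /etc/passwd directly (field 1 = name, field 3 = UID).
uid_to_name() {
    name="$(awk -F: -v uid="$1" '$3 == uid { print $1; exit }' /etc/passwd)"
    if [ -n "$name" ]; then
        echo "$name"
    else
        echo "no entry for UID $1 in /etc/passwd" >&2
        return 1
    fi
}

# Example: uid_to_name 100; uid_to_name 1004
```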

Ownership can get messed up if the FLAT container wasn't always consistently started in the same way but as different users, or if the git repository at the document root is pulled by some external process rather than foliadocserve and that runs under another user. If you do a git pull, take care to use the same user foliadocserve runs under.

Sorry, it is not quite clear to me. We indeed use git to version the annotations folder:

parseme@parseme:~$ cd annotations/
parseme@parseme:~/annotations$ git config --get remote.origin.url
git@gitlab.com:parseme/annotations.git

I think it is launched by cron, since in the logs I see:

sudo grep CRON /var/log/syslog
...
Jan 23 11:00:01 parseme CRON[3871233]: (parseme) CMD (/home/parseme/annotations/countMWEs.py >/home/parseme/annotations/.mwe-count.json)
Jan 23 11:00:01 parseme CRON[3871234]: (tuanbui) CMD (/home/parseme/parseme/updateGitlabWithAnnotations.sh)
Jan 23 11:00:01 parseme CRON[3871235]: (parseme) CMD (/home/parseme/annotations/.settingsCommitHack/updateGitlabWithSettings.sh)
Jan 23 11:00:01 parseme CRON[3871226]: (CRON) info (No MTA installed, discarding output)
Jan 23 11:00:01 parseme CRON[3871238]: (parseme) CMD (/home/parseme/annotations/updateGitlabWithAnnotations.sh)

And here is what the script contains:

parseme@parseme:~/annotations$ cat updateGitlabWithAnnotations.sh
#! /bin/sh
export LANGUAGE=C
export LANG=C
export LC_ALL=C
HERE="$(cd "$(dirname "$0")" && pwd)"
DATE="$(date --rfc-3339=s)"

set -o nounset # Using "$UNDEF" var raises error
set -o errexit # Exit on error, do not continue quietly

cd "$HERE"
exec 1>.git-commit-log.stdout
exec 2>.git-commit-log.stderr

echo "$DATE" >&2
git pull
modified_userdirs="$(git status | grep modified: | awk '{print $2}' | grep '/' | sed 's@/.*@@g' | sort -u | awk 'BEGIN{ORS=" "} 1' | sed 's@ *$@@g')"
git add * # commit all files in $HERE that do not start with "."
git commit -am "Auto-commit: $DATE" -m "Modified: [$modified_userdirs]" || true
git push

How can I tell from this which user runs git pull and git push?

From the logs it looks like it is tuanbui (the CRON[3871234] entry above), right? That is probably not what is expected.
But even so, the folders and files in annotations/ belong either to parseme or to root, not to tuanbui.
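In this cron syslog format, the name in parentheses right after `CRON[...]:` is the user the job runs as, and the number in brackets is a process ID rather than a line number. A small sketch to extract user and command from such lines (hypothetical function name; standard `sed` and `grep` assumed):

```sh
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print "USER COMMAND" for
# each cron "CMD" entry whose command contains PATTERN. In these lines the
# name in parentheses after CRON[PID]: is the user the job runs as.
cron_users_for() {
    pattern="$1"
    sed -n 's/.*CRON\[[0-9]*]: (\([^)]*\)) CMD (\(.*\))$/\1 \2/p' |
        grep -F -- "$pattern"
}

# Example: sudo grep CRON /var/log/syslog | cron_users_for updateGitlabWithAnnotations.sh
```

To look at the crontab entry itself, `sudo crontab -u tuanbui -l` lists that user's crontab.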

You'll want to reset the ownership to make it consistent again using chown -R on the document root, setting it to the UID/GID foliadocserve runs under (which I guess is parseme in your case, but make sure to check). Running as root, even within the container, is not recommended.

I understand, but again I'm not quite sure. The Docker image was set up by a colleague who has since moved elsewhere. He left good documentation, though, and there I see that to restart the server I have to run:
sudo docker ps
Does this mean that the annotation directories and files will be owned by root? (I don't think so.)

This definitely can be a cause of errors, but that would be permission denied errors where foliadocserve can't write a file to disk. Actually I'm thinking back and it's quite likely this is the cause of #191

OK, so I have now given ownership of all the folders and files in annotations to parseme:
parseme@parseme:~/annotations$ sudo chown -R parseme:parseme *
...
parseme@parseme:~/annotations$ ll | head -10
total 1776
drwxrwsr-x 357 parseme parseme 20480 janv. 14 22:32 ./
drwxr-x--- 9 parseme parseme 4096 janv. 14 15:49 ../
drwxr-xr-x 2 parseme parseme 4096 janv. 10 2024 Abbas/
drwxr-xr-x 2 parseme parseme 4096 janv. 10 2024 abdelati.hawwari/
drwxr-xr-x 3 parseme parseme 4096 janv. 10 2024 abigail.walsh/
drwxr-xr-x 2 parseme parseme 4096 janv. 10 2024 adela.tocaru/
drwxr-sr-x 2 parseme parseme 4096 juil. 10 2024 adelina.cerpja/
drwxr-sr-x 2 parseme parseme 4096 juil. 10 2024 adina.duca/
drwxr-xr-x 4 parseme parseme 4096 janv. 10 2024 agata.savary/
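Note that by default the shell glob `*` in `chown -R parseme:parseme *` does not match names starting with a dot, so entries such as the `.git` directory or `.mwe-count.json` are left untouched. A quick sketch to list the top-level dot entries such a command would skip (hypothetical function name; GNU `find` assumed for `-printf`):

```sh
#!/bin/sh
# Hypothetical helper: list the entries directly inside DIR whose names start
# with a dot; by default a shell glob such as `*` does not match them.
dot_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -name '.*' -printf '%f\n' | sort
}

# Example: dot_entries /home/parseme/annotations
```

Running `chown -R` on the directory path itself (for example `sudo chown -R parseme:parseme /home/parseme/annotations`) covers these entries as well.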

And what would be the right permissions? Note that we currently still have both drwxr-xr-x and drwxr-sr-x on folders, and for files we have:
parseme@parseme:~/annotations$ ll */* | cut -d' ' -f1 | grep -E '^\-' | sort -u
-rw-------
-rw-r--r--
-rw-rw-r--
What does FLAT require?
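The `ll */* | cut ...` pipeline above depends on the `ll` alias and only covers the paths matched by `*/*`. A sketch that counts permission strings for all files (or all directories) at any depth, using GNU `find -printf '%M'`; the function name is hypothetical, and it only reports the current modes without assuming what FLAT requires.

```sh
#!/bin/sh
# Hypothetical helper: count entries under DIR by permission string
# (e.g. -rw-r--r--), at any depth. Requires GNU find for -printf %M.
# $1 = directory to scan, $2 = find type letter: f (files) or d (directories)
perm_summary() {
    find "$1" -type "$2" -printf '%M\n' | sort | uniq -c
}

# Example: perm_summary /home/parseme/annotations f; perm_summary /home/parseme/annotations d
```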

Could this be a reason for the instability of the server?

When you say the FLAT server goes down from time to time, do you really mean the actual physical server goes down? Or just the FLAT/foliadocserve process? In any case, neither should really happen because of this.

I'm actually not quite sure. Maybe we also have issues with the physical server from time to time. I will keep following this issue.

Thank you very much, this is enlightening.

@proycon
Owner

proycon commented Jan 24, 2025 via email

@savary
Author

savary commented Feb 7, 2025

Thanks for all the details. I'm talking to my IT staff about it.
