Scheduled serialization fails due to write permission problems #532
This will also be a problem for #479.
Workaround is to perform the serialization using the management command:
To allow SFM UI under Apache to write to /sfm-data, the solution is to run the various services as the sfm user in the sfm group. This includes sfm-ui, harvesters, exporters, and ELK. Branches for this ticket:
TODO:
For testing:
…zation back on. Turns off lazy initialization of sfm app when running under Apache.
A new sfm-base image is built and the correct SHA has been added to the Dockerfiles. PRs created. Automatic build of sfm-data is set up. @Tanych, you should be good to go with testing.
@justinlittman For the other containers such as sfm-ui, sfm-twitter-harvester, etc., I need to build new images based on sfm_532-user since sfm-base has changed, right?
Yes.
@justinlittman
data:
  image: gwul/sfm-data:master
ui:
  build:
    context: ../sfm-ui
    dockerfile: Dockerfile
twitterrestharvester:
  build:
    context: ../sfm-twitter-harvester
    dockerfile: Dockerfile-rest-harvester
Test
Create a Twitter user timeline; permission error:
arc.gz to /sfm-data/collection_set/fd7bb2aae00f481eb153529c9206089b/32155f0a8cf84528a14f1481fd2b85b8/2016/11/30/15/8206844747c14e9796a48a081504e84f-20161130155800635-00000-58-c16001fcbe30-8000.warc.gz
Exception in thread warc_processing_thread:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/opt/sfm-utils/sfmutils/harvester.py", line 508, in _process_warc_thread
    os.makedirs(dest_path)
  File "/usr/local/lib/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/local/lib/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/local/lib/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/local/lib/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/local/lib/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/local/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/sfm-data/collection_set/fd7bb2aae00f481eb153529c9206089b'
Checking the permission stats of /sfm-data/:
File: ‘/sfm-data/’
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: 801h/2049d Inode: 3809951 Links: 6
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2016-11-30 10:28:11.737121154 -0500
Modify: 2016-11-29 16:19:04.482423185 -0500
Change: 2016-11-29 16:19:04.482423185 -0500
Birth: -
Based on the Apache process owners:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 75 0.0 0.0 4328 748 ? S 10:26 0:00 /bin/sh /usr/sbin/apachectl -DFOREGROUND
root 77 0.0 0.0 85052 6076 ? S 10:26 0:00 /usr/sbin/apache2 -DFOREGROUND
sfm 78 0.2 0.2 1537156 78792 ? Sl 10:26 0:09 /usr/sbin/apache2 -DFOREGROUND
sfm 79 0.0 0.0 2017060 12608 ? Sl 10:26 0:01 /usr/sbin/apache2 -DFOREGROUND
sfm 80 0.0 0.0 1820324 11820 ? Sl 10:26 0:01 /usr/sbin/apache2 -DFOREGROUND
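The stat and ps output above already explains the failure: /sfm-data is mode 0755 and owned by root:root, while the Apache workers run as sfm. A small Python sketch of the POSIX permission-class rule makes this concrete. It is a simplification (it ignores ACLs, capabilities other than root's override, and security modules), and the uid/gid 990 in the example is a hypothetical value for the sfm user, not one taken from this issue.

```python
import stat


def can_create_in_dir(st_mode, st_uid, st_gid, uid, gids):
    """Decide from POSIX mode bits whether a process may create entries in a directory.

    Creating an entry needs write and search (execute) permission on the directory.
    Exactly one class applies: owner, else group, else other. Root (uid 0) bypasses
    the check. ACLs and security modules are deliberately ignored in this sketch.
    """
    if uid == 0:
        return True
    if uid == st_uid:
        # Owner class: only the owner bits count, even if group/other would allow it.
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif st_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (st_mode & needed) == needed


# /sfm-data as shown above: drwxr-xr-x root root; Apache worker as sfm (uid/gid 990 assumed).
print(can_create_in_dir(0o40755, 0, 0, 990, {990}))      # False: falls under "other", no write bit
print(can_create_in_dir(0o40755, 990, 990, 990, {990}))  # True once the directory is owned by sfm:sfm
```

This is why the fix in this ticket changes ownership (running services as the sfm user and group) rather than loosening the mode bits for everyone.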
It seems that the
@justinlittman
+ groupadd -r sfm --gid=
groupadd: invalid group ID ''
+ export COLLECTION_SET_DIR=/sfm-data/collection_set
+ [ ! -d /sfm-data/collection_set ]
+ echo Creating collection_sets directory
+ mkdir -p /sfm-data/collection_set
Creating collection_sets directory
+ chown sfm:sfm /sfm-data/collection_set
chown: invalid user: 'sfm:sfm'
Creating containers directory
+ export CONTAINERS_DIR=/sfm-data/containers
+ [ ! -d /sfm-data/containers ]
+ echo Creating containers directory
+ mkdir -p /sfm-data/containers
+ chown sfm:sfm /sfm-data/containers
chown: invalid user: 'sfm:sfm'
Creating export directory
+ export EXPORT_DIR=/sfm-data/export
+ [ ! -d /sfm-data/export ]
+ echo Creating export directory
+ mkdir -p /sfm-data/export
+ chown sfm:sfm /sfm-data/export
chown: invalid user: 'sfm:sfm'
We need to add
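The trace shows `groupadd -r sfm --gid=` receiving an empty value, after which every `chown sfm:sfm` fails because the sfm user and group were never created. Below is a hedged Python sketch of the same directory setup that fails fast on a missing id instead of carrying on. The environment variable names `SFM_UID` and `SFM_GID` are hypothetical placeholders for whatever variable the real setup script expands, and the helper names are illustrative, not part of SFM.

```python
import os


def parse_id(env, name):
    """Return the numeric uid/gid stored in env[name], refusing empty or non-numeric values."""
    value = env.get(name, "").strip()
    if not value.isdigit():
        raise ValueError("%s must be a non-negative integer, got %r" % (name, value))
    return int(value)


def ensure_owned_dirs(base, subdirs, uid, gid):
    """Create base/<subdir> when missing and give it to uid:gid; return the newly created paths."""
    created = []
    for sub in subdirs:
        path = os.path.join(base, sub)
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(path)
        # Chown pre-existing directories too, so ones left root-owned by an older setup are fixed.
        os.chown(path, uid, gid)
    return created
```

At container start this would run as root, e.g. `ensure_owned_dirs("/sfm-data", ["collection_set", "containers", "export"], parse_id(os.environ, "SFM_UID"), parse_id(os.environ, "SFM_GID"))`, so an unset id stops the script with a clear message rather than producing the cascade of `chown` errors above.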
Just pushed a correction for the example docker-compose.yml files: gwu-libraries/sfm-docker@4ca292c
@justinlittman
@Tanych I can't reproduce. Check to make sure your docker-compose is similar to below:
and
@justinlittman
The error is as follows:
Actually, the permission of heritrix belongs to
Can you send me your docker-compose.yml?
On Thu, Dec 1, 2016 at 11:50 AM, Victor wrote:
@justinlittman
The docker-compose.yml is the same for heritrix and data. I am running sfm-ui on Apache. I have cleared the volumes and images. To reproduce the error:
- kill the heritrix container and clear the heritrix volumes
- remove the heritrix image
- build the image again
- start sfm-heritrix and check the heritrix logs
The error is as follows:
error: exec: "/opt/heritrix/bin/heritrix": stat /opt/heritrix/bin/heritrix: permission denied
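When `stat(2)` fails with "permission denied", search (execute) permission was refused on a directory in the path prefix; to exec the program, the file itself additionally needs an execute bit. A Python sketch for locating the first component that blocks the current process follows. `first_denied_component` is a hypothetical helper, and it only reflects what `os.access` reports for whichever user runs it, so it should be run as the same user the container runs heritrix as.

```python
import os


def first_denied_component(path):
    """Return the first component of path this process cannot search/execute, or None.

    Each directory on the way needs search (execute) permission, and a program
    needs execute permission on the file itself. A missing component is also
    reported, because os.access returns False for it.
    """
    current = os.sep
    for part in os.path.abspath(path).strip(os.sep).split(os.sep):
        current = os.path.join(current, part)
        if not os.access(current, os.X_OK):
            return current
    return None


# Run inside the heritrix container as the service user.
print(first_denied_component("/opt/heritrix/bin/heritrix"))
```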
Sent through Slack.
I can't replicate:
Try:
I have tried this, but the problem still occurs. When I dig into the build log, everything seems normal.
@Tanych Problem solved:
SFM-UI Apache Instance Functions Testing
Building all harvesters, sfm-ui and sfm-data using the corresponding Dockerfile.
Harvesters and exporters
The testing results for running the harvesters:
Twitter Filter harvester
For the Twitter filter, adding a seed with
It might be that the seed has no tweets.
Scheduling of serialization
Setting of scheduling
Serialization records:
It works as expected.
SFM-UI Runserver Functions Testing
Building all harvesters, sfm-ui and sfm-data using the corresponding Dockerfile.
sfm-ui building error
YAML like the following:
ui:
  build:
    context: ../sfm-ui
    dockerfile: Dockerfile-runserver
Error with the
@justinlittman Related issue:
@Tanych Rather than trying to run runserver and Apache on port 80, I switched to port 8080. This eliminated the need to use setcap. See the changes in the sfm-ui and sfm-docker branches.
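Ports below 1024 are privileged on Linux, so a non-root process can bind port 80 only with `CAP_NET_BIND_SERVICE`, which is what `setcap` would grant; port 8080 needs nothing extra. A Python sketch of that rule follows. Newer Linux kernels make the threshold configurable through `net.ipv4.ip_unprivileged_port_start`, so the sketch reads it when present and otherwise falls back to 1024. The function names are hypothetical.

```python
def read_unprivileged_port_start(default=1024):
    """Lowest port an unprivileged process may bind, per the Linux sysctl if available."""
    try:
        with open("/proc/sys/net/ipv4/ip_unprivileged_port_start") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # File missing (older kernel, non-Linux OS) or unreadable: use the traditional limit.
        return default


def needs_privilege(port, unprivileged_start=1024):
    """True if binding port requires root or CAP_NET_BIND_SERVICE (the capability setcap grants)."""
    return port < unprivileged_start


threshold = read_unprivileged_port_start()
for port in (80, 8080):
    print(port, "needs privilege" if needs_privilege(port, threshold) else "can be bound without root")
```

Reading the threshold rather than hard-coding it matters because some container runtimes configure this sysctl differently from the host.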
Looking at the logic in the harvester, a Twitter filter that is blocked waiting for a tweet will not stop. See #579.
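A common way to make such a loop stoppable is to wait with a timeout and re-check a stop flag between waits. The Python sketch below uses `queue.Queue` as a stand-in for the tweet stream. It is not the SFM harvester's code, and it only applies if the real streaming client exposes an equivalent timeout; `consume` and its parameters are hypothetical.

```python
import queue


def consume(messages, stop_event, handle, poll_seconds=1.0):
    """Hand each item from the messages queue to handle() until stop_event is set.

    Waiting with a timeout instead of blocking forever means a stop request is
    noticed within about poll_seconds, even if no further item ever arrives.
    """
    while not stop_event.is_set():
        try:
            item = messages.get(timeout=poll_seconds)
        except queue.Empty:
            continue  # Nothing arrived; loop back and re-check the stop flag.
        handle(item)
```

Run `consume` in a worker thread with a `threading.Event` as `stop_event`; calling `stop_event.set()` then ends the worker within one polling interval even when the seed produces no tweets.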
@justinlittman Common exporter error
The default port the API client uses to find WARCs seems to be
I've changed the Dockerfiles of all of the exporters so that they use port 8080. You'll need to rebuild all of the exporter images.
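A hedged sketch of how a client might build its API base URL from configuration, with 8080 as the default port, so the port is set in one place instead of being baked into each image. The environment variable names `SFM_API_HOST` and `SFM_API_PORT` and the default host `ui` (the compose service name used earlier in this thread) are assumptions for illustration, not SFM's actual settings.

```python
def api_base_url(env, default_host="ui", default_port=8080):
    """Build the API base URL from env (hypothetical variable names), defaulting to http://ui:8080."""
    host = env.get("SFM_API_HOST") or default_host
    port = int(env.get("SFM_API_PORT") or default_port)  # ValueError on a malformed port.
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return "http://%s:%d" % (host, port)


print(api_base_url({}))                      # http://ui:8080
print(api_base_url({"SFM_API_PORT": "80"}))  # http://ui:80
```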
@justinlittman
Should all the temp files be created in /sfm-data/?
Fixed with gwu-libraries/sfm-utils@4c5a2ac
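The thread does not say what gwu-libraries/sfm-utils@4c5a2ac changed. One general reason to keep temporary files under the data volume rather than the container's own /tmp is that moving a finished file into place is then a same-filesystem rename. Below is a generic Python sketch of that write-then-rename pattern; `write_then_move` is a hypothetical helper, not SFM's code.

```python
import os
import tempfile


def write_then_move(final_path, data, tmp_dir):
    """Write bytes to a temporary file in tmp_dir, then rename it onto final_path.

    os.replace is an atomic rename only within one filesystem; if tmp_dir is on a
    different filesystem it raises OSError (EXDEV). Keeping temp files on the same
    volume as their destination avoids that.
    """
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Don't leave a half-written temp file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```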
SFM-UI Runserver Instance Functions Testing
Building all harvesters, sfm-ui and sfm-data using the corresponding Dockerfile.
Harvesters and exporters
The testing results for running the harvesters:
sfm_heritrix port
For the
Do I need to retest sfm-ui Apache?
@Tanych 8443 is probably blocked by WRLC. Just make sure sfm-ui Apache builds and starts, and test a single harvest and export.
@justinlittman It's not the
# Opens up the port for Heritrix admin console.
- "${HERITRIX_ADMIN_PORT}:8082"
I will test Apache with a single harvester and exporter.
@justinlittman I have verified the harvesters and exporters on sfm-ui Apache. All of them work well.
@Tanych Thanks for your persistence in testing this.
refs #532. Changes to use sfm user and group. Turns scheduled seriali…
This shows changes that need to be made to
This shows changes that need to be made to
Here are the steps for an upgrade:
It is possible that the directories in step 5 or 6 don't exist and an error is returned. That's OK.
When run from Apache (instead of runserver), scheduled serialization fails as follows:
Steps to reproduce:
Expected result:
Serialization occurs according to schedule.
Actual results:
Serialization fails because it runs within Apache and therefore as a non-root user, and that user does not have write permission for /sfm-data.
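A failure like this can be surfaced when a service starts rather than at the first scheduled run. Below is a minimal Python 3 sketch of such a startup check, which confirms the process is executing as the intended account and can create entries in the data directory. `check_runtime_identity` is a hypothetical helper, not part of SFM; the values `"sfm"` and `/sfm-data` in the example come from this thread.

```python
import os
import pwd


def current_user_name():
    """Effective user name of this process, or the numeric uid if it has no passwd entry."""
    uid = os.geteuid()
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def check_runtime_identity(expected_user, data_dir):
    """Return a list of problems; an empty list means the process looks correctly configured.

    Checks the effective user and whether this process can create entries in
    data_dir, which needs both write and search permission on the directory.
    """
    problems = []
    user = current_user_name()
    if user != expected_user:
        problems.append("running as %r, expected %r" % (user, expected_user))
    # Check with the effective uid/gid where the platform supports it.
    effective = os.access in os.supports_effective_ids
    if not os.access(data_dir, os.W_OK | os.X_OK, effective_ids=effective):
        problems.append("cannot create entries in %s" % data_dir)
    return problems


if __name__ == "__main__":
    for problem in check_runtime_identity("sfm", "/sfm-data"):
        print("startup check:", problem)
```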