records: refactor file handling #297

Merged
sebdeleze merged 1 commit into rero:dev from sed-files-refactor
Sep 11, 2020

sebdeleze
Contributor

This commit refactors how files are handled when they are added to a record, making the handling more generic and applying document-specific treatment without affecting other record types.

  • Moves the document-specific methods create_thumbnail and create_fulltext_file into DocumentRecord.
  • Tests file existence to determine whether a file has to be processed.
  • Handles file metadata in a more generic way.
  • Connects the file_uploaded and file_deleted signals to two different functions, so that processing knows whether the file was deleted.
  • Removes the size property when harvesting from RERODOC, as this property is calculated when the file is created.

Co-Authored-by: Sébastien Délèze <sebastien.deleze@rero.ch>
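
To illustrate the signal wiring described in the list above, here is a minimal, dependency-free sketch. The `Signal` class is only a stand-in for the blinker-style signals used in Invenio applications; the listener names, the `process_file` helper and the shape of the `obj` payload are hypothetical and not taken from this PR.

```python
class Signal:
    """Tiny stand-in for a blinker-style signal (assumption, for illustration)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        """Register a receiver and return it unchanged."""
        self._receivers.append(receiver)
        return receiver

    def send(self, sender=None, **kwargs):
        """Call every receiver and return (receiver, result) pairs."""
        return [(r, r(sender, **kwargs)) for r in self._receivers]


# Stand-ins for the two signals named in the PR description.
file_uploaded = Signal()
file_deleted = Signal()

# Records what the handlers did, so the behaviour can be inspected.
processed = []


def process_file(key, deleted):
    """Hypothetical shared processing step that needs to know about deletion."""
    processed.append((key, 'deleted' if deleted else 'uploaded'))


def file_uploaded_listener(sender, obj=None):
    """Receiver for uploads: the file exists and may need derived files."""
    process_file(obj['key'], deleted=False)


def file_deleted_listener(sender, obj=None):
    """Receiver for deletions: derived files should not be (re)created."""
    process_file(obj['key'], deleted=True)


# One function per signal, so the handler never has to guess the event type.
file_uploaded.connect(file_uploaded_listener)
file_deleted.connect(file_deleted_listener)
```

Using a separate receiver per signal keeps the "was this file deleted?" decision at the connection point instead of inferring it from the file object during processing.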

@sebdeleze marked this pull request as ready for review on September 9, 2020, 13:55
@sebdeleze requested a review from zannkukai on September 9, 2020, 13:55
# If extract fulltext is disabled or file is not a PDF
if not current_app.config.get(
        'SONAR_DOCUMENTS_EXTRACT_FULLTEXT_ON_IMPORT'
) or file.mimetype != 'application/pdf':

It could be interesting (in a future PR/US) to extract fulltext from other MIME types (rtf, odt, ...).
In our repository we also have some files with the 'application/pdf+a' MIME type.
It may be worth having a general config parameter that lists all MIME types available for fulltext extraction.

Contributor Author

Yes, that would certainly be interesting. I'll pass it on to Miguel.
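
The suggestion in this thread could look roughly like the sketch below. It is only an illustration: `SONAR_DOCUMENTS_FULLTEXT_MIMETYPES` and `should_extract_fulltext` are hypothetical names that do not exist in this PR, and a plain dict stands in for `current_app.config`. Extracting text from formats such as rtf or odt would also need a suitable extractor, which is not covered here.

```python
# Fallback when no list is configured: PDF only, matching the reviewed code.
DEFAULT_FULLTEXT_MIMETYPES = frozenset({'application/pdf'})


def should_extract_fulltext(config, mimetype):
    """Return True if fulltext extraction is enabled and allowed for mimetype.

    `config` is any mapping with a `.get()` method (for example a Flask
    config object). The MIME type list key is a hypothetical setting.
    """
    # Existing switch from the reviewed code: extraction can be disabled.
    if not config.get('SONAR_DOCUMENTS_EXTRACT_FULLTEXT_ON_IMPORT'):
        return False

    # Hypothetical setting: every MIME type eligible for fulltext extraction.
    allowed = config.get(
        'SONAR_DOCUMENTS_FULLTEXT_MIMETYPES', DEFAULT_FULLTEXT_MIMETYPES)
    return mimetype in allowed


# Usage example with a plain dict standing in for the application config.
example_config = {
    'SONAR_DOCUMENTS_EXTRACT_FULLTEXT_ON_IMPORT': True,
    'SONAR_DOCUMENTS_FULLTEXT_MIMETYPES': [
        'application/pdf',
        'application/pdf+a',
    ],
}
```

With such a setting, the hard-coded comparison against 'application/pdf' would become a membership test, and new types could be enabled per instance without a code change.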

sonar/modules/documents/api.py (2 outdated review threads, resolved)
@sebdeleze merged commit 0c7f20f into rero:dev on Sep 11, 2020
@sebdeleze deleted the sed-files-refactor branch on September 11, 2020, 14:04