Associated Works for Zenodo Records #48

Merged: 83 commits, merged on May 16, 2022.

49ddaed
Initial functions for determining zenodo all versions doi.
tjacovich Dec 7, 2021
d54afae
added all version dois to fetch_all_versions_doi return
tjacovich Dec 7, 2021
adb2119
Added call to doi.fetch_all_versions_doi() in task_process_new_citation.
tjacovich Dec 7, 2021
54cf13f
converted output of fetch_all_versions_doi to dict
tjacovich Dec 8, 2021
32cf1f4
Added initial functions for fetching bibcodes from DataCite metadata.
tjacovich Dec 9, 2021
2ab9aa4
Added exception handling to _fetch_all_versions_doi. Modified calls t…
tjacovich Dec 10, 2021
75de0dc
housekeeping commit
tjacovich Dec 10, 2021
e1adaf5
Added error handling for all_versions_doi failure.
tjacovich Dec 14, 2021
123d150
Added initial function for calling associated works from db.
tjacovich Dec 14, 2021
75be1f2
commented additions to tasks.py
tjacovich Dec 16, 2021
966f96a
initial changes to forward.py
tjacovich Dec 17, 2021
11ae8f4
Initial modifications to forward.py and task_output_results
tjacovich Dec 17, 2021
ab9e31e
Updated doi.fetch_all_versions to include base doi in versions list. …
tjacovich Dec 21, 2021
8a7723d
updated db_version to db_version_bibcodes. Added new target citation …
tjacovich Dec 21, 2021
24af7b8
updated task_process_updated_associated_works to collect target_citat…
tjacovich Dec 21, 2021
113a327
Updated handling of associated works bibcodes so associated works are…
tjacovich Dec 22, 2021
93a62c6
Moved all_versions_doi check inside parsed_metadata check to avoid si…
tjacovich Dec 22, 2021
9fe905d
added initial alembic revision to include associated works in the pub…
tjacovich Dec 23, 2021
e1088de
Updated citation_target models and added upgrade and downgrade path f…
tjacovich Dec 27, 2021
64e986b
Updated db.store_event() to include associated works
tjacovich Dec 27, 2021
9845506
Added associated works to db.update_citation_target_metadata(). Added…
tjacovich Dec 27, 2021
002efb3
moved initial definition of associated_version_bibcodes outside of me…
tjacovich Dec 28, 2021
b302397
Fixed error using bibcodes in a function designed to accept dois. Mod…
tjacovich Dec 30, 2021
f5bf7c2
quick update to db.py
tjacovich Dec 30, 2021
1410128
Initial modifications to maintenance tasks for associated works.
tjacovich Jan 3, 2022
573a0c0
Updated celery queues in tasks.py. Added to task_maintenance_reevalu…
tjacovich Jan 3, 2022
82b9a2d
More modifications to task_maintenance_reevaluate_associated_works
tjacovich Jan 4, 2022
ac61944
Initial additions to test_forward.py for testing expected forwarded m…
tjacovich Jan 6, 2022
bfe9e58
Added parser arguments for maintenance evaluate associated works.
tjacovich Jan 7, 2022
cf84b95
Initial skeleton for testing task_updated_associated_works in test_ta…
tjacovich Jan 10, 2022
916a759
Fixed indentation error in test_forward.py
tjacovich Jan 13, 2022
2b757ba
merged alembic heads for dealing with github urls and the one defined…
tjacovich Jan 13, 2022
be5ee73
Updated solr version in complex testing environment to latest release…
tjacovich Jan 14, 2022
ad263a2
Updates to logging in associated works functions. fixed syntax error …
tjacovich Jan 20, 2022
ffe7a46
Updates to multiple parts of task.py to deal with updating associated…
tjacovich Jan 25, 2022
95c8ab8
Re-added associated works processing queue. Removed extra print state…
tjacovich Jan 25, 2022
57482dc
Modified processes related to associated works so that both the versi…
tjacovich Jan 28, 2022
4bf4211
Updated readme to include instructions for MAINTENANCE --eval-associa…
tjacovich Jan 28, 2022
d142d31
Updates to test_tasks and test_forward
tjacovich Jan 31, 2022
257c64a
Updated tasks.py to better match structure of other tasks. Added unit…
tjacovich Feb 2, 2022
6170fd7
Added unit test for doi.fetch_all_versions_doi
tjacovich Feb 2, 2022
fb41994
Added unit tests for forward.py
tjacovich Feb 2, 2022
e5ad320
Added unit tests for citation targets with associated works.
tjacovich Feb 3, 2022
72cc5d9
Moved task_update_associated_works to process-updated-citation queue.
tjacovich Feb 3, 2022
42ddab1
Fixed bug in _build_nonbib_record that caused only one data_links_row…
tjacovich Feb 3, 2022
5bf7453
Added unit test for task_process_updated_associated_work. Updated tas…
tjacovich Feb 4, 2022
d6a9bd0
Added unit test for task_output_results when bibcode_replaced = True.…
tjacovich Feb 4, 2022
d8ded76
Minor update to test_task_output_results_if_bibcode_replaced
tjacovich Feb 4, 2022
75c0b7e
Updated readme to include race condition and additional functionality…
tjacovich Feb 17, 2022
2d06dce
Updated logger text and comments in task_maintenance_reevaluate_assoc…
tjacovich Feb 17, 2022
b545f9f
Updated README
tjacovich Feb 17, 2022
7a5e09e
Removed unused 'version' attribute from _extract_key_citation_target_…
tjacovich Feb 18, 2022
ae336bb
Added associated_works to task_update_citation and task_maintenance_*
tjacovich Feb 18, 2022
b4865ab
Refactored tasks.py to simplify collecting associated works across mu…
tjacovich Feb 18, 2022
aba7a67
bugfix: missing app in calls to db.get_citation_target_by_doi()
tjacovich Feb 18, 2022
a03f6e1
fixed call to attribute 'associated_works'
tjacovich Feb 18, 2022
5b7016f
Updated unit tests to reflect changes to delete and update tasks. Cha…
tjacovich Feb 22, 2022
e1d137d
Removed old code task_process_new_citation.
tjacovich Mar 22, 2022
f7c006a
Bugfix: undefined variable in maintenance reevaluate.
tjacovich Mar 22, 2022
a559635
Removed extra function declaration from rebase. Bumped astropy to 5.0…
tjacovich Mar 22, 2022
1ada51b
Added initial skeleton of test for alembic.
tjacovich Mar 22, 2022
f3e2385
merged alembic revisions for manual_curation and associated_works.
tjacovich Mar 22, 2022
c8fd8cf
Added test for alembic head to unittests.
tjacovich Mar 22, 2022
d2adab9
Updated name of alembic revision
tjacovich Mar 22, 2022
20effa0
Updating merged maintenance tasks to handle associated works.
tjacovich Mar 23, 2022
24d0faf
Updates to associated works db functions and tasks. Removed self refe…
tjacovich Mar 23, 2022
e7c276f
incremental update
tjacovich Mar 24, 2022
c7b396b
Updated ASSOCIATED titles to align with Associated Works RFC.
tjacovich Mar 24, 2022
408d8eb
Updated unittests for forward.py
tjacovich Mar 24, 2022
20dcc6a
Modified citation update functions to remove self from associated wor…
tjacovich Mar 24, 2022
e685f54
Fixed issue with update citations when associated_works is None
tjacovich Mar 24, 2022
1c06ab8
Updates in response to PR comments. Changed all_doi to concept_doi. R…
tjacovich Apr 12, 2022
9900143
Merged updates to readme into associated_works
tjacovich Apr 12, 2022
84af1f7
Updates to readme.
tjacovich Apr 12, 2022
1f2a11a
Updated formatting to better align with acutter formatting.
tjacovich Apr 12, 2022
a0b7129
Fixed = formatting
tjacovich Apr 15, 2022
8ec7d8c
Merged upstream bugfix for manual curation into associate_works featu…
tjacovich Apr 18, 2022
ebd087f
modified curated_metadata alembic revision to resolve conflict with a…
tjacovich May 3, 2022
9451ba3
Merged PR #53
tjacovich May 3, 2022
f5d9956
Created initial merged alembic revision.
tjacovich May 3, 2022
19e5a3e
Modified alembic revisions to fix upgrade conflict with associated wo…
tjacovich May 3, 2022
0b13173
Added associated works field to populate_bibcode_columns(). Updated u…
tjacovich May 3, 2022
c87c7c5
Merge branch 'master' into associated_works to keep branch up to date.
tjacovich May 6, 2022
2 changes: 1 addition & 1 deletion ADSCitationCapture/api.py
@@ -167,7 +167,7 @@ def get_github_metadata(app, citation_url):
github_api = None
try:
path = urllib.parse.urlparse(citation_url).path.split("/")
-github_api = app.conf['GITHUB_API_URL']+"repos/{}/{}/license".format(path[1],path[2])
+github_api = app.conf['GITHUB_API_URL']+"repos/{}/{}/license".format(path[1],path[2]) if path[1] else None

except Exception as e:
msg = "Failed to parse :{} with Exception: {}".format(citation_url,e)
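With the new guard, the license endpoint is only built when the URL path has a non-empty owner segment (path[1]). A URL such as "https://github.com/owner" still raises IndexError on path[2], and the existing try block catches it. The sketch below reproduces the same URL handling outside the app. It assumes a hypothetical helper name, github_license_api_url, and an API base of "https://api.github.com/" (the diff reads the base from app.conf['GITHUB_API_URL']). It is stricter than the diff: it also checks the repository segment instead of relying on the exception.

```python
import urllib.parse
from typing import Optional


def github_license_api_url(api_base: str, citation_url: str) -> Optional[str]:
    """Hypothetical helper: map a GitHub repository URL to its license endpoint.

    Returns None when the URL path lacks an owner or repository segment,
    instead of building a malformed API URL.
    """
    # "https://github.com/owner/repo" -> path "/owner/repo" -> ["", "owner", "repo"]
    path = urllib.parse.urlparse(citation_url).path.split("/")
    # Require both an owner (path[1]) and a repository (path[2]) segment.
    if len(path) < 3 or not path[1] or not path[2]:
        return None
    return api_base + "repos/{}/{}/license".format(path[1], path[2])
```

For example, "https://github.com/astropy/astropy" maps to "https://api.github.com/repos/astropy/astropy/license". A bare "https://github.com/" yields None.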
43 changes: 34 additions & 9 deletions ADSCitationCapture/db.py
@@ -37,7 +37,7 @@ def store_event(app, data):
stored = True
return stored

-def store_citation_target(app, citation_change, content_type, raw_metadata, parsed_metadata, status):
+def store_citation_target(app, citation_change, content_type, raw_metadata, parsed_metadata, status, associated=None):

Review comment (Contributor): Let's homogenize the spaces before and after equal signs (choose the same style as Roman's cookie cutter tool).

Reply (Contributor Author): Homogenizing on no spaces around = in function definitions and calls, but spaces when assigning to a variable.

"""
Stores a new citation target in the DB
"""
@@ -51,6 +51,7 @@ def store_citation_target(app, citation_change, content_type, raw_metadata, pars
citation_target.curated_metadata = {}
citation_target.status = status
citation_target.bibcode = parsed_metadata.get("bibcode", None)
+citation_target.associated_works = associated
session.add(citation_target)
try:
session.commit()
@@ -62,7 +63,7 @@ def store_citation_target(app, citation_change, content_type, raw_metadata, pars
stored = True
return stored

-def _update_citation_target_metadata_session(session, content, raw_metadata, parsed_metadata, curated_metadata = {}, status=None, bibcode = None):
+def _update_citation_target_metadata_session(session, content, raw_metadata, parsed_metadata, curated_metadata = {}, status=None, bibcode=None, associated=None):
"""
Actual calls to database session for update_citation_target_metadata
"""
@@ -79,6 +80,9 @@ def _update_citation_target_metadata_session(session, content, raw_metadata, par
citation_target.parsed_cited_metadata = parsed_metadata
citation_target.curated_metadata = curated_metadata
citation_target.bibcode = bibcode
+if(citation_target.associated_works != associated):
+logger.debug("associated works set for {} set from {} to {}".format(citation_target.content, citation_target.associated_works, associated))
+citation_target.associated_works = associated
if status is not None:
citation_target.status = status
session.add(citation_target)
@@ -87,14 +91,14 @@ def _update_citation_target_metadata_session(session, content, raw_metadata, par
metadata_updated = True
return metadata_updated

-def update_citation_target_metadata(app, content, raw_metadata, parsed_metadata, curated_metadata = {}, status=None, bibcode = None):
+def update_citation_target_metadata(app, content, raw_metadata, parsed_metadata, curated_metadata={}, status=None, bibcode=None, associated=None):
"""
Update metadata for a citation target
"""
metadata_updated = False
if not bibcode: bibcode = parsed_metadata.get('bibcode', None)
with app.session_scope() as session:
-metadata_updated = _update_citation_target_metadata_session(session, content, raw_metadata, parsed_metadata, curated_metadata, status, bibcode)
+metadata_updated = _update_citation_target_metadata_session(session, content, raw_metadata, parsed_metadata, curated_metadata, status=status, bibcode=bibcode, associated=associated)
return metadata_updated


@@ -148,9 +152,11 @@ def _extract_key_citation_target_data(records_db, disable_filter=False):
{
'bibcode': record_db.bibcode,
'alternate_bibcode': record_db.parsed_cited_metadata.get('alternate_bibcode', []),
+'version': record_db.parsed_cited_metadata.get('version', None),
'content': record_db.content,
'content_type': record_db.content_type,
'curated_metadata': record_db.curated_metadata if record_db.curated_metadata is not None else {},
+'associated_works': record_db.associated_works,
}
for record_db in records_db
if disable_filter or record_db.parsed_cited_metadata.get('bibcode', None) is not None
@@ -200,13 +206,31 @@ def _get_citation_targets_session(session, only_status='REGISTERED'):
"""
if only_status:
records_db = session.query(CitationTarget).filter_by(status=only_status).all()
-disable_filter = only_status in ['DISCARDED','EMITTABLE']
+disable_filter = only_status in ['DISCARDED', 'EMITTABLE']
else:
records_db = session.query(CitationTarget).all()
disable_filter = True
records = _extract_key_citation_target_data(records_db, disable_filter=disable_filter)
return records


+def get_associated_works_by_doi(app, all_versions_doi, only_status='REGISTERED'):
+dois = all_versions_doi['versions']
+concept_doi = all_versions_doi['concept_doi'].lower()
+try:
+versions = {"Version "+str(records.get('version', '')): records.get('bibcode', '') for records in get_citation_targets_by_doi(app, dois, only_status)}
+root_ver = get_citation_targets_by_doi(app, [concept_doi], only_status)
+if root_ver != []:
+root_record = {'Software Source':root_ver[0]['bibcode']}
+versions.update(root_record)
+if versions != {}:
+return versions
+else:
+logger.info('No associated works for %s in database', dois[0])
+return None
+except:
+logger.info('No associated works for %s in database', dois[0])
+return None

def get_citation_targets(app, only_status='REGISTERED'):
"""
Return a list of dict with all citation targets (or only the registered ones)
@@ -382,13 +406,14 @@ def populate_bibcode_column(main_session):
records = _get_citation_targets_session(main_session, only_status = None)
for record in records:
content = record.get('content', None)
-bibcode = record
+bibcode = record.get('bibcode', None)
+associated = record.get('associated_works', {})
logger.debug("Collecting metadata for {}".format(record.get('content')))
citation_in_db = False
metadata = {}
metadata = _get_citation_target_metadata_session(main_session, content, citation_in_db, metadata, curate=False)
if metadata:
-logger.debug("Updating Bibcode field for {}".format(record.get('content')))
+logger.debug("Populating Bibcode field for {}".format(record.get('content')))
raw_metadata = metadata.get('raw', {})
parsed_metadata = metadata.get('parsed', {})
curated_metadata = metadata.get('curated',{})
@@ -402,5 +427,5 @@ def populate_bibcode_column(main_session):
else:
bibcode = parsed_metadata.get('bibcode',None)

-_update_citation_target_metadata_session(main_session, content, raw_metadata, parsed_metadata, curated_metadata, status, bibcode)
+_update_citation_target_metadata_session(main_session, content, raw_metadata, parsed_metadata, curated_metadata, status, bibcode, associated)
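get_associated_works_by_doi() maps the citation targets found for every version DOI to "Version <n>" titles, and it adds the concept DOI's record under "Software Source". The following is a DB-free sketch of that mapping step. It uses the hypothetical name build_associated_works and takes plain lists of record dicts (shaped like the output of _extract_key_citation_target_data) in place of the get_citation_targets_by_doi() calls.

```python
from typing import Dict, List, Optional


def build_associated_works(version_records: List[dict],
                           concept_records: List[dict]) -> Optional[Dict[str, str]]:
    """Hypothetical, DB-free sketch of the mapping built by get_associated_works_by_doi().

    version_records: citation-target dicts for the version DOIs.
    concept_records: citation-target dicts for the concept (all-versions) DOI.
    """
    # One "Version <n>" title per version record, pointing at its bibcode.
    works = {
        "Version " + str(record.get("version", "")): record.get("bibcode", "")
        for record in version_records
    }
    # The concept DOI, if registered, is exposed under the "Software Source" title.
    if concept_records:
        works["Software Source"] = concept_records[0]["bibcode"]
    # An empty mapping is reported as "no associated works".
    return works or None
```

Because the result is keyed by title, two records that report the same version string collapse into a single entry, and the one processed last wins. The bibcodes in the tests are placeholders.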

43 changes: 43 additions & 0 deletions ADSCitationCapture/doi.py
@@ -209,3 +209,46 @@ def _parse_metadata_zenodo_doi(raw_metadata):
parsed_metadata['bibcode'] = bibcode
return parsed_metadata

+def fetch_all_versions_doi(base_doi_url, base_datacite_url, parsed_metadata):
+"""
+Takes zenodo parsed metadata and fetches DOI for all versions of zenodo repository
+"""
+return _fetch_all_versions_doi(base_doi_url, base_datacite_url, parsed_metadata)
+
+def _fetch_all_versions_doi(base_doi_url, base_datacite_url, parsed_metadata):
+"""
+Takes zenodo parsed metadata and fetches DOI for base repository as well as DOI for all versions.
+"""
+
+if parsed_metadata.get('version_of', None) not in (None,"",[],''):
+#check if target is a software version and not the base doi.
+try:
+logger.info("{} is version of: {}".format(parsed_metadata['bibcode'], parsed_metadata.get('version_of', None)))
+#try to recover the base doi for the target
+raw_metadata = fetch_metadata(base_doi_url, base_datacite_url, parsed_metadata.get('version_of')[0])
+parsed_all_version = parse_metadata(raw_metadata)
+if parsed_all_version is not None:
+logger.debug("Found Associated Versions: {}".format(parsed_all_version.get('versions', None)))
+#return dois for all versions of the target software and the base doi.
+versions_json = {'concept_doi': parsed_metadata.get('version_of', None)[0], 'versions': parsed_all_version.get('versions', None)}
+logger.debug("{} version dict is {}".format(parsed_metadata['bibcode'], parsed_all_version.get('versions', None)))
+return versions_json
+except Exception as e:
+logger.exception("Failed to fetch metadata with Exception: {}".format(e))
+return {'concept_doi': None, 'versions': None}
+
+elif parsed_metadata.get('versions',None) not in (None, [],""):
+#If citation target is base doi for software.
+logger.info("{} is the root version".format(parsed_metadata["properties"]['DOI']))
+
+try:
+#return all versions including the base doi.
+logger.debug("Found Associated Versions: {}".format(parsed_metadata.get('versions',None)))
+return {'concept_doi': parsed_metadata.get('properties')['DOI'], 'versions': parsed_metadata.get('versions',None)}
+
+except Exception as e:
+logger.exception("Attempt to return versions failed with Exception: {}".format(e))
+return {'concept_doi': None, 'versions': None}
+
+else:
+return {'concept_doi': None, 'versions': None}
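_fetch_all_versions_doi() handles two cases. A version DOI has a non-empty 'version_of', so the concept DOI's metadata is fetched to list every version. A concept DOI already lists its 'versions' and its own DOI under properties. The sketch below shows only that branching. It injects a hypothetical fetch_parsed callable in place of fetch_metadata() plus parse_metadata(), so the logic can be exercised without DataCite. There is one deliberate difference. In the diff, when parse_metadata() returns None, execution falls out of the if/elif chain and the function implicitly returns None. The sketch instead returns the empty {'concept_doi': None, 'versions': None} dict on every failure path.

```python
from typing import Callable, Optional


def resolve_versions(parsed_metadata: dict,
                     fetch_parsed: Callable[[str], Optional[dict]]) -> dict:
    """Sketch of the branching in _fetch_all_versions_doi().

    fetch_parsed is a hypothetical stand-in for fetch_metadata() + parse_metadata():
    it takes a DOI and returns its parsed metadata, or None.
    """
    empty = {'concept_doi': None, 'versions': None}
    version_of = parsed_metadata.get('version_of')
    if version_of:
        # Target is one software version: fetch the concept DOI to list all versions.
        try:
            parsed_concept = fetch_parsed(version_of[0])
        except Exception:
            return empty
        if parsed_concept is None:
            return empty
        return {'concept_doi': version_of[0], 'versions': parsed_concept.get('versions')}
    versions = parsed_metadata.get('versions')
    if versions:
        # Target is the concept DOI itself: its metadata already lists every version.
        return {'concept_doi': parsed_metadata['properties']['DOI'], 'versions': versions}
    # Neither a version nor a concept record: nothing to associate.
    return empty
```

The DOIs in the tests are placeholders under Zenodo's 10.5281 prefix.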
13 changes: 8 additions & 5 deletions ADSCitationCapture/forward.py
@@ -22,7 +22,7 @@



# =============================== FUNCTIONS ======================================= #
-def build_record(app, citation_change, parsed_metadata, citations, entry_date=None):
+def build_record(app, citation_change, parsed_metadata, citations, db_versions, entry_date=None):
if citation_change.content_type != CitationChangeContentType.doi:
raise Exception("Only DOI records can be forwarded to master")
# Extract required values
@@ -125,11 +125,11 @@ def build_record(app, citation_change, parsed_metadata, citations, entry_date=No
else:
status = 0 # active
record = DenormalizedRecord(**record_dict)
-nonbib_record = _build_nonbib_record(app, citation_change, record, status)
+nonbib_record = _build_nonbib_record(app, citation_change, record, db_versions, status)
return record, nonbib_record


-def _build_nonbib_record(app, citation_change, record, status):
+def _build_nonbib_record(app, citation_change, record, db_versions, status):
doi = citation_change.content
nonbib_record_dict = {
'status': status,
@@ -139,8 +139,8 @@ def _build_nonbib_record(app, citation_change, record, status):
'data': [],
'data_links_rows': [
{'link_type': 'ESOURCE', 'link_sub_type': 'PUB_HTML',
-'url': [app.conf['DOI_URL'] + doi], 'title': [''], 'item_count':0}, # `item_count` only used for DATA and not ESOURCES
-],
+'url': [app.conf['DOI_URL'] + doi], 'title': [''], 'item_count':0},
+], # `item_count` only used for DATA and not ESOURCES
'citation_count_norm': record.citation_count_norm,
'grants': [],
'ned_objects': [],
@@ -150,6 +150,9 @@ def _build_nonbib_record(app, citation_change, record, status):
'simbad_objects': [],
'total_link_counts': 0 # Only used for DATA and not for ESOURCES
}
+if db_versions not in [{"":""}, None]:
+nonbib_record_dict['data_links_rows'].append({'link_type': 'ASSOCIATED', 'link_sub_type': '',
+'url': db_versions.values(), 'title': db_versions.keys(), 'item_count':0})
nonbib_record = NonBibRecord(**nonbib_record_dict)
nonbib_record.esource.extend(record.esources)
nonbib_record.reference.extend(record.reference)
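_build_nonbib_record() always emits an ESOURCE row for the DOI. When db_versions holds associated works, it also appends an ASSOCIATED row whose parallel url and title lists carry the bibcodes and their "Version <n>" / "Software Source" titles. The sketch below builds the same rows as plain dicts, without NonBibRecord. The function name build_data_links_rows and the "https://doi.org/" value for DOI_URL are assumptions for illustration. The diff passes dict views directly. The sketch converts them to lists, and url[i] stays paired with title[i] because a dict's keys() and values() iterate in the same insertion order.

```python
from typing import Dict, List, Optional


def build_data_links_rows(doi_url: str, doi: str,
                          db_versions: Optional[Dict[str, str]]) -> List[dict]:
    """Hypothetical sketch of the data_links_rows built in _build_nonbib_record()."""
    rows = [
        # ESOURCE link to the DOI landing page; item_count is only used for DATA links.
        {'link_type': 'ESOURCE', 'link_sub_type': 'PUB_HTML',
         'url': [doi_url + doi], 'title': [''], 'item_count': 0},
    ]
    # Same sentinel check as the diff: skip the {"": ""} placeholder and None.
    if db_versions not in ({"": ""}, None):
        rows.append({'link_type': 'ASSOCIATED', 'link_sub_type': '',
                     # Bibcodes and their titles, kept index-aligned.
                     'url': list(db_versions.values()),
                     'title': list(db_versions.keys()),
                     'item_count': 0})
    return rows
```

Like the diff, this check lets an empty dict through and produces an ASSOCIATED row with empty lists. In the diff's own flow, get_associated_works_by_doi() returns None rather than {} when nothing is found.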
1 change: 1 addition & 0 deletions ADSCitationCapture/models.py
@@ -74,6 +74,7 @@ class CitationTarget(Base):
parsed_cited_metadata = Column(JSONB)
curated_metadata = Column(JSONB)
status = Column(target_status_type)
+associated_works = Column(JSONB)
created = Column(UTCDateTime, default=get_date)
updated = Column(UTCDateTime, onupdate=get_date)
citations = relationship("Citation", primaryjoin="CitationTarget.content==Citation.content")