Load figure caption from the CORD-19 data and add links to PMC. #16

Open
wants to merge 2 commits into
base: master

adamjhn
Collaborator

@adamjhn adamjhn commented Sep 21, 2020

Currently the links only work with the new PMC URL format; some papers still use the old URL format.

Contributor

@ramcdougal ramcdougal left a comment

I really like the to_datetime part of the example as that touches on the new functionality and is simple to understand. I wonder if the rest of the load_csv changes introduce too much complexity to be useful as an example, and that what you're really doing is the start of a research project that uses the pipeline (possibly as a sub-module?) but should be in its own repo?

Also, note the comment about fields vs list_fields; we ought to be able to have one list and just do the right thing based on data type.

field: _nicestr(paper[field])
if field in paper["field_order"]
else paper[field]
for d in ["field_order", "list_field_order"]
Contributor

Why do field_order and list_field_order need to be separate? I could imagine one might want things with or without lists in any order, not all the non-lists before all the lists.

Contributor

i.e. can't we figure this out based on data type?
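
To make the suggestion concrete, here is a minimal sketch of a single field list whose formatting is chosen from each value's type. The names format_value and format_paper are hypothetical and not part of this PR; the ", " default joiner and the date format are assumptions, and only the switch to "; " when an item contains a comma mirrors the _nicestr excerpt.

```python
import datetime


def format_value(value):
    """Hypothetical helper: pick the formatting from the value's type."""
    if isinstance(value, (list, tuple)):
        items = [str(v) for v in value]
        # Assumption: ", " by default, "; " when an item already has a comma.
        joiner = "; " if any("," in v for v in items) else ", "
        return joiner.join(items)
    if isinstance(value, datetime.datetime):
        # Assumption: ISO-style date; the real output format may differ.
        return value.strftime("%Y-%m-%d")
    return str(value)


def format_paper(paper, field_order):
    """Format fields in one caller-chosen order, list-valued or not."""
    return {
        field: format_value(paper[field])
        for field in field_order
        if field in paper
    }
```

With one list, a caller could request ["authors", "title", "date"] in any order, and list-valued fields would not have to come after the scalar ones.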

label = caption.split(":")[0]
fignum = int(re.findall("\d+", label)[0])
except:
fignum = int(k.lstrip("TABREF")) + 1
Contributor

I don't think lstrip is what you're looking for here; consider:

>>> 'TARANTULA'.lstrip('TABREF')
'NTULA'
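
str.lstrip treats its argument as a set of characters to drop, not as a prefix. A small sketch of a prefix-aware alternative follows; strip_prefix and number_from_key are hypothetical names, and the key shape "TABREF3" is an assumption based on the excerpt above. On Python 3.9+, str.removeprefix does the same job as strip_prefix.

```python
def strip_prefix(text, prefix):
    """Remove `prefix` from the start of `text` only if it is really there."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


def number_from_key(key, prefix="TABREF"):
    """Hypothetical helper: 1-based number from a zero-based key like 'TABREF3'."""
    return int(strip_prefix(key, prefix)) + 1


print(strip_prefix("TARANTULA", "TABREF"))  # TARANTULA (left unchanged)
print(number_from_key("TABREF3"))           # 4
```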

@@ -110,12 +110,18 @@ def _nicestr(item):
if any("," in thing for thing in item):
joiner = "; "
return joiner.join(thing for thing in item)
elif isinstance(item, datetime.datetime):
Contributor

This is a great idea.


2 participants