Load figure caption from the CORD-19 data and add links to PMC. #16
I really like the to_datetime part of the example as that touches on the new functionality and is simple to understand. I wonder if the rest of the load_csv changes introduce too much complexity to be useful as an example, and that what you're really doing is the start of a research project that uses the pipeline (possibly as a sub-module?) but should be in its own repo?
Also, note the comment about fields vs list_fields; we ought to be able to have one list and just do the right thing based on data type.
Project/pipeline_views.py
Outdated
field: _nicestr(paper[field])
if field in paper["field_order"]
else paper[field]
for d in ["field_order", "list_field_order"]
Why do field_order and list_field_order need to be separate? I could imagine one might want things with or without lists in any order, not all the non-lists before all the lists.
i.e. can't we figure this out based on data type?
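A minimal sketch of what that could look like, assuming a single field_order list and a formatter that dispatches on the value's type. The names nicestr and format_paper, the ", " default joiner, and the date format are illustrative assumptions, not the project's actual code:

```python
import datetime


def nicestr(item):
    """Format one field value for display, choosing the style from its type."""
    if isinstance(item, (list, tuple)):
        parts = [str(thing) for thing in item]
        # Switch to "; " when any entry already contains a comma.
        # The ", " default is an assumption for this sketch.
        joiner = "; " if any("," in part for part in parts) else ", "
        return joiner.join(parts)
    if isinstance(item, datetime.datetime):
        # Date format chosen for illustration only.
        return item.strftime("%Y-%m-%d")
    return str(item)


def format_paper(paper):
    """Format every field named in the single field_order list, in that order."""
    return {field: nicestr(paper[field]) for field in paper["field_order"]}
```

With this shape, list and non-list fields can be interleaved in any order, since the list handling is decided per value rather than per configuration list.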
examples/load_csv.py
Outdated
label = caption.split(":")[0]
fignum = int(re.findall("\d+", label)[0])
except:
fignum = int(k.lstrip("TABREF")) + 1
I don't think lstrip is what you're looking for here; consider:
>>> 'TARANTULA'.lstrip('TABREF')
'NTULA'
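The reason is that lstrip treats its argument as a set of characters to drop, not as a prefix. One possible replacement, sketched here with a hypothetical helper name (strip_prefix) and a hypothetical key ("TABREF3"):

```python
def strip_prefix(text, prefix):
    """Remove prefix from the start of text only if it is really there."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


# lstrip removes any leading run of the characters T, A, B, R, E, F.
assert "TARANTULA".lstrip("TABREF") == "NTULA"

# The helper only removes the exact prefix.
assert strip_prefix("TARANTULA", "TABREF") == "TARANTULA"

# "TABREF3" is a made-up key used only for illustration.
fignum = int(strip_prefix("TABREF3", "TABREF")) + 1
```

On Python 3.9 and later, the built-in str.removeprefix does the same thing as this helper.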
@@ -110,12 +110,18 @@ def _nicestr(item):
if any("," in thing for thing in item):
joiner = "; "
return joiner.join(thing for thing in item)
elif isinstance(item, datetime.datetime):
This is a great idea.
Currently the links only work with the new PMC URL format; some papers still use the old URL format.