chore (base, export): Soft code text_featurize method; ensure unique feature names #94
…export test diagnostics
@kjappelbaum Are there any changes you'd like to see here?
```python
# FEATURIZER and get_repetitive_labels are defined or imported earlier in this module (not shown in the diff).
if __name__ == "__main__":
    repetitive_labels, all_labels = get_repetitive_labels(FEATURIZER)

    print("Diagnostics:")
    print("=" * 50)
    print("Number of featurizers implemented:", len(FEATURIZER.featurizers))
    print("=" * 50)
    print("Number of labels:", len(all_labels))
    print("Number of repeated labels:", len(repetitive_labels))
    print("Number of unique labels:", len(set(all_labels)))

    if not repetitive_labels:
        # No duplicates: list every detected label, numbered from 1.
        print("=" * 50)
        print("All detected labels:\n")
        for i, label in enumerate(all_labels, 1):
            print(f"{i}.", label)
    else:
        # Report each label that occurs more than once, together with its count info.
        for k, v in repetitive_labels.items():
            if v["count"] > 1:
                print(k, v)
```
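The diff does not show get_repetitive_labels itself. As a rough illustration of the kind of counting it presumably performs, here is a minimal sketch written as a hypothetical standalone function, find_repeated_labels, that takes plain lists of labels (one list per featurizer) instead of the real FEATURIZER object. It returns the same shape the diagnostics script consumes: a {label: {"count": n}} mapping for repeated labels plus the flattened list of all labels.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def find_repeated_labels(
    label_groups: Iterable[List[str]],
) -> Tuple[Dict[str, dict], List[str]]:
    """Hypothetical helper (not the project's get_repetitive_labels).

    Flattens per-featurizer label lists and reports labels seen more than once.
    """
    # Keep the original order so numbered listings stay stable.
    all_labels = [label for group in label_groups for label in group]
    counts = Counter(all_labels)
    # Only labels occurring more than once are reported, as {"count": n} dicts.
    repeated = {label: {"count": n} for label, n in counts.items() if n > 1}
    return repeated, all_labels


if __name__ == "__main__":
    repeated, labels = find_repeated_labels([["num_atoms", "mass"], ["mass", "charge"]])
    print(repeated)  # {'mass': {'count': 2}}
    print(len(labels), len(set(labels)))  # 4 3
```

Counting with collections.Counter keeps this linear in the number of labels, and the "number of unique labels" line in the diagnostics is simply len(set(all_labels)).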
If you run this now, are there still repetitive labels? Could we run this in GitHub Actions?
No. All is good. You can run it on your end and confirm. When I first ran it, there were some, so I had to fix things to ensure there weren't.
Cool! What do you think of adding it as something we run in GitHub Actions?
Yes, that would make sense. I have never set up GitHub Actions before, so this should be educational for me too.
This is done 👍. Check it out. Also, as you might observe, I had to refactor some of the tests.
.github/workflows/tests.yml (outdated)
```diff
@@ -2,8 +2,9 @@ name: Tests

 on:
   push:
-    branches: [ main ]
+    branches: [ main, text_featurize ]
```
But this should not be there in general, right? Why not use workflow_dispatch instead for manual tests?
Sure. I'll check out workflow_dispatch.
.github/workflows/tests.yml (outdated)
```diff
@@ -40,6 +41,8 @@ jobs:
       - name: Install dependencies
         run: |
           pip install -e ".[symmetry,tests]"
+      - name: Export pre-tests
+        run: python src/chemcaption/export/pre_test_export.py
```
But this "only" prints output, right? That is, it would not fail the Actions run if we found repetitive labels. For that, we would need to raise an exception in the Python code.
Yes. I didn't think of that. I'll do that now.
This is fixed now.
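The actual fix is not shown in this conversation. As a minimal sketch of the idea the reviewer describes, the hypothetical assert_unique_labels function and DuplicateLabelError class below raise when any label repeats. An uncaught exception makes the Python interpreter exit with a non-zero status, and a non-zero exit from a run step marks that GitHub Actions step as failed, so the check can gate CI instead of only printing.

```python
from collections import Counter
from typing import List


class DuplicateLabelError(ValueError):
    """Hypothetical error raised when feature labels are not unique."""


def assert_unique_labels(all_labels: List[str]) -> None:
    """Raise DuplicateLabelError if any label occurs more than once.

    Left uncaught, the exception ends the script with a non-zero exit
    status, which is what makes the CI step fail.
    """
    counts = Counter(all_labels)
    duplicates = sorted(label for label, n in counts.items() if n > 1)
    if duplicates:
        raise DuplicateLabelError(
            f"{len(duplicates)} repeated label(s): {', '.join(duplicates)}"
        )


if __name__ == "__main__":
    # Hypothetical labels; the real script would collect them from FEATURIZER.
    labels = ["num_atoms", "mass", "charge"]
    assert_unique_labels(labels)
    print(f"All {len(labels)} labels are unique.")
```

Subclassing ValueError keeps the error catchable by generic handlers while still giving the CI log a descriptive exception name.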
…o pre_export_tests.py
.github/workflows/tests.yml (outdated)
```diff
@@ -40,6 +42,8 @@ jobs:
       - name: Install dependencies
         run: |
           pip install -e ".[symmetry,tests]"
+      - name: Export pre-tests
```
Is the name "Export pre-tests" ideal? I understand that the main purpose is to check for label uniqueness, right?
This is fixed now. Renamed to "Test for label uniqueness".
Looks all good to me here. I didn't run the unit tests, but I assume that they pass.
The only test failing here is the
OK, we can open a new issue for that and merge this soon.
```diff
@@ -2,8 +2,10 @@ name: Tests

 on:
   push:
     branches: [ main ]
```
In principle OK, but was there any particular reason for that change?
This PR should soft-code the text_featurize featurizer methods. This will allow flexibility with respect to the parts of speech for molecular text.

Should close issues #74 and #90.