Switch from Tensorflow to Pytorch #115

Merged
merged 21 commits into from
Aug 23, 2018

Collaborator

@senwu senwu commented Aug 22, 2018

In this PR, we switch the learning part to PyTorch and support Logistic Regression and LSTM.

One thing to note: the input to the training/prediction functions now needs to be (candidates, features) instead of just features.

@senwu senwu requested a review from lukehsiao August 22, 2018 09:41
@lukehsiao lukehsiao added the enhancement New feature or request label Aug 22, 2018
@lukehsiao lukehsiao added this to the v0.3.0 milestone Aug 22, 2018
.travis.yml Outdated
@@ -44,20 +50,30 @@ before_install:
- python --version
- pip --version

# Install PyTorch for Linux with Python 3.6 and no CUDA
Contributor

# Install PyTorch for Linux and no CUDA

Since this is doing both versions

Collaborator Author

👍

@@ -1,132 +1,84 @@
from __future__ import absolute_import, division, print_function, unicode_literals
Contributor

I believe since we're only supporting Python 3.6+ we shouldn't need these, right?

Collaborator Author

👍
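
As a quick illustration of why the `__future__` imports are redundant on Python 3: the behaviors they enable are already the defaults there. This is a minimal sketch for illustration, not code from this PR.

```python
# On Python 3 these behaviours are the defaults, so no __future__ import is needed.

# division: "/" is true division, "//" is floor division
half = 7 / 2
floored = 7 // 2

# print_function: print is an ordinary function and can be passed around
emit = print

# unicode_literals: plain string literals are already unicode (str)
label = "Fonduer"

# absolute_import: "import x" never silently resolves to a sibling module
emit(half, floored, type(label).__name__)
```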

import torch.optim as optim

from .classifier import Classifier
from .utils import LabelBalancer, reshape_marginals
Contributor

Use absolute imports, not relative ones, e.g. from fonduer.learning.classifier import Classifier

Collaborator Author

👍

@@ -90,6 +92,7 @@ def score(
f_beta = 0.0
return p, r, f_beta

# TODO: need update, this only works for debugging labeling functions now
def error_analysis(
Contributor

Can you add a docstring?

Collaborator Author

👍


from fonduer.learning.classifier import Classifier
from fonduer.learning.utils import LabelBalancer, reshape_marginals
def SoftCrossEntropyLoss(input, target):
Contributor

Can you add a docstring?

Collaborator Author

Done.
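
For readers unfamiliar with the name, a soft cross-entropy loss generalizes cross-entropy to probabilistic (marginal) targets instead of hard class labels. The sketch below shows the computation together with a docstring. It is written in plain Python so that it runs without PyTorch; the name `soft_cross_entropy` and its list-based signature are assumptions made for illustration and are not the PR's implementation, which presumably operates on tensors.

```python
import math


def soft_cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and soft target distributions.

    :param logits: list of rows, each a list of unnormalized class scores.
    :param targets: list of rows of the same shape, each a probability
        distribution over classes (non-negative, summing to 1).
    :return: average over rows of -sum_k targets[k] * log_softmax(logits)[k].
    """
    total = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-softmax: subtract the row max before exp.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total -= sum(t * (x - log_norm) for x, t in zip(row, target))
    return total / len(logits)
```

With uniform logits over two classes, the loss equals log 2 regardless of the target distribution, which makes a convenient sanity check.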

and not torch.cuda.is_available()
):
self.model_kwargs["host_device"] = "CPU"
self.logger.info("GPU is not available, switching to CPU...")
Contributor

We notify when using CPU, should we also log when GPU is being used? Maybe just at DEBUG level.

Collaborator Author

Add one for GPU as well.
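
The logging behavior discussed above could look roughly like the following. This is a sketch under stated assumptions: `resolve_host_device` and its two parameters are hypothetical, and the availability flag stands in for the `torch.cuda.is_available()` check visible in the hunk so that the example runs without PyTorch.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_host_device(requested, cuda_available):
    """Return the device to use ("GPU" or "CPU"), logging the decision.

    Both parameters are hypothetical; in the PR the availability check
    is torch.cuda.is_available().
    """
    if requested == "GPU" and not cuda_available:
        # The fallback is surprising to the user, so report it at INFO.
        logger.info("GPU is not available, switching to CPU...")
        return "CPU"
    if requested == "GPU":
        # The expected path is logged at DEBUG so normal runs stay quiet.
        logger.debug("Using GPU...")
        return "GPU"
    return "CPU"
```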

@@ -1,194 +1,118 @@
from __future__ import absolute_import, division, unicode_literals
Contributor

No need for __future__

Collaborator Author

Done.

current_dir = new_dir
tries += 1

return config
Contributor

This should just be the same config in utils, right? Shouldn't be duplicated?

Add tests to tests/utils/test_config.py to make sure it's behaving as expected.

Contributor

We will also want to update the docs to show these parameters like we do for features.

Collaborator Author

done!
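
To make the request for config tests concrete, here is a rough sketch of an upward directory search like the one the hunk appears to implement, plus two tests in the style that could live in a test module. Everything named here is hypothetical: the file name, the depth limit, the JSON format, the `get_config(path, default)` signature, and the `lr` / `n_epochs` keys are illustrative assumptions, not Fonduer's actual config API.

```python
import json
import os
import tempfile

CONFIG_NAME = ".example-config.json"  # hypothetical file name
MAX_DEPTH = 5  # hypothetical search limit


def get_config(path, default):
    """Search `path` and its parents for CONFIG_NAME; fall back to `default`."""
    config = dict(default)
    current_dir = os.path.abspath(path)
    tries = 0
    while tries < MAX_DEPTH:
        candidate = os.path.join(current_dir, CONFIG_NAME)
        if os.path.exists(candidate):
            with open(candidate) as f:
                # Values from the file override the defaults.
                config.update(json.load(f))
            break
        new_dir = os.path.dirname(current_dir)
        if new_dir == current_dir:  # reached the filesystem root
            break
        current_dir = new_dir
        tries += 1
    return config


def test_get_config_reads_parent_directory():
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, CONFIG_NAME), "w") as f:
            json.dump({"lr": 0.01}, f)
        child = os.path.join(root, "a", "b")
        os.makedirs(child)
        config = get_config(child, default={"lr": 0.001, "n_epochs": 5})
    assert config == {"lr": 0.01, "n_epochs": 5}


def test_get_config_falls_back_to_default():
    with tempfile.TemporaryDirectory() as root:
        assert get_config(root, default={"lr": 0.001}) == {"lr": 0.001}
```

Keeping a single lookup function and testing it once is the point of the review comment: a duplicated copy under the RNN module would need its own tests and could drift from the one in utils.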

from scipy.sparse import issparse

from fonduer.learning.disc_learning import NoiseAwareModel
from fonduer.learning.disc_models.rnn.config import get_config
Contributor

Probably should use utils config. See previous comment.

Collaborator Author

done!

return {v: k for k, v in self.d.iteritems()}


def mention_to_tokens(mention, token_type="words", lowercase=False):
Contributor

Docstring?

Collaborator Author

done!

@lukehsiao
Contributor

lukehsiao commented Aug 22, 2018

I'm also seeing this warning when running make docs:

WARNING: autodoc: failed to import module 'fonduer.learning'; the following exception was raised:
Traceback (most recent call last):
  File "/home/lwhsiao/repos/fonduer/.venv/lib/python3.6/site-packages/sphinx/ext/autodoc/importer.py", line 152, in import_module
    __import__(modname)
  File "/home/lwhsiao/repos/fonduer/fonduer/learning/__init__.py", line 1, in <module>
    from fonduer.learning.disc_models.logistic_regression import LogisticRegression
  File "/home/lwhsiao/repos/fonduer/fonduer/learning/disc_models/logistic_regression.py", line 8, in <module>
    from fonduer.learning.disc_learning import NoiseAwareModel
  File "/home/lwhsiao/repos/fonduer/fonduer/learning/disc_learning.py", line 26, in <module>
    class NoiseAwareModel(Classifier, nn.Module):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
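
For context, this TypeError is raised whenever a class inherits from bases whose metaclasses are unrelated, so Python cannot choose one metaclass for the derived class. The sketch below reproduces the error with made-up classes and shows one general remedy, a metaclass deriving from both. It is a generic Python illustration; all class names are hypothetical, and it is not necessarily how the conflict in NoiseAwareModel was resolved.

```python
class MetaA(type):
    pass


class MetaB(type):
    pass


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


try:
    # Neither MetaA nor MetaB is a subclass of the other, so Python cannot
    # pick a single metaclass for C and raises TypeError.
    class C(A, B):
        pass
except TypeError as exc:
    conflict_message = str(exc)


# One general remedy: a metaclass that derives from both.
class MetaAB(MetaA, MetaB):
    pass


class D(A, B, metaclass=MetaAB):
    pass
```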

@lukehsiao lukehsiao changed the title from "Pytorch" to "Switch from Tensorflow to Pytorch" Aug 22, 2018
lukehsiao added a commit that referenced this pull request Aug 23, 2018
This avoids the metaclass conflict of NoiseAwareModel.

See #115

The different learning parameters are explained in this section.

[TODO] give descriptions for the following::
Contributor

Can you help give some descriptions here?

Contributor

@lukehsiao lukehsiao left a comment

LGTM. Any improvements we can make to the docs are always helpful, but we can look at that more earnestly in a future PR.

@senwu senwu merged commit 6817fdd into master Aug 23, 2018
@senwu senwu deleted the pytorch branch August 23, 2018 04:07