Glove model - Draft Pull Request #189
Conversation
creme/decomposition/glove.py
Outdated
__all__ = ['Glove']


class Glove(base.Transformer, vectorize.VectorizerMixin):
Can you call it `GloVe`?
creme/decomposition/glove.py
Outdated
self.t = t
self.window_size = window_size
self.cooccurrences = defaultdict(float)
self.weights = defaultdict(
Yeah, you need to use a function or `functools.partial`. The issue with `lambda` is that it means the class can't be pickled, which is important :)
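For context, here is a minimal sketch of the pickling issue and of the `functools.partial` fix. The helper name `_init_vector` and the vector size of 2 are illustrative, not taken from the PR.

```python
import functools
import pickle
import random
from collections import defaultdict


def _init_vector(n_components):
    """Return a small random vector used as the default embedding."""
    return [random.uniform(-0.5, 0.5) for _ in range(n_components)]


# A lambda as the default factory makes the whole defaultdict unpicklable:
weights_lambda = defaultdict(lambda: _init_vector(2))
try:
    pickle.dumps(weights_lambda)
except (pickle.PicklingError, AttributeError) as exc:
    print('lambda default factory cannot be pickled:', exc)

# functools.partial wraps a module-level function, so it pickles fine:
weights_partial = defaultdict(functools.partial(_init_vector, 2))
weights_partial['weather']  # triggers the default factory
restored = pickle.loads(pickle.dumps(weights_partial))
assert restored == weights_partial
```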
creme/decomposition/glove.py
Outdated
""" | ||
# Extracts words of the document as a list of words: | ||
tokens = self.tokenizer(self.preprocess(self._get_text(tokens))) | ||
return {token: self.weights[token] for token in tokens} |
I think we should return the average of the word embeddings. I'm not sure what most people do, but I guess averaging is the way to go.
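As an illustration, here is a minimal sketch of what averaging could look like, assuming each token maps to a `{dimension: value}` dict as in the example output discussed below; the function name `average_embeddings` is hypothetical.

```python
from collections import defaultdict


def average_embeddings(tokens, weights):
    """Return the element-wise mean of the tokens' embedding vectors."""
    totals = defaultdict(float)
    for token in tokens:
        for dim, value in weights[token].items():
            totals[dim] += value
    return {dim: total / len(tokens) for dim, total in totals.items()}


weights = {
    'sunny': {0: 1.0, 1: 2.0},
    'weather': {0: 3.0, 1: 4.0},
}
print(average_embeddings(['sunny', 'weather'], weights))
# {0: 2.0, 1: 3.0}
```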
creme/decomposition/glove.py
Outdated
path (str): path of the weights of the pretrained model.

"""
file = open(path, 'r')
What format is this function assuming the file is in? If the format has a name, could you please specify it in the comment? :)
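If the file is the standard pretrained GloVe text format (one word per line, followed by its space-separated float components, as in `glove.6B.50d.txt`), a loader could look like the sketch below. The function name `load_pretrained` is hypothetical, and whether the PR's `load` method expects exactly this format is an assumption.

```python
def load_pretrained(path):
    """Read vectors from a GloVe-style text file: `word v1 v2 ... vn` per line."""
    weights = {}
    with open(path, 'r', encoding='utf-8') as file:
        for line in file:
            parts = line.rstrip().split(' ')
            word, values = parts[0], parts[1:]
            weights[word] = {i: float(v) for i, v in enumerate(values)}
    return weights
```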
Hi ☺️,
I know that everyone is waiting for GloVe, but this work is still in progress. This draft pull request is linked to issue #148.
Remarks on the model:
For reasons of simplicity and transparency in the model, I did not convert the tokens into an index. I used the words directly as identifiers for the dictionaries.
It's possible to initialize the weights of the words with pre-trained ones thanks to the `load` method.

I have a few questions:

- How can I avoid the lambda function in `collections.defaultdict`? Is this really a problem here?
- Do you think that the output of `transform_one()` and `fit_transform_one()` is the right one? I mean, a dictionary of dictionaries: `{'weather': {0: 1.281993..., 1: 1.148658...}}`
Remarks on the code:
I have to benchmark the model to make sure that the results are good and that I have not made any mistakes. I will find computing resources to run it on Wikipedia, as the authors did.
Feel free to review my code and give me feedback.
Raphaël