Running examples/countries.py fails because of an incompatibility with scikit-learn's t-SNE implementation. It can be resolved with a few small changes.
Current Behavior
Part 1
Executing countries.py fails with:
File "/[env]/lib/python3.x/site-packages/sklearn/manifold/_t_sne.py", line 792, in _check_params_vs_input
if self.perplexity >= X.shape[0]
AttributeError: 'list' object has no attribute 'shape'
See Possible Solution for a fix.
Part 2 (after having resolved Part 1)
File /[env]/lib/python3.x/site-packages/sklearn/manifold/_t_sne.py, line 793, in _check_params_vs_input
raise ValueError("perplexity must be less than n_samples")
ValueError: perplexity must be less than n_samples
This happens because line 28 of countries.py calls fit_transform with a list of 22 entities, which t-SNE uses as n_samples. The default perplexity of 30 is not less than 22, so the check fails.
Steps to Reproduce
Install pyRDF2Vec and its dependencies
Run examples/countries.py
Environment
Operating system: Fedora Linux 35
pyRDF2Vec version: 0.2.3
Python version: 3.8
Possible Solution
The issue in Part 1 can be resolved by modifying TSNE._check_params_vs_input in /[env]/lib/python3.x/site-packages/sklearn/manifold/_t_sne.py.
Changing X.shape[0] to len(X) solves this particular problem and the code continues executing.
Part 2 can be resolved by changing the default value of perplexity in TSNE.__init__ (sklearn/manifold/_t_sne.py) to a value smaller than 22. Even 21.9 will work.
In the above example, we try to create embeddings for the 22 entities in samples/countries-cities/entities.tsv. TSNE raises an error because its perplexity value must be strictly less than the number of entities.
Read this to understand the intuition behind perplexity in t-SNE.
Also, be cautious when using this modified version of t-SNE outside a dedicated environment for pyRDF2Vec as it'll likely cause problems.
min(len(X), default_perplexity) might be a cleaner solution!
Good idea, but the value for perplexity has to be smaller than, not equal to, len(X).
The best workaround I can think of using your idea is to instantiate TSNE something like this: tsne = TSNE(perplexity=len(X) - 0.01 if len(X) <= 30 else 30). The comparison is <= because with exactly 30 samples a perplexity of 30 would still be rejected. Kind of inelegant, but it'll do the job.
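For reference, the clamping idea discussed above could be wrapped in a small helper so that the example script does not need a patched copy of scikit-learn. This is only a sketch: the name safe_perplexity and its margin parameter are hypothetical and not part of pyRDF2Vec or scikit-learn. The default of 30 matches the default perplexity of TSNE.

```python
def safe_perplexity(n_samples: int, default: float = 30.0, margin: float = 0.01) -> float:
    """Return a perplexity that is strictly less than n_samples.

    Hypothetical helper: keeps `default` when there are enough samples,
    otherwise falls back to a value just below n_samples.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    # t-SNE rejects perplexity >= n_samples, so the default is only safe
    # when n_samples is strictly greater than it.
    if n_samples > default:
        return default
    return n_samples - margin


# Possible usage in the example script (assumes `embeddings` is the list
# returned by the transformer; converting it with np.asarray would also
# avoid the AttributeError from Part 1 without editing site-packages):
#
#   import numpy as np
#   from sklearn.manifold import TSNE
#   X = np.asarray(embeddings)
#   X_tsne = TSNE(perplexity=safe_perplexity(len(X))).fit_transform(X)
```

With the 22 entities from the example, the helper yields a value just under 22, while larger inputs keep the default of 30.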