NGT performance #29

Open · 2 of 3 tasks
VarIr opened this issue Sep 9, 2019 · 7 comments
Assignees: VarIr
Labels: enhancement (New feature or request)
VarIr (Owner) commented Sep 9, 2019

Approx. neighbor search with ngtpy can be accelerated (see the baseline sketch after this list):

  • Enable AVX on MacOS (temporarily disabled due to an upstream bug in NGT; it is already enabled on Linux).
  • Use NGT's optimization step (until then, the method is actually (P)ANNG, not ONNG, I assume). Currently, this seems to be possible only via the command-line tools, not via the Python API.
  • Set good default parameters for ONNG.
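For context, a minimal sketch of the current baseline path in ngtpy, assuming the usual `ngtpy.create`/`ngtpy.Index` API; the index path, dimension, and data here are placeholders:

```python
import numpy as np
import ngtpy

# Placeholder data: 1000 vectors of dimension 128
dim = 128
X = np.random.rand(1000, dim).astype(np.float32)

# Build a plain ANNG index -- this is the unoptimized graph the bullets
# above talk about turning into ONNG.
ngtpy.create(path='anng_index', dimension=dim, distance_type='L2')
index = ngtpy.Index('anng_index')
index.batch_insert(X)
index.save()

# Search returns (object_id, distance) pairs for the k nearest neighbors.
results = index.search(X[0], size=10)
print(results[:3])
```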
VarIr added the enhancement label and self-assigned this issue on Sep 9, 2019
VarIr (Owner, Author) commented Sep 23, 2019

It seems ONNG can be enabled in ngtpy, but it is currently not documented. However, there is an example here: yahoojapan/NGT#30
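For reference, a sketch along the lines of that example (hedged: `ngtpy.Optimizer` and its parameter names follow the linked issue and may differ across NGT versions):

```python
import ngtpy

# Convert an existing ANNG index into an ONNG index (sketch based on
# yahoojapan/NGT#30; the parameter values are the ones shown there, not
# tuned defaults for this project).
optimizer = ngtpy.Optimizer()
optimizer.set(num_of_outgoings=10, num_of_incomings=120)
optimizer.execute('anng_index', 'onng_index')  # input path, output path

# Search the optimized index as usual
onng = ngtpy.Index('onng_index')
```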

VarIr (Owner, Author) commented Sep 26, 2019

New NGT release 1.7.10 should fix this: https://github.com/yahoojapan/NGT/releases/tag/v1.7.10

VarIr (Owner, Author) commented Nov 11, 2019

NGT 1.8.0 brought documentation for ONNG. It is already activated here, but index building is extremely slow due to the difficult parameterization. Need to check.
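To make the parameterization issue concrete, a sketch of the creation-time knobs that dominate build time (assumption: these keyword names come from ngtpy's `create()`, and the large edge size mirrors common ONNG setups rather than a recommended default):

```python
import ngtpy

# ANNG construction cost grows with the creation edge size; ONNG-style
# setups often raise it from the default (around 10) to ~100, which is a
# plausible reason index building becomes so slow.
ngtpy.create(
    path='anng_index',
    dimension=768,
    edge_size_for_creation=100,  # larger -> denser graph, slower build
    edge_size_for_search=120,
)
```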

jaytimbadia commented
> Approx. neighbor search with ngtpy can be accelerated:
>
> • Enable AVX on MacOS (temporarily disabled due to an upstream bug in NGT; it is already enabled on Linux).
> • Use NGT's optimization step (until then, the method is actually (P)ANNG, not ONNG, I assume). Currently, this seems to be possible only via the command-line tools, not via the Python API.
> • Set good default parameters for ONNG.

Hi,
Seems like really good work.

I am using BERT to find semantic similarity with cosine distance, but the high dimensionality may be a problem.
So can I use hubness reduction here? I mean, will it make the BERT embeddings any better?

Thank you!

VarIr (Owner, Author) commented Jan 24, 2021

Thanks for your interest. That's something I've been thinking about, but never found time to actually check.

BERT embeddings are typically high-dimensional, so hubness might play a role.
You could first estimate the intrinsic dimension of these embeddings (because this actually drives hubness), e.g. with this method. If it is much lower than the embedding dimension, it's unlikely that hubness reduction leads to improvements.
Alternatively, you could directly compare performance in your tasks with and without hubness reduction.
If there's a performance improvement, I'd be curious to know.
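In case it helps, a rough sketch of that comparison with scikit-hubness (hedged: the `Hubness`/`KNeighborsClassifier` usage here follows the 0.2-era API, and the data below is a stand-in for your BERT embeddings and labels):

```python
import numpy as np
from skhubness import Hubness
from skhubness.neighbors import KNeighborsClassifier

# Stand-ins for BERT embeddings and task labels
X = np.random.rand(500, 768)
y = np.random.randint(0, 2, size=500)

# 1) Measure hubness of the embedding space (skewness of the k-occurrence)
hub = Hubness(k=10, metric='cosine')
hub.fit(X)
print(f'k-skewness: {hub.score():.2f}')  # larger => more hubness

# 2) Compare a downstream task with and without hubness reduction
for hubness in (None, 'mutual_proximity'):
    knn = KNeighborsClassifier(n_neighbors=5, metric='cosine', hubness=hubness)
    knn.fit(X, y)
    print(hubness, knn.score(X, y))
```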

jaytimbadia commented
> Thanks for your interest. That's something I've been thinking about, but never found time to actually check.
>
> BERT embeddings are typically high-dimensional, so hubness might play a role.
> You could first estimate the intrinsic dimension of these embeddings (because this actually drives hubness), e.g. with this method. If it is much lower than the embedding dimension, it's unlikely that hubness reduction leads to improvements.
> Alternatively, you could directly compare performance in your tasks with and without hubness reduction.
> If there's a performance improvement, I'd be curious to know.

Thank you so much for the reply.
I calculated the intrinsic dimension for BERT and it comes out to 18, much lower than I expected.
Anyway, one question: can we use intrinsic dimensionality to check the quality of the embeddings we generate?
For example: BERT embeddings of shape (100, 768) have a pretty low intrinsic dimension, while a random matrix of shape (100, 768) I tried had around 155. So does this mean BERT is quite well trained?

If yes, we could use this: whenever we generate embeddings, we could check their intrinsic dimension; the lower it is, the fewer constraints the embeddings have and the easier they are to fine-tune further, right?

I would love to know your thoughts!
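For anyone reproducing this kind of estimate, one possible way to get such a number (assumption: the skdim package and its MLE estimator, which is not necessarily the method linked above):

```python
import numpy as np
import skdim

# Stand-ins: substitute real BERT embeddings for `bert_like`
rng = np.random.default_rng(0)
bert_like = rng.random((100, 768))
random_mat = rng.random((100, 768))

# Levina-Bickel maximum-likelihood intrinsic dimension estimate
for name, X in [('bert-like', bert_like), ('random', random_mat)]:
    est = skdim.id.MLE().fit(X)
    print(name, round(float(est.dimension_), 1))
```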

VarIr (Owner, Author) commented Jan 24, 2021

18 isn't particularly high, but we've seen datasets where this came with high hubness (see e.g. pp. 2885-2886 of this previous paper).
I am not aware of research directly linking intrinsic dimension to the quality of embeddings (however that would be defined, anyway). Interesting research questions you pose there :)
