Commit
use inefficient-but-all-tests-pass 'uniform' for now, w/ big FIXME comment
gojomo committed Oct 6, 2020
1 parent e090400 commit 1edbb4c
Showing 1 changed file with 14 additions and 3 deletions.
17 changes: 14 additions & 3 deletions gensim/models/keyedvectors.py
@@ -1924,9 +1924,20 @@ def prep_vectors(target_shape, prior_vectors=None, seed=0, dtype=REAL):
return prior_vectors
target_count, vector_size = target_shape
rng = np.random.default_rng(seed=seed) # use new instance of numpy's recommended generator/algorithm
new_vectors = rng.random(target_shape, dtype=dtype) # [0.0, 1.0)
new_vectors *= 2.0 # [0.0, 2.0)
new_vectors -= 1.0 # [-1.0, 1.0)
# FIXME: `uniform` passes all tests, but generates temporary double-sized np.float64 array,

@mpenkov (Collaborator) commented Oct 10, 2020:
This wall of text (and the comment below) is too intimidating. No one will want to read it. How about this summary?

FIXME: we use `uniform` here because it makes all the tests pass at the expense
of increased memory usage. The memory-efficient alternative (disabled below) reliably
fails `TestBackMappingTranslationMatric.test_infer_vector`. TODO debug/fix ASAP.

@gojomo (Author, Collaborator) commented Oct 10, 2020:
This was temporary while the issue was being investigated. Given the conclusion of that investigation (the real issue was a flaky test), the comment is already gone from the PR.

# then cast-down into the right-sized np.float32, which means momentary 3x RAM usage on the model's
# largest structure (often GB in size)
new_vectors = rng.uniform(-1.0, 1.0, target_shape).astype(dtype)
# Meanwhile, this alternative, which by docs/reasoning/visual-inspection should be equivalent
# while never creating the unneeded oversized np.float64 array, passes all *2Vec class
# functional tests, but mysteriously (but reliably!) fails one obscure barely-sensible test
# of a fringe downstream functionality: `TestBackMappingTranslationMatric.test_infer_vector`.
# I'd adjust or jettison that test entirely *except* that the failure is *so* reliable, and
# *so* mysterious, that it may be warning of something very subtle. So for now, very briefly,
# sticking with the RAM-wasteful-but-all-tests-passing approach above, TODO debug/fix ASAP.
# new_vectors = rng.random(target_shape, dtype=dtype) # [0.0, 1.0)
# new_vectors *= 2.0 # [0.0, 2.0)
# new_vectors -= 1.0 # [-1.0, 1.0)
new_vectors /= vector_size
new_vectors[0:prior_vectors.shape[0], 0:prior_vectors.shape[1]] = prior_vectors
return new_vectors
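The memory trade-off the FIXME describes can be sketched directly with NumPy. This is a minimal illustration, not gensim code; the shape and seed are arbitrary placeholders:

```python
import numpy as np

target_shape = (1000, 100)
rng = np.random.default_rng(seed=0)

# Commit's approach: Generator.uniform has no dtype argument, so it always
# samples float64 first; the .astype() then makes a second, float32 copy,
# hence the momentary ~3x float32-sized RAM footprint the comment describes.
tmp = rng.uniform(-1.0, 1.0, target_shape)   # float64 temporary (8 bytes/value)
vecs_uniform = tmp.astype(np.float32)        # cast-down copy (4 bytes/value)

# Disabled alternative: Generator.random accepts dtype=np.float32 directly,
# and the rescaling runs in place, so no oversized temporary is ever created.
vecs_scaled = rng.random(target_shape, dtype=np.float32)  # [0.0, 1.0)
vecs_scaled *= 2.0                                        # [0.0, 2.0)
vecs_scaled -= 1.0                                        # [-1.0, 1.0)
```

Both arrays end up float32 and uniform over roughly [-1.0, 1.0); only the intermediate allocation differs, which is what makes the `uniform` path wasteful on models whose vector arrays run to gigabytes.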
