Commit 25cbbe6
[SPARK-17548][MLLIB] Word2VecModel.findSynonyms no longer spuriously rejects the best match when invoked with a vector
## What changes were proposed in this pull request?
This pull request changes the behavior of `Word2VecModel.findSynonyms` so that it will not spuriously reject the best match when invoked with a vector that does not correspond to a word in the model's vocabulary. Instead of blindly discarding the best match, the changed implementation discards a match that corresponds to the query word (in cases where `findSynonyms` is invoked with a word) or that has an identical angle to the query vector.
## How was this patch tested?
I added a test to `Word2VecSuite` to ensure that the word with the most similar vector from a supplied vector would not be spuriously rejected.
Author: William Benton <willb@redhat.com>
Closes #15105 from willb/fix/findSynonyms.1 parent f15d41b commit 25cbbe6
File tree
5 files changed
+83
-24
lines changed- mllib/src
- main/scala/org/apache/spark
- mllib
- api/python
- feature
- ml/feature
- test/scala/org/apache/spark/mllib/feature
- python/pyspark/mllib
5 files changed
+83
-24
lines changedLines changed: 11 additions & 9 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
221 | 221 | | |
222 | 222 | | |
223 | 223 | | |
224 | | - | |
225 | | - | |
226 | | - | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
227 | 227 | | |
228 | 228 | | |
229 | 229 | | |
230 | | - | |
| 230 | + | |
| 231 | + | |
231 | 232 | | |
232 | 233 | | |
233 | 234 | | |
234 | | - | |
235 | | - | |
236 | | - | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
237 | 239 | | |
238 | 240 | | |
239 | | - | |
| 241 | + | |
240 | 242 | | |
241 | | - | |
| 243 | + | |
242 | 244 | | |
243 | 245 | | |
244 | 246 | | |
| |||
Lines changed: 19 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
43 | 43 | | |
44 | 44 | | |
45 | 45 | | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
46 | 52 | | |
47 | | - | |
48 | | - | |
| 53 | + | |
49 | 54 | | |
50 | 55 | | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
51 | 63 | | |
52 | | - | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
53 | 68 | | |
54 | 69 | | |
55 | 70 | | |
56 | 71 | | |
57 | 72 | | |
| 73 | + | |
58 | 74 | | |
59 | 75 | | |
60 | 76 | | |
| |||
Lines changed: 28 additions & 9 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
518 | 518 | | |
519 | 519 | | |
520 | 520 | | |
521 | | - | |
| 521 | + | |
522 | 522 | | |
523 | 523 | | |
524 | 524 | | |
525 | 525 | | |
526 | 526 | | |
527 | 527 | | |
528 | 528 | | |
529 | | - | |
| 529 | + | |
530 | 530 | | |
531 | 531 | | |
532 | 532 | | |
533 | | - | |
| 533 | + | |
| 534 | + | |
| 535 | + | |
534 | 536 | | |
535 | 537 | | |
536 | 538 | | |
537 | 539 | | |
538 | 540 | | |
539 | 541 | | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
| 546 | + | |
| 547 | + | |
| 548 | + | |
| 549 | + | |
| 550 | + | |
| 551 | + | |
| 552 | + | |
| 553 | + | |
| 554 | + | |
| 555 | + | |
| 556 | + | |
540 | 557 | | |
541 | 558 | | |
542 | 559 | | |
| |||
563 | 580 | | |
564 | 581 | | |
565 | 582 | | |
566 | | - | |
567 | | - | |
568 | | - | |
569 | | - | |
570 | | - | |
571 | | - | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
572 | 591 | | |
573 | 592 | | |
574 | 593 | | |
| |||
Lines changed: 16 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| 21 | + | |
21 | 22 | | |
22 | 23 | | |
23 | 24 | | |
| |||
68 | 69 | | |
69 | 70 | | |
70 | 71 | | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
71 | 87 | | |
72 | 88 | | |
73 | 89 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
544 | 544 | | |
545 | 545 | | |
546 | 546 | | |
547 | | - | |
548 | | - | |
| 547 | + | |
549 | 548 | | |
550 | 549 | | |
551 | 550 | | |
| |||
567 | 566 | | |
568 | 567 | | |
569 | 568 | | |
| 569 | + | |
| 570 | + | |
570 | 571 | | |
571 | 572 | | |
572 | 573 | | |
| 574 | + | |
| 575 | + | |
| 576 | + | |
| 577 | + | |
573 | 578 | | |
574 | 579 | | |
575 | 580 | | |
576 | | - | |
| 581 | + | |
577 | 582 | | |
578 | 583 | | |
579 | 584 | | |
| |||
591 | 596 | | |
592 | 597 | | |
593 | 598 | | |
| 599 | + | |
594 | 600 | | |
595 | 601 | | |
596 | 602 | | |
| |||
0 commit comments