Hi, thank you for your work!
I wanted to ask about computing the syntactic distance between languages.
If I understood correctly, pre-computed syntactic distances obtained by
```python
lang2vec.distance("syntactic", [l1, l2])
```
is the cosine distance between two languages, which should be properly replicated by
```python
from scipy.spatial.distance import cosine

# Feature vectors for the two languages, keyed by language code
a = lang2vec.get_features(l1, "syntax_wals")[l1]
b = lang2vec.get_features(l2, "syntax_wals")[l2]
cosine(a, b)
```
For missing features in a and b (which have -- as their values), I followed the approach mentioned here: #7 (comment).
However, the manually computed distances do not match the pre-computed ones. I also tried syntax_knn instead of syntax_wals, but they still do not match.
Also, some of the languages that appear in the pre-computed distances have only -- for all features, so it is not actually possible to compute their distances to other languages from these features. (For example, the syntactic distance between frr and dan is provided, as shown in the README example, but l2v.get_features("frr", "syntax_wals") gives a list of "--"s.)
Below are the average Pearson correlation coefficients and p-values between the pre-computed and manually computed distances of each language.
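To make the comparison concrete, here is a minimal sketch of one possible way to handle the -- entries. It assumes that a feature missing in either language is dropped before the cosine distance is computed. That is only an assumption, and it may differ from both the approach in #7 (comment) and from how the pre-computed distances were produced. The function name `masked_cosine_distance` is hypothetical and not part of lang2vec.

```python
import math

MISSING = "--"  # placeholder value used for missing features in the question


def masked_cosine_distance(a, b, missing=MISSING):
    """Cosine distance over the features present in both vectors.

    Hypothetical helper (not part of lang2vec). Returns None when no
    feature is shared or when either masked vector has zero norm, since
    the cosine distance is undefined in those cases.
    """
    # Keep only positions where both languages have a value.
    pairs = [
        (float(x), float(y))
        for x, y in zip(a, b)
        if x != missing and y != missing
    ]
    if not pairs:
        return None

    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    if norm_a == 0.0 or norm_b == 0.0:
        return None

    return 1.0 - dot / (norm_a * norm_b)
```

With this masking, a language whose features are all -- (as with frr above) yields None rather than a number, which is why its pre-computed distance cannot be reproduced from these features alone.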
syntax_wals & pre-computed : coef - 0.6325433738084123 / pvalue - 0.13051253893837714
syntax_knn & pre-computed : coef - 0.6257979297204636 / pvalue - 0.1392174749552544
I would really appreciate it if you could provide more details on computing the distance if I missed something here!
Thank you so much :)