Add array_emd function #26

scottgigante · 2018-01-19T06:13:58Z

As discussed in #25

wmayner

Thanks for the PR!

Seems like there's pretty bad precision in certain cases, like array_emd([1], [2]). This should be 1, but it's 0.9666... on my machine. This has to do with NumPy's heuristics for bin sizes, which result in many bins even in small cases like that. I'm not sure how important this is, but it may be worth thinking about.
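One way to see where a value like 0.9666... can come from: with n equal-width bins over [1, 2], the point masses at 1 and 2 land in the first and last bins. If distances are measured between bin centers, those centers are (n - 1)/n apart, so the result falls short of 1 by exactly one bin width. 29/30 matches 30 bins, but that count is inferred from the arithmetic and is not taken from the reviewer's setup. A minimal NumPy sketch, assuming the histogram range is [1, 2] and bin-center ground distance (the helper name is hypothetical):

```python
import numpy as np


def binned_point_mass_distance(n_bins, lo=1.0, hi=2.0):
    """Gap between the bin centers that hold `lo` and `hi` (hypothetical helper).

    Assumes n equal-width bins spanning [lo, hi] and a ground distance measured
    between bin centers. NumPy puts `lo` in the first bin and `hi` in the last,
    because its last bin is closed on the right.
    """
    edges = np.linspace(lo, hi, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[-1] - centers[0]


for n in (1, 2, 30):
    print(n, binned_point_mass_distance(n))
```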

wmayner · 2018-01-19T18:44:06Z

pyemd/emd.pyx

+            histogram. Defaults to the range of the union of `first_array` 
+            and `second_array`. Note: if the given range is not a superset
+            of the default range, no warning will be given.
+


In general, trailing whitespace should be stripped. There's probably a plugin for your text editor that can do this automatically for you.
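If no editor plugin is at hand, a few lines of Python can do the same job. This is a minimal sketch with a hypothetical function name. It removes only spaces and tabs before a line ending or the end of the text, so Windows `\r\n` line endings are preserved. pycodestyle reports this problem as W291.

```python
import re


def strip_trailing_whitespace(text):
    """Drop spaces/tabs that sit right before a line ending or end of text."""
    # The lookahead keeps the newline itself (LF or CRLF) intact.
    return re.sub(r"[ \t]+(?=\r?\n|\Z)", "", text)


print(repr(strip_trailing_whitespace("histogram. Defaults to the range \nand more\t\r\n")))
```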

wmayner · 2018-01-19T19:26:25Z

pyemd/emd.pyx

@@ -45,6 +46,10 @@ def validate(first_histogram, second_histogram, distance_matrix):
    if (first_histogram.shape[0] != second_histogram.shape[0]):
        raise ValueError('Histogram lengths must be equal')

+def euclidean_pairwise_distance(x):


There should be two blank lines before top-level function definitions. There are automated tools for checking these sorts of things, called “linters”, such as pylint.
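For illustration, here is a top-level definition laid out the way PEP 8 expects, with two blank lines after the imports. pycodestyle reports a missing blank line here as E302. The function body is an assumption, not the PR's code. For 1D bin locations, Euclidean distance is just the absolute difference, returned here as a C-contiguous float64 matrix to match the typed arrays in `emd()`'s signature.

```python
import numpy as np


def euclidean_pairwise_distance(x):
    """Pairwise Euclidean distances between 1D points (hypothetical body).

    Returns D with D[i, j] = |x[i] - x[j]| as a C-contiguous float64 matrix.
    """
    x = np.asarray(x, dtype=np.float64)
    # Broadcasting a column against a row gives every pairwise difference.
    return np.ascontiguousarray(np.abs(x[:, np.newaxis] - x[np.newaxis, :]))


print(euclidean_pairwise_distance([0.0, 1.0, 3.0]))
```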

wmayner · 2018-01-19T21:39:18Z

pyemd/emd.pyx

+    return emd(first_histogram, 
+               second_histogram, 
+               distance_matrix, 
+               extra_mass_penalty)


This call should explicitly pass extra_mass_penalty as a keyword argument:

emd(..., extra_mass_penalty=extra_mass_penalty)
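Passing the value by name protects the call from silent mistakes if the positional parameters of `emd()` are ever reordered or extended. In Python 3, a function can go further and make the parameter keyword-only with a bare `*`. The sketch below uses a hypothetical stand-in, `emd_stub`, not pyemd's real `emd`, and its default value is only illustrative. This syntax is Python 3 only, so a codebase that still supports Python 2.7 (as the later py27 fix in this PR suggests) has to enforce the convention during review instead.

```python
def emd_stub(first_histogram, second_histogram, distance_matrix, *,
             extra_mass_penalty=-1.0):
    """Hypothetical stand-in for emd(); it only echoes the penalty it receives."""
    return extra_mass_penalty


# Works: the penalty is passed by name.
penalty = emd_stub([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]],
                   extra_mass_penalty=0.5)

# Fails: a fourth positional argument is rejected with TypeError.
try:
    emd_stub([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], 0.5)
    rejected = False
except TypeError:
    rejected = True
```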

wmayner · 2018-01-19T21:40:25Z

README.rst

+    >>> first_array = [1,2,3,4]
+    >>> second_array = [2,3,4,5]
+    >>> array_emd(first_array, second_array, bins=2)
+    0.5


There should be a few unit tests added to test/test_pyemd.py, in addition to this.
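Here is a sketch of what such tests could look like, in pytest style. So that it runs without pyemd installed, it checks a hypothetical pure-NumPy reference, `reference_emd_samples`, rather than the new function itself. The reference assumes normalized histograms on shared equal-width bins over the union of both samples, with bin-center ground distance, and it reproduces the README's 0.5. In the real test file, the same assertions would call the PR's function.

```python
import numpy as np


def reference_emd_samples(first, second, bins, value_range=None):
    """Hypothetical 1D sample EMD reference (not pyemd's API).

    With equal-width bins and |center_i - center_j| as ground distance, the 1D
    EMD equals bin_width * sum(|CDF_1 - CDF_2|) of the normalized histograms.
    """
    first = np.asarray(first, dtype=np.float64)
    second = np.asarray(second, dtype=np.float64)
    if value_range is None:
        both = np.concatenate([first, second])
        value_range = (both.min(), both.max())
    h1, edges = np.histogram(first, bins=bins, range=value_range)
    h2, _ = np.histogram(second, bins=bins, range=value_range)
    width = edges[1] - edges[0]
    cdf_gap = np.cumsum(h1 / h1.sum()) - np.cumsum(h2 / h2.sum())
    return float(width * np.abs(cdf_gap).sum())


def test_readme_example():
    result = reference_emd_samples([1, 2, 3, 4], [2, 3, 4, 5], bins=2)
    assert abs(result - 0.5) < 1e-12


def test_identical_samples_have_zero_distance():
    assert reference_emd_samples([1, 2, 3], [1, 2, 3], bins=3) == 0.0


def test_symmetry():
    a, b = [0, 1, 1, 2], [1, 2, 3, 3]
    forward = reference_emd_samples(a, b, bins=4)
    backward = reference_emd_samples(b, a, bins=4)
    assert abs(forward - backward) < 1e-12
```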

wmayner · 2018-01-19T21:41:56Z

pyemd/emd.pyx

@@ -91,6 +96,76 @@ def emd(np.ndarray[np.float64_t, ndim=1, mode="c"] first_histogram,
                                    distance_matrix, 
                                    extra_mass_penalty)

+def array_emd(first_array,


I think this should be called emd_samples (if you can think of something better, I'm all ears), since the other emd functions also take arrays, and they all begin with emd.

wmayner

Looks good—just a couple of style nitpicks.

Last thing is to write some tests, and then I think this is good to go.

wmayner · 2018-01-19T22:38:05Z

pyemd/emd.pyx

                 max(np.max(first_array), np.max(second_array)))

+    if type(bins) == str:


isinstance(bins, str) should be preferred over explicitly checking type equality.
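Here is a concrete case where the two checks disagree. NumPy's `np.str_`, which is what you get when reading a string back out of a NumPy array, subclasses `str`. As a result, `isinstance` accepts it but the exact type comparison rejects it. A minimal sketch:

```python
import numpy as np

bins = np.str_("auto")  # a str subclass, e.g. read out of a NumPy array

print(type(bins) == str)      # False: the exact type is np.str_
print(isinstance(bins, str))  # True: subclasses of str are accepted
```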

wmayner · 2018-01-19T22:39:08Z

pyemd/emd.pyx

+    if type(bins) == str:
+        hist, _ = np.histogram(np.concatenate([first_array,
+                                               second_array]),
+            range=range, bins=bins)


Nitpick: arguments should be aligned.
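For reference, here is the same call with its continuation line aligned to the opening delimiter, one of the styles PEP 8 allows, combined with the `isinstance` check suggested above. The sample values are placeholders. The function's parameter in the diff is named `range`, so `value_range` stands in for it here to avoid shadowing the built-in.

```python
import numpy as np

first_array = np.array([1.0, 2.0, 3.0, 4.0])
second_array = np.array([2.0, 3.0, 4.0, 5.0])
bins = "auto"
value_range = (1.0, 5.0)  # placeholder for the function's `range` argument

if isinstance(bins, str):
    hist, _ = np.histogram(np.concatenate([first_array, second_array]),
                           range=value_range, bins=bins)
```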

wmayner · 2018-01-19T22:40:15Z

pyemd/__init__.py

@@ -33,6 +33,14 @@
    >>> emd_with_flow(first_signature, second_signature, distance_matrix)
    (3.5, [[0.0, 0.0], [0.0, 1.0]])

+You can also calculate the EMD directly from two arrays:


Could elaborate and say “arrays of observations” or “samples” instead of “arrays” and mention that the histogram is computed automatically.
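In other words, the caller passes raw observations, and histograms over common bins are built for them. Here is a minimal sketch of that idea, assuming one set of edges is chosen from the pooled samples, in line with the diff's default range (the union of both arrays). The variable names are illustrative, and this is not pyemd's code. `np.histogram_bin_edges` requires NumPy 1.15 or later.

```python
import numpy as np

first_samples = np.array([1.0, 2.0, 3.0, 4.0])
second_samples = np.array([2.0, 3.0, 4.0, 5.0])

# Pick edges once from the pooled samples so both histograms share the same bins.
edges = np.histogram_bin_edges(np.concatenate([first_samples, second_samples]),
                               bins="auto")
first_hist, _ = np.histogram(first_samples, bins=edges)
second_hist, _ = np.histogram(second_samples, bins=edges)
```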

wmayner · 2018-01-19T22:43:08Z

pyemd/emd.pyx

    if distance == 'euclidean':
        distance = euclidean_pairwise_distance
-    
+


Nitpick: comments should be capitalized, to be consistent with the others.

scottgigante · 2018-01-19T22:44:59Z

Apologies, I didn't intend for this to be ready for review yet; I just need to push to move between my Windows and Linux environments. Thanks for your comments. I'll add a comment to the PR when I think it's done.

wmayner · 2018-01-19T22:50:11Z

Gotcha, no worries. I'll wait to review until you comment.

scottgigante · 2018-01-22T01:54:18Z

Okay, I'm pretty happy with that now. Thanks for your patience!

wmayner · 2018-01-22T17:12:29Z

Great, looks good. Thank you!

Release v0.5.0 - Add the `emd_samples()` function (PR #26). - Clarify docstrings. - Update documentation in README. - Refactor tests. - Explicitly support Python 3.4 and 3.5.

wmayner · 2018-01-22T23:32:01Z

Released in v0.5.0.

Add array_emd function

ef57b4f

wmayner reviewed Jan 19, 2018


fix linting, add auto binsize

7df86c3

wmayner requested changes Jan 19, 2018


scottgigante added 2 commits January 21, 2018 20:38

add tests

6b8e950

fix formatting errors

c3d7446

scottgigante force-pushed the develop branch from 11257ba to c3d7446 January 22, 2018 01:39

fix py27 float64 error

da5675e

wmayner approved these changes Jan 22, 2018


wmayner merged commit 56ea285 into wmayner:develop Jan 22, 2018

wmayner added a commit that referenced this pull request Jan 22, 2018

Merge tag '0.5.0' into develop

0de03ef

