Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Vectorize ECDF's __call__ method #602

Merged
merged 5 commits into from
Feb 22, 2022
Merged

Conversation

Smit-create
Copy link
Member

Closes #97

cc @jstac

@pep8speaks
Copy link

pep8speaks commented Feb 11, 2022

Hello @Smit-create! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-02-15 10:45:20 UTC

@Smit-create Smit-create changed the title ENH: Vectorize ECDF's __call__ function ENH: Vectorize ECDF's __call__ method Feb 11, 2022
@coveralls
Copy link

coveralls commented Feb 14, 2022

Coverage Status

Coverage increased (+0.01%) to 94.461% when pulling 6cc08be on Smit-create:issue_97 into 5be42da on QuantEcon:master.

@jstac
Copy link
Contributor

jstac commented Feb 14, 2022

Hi @Smit-create , many thanks for trying to fix this. What we're looking for is for the following code to execute successfully:

obs = np.random.randn(1000)
e = ECDF(obs)
t = np.linspace(-1, 1, 100)
print(e(t))

The last line should print the value of e(x) for each float x in the array t. (But it should also work of t is just a float.)

Your code still fails this test. I'd be glad if you take another try.

@Smit-create
Copy link
Member Author

I'd be glad if you take another try.

Thanks for the review! I tried to fix that in some other way.

@jstac
Copy link
Contributor

jstac commented Feb 15, 2022

Thanks @Smit-create , it's a nice try. I didn't know about that numpy function.

There's still a slight problem. I wonder if you can help figure it out. If I run

obs = np.random.randn(100)
e = ECDF(obs)
t = np.linspace(0, 1, 10)
print(e(t))

the output looks good, but the dtype of e(t) is np.object rather than float, which will cause performance problems.

What do you think about using these methods? https://numba.pydata.org/numba-doc/latest/user/vectorize.html

We routinely use numba so it's fine to add it as an import here.

@Smit-create
Copy link
Member Author

the output looks good, but the dtype of e(t) is np.object

Oh, I think that's easy to fix. We can use the following diff to fix that:

diff --git a/quantecon/ecdf.py b/quantecon/ecdf.py
index 12a2014..647ce7a 100644
--- a/quantecon/ecdf.py
+++ b/quantecon/ecdf.py
@@ -51,4 +51,4 @@ class ECDF:
         def f(a):
             return np.mean(self.observations <= a)
         vf = np.frompyfunc(f, 1, 1)
-        return vf(x)
+        return vf(x).astype(np.float)

With this above diff I get:

>>> import numpy as np
>>> from quantecon import ECDF
>>> obs = np.random.randn(100)
>>> e = ECDF(obs)
>>> t = np.linspace(0, 1, 10)
>>> print(e(t))
[0.49 0.56 0.6  0.65 0.69 0.71 0.75 0.77 0.8  0.83]
>>> e(t)
array([0.49, 0.56, 0.6 , 0.65, 0.69, 0.71, 0.75, 0.77, 0.8 , 0.83])
>>> t
array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

What do you think about using these methods?

I am not exactly sure that we need to use vectorize here, because NumPy docs say that it will use broadcasting rules, which in our case may not be applicable always. This commit 22d7f58 used the numpy's vectorize function which was failing on out test in #602 (comment)

@jstac
Copy link
Contributor

jstac commented Feb 22, 2022

Excellent job @Smit-create . Many thanks for this (and sorry for the slow response)!

@jstac jstac merged commit 417cf38 into QuantEcon:master Feb 22, 2022
@Smit-create
Copy link
Member Author

Thanks for the review!

@Smit-create Smit-create deleted the issue_97 branch February 22, 2022 04:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ecdf function needs to be vectorized
4 participants