Speed up make_holiday_features by up to 35% #1962

Merged on Aug 28, 2021 (2 commits)

MarcoGorelli
Contributor

First, thank you for this awesome library! While looking through it, I noticed some potential performance improvements relating to how pandas is used.

To demonstrate the improvements:

In [3]: holidays
Out[3]: 
     0  2012-06-06              seans-bday  0.0  1.0
0    1  2013-06-06              seans-bday  0.0  1.0
1    2  2012-01-01            NewYear'sDay  NaN  NaN
2    3  2012-01-02  NewYear'sDay(Observed)  NaN  NaN
3    4  2012-01-16  MartinLutherKingJr.Day  NaN  NaN
4    5  2012-02-20    Washington'sBirthday  NaN  NaN
5    6  2012-05-28             MemorialDay  NaN  NaN
6    7  2012-07-04         IndependenceDay  NaN  NaN
7    8  2012-09-03                LaborDay  NaN  NaN
8    9  2012-10-08             ColumbusDay  NaN  NaN
9   10  2012-11-11             VeteransDay  NaN  NaN
10  11  2012-11-12   VeteransDay(Observed)  NaN  NaN
11  12  2012-11-22            Thanksgiving  NaN  NaN
12  13  2012-12-25            ChristmasDay  NaN  NaN
13  14  2013-01-01            NewYear'sDay  NaN  NaN
14  15  2013-01-21  MartinLutherKingJr.Day  NaN  NaN
15  16  2013-02-18    Washington'sBirthday  NaN  NaN
16  17  2013-05-27             MemorialDay  NaN  NaN
17  18  2013-07-04         IndependenceDay  NaN  NaN
18  19  2013-09-02                LaborDay  NaN  NaN
19  20  2013-10-14             ColumbusDay  NaN  NaN
20  21  2013-11-11             VeteransDay  NaN  NaN
21  22  2013-11-28            Thanksgiving  NaN  NaN
22  23  2013-12-25            ChristmasDay  NaN  NaN
23  24  2014-01-01            NewYear'sDay  NaN  NaN
24  25  2014-01-20  MartinLutherKingJr.Day  NaN  NaN
25  26  2014-02-17    Washington'sBirthday  NaN  NaN
26  27  2014-05-26             MemorialDay  NaN  NaN
27  28  2014-07-04         IndependenceDay  NaN  NaN
28  29  2014-09-01                LaborDay  NaN  NaN
29  30  2014-10-13             ColumbusDay  NaN  NaN
30  31  2014-11-11             VeteransDay  NaN  NaN
31  32  2014-11-27            Thanksgiving  NaN  NaN
32  33  2014-12-25            ChristmasDay  NaN  NaN

In [4]: %%timeit
   ...: for idx, row in holidays.iterrows(): pass
   ...: 
   ...: 
2.44 ms ± 20.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %%timeit
   ...: for row in holidays.itertuples(): pass
   ...: 
   ...: 
417 µs ± 3.46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
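
The two loops timed above differ in what they yield per row. As a minimal sketch (the column names `ds`, `holiday`, `lower_window`, and `upper_window` are assumptions for illustration, since the frame printed above has no header row), the same values can be read either way:

```python
import pandas as pd

# Hypothetical stand-in for the holidays frame; the column names are assumed.
holidays = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2012-06-06", "2013-06-06", "2012-01-01"]),
        "holiday": ["seans-bday", "seans-bday", "NewYear'sDay"],
        "lower_window": [0.0, 0.0, float("nan")],
        "upper_window": [1.0, 1.0, float("nan")],
    }
)

# iterrows() constructs a new Series for every row, which is costly.
via_iterrows = [(row["ds"], row["holiday"]) for _, row in holidays.iterrows()]

# itertuples() yields namedtuples; fields are read as attributes instead.
via_itertuples = [(row.ds, row.holiday) for row in holidays.itertuples(index=False)]

# Both approaches see the same values.
assert via_iterrows == via_itertuples
```

Attribute access only works for column names that are valid Python identifiers, so code that switches to `itertuples()` may need to adjust how it reads each field.
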

In [7]: dates
Out[7]: 
0     2012-05-18
1     2012-05-21
2     2012-05-22
3     2012-05-23
4     2012-05-24
         ...    
505   2014-05-23
506   2014-05-27
507   2014-05-28
508   2014-05-29
509   2014-05-30
Name: ds, Length: 510, dtype: datetime64[ns]

In [8]: %%timeit
   ...: pd.DatetimeIndex(dates.apply(lambda x: x.date()))
   ...: 
   ...: 
2.17 ms ± 49 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [9]: %%timeit
   ...: pd.DatetimeIndex(dates.dt.date)
   ...: 
   ...: 
334 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

To demonstrate the end-to-end effect on make_holiday_features, using the setup of the test_custom_seasonality test:

On master:

In [3]: %%timeit
   ...: m.make_holiday_features(m.history['ds'], holidays)
   ...: 
   ...: 
2.79 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

On this branch:

In [2]: %%timeit
   ...: m.make_holiday_features(m.history['ds'], holidays)
   ...: 
   ...: 
1.79 ms ± 28.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
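
For reference, the relative time reductions implied by the reported means can be worked out as below. This is only arithmetic on the numbers quoted in this thread, converted to microseconds; the helper name `reduction_pct` is made up for the sketch.

```python
# Mean timings copied from the sessions above, in microseconds: (before, after).
timings_us = {
    "iterrows -> itertuples": (2440.0, 417.0),
    "apply -> dt.date": (2170.0, 334.0),
    "make_holiday_features": (2790.0, 1790.0),
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage of the original runtime that was saved."""
    return 100.0 * (before - after) / before


reductions = {
    name: round(reduction_pct(before, after), 1)
    for name, (before, after) in timings_us.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct}% less time")
```

The end-to-end figure for `make_holiday_features` comes out at roughly 35.8%, in line with the "up to 35%" in the title; the isolated micro-benchmarks save more because they exclude the rest of the function's work.
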

@facebook-github-bot
Contributor

Hi @MarcoGorelli!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@tcuongd self-requested a review on August 25, 2021 22:49
@tcuongd self-assigned this on Aug 25, 2021
Collaborator

@tcuongd left a comment

All looks good to me, thanks heaps for this Marco!! Green CI checks as well.

I noticed you're a pandas maintainer as well, so if you notice pandas being botched anywhere else in the package please let us know! 😂

@tcuongd merged commit 17dbb86 into facebook:master on Aug 28, 2021
@MarcoGorelli deleted the speed-up-make-holidays-feature branch on August 28, 2021 07:41