Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add sparse top-down reconciliation via TopDownSparse #277

Merged
merged 9 commits into from
Aug 15, 2024

Conversation

christophertitchen
Copy link
Contributor

@christophertitchen christophertitchen commented Jul 25, 2024

This PR adds a top-down reconciliation class called TopDownSparse for sparse "summing" matrices, $S$, to significantly reduce the wall time for reconciling large hierarchical structures using disaggregation proportions, which now take $98 \%$ to $100 \%$ less time to reconcile.

TopDownSparse uses the _get_PW_matrices method to efficiently instantiate and allocate $P$ as a COO matrix and then convert it to a CSR matrix, which is much faster than using a LIL matrix for constructing a large sparse matrix. Although the sparsity structure of $P$ for top-down reconciliation means that a CSC matrix would outperform a CSR matrix for $SP\hat{y}_{h}$, it is not worth converting $S$ to a CSC matrix. As with #276, the "weight" matrix, $W$, is only instantiated and allocated if performing probabilistic reconciliation, not for scenarios where only point reconciliation is required.

The same tests for TopDown in nbs/methods.ipynb are now evaluated on TopDownSparse as well, and the class supports all of the disaggregation methods of TopDown. This required a small change in _get_child_nodes to check for the presence of a sparse matrix if disaggregating using forecast proportions.

Reference Method method Formula
Gross and Sohl, 1990 A average_proportions $$p_j=\frac{1}{T} \sum_{t=1}^T \frac{y_{j, t}}{y_t}$$
Gross and Sohl, 1990 F proportion_averages $$p_j=\sum_{t=1}^T \frac{y_{j, t}}{T} / \sum_{t=1}^T \frac{y_t}{T}$$
Athanasopoulos, Ahmed, and Hyndman, 2009 FP forecast_proportions $$p_j=\prod_{\ell=0}^{K-1} \frac{\hat{y}_{j, h}^{(\ell)}}{\hat{S}_{j, h}^{(\ell+1)}}$$

The benchmarks for fit_predict across a few large hierarchical datasets are below, measured on a machine with 6/12 x 2.6 GHz (dynamic frequency scaling to 4.5 GHz) cores and 16 GB RAM.

Note: $P$ is only used for methods A and F, not FP, so there will be no performance gain with TopDownSparse when disaggregating using forecast proportions, only historical proportions. Although the private and M5 datasets are not strictly hierarchical, this is not an issue when disaggregating using methods A and F, because the proportions do not depend on the disaggregation pathways from the top node to the bottom nodes. However, this is not the case with method FP, which is invalid for these structures as there are multiple $p_j$ for each $j$, depending on the respective disaggregation pathway. There are practical approaches to deal with this while accepting further information loss, like creating a synthetic disaggregation pathway directly from the top node to the bottom nodes, such that $p_j=\hat{y}_{j, h} / \hat{y}_{h}$, selecting a single disaggregation pathway for each $j$, or averaging them, but that is beyond the scope of this PR. Also, note that the private dataset is quite interesting in the sense that the number of aggregated time series is greater than the number of bottom level time series, which is not usually the case for practical problems. This is not an issue here, but becomes more consequential in min trace reconciliation as inverting $S'W^{-1}S$ in the structural representation becomes more efficient than inverting $CWC'$ in the zero-constrained representation as $n_{a} > n_{b}$, not that min trace reconciliation scales particularly well on large hierarchical structures anyway, apart from perhaps structural scaling.

Dataset $\textup{dim} \left ( P \right )$ $K$ $T$ $h$ Method TopDown TopDownSparse Reduction
Private $\small{\left( 22\, 705 \times 96\, 249 \right)}$ $45$ $36$ $6$ A $59.5\, s$ $14.3\, ms$ $100 \%$
Private $\small{\left( 22\, 705 \times 96\, 249 \right)}$ $45$ $36$ $6$ F $59.1\, s$ $12.6\, ms$ $100 \%$
M5 $\small{\left( 30\, 490 \times 42\, 840 \right)}$ $12$ $1\, 969$ $28$ A $34.1\, s$ $590\, ms$ $98.3 \%$
M5 $\small{\left( 30\, 490 \times 42\, 840 \right)}$ $12$ $1\, 969$ $28$ F $33.9\, s$ $307\, ms$ $99.1 \%$
M5 $\small{\left( 30\, 490 \times 42\, 840 \right)}$ $12$ $1\, 969$ $7$ A $33.8\, s$ $582\, ms$ $98.3 \%$
M5 $\small{\left( 30\, 490 \times 42\, 840 \right)}$ $12$ $1\, 969$ $7$ F $33.7\, s$ $298\, ms$ $99.1 \%$
Tourism $\small{\left( 304 \times 555 \right)}$ $8$ $228$ $12$ A $2.09\, ms$ $1.91\, ms$ $8.6 \%$
Tourism $\small{\left( 304 \times 555 \right)}$ $8$ $228$ $12$ F $1.41\, ms$ $823\, \mu s$ $41.6 \%$
Tourism n/a $8$ $228$ $12$ FP $172\, ms$ $173\, ms$ $-0.6 \%$

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@elephaint
Copy link
Contributor

Great work - can you also maybe add a small test demonstrating output equivalence (up to dtype precision)?

@elephaint elephaint self-requested a review July 25, 2024 22:47
@christophertitchen
Copy link
Contributor Author

Great work - can you also maybe add a small test demonstrating output equivalence (up to dtype precision)?

Thanks, Olivier! I just added a test to check for equivalence between TopDown and TopDownSparse to within to the machine epsilon of a double, $2^{-52}$.

@christophertitchen
Copy link
Contributor Author

Great work - can you also maybe add a small test demonstrating output equivalence (up to dtype precision)?

Hi, @elephaint. Can you think of any more changes or is it OK to be merged now?

@elephaint
Copy link
Contributor

elephaint commented Aug 14, 2024

Great work - can you also maybe add a small test demonstrating output equivalence (up to dtype precision)?

Hi, @elephaint. Can you think of any more changes or is it OK to be merged now?

Thanks - sorry for the late reply. Great work! I have a small comment, I think other than that it's good to go.

Thanks also for adding the output equivalence test!

Copy link
Contributor

@elephaint elephaint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See small comment - might want to initialize P as a csr_matrix directly.

hierarchicalforecast/methods.py Outdated Show resolved Hide resolved
Copy link
Contributor

@elephaint elephaint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, awesome! One minor thing, left a suggestion in the code.

hierarchicalforecast/methods.py Outdated Show resolved Hide resolved
Co-authored-by: Olivier Sprangers <45119856+elephaint@users.noreply.github.com>
Copy link
Contributor

@elephaint elephaint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@elephaint elephaint merged commit 21229a3 into Nixtla:main Aug 15, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants