Feature element ancestors #770


Kevin-Dekker
Collaborator

For large or deep dimensions, it could be more efficient to use the descendants-based method. This method might have performance benefits especially when checking non-leaf-level elements for ancestors.

image

Kevin-Dekker and others added 6 commits March 16, 2022 16:06
- Use dict(zip(keys, values)) instead of a for loop

- Use itertuples to make each row until last column of the dataframe into a tuple
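The two changes named in these commit notes can be illustrated with a minimal sketch. It assumes a DataFrame whose last column holds the value and whose remaining columns form the key; the column names and data are made up for illustration and are not taken from the commits.

```python
import pandas as pd

# Hypothetical data: the leading columns identify a cell, the last column is its value.
df = pd.DataFrame(
    {
        "Region": ["EU", "EU", "US"],
        "Product": ["A", "B", "A"],
        "Value": [1.0, 2.0, 3.0],
    }
)

# iloc[:, :-1] drops the last column so only the key columns remain;
# itertuples(index=False, name=None) then yields one plain tuple per row.
keys = df.iloc[:, :-1].itertuples(index=False, name=None)
values = df.iloc[:, -1]

# dict(zip(...)) pairs each key tuple with its value without an explicit for loop.
cells = dict(zip(keys, values))
```

Here `cells` maps tuples such as `("EU", "A")` to the corresponding entry of the last column.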

Efficiency Statistics (1000 iterations)

Comparison in markdown tables:

Current (before commit/merge)
|      |   length: 1000 width: 3 |   length: 1000 width: 13 |   length: 5000 width: 3 |   length: 5000 width: 13 |   length: 15000 width: 3 |   length: 15000 width: 13 |
|:-----|------------------------:|-------------------------:|------------------------:|-------------------------:|-------------------------:|--------------------------:|
| mean |                  0.0049 |                   0.0179 |                  0.0477 |                   0.1101 |                   0.1464 |                    0.3459 |
| std  |                  0.0059 |                   0.0079 |                  0.0166 |                   0.026  |                   0.0308 |                    0.0602 |
| max  |                  0.0472 |                   0.0642 |                  0.1056 |                   0.2145 |                   0.2607 |                    0.6604 |

Suggested (after commit/merge)
|      |   length: 1000 width: 3 |   length: 1000 width: 13 |   length: 5000 width: 3 |   length: 5000 width: 13 |   length: 15000 width: 3 |   length: 15000 width: 13 |
|:-----|------------------------:|-------------------------:|------------------------:|-------------------------:|-------------------------:|--------------------------:|
| mean |                  0.0016 |                   0.0069 |                  0.0157 |                   0.0425 |                   0.0473 |                    0.1294 |
| std  |                  0.0036 |                   0.0046 |                  0.0053 |                   0.0121 |                   0.0123 |                    0.0269 |
| max  |                  0.0167 |                   0.0282 |                  0.038  |                   0.0934 |                   0.094  |                    0.2652 |
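
To make the two tables easier to compare, the speedup per test shape can be computed from the mean rows. This is only a sketch of the arithmetic; the dictionary layout is an assumption, while the numbers are copied from the tables above.

```python
# Mean run times copied from the "Current" and "Suggested" tables,
# keyed by (length, width) of the test data.
current_mean = {
    (1000, 3): 0.0049, (1000, 13): 0.0179,
    (5000, 3): 0.0477, (5000, 13): 0.1101,
    (15000, 3): 0.1464, (15000, 13): 0.3459,
}
suggested_mean = {
    (1000, 3): 0.0016, (1000, 13): 0.0069,
    (5000, 3): 0.0157, (5000, 13): 0.0425,
    (15000, 3): 0.0473, (15000, 13): 0.1294,
}

# Speedup factor = old mean / new mean for the same shape.
speedup = {shape: current_mean[shape] / suggested_mean[shape] for shape in current_mean}

for (length, width), factor in speedup.items():
    print(f"length {length:>5}, width {width:>2}: {factor:.1f}x faster")
```

On these figures the suggested version is roughly 2.6 to 3.1 times faster on mean run time.
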

The TM1 feature ELISPAR checks whether one provided element is the parent of another provided element in a given dimension and hierarchy.
The TM1 feature ELISANC checks whether one provided element is an ancestor of another provided element in a given dimension and hierarchy.

Both mentioned features are available in TM1 Rules and TurboIntegrator, but not directly available in tm1py. This commit reuses the 'get_parents' function to define the ELISPAR functionality.
This commit uses mdxpy to retrieve an element's ancestors through MDX to define the ELISANC functionality.

The TM1 function ELISPAR is implemented under the name: element_is_parent_of.
The TM1 function ELISANC is implemented under the name: element_is_ancestor_of.
The element_is_descendant_of function checks whether a given member is a descendant of a given member.
Add element_is_ancestor_of_by_parents as a recursive yield function.
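
The parent-based checks described in these commit notes can be sketched with a recursive generator. This is not the code from the commits: the `PARENTS` mapping is a hypothetical in-memory stand-in for the parents that would be retrieved from the server, and `iter_ancestors` is an illustrative helper name.

```python
from typing import Dict, Iterator, List

# Hypothetical hierarchy: element -> list of its direct parents.
PARENTS: Dict[str, List[str]] = {
    "Jan": ["Q1", "YTD Mar"],
    "Q1": ["Year"],
    "YTD Mar": ["Year"],
    "Year": [],
}


def iter_ancestors(element: str, parents: Dict[str, List[str]]) -> Iterator[str]:
    """Yield every ancestor of `element`, walking up one parent at a time."""
    for parent in parents.get(element, []):
        yield parent
        # Recurse into the parent's own parents.
        yield from iter_ancestors(parent, parents)


def element_is_parent_of(parent: str, element: str, parents: Dict[str, List[str]]) -> bool:
    """ELISPAR-style check: is `parent` a direct parent of `element`?"""
    return parent in parents.get(element, [])


def element_is_ancestor_of(ancestor: str, element: str, parents: Dict[str, List[str]]) -> bool:
    """ELISANC-style check; any() stops consuming the generator at the first match."""
    return any(candidate == ancestor for candidate in iter_ancestors(element, parents))
```

In this sketch an unknown element simply has no parents, so both checks return `False` for it rather than raising.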

To do: performance evaluation
Results show that two methods perform best for the ELISANC functionality. The first retrieves the descendants of the ancestor at a derived number of levels below the ancestor. The second uses a recursive TM1DRILLDOWNMEMBER. The first has overhead in deriving the levels of the element and the ancestor, but the intersect is generally more efficient, especially if the queried element is not at leaf level.

More tests are required to check the validity and performance under varying dimension shapes.
@MariusWirtz
Collaborator

MariusWirtz commented Jul 21, 2022

Thanks @Kevin-Dekker!

I understand that calculating the distance and using the precise DESCENDANTS function is faster in most scenarios, except for the last scenario, "Layer 9 Product 3".
I understand that this is because the relationship between the ancestor and child is simple, so the initial overhead of retrieving the element levels in the DESCENDANTS approach doesn't pay off in terms of performance.

You mentioned that if the ancestor or child doesn't exist, the function returns an error.
I tend to think that that is a good thing, but perhaps it's nice if we make this configurable.

For the element_is_parent_of function, I guess @rclapp's approach is superior. Retrieving all parents could be expensive in a dimension where elements roll up into many consolidations (e.g. date dimensions with YTD and MAT rollups).

I think it would be good to use mdxpy for the MDX building. It should take care of some of the edge cases like escaping ] in an element name.

@MariusWirtz
Collaborator

I am not sure if we should offer both approaches in the function if the DESCENDANTS approach is faster in most cases.
As a user, I think I would rather choose whether potentially non-existent elements raise an error.
What do you think @Kevin-Dekker, @rclapp?

@rclapp
Collaborator

rclapp commented Jul 21, 2022 via email

@MariusWirtz
Collaborator

@rclapp
I think the performance difference comes not so much from choosing DESCENDANTS over TM1DRILLDOWNMEMBERS but from passing the distance into the DESCENDANTS function so that TM1 doesn't need to expand the full hierarchy internally when evaluating the MDX.
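
To illustrate the point about passing the distance, here is a minimal sketch that builds the distance-limited MDX as a plain string. It deliberately avoids any library API; the helper names `unique_name` and `descendants_intersect_mdx` are hypothetical. It assumes TM1 levels where leaves are level 0, so the distance is the ancestor's level minus the element's level, and it escapes `]` in names by doubling it.

```python
def unique_name(dimension: str, hierarchy: str, element: str) -> str:
    """Build [dim].[hier].[element], doubling any ']' inside a name."""

    def escape(name: str) -> str:
        return name.replace("]", "]]")

    return f"[{escape(dimension)}].[{escape(hierarchy)}].[{escape(element)}]"


def descendants_intersect_mdx(
    dimension: str,
    hierarchy: str,
    ancestor: str,
    element: str,
    ancestor_level: int,
    element_level: int,
) -> str:
    """Intersect the descendants at exactly `distance` levels below the ancestor
    with the element; a non-empty result means the relationship exists."""
    distance = ancestor_level - element_level
    ancestor_member = unique_name(dimension, hierarchy, ancestor)
    element_member = unique_name(dimension, hierarchy, element)
    return (
        f"INTERSECT({{DESCENDANTS({ancestor_member}, {distance})}}, "
        f"{{{element_member}}})"
    )


mdx = descendants_intersect_mdx("Big Dim", "Big Dim", "All Divisors", "5", 2, 0)
print(mdx)
```

Because the distance is fixed up front, the set expression only asks for one level of descendants instead of the full subtree.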

I agree that the default behavior regarding non-existing elements should be as in TI. That would be most intuitive for users.

Simplicity is important but so is performance IMO. This is one of the functions that users might be tempted to call within a loop. I think performance is key for this one.

Set expressions that refer to non-existent members still work and simply return an empty set, and therefore false (this is the default API behavior). However, queries that use a non-existent dimension raise an exception. I tried to add member checking but decided to remove it, since the API call does actually work.

That's very elegant!

@rclapp
Collaborator

rclapp commented Jul 21, 2022

On the performance side, there is a gain using the cardinality to tell you if the relationship exists so I recommend we stick with that implementation.

On the MDX side, as you stated, it is the distance that drives the performance. Whenever the distance includes "BEFORE", the performance takes a real hit. The MDX default, SELF_BEFORE_AFTER, is the worst performing of them all. I am fine changing to Descendants if we know the permutation that performs the best.

TBH, I tend to avoid using it because it is so overloaded and not super intuitive.

@MariusWirtz
Collaborator

On the performance side, there is a gain using the cardinality to tell you if the relationship exists so I recommend we stick with that implementation.

Agree!

On the MDX side, as you stated, it is the distance that drives the performance. Whenever the distance includes "BEFORE", the performance takes a real hit. The MDX default, SELF_BEFORE_AFTER, is the worst performing of them all. I am fine changing to Descendants if we know the permutation that performs the best.

Do you mean retrieving the (TM1) levels of the elements, calculating the distance, and limiting the expansion in DESCENDANTS, as @Kevin-Dekker suggested?
I would be very interested to learn if this speeds things up in your test case as well.

I agree. We could probably use a flag that performs better than the "SELF_BEFORE_AFTER" since we already know the distance between ancestor and child.

@rclapp
Collaborator

rclapp commented Jul 21, 2022

I ran a few tests, and DESCENDANTS takes roughly 1.5x as long (about 12s vs. 7.9s) when testing against the full structure of the tree (level 2 vs. level 0).

  • INTERSECT({TM1DRILLDOWNMEMBER({[Big Dim].[Big Dim].[All Divisors]}, ALL, RECURSIVE)}, {[Big Dim].[Big Dim].[5]}) 7.9s

Taking the level of the element being tested

  • INTERSECT({DESCENDANTS([Big Dim].[Big Dim].[All Divisors], [Big Dim].[Big Dim].[5].level)}, {[Big Dim].[Big Dim].[5]}) 12.2s

Passing the calculated difference between the elements (= 2 - 0)

  • INTERSECT({DESCENDANTS([Big Dim].[Big Dim].[All Divisors], 2)}, {[Big Dim].[Big Dim].[5]}) 12.3s

However, when testing a one-level drill, DESCENDANTS is faster.

  • INTERSECT({TM1DRILLDOWNMEMBER({[Big Dim].[Big Dim].[All Divisors]}, ALL, RECURSIVE)}, {[Big Dim].[Big Dim].[Div by 3]}) 6s

  • INTERSECT({DESCENDANTS([Big Dim].[Big Dim].[All Divisors], [Big Dim].[Big Dim].[div by 3].level)}, {[Big Dim].[Big Dim].[div by 3]}) 1.2s

@rclapp
Collaborator

rclapp commented Jul 21, 2022

Using the API to request levels takes about a second each as well.

@Kevin-Dekker
Collaborator Author

Interesting findings! I think they are in line with my tests & expectations; the descendant method is slower than the drilldown method on ancestor-element tuples where the element is leaf level, but faster when the element is above leaf level.

I ran some extra tests using the same dimension as @rclapp. The tests (n=20 per element) confirm this:
[image: run time on x-axis]

Further, I agree with the objective to keep it simple for the user, but I kept the method choice in because the runtime of this function might be essential.

If the descendants method is really less efficient for leaf-level elements, we could retrieve the element level and use drilldown if the element is a leaf, and descendants otherwise. This gives the following test results (n=20 per element):
image

Note that there are some outliers in the graphs due to the small sample size.

The descendant method does seem to add value with big/deep dimensions when the element in question is not leaf.
What are your thoughts?

@rclapp
Collaborator

rclapp commented Jul 21, 2022 via email

@MariusWirtz
Collaborator

I merged #767 already.
@Kevin-Dekker please rebase this one on the current master.

An Enum to choose between the two approaches sounds reasonable.

If we offer two different MDX approaches, I think they should behave consistently regarding nonexisting elements.

I wonder if we should default to the approach that is performing better on leaves (testing leaves seems the more common use case to me) or instead, check the type of the second element and then default to the approach that is more appropriate?
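
The Enum idea combined with a level-based default could look roughly like the sketch below. The names `AncestorMethod` and `choose_method` are hypothetical and not taken from any merged code; the rule (drilldown for leaves, descendants otherwise) follows the test results discussed in this thread and assumes leaf elements have level 0.

```python
from enum import Enum
from typing import Optional


class AncestorMethod(Enum):
    """Hypothetical switch between the two MDX approaches discussed here."""

    TM1DRILLDOWNMEMBER = "TM1DrillDownMember"
    DESCENDANTS = "Descendants"


def choose_method(element_level: int, method: Optional[AncestorMethod] = None) -> AncestorMethod:
    """Return the caller's explicit choice, or pick a default from the element level."""
    if method is not None:
        return method
    # Leaves (level 0): drilldown was faster in the tests above.
    if element_level == 0:
        return AncestorMethod.TM1DRILLDOWNMEMBER
    # Consolidations: the distance-limited DESCENDANTS was faster.
    return AncestorMethod.DESCENDANTS
```

An explicit `method` argument always wins, so users who need predictable run times can pin one approach.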

@rclapp
Collaborator

rclapp commented Jul 22, 2022 via email

@MariusWirtz
Collaborator

Closed in favor of #771.
