Feature element ancestors #770

Kevin-Dekker · 2022-07-21T10:51:16Z

For large/deep dimensions, it could be more efficient to use the descendants-based method. Especially when checking non-leaf level elements for ancestors this method might have performance benefits.

- Use dict(zip(keys, values)) instead of a for loop - Use itertuples to make each row until last column of the dataframe into a tuple Efficiency Statistics (1000 iterations) Comparison in markdown tables: Current (before commit/merge) | | length: 1000 width: 3 | length: 1000 width: 13 | length: 5000 width: 3 | length: 5000 width: 13 | length: 15000 width: 3 | length: 15000 width: 13 | |:-----|------------------------:|-------------------------:|------------------------:|-------------------------:|-------------------------:|--------------------------:| | mean | 0.0049 | 0.0179 | 0.0477 | 0.1101 | 0.1464 | 0.3459 | | std | 0.0059 | 0.0079 | 0.0166 | 0.026 | 0.0308 | 0.0602 | | max | 0.0472 | 0.0642 | 0.1056 | 0.2145 | 0.2607 | 0.6604 | Suggested (after commit/merge) | | length: 1000 width: 3 | length: 1000 width: 13 | length: 5000 width: 3 | length: 5000 width: 13 | length: 15000 width: 3 | length: 15000 width: 13 | |:-----|------------------------:|-------------------------:|------------------------:|-------------------------:|-------------------------:|--------------------------:| | mean | 0.0016 | 0.0069 | 0.0157 | 0.0425 | 0.0473 | 0.1294 | | std | 0.0036 | 0.0046 | 0.0053 | 0.0121 | 0.0123 | 0.0269 | | max | 0.0167 | 0.0282 | 0.038 | 0.0934 | 0.094 | 0.2652 |

…ame" This reverts commit ac0b445.

The TM1 feature ELISPAR checks whether one provided element is the parent of another provided element in a given dimension and hierarchy. The TM1 feature ELISPAR checks whether one provided element is an ancestor of another provided element in a given dimension and hierarchy. Both mentioned features are available in TM1 Rules and TurboIntegrator, but not directly available in tm1py. This commit reuses the 'get_parents' function to define the ELISPAR functionality. This commit uses mdxpy to retrieve an element's ancestors through mdx to define the ELISPAR functionality. The TM1 function ELISPAR is implemented under the name: element_is_parent_of. The TM1 function ELISANC is implemented under the name: element_is_ancestor_of. The element_is_descendant_of function checks whether a given member is a descendant of a given member.

add element_is_ancestor_of_by_parents as a recursive yield function To do: performance evaluation

results show that one of two methods is most performant for ELISANC functionality. The first is through getting the descendants of the ancestor at a derived number of levels below the ancestor. The other is using a recursive tm1drilldown. The first one has overhead in deriving the level of the element and the ancestor but the intersect is generally more efficient, especially if the queried element is not leaf level. More tests are required to check the validity and performance under varying dimension shapes.

MariusWirtz · 2022-07-21T11:53:39Z

Thanks @Kevin-Dekker!

I understand that calculating the distance and using the precise DESCENDANTS function is faster in most scenarios.
Except for the last scenario "Layer 9 Product 3". I understand that this is because the relationship between the ancestor and child is simple and the initial overhead of retrieving the element levels in the DESCENDANTS approach doesn't pay off in terms of performance.

You mentioned that if the ancestor or child doesn't exist, the function returns an error.
I tend to think that that is a good thing, but perhaps it's nice if we make this configurable.

For the element_is_parent_of function I guess @rclapp's approach is superior. Retrieving all parents could be expensive in a dimension where elements roll up into lots of consolidation. (e.g. date dimensions with YTD, MAT rollups).

I think it would be good to use mdxpy for the MDX building. It should take care of some of the edge cases like escaping ] in an element name.

MariusWirtz · 2022-07-21T12:04:04Z

I am not sure if we should offer both approaches in the function if the DESCENDANTS approach is faster in most cases.
As a user, I think I would rather choose if potential not-existing elements raise an error.
What do you think @Kevin-Dekker, @rclapp?

rclapp · 2022-07-21T13:34:40Z

In my testing I found that descendants was almost always slower than tm1drilldownmember (using a dimension of 2M elements). Regardless of what mdx function we use I would tend toward the simple implementation, the closer to TI the better. If users desire a more robust option it's easy for them to write it. The exception path is an interesting one. In TI there seems to be no errors thrown even if the entire set of parameters refer to non existent objects. I started down that path, but realized that it would be problematic if we suppressed api level exceptions. So, I ended up with a compromise. Set expressions that refer to non existent members still work, and just return an empty set, and therefore a false (this is default api behavior). However, queries that use a non existent dimension raise an exception. I tried to add member checking, but decided to remove it since the api call does actually work.

…

Sent from my mobile device On Jul 21, 2022 8:05 AM, Marius Wirtz ***@***.***> wrote: I am not sure if we should offer both approaches in the function if the DESCENDANTS approach is faster in most cases. As a user, I think I would rather choose if potential not-existing elements raise an error. What do you think @Kevin-Dekker<https://github.com/Kevin-Dekker>, @rclapp<https://github.com/rclapp>? - Reply to this email directly, view it on GitHub<#770 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AEK7GZR65M2SXT74JGYYHHLVVE4EBANCNFSM54HFQNZA>. You are receiving this because you were mentioned.Message ID: ***@***.***>

MariusWirtz · 2022-07-21T14:08:49Z

@rclapp
I think the performance difference comes not so much from choosing DESCENDANTS over TM1DRILLDOWNMEMBERS but from passing the distance into the DESCENDANTS function so that TM1 doesn't need to expand the full hierarchy internally when evaluating the MDX.

I agree that the default behavior regarding non-existing elements should be as in TI. That would be most intuitive for users.

Simplicity is important but so is performance IMO. This is one of the functions that users might be tempted to call within a loop. I think performance is key for this one.

Set expressions that refer to non existent members still work, and just return an empty set, and therefore a false (this is default api behavior). However, queries that use a non existent dimension raise an exception. I tried to add member checking, but decided to remove it since the api call does actually work.

that's very elegant!

rclapp · 2022-07-21T14:29:21Z

On the performance side, there is a gain using the cardinality to tell you if the relationship exists so I recommend we stick with that implementation.

On the MDX side, as you stated, it is the distance that drives the performance. Whenever the distance includes "BEFORE" the performance takes a real hit. The MDX defaults are the worse performing of the them all , SELF_BEFORE_AFTER. I am fine changing to Descendants if we know the permutation that preforms the best.

TBH, I tend to avoid using it because it is so overloaded and not super intuitive.

MariusWirtz · 2022-07-21T14:48:36Z

On the performance side, there is a gain using the cardinality to tell you if the relationship exists so I recommend we stick with that implementation.

Agree!

On the MDX side, as you stated, it is the distance that drives the performance. Whenever the distance includes "BEFORE" the performance takes a real hit. The MDX defaults are the worse performing of the them all , SELF_BEFORE_AFTER. I am fine changing to Descendants if we know the permutation that preforms the best.

Do you mean retrieve the (TM1) levels of the elements, calculate and distance and limit the expansion in the DESCENDANTS as @Kevin-Dekker suggested?
I would be very interested to learn if this speeds things up in your test case as well.

I agree. We could probably use a flag that performs better than the "SELF_BEFORE_AFTER" since we already know the distance between ancestor and child.

rclapp · 2022-07-21T15:31:36Z

I ran a few tests and DESCENDANTS is 2x longer when testing against the full structure of the tree. Level 2 vs Level 0.

INTERSECT({TM1DRILLDOWNMEMBER({[Big Dim].[Big Dim].[All Divisors]}, ALL, RECURSIVE)}, {[Big Dim].[Big Dim].[5]}) 7.9s

Taking the level of the element being tested

INTERSECT({DESCENDANTS([Big Dim].[Big Dim].[All Divisors], [Big Dim].[Big Dim].[5].level)}, {[Big Dim].[Big Dim].[5]}) 12.2s

Passing the calculated difference between the elements (= 2 - 0)

INTERSECT({DESCENDANTS([Big Dim].[Big Dim].[All Divisors], 2)}, {[Big Dim].[Big Dim].[5]}) 12.3s

However, when testing a 1 level drill Descendants is faster.

INTERSECT({TM1DRILLDOWNMEMBER({[Big Dim].[Big Dim].[All Divisors]}, ALL, RECURSIVE)}, {[Big Dim].[Big Dim].[Div by 3]}) 6s
INTERSECT({DESCENDANTS([Big Dim].[Big Dim].[All Divisors], [Big Dim].[Big Dim].[div by 3].level)}, {[Big Dim].[Big Dim].[div by 3]}) 1.2s

rclapp · 2022-07-21T15:33:35Z

Using the API to request levels takes about a second each as well.

Kevin-Dekker · 2022-07-21T17:15:05Z

Interesting findings! I think they are in line with my tests & expectations; the descendant method is slower than the drilldown method on ancestor-element tuples where the element is leaf level, but faster when the element is above leaf level.

I ran some extra tests using the same dimension as @rclapp. The tests (n=20 per element) confirm this:

run time on x-axis

Further, I agree with the objective to keep it simple for the user, but I kept the method choice in because the runtime of this function might be essential.

If the descendants method is really less efficient to retrieve leaf level elements, we could retrieve the element level and in case it's leaf use drilldown else use descendants, this has the following test (n=20 per element) results:

Note that there are some outliers in the graphs due to the small sample size.

The descendant method does seem to add value with big/deep dimensions when the element in question is not leaf.
What are your thoughts?

rclapp · 2022-07-21T17:27:10Z

I think exposing the method makes sense. We could pick a default so users that are not so performance focused can get started faster. We can abstract the method used in the query by adding a parameter here https://github.com/cubewise-code/tm1py/pull/767/files#:~:text=def%20_build_drill_intersection_mdx(self%2C%20dimension_name%3A%20str%2C%20hierarchy_name%3A%20str%2C%20first_element_name%3A%20str%2C Then just expose that option in the public method and compare it to an ENUM class MDXDrillMethod(Enum): TM1DrillDownMember= "TM1DRILLDOWNMEMBER" Descendants = "DESCENDANTS" From: Kevin-Dekker ***@***.***> Reply-To: cubewise-code/tm1py ***@***.***> Date: Thursday, July 21, 2022 at 1:15 PM To: cubewise-code/tm1py ***@***.***> Cc: "Clapp, Ryan" ***@***.***>, Mention ***@***.***> Subject: Re: [cubewise-code/tm1py] Feature element ancestors (PR #770) Interesting findings! I think they are in line with my tests & expectations; the descendant method is slower than the drilldown method on ancestor-element tuples where the element is leaf level, but faster when the element is above leaf level. I ran some extra tests using the same dimension as @rclapp<https://github.com/rclapp>. The tests (n=20 per element) confirm this: [image]<https://user-images.githubusercontent.com/101728567/180270570-7ff3ff14-19c4-4827-9328-6d16143504db.png> run time on x-axis Further, I agree with the objective to keep it simple for the user, but I kept the method choice in because the runtime of this function might be essential. If the descendants method is really less efficient to retrieve leaf level elements, we could retrieve the element level and in case it's leaf use drilldown else use descendants, this has the following test (n=20 per element) results: [image]<https://user-images.githubusercontent.com/101728567/180263313-1a75207c-b7dd-4805-a522-5fd9a09aea3c.png> Note that there are some outliers in the graphs due to the small sample size. The descendant method does seem to add value with big/deep dimensions when the element in question is not leaf. What are your thoughts? — Reply to this email directly, view it on GitHub<#770 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AEK7GZR7Y763VNQB6ZVVOUDVVGASHANCNFSM54HFQNZA>. You are receiving this because you were mentioned.Message ID: ***@***.***>

MariusWirtz · 2022-07-22T12:19:02Z

I merged #767 already.
@Kevin-Dekker please rebase this one on the current master.

An Enum to choose between the two approaches sounds reasonable.

If we offer two different MDX approaches, I think they should behave consistently regarding nonexisting elements.

I wonder if we should default to the approach that is performing better on leaves (testing leaves seems the more common use case to me) or instead, check the type of the second element and then default to the approach that is more appropriate?

rclapp · 2022-07-22T12:33:42Z

I'm working on the enum, but found that descendants needs some work in mdxpy so I am doing that too.

…

Sent from my mobile device On Jul 22, 2022 8:20 AM, Marius Wirtz ***@***.***> wrote: I merged #767<#767> already. @Kevin-Dekker<https://github.com/Kevin-Dekker> please rebase this one on the current master. An Enum to choose between the two approaches sounds reasonable. If we offer two different MDX approaches, I think they should behave consistently regarding nonexisting elements. I wonder if we should default to the approach that is performing better on leaves (testing leaves seems the more common use case to me) or instead, check the type of the second element and then default to the approach that is more appropriate? - Reply to this email directly, view it on GitHub<#770 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AEK7GZTHQW7M5PE5FM6KDLLVVKGUFANCNFSM54HFQNZA>. You are receiving this because you were mentioned.Message ID: ***@***.***>

MariusWirtz · 2022-07-26T13:49:28Z

closed in favor of #771

Kevin-Dekker and others added 6 commits March 16, 2022 16:06

Revert "perf: improve performance of build_cellset_from_pandas_datafr…

bf3de55

…ame" This reverts commit ac0b445.

Merge branch 'cubewise-code:master' into master

1e43b5f

fix element_is_ancestor_of

2cc2fe9

add element_is_ancestor_of_by_parents as a recursive yield function To do: performance evaluation

MariusWirtz mentioned this pull request Jul 21, 2022

Feature element functions #767

Merged

MariusWirtz closed this Jul 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature element ancestors #770

Feature element ancestors #770

Kevin-Dekker commented Jul 21, 2022

MariusWirtz commented Jul 21, 2022 •

edited

Loading

MariusWirtz commented Jul 21, 2022

rclapp commented Jul 21, 2022 via email

MariusWirtz commented Jul 21, 2022

rclapp commented Jul 21, 2022

MariusWirtz commented Jul 21, 2022

rclapp commented Jul 21, 2022 •

edited

Loading

rclapp commented Jul 21, 2022

Kevin-Dekker commented Jul 21, 2022

rclapp commented Jul 21, 2022 via email

MariusWirtz commented Jul 22, 2022

rclapp commented Jul 22, 2022 via email

MariusWirtz commented Jul 26, 2022

Feature element ancestors #770

Feature element ancestors #770

Conversation

Kevin-Dekker commented Jul 21, 2022

MariusWirtz commented Jul 21, 2022 • edited Loading

MariusWirtz commented Jul 21, 2022

rclapp commented Jul 21, 2022 via email

MariusWirtz commented Jul 21, 2022

rclapp commented Jul 21, 2022

MariusWirtz commented Jul 21, 2022

rclapp commented Jul 21, 2022 • edited Loading

rclapp commented Jul 21, 2022

Kevin-Dekker commented Jul 21, 2022

rclapp commented Jul 21, 2022 via email

MariusWirtz commented Jul 22, 2022

rclapp commented Jul 22, 2022 via email

MariusWirtz commented Jul 26, 2022

MariusWirtz commented Jul 21, 2022 •

edited

Loading

rclapp commented Jul 21, 2022 •

edited

Loading