-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature element ancestors #770
Feature element ancestors #770
Conversation
- Use dict(zip(keys, values)) instead of a for loop - Use itertuples to make each row until last column of the dataframe into a tuple Efficiency Statistics (1000 iterations) Comparison in markdown tables: Current (before commit/merge) | | length: 1000 width: 3 | length: 1000 width: 13 | length: 5000 width: 3 | length: 5000 width: 13 | length: 15000 width: 3 | length: 15000 width: 13 | |:-----|------------------------:|-------------------------:|------------------------:|-------------------------:|-------------------------:|--------------------------:| | mean | 0.0049 | 0.0179 | 0.0477 | 0.1101 | 0.1464 | 0.3459 | | std | 0.0059 | 0.0079 | 0.0166 | 0.026 | 0.0308 | 0.0602 | | max | 0.0472 | 0.0642 | 0.1056 | 0.2145 | 0.2607 | 0.6604 | Suggested (after commit/merge) | | length: 1000 width: 3 | length: 1000 width: 13 | length: 5000 width: 3 | length: 5000 width: 13 | length: 15000 width: 3 | length: 15000 width: 13 | |:-----|------------------------:|-------------------------:|------------------------:|-------------------------:|-------------------------:|--------------------------:| | mean | 0.0016 | 0.0069 | 0.0157 | 0.0425 | 0.0473 | 0.1294 | | std | 0.0036 | 0.0046 | 0.0053 | 0.0121 | 0.0123 | 0.0269 | | max | 0.0167 | 0.0282 | 0.038 | 0.0934 | 0.094 | 0.2652 |
…ame" This reverts commit ac0b445.
The TM1 feature ELISPAR checks whether one provided element is the parent of another provided element in a given dimension and hierarchy. The TM1 feature ELISPAR checks whether one provided element is an ancestor of another provided element in a given dimension and hierarchy. Both mentioned features are available in TM1 Rules and TurboIntegrator, but not directly available in tm1py. This commit reuses the 'get_parents' function to define the ELISPAR functionality. This commit uses mdxpy to retrieve an element's ancestors through mdx to define the ELISPAR functionality. The TM1 function ELISPAR is implemented under the name: element_is_parent_of. The TM1 function ELISANC is implemented under the name: element_is_ancestor_of. The element_is_descendant_of function checks whether a given member is a descendant of a given member.
add element_is_ancestor_of_by_parents as a recursive yield function To do: performance evaluation
results show that one of two methods is most performant for ELISANC functionality. The first is through getting the descendants of the ancestor at a derived number of levels below the ancestor. The other is using a recursive tm1drilldown. The first one has overhead in deriving the level of the element and the ancestor but the intersect is generally more efficient, especially if the queried element is not leaf level. More tests are required to check the validity and performance under varying dimension shapes.
Thanks @Kevin-Dekker! I understand that calculating the distance and using the precise DESCENDANTS function is faster in most scenarios. You mentioned that if the For the I think it would be good to use mdxpy for the MDX building. It should take care of some of the edge cases like escaping |
I am not sure if we should offer both approaches in the function if the DESCENDANTS approach is faster in most cases. |
In my testing I found that descendants was almost always slower than tm1drilldownmember (using a dimension of 2M elements). Regardless of what mdx function we use I would tend toward the simple implementation, the closer to TI the better. If users desire a more robust option it's easy for them to write it.
The exception path is an interesting one. In TI there seems to be no errors thrown even if the entire set of parameters refer to non existent objects. I started down that path, but realized that it would be problematic if we suppressed api level exceptions. So, I ended up with a compromise. Set expressions that refer to non existent members still work, and just return an empty set, and therefore a false (this is default api behavior). However, queries that use a non existent dimension raise an exception. I tried to add member checking, but decided to remove it since the api call does actually work.
…Sent from my mobile device
On Jul 21, 2022 8:05 AM, Marius Wirtz ***@***.***> wrote:
I am not sure if we should offer both approaches in the function if the DESCENDANTS approach is faster in most cases.
As a user, I think I would rather choose if potential not-existing elements raise an error.
What do you think @Kevin-Dekker<https://github.com/Kevin-Dekker>, @rclapp<https://github.com/rclapp>?
-
Reply to this email directly, view it on GitHub<#770 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AEK7GZR65M2SXT74JGYYHHLVVE4EBANCNFSM54HFQNZA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
@rclapp I agree that the default behavior regarding non-existing elements should be as in TI. That would be most intuitive for users. Simplicity is important but so is performance IMO. This is one of the functions that users might be tempted to call within a loop. I think performance is key for this one.
that's very elegant! |
On the performance side, there is a gain using the cardinality to tell you if the relationship exists so I recommend we stick with that implementation. On the MDX side, as you stated, it is the distance that drives the performance. Whenever the distance includes "BEFORE" the performance takes a real hit. The MDX defaults are the worse performing of the them all , SELF_BEFORE_AFTER. I am fine changing to Descendants if we know the permutation that preforms the best. TBH, I tend to avoid using it because it is so overloaded and not super intuitive. |
Agree!
Do you mean retrieve the (TM1) levels of the elements, calculate and distance and limit the expansion in the I agree. We could probably use a flag that performs better than the "SELF_BEFORE_AFTER" since we already know the distance between ancestor and child. |
I ran a few tests and DESCENDANTS is 2x longer when testing against the full structure of the tree. Level 2 vs Level 0.
Taking the level of the element being tested
Passing the calculated difference between the elements (= 2 - 0)
However, when testing a 1 level drill Descendants is faster.
|
Using the API to request levels takes about a second each as well. |
Interesting findings! I think they are in line with my tests & expectations; the descendant method is slower than the drilldown method on ancestor-element tuples where the element is leaf level, but faster when the element is above leaf level. I ran some extra tests using the same dimension as @rclapp. The tests (n=20 per element) confirm this: Further, I agree with the objective to keep it simple for the user, but I kept the method choice in because the runtime of this function might be essential. If the descendants method is really less efficient to retrieve leaf level elements, we could retrieve the element level and in case it's leaf use drilldown else use descendants, this has the following test (n=20 per element) results: Note that there are some outliers in the graphs due to the small sample size. The descendant method does seem to add value with big/deep dimensions when the element in question is not leaf. |
I think exposing the method makes sense. We could pick a default so users that are not so performance focused can get started faster.
We can abstract the method used in the query by adding a parameter here https://github.com/cubewise-code/tm1py/pull/767/files#:~:text=def%20_build_drill_intersection_mdx(self%2C%20dimension_name%3A%20str%2C%20hierarchy_name%3A%20str%2C%20first_element_name%3A%20str%2C
Then just expose that option in the public method and compare it to an ENUM
class MDXDrillMethod(Enum):
TM1DrillDownMember= "TM1DRILLDOWNMEMBER"
Descendants = "DESCENDANTS"
From: Kevin-Dekker ***@***.***>
Reply-To: cubewise-code/tm1py ***@***.***>
Date: Thursday, July 21, 2022 at 1:15 PM
To: cubewise-code/tm1py ***@***.***>
Cc: "Clapp, Ryan" ***@***.***>, Mention ***@***.***>
Subject: Re: [cubewise-code/tm1py] Feature element ancestors (PR #770)
Interesting findings! I think they are in line with my tests & expectations; the descendant method is slower than the drilldown method on ancestor-element tuples where the element is leaf level, but faster when the element is above leaf level.
I ran some extra tests using the same dimension as @rclapp<https://github.com/rclapp>. The tests (n=20 per element) confirm this:
[image]<https://user-images.githubusercontent.com/101728567/180270570-7ff3ff14-19c4-4827-9328-6d16143504db.png>
run time on x-axis
Further, I agree with the objective to keep it simple for the user, but I kept the method choice in because the runtime of this function might be essential.
If the descendants method is really less efficient to retrieve leaf level elements, we could retrieve the element level and in case it's leaf use drilldown else use descendants, this has the following test (n=20 per element) results:
[image]<https://user-images.githubusercontent.com/101728567/180263313-1a75207c-b7dd-4805-a522-5fd9a09aea3c.png>
Note that there are some outliers in the graphs due to the small sample size.
The descendant method does seem to add value with big/deep dimensions when the element in question is not leaf.
What are your thoughts?
—
Reply to this email directly, view it on GitHub<#770 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AEK7GZR7Y763VNQB6ZVVOUDVVGASHANCNFSM54HFQNZA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
I merged #767 already. An Enum to choose between the two approaches sounds reasonable. If we offer two different MDX approaches, I think they should behave consistently regarding nonexisting elements. I wonder if we should default to the approach that is performing better on leaves (testing leaves seems the more common use case to me) or instead, check the type of the second element and then default to the approach that is more appropriate? |
I'm working on the enum, but found that descendants needs some work in mdxpy so I am doing that too.
…Sent from my mobile device
On Jul 22, 2022 8:20 AM, Marius Wirtz ***@***.***> wrote:
I merged #767<#767> already.
@Kevin-Dekker<https://github.com/Kevin-Dekker> please rebase this one on the current master.
An Enum to choose between the two approaches sounds reasonable.
If we offer two different MDX approaches, I think they should behave consistently regarding nonexisting elements.
I wonder if we should default to the approach that is performing better on leaves (testing leaves seems the more common use case to me) or instead, check the type of the second element and then default to the approach that is more appropriate?
-
Reply to this email directly, view it on GitHub<#770 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AEK7GZTHQW7M5PE5FM6KDLLVVKGUFANCNFSM54HFQNZA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
closed in favor of #771 |
For large/deep dimensions, it could be more efficient to use the descendants-based method. Especially when checking non-leaf level elements for ancestors this method might have performance benefits.