Allow dtype to be inferred by pandas with argument infer_dtype in execute_mdx_dataframe_shaped #879


Kevin-Dekker
Collaborator

What
Allows TM1py users to specify the infer_dtype argument on tm1.cells.execute_mdx_dataframe_shaped.

If infer_dtype=True, TM1py lets pandas decide the dtype of each column. If infer_dtype=False, it keeps the current behaviour and loads all columns as strings, unless there are no columns (see "if not number_columns") or no rows (see "if not number_rows"). In those two cases tm1.cells.execute_mdx_dataframe_shaped already infers the dtype automatically.

If the infer_dtype argument is not specified, tm1.cells.execute_mdx_dataframe_shaped keeps its current behaviour, equivalent to infer_dtype=False.
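One way to picture the switch is as the dtype argument of the pandas DataFrame constructor. The sketch below is illustrative only and is not TM1py's code: it assumes the cellset has already been flattened into rows of row-element names followed by float cell values. The helper build_shaped_frame and the element names in the sample rows are hypothetical; the cell values are taken from the Outcome section.

```python
import pandas as pd


def build_shaped_frame(rows, columns, infer_dtype=False):
    """Hypothetical helper (not part of TM1py): build a shaped DataFrame.

    infer_dtype=False forces every column to strings, mirroring the default;
    infer_dtype=True passes dtype=None so pandas infers a dtype per column.
    """
    dtype = None if infer_dtype else str
    return pd.DataFrame(data=rows, columns=columns, dtype=dtype)


# Hypothetical rows: two row-dimension element names, then one cell value.
columns = ["Version", "Department", "2017-M06"]
rows = [
    ["Actual", "Dept A", 3236799.0],
    ["Budget", "Dept A", 3560478.9000000004],
    ["Actual", "Dept B", 2437577.0],
]

inferred = build_shaped_frame(rows, columns, infer_dtype=True)
as_text = build_shaped_frame(rows, columns)

print(inferred.dtypes["2017-M06"])  # float64
print(as_text["2017-M06"].tolist())  # the same values, held as strings
```

Since dtype=None is pandas' own default, the True branch does no extra work; the string branch is what turns float cell values into text such as '3236799.0'.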

Why
The default behaviour forces the user to convert dtypes after retrieving the DataFrame: even if all value columns are numeric, the retrieved DataFrame still holds them as strings. This PR lets the user choose.
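Under the default, that conversion is an extra step after retrieval. A minimal sketch of it, assuming the value column names are known in advance; the frame is built by hand from the string values shown in the Outcome section rather than fetched from TM1:

```python
import pandas as pd

# Stand-in for a frame returned with the default (all-string) behaviour.
df = pd.DataFrame(
    {"2017-M06": ["3236799.0", "3560478.9000000004", "2437577.0"]},
    dtype=object,
)

# Convert each value column back to numbers before doing arithmetic.
value_columns = ["2017-M06"]
for col in value_columns:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes["2017-M06"])  # float64
```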

Test
The test below shows how a numeric column is automatically inferred as a proper numeric dtype. The numeric column can then be multiplied directly; the string column cannot (multiplying it by 3 repeats each string instead).

@MariusWirtz, the test is based on the tm1_cloud sample instance:

```python
import logging

import mdxpy
import numpy as np
from mdxpy import MdxHierarchySet, Member

from TM1py import TM1Service
from my_credentials import tm1_cloud


def get_df_mdx(tm1):
    """
    Builds the MDX for the test (interchangeable with any MDX you want to use to retrieve a DataFrame from TM1).
    """
    logging.info('Getting balance sheet cube metadata from tm1')
    balance_sheet = tm1.cubes.get("Balance Sheet")
    balance_sheet_dims = balance_sheet.dimensions
    logging.info('Getting dimension elements from dimensions in balance sheet cube')
    # Leaf elements per dimension (hierarchy name == dimension name); only Version and Department are used below
    elements = {dim: list(tm1.elements.get_leaf_element_names(dim, dim)) for dim in balance_sheet_dims}
    currency = 'Local'
    region = '14'
    account = '1110'
    balance_sheet_measure = 'Amount'
    q = mdxpy.MdxBuilder('Balance Sheet')
    # Rows: one region x all leaf versions x all leaf departments
    q.add_hierarchy_set_to_row_axis(MdxHierarchySet.member(Member.of('Region', region)))
    q.add_hierarchy_set_to_row_axis(
        MdxHierarchySet.members([Member.of('Version', ele) for ele in elements['Version']]))
    q.add_hierarchy_set_to_row_axis(
        MdxHierarchySet.members([Member.of('Department', ele) for ele in elements['Department']]))
    # Columns: all leaf periods, which become the numeric value columns of the DataFrame
    q.add_hierarchy_set_to_column_axis(MdxHierarchySet.all_leaves('Time', 'Time'))
    q.add_member_to_where(Member.of('Currency', currency))
    q.add_member_to_where(Member.of('Balance Sheet Measure', balance_sheet_measure))
    q.add_member_to_where(Member.of('Account', account))
    q.rows_non_empty().columns_non_empty()
    return q.to_mdx()


if __name__ == '__main__':
    with TM1Service(**tm1_cloud) as tm1:
        mdx = get_df_mdx(tm1)
        # Same query twice: once with pandas-inferred dtypes, once with the all-string default
        df_inferred_dtype = tm1.cells.execute_mdx_dataframe_shaped(mdx, infer_dtype=True)
        df_str_dtype = tm1.cells.execute_mdx_dataframe_shaped(mdx, infer_dtype=False)

    np.set_printoptions(precision=20)
    print(f"numeric column as inferred type: \n{df_inferred_dtype['2017-M06'].values}\n")
    print(f"numeric column as str: \n{df_str_dtype['2017-M06'].values}\n")
    print(f"numeric column as inferred type multiplied by 3: \n{(df_inferred_dtype['2017-M06'] * 3).values}\n")
    print(f"numeric column as string multiplied by 3: \n{(df_str_dtype['2017-M06'] * 3).values}\n")
```

Outcome

```
numeric column as inferred type:
[3236799.           3560478.9000000004 2437577.          ]

numeric column as str:
['3236799.0' '3560478.9000000004' '2437577.0']

numeric column as inferred type multiplied by 3:
[ 9710397.          10681436.700000001  7312731.         ]

numeric column as string multiplied by 3:
['3236799.03236799.03236799.0'
 '3560478.90000000043560478.90000000043560478.9000000004'
 '2437577.02437577.02437577.0']
```
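The last result is Python's sequence repetition at work: on a column of strings, * 3 concatenates each value with itself three times instead of multiplying it. The effect can be reproduced without a TM1 instance; this sketch simply reuses the printed values from the test above:

```python
import pandas as pd

values = ["3236799.0", "3560478.9000000004", "2437577.0"]

as_strings = pd.Series(values, dtype=object)  # like infer_dtype=False
as_numbers = pd.to_numeric(as_strings)        # like infer_dtype=True

print((as_strings * 3).tolist())  # each string repeated three times
print((as_numbers * 3).tolist())  # element-wise arithmetic multiplication
```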

@MariusWirtz
Collaborator

Thank you @Kevin-Dekker.

Works like a charm!

MariusWirtz merged commit eb74040 into cubewise-code:master on Mar 27, 2023