Feature/write with blob #882

Merged
merged 12 commits into master from feature/write-with-blob
Mar 30, 2023

Conversation

MariusWirtz
Collaborator

The successor of #881

Changes:

  • use private application
  • don't create an actual CSV file on the disk
  • name application and process with hash based on tm1 session + local thread
  • break process creation into a separate function
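As a rough illustration of the naming idea in the list above, here is a minimal sketch that derives a name from a session identifier plus the local thread id. The function name `unique_name`, the `tm1py` prefix, and the choice of SHA-256 are assumptions made for this example, not necessarily what the PR implements.

```python
import hashlib
import threading


def unique_name(session_id: str, prefix: str = "tm1py") -> str:
    """Build a name that differs per TM1 session and per local thread.

    Hypothetical helper: `session_id` stands in for whatever identifies
    the TM1 session (for example the session cookie value).
    """
    thread_id = threading.get_ident()
    digest = hashlib.sha256(f"{session_id}:{thread_id}".encode("utf-8")).hexdigest()
    # A short hex suffix keeps the object name readable.
    return f"{prefix}.{digest[:16]}"
```

Two threads sharing one session get different names as long as both threads are alive, because `threading.get_ident()` is unique among live threads only. Thread ids can be reused after a thread exits, so this alone does not rule out collisions across time.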

floorsietsmanike and others added 8 commits March 27, 2023 22:40
@MariusWirtz
Collaborator Author

Things to consider before the merge

  • Gather stats on the performance of raw TI vs use_ti vs use_blob. use_blob and raw TI should presumably be on par.
  • Provided that use_blob is faster than use_ti, we should change the write_async function to make use of use_blob as well.
  • Challenge the uniqueness of process names and application names in a multithreaded environment.

Collaborator

@rclapp rclapp left a comment

Awesome! Thanks!

@MariusWirtz
Collaborator Author

MariusWirtz commented Mar 28, 2023

Here are some stats (I'm running TM1 and Python on a notebook with 4 CPUs). Huge progress IMO

| Function | Runtime |
|---|---|
| write with use_ti | 0:03:37 |
| write_async with use_ti | 0:00:59 |
| write with use_blob | 0:00:36 |
| Turbo Integrator | 0:00:33 |
| write_async with use_blob | 0:00:11 |
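
For reference, the ratios implied by these runtimes can be worked out with a small sketch like the one below. The dictionary only restates the numbers from the table; the helper name `speedup` is made up for this example.

```python
from datetime import timedelta

# Runtimes copied from the table above.
runtimes = {
    "write with use_ti": timedelta(minutes=3, seconds=37),
    "write_async with use_ti": timedelta(seconds=59),
    "write with use_blob": timedelta(seconds=36),
    "Turbo Integrator": timedelta(seconds=33),
    "write_async with use_blob": timedelta(seconds=11),
}


def speedup(slow: str, fast: str) -> float:
    """Return how many times faster `fast` ran than `slow`."""
    # Dividing two timedelta objects yields a plain float.
    return runtimes[slow] / runtimes[fast]


sync_gain = speedup("write with use_ti", "write with use_blob")
async_gain = speedup("write_async with use_ti", "write_async with use_blob")
print(f"write: {sync_gain:.1f}x, write_async: {async_gain:.1f}x")
```

On this single run, that works out to roughly 6x for `write` and a bit over 5x for `write_async`, with `write` plus use_blob landing within a few seconds of the raw Turbo Integrator process.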

To reproduce:

from datetime import datetime

from TM1py.Objects.Cube import Cube
from TM1py.Objects.Dimension import Dimension
from TM1py.Objects.Hierarchy import Hierarchy
from TM1py.Services.TM1Service import TM1Service

sdata_params = {
    "address": "",
    "port": 8010,
    "ssl": True,
    "user": "Admin",
    "password": "apple"
}

CUBE = "TM1py Cube"


def setup(tm1: TM1Service):
    hierarchy = Hierarchy("TM1py Dimension 1", "TM1py Dimension 1")
    hierarchy.remove_all_elements()
    for e in range(1_000_000):
        element = str(e).zfill(7)
        hierarchy.add_element(element, "Numeric")
    dimension = Dimension(name="TM1py Dimension 1", hierarchies=[hierarchy])
    tm1.dimensions.update_or_create(dimension)

    hierarchy = Hierarchy("TM1py Dimension 2", "TM1py Dimension 2")
    hierarchy.add_element("Measure", "Numeric")
    dimension = Dimension(name="TM1py Dimension 2", hierarchies=[hierarchy])
    tm1.dimensions.update_or_create(dimension)

    cube = Cube(name="TM1py Cube", dimensions=["TM1py Dimension 1", "TM1py Dimension 2"])
    tm1.cubes.update_or_create(cube)
    print("Created Cube")


with TM1Service(**sdata_params) as tm1:
    setup(tm1)

    cells = dict()
    element2 = "Measure"
    for e in range(1_000_000):
        element1 = str(e).zfill(7)
        value = e
        cells[element1, element2] = value

    for _ in range(2):
        tm1.processes.execute_ti_code([f"CubeClearData('{CUBE}');"])
        before = datetime.now()
        tm1.cells.write(CUBE, cells, use_ti=True)
        elapsed_time = datetime.now() - before
        print(f"Write use_ti: {elapsed_time}")

        tm1.processes.execute_ti_code([f"CubeClearData('{CUBE}');"])
        before = datetime.now()
        tm1.cells.write_async(CUBE, cells, max_workers=4, slice_size=250_000)
        elapsed_time = datetime.now() - before
        print(f"Write_async: {elapsed_time}")

        tm1.processes.execute_ti_code([f"CubeClearData('{CUBE}');"])
        before = datetime.now()
        tm1.cells.write(CUBE, cells, use_blob=True)
        elapsed_time = datetime.now() - before
        print(f"Write use_blob: {elapsed_time}")

@MariusWirtz MariusWirtz force-pushed the feature/write-with-blob branch from ffa6b81 to 8d2cdfe Compare March 28, 2023 19:43
@rclapp
Collaborator

rclapp commented Mar 28, 2023

With such a big improvement, can you push this to PyPI sometime soon?

@ajain86

ajain86 commented Mar 28, 2023

I am intrigued to see such a big difference between use_ti and use_blob. In my experience, creating the previous unbound TI with the CellPutN statements did not add significant processing time. Any insight into what might be causing this improvement?

@MariusWirtz
Collaborator Author

With such a big improvement, can you push this to PyPI sometime soon?

Yes. We should do this in the short term.

@MariusWirtz
Collaborator Author

MariusWirtz commented Mar 28, 2023

I am intrigued to see such a big difference between use_ti and use_blob. In my experience, creating the previous unbound TI with the CellPutN statements did not add significant processing time. Any insight into what might be causing this improvement?

Here is my hypothesis:
The TM1 server does some kind of "scanning" when a process is created. That operation is expensive in terms of performance, and the cost scales linearly with the number of lines in the process.
#774 (comment)

In the old approach, we would create a process with 1 million lines to write 1 million cells. Now we always create a process with 15 lines or so, even when we write 1 million cells.
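
To make the difference concrete, here is a simplified sketch of the two shapes of payload. It is not TM1py's actual implementation: the helper names `ti_statements` and `csv_payload` are made up, and element names are assumed to contain no quotes.

```python
import csv
import io


def ti_statements(cube: str, cells: dict) -> list:
    """Old approach (sketch): one CellPutN line per cell in the process."""
    lines = []
    for elements, value in cells.items():
        coords = ", ".join(f"'{e}'" for e in elements)
        lines.append(f"CellPutN({value}, '{cube}', {coords});")
    return lines


def csv_payload(cells: dict) -> str:
    """Blob approach (sketch): cells travel as CSV text built in memory."""
    buffer = io.StringIO()  # no file is created on disk
    writer = csv.writer(buffer, lineterminator="\n")
    for elements, value in cells.items():
        writer.writerow([*elements, value])
    return buffer.getvalue()


cells = {(str(e).zfill(7), "Measure"): e for e in range(1_000)}
statements = ti_statements("TM1py Cube", cells)
payload = csv_payload(cells)
print(len(statements), "process lines vs", len(payload.splitlines()), "CSV rows")
```

In the first shape the process body grows with the number of cells. In the second, the process stays short and only the data file grows, which fits the hypothesis that process creation is the expensive step.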

@MariusWirtz MariusWirtz force-pushed the feature/write-with-blob branch from 8d2cdfe to 574797b Compare March 28, 2023 20:57
@MariusWirtz MariusWirtz force-pushed the feature/write-with-blob branch from 574797b to b11d817 Compare March 28, 2023 21:02
@MariusWirtz
Collaborator Author

Any more feedback is welcome on this one!
Since we will release it to PyPI soon, we should get the naming, arguments, and design decisions right in the short term.

Once it's out on PyPI, we can't really change it much without breaking compatibility.

@rclapp
Collaborator

rclapp commented Mar 30, 2023 via email

@macsir
Contributor

macsir commented Mar 30, 2023

How exciting to see such a big performance improvement! Can't wait to test it in the cloud environment. 👍

@gbryant-dev
Contributor

That would just use the contents endpoint, right? Similar to how Arc lets you upload a source file.

That's correct, a document can be uploaded to /Contents('Blobs') instead of /Contents('Applications').

@MariusWirtz
Collaborator Author

That would just use the contents endpoint, right? Similar to how Arc lets you upload a source file.

That's correct, a document can be uploaded to /Contents('Blobs') instead of /Contents('Applications').

I'm on it now :)

I will introduce a new service class, FileService, to create, read, update, and delete blobs, as we can do in Arc.
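
As a rough sketch of what creating a blob through the contents endpoint could look like, the helper below only describes the two HTTP requests and sends nothing. Only `/Contents('Blobs')` comes from the discussion above. The `/api/v1` prefix, the `Document` body, the trailing `/Content` path, and the helper name `blob_upload_requests` are assumptions that should be checked against the TM1 REST API documentation.

```python
import json


def blob_upload_requests(file_name: str, content: bytes, base: str = "/api/v1") -> list:
    """Describe the requests for creating a blob (assumed request shapes)."""
    folder = f"{base}/Contents('Blobs')/Contents"
    # OData string keys escape a single quote by doubling it.
    key = file_name.replace("'", "''")
    create = {
        "method": "POST",
        "url": folder,
        "body": json.dumps({
            "@odata.type": "#ibm.tm1.api.v1.Document",  # assumed entity type
            "ID": file_name,
            "Name": file_name,
        }),
    }
    upload = {
        "method": "PUT",
        "url": f"{folder}('{key}')/Content",  # assumed path for the raw bytes
        "body": content,
    }
    return [create, upload]


for request in blob_upload_requests("cells.csv", b"0000001,Measure,1\n"):
    print(request["method"], request["url"])
```

Splitting this into a create step and a content step keeps the document's metadata separate from the raw bytes, so the same second request could also serve an update.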

@MariusWirtz MariusWirtz force-pushed the feature/write-with-blob branch from b30a423 to e43613b Compare March 30, 2023 19:34
@MariusWirtz MariusWirtz changed the title wip: Feature/write with blob Feature/write with blob Mar 30, 2023
@MariusWirtz MariusWirtz added this to the 1.11 milestone Mar 30, 2023
Contributor

@gbryant-dev gbryant-dev left a comment

Looks good! Naming isn't one of my strengths, so feel free to ignore the suggestion! 🙂

TM1py/Services/FileService.py (resolved)
TM1py/Services/CellService.py (outdated, resolved)
@MariusWirtz MariusWirtz force-pushed the feature/write-with-blob branch from e43613b to bd8b8f6 Compare March 30, 2023 21:20
@MariusWirtz
Collaborator Author

Looks good! Naming isn't one of my strengths, so feel free to ignore the suggestion! 🙂

Good catch. Thanks!

I think we might be able to use the same approach to speed up the execute_mdx_csv and execute_mdx_dataframe functions.
I'm not expecting anything near a 10x improvement, but even a 10% or 20% gain would add a lot of value IMO.

Does anyone want to take a shot at #884?
With #884 and #848 done, we can release v1.11 next week.

@MariusWirtz MariusWirtz merged commit 4e87ba8 into master Mar 30, 2023
@MariusWirtz MariusWirtz deleted the feature/write-with-blob branch October 15, 2024 10:12