Feature/write with blob #882

MariusWirtz · 2023-03-27T20:49:48Z

The successor of #881

Changes:

use private application
don't create an actual CSV file on the disk
name application and process with hash based on tm1 session + local thread
break process creation into a separate function
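
The naming idea in the list above (a hash based on the TM1 session plus the local thread) could look roughly like the sketch below. The function name `unique_name`, the `tm1py.` prefix, and the choice of SHA-256 truncated to 16 characters are assumptions for illustration, not necessarily what the PR implements.

```python
import hashlib
import threading


def unique_name(session_id: str, prefix: str = "tm1py.") -> str:
    """Derive a name from the session id and the id of the calling thread.

    Hypothetical helper: the prefix and the hash length are illustrative choices.
    """
    thread_id = threading.get_ident()
    digest = hashlib.sha256(f"{session_id}:{thread_id}".encode("utf-8")).hexdigest()
    return prefix + digest[:16]
```

One caveat of this sketch: Python may reuse a thread identifier after a thread has exited, so names are only distinct across threads that are alive at the same time.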

MariusWirtz · 2023-03-27T20:58:06Z

Things to consider before the merge

Gather stats on the performance of raw TI vs use_ti vs use_blob. use_blob and raw TI should presumably be on par
Provided that use_blob is faster than use_ti, we should change the write_async function to make use of use_blob as well
Challenge the uniqueness of process names and application names in a multithreading environment

rclapp

Awesome! Thanks!

MariusWirtz · 2023-03-28T18:50:59Z

Here are some stats (I'm running TM1 and Python on a notebook with 4 CPUs). Huge progress IMO

| Function | Runtime |
| --- | --- |
| `write` with `use_ti` | 0:03:37 |
| `write_async` with `use_ti` | 0:00:59 |
| `write` with `use_blob` | 0:00:36 |
| `Turbo Integrator` | 0:00:33 |
| `write_async` with `use_blob` | 0:00:11 |
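
To put the runtimes above in relative terms, here is a small sketch that computes the speedup of each variant against `write` with `use_ti`. The dictionary keys are shorthand labels for the rows in the table, and the helper name `parse_runtime` is made up for this example.

```python
from datetime import timedelta


def parse_runtime(text: str) -> timedelta:
    """Parse an H:MM:SS string as used in the stats table."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Runtimes copied from the table in this comment
runtimes = {
    "write use_ti": parse_runtime("0:03:37"),
    "write_async use_ti": parse_runtime("0:00:59"),
    "write use_blob": parse_runtime("0:00:36"),
    "Turbo Integrator": parse_runtime("0:00:33"),
    "write_async use_blob": parse_runtime("0:00:11"),
}

baseline = runtimes["write use_ti"]
# Dividing two timedelta objects yields a plain float ratio
speedups = {name: round(baseline / runtime, 1) for name, runtime in runtimes.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

On these numbers, `write` with `use_blob` comes out about 6x faster than `write` with `use_ti`, and `write_async` with `use_blob` close to 20x.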

To reproduce:

from datetime import datetime

from TM1py.Objects.Cube import Cube
from TM1py.Objects.Dimension import Dimension
from TM1py.Objects.Hierarchy import Hierarchy
from TM1py.Services.TM1Service import TM1Service

sdata_params = {
    "address": "",
    "port": 8010,
    "ssl": True,
    "user": "Admin",
    "password": "apple"
}

CUBE = "TM1py Cube"


def setup(tm1: TM1Service):
    hierarchy = Hierarchy("TM1py Dimension 1", "TM1py Dimension 1")
    hierarchy.remove_all_elements()
    for e in range(1_000_000):
        element = str(e).zfill(7)
        hierarchy.add_element(element, "Numeric")
    dimension = Dimension(name="TM1py Dimension 1", hierarchies=[hierarchy])
    tm1.dimensions.update_or_create(dimension)

    hierarchy = Hierarchy("TM1py Dimension 2", "TM1py Dimension 2")
    hierarchy.add_element("Measure", "Numeric")
    dimension = Dimension(name="TM1py Dimension 2", hierarchies=[hierarchy])
    tm1.dimensions.update_or_create(dimension)

    cube = Cube(name="TM1py Cube", dimensions=["TM1py Dimension 1", "TM1py Dimension 2"])
    tm1.cubes.update_or_create(cube)
    print("Created Cube")


with TM1Service(**sdata_params) as tm1:
    setup(tm1)

    cells = dict()
    element2 = "Measure"
    for e in range(1_000_000):
        element1 = str(e).zfill(7)
        value = e
        cells[element1, element2] = value

    for _ in range(2):
        tm1.processes.execute_ti_code([f"CubeClearData('{CUBE}');"])
        before = datetime.now()
        tm1.cells.write(CUBE, cells, use_ti=True)
        elapsed_time = datetime.now() - before
        print(f"Write use_ti: {elapsed_time}")

        tm1.processes.execute_ti_code([f"CubeClearData('{CUBE}');"])
        before = datetime.now()
        tm1.cells.write_async(CUBE, cells, max_workers=4, slice_size=250_000)
        elapsed_time = datetime.now() - before
        print(f"Write_async: {elapsed_time}")

        tm1.processes.execute_ti_code([f"CubeClearData('{CUBE}');"])
        before = datetime.now()
        tm1.cells.write(CUBE, cells, use_blob=True)
        elapsed_time = datetime.now() - before
        print(f"Write use_blob: {elapsed_time}")

rclapp · 2023-03-28T19:48:15Z

With such a big improvement, can you push this to PyPI sometime soon?

ajain86 · 2023-03-28T20:07:37Z

I am intrigued by seeing such a big difference in use_ti vs use_blob. In my experience, the creation of the previous unbound TI with the CellPutN statements did not add significant processing time. Any insight on what might be creating this improvement?

MariusWirtz · 2023-03-28T20:12:24Z

> With such a big improvement can you push this to PyPI sometime soon?

Yes. We should do this short term.

MariusWirtz · 2023-03-28T20:21:21Z

> I am intrigued by seeing such a big difference in use_ti vs use_blob. In my experience, the creation of the previous unbound TI with the CellPutN statements did not add significant processing time. Any insight on what might be creating this improvement?

Here is my hypothesis:
The TM1 server does some kind of "scanning" when a process is created. That operation is expensive in terms of performance, and the cost scales linearly with the number of lines in the process.
#774 (comment)

In the old approach, we would create a process with 1 million lines to write 1 million cells. Now we always create a process with 15 lines or so, even when we write 1 million cells.
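
A rough way to picture the difference in process size: the sketch below generates one `CellPutN` statement per cell (old style) and, for comparison, a single statement that reads its values from data source variables (file-based style). The variable names `V1`, `V2`, `Value` and the exact statement templates are assumptions for illustration; they are not the code TM1py actually generates.

```python
from typing import Dict, List, Tuple

Cells = Dict[Tuple[str, ...], float]


def statements_one_per_cell(cube: str, cells: Cells) -> List[str]:
    """Old style: one CellPutN per cell, so the process grows with the data."""
    statements = []
    for elements, value in cells.items():
        coordinates = ",".join(f"'{element}'" for element in elements)
        statements.append(f"CellPutN({value},'{cube}',{coordinates});")
    return statements


def statements_for_file_source(cube: str, dimension_count: int) -> List[str]:
    """File-based style: one statement that is run once per record of the source.

    V1..Vn and Value are hypothetical data source variable names.
    """
    variables = ",".join(f"V{i}" for i in range(1, dimension_count + 1))
    return [f"CellPutN(Value,'{cube}',{variables});"]


cells = {(str(e).zfill(7), "Measure"): e for e in range(1000)}
print(len(statements_one_per_cell("TM1py Cube", cells)))  # grows with the cell count
print(len(statements_for_file_source("TM1py Cube", 2)))   # stays constant
```

If the hypothesis holds, only the first variant pays a creation cost that grows with the number of cells.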

Fixes #848 and #774

MariusWirtz · 2023-03-28T21:05:46Z

Any more feedback is welcome on this one!
Since we'll release it to PyPI soon, we should get the naming, arguments, and design decisions right in the short term.

Once it's out on PyPI, we can't really change it much anymore without breaking compatibility.

rclapp · 2023-03-30T04:13:44Z

That would just use the contents endpoint, right? Similar to how Arc lets you upload a source file.

On Mar 29, 2023 6:18 PM, Marius Wirtz wrote:

> @MariusWirtz commented on this pull request.
>
> In TM1py/Services/CellService.py (#882 (comment)):
>
> + csv_content = StringIO()
> + csv_writer = csv.writer(
> +     csv_content,
> +     delimiter=",",
> +     quoting=csv.QUOTE_ALL)
> + csv_writer.writerows(
> +     list(elements) + [value.replace('\r', '').replace('\n', '') if isinstance(value, str) else value]
> +     for elements, value
> +     in cellset_as_dict.items())
> +
> + # Define the path and file name
> + filename = f'{unique_hash}.csv'
> +
> + # Upload CSV to TM1 server using ApplicationService:
> + # Create a folder TM1py
> + tm1py_folder = FolderApplication(path='', name=folder_name)
>
> I think you have a point there @gbryant-dev. We should change it to use blobs and ignore the concepts of TM1 applications.
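
For readers who want to try the in-memory CSV step from the quoted diff on its own, here is a self-contained sketch. The function name `cells_to_csv` is made up for this example; the `csv.writer` settings and the line-break stripping follow the diff.

```python
import csv
from io import StringIO
from typing import Dict, Tuple, Union


def cells_to_csv(cells: Dict[Tuple[str, ...], Union[str, int, float]]) -> str:
    """Serialize a cells dictionary to CSV text without touching the disk."""
    buffer = StringIO()
    writer = csv.writer(buffer, delimiter=",", quoting=csv.QUOTE_ALL)
    writer.writerows(
        # Strip line breaks from string values so each cell stays on one CSV row
        list(elements) + [value.replace("\r", "").replace("\n", "") if isinstance(value, str) else value]
        for elements, value in cells.items()
    )
    return buffer.getvalue()


sample = {("0000001", "Measure"): 1, ("0000002", "Measure"): "a\r\nb"}
print(cells_to_csv(sample))
```

With `csv.QUOTE_ALL`, numeric values are quoted as well, so the receiving side has to convert them back to numbers.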

macsir · 2023-03-30T09:11:07Z

How exciting to see such a big performance improvement! Can't wait to test it in the cloud environment. 👍

gbryant-dev · 2023-03-30T17:34:16Z

> That would just use the contents endpoint, right? Similar to how Arc lets you upload a source file.

That's correct, a document can be uploaded to /Contents('Blobs') instead of /Contents('Applications').

MariusWirtz · 2023-03-30T17:43:23Z

> > That would just use the contents endpoint, right? Similar to how Arc lets you upload a source file.
>
> That's correct, a document can be uploaded to /Contents('Blobs') instead of /Contents('Applications').

I'm on it now :)

I will introduce a new service class FileService to create, read, update, delete blobs as we can do in Arc.
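
As a rough illustration of addressing a blob under `/Contents('Blobs')`, the sketch below builds the relative URL for a file. The nested `Contents('<name>')` segment, the `/Content` suffix, and the helper name `blob_url` are assumptions about how documents are addressed in the REST API; they are not spelled out in this thread.

```python
def blob_url(file_name: str, content: bool = False) -> str:
    """Build the relative URL of a blob, or of its binary content.

    Hypothetical helper; the path layout below Contents('Blobs') is an assumption.
    """
    # OData string literals escape a single quote by doubling it
    escaped = file_name.replace("'", "''")
    url = f"/Contents('Blobs')/Contents('{escaped}')"
    return url + "/Content" if content else url


print(blob_url("abc.csv"))
print(blob_url("abc.csv", content=True))
```

A service class with create, read, update, and delete methods could then issue the matching HTTP requests against these URLs.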

gbryant-dev

Looks good! Naming isn't one of my strengths so feel free to ignore the suggestion! 🙂

MariusWirtz · 2023-03-30T21:27:50Z

> Looks good! Naming isn't one of my strengths so feel free to ignore the suggestion! 🙂

Good catch. Thanks!

I think we might be able to use the same approach to speed up the execute_mdx_csv and execute_mdx_dataframe functions.
I'm not expecting anything near 10x performance, but even a 10% or 20% improvement would add a lot of value IMO.

Does anyone want to take a shot at #884?
With #884 and #848 done, we can release v1.11 next week.

floorsietsmanike and others added 8 commits March 27, 2023 22:40

First commit

ea91699

Started development on write.py for use_blob option

c890dde

Added use_blob parameter to write() function, and added underlying function write_through_blob()

878c8b5

Added use_blob parameter to write() function, and added underlying function write_through_blob()

846997c

Added use_blob parameter to write() function, and added underlying function write_through_blob()

8a173b1

Added use_blob parameter to write() function, and added underlying function write_through_blob()

efc3e2f

Added use_blob parameter to write() function, and added underlying function write_through_blob()

b7ce502

write use_blob tidy up and add tests

ea39c39

- use private application
- don't create actual file on disk
- name application and process with hash based on tm1 session + local thread
- break process creation into separate function

MariusWirtz mentioned this pull request Mar 27, 2023

Add use_blob() to write() function to increase performance #881

Closed

rclapp approved these changes Mar 28, 2023

MariusWirtz force-pushed the feature/write-with-blob branch from ffa6b81 to 8d2cdfe on March 28, 2023 19:43

rkvinoth reviewed Mar 28, 2023

TM1py/Services/CellService.py (outdated, resolved)

rkvinoth reviewed Mar 28, 2023

TM1py/Services/CellService.py (outdated, resolved)

rkvinoth reviewed Mar 28, 2023

TM1py/Services/CellService.py (outdated, resolved)

rkvinoth reviewed Mar 28, 2023

TM1py/Services/CellService.py (outdated, resolved)

rkvinoth reviewed Mar 28, 2023

TM1py/Services/CellService.py (resolved)

MariusWirtz force-pushed the feature/write-with-blob branch from 8d2cdfe to 574797b on March 28, 2023 20:57

Use use_blob in write_async functions

b11d817

Fixes #848 and #774

MariusWirtz force-pushed the feature/write-with-blob branch from 574797b to b11d817 on March 28, 2023 21:02

gbryant-dev reviewed Mar 29, 2023

TM1py/Services/CellService.py (outdated, resolved)

MariusWirtz added 2 commits March 30, 2023 20:21

Use blobs instead of applications in write

b8a3f14

Introduce FileService to work with blobs

8f025ff

MariusWirtz force-pushed the feature/write-with-blob branch from b30a423 to e43613b on March 30, 2023 19:34

MariusWirtz changed the title ~~wip: Feature/write with blob~~ Feature/write with blob Mar 30, 2023

MariusWirtz added this to the 1.11 milestone Mar 30, 2023

gbryant-dev reviewed Mar 30, 2023

TM1py/Services/FileService.py (resolved)

TM1py/Services/CellService.py (outdated, resolved)

Prettify use_blob and FileService feature

bd8b8f6

MariusWirtz force-pushed the feature/write-with-blob branch from e43613b to bd8b8f6 on March 30, 2023 21:20

MariusWirtz merged commit 4e87ba8 into master Mar 30, 2023

MariusWirtz mentioned this pull request Apr 13, 2023

Drop MAX_STATEMENTS in write use_ti logic in versions > 11.8.15 #774

Closed

MariusWirtz deleted the feature/write-with-blob branch October 15, 2024 10:12
