Feature/write with blob #882
…nction write_through_blob()
- use private application
- don't create actual file on disk
- name application and process with hash based on tm1 session + local thread
- break process creation into separate function
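The third commit item names the temporary application and process with a hash of the TM1 session plus the local thread. A minimal sketch of that idea follows; the helper name, the SHA-256 choice, and the `session_id` argument are all assumptions for illustration, not TM1py's implementation. The point is that two threads sharing one session (for example, concurrent workers writing in parallel) get distinct names, so their temporary objects don't collide.

```python
import hashlib
import threading


def unique_object_name(session_id: str, prefix: str = "TM1py") -> str:
    """Hypothetical helper: build a name that is stable for one
    (TM1 session, Python thread) pair and, with overwhelming
    probability, different for any other pair."""
    # threading.get_ident() identifies the calling thread while it is alive.
    thread_id = threading.get_ident()
    digest = hashlib.sha256(f"{session_id}:{thread_id}".encode("utf-8")).hexdigest()
    # A shortened digest keeps the object name readable (16 hex chars = 64 bits).
    return f"{prefix}{digest[:16]}"
```

Here `session_id` stands for whatever session identifier the TM1 connection exposes; it is deliberately left abstract.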
Things to consider before the merge
Awesome! Thanks!
Here are some stats (I'm running TM1 and Python on a laptop with 4 CPUs). Huge progress, IMO.
To reproduce:

```python
from datetime import datetime

from TM1py.Objects.Cube import Cube
from TM1py.Objects.Dimension import Dimension
from TM1py.Objects.Hierarchy import Hierarchy
from TM1py.Services.TM1Service import TM1Service

sdata_params = {
    "address": "",
    "port": 8010,
    "ssl": True,
    "user": "Admin",
    "password": "apple"
}

CUBE = "TM1py Cube"


def setup(tm1: TM1Service):
    # Dimension 1: one million numeric leaf elements
    hierarchy = Hierarchy("TM1py Dimension 1", "TM1py Dimension 1")
    hierarchy.remove_all_elements()
    for e in range(1_000_000):
        element = str(e).zfill(7)
        hierarchy.add_element(element, "Numeric")
    dimension = Dimension(name="TM1py Dimension 1", hierarchies=[hierarchy])
    tm1.dimensions.update_or_create(dimension)

    # Dimension 2: a single measure
    hierarchy = Hierarchy("TM1py Dimension 2", "TM1py Dimension 2")
    hierarchy.add_element("Measure", "Numeric")
    dimension = Dimension(name="TM1py Dimension 2", hierarchies=[hierarchy])
    tm1.dimensions.update_or_create(dimension)

    cube = Cube(name="TM1py Cube", dimensions=["TM1py Dimension 1", "TM1py Dimension 2"])
    tm1.cubes.update_or_create(cube)
    print("Created Cube")


with TM1Service(**sdata_params) as tm1:
    setup(tm1)

    # One million cells: {(element1, element2): value}
    cells = dict()
    element2 = "Measure"
    for e in range(1_000_000):
        element1 = str(e).zfill(7)
        value = e
        cells[element1, element2] = value

    for _ in range(2):
        tm1.processes.execute_ti_code([f"CubeClearData('{CUBE}');"])
        before = datetime.now()
        tm1.cells.write(CUBE, cells, use_ti=True)
        elapsed_time = datetime.now() - before
        print(f"Write use_ti: {elapsed_time}")

        tm1.processes.execute_ti_code([f"CubeClearData('{CUBE}');"])
        before = datetime.now()
        tm1.cells.write_async(CUBE, cells, max_workers=4, slice_size=250_000)
        elapsed_time = datetime.now() - before
        print(f"Write_async: {elapsed_time}")

        tm1.processes.execute_ti_code([f"CubeClearData('{CUBE}');"])
        before = datetime.now()
        tm1.cells.write(CUBE, cells, use_blob=True)
        elapsed_time = datetime.now() - before
        print(f"Write use_blob: {elapsed_time}")
```
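The `write_async` call in the benchmark passes `slice_size=250_000` and `max_workers=4`, so the 1,000,000 cells would be split into four slices, presumably one per worker. A minimal sketch of that chunking step, using a hypothetical `slice_cells` helper (not TM1py's internal code) that cuts the dict into consecutive sub-dicts in insertion order:

```python
from itertools import islice
from typing import Dict, Iterator, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def slice_cells(cells: Dict[K, V], slice_size: int) -> Iterator[Dict[K, V]]:
    """Hypothetical helper: yield consecutive sub-dicts of at most
    slice_size entries, preserving the dict's insertion order."""
    if slice_size < 1:
        raise ValueError("slice_size must be positive")
    items = iter(cells.items())
    while True:
        # islice pulls the next slice_size items from the shared iterator.
        chunk = dict(islice(items, slice_size))
        if not chunk:
            return
        yield chunk
```

Each slice could then be submitted to a `concurrent.futures.ThreadPoolExecutor(max_workers=4)`, with every worker writing its own slice.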
With such a big improvement, can you push this to PyPI soon?
I'm intrigued by the big difference between use_ti and use_blob. In my experience, creating the previous unbound TI process with the CellPutN statements did not add significant processing time. Any insight into what is driving this improvement?
Yes, we should do this in the short term.
Here is my hypothesis: in the old approach, writing 1 million cells meant creating a process with 1 million lines (one CellPutN statement per cell). Now we always create a process with 15 lines or so, even when we write 1 million cells.
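To make that hypothesis concrete, here is a minimal sketch that generates both kinds of TI code for the benchmark's cells. The helper names are hypothetical; the old style is simplified to one CellPutN/CellPutS statement per cell, and the blob style assumes a data procedure over a CSV data source whose columns are exposed as variables v1..vN plus a numeric value variable. It is an illustration of the size difference, not TM1py's actual generated code.

```python
from typing import Dict, List, Tuple, Union

CellValue = Union[int, float, str]


def quote_ti(text: str) -> str:
    """Wrap a string in TI single quotes, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_unbound_statements(cube: str, cells: Dict[Tuple[str, ...], CellValue]) -> List[str]:
    """Old style (simplified): one statement per cell, so the code grows with the data."""
    statements = []
    for elements, value in cells.items():
        coordinates = ", ".join(quote_ti(e) for e in elements)
        if isinstance(value, str):
            statements.append(f"CellPutS({quote_ti(value)}, {quote_ti(cube)}, {coordinates});")
        else:
            statements.append(f"CellPutN({value}, {quote_ti(cube)}, {coordinates});")
    return statements


def build_data_procedure(cube: str, dimension_count: int) -> List[str]:
    """Blob style (hypothetical, numeric cells only): the process reads one CSV row
    per data-procedure pass, so the code size depends only on the number of dimensions."""
    coordinates = ", ".join(f"v{i}" for i in range(1, dimension_count + 1))
    value_variable = f"v{dimension_count + 1}"
    return [f"CellPutN({value_variable}, {quote_ti(cube)}, {coordinates});"]
```

With 1,000 cells the first function returns 1,000 statements, while the second returns a single line for a two-dimensional cube regardless of how many rows the CSV holds.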
Any further feedback on this one is welcome! Once it's out on PyPI, we can't change it much without breaking compatibility.
That would just use the contents endpoint, right? Similar to how arc lets you upload a source file.
On Mar 29, 2023 6:18 PM, Marius Wirtz wrote:
In TM1py/Services/CellService.py:
```diff
+ csv_content = StringIO()
+ csv_writer = csv.writer(
+ csv_content,
+ delimiter=",",
+ quoting=csv.QUOTE_ALL)
+ csv_writer.writerows(
+ list(elements) + [value.replace('\r', '').replace('\n', '') if isinstance(value, str) else value]
+ for elements, value
+ in cellset_as_dict.items())
+
+ # Define the path and file name
+ filename = f'{unique_hash}.csv'
+
+ # Upload CSV to TM1 server using ApplicationService:
+ # Create a folder TM1py
+ tm1py_folder = FolderApplication(path='', name=folder_name)
```
I think you have a point there, @gbryant-dev.
We should change it to use blobs and ignore the concept of TM1 applications.
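The quoted diff builds the CSV payload in memory with `csv.writer` and `QUOTE_ALL`, stripping line breaks from string values. Here is a self-contained version of that step, wrapped in a hypothetical `cells_to_csv` function so it can be run on its own; the serialization logic is the one from the diff.

```python
import csv
from io import StringIO
from typing import Dict, Tuple, Union

CellValue = Union[int, float, str]


def cells_to_csv(cellset_as_dict: Dict[Tuple[str, ...], CellValue]) -> str:
    """Serialize {(element, ...): value} into CSV text, one row per cell."""
    csv_content = StringIO()
    csv_writer = csv.writer(csv_content, delimiter=",", quoting=csv.QUOTE_ALL)
    csv_writer.writerows(
        # Coordinates first, value last; line breaks are stripped from string
        # values so that every cell stays on a single CSV row.
        list(elements) + [value.replace("\r", "").replace("\n", "") if isinstance(value, str) else value]
        for elements, value in cellset_as_dict.items()
    )
    return csv_content.getvalue()
```

Because every field is quoted and embedded double quotes are doubled, element names containing commas or quotes survive the round trip.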
Such a big performance improvement, how exciting! Can't wait to test it in the cloud environment. 👍
That's correct, a document can be uploaded to
I'm on it now :) I will introduce a new service class
Looks good! Naming isn't one of my strengths, so feel free to ignore the suggestion! 🙂
Good catch. Thanks! I think we might be able to use the same approach to speed up the Does anyone want to take a shot at #884?
The successor to #881.
Changes: