
feature: provide a way to use pagination concurrently to retrieve objects using the SDK #159

Closed
wvandeun opened this issue Nov 29, 2024 · 1 comment · Fixed by #219
Labels
type/feature New feature or request

Comments

@wvandeun
Contributor

wvandeun commented Nov 29, 2024

Component

Python SDK

Describe the Feature Request

When you execute a query that retrieves a large number of nodes from the database, using the filters or all method, the SDK leverages pagination to break the query into smaller pages.

The retrieval of these pages happens serially, which is not ideal. We could make this faster by retrieving the pages concurrently.

Pseudocode of what it could look like:

from infrahub_sdk import InfrahubClient

client = InfrahubClient()
# fetch the total node count first (method name illustrative)
resp = await client.execute_graphql("query { LocationSuite { count } }")

count = int(resp["LocationSuite"]["count"])

batch = await client.create_batch()

# queue one task per page, starting at offset 0
offset = 0
page_size = 50
while offset < count:
    batch.add(task=client.all, kind="nodekind", offset=offset, limit=page_size)
    offset += page_size

# run the queued page queries concurrently
async for _, page in batch.execute():
    ...
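The pseudocode above leans on the SDK's batch helper; the same idea can be sketched with plain asyncio. In this minimal, self-contained sketch, fetch_page is a hypothetical stand-in for one paginated SDK query, and the node ids are fake:

```python
import asyncio


async def fetch_page(offset: int, limit: int) -> list[int]:
    # Stand-in for one paginated query; returns fake node ids.
    await asyncio.sleep(0)  # simulate network latency
    return list(range(offset, min(offset + limit, TOTAL)))


TOTAL = 103  # total node count, e.g. from a { count } query
PAGE_SIZE = 50


async def fetch_all() -> list[int]:
    # Issue all page queries at once instead of one after another.
    offsets = range(0, TOTAL, PAGE_SIZE)
    pages = await asyncio.gather(*(fetch_page(o, PAGE_SIZE) for o in offsets))
    # flatten the pages back into a single list of nodes
    return [node for page in pages for node in page]


nodes = asyncio.run(fetch_all())
print(len(nodes))  # 103
```

With asyncio.gather all page requests are in flight at the same time, so total wall-clock time approaches the latency of a single page rather than the sum of all pages.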

Describe the Use Case

Retrieving a large number of nodes with a GraphQL query takes some time, since we fetch the pages one by one. Being able to fetch the pages concurrently should improve the speed.

Additional Information

No response

@wvandeun wvandeun added the type/feature New feature or request label Nov 29, 2024
@minitriga minitriga self-assigned this Jan 6, 2025
@minitriga
Contributor

minitriga commented Jan 6, 2025

I have a working example of this for both async and sync clients using the batch functionality within the SDK.
Sync:

from rich import print as rprint

from infrahub_sdk import Config, InfrahubClientSync

client = InfrahubClientSync(config=Config(pagination_size=2))

def main():
    branches = client.all(kind="OrganizationGeneric", batch=True)
    rprint(branches)

if __name__ == "__main__":
    main()

Async:

from asyncio import run as aiorun

from rich import print as rprint

from infrahub_sdk import Config, InfrahubClient

client = InfrahubClient(config=Config(pagination_size=2))

async def create_data(number: int):
    data = {
        "name": f"Vendor {number}",
    }
    obj = await client.create(kind="OrganizationGeneric", data=data)
    await obj.save()
    print(f"New OrganizationGeneric created with the Id {obj.id}")


async def main():
    # for i in range(1,1000):
    #     await create_data(i)
    branches = await client.all(kind="OrganizationGeneric", batch=True)
    rprint(len(branches))


if __name__ == "__main__":
    aiorun(main())

I have manually set the pagination size to 2 to slow things down. With batch=True, the query for 1021 locations takes `poetry run python test_sync.py 2.71s user 0.44s system 47% cpu 6.597 total`, while left to process the queries serially it takes `poetry run python test_sync.py 4.38s user 0.66s system 20% cpu 24.167 total`.

@wvandeun mentioned that batch is not the best argument name, so I am open to suggestions.
