bug: unable to generate data diff for a branch with a large data change #4438

Closed
wvandeun opened this issue Sep 24, 2024 · 1 comment
Labels
group/backend Issue related to the backend (API Server, Git Agent) type/bug Something isn't working as expected


@wvandeun
Contributor

Component

API Server / GraphQL

Infrahub version

0.16.0

Current Behavior

When a branch holds a large data change (30 devices with 100 interfaces each) and you open a proposed change for it, the data diff cannot be generated.

Generating the data diff appears to run indefinitely and never completes.

Expected Behavior

We should be able to generate a data diff reliably, even for large data sets.

Steps to Reproduce

  • start an Infrahub instance and load the demo schema: inv dev.start demo.load-infra-schema
  • create a branch: infrahubctl branch create test
  • run this script to load the data into the branch: infrahubctl run <script.py> --branch test
import logging

from infrahub_sdk import InfrahubClient


# Entry point called by `infrahubctl run <script.py> --branch test`.
# num_devices may arrive as a string, so it is cast to int below.
async def run(client: InfrahubClient, log: logging.Logger, branch: str, num_devices: int = 30) -> None:
    # Create (or upsert) the site that every device belongs to.
    site = await client.create("LocationSite", name="atl1")
    await site.save(allow_upsert=True)

    num_devices = int(num_devices)

    device_batch = await client.create_batch()
    interface_batch = await client.create_batch()

    # Queue one device save per iteration; the saves run when the batch executes.
    for i in range(num_devices):
        device = await client.create("InfraDevice", name=f"atl1-test{i}", site=site, type="testing")
        device_batch.add(task=device.save, node=device, allow_upsert=True)
        log.info(f"Added device {device.name.value}")

    # Save the devices and cache them in the client store so interfaces can reference them.
    async for node, result in device_batch.execute():
        print(f"device {node.name.value} was created in Infrahub successfully")
        client.store.set(key=node.name.value, node=node)

    # Queue 100 Layer 2 interfaces per device (3000 in total with the default of 30 devices).
    for i in range(num_devices):
        for j in range(100):
            interface = await client.create(
                "InfraInterfaceL2",
                name=f"Ethernet{j}",
                l2_mode="Access",
                speed=10000,
                device=client.store.get(key=f"atl1-test{i}"),
            )
            interface_batch.add(task=interface.save, node=interface, allow_upsert=True)
            log.info(f"  Added interface {interface.name.value} for device {interface.device.peer.name.value}")

    async for node, result in interface_batch.execute():
        print(f"interface {node.name.value} {node.device.peer.name.value} was created in Infrahub successfully")
  • open a proposed change for the test branch
  • click on the data tab and refresh the diff
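To give a sense of the size of the change the script produces, the sketch below enumerates the same device and interface names without contacting Infrahub. `planned_nodes` and its `interfaces_per_device` parameter are hypothetical names introduced here for illustration; they are not part of `infrahub_sdk`. The site count assumes `atl1` is new in the branch, which may not hold because it is saved with `allow_upsert=True`.

```python
# Hypothetical helper (not part of infrahub_sdk): mirrors the naming used by the
# reproduction script so the size of the branch change can be inspected offline.
from typing import List, Tuple


def planned_nodes(
    num_devices: int = 30, interfaces_per_device: int = 100
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return the device names and (device, interface) pairs the script would create."""
    devices = [f"atl1-test{i}" for i in range(num_devices)]
    interfaces = [
        (device, f"Ethernet{j}")
        for device in devices
        for j in range(interfaces_per_device)
    ]
    return devices, interfaces


if __name__ == "__main__":
    devices, interfaces = planned_nodes()
    # Assumes one new LocationSite ("atl1") plus every device and interface.
    total = 1 + len(devices) + len(interfaces)
    print(f"{len(devices)} devices, {len(interfaces)} interfaces, {total} nodes in total")
```

With the defaults this prints `30 devices, 3000 interfaces, 3031 nodes in total`, i.e. a change of roughly three thousand nodes, each with its own attributes and a relationship to its device.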


@wvandeun wvandeun added type/bug Something isn't working as expected group/backend Issue related to the backend (API Server, Git Agent) labels Sep 24, 2024
@exalate-issue-sync exalate-issue-sync bot added this to the Infrahub - 0.16.2 milestone Sep 24, 2024
@ajtmccarty
Contributor

#4376 includes a complete rewrite of the Cypher query used to calculate a diff, along with extensive changes to the Python logic around diffs. It shows substantial performance improvements and is definitely a step in the right direction. For very large changes (thousands of nodes), calculating the complete diff can still take several minutes. We have further internal issues tracking additional performance improvements and will continue to work on this.
