bug: neo4j raises neo4j.exceptions.DatabaseError exception when storing a large GraphQL query #4399

Closed
wvandeun opened this issue Sep 20, 2024 · 3 comments
Assignees
Labels
group/backend Issue related to the backend (API Server, Git Agent) type/bug Something isn't working as expected

Comments

@wvandeun
Contributor

wvandeun commented Sep 20, 2024

Component

API Server / GraphQL

Infrahub version

0.16.0

Current Behavior

When you try to store a large GraphQL query in Infrahub (a query used for a transformation synced into Infrahub, or a user-created CoreGraphQLQuery object), you get a neo4j.exceptions.DatabaseError exception.

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/graphql/execution/execute.py", line 530, in await_result
    return_type, field_nodes, info, path, await result
                                          ^^^^^^^^^^^^
  File "/source/backend/infrahub/graphql/mutations/main.py", line 68, in mutate
    obj, mutation = await cls.mutate_create(
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/source/backend/infrahub/graphql/mutations/graphql_query.py", line 74, in mutate_create
    obj, result = await super().mutate_create(root=root, info=info, data=data, branch=branch, at=at)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/source/backend/infrahub/graphql/mutations/main.py", line 159, in mutate_create
    obj = await cls.mutate_create_object(data=data, db=db, branch=branch, at=at)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/source/backend/infrahub/database/__init__.py", line 406, in wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/source/backend/infrahub/graphql/mutations/main.py", line 187, in mutate_create_object
    await obj.save(db=dbt)
  File "/source/backend/infrahub/core/node/__init__.py", line 456, in save
    await self._create(at=save_at, db=db)
  File "/source/backend/infrahub/core/node/__init__.py", line 410, in _create
    await query.execute(db=db)
  File "/source/backend/infrahub/core/query/__init__.py", line 540, in execute
    results, metadata = await db.execute_query_with_metadata(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/source/backend/infrahub/database/__init__.py", line 295, in execute_query_with_metadata
    results = [item async for item in response]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/neo4j/_async/work/result.py", line 378, in __aiter__
    await self._connection.fetch_message()
  File "/usr/local/lib/python3.12/site-packages/neo4j/_async/io/_common.py", line 188, in inner
    await coroutine_func(*args, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/neo4j/_async/io/_bolt.py", line 860, in fetch_message
    res = await self._process_message(tag, fields)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/neo4j/_async/io/_bolt5.py", line 370, in _process_message
    await response.on_failure(summary_metadata or {})
  File "/usr/local/lib/python3.12/site-packages/neo4j/_async/io/_common.py", line 245, in on_failure
    raise Neo4jError.hydrate(**metadata)

neo4j.exceptions.DatabaseError: {code: Neo.DatabaseError.Statement.ExecutionFailed} {message: Property value is too large to index, please see index documentation for limitations. Index: Index( id=7, name='node_range_attr_value_value', type='RANGE', schema=(:AttributeValue {value}), indexProvider='range-1.0' ), entity id: 40115, property size: 13876.}

Expected Behavior

The CoreGraphQLQuery object gets created, or the user gets a proper error message.
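
For the "proper error message" option, a rough sketch of how the Neo4j failure could be turned into something readable. The function name `describe_index_size_error` and the wording of the friendly message are hypothetical, not existing Infrahub code, and the regex is written only against the message quoted in the traceback above.

```python
import re
from typing import Optional

# Pattern written against the Neo4j message quoted in this issue;
# other Neo4j versions may word it differently (assumption).
_TOO_LARGE = re.compile(
    r"Property value is too large to index.*?"
    r"name='(?P<index>[^']+)'.*?"
    r"property size: (?P<size>\d+)",
    re.DOTALL,
)


def describe_index_size_error(message: str) -> Optional[str]:
    """Return a readable message for a 'too large to index' failure, else None."""
    match = _TOO_LARGE.search(message)
    if match is None:
        return None  # not the error we know how to explain
    return (
        f"Value is too large to be indexed ({match['size']} bytes, "
        f"index '{match['index']}'). Please shorten the value."
    )
```

The caller would still need to catch `neo4j.exceptions.DatabaseError` and pass its message in; where that belongs in the mutation code is left open here.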

Steps to Reproduce

  • load an infrahub instance
  • load a schema
  • create a new CoreGraphQLQuery object
  • define a sufficiently large GraphQL query (14 KB is enough)

Additional Information

The error seems to come from the fact that we index the "query" attribute value. I'm not sure indexing it makes much sense in the context of a GraphQLQuery.

@wvandeun wvandeun added type/bug Something isn't working as expected group/backend Issue related to the backend (API Server, Git Agent) labels Sep 20, 2024
@exalate-issue-sync exalate-issue-sync bot added this to the Infrahub - 0.16.1 milestone Sep 20, 2024
@ogenstad
Contributor

The problem is with this index:

SHOW INDEXES where name = "node_range_attr_value_value";


╒═══╤═════════════════════════════╤════════╤═════════════════╤═══════╤══════════╤══════════════════╤══════════╤═════════════╤════════════════╤════════╤═════════╕
│id │name                         │state   │populationPercent│type   │entityType│labelsOrTypes     │properties│indexProvider│owningConstraint│lastRead│readCount│
╞═══╪═════════════════════════════╪════════╪═════════════════╪═══════╪══════════╪══════════════════╪══════════╪═════════════╪════════════════╪════════╪═════════╡
│7  │"node_range_attr_value_value"│"ONLINE"│100.0            │"RANGE"│"NODE"    │["AttributeValue"]│["value"] │"range-1.0"  │null            │null    │0        │
└───┴─────────────────────────────┴────────┴─────────────────┴───────┴──────────┴──────────────────┴──────────┴─────────────┴────────────────┴────────┴─────────┘

This doc is for an earlier version of neo4j, but seems relevant: https://neo4j.com/developer/kb/index-limitations-and-workaround/

The native-btree-1.0 index provider has a key size limit of 8167 bytes.

While we're not using the same exact index provider it seems to have the same size limit.
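
To make the numbers concrete: the failing property was 13876 bytes, well above 8167. A minimal sketch of a pre-save check, assuming the limit applies to the UTF-8 encoded size of the string. The real key encoding in the index may add overhead, so this is only an approximation, and `INDEX_KEY_LIMIT_BYTES` and `fits_in_index` are hypothetical names.

```python
# Limit quoted from the Neo4j KB article for native-btree-1.0; assumed
# (not verified) to also apply to the range-1.0 provider.
INDEX_KEY_LIMIT_BYTES = 8167


def fits_in_index(value: str, limit: int = INDEX_KEY_LIMIT_BYTES) -> bool:
    """Approximate check: compare the UTF-8 size of the value to the limit."""
    return len(value.encode("utf-8")) <= limit
```

Note that the check counts bytes, not characters, so a string with many non-ASCII characters can exceed the limit while being shorter than 8167 characters.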

A quick workaround can be to remove the index from the database, but there will probably be some performance penalty to this.

DROP index node_range_attr_value_value;

The problem is not specific to the GraphQL query objects themselves; it can happen with the value of any field that exceeds the index size limit.

If we look at the definition of these objects as defined below, a solution might be to set a size limit for the kinds Text and others, and then use something other than the AttributeValue label, such as LargeAttributeValue, for other field types such as TextArea and JSON, where the large one is unindexed. I don't yet know the impact this will have on the query engine, though.

Alternatively, we reconsider the use of this index altogether.

        {
            "name": "GraphQLQuery",
            "namespace": "Core",
            "description": "A pre-defined GraphQL Query",
            "include_in_menu": False,
            "icon": "mdi:graphql",
            "label": "GraphQL Query",
            "default_filter": "name__value",
            "order_by": ["name__value"],
            "display_labels": ["name__value"],
            "generate_profile": False,
            "branch": BranchSupportType.AWARE.value,
            "uniqueness_constraints": [["name__value"]],
            "documentation": "/topics/graphql",
            "attributes": [
                {"name": "name", "kind": "Text", "unique": True},
                {"name": "description", "kind": "Text", "optional": True},
                {"name": "query", "kind": "TextArea"},
                {
                    "name": "variables",
                    "kind": "JSON",
                    "description": "variables in use in the query",
                    "optional": True,
                    "read_only": True,
                },
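
A rough sketch of the label split described above, assuming a simple lookup on the attribute kind. `LargeAttributeValue` is the label proposed in this comment; `UNINDEXED_KINDS` and `value_label_for_kind` are hypothetical and not part of the Infrahub code base.

```python
# Kinds whose values can be arbitrarily long; the set is an assumption
# based on the examples named above (TextArea and JSON).
UNINDEXED_KINDS = frozenset({"TextArea", "JSON"})


def value_label_for_kind(kind: str) -> str:
    """Pick the node label to use when storing an attribute value."""
    if kind in UNINDEXED_KINDS:
        return "LargeAttributeValue"  # proposed label, no range index on it
    return "AttributeValue"  # existing label, covered by the range index
```

For the GraphQLQuery definition above, `name` and `description` (kind Text) would keep the indexed label, while `query` (TextArea) and `variables` (JSON) would get the unindexed one.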

@dgarros
Collaborator

dgarros commented Sep 20, 2024

Great summary @ogenstad

I'm leaning toward this solution

use something other than the AttributeValue label, such as LargeAttributeValue, for other field types such as TextArea and JSON, where the large one is unindexed.

@dgarros
Collaborator

dgarros commented Sep 23, 2024

Fixed in #4412
