feat: metrics in neo4j adapter [COG-1082] #487

alekszievr · 2025-01-30T16:54:47Z

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Summary by CodeRabbit

New Features
- Enhanced graph management capabilities allow users to verify graph existence, project complete graphs, and remove graphs, delivering more comprehensive graph insights.
Refactor
- Adjusted default task behavior for streamlined performance.
- Updated timestamp handling to ensure accurate and consistent record tracking.

coderabbitai · 2025-01-30T16:54:54Z

Walkthrough

This pull request updates task instantiation, graph management, and timestamp handling across different modules. In the API layer, the default task for store_descriptive_metrics no longer includes optional parameters. The Neo4j adapter has been revised by removing the old get_graph_metrics method and introducing new asynchronous methods (graph_exists, project_entire_graph, and drop_graph) to better handle graph lifecycle operations and metric computations. Additionally, the database model for graph metrics now leverages server-side timestamp generation through SQLAlchemy’s func.now().

Changes

| File(s) | Change Summary |
|---|---|
| `cognee/api/v1/cognify/cognify_v2.py` | Removed the `include_optional=True` argument from the instantiation of `Task(store_descriptive_metrics)` within the `get_default_tasks` function. |
| `cognee/infrastructure/databases/graph/neo4j_driver/adapter.py` | Removed the old `get_graph_metrics` method and added new async methods: `graph_exists`, `project_entire_graph`, and `drop_graph` for dynamic graph management and enhanced graph metric calculation. |
| `cognee/modules/data/models/GraphMetrics.py` | Updated the `created_at` and `updated_at` columns in the `GraphMetrics` model to use database-side timestamp generation with `func.now()`, replacing the Python lambda-based timestamp generation. |
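
The idea behind database-side timestamp generation can be illustrated without the project's model. The following is a minimal sketch, assuming a hypothetical table `graph_metrics_demo` and using the standard-library `sqlite3` module; SQLite's `DEFAULT CURRENT_TIMESTAMP` stands in here for what `func.now()` asks the database to do, and it is not the code from this PR.

```python
import sqlite3


def insert_and_read_timestamp(conn):
    """Insert a row without a timestamp and let the database fill it in."""
    # Hypothetical demo table; the column default is evaluated by the database,
    # not by Python code on the client.
    conn.execute(
        """
        CREATE TABLE graph_metrics_demo (
            id INTEGER PRIMARY KEY,
            num_nodes INTEGER,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    # created_at is deliberately omitted from the INSERT statement.
    conn.execute("INSERT INTO graph_metrics_demo (num_nodes) VALUES (?)", (3,))
    return conn.execute(
        "SELECT num_nodes, created_at FROM graph_metrics_demo"
    ).fetchone()


conn = sqlite3.connect(":memory:")
num_nodes, created_at = insert_and_read_timestamp(conn)
conn.close()
```

Because the value is produced by the database at insert time, all writers share one clock, which is the consistency benefit the change summary refers to.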

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant N as Neo4jAdapter
    participant G as Graph Database Service

    C->>N: Call graph_exists(graph_name)
    N->>G: Query available graph names
    G-->>N: Return list of graphs
    N-->>C: Return existence status

    C->>N: Call project_entire_graph(graph_name)
    N->>G: Request projection of all nodes & relationships
    G-->>N: Return in-memory projected graph
    N-->>C: Provide projected graph

    C->>N: Call drop_graph(graph_name)
    N->>G: Execute graph drop command
    G-->>N: Confirm deletion
    N-->>C: Return drop confirmation
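
The lifecycle in the diagram above can be sketched as follows. This is a hedged illustration, not the adapter's code: `FakeGdsSession` and `GraphLifecycle` are hypothetical names, the session is an in-memory stand-in so the example runs without Neo4j, and the Cypher strings assume the Graph Data Science procedures `gds.graph.list`, `gds.graph.project`, and `gds.graph.drop`.

```python
import asyncio


class FakeGdsSession:
    """Hypothetical stand-in for a Neo4j session; keeps projected graph names in memory."""

    def __init__(self):
        self.projected = set()

    async def query(self, cypher, params=None):
        params = params or {}
        if cypher.startswith("CALL gds.graph.list"):
            return [{"graphName": name} for name in sorted(self.projected)]
        if cypher.startswith("CALL gds.graph.project"):
            self.projected.add(params["name"])
            return [{"graphName": params["name"]}]
        if cypher.startswith("CALL gds.graph.drop"):
            self.projected.discard(params["name"])
            return [{"graphName": params["name"]}]
        raise ValueError(f"unexpected query: {cypher}")


class GraphLifecycle:
    """Sketch of the three lifecycle methods named in the walkthrough."""

    def __init__(self, session):
        self.session = session

    async def graph_exists(self, graph_name="myGraph"):
        rows = await self.session.query(
            "CALL gds.graph.list() YIELD graphName RETURN graphName"
        )
        return graph_name in [row["graphName"] for row in rows]

    async def project_entire_graph(self, graph_name="myGraph"):
        # Project all labels and relationship types, unless already projected.
        if not await self.graph_exists(graph_name):
            await self.session.query(
                "CALL gds.graph.project($name, '*', '*')", {"name": graph_name}
            )

    async def drop_graph(self, graph_name="myGraph"):
        # Returns True only when a projection was actually dropped.
        if await self.graph_exists(graph_name):
            await self.session.query(
                "CALL gds.graph.drop($name)", {"name": graph_name}
            )
            return True
        return False


async def demo():
    lifecycle = GraphLifecycle(FakeGdsSession())
    before = await lifecycle.graph_exists()
    await lifecycle.project_entire_graph()
    during = await lifecycle.graph_exists()
    dropped = await lifecycle.drop_graph()
    after = await lifecycle.graph_exists()
    dropped_again = await lifecycle.drop_graph()
    return before, during, dropped, after, dropped_again


results = asyncio.run(demo())
```

Passing the graph name as a query parameter rather than interpolating it into the Cypher string is a design choice of this sketch, not something the PR is shown to do.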

Possibly related PRs

feat: Calculate graph metrics for networkx graph [COG-1082] #484: Modifies the instantiation of the Task for store_descriptive_metrics by altering the include_optional parameter.
Changes Neo4j add_edge method and implements unit tests around get_graph_from_model logic #474: Involves similar modifications to task instantiations within get_default_tasks, affecting tasks such as add_data_points alongside store_descriptive_metrics.

Suggested reviewers

borisarzentar
lxobr

Poem

I'm a little rabbit, hopping through the code,
Finding new pathways where tasks and graphs reload.
Carrots of logic and fields of neat design,
Timestamps now sing with a database shine!
With each new method, I hop in delight,
Celebrating fresh changes from morning 'til night.
🐰 Hop on, dear coder, the code's looking bright!

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c13fdec and 58e5275.

📒 Files selected for processing (1)

cognee/api/v1/cognify/cognify_v2.py (0 hunks)

💤 Files with no reviewable changes (1)

cognee/api/v1/cognify/cognify_v2.py

⏰ Context from checks skipped due to timeout of 90000ms (5)

GitHub Check: test
GitHub Check: test
GitHub Check: run_notebook_test / test
GitHub Check: windows-latest
GitHub Check: docker-compose-test

coderabbitai

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 2

🧹 Nitpick comments (8)

cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (4)
573-577: Add logging after dropping the graph.
It might be helpful to log whether the graph was successfully dropped or was absent, for better traceability in production.
 async def drop_graph(self, graph_name="myGraph"):
     if await self.graph_exists(graph_name):
         drop_query = f"CALL gds.graph.drop('{graph_name}');"
         await self.query(drop_query)
+        logger.debug(f"Dropped graph '{graph_name}' successfully.")
633-636: Diameter not yet implemented.
If diameter is critical, consider GDS Shortest Path or BFS expansions. Let us know if you’d like assistance with a workable approach.

637-642: Average shortest path not yet implemented.
Likewise, GDS offers built-in algorithms for average path length. Let us know if you’d like to integrate it.

643-645: Average clustering not yet implemented.
For completeness, you may explore GDS or external libraries to compute clustering.
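
As a rough illustration of the BFS-based approach mentioned for the unimplemented metrics, the sketch below computes diameter and average shortest path length in plain Python. It assumes a small, unweighted, undirected graph given as an adjacency dict; the function names are hypothetical and this is not a GDS call or the adapter's implementation.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop counts from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def diameter_and_avg_path(adjacency):
    """Return (diameter, average shortest path length).

    Returns (None, None) for graphs with fewer than two nodes or for
    disconnected graphs, where neither metric is well defined.
    """
    nodes = list(adjacency)
    if len(nodes) < 2:
        return None, None
    longest = 0
    total = 0
    for source in nodes:
        distances = bfs_distances(adjacency, source)
        if len(distances) != len(nodes):
            return None, None  # some node is unreachable
        longest = max(longest, max(distances.values()))
        total += sum(distances.values())
    ordered_pairs = len(nodes) * (len(nodes) - 1)
    return longest, total / ordered_pairs


# A path graph a - b - c - d.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
diameter, avg_path = diameter_and_avg_path(path_graph)

# Two isolated nodes: both metrics are undefined.
disconnected = diameter_and_avg_path({"x": [], "y": []})
```

Running one BFS per node costs O(V * (V + E)), so this is only practical for small graphs; a database-side algorithm would be the better fit for large ones.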
cognee/infrastructure/databases/graph/graph_db_interface.py (1)

59-59: Document the new parameter include_optional.
Adding a short docstring describing its usage will help future maintainers understand which metrics are impacted by this flag.
cognee/modules/data/methods/store_descriptive_metrics.py (1)
26-29: Add validation for the include_optional parameter.

Consider adding validation for the include_optional parameter to ensure it's a boolean value.
 async def store_descriptive_metrics(data_points: list[DataPoint], include_optional: bool):
+    if not isinstance(include_optional, bool):
+        raise ValueError("include_optional must be a boolean value")
     db_engine = get_relational_engine()
     graph_engine = await get_graph_engine()
     graph_metrics = await graph_engine.get_graph_metrics(include_optional)
cognee/api/v1/cognify/cognify_v2.py (1)
168-168: Consider making include_optional configurable.

The include_optional parameter is hardcoded to True. Consider making this configurable through the cognify config to allow flexibility in whether optional metrics are computed.
-            Task(store_descriptive_metrics, include_optional=True),
+            Task(store_descriptive_metrics, include_optional=cognee_config.include_optional_metrics),
cognee/infrastructure/databases/graph/networkx/adapter.py (1)
416-422: Improve error handling in clustering coefficient calculation.

The current implementation swallows exception details. Consider logging the full exception traceback for better debugging.
     def _get_avg_clustering(graph):
         try:
             return nx.average_clustering(nx.DiGraph(graph))
         except Exception as e:
-            logger.warning("Failed to calculate clustering coefficient: %s", e)
+            logger.warning("Failed to calculate clustering coefficient", exc_info=True)
             return None
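
The effect of `exc_info=True` can be checked with the standard `logging` module alone. The sketch below is a hypothetical demo (the `capture_warning` helper and logger names are not from the PR) that captures both variants of the warning into a string for comparison.

```python
import io
import logging


def capture_warning(use_exc_info):
    """Log a caught exception and return the text that was emitted."""
    stream = io.StringIO()
    logger = logging.getLogger(f"clustering_demo_{use_exc_info}")
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep demo output out of the root logger
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        raise ZeroDivisionError("empty graph")
    except Exception as error:
        if use_exc_info:
            # Suggested form: the handler appends the full traceback.
            logger.warning("Failed to calculate clustering coefficient", exc_info=True)
        else:
            # Original form: only the exception message is recorded.
            logger.warning("Failed to calculate clustering coefficient: %s", error)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


message_only = capture_warning(False)
with_traceback = capture_warning(True)
```

The first variant keeps log lines short; the second records the exception type and call stack, which is what the nitpick is asking for.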

🛑 Comments failed to post (2)

cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)
647-649: ⚠️ Potential issue

Potential index/key error in node/edge data extraction.
nodes[0]["nodes"] or edges[0]["elements"] might raise an exception if the query returns an empty list or no matching keys. Consider validating non-empty results.
 num_nodes = len(nodes[0].get("nodes", [])) if nodes and "nodes" in nodes[0] else 0
 num_edges = len(edges[0].get("elements", [])) if edges and "elements" in edges[0] else 0
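
The guard above can also be factored into a small helper, which makes the empty-result cases easy to exercise. This is a sketch with a hypothetical `count_items` function and made-up sample rows, not code from the adapter.

```python
def count_items(rows, key):
    """Length of rows[0][key], or 0 when rows is empty or the key is missing."""
    if not rows or key not in rows[0]:
        return 0
    return len(rows[0][key])


# Sample query results shaped like the ones discussed in the comment.
nodes = [{"nodes": ["n1", "n2", "n3"]}]
edges = [{"elements": ["e1", "e2"]}]

num_nodes = count_items(nodes, "nodes")
num_edges = count_items(edges, "elements")
empty_result = count_items([], "nodes")
missing_key = count_items([{"other": [1]}], "elements")
```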
cognee/infrastructure/databases/graph/networkx/adapter.py (1)
442-447: 🛠️ Refactor suggestion

Use None instead of -1 for missing optional metrics.

Using -1 as a sentinel value for missing optional metrics could be misleading as it might be interpreted as a valid metric value. Consider using None instead.
         optional_metrics = {
-            "num_selfloops": -1,
-            "diameter": -1,
-            "avg_shortest_path_length": -1,
-            "avg_clustering": -1,
+            "num_selfloops": None,
+            "diameter": None,
+            "avg_shortest_path_length": None,
+            "avg_clustering": None,
         }
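
A short example of why the sentinel matters: if stored metrics are later aggregated, a `-1` silently enters the arithmetic, whereas `None` has to be handled explicitly. The values and the `mean_of_known` helper below are hypothetical.

```python
def mean_of_known(values):
    """Average the values that were actually computed, skipping None."""
    known = [value for value in values if value is not None]
    return sum(known) / len(known) if known else None


# Diameters from three hypothetical runs; the second run skipped the metric.
diameters_with_sentinel = [4, -1, 6]
diameters_with_none = [4, None, 6]

# The -1 is treated as a real measurement and drags the mean down.
naive_mean = sum(diameters_with_sentinel) / len(diameters_with_sentinel)

# None forces the caller to decide how to treat missing values.
safe_mean = mean_of_known(diameters_with_none)
```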

gitguardian · 2025-02-07T13:36:30Z

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request

| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 9573981 | Triggered | Generic Password | `91b42ab` | .env.template |

🛠 Guidelines to remediate hardcoded secrets

1. Understand the implications of revoking this secret by investigating where it is used in your code.
2. Replace the secret and store it safely, following best practices for secret storage.
3. Revoke and rotate this secret.
4. If possible, rewrite git history. Rewriting git history is not a trivial act: you might completely break other contributing developers' workflows, and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:

- following best practices for managing and storing secrets, including API keys and other credentials;
- installing secret detection as a pre-commit hook to catch secrets before they leave your machine and to ease remediation.

alekszievr and others added 18 commits January 28, 2025 12:11

Count the number of tokens in documents

458eeac

Merge branch 'COG-970-refactor-tokenizing' into feat/cog-1071-input-t…

51eadef

…oken-counting

Merge branch 'COG-970-refactor-tokenizing' into feat/cog-1071-input-t…

ba608a4

…oken-counting

save token count to relational db

f6663ab

Merge branch 'COG-970-refactor-tokenizing' into feat/cog-1132-add-num…

9182be8

…-tokens-to-metric-table

Add metrics to metric table

72dfec4

Merge branch 'dev' into feat/cog-1071-input-token-counting

9bd5917

Merge branch 'feat/cog-1071-input-token-counting' into feat/cog-1132-…

227d94e

…add-num-tokens-to-metric-table

Store list as json instead of array in relational db table

22b6459

Merge branch 'dev' into feat/cog-1132-add-num-tokens-to-metric-table

9764441

Sum in sql instead of python

100e7d7

Unify naming

c182d47

Return data_points in descriptive metric calculation task

44fa2cd

Graph metrics getter template in graph db interface and adapters

06030ff

Calculate descriptive metrics in networkx adapter

67d9908

neo4j metrics

252ac7f

Merge branch 'dev' into feat/cog-1082-metrics-in-graphdb-interface

48a51a3

remove _table from table name

9a94db8

alekszievr changed the base branch from dev to feat/cog-1082-metrics-in-networkx-adapter January 30, 2025 16:55

alekszievr requested a review from lxobr January 30, 2025 17:07

alekszievr self-assigned this Jan 30, 2025

Merge branch 'dev' into feat/cog-1082-metrics-in-graphdb-interface

57fb338

borisarzentar changed the title ~~Feat: metrics in neo4j adapter [COG-1082]~~ feat: metrics in neo4j adapter [COG-1082] Jan 31, 2025

alekszievr added the run-checks label Jan 31, 2025

alekszievr and others added 5 commits February 1, 2025 12:58

Merge branch 'dev' into feat/cog-1082-metrics-in-graphdb-interface

e8dcef1

Merge branch 'dev' into feat/cog-1082-metrics-in-graphdb-interface

b0f6ba7

Use modules for adding to db instead of infrastructure

05138fa

Merge branch 'feat/cog-1082-metrics-in-graphdb-interface' into feat/c…

f064f52

…og-1082-metrics-in-networkx-adapter

Merge branch 'feat/cog-1082-metrics-in-networkx-adapter' into feat/co…

c9ee1bc

…g-1082-metrics-in-neo4j-adapter

borisarzentar reviewed Feb 3, 2025

cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (outdated, resolved)

cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (outdated, resolved)

cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (resolved)

alekszievr force-pushed the feat/cog-1082-metrics-in-networkx-adapter branch from e89c9b9 to 27feae8 Compare February 3, 2025 14:37

Merge branch 'dev' into feat/cog-1082-metrics-in-networkx-adapter

af8e798

alekszievr force-pushed the feat/cog-1082-metrics-in-networkx-adapter branch from 27feae8 to af8e798 Compare February 3, 2025 14:46

alekszievr added 2 commits February 3, 2025 15:51

Merge branch 'feat/cog-1082-metrics-in-networkx-adapter' into feat/co…

406057f

…g-1082-metrics-in-neo4j-adapter

minor fixes

d93b5f5

Base automatically changed from feat/cog-1082-metrics-in-networkx-adapter to dev February 3, 2025 17:05

minor cleanup

c13fdec

coderabbitai bot reviewed Feb 3, 2025

Merge branch 'dev' into feat/cog-1082-metrics-in-neo4j-adapter

f2ad1d4

alekszievr force-pushed the feat/cog-1082-metrics-in-neo4j-adapter branch from 81a4aa3 to f2ad1d4 Compare February 3, 2025 17:18

Remove graph metric calculation from the default cognify pipeline

3e67828

alekszievr force-pushed the feat/cog-1082-metrics-in-neo4j-adapter branch from a1ffeca to 3e67828 Compare February 4, 2025 11:22

alekszievr added 3 commits February 4, 2025 12:23

Merge branch 'dev' into feat/cog-1082-metrics-in-neo4j-adapter

58e5275

Merge branch 'dev' into feat/cog-1082-metrics-in-neo4j-adapter

dc06b50

Merge branch 'dev' into feat/cog-1082-metrics-in-neo4j-adapter

91b42ab

lxobr approved these changes Feb 7, 2025

alekszievr merged commit 8396fed into dev Feb 7, 2025
24 of 26 checks passed

alekszievr deleted the feat/cog-1082-metrics-in-neo4j-adapter branch February 7, 2025 14:58
