feat: metrics in neo4j adapter [COG-1082] #487

Merged
merged 33 commits into from
Feb 7, 2025

Conversation

alekszievr
Contributor

@alekszievr alekszievr commented Jan 30, 2025

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

  • New Features

    • Enhanced graph management capabilities allow users to verify graph existence, project complete graphs, and remove graphs, delivering more comprehensive graph insights.
  • Refactor

    • Adjusted default task behavior for streamlined performance.
    • Updated timestamp handling to ensure accurate and consistent record tracking.

Contributor

coderabbitai bot commented Jan 30, 2025

Walkthrough

This pull request updates task instantiation, graph management, and timestamp handling across different modules. In the API layer, the default task for store_descriptive_metrics no longer includes optional parameters. The Neo4j adapter has been revised by removing the old get_graph_metrics method and introducing new asynchronous methods (graph_exists, project_entire_graph, and drop_graph) to better handle graph lifecycle operations and metric computations. Additionally, the database model for graph metrics now leverages server-side timestamp generation through SQLAlchemy’s func.now().

Changes

File(s) | Change Summary
cognee/api/v1/cognify/cognify_v2.py | Removed the include_optional=True argument from the instantiation of Task(store_descriptive_metrics) within the get_default_tasks function.
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py | Removed the old get_graph_metrics method and added new async methods: graph_exists, project_entire_graph, and drop_graph for dynamic graph management and enhanced graph metric calculation.
cognee/modules/data/models/GraphMetrics.py | Updated the created_at and updated_at columns in the GraphMetrics model to use database-side timestamp generation with func.now(), replacing the Python lambda-based timestamp generation.
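
To illustrate what database-side timestamp generation means in the last row, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in. The table name graph_metrics_demo and the helper insert_metric_row are hypothetical, and the PR itself uses SQLAlchemy's func.now() rather than raw SQL, so treat this only as a picture of the idea: the row is inserted without a timestamp and the database fills it in.

```python
import sqlite3
from datetime import datetime


def insert_metric_row(conn: sqlite3.Connection, num_nodes: int) -> str:
    """Insert a row without a timestamp and return the value the database stored."""
    cursor = conn.execute(
        "INSERT INTO graph_metrics_demo (num_nodes) VALUES (?)", (num_nodes,)
    )
    row = conn.execute(
        "SELECT created_at FROM graph_metrics_demo WHERE id = ?", (cursor.lastrowid,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
# The DEFAULT clause makes the database, not the Python process, produce the timestamp.
conn.execute(
    """
    CREATE TABLE graph_metrics_demo (
        id INTEGER PRIMARY KEY,
        num_nodes INTEGER NOT NULL,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)

created_at = insert_metric_row(conn, 42)
parsed = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
```

One practical consequence of this design is that every writer shares the database clock, so rows written from different application hosts remain comparable.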

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant N as Neo4jAdapter
    participant G as Graph Database Service

    C->>N: Call graph_exists(graph_name)
    N->>G: Query available graph names
    G-->>N: Return list of graphs
    N-->>C: Return existence status

    C->>N: Call project_entire_graph(graph_name)
    N->>G: Request projection of all nodes & relationships
    G-->>N: Return in-memory projected graph
    N-->>C: Provide projected graph

    C->>N: Call drop_graph(graph_name)
    N->>G: Execute graph drop command
    G-->>N: Confirm deletion
    N-->>C: Return drop confirmation
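
The three interactions in the diagram can be sketched with an in-memory stand-in. The class InMemoryProjectionCatalog below is hypothetical and does not talk to Neo4j; only the method names graph_exists, project_entire_graph, and drop_graph come from the walkthrough, and the return values are assumptions chosen for illustration.

```python
import asyncio


class InMemoryProjectionCatalog:
    """Hypothetical stand-in for the graph service; projections live in a dict."""

    def __init__(self, nodes, edges):
        self._nodes = list(nodes)
        self._edges = list(edges)
        self._projections = {}

    async def graph_exists(self, graph_name: str) -> bool:
        # Mirrors "query available graph names, return existence status".
        return graph_name in self._projections

    async def project_entire_graph(self, graph_name: str) -> dict:
        # Projects all nodes and relationships once; later calls reuse the projection.
        if not await self.graph_exists(graph_name):
            self._projections[graph_name] = {
                "nodes": list(self._nodes),
                "edges": list(self._edges),
            }
        return self._projections[graph_name]

    async def drop_graph(self, graph_name: str) -> bool:
        # Returns True only if a projection was actually removed.
        return self._projections.pop(graph_name, None) is not None


async def lifecycle_demo():
    catalog = InMemoryProjectionCatalog(["a", "b", "c"], [("a", "b"), ("b", "c")])
    before = await catalog.graph_exists("myGraph")
    projected = await catalog.project_entire_graph("myGraph")
    during = await catalog.graph_exists("myGraph")
    dropped = await catalog.drop_graph("myGraph")
    after = await catalog.graph_exists("myGraph")
    return before, len(projected["nodes"]), during, dropped, after


lifecycle_result = asyncio.run(lifecycle_demo())
```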

Possibly related PRs

Suggested reviewers

  • borisarzentar
  • lxobr

Poem

I'm a little rabbit, hopping through the code,
Finding new pathways where tasks and graphs reload.
Carrots of logic and fields of neat design,
Timestamps now sing with a database shine!
With each new method, I hop in delight,
Celebrating fresh changes from morning 'til night.
🐰 Hop on, dear coder, the code's looking bright!

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c13fdec and 58e5275.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/cognify_v2.py (0 hunks)
💤 Files with no reviewable changes (1)
  • cognee/api/v1/cognify/cognify_v2.py
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: windows-latest
  • GitHub Check: docker-compose-test

@alekszievr alekszievr changed the base branch from dev to feat/cog-1082-metrics-in-networkx-adapter January 30, 2025 16:55
@alekszievr alekszievr requested a review from lxobr January 30, 2025 17:07
@alekszievr alekszievr self-assigned this Jan 30, 2025
@borisarzentar borisarzentar changed the title Feat: metrics in neo4j adapter [COG-1082] feat: metrics in neo4j adapter [COG-1082] Jan 31, 2025
@alekszievr alekszievr force-pushed the feat/cog-1082-metrics-in-networkx-adapter branch from e89c9b9 to 27feae8 Compare February 3, 2025 14:37
@alekszievr alekszievr force-pushed the feat/cog-1082-metrics-in-networkx-adapter branch from 27feae8 to af8e798 Compare February 3, 2025 14:46
Base automatically changed from feat/cog-1082-metrics-in-networkx-adapter to dev February 3, 2025 17:05
Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 2

🧹 Nitpick comments (8)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (4)

573-577: Add logging after dropping the graph.
It might be helpful to log whether the graph was successfully dropped or was absent, for better traceability in production.

 async def drop_graph(self, graph_name="myGraph"):
     if await self.graph_exists(graph_name):
         drop_query = f"CALL gds.graph.drop('{graph_name}');"
         await self.query(drop_query)
+        logger.debug(f"Dropped graph '{graph_name}' successfully.")
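
A slightly fuller sketch of the same idea, logging both the dropped and the absent case. RecordingAdapter is a hypothetical test double, and the query(query, params) signature is an assumption about the adapter rather than something shown in this PR. Passing the graph name as a query parameter instead of formatting it into the string is a swapped-in technique, not what the snippet above does.

```python
import asyncio
import logging

logger = logging.getLogger("drop_graph_sketch")


class RecordingAdapter:
    """Hypothetical double that records queries instead of sending them to Neo4j."""

    def __init__(self, existing_graphs):
        self.existing_graphs = set(existing_graphs)
        self.executed = []

    async def query(self, query, params=None):
        # Assumed signature: a query string plus an optional parameter dict.
        params = params or {}
        self.executed.append((query, params))
        self.existing_graphs.discard(params.get("graph_name"))

    async def graph_exists(self, graph_name):
        return graph_name in self.existing_graphs

    async def drop_graph(self, graph_name="myGraph"):
        if await self.graph_exists(graph_name):
            await self.query(
                "CALL gds.graph.drop($graph_name)", {"graph_name": graph_name}
            )
            logger.debug("Dropped graph '%s'.", graph_name)
            return True
        logger.debug("Graph '%s' not found; nothing to drop.", graph_name)
        return False


async def drop_demo():
    adapter = RecordingAdapter({"myGraph"})
    first = await adapter.drop_graph("myGraph")
    second = await adapter.drop_graph("myGraph")
    return first, second, adapter.executed


drop_result = asyncio.run(drop_demo())
```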

633-636: Diameter not yet implemented.
If diameter is critical, consider GDS Shortest Path or BFS expansions. Let us know if you’d like assistance with a workable approach.


637-642: Average shortest path not yet implemented.
Likewise, GDS offers built-in algorithms for average path length. Let us know if you’d like to integrate it.
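
For reference, both missing metrics can be derived from breadth-first search on an unweighted graph. The sketch below is plain Python over an adjacency dict, not GDS and not the adapter's code. The function names are hypothetical, and it assumes an undirected graph small enough to run BFS from every node.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return hop counts from source to every node reachable from it."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def diameter_and_avg_path_length(adjacency):
    """Return (diameter, average shortest path length) over reachable ordered pairs."""
    longest = 0
    total = 0
    pairs = 0
    for source in adjacency:
        for target, distance in bfs_distances(adjacency, source).items():
            if target != source:
                longest = max(longest, distance)
                total += distance
                pairs += 1
    return longest, (total / pairs if pairs else None)


# A four-node path a - b - c - d, stored with both directions of each edge.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
diameter, avg_path_length = diameter_and_avg_path_length(path_graph)
```

Running BFS from every node costs roughly O(V * (V + E)), which is why these metrics are reasonable candidates for the optional group.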


643-645: Average clustering not yet implemented.
For completeness, you may explore GDS or external libraries to compute clustering.
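
As a reference point for what such a computation involves, here is a minimal pure-Python sketch of the average local clustering coefficient. It assumes an undirected graph given as a symmetric adjacency dict and counts nodes with fewer than two neighbors as zero. The function name is hypothetical, and this is not the GDS or NetworkX implementation.

```python
from itertools import combinations


def average_clustering(adjacency):
    """Average of local clustering coefficients; None for an empty graph."""
    if not adjacency:
        return None
    total = 0.0
    for node, neighbors in adjacency.items():
        distinct = set(neighbors) - {node}  # ignore self-loops
        degree = len(distinct)
        if degree < 2:
            continue  # contributes a coefficient of 0
        # Count the edges that exist between pairs of this node's neighbors.
        links = sum(
            1 for u, v in combinations(distinct, 2) if v in adjacency.get(u, ())
        )
        total += 2 * links / (degree * (degree - 1))
    return total / len(adjacency)


# Triangle a-b-c with a pendant node d attached to c.
triangle_with_tail = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
clustering = average_clustering(triangle_with_tail)
```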

cognee/infrastructure/databases/graph/graph_db_interface.py (1)

59-59: Document the new parameter include_optional.
Adding a short docstring describing its usage will help future maintainers understand which metrics are impacted by this flag.
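
One possible shape for that docstring, as a sketch only. The class name GraphMetricsInterface, the default value of False, and the return type are assumptions. The list of optional metrics is taken from the optional_metrics dictionary quoted later in this review.

```python
from abc import ABC, abstractmethod
from typing import Any


class GraphMetricsInterface(ABC):
    """Hypothetical slice of the graph DB interface, reduced to the metrics method."""

    @abstractmethod
    async def get_graph_metrics(self, include_optional: bool = False) -> dict[str, Any]:
        """Return descriptive metrics for the stored graph.

        Args:
            include_optional: When True, also compute the more expensive metrics
                (num_selfloops, diameter, avg_shortest_path_length,
                avg_clustering). When False, those keys carry a placeholder
                value instead of a computed one.
        """
```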

cognee/modules/data/methods/store_descriptive_metrics.py (1)

26-29: Add validation for the include_optional parameter.

Consider adding validation for the include_optional parameter to ensure it's a boolean value.

 async def store_descriptive_metrics(data_points: list[DataPoint], include_optional: bool):
+    if not isinstance(include_optional, bool):
+        raise TypeError("include_optional must be a boolean value")
     db_engine = get_relational_engine()
     graph_engine = await get_graph_engine()
     graph_metrics = await graph_engine.get_graph_metrics(include_optional)
cognee/api/v1/cognify/cognify_v2.py (1)

168-168: Consider making include_optional configurable.

The include_optional parameter is hardcoded to True. Consider making this configurable through the cognify config to allow flexibility in whether optional metrics are computed.

-            Task(store_descriptive_metrics, include_optional=True),
+            Task(store_descriptive_metrics, include_optional=cognee_config.include_optional_metrics),
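
A minimal sketch of how such a setting could be sourced, assuming an environment variable. The names MetricsConfig and INCLUDE_OPTIONAL_METRICS are hypothetical and are not part of the existing cognify config.

```python
import os
from dataclasses import dataclass

TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class MetricsConfig:
    """Hypothetical config holder for the optional-metrics switch."""

    include_optional_metrics: bool = False

    @classmethod
    def from_env(cls, environ=None):
        # Accept an explicit mapping so the parsing can be tested without
        # touching the real process environment.
        env = os.environ if environ is None else environ
        raw = env.get("INCLUDE_OPTIONAL_METRICS", "")
        return cls(include_optional_metrics=raw.strip().lower() in TRUE_VALUES)


enabled = MetricsConfig.from_env({"INCLUDE_OPTIONAL_METRICS": "True"})
disabled = MetricsConfig.from_env({})
```

Defaulting to False keeps the expensive metrics opt-in, which matches the direction this PR takes by dropping include_optional=True from the default task.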
cognee/infrastructure/databases/graph/networkx/adapter.py (1)

416-422: Improve error handling in clustering coefficient calculation.

The current implementation swallows exception details. Consider logging the full exception traceback for better debugging.

     def _get_avg_clustering(graph):
         try:
             return nx.average_clustering(nx.DiGraph(graph))
         except Exception as e:
-            logger.warning("Failed to calculate clustering coefficient: %s", e)
+            logger.warning("Failed to calculate clustering coefficient", exc_info=True)
             return None
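
To show what exc_info=True adds, the sketch below captures log output in memory. The helper clustering_or_none and the logger name are hypothetical; the logging calls are standard library.

```python
import io
import logging


def clustering_or_none(compute, logger):
    """Run compute(); on failure, log the full traceback and return None."""
    try:
        return compute()
    except Exception:
        logger.warning("Failed to calculate clustering coefficient", exc_info=True)
        return None


def failing_computation():
    raise ZeroDivisionError("no triangles")


# Route this logger's output into a string buffer so it can be inspected.
stream = io.StringIO()
demo_logger = logging.getLogger("clustering_sketch")
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.setLevel(logging.WARNING)
demo_logger.propagate = False

clustering_result = clustering_or_none(failing_computation, demo_logger)
log_text = stream.getvalue()
```

With the original "%s" form only the exception message would appear; here log_text also carries the stack frames leading to the failure.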
🛑 Comments failed to post (2)
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)

647-649: ⚠️ Potential issue

Potential index/key error in node/edge data extraction.
nodes[0]["nodes"] or edges[0]["elements"] might raise an exception if the query returns an empty list or no matching keys. Consider validating non-empty results.

 num_nodes = len(nodes[0].get("nodes", [])) if nodes and "nodes" in nodes[0] else 0
 num_edges = len(edges[0].get("elements", [])) if edges and "elements" in edges[0] else 0
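
The same guard can be factored into a small helper so that both counts share one code path. This is a sketch that assumes the query results are lists of dict-like rows; count_items is a hypothetical name.

```python
def count_items(rows, key):
    """Return len(rows[0][key]), or 0 when rows is empty or the key is absent or None."""
    if not rows:
        return 0
    return len(rows[0].get(key) or [])


num_nodes = count_items([{"nodes": ["n1", "n2", "n3"]}], "nodes")
num_edges = count_items([], "elements")
```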
cognee/infrastructure/databases/graph/networkx/adapter.py (1)

442-447: 🛠️ Refactor suggestion

Use None instead of -1 for missing optional metrics.

Using -1 as a sentinel value for missing optional metrics could be misleading as it might be interpreted as a valid metric value. Consider using None instead.

         optional_metrics = {
-            "num_selfloops": -1,
-            "diameter": -1,
-            "avg_shortest_path_length": -1,
-            "avg_clustering": -1,
+            "num_selfloops": None,
+            "diameter": None,
+            "avg_shortest_path_length": None,
+            "avg_clustering": None,
         }
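
A short sketch of why the sentinel matters downstream: an aggregate that skips None stays correct, while -1 silently skews it. The record layout and the mean_metric helper are hypothetical.

```python
from statistics import mean


def mean_metric(records, name):
    """Average a metric across records, skipping entries where it is None."""
    values = [record[name] for record in records if record.get(name) is not None]
    return mean(values) if values else None


with_sentinel = [{"diameter": 4}, {"diameter": -1}, {"diameter": 6}]
with_none = [{"diameter": 4}, {"diameter": None}, {"diameter": 6}]

skewed = mean_metric(with_sentinel, "diameter")  # -1 is averaged in as if it were real
correct = mean_metric(with_none, "diameter")     # the missing value is ignored
```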

@alekszievr alekszievr force-pushed the feat/cog-1082-metrics-in-neo4j-adapter branch from 81a4aa3 to f2ad1d4 Compare February 3, 2025 17:18
@alekszievr alekszievr force-pushed the feat/cog-1082-metrics-in-neo4j-adapter branch from a1ffeca to 3e67828 Compare February 4, 2025 11:22

gitguardian bot commented Feb 7, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id | GitGuardian status | Secret | Commit | Filename
9573981 | Triggered | Generic Password | 91b42ab | .env.template
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
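
As one way to apply step 2, a template file can keep an empty placeholder while the application reads the real value from the environment at startup. The sketch below is illustrative only; the variable name GRAPH_DATABASE_PASSWORD and the helper require_secret are hypothetical and not taken from this repository.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name, environ=None):
    """Return the named secret from the environment, failing loudly if it is unset."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise MissingSecretError(f"Environment variable {name} is not set")
    return value


# An explicit mapping stands in for the real environment in this demonstration.
password = require_secret(
    "GRAPH_DATABASE_PASSWORD", {"GRAPH_DATABASE_PASSWORD": "example-only"}
)
```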

@alekszievr alekszievr merged commit 8396fed into dev Feb 7, 2025
24 of 26 checks passed
@alekszievr alekszievr deleted the feat/cog-1082-metrics-in-neo4j-adapter branch February 7, 2025 14:58
4 participants