Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Graph de/serialization #2612

Open
wants to merge 18 commits into
base: develop
Choose a base branch
from
Open

Refactor Graph de/serialization #2612

wants to merge 18 commits into from

Conversation

yann-lty
Copy link
Member

@yann-lty yann-lty commented Dec 2, 2024

Description

Refactoring graph de/serialization for code-base simplification, fixing issues and better testability.

This PR introduces a major refactoring of the graph serialization/deserialization logic (aka graph IO).
With those changes, we address some of the known issues in this area, most notably when copy/pasting nodes.

Overall, the main concept is that any operation that needs to operate on serialized Graph data uses deserialization first, to work with Graph API, instead of manipulating a raw dictionary of serialized data.
It also creates a clear distinction in the API between the loading of a Meshroom file, the initialization of a Graph from a template and importing the content of another Graph.

Features

  • Fix copy pasting issues that was failling to re-connect nodes correctly on paste.
  • Deal with advanced copy/pasting scenarios, eg. handling compatibility nodes.
  • Improved graph loading performance.

Implementation details

The goal of this PR is to move away from a monolothic Graph class that mixes a lot of responsibilities, to make the code more manageable and testable.

NodeFactory

The nodeFactory function has been refactored to rely on an underlying class that manages the compatibility checks. While preserving the existing behavior, this class helps understanding the different steps that occurs during that that process.

Graph

Deserialization

The current implementation exposes a unique entry point, as a single load function with several booleans to manage the different cases, to deal with:

  • loading a Meshroom file
  • initializing a Graph instance from a template file
  • importing another graph's content
classDiagram
class Graph {
  load(filepath, setupProjectFile=True, importProject=False, publishOutputs=False)
}
Loading

This makes it complex to manage the possible combinations of options, as all of them are not necessarily meaningful.

In the new approach, those are decoupled into as many entry points as required to handle the distinct logic those operations have:

classDiagram

class Graph {
 + load(filepath: PathLike)
 + initFromTemplate(filepath: PathLike, publishOutputs: bool)
 + importGraphContentFromFile(filepath: PathLike)
 + importGraphContent(graph: Graph)
}
Loading

UID conflict check

The current implementation creates all nodes twice to deal with UID compatibility issues, which increases load times and may keep false-positives depending on the order in which nodes are handled.

In the new approach, we only consider nodes that actually have a UID conflict, and solve them by depth ordering.
Indeed, UID conflicts are contagious: an upstream node can impact the downstream ones, as UID flow through connections.
By replacing upstream nodes first with CompatibilityNodes, we restore their serialized UID, which may solve the "false-positive" conflicts downstream, that were only due to the contagion by this node.

Importing another graph content

This area benefits the most from the refactoring, which solves errors we may currently encounter in the copy-pasting of nodes.
Instead of manipulating raw dictionaries to solve conflicting node names w.r.t the current content of the Graph, it now relies on:

  • the Graph API to make changes to node names in a separate Graph instance
  • node serialization, to get the correctly updated link expressions (serialized Edges)
  • node deserialization, to import nodes in the target Graph instance

Serialization

Introducing new GraphSerializer classes, following a Template Method pattern.
This makes it easier to define several node serialization logic, while inheriting from the standard serialization structure.

@yann-lty yann-lty force-pushed the dev/graphIO branch 5 times, most recently from a5a016e to b3117de Compare December 10, 2024 11:53
@yann-lty yann-lty marked this pull request as ready for review December 12, 2024 15:38
Add a new test suite for graph template loading.
Rewrite `nodeFactory` to reduce cognitive complexity,
while preserving the current behavior.
* API
Instead of having a single `load` function that exposes in its API
some elements only applicable to initializing a graph from a templates,
split it into 2 distinct functions: `load` and `initFromTemplate`.
Apply those changes to users of the API (UI, CLI), and simplify Graph
wrapper classes to better align with those concepts.

* Deserialization
Reduce the cognitive complexity of the deserizalization process
by splitting it into more atomic functions, while maintaining the
current behavior.
Extract the logic of importing the content of a graph within a graph instance from
the graph loading logic.
Add `Graph.importGraphContent` and `Graph.importGraphContentFromFile`
methods.
Use the deserialization API to load the content in another temporary graph instance,
to handle the renaming of nodes using the Graph API, rather than manipulating
entries in a raw dictionnary.
…es for unknown File attributes

When creating a compatibility description for an unknown attribute, do not
consider a link expression as the default value for a File attribute.
This is breaking how the link expression solving system works, as it's resetting
the attribute to its default value after applying the link.
If that expression is kept as the default value, it can be re-evaluated several
times incorrectly.
Added a test case that was failing before that change to illustrate the issue.
Move Graph.IO internal class to its own module, and rename it to `GraphIO`.
This avoid nested classes within the core Graph class, and starts decoupling
the management of graph's IO from the logic of the graph itself.
Move the serialization logic to dedicated serializer classes.
Implement both `GraphSerializer` and `TemplateGraphSerializer`
to cover for the existing serialization use-cases.
Add a new serializer class to manage partial graph serialization logic,
ensuring to remove link expressions on attributes refering to nodes
that are not in the subset of nodes to serialize.
Only perform uid check when we have both a serialized and a computed
UID.
If the node has not been serialized with a UID, it means that it does not
expect to match a specific value on deserialization.
Re-implement node pasting by relying on the graph partial serializer,
to serialize only the subset of selected nodes.
On pasting, use standard graph deserialization and import the content
of the serialized graph in the active graph instance.

Simplify the positioning of pasted nodes to only consider mouse position
or center of the graph, which works well for the major variety of use-cases.
Compute the offset to apply to imported nodes by using the de-serialized
graph content's bounding box.
Add a new method to create a copy of a graph instance, relying on
chaining serialization and deserialization operations.
Add test suite to validate its behavior, and the underlying serialization processes.
Factorize the logic of replacing a node with another one and re-creating
output edges into `Graph.replaceNode`  and `Graph._restoreOutEdges`.
At the end of the deserialization process, solve node uid conflicts iteratively by node depths,
and only replace the conflicting nodes with a CompatibilityNode.

Add new test suite for testing uid conflict handling.
… UidConflict

Only set the expectedUid when undoing the upgrade of a uid conflicting node.
Otherwise, let the other type of conflicts take precedence.
Copy link

codecov bot commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 96.40288% with 25 lines in your changes missing coverage. Please review.

Project coverage is 73.32%. Comparing base (c2d4159) to head (dbafe84).
Report is 6 commits behind head on develop.

Files with missing lines Patch % Lines
meshroom/core/graph.py 92.46% 11 Missing ⚠️
meshroom/core/graphIO.py 91.50% 9 Missing ⚠️
meshroom/core/nodeFactory.py 94.94% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2612      +/-   ##
===========================================
+ Coverage    69.85%   73.32%   +3.46%     
===========================================
  Files          120      125       +5     
  Lines         7067     7449     +382     
===========================================
+ Hits          4937     5462     +525     
+ Misses        2130     1987     -143     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants