
Standardize graph CRUD API for update/merge/union #182

Closed
dehann opened this issue Oct 23, 2019 · 36 comments

Comments

@dehann
Member

dehann commented Oct 23, 2019

To capture discussion and a decision: I want to make sure we have consistent update/merge/union behavior. I think there was a renaming from update to merge at some point. Something to the effect of (and feel free to disagree):

  • update will update an existing key,
  • merge will update or add depending on the key,
  • union cannot update existing key but can add differing keys.

Strictly speaking, union should return the differing keys and both values on a collision, but I don't think we are ready to support all that functionality yet.
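
As a toy illustration of the three behaviors on a plain Julia Dict (a sketch only; the helper names below are placeholders, not proposed DFG API):

```julia
# Sketch of the proposed semantics on a plain Dict (placeholder names, not final API).
function updateentry!(d::Dict, key, val)
    haskey(d, key) || error("update: key $key does not exist")
    d[key] = val            # update may only touch an existing key
end

function mergeentry!(d::Dict, key, val)
    d[key] = val            # merge updates if present, adds if absent
end

function unionentry!(d::Dict, key, val)
    haskey(d, key) && error("union: key $key already exists (collision)")
    d[key] = val            # union may only add keys that differ from existing ones
end

d = Dict(:a => 1)
mergeentry!(d, :a, 2)       # ok, updates :a
unionentry!(d, :b, 3)       # ok, adds :b
updateentry!(d, :c, 4)      # errors, :c does not exist yet
```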

@Affie
Member

Affie commented Oct 23, 2019

What is the scope for this? Does it include addVariable/addFactor? It would be nice to have consistency everywhere.

@dehann
Member Author

dehann commented Oct 23, 2019

Personally, I think we should just make it work (i.e. use what we have) for DFG v0.5.0, but then we should think about

nice to have consistency everywhere

for DFG v0.6.0?

@Affie
Member

Affie commented Oct 23, 2019

@Affie
Member

Affie commented Oct 24, 2019

@dehann, please explain a bit more about the different solves / solveKey (I think you called it super- and sub-solves). It impacts this for me.

@GearsAD
Collaborator

GearsAD commented Oct 24, 2019

EDIT DF, JT

Sideline: there's a larger discussion about how to stop users from using DFG.addVariable and start using IIF.addVariable for graphs that pack, etc. It has led to a bit of confusion; I'm trying to think of a good way to encourage users to use the IIF functions and actually hide the DFG functions. Just planting the seed that it will affect this discussion.

  • DF, below
  • JT,
    • addFactor!(dfg, factor) and addVariable!(dfg, variable) are hard to use.
    • So not reexporting them should be enough.
    • We should still make it easy for contributors.

Overall: These will be breaking changes as we would be changing signatures/behavior for addVariable/etc.. Let's do them for DFG 0.6.

  • DF, agree 👍
  • JT 👍

As a philosophy, I'd like us to represent the underlying data structure with no magic - e.g. most of our graph represents a dictionary - so implement standard dictionary-like behavior so that it feels familiar to users. Think of it as making sure users feel at home.

  • DF, agree with that 👍
  • JT 👍
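
For illustration, a minimal sketch of what that dictionary-like feel could look like; MyDFG and its fields are made up for this example and are not DFG types:

```julia
# Hypothetical in-memory graph type, used only for this illustration.
struct MyDFG
    variables::Dict{Symbol, Any}   # Any stands in for DFGVariable
    factors::Dict{Symbol, Any}     # Any stands in for DFGFactor
end

# Forward the standard dictionary verbs so the graph "feels like" a Dict of variables.
Base.getindex(dfg::MyDFG, lbl::Symbol) = dfg.variables[lbl]
Base.haskey(dfg::MyDFG, lbl::Symbol)   = haskey(dfg.variables, lbl)
Base.keys(dfg::MyDFG)                  = keys(dfg.variables)
Base.delete!(dfg::MyDFG, lbl::Symbol)  = (delete!(dfg.variables, lbl); dfg)

fg = MyDFG(Dict{Symbol,Any}(:x1 => "variable x1"), Dict{Symbol,Any}())
fg[:x1]            # standard indexing instead of a bespoke accessor
haskey(fg, :x2)    # false
```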

@dehann
Member Author

dehann commented Oct 24, 2019

trying to think of a good way to encourage users to use the IIF functions,

  • easiest is to not use reexport in IIF.
    • JT 👍
  • would it work if there were only abstract dispatch implementations in DFG, so that when the user defines the graph type it forces specific functions?
    • Remember, on the serialization design -- neither DFG nor IIF will ever have access to all the packing and unpacking converters. For the sake of the design, assume there will always be converters in some user package as well as in Main; that is driven by a design requirement. This means we can't abstract away the problem, and hence we have things like DFG.loadDFG(file, Main, fg1).
  • another option might be to not export addVariable!/addFactor! from DFG at all, and only IIF, but that feels weird.

@dehann
Member Author

dehann commented Oct 24, 2019

please explain a bit more about the different solves / solveKey (I think you called it super- and sub-solves). It impacts this for me.

solveKey is another open-ended concept that allows the user to request all kinds of ad-hoc changes to a particular factor graph. Things like:

  • tweak this factor and solve it all again,
  • collapse those two variables and solve again,
  • change the measurement on this factor and solve again,
  • summarize that long, empty odometry chain with a single factor and use that as the primary response to the user,
  • reconstruct at a higher resolution using BigData elements and keep it as a secondary piggyback, re-solving only this little fraction of the graph with the secondary setup (imagine interpolating, from IMU inertial odometry, the accurate position of each camera frame location at 30 Hz between two poses),
  • a human requests a test solve on something somewhere,
  • request 3 parallel solves to validate that they all get the same answer,
  • compare different solver versions over time on the same factor graph,
  • etc., etc.; the list goes on.

EDIT: xref https://www.juliarobotics.org/Caesar.jl/latest/faq/#What-is-supersolve?-1
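
A toy picture of the idea, assuming solver results are stored per solveKey in a dictionary (plain strings stand in for the real solver data; this is not actual DFG code):

```julia
# Illustration only: one solver-data entry per solveKey on the same variable,
# so several independent solutions coexist side by side.
solverDataDict = Dict{Symbol, Any}()            # Any stands in for VariableNodeData
solverDataDict[:default]       = "full nonparametric solution"
solverDataDict[:parametric]    = "fast parametric solution"
solverDataDict[:parametric_12] = "12th Monte Carlo run of the parametric solver"

# a consumer asks for whichever solution it wants
result = solverDataDict[:parametric]
```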

@Affie
Member

Affie commented Oct 24, 2019

OK, thanks, that cleared it up somewhat.
So I'd also like to treat it as a dictionary concept.

@Affie
Member

Affie commented Oct 24, 2019

GearsAD and I had a bit of confusion about the layers and where CRUD operations happen, so let's also try to define that here (I'll try, but please edit it to be more correct):

  • DFGraph (Struct)
    • Variables (Dict) - add/update/get/delete Variable
      • var1 (struct)
        • Solver data (Dict) - solverData, setSolverData, solverDataDict
          • :default
          • :parallel_solves1
          • :parallel_solves2
        • PPE (Dict) - ?
          • :default
          • :parallel_solves1
          • :parallel_solves2
        • Small Data (Dict) - smallData
        • Big Data (DRAFT Dict) - addBigDataEntry,getBigDataEntry,updateBigDataEntry,deleteBigDataEntry,getBigDataEntries,getBigDataKeys
      • var2 (struct)
    • Factors (Dict) - add/update/get/delete factor
      • fac1 (struct)
        • data (struct) - solverData is this ever updated separately?
      • fac2 (struct)

So my idea here is to get everyone on the same page about what a CRUD operation means at each level.
E.g. if you update a variable in Variables, it replaces the whole variable; if you add, it adds to the Dict.
If you update a PPE entry, it replaces the whole PPE entry.

EDIT: Updated a bit. What seems to stand out is that there are 2 kinds of operations:

  • Structures: replaced
  • Dictionaries: add, update, delete, get, etc.
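
A small sketch of those two kinds of operations, using plain Dicts and named tuples as stand-ins (illustrative only; updateVariable! is shown as a comment, not called):

```julia
# Dict level: add/update/get/delete act on individual entries.
ppeDict = Dict{Symbol, Any}()
ppeDict[:default] = (suggested = [0.0, 0.0],)   # add an entry
ppeDict[:default] = (suggested = [1.0, 0.5],)   # update replaces only this entry
delete!(ppeDict, :default)                      # delete one entry

# Struct level: an "update" replaces the whole object rather than editing it in place.
oldvar = (label = :x1, ppe = Dict(:default => (suggested = [1.0, 0.5],)))
newvar = (label = :x1, ppe = Dict(:parametric => (suggested = [1.1, 0.4],)))
# updateVariable!(dfg, newvar) would swap oldvar out for newvar wholesale
```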

@dehann
Member Author

dehann commented Oct 25, 2019

It's CRUD vs. set theory... two terminologies for similar things -- as long as the outcome is sensible.

After reading your previous comment, I started wondering whether it would be a good idea to actually tabulate this in the data design wiki, with the data fields down the rows and the data levels across the columns. Maybe we can indicate the container as the association. I'll try an example here, but it's not that easy at first pass and should move to the wiki if we are going to do this:

Variable

| Field | Container / type | Level0-Level3 | e.g. |
| --- | --- | --- | --- |
| summaryData | VariableSummary? | x | |
| solverData | Dict VariableNodeData{T} | x x | :default, :parallel_1, ... |
| softtypename | String | x | |
| PPE | Dict <:AbstractPointParametricEst | x x x | :default, :parallel_1, ... |
| smallData | Dict String | x x | |
| bigData | Dict <:AbstractBigDataEntry | x | |
| ... | | | |

EDIT: Moved to wiki

@Affie
Member

Affie commented Nov 27, 2019

Do we have an answer on this yet? I am starting to use other solve keys.

@Affie
Member

Affie commented Nov 27, 2019

Does it make sense to only copy one key in the solver data dict? For example getVariable(fg, :x1, solvekey=:default) and then update it. Or do we copy the whole dict and only update one solvekey. Either way I think it should be split.

@GearsAD
Collaborator

GearsAD commented Nov 28, 2019

Until now I thought we should just copy the whole variable and then refer to the independent solvekeys. No particular reason for it, except that we haven't explored this functionality much yet. We can do getVariable(..., solvekey=:something); I like that it would reduce the variable size.

As it stands now, get gives you all keys and update should update all keys. I'm not clear which answer you were looking for in addition to that?

Can you clarify what you mean by split?

@Affie
Member

Affie commented Nov 28, 2019

We can do getVariable(..., solvekey=:something), I like that it would reduce the variable size.

👍 I also like the idea of reducing the variable size. Say you have 100 super-solves on the server and just want :default. It is not needed currently, but it's good to keep in mind from the start if we plan on using it in that way. So my vote is for this.

As it stands now, get gives you all keys, and update should update all keys. Not clear which answer were you looking for in addition to that?

So it's not something you are set on for the final design, just how it is now? At the variable level it is fine; the solver data dict is where I have the problem. I feel it defeats the purpose of the whole super-solve structure.

Can you clarify what you mean by split?

Sorry if my terminology/English is wrong, but I'll try to explain the concept. Don't get stuck on names such as get/add/update/delete / CRUD / set theory; they are just for the concept.
Split (make separately atomic) the different solves.
So the effect would be that you add an additional layer, e.g. on a variable:

  1. If you get/add/update/delete the variable, it's going to overwrite everything, including all solver data keys. So the risk of overwriting something is up to you.
    • So if you getVariable(...) it gives you the whole variable with all the solves
    • If you updateVariable(...) it updates the variable overwriting everything, (again up to the user to risk)
  2. This is basically all the Julia Dicts in the variable. What I propose is that we treat them as such.
    • So if you getVariable(..., solvekey=:something) it just gives you the variable with something
    • updateVariable(..., solvekey=:something) seems complicated so I don't know if it exists.
    • Consider DFGVariable immutable, so you can't get/add/update/delete the solver data dict.
      • You can only get/add/update/delete keys in it.
      • we can use get/add/update/delete (to be consistent with variable/factor)
        • DF, will think on this more for a bit.
    • and introduce set operations merge/union on the dicts

That brings us back to #166 updateVariableSolverData! (ignoring specific names, just follow logic)

  • The way you used it in the test, I feel, should not exist.
  • mergeUpdateVariableSolverData! -> mergeVariableSolverData!
    • DF, if mergeUpdate was a stop-gap, I agree we should remove it (might have been my suggestion to help move us along and get to this decision point faster.)
  • get/add/update/deleteVariableSolverData!(dfg, variablekey, solvekey) would require a solvekey
    • DF, if no key is given then do it to all keys in the current DFG object (see in thread below)?
  • so similar to get/add/update/deleteVariable(dfg, variablekey) requiring a variablekey = label
    • DF, as above

#201 already places solveinprogress in the correct place for the above

#185 and comment in that already fits in with this

A first use case as motivation.
Consider :default and :parametric.
Let's start 2 solvers in parallel - with only one solveinprogress flag you can't start 2.
:parametric finishes way quicker and wants to update the variables' solver data and estimates.

  • DF, see possible change to this below

So you can try to wait for :default to release the solveinprogress lock on the data,
but :default will wait for :parametric. Deadlock.

  • DF, potential deadlocks are bad, but don't limit CRUD if we could just have added a semaphore / mutex somewhere that resolves the problem.
  • JT edit: My point here is the solverkey=>data needs to be atomic. If you lock the variable you can't do parallel solves. If you update the whole variable it will overwrite something.

EDIT: Note currently it is addVariable and not setVariable.
So consistent naming would be get/add/update/delete. (changed above)
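
To make the proposed inner CRUD concrete, here is a rough sketch of possible signatures, assuming the variable carries a solverDataDict::Dict{Symbol, VariableNodeData} field as in the struct pasted later in this thread (names and signatures are illustrative, not the agreed API):

```julia
# Rough sketch of possible inner-CRUD signatures (illustrative, not the agreed API).
function getVariableSolverData(dfg, variablekey::Symbol, solvekey::Symbol=:default)
    return getVariable(dfg, variablekey).solverDataDict[solvekey]
end

function updateVariableSolverData!(dfg, variablekey::Symbol, vnd, solvekey::Symbol=:default)
    # touches only this solveKey; other parallel solves stay untouched
    getVariable(dfg, variablekey).solverDataDict[solvekey] = vnd
    return vnd
end

function deleteVariableSolverData!(dfg, variablekey::Symbol, solvekey::Symbol)
    return delete!(getVariable(dfg, variablekey).solverDataDict, solvekey)
end
```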

@GearsAD
Collaborator

GearsAD commented Nov 29, 2019

Thanks for the example - it makes sense. Yes, we laid the foundation but it hasn't been used yet, so this is a good discussion for the solver side of it. Let me look it over the weekend 👍

@GearsAD
Collaborator

GearsAD commented Dec 2, 2019

I like the idea of working on solve keys, because generally the solver would just want to work with a single key.

The only issue I can see with this is that when calling getVariable(..., solvekey=:something) we would need to copy the variable itself (so that we can take out the other solveDict entries). There would be two versions then - the original in the graph, and your copy. We can choose whether to copy the solver data itself, or pass the original to the copied variable.

  • If you're updating only solverDataDict entries and we choose to pass those by reference: Although the variable is a copy, the solver data would still be by reference, so if you're just updating solver data once you have retrieved it, nothing would change in IIF. Changes to the solver data would be directly in the graph. Otherwise, if we copy it too, then you would need to update the original version in the graph (with, say, updateVariable or some equivalent)
    • DF, IIF does not change anything -- it only uses DFG. Think of IIF as an attempt to be completely stateless. We'll see how successful that is.
  • If you're adding to solverDataDict or updating parameters in the DFGVariable itself: in that case, because the DFGVariable is a copy, you would need to call updateVariable() to make the change in the graph. E.g. for solvable, you would need to call setSolvable, then updateVariable() to persist the change to the graph.
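
A toy example of the copy-vs-reference trade-off described above, using a Dict in place of the real solver data (illustrative only):

```julia
# Toy illustration of the trade-off, with a Dict standing in for VariableNodeData.
graph_vnd = Dict(:default => [0.0], :parametric => [0.0])   # "the graph's" solver data

# Option 1: the returned variable is a copy, but its solver data is the same object.
by_reference = graph_vnd
by_reference[:parametric] = [1.5]          # change is visible in the graph immediately
graph_vnd[:parametric] == [1.5]            # true

# Option 2: the solver data is copied too, so the caller must push it back explicitly
# (e.g. with updateVariable or an updateVariableSolverData!-style call).
by_copy = deepcopy(graph_vnd)
by_copy[:parametric] = [2.5]               # graph is untouched until an explicit update
graph_vnd[:parametric] == [1.5]            # still true
```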

@GearsAD
Collaborator

GearsAD commented Dec 4, 2019

Thanks @Affie for good discussion today.

Takeaways from discussion:

  • We should move toward making variables and factors immutable structs
    • DF, (on second thought) this is good for a few reasons -- agree.
  • When you getVariable you get a shallow copy, then we can implement getVariable(fg, :x1, :parametric).
    • DF, clarify: 'shallow copy' means? A DFGVariable (Level2) with an empty solverDict -- i.e. no VND?
  • Should have crud operations for data in DFGVariable and DFGFactor, e.g. solverdatadict so you can update inner data etc.
  • solvable is the only field that may change selectively - we either move this down a level (VariableNodeData), or we create a variable-level struct for mutable fields (this may grow in time, so that may be a good idea to start down that road).
    • DF, I disagree on this (if I have the details right). solvable is a graph-level thing. Remember that super-solve is not meant to replace graph operations. solvable is an indication that this variable or factor is ready to go. Which solver ends up using it is a different issue. This is different from the solveInProgress value, which should be a level lower (in VND) so that each solver compiling a solution into solverDict can indicate that it's busy there.

Please add if I missed anything.

@Affie
Member

Affie commented Dec 4, 2019

To summarize what we discussed:
Because this pattern seems to be applicable to other data, and an immutable struct offers a possible performance increase:

We can consider treating DFGVariable as immutable, i.e. after you make it, it becomes a reference to the data it contains and you can change its fields' contents safely. If you updateVariable, it will overwrite it with a completely new one.

  • DF, sounds like that will work yes.

tags is the odd one out, as it is just a set. So we might consider treating it as only a set with merge/union: addTags!(dfg, varlabel, tags) vs mergeTags!(dfg, varlabel, tags).

DFGVariable for convenience

mutable struct DFGVariable <: AbstractDFGVariable
    label::Symbol
    timestamp::DateTime
    tags::Vector{Symbol}
    estimateDict::Dict{Symbol, <: AbstractPointParametricEst}
    solverDataDict::Dict{Symbol, VariableNodeData}
    smallData::Dict{String, String}
    bigData::Dict{Symbol, AbstractBigDataEntry}
    solvable::Int
    _internalId::Int64
end

EDIT: On getVariable
Currently we already have 2 behaviors:

  • InMemoryDFGTypes -> pass by reference
  • other (CloudDFG) -> makes a copy

So what about getVariable(fg, :x1, :parametric)?

  • InMemoryDFGTypes -> assert :parametric in keys, then pass variable by reference
  • other (CloudDFG) -> makes a copy of only :parametric
    • DF, as above -- agree. Still need to figure out how to actually do this, I guess. It's an issue if the serialized (Packed) DFGVariable is one block; then it's hard to break it up before the copy.
    • DF, captain obvious: we'd probably like to avoid the data transport / compute overhead of first copying all elements in solverDict and then only ending up keeping one. See similar comments below.
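
For illustration, a rough sketch of the two behaviors described in the EDIT, assuming the variable exposes a solverDataDict field (hypothetical helper names, not the implemented methods):

```julia
# In-memory drivers: assert the solveKey exists, then hand back the stored variable by reference.
function getVariableInMemorySketch(dfg, lbl::Symbol, solvekey::Symbol)
    v = getVariable(dfg, lbl)
    @assert haskey(v.solverDataDict, solvekey) "solveKey $solvekey not found on $lbl"
    return v                                    # no copy, edits act on the graph
end

# Cloud drivers: fetch/copy only the requested solveKey, avoiding transport of every solution.
function getVariableCloudSketch(dfg, lbl::Symbol, solvekey::Symbol)
    v = deepcopy(getVariable(dfg, lbl))         # stand-in for a remote fetch
    filter!(p -> p.first == solvekey, v.solverDataDict)
    return v
end
```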

@Affie
Member

Affie commented Dec 4, 2019

So the next step is to create the inner CRUD for solver data. Not to be confused with mergeUpdateVariableSolverData!
Would something like this be acceptable:

PR #231

Currently I only need an updateVariableSolverData! function, so I haven't given much thought to the others.
mergeUpdateVariableSolverData! does not work if I have copies of all the solverkeys.

So usage would be:

key = :parametric
v = getVariable(cloudfg, :x1, key) # or getVariable(cloudfg, :x1) since the update will only be on one key
vnd = solverData(v::DFGVariable, key=key)
#do something with vnd
updateVariableSolverData!(cloudfg, vnd, :x1, key)

Does this sound correct? I don't know if it complicates things unnecessarily.

For performance, bulk operations would be needed for the cloud.
updateVariableSolverData!(dfg::AbstractDFG, variablekeys::Vector{Symbol}, solvekey::Symbol=:default) would batch-update all variablekeys. With Julia we can broadcast.
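
For example, the per-variable method from the snippet above could be broadcast over many labels as a stop-gap (a sketch; a dedicated batch method on a cloud driver would do this in a single round trip instead):

```julia
# Broadcasting the per-variable update over many labels (stop-gap sketch).
labels = [:x1, :x2, :x3]
vnds   = map(l -> solverData(getVariable(cloudfg, l), key=:parametric), labels)
# ... solver updates each vnd ...
updateVariableSolverData!.(Ref(cloudfg), vnds, labels, :parametric)
```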

EDIT: Edited a lot

But in the no-key case:

  • DF
key = :parametric
v = getVariable(cloudfg, :x1)

should return all keys

@dehann
Member Author

dehann commented Dec 4, 2019

[CRUD] would require a solvekey

How about: if no key is given, then it influences all keys in the current dfg object?

Lets start 2 solvers in parallel. - with one solveinprogress you can't start 2
:parametric finishes way quicker and wants to update variables' solver and estimates.

  • It should be possible (in the future) to start many of the same solvers and just dictate a special solveKey, e.g. :parametric_12. This is useful for Monte Carlo analysis, etc.

@Affie, this will likely impact the decision on how to use solvers=[:default; :parametric]?

@Affie
Member

Affie commented Dec 4, 2019

How about: if no key is given, then it influences all keys in the current dfg object?

If I understand you correctly, I don't think this is applicable if the solverDataDict is not used between different dfg objects (say cloud to graphDFG or subgraphDFG to graphDFG).
In either case I feel updateSolverData should only update :default,

  • DF, please clarify "it"?
  • Edited above

(or maybe if it only has one solvekey, that one)
If you are updating more than one, you are working with sets, and merge/union-style operations fit better, like the current mergeUpdateSolverDataDict.

@Affie, this will likely impact the decision on how to use solvers=[:default; :parametric]?

Yes, that's why I looked at it again. If you are working in memory on :parametric, it doesn't make sense to deepcopy the variable and strip out :default or other solve keys. If you then mergeUpdateSolverDataDict it will overwrite the others.

@dehann
Member Author

dehann commented Dec 4, 2019

If I understand you correctly, I don't think this is applicable if it's not used between different dfg objects (say cloud to graphDFG, or subgraphDFG to graphDFG).

If you ask to update a key that is not in a graph, then the new one should just be added to that DFG object?

@Affie
Member

Affie commented Dec 4, 2019

I guess it should follow updateVariable behavior: if it does not exist, it is created. That's how setSolverData currently works as well.

@Affie
Member

Affie commented Dec 4, 2019

With no solverkey given, getVariable should return all keys.

Yes, definitely. So no behavioral change in the existing getVariable method.
getVariable(..., solvekey) might even be just a CloudDFG function.

  • DF, okay, got it thanks!

@dehann
Member Author

dehann commented Dec 10, 2019

If many fields in Variable and Factor become immutable, then we would likely need a way to replace the entire object so that small changes can be made. In this discussion:
#230 (comment)
there is a need to change the timestamp field in a DFGNode -- I strongly feel we need to support this case, and if for performance we make the DFGVariable/Factor objects immutable, we would need an easy function that replaces (delete and insert into the factor graph) the Node object with a new one that includes the change.

From the user's perspective it's something small, like:

setTimestamp!(getVariable(dfg, :x12), now())

but the code underneath would then actually do a delete and replace.
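
A rough sketch of that delete-and-replace under the hood, reusing the DFGVariable field layout pasted earlier in the thread (setTimestampSketch! and the call to updateVariable! here are illustrative assumptions, not the final implementation):

```julia
using Dates

# Hypothetical helper: rebuild the immutable node with one field changed, then
# replace it in the graph as a single update (delete + insert under the hood).
function setTimestampSketch!(dfg, lbl::Symbol, ts::DateTime)
    oldvar = getVariable(dfg, lbl)
    newvar = DFGVariable(oldvar.label, ts, oldvar.tags, oldvar.estimateDict,
                         oldvar.solverDataDict, oldvar.smallData, oldvar.bigData,
                         oldvar.solvable, oldvar._internalId)
    updateVariable!(dfg, newvar)     # persists the change by swapping the whole node
    return newvar
end
```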

@dehann
Member Author

dehann commented Dec 16, 2019

The weakest part at this point is not having definitions for a standard API. So the direction now, I believe, should be that we define set-theoretic graph terms like merge/union and update properly. These definitions go hand in hand with general API verb definitions.

@dehann
Member Author

dehann commented Dec 16, 2019

#232 (comment)

xref #232

@GearsAD
Collaborator

GearsAD commented Dec 16, 2019

I'm catching up on this, and thanks, it seems like it is snowballing. I'd like to nail this down a bit :) Okay: standardization. Yes - sets look good ✔️.

One thing: I'd like to push for not making the functions smarter than they need to be. Updating a variable should update the complete variable - that is needed for general use. I understand if we need something special, but then let's introduce that as a new function - e.g. the inner CRUD for solver data. That makes sense.

@dehann
Member Author

dehann commented Dec 16, 2019

Definitions of verbs and nouns are the best way out I see at the moment -- a proper ratification process is required and any concerns must be listed on the wiki. It is going to take a few weeks to do that properly; then hopefully the API is easier:
https://github.com/JuliaRobotics/DistributedFactorGraphs.jl/wiki/Standard-Definitions-of-API-via-verbs-and-set-theory

EDIT: the problem is that API standardization has become the venue for design decisions as well as word meanings. Definition ratification via the wiki is intended to resolve the latter first, since design is going to take longer.

@dehann
Member Author

dehann commented Jan 13, 2020

Could I invite commentary on the verbNoun definitions please? Most of the content is already populated on the wiki.

@dehann
Member Author

dehann commented Feb 2, 2020

EDIT: COMMENT MISTAKE, moved to #145 as suggested below.

@Affie
Member

Affie commented Feb 2, 2020

I think it should be in #145

@dehann
Member Author

dehann commented Feb 2, 2020

OOPS THANKS, copying there

@dehann
Member Author

dehann commented Feb 2, 2020

FYI a large part of this has been captured in the new verbNoun definitions:
https://juliarobotics.org/Caesar.jl/latest/dev/wiki/

@Affie Affie modified the milestones: v0.5.x, v0.6.x Feb 7, 2020
@dehann dehann modified the milestones: v0.7.x, v0.8.0 Apr 27, 2020
@dehann
Member Author

dehann commented May 3, 2020

I think we should be good on update/merge/union. This issue helped us identify the need for the verbNoun definitions, which are mostly complete, and hence I think we can close this.

@dehann dehann closed this as completed May 3, 2020