Skip to content
This repository was archived by the owner on Dec 29, 2021. It is now read-only.

Merkle Tuesday #29

Closed
mafintosh opened this issue Oct 8, 2015 · 13 comments
Closed

Merkle Tuesday #29

mafintosh opened this issue Oct 8, 2015 · 13 comments

Comments

@mafintosh
Copy link

Lets do a community call on strategies for replicating merkle dags

YouTube link: http://youtu.be/DPiOdh8CoS0
Gitter link (chat/ask questions): https://gitter.im/datproject/discussions
Etherpad: https://etherpad.wikimedia.org/p/merkle-tuesday

Theme: Strategies for replicating merkle graphs
Where: Google Hangouts on Air
When: Tuesday, October 13th
Time: 1pm EST (17:00 GMT)

We are doing our 1.0 iteration of our merkle graph module, mafintosh/dat-graph and in the last couple of days we've actively been discussing the various ways of replicating merkle dags and generic hash sets on irc. dat-graph implementation is almost done so some community feedback would be appreciated.

Merkle DAG research so far: #28

Agenda:

  • Quick introdution to Merkle DAGs
  • Our requirements and the trade-offs we're willing to make
  • Discussion of our current dat-graph replication strategy using local graph path enumeration
  • Discussion of merkle tree / trie strategies
@okdistribute okdistribute changed the title Merkle Monday Tuerkle Tuesday Oct 9, 2015
@max-mapper max-mapper changed the title Tuerkle Tuesday Merkle Tuesday Oct 9, 2015
@jbenet
Copy link
Contributor

jbenet commented Oct 12, 2015

@maxogden got the msg -- this sounds fun i think i can do it.

I think @diasdavid may want to come too.

See also ipfs/notes#12 -- this is WIP.

@daviddias
Copy link

awesome idea 👍

@daviddias
Copy link

@RichardLitt
Copy link
Contributor

Added this to the IPFS community calendar, too.

@mikolalysenko
Copy link

Will try to make it, currently at a conference in SLC and have to give a talk right after this. But I will try to sit in!

@pfrazee
Copy link

pfrazee commented Oct 12, 2015

Will be in transit
/cc @dominictarr

@max-mapper
Copy link
Contributor

YouTube link is here https://www.youtube.com/watch?v=DPiOdh8CoS0&feature=youtu.be, starts in 44 minutes

@jbenet
Copy link
Contributor

jbenet commented Oct 13, 2015

This was great! 👍

@mikolalysenko
Copy link

Thanks for inviting me!

@max-mapper
Copy link
Contributor

Anyone who missed it can watch the youtube: https://www.youtube.com/watch?v=DPiOdh8CoS0&feature=youtu.be

or check out the notes from the call:

 __   __  _______  ______    ___   _  ___      _______                                                                                             
|  |_|  ||       ||    _ |  |   | | ||   |    |       |                                                                                            
|       ||    ___||   | ||  |   |_| ||   |    |    ___|                                                                                            
|       ||   |___ |   |_||_ |      _||   |    |   |___                                                                                             
|       ||    ___||    __  ||     |_ |   |___ |    ___|                                                                                            
| ||_|| ||   |___ |   |  | ||    _  ||       ||   |___           https://gitter.im/datproject/discussions                                          
|_|   |_||_______||___|  |_||___| |_||_______||_______|                                                                                            
 _______  __   __  _______  _______  ______   _______  __   __   https://github.com/datproject/discussions/issues/29                               
|       ||  | |  ||       ||       ||      | |   _   ||  | |  |                                                                                    
|_     _||  | |  ||    ___||  _____||  _    ||  |_|  ||  |_|  |                                                                                    
  |   |  |  |_|  ||   |___ | |_____ | | |   ||       ||       |                                                                                    
  |   |  |       ||    ___||_____  || |_|   ||       ||_     _|                                                                                    
  |   |  |       ||   |___  _____| ||       ||   _   |  |   |                                                                                      
  |___|  |_______||_______||_______||______| |__| |__|  |___|                                                                                      

      a         a = hash(b + d)
     / \        b = hash(c)
    b   d       d = hash(e)
   /     \      
  c       e

    a         a = hash(hash(ant.jpg) + b + c)
   / \        b = hash(bunny.jpg)
  b   c       c = hash(cat.jpg)

  a         a = hash(hash(ant.jpg) + b + c)
  |\        b = hash(bunny.jpg + d)
  b c       c = hash(cat.jpg + d)
  |/        d = hash(dog.jpg)
  d


    x    y
  /   \ /
 a     e
 |    / \
 b   d   f
  \ /    |
   c     g
    \   /  
      j
      |
      k
      |
      l

x a b c j k l <--
y e d
f g

c
mad science from mikola:
- spanning tree + preferred path decomposition
- disjointness problem:
    find if A + B have no intersections (are they disjoint)
    its impossible to find this out in less than O(n)
    bad news: you need O(n)
    good news: dumb solution is probably optimal

- merkle dags are functional data structures
  - any functional data structure can be converted to a merkle dag by replacing pointers with hashes (merkleize)
  - functional data structure theory can be applied here

- http://cbor.io/
https://tools.ietf.org/html/rfc7049
https://github.com/diasdavid/node-ipld
- https://github.com/ipfs/go-ipld/issues

- use cases that merkle dags provide a trust framework for:
- replicate entire graph (e.g. dat where you want a complete dataset replica)
- query specific parts of the graph (e.g. ipfs where graph spans multiple galaxies, query flooding)
- the network runs the queries itself (e.g. ethereum where you have turing complete bytecode running in distributed VMs)
- replication of graphs
  1. mafintosh style preferred path replication
  2. decompose graph into sets of nodes and try to sync nodes
    - put node hashes into a trie
    - walk the trie to sync
    - similar to ethereum patricia merkle tree https://github.com/ethereum/wiki/wiki/Patricia-Tree
    - need to tune radix of the trie (because hashes are large)

"verified computing"

shared goals:
    - low level graph comparison api
      - ask if someone has nodes
      - ask for data for nodes
      - ask if someone has nodes that match a graph selection path (jbenet graph query syntax)
    - syntax for graph pathing
      - xpath might be too complicated
      - main use case is e.g. 'a..b' (all nodes between a and b)
      - another one would be e.g. /*/**/[type="file"]
    - data structures that can enable the above APIs (preferred path indexing, node trie indexing)
    - backwards compat on the public APIs
    - incremental friendly indexing & replication for data structures
    - supporting efficient streaming by selective walking of inner & intermediate nodes
    - uncle hashes & munro hashes may help this 

 agree - can think of this as stacked layers GET/SET/HAVE simple functions
 - add additional query layers on top
 - protocol should have simple extensibility
 - viz https://tools.ietf.org/html/rfc5754
 - consider whether the low-level query layer needs to know about graphs or not

 - collect any names / urls of reseearchers and projects (maybe on wiki) to contact
 - I mentioned ICNRG research group from IETF https://irtf.org/icnrg 
 - binmaps www.pds.ewi.tudelft.nl/fileadmin/pds/reports/2011/PDS-2011-005.pdf & https://github.com/gritzko/binmap & http://scattered-thoughts.net/blog/2012/01/03/binmaps-compressed-bitmaps/
 - need solutions / optimisation choices for different retrieval / replication requirements

Where can people leave feedback? 

- i need to index some stuff
- i need to query some stuff
- i have a content addressable store graph
- maybe a REST-like api for the graph would make sense? then you can layer on indexing and querying

- possible inspiration https://en.wikipedia.org/wiki/Border_Gateway_Protocol
- communication complexity notes: http://theory.stanford.edu/~tim/w15/l/w15.pdf
 {value: 'some new data', link: ['deafbeaf'], key: 'aaaabbbb'} -> {value: 'some data', author: 'mathias', key: 'deafbeaf'}

 bob         alice

 a            a         
 🔼           🔼
 b            b
 🔼           🔼
 c            c
              🔼
              d

scenario 1:
bob asks alice -> do you have any of [a,b,c]?
alice reponds -> [0,1,2]
bob asks -> give me everything from [2, infinity]

scenario 2:
bob asks alice -> do you have any of [c]?
alice reponds -> []
bob asks alice -> do you have any of [a]?
alice reponds -> [0]
bob asks alice -> do you have any of [b]?
alice reponds -> [0]
now bob knows alice has a and b but not c



------------------------

dag path notation https://github.com/ipfs/notes/issues/12

- traversals (or selections) on a graph
- XPath

<hash>/*/**/[type="file"]


<hash>/*
<hash>/*/*/*
<hash>/a/b/c/d/e/*



<hash1>..<hash2>

wrt to erlang, data structures are not replicated in any smart sense - just converted to "flat" binary structure. (dch) see http://www.erlang.org/doc/man/erlang.html http://www.erlang.org/doc/apps/erts/erl_ext_dist.html

@max-mapper
Copy link
Contributor

@mikolalysenko
Copy link

Okasaki's thesis is a good reference on functional data structures: http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

The main trick in these things is path copying (aka structural sharing) to reduce memory. Lots of data structures are easy to functionalize, like red-black trees, kd-trees, tries, etc.

@kumavis
Copy link

kumavis commented Oct 27, 2015

here's null_radix's merkle-patricia-tree implementation

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants