Blake Walsh edited this page Jul 8, 2021 · 1 revision

Fundamentally, this would be a feature to synchronize the filesystem with ArangoDB, likely using git hooks. If the files need to be transformed before they can be queried in AQL, those transformations should themselves be written in AQL.
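On the git-hook side, the first step is to find out which files changed. The sketch below assumes the hook can obtain the old and new commits (for example in a post-merge hook) and run `git diff --name-status -M <old> <new>`. It then splits that output into added, modified, deleted and renamed paths. `parse_name_status` and `ChangeSet` are hypothetical names and are not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """Paths touched between two commits, grouped by kind of change."""
    added: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    deleted: list[str] = field(default_factory=list)
    renamed: list[tuple[str, str]] = field(default_factory=list)  # (old, new)


def parse_name_status(output: str) -> ChangeSet:
    """Parse the text printed by `git diff --name-status -M <old> <new>`.

    Each line is a status code followed by tab-separated paths, e.g.
    "A\tpath", "D\tpath" or "R095\told_path\tnew_path".
    """
    changes = ChangeSet()
    for line in output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        kind = status[0]  # drop the similarity score on R/C entries
        if kind == "A":
            changes.added.append(paths[0])
        elif kind == "M":
            changes.modified.append(paths[0])
        elif kind == "D":
            changes.deleted.append(paths[0])
        elif kind == "R":
            changes.renamed.append((paths[0], paths[1]))
        elif kind == "C":
            # A copy leaves the source in place, so only the target is new.
            changes.added.append(paths[1])
        # Other codes (T, U, X, ...) are ignored in this sketch.
    return changes
```

Paths containing unusual characters are quoted by git unless `-z` is used. This sketch does not handle that case.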

To represent the filesystem, I propose a files collection of simple documents such as:

{ "relative_path": "root/pli/ms/sutta/dn/dn1_root-pli-ms.json", "mtime": 1234567890, "size": 12345, "contents": { ... } }
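Here is a minimal sketch of how the sync program might build one such document from a checked-out file. It assumes mtime is stored as whole seconds since the epoch and that the files are JSON, as in the example. `make_file_document` is a hypothetical name. The page doesn't say how a file document's _key would be derived, so the sketch sets none.

```python
import json
import os


def make_file_document(repo_root: str, relative_path: str) -> dict:
    """Build a files document for one JSON file in the repository.

    relative_path uses forward slashes and is relative to repo_root.
    """
    full_path = os.path.join(repo_root, *relative_path.split("/"))
    stat = os.stat(full_path)
    with open(full_path, encoding="utf-8") as f:
        contents = json.load(f)  # parsed key:value pairs of the file
    return {
        "relative_path": relative_path,
        "mtime": int(stat.st_mtime),  # whole seconds since the epoch
        "size": stat.st_size,         # bytes on disk
        "contents": contents,
    }
```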

The files collection may not be ideal for indexing. In our case, for example, the contents are key:value pairs, which are very confusing for a typical search index. The contents field also shouldn't be indexed in most cases anyway.

Therefore, after a file is added, an AQL INSERT query can be run to populate a second collection (here called strings, which must already exist) with one document per key:value pair:

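FOR doc_key IN @added_doc_keys
  LET doc = DOCUMENT("files", doc_key)
  /* One strings document per key:value pair in the file's contents */
  FOR uid IN ATTRIBUTES(doc.contents)
    INSERT {
      _key: CONCAT(doc_key, '_', uid),
      uid: uid,
      string: doc.contents[uid],
      _file: doc.relative_path
    } INTO strings

/* If you don't want to store the contents, null them out in a separate query
   (statements after the inner FOR would run once per uid):
FOR doc_key IN @added_doc_keys
  UPDATE {_key: doc_key} WITH {contents: null} IN files
*/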
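The following plain-Python version of the query above makes its intended result concrete: each key:value pair in a file's contents becomes one strings document. It illustrates the data shape only and is not part of the sync tool. It assumes the contents are a flat object of strings, as in the segmented JSON files above. `strings_for_file` is a hypothetical name.

```python
def strings_for_file(doc_key: str, file_doc: dict) -> list[dict]:
    """Return the documents the INSERT query would add to strings for one file."""
    return [
        {
            # Prefixing the file's key keeps segment keys unique across files.
            "_key": f"{doc_key}_{uid}",
            "uid": uid,
            "string": text,
            "_file": file_doc["relative_path"],
        }
        for uid, text in file_doc["contents"].items()
    ]
```

For example, a file stored under the key dn1 with a segment dn1:1.1 would yield a strings document keyed dn1_dn1:1.1.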

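Alternatively, you could store a transformed copy of the contents on the same document, for example by joining all of its values into one long string:

FOR doc_key IN @added_doc_keys
  LET doc = DOCUMENT("files", doc_key)
  LET long_string = CONCAT_SEPARATOR('\n', VALUES(doc.contents))
  UPDATE {_key: doc_key} WITH {long_string} IN files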

The arango-git-sync utility will delete any documents it created itself. Documents created by your own INSERT query need a matching AQL query that removes them:

FOR doc IN strings
  FILTER doc._file IN @deleted_files
  REMOVE doc IN strings

A rename would likely be expressed as a DELETE followed by an INSERT.
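The sketch below shows one way to turn a set of changed paths into bind variables for the two queries, with each rename expressed as a delete followed by an insert. It also assumes, beyond what the page states, that a modified file is handled the same way, so that derived documents are rebuilt from the new contents. The mapping from a relative path to a files _key is not specified, so it is passed in as a hypothetical `key_for_path` function.

```python
from typing import Callable


def bind_vars_for_changes(
    added: list[str],
    modified: list[str],
    deleted: list[str],
    renamed: list[tuple[str, str]],
    key_for_path: Callable[[str], str],
) -> tuple[dict, dict]:
    """Return (delete_vars, insert_vars) for the DELETE and INSERT queries.

    Renames, and modifications (an assumption), become a delete of the old
    path followed by an insert of the current one.
    """
    to_delete = list(deleted) + list(modified) + [old for old, _ in renamed]
    to_insert = list(added) + list(modified) + [new for _, new in renamed]
    delete_vars = {"deleted_files": to_delete}  # matched against _file
    insert_vars = {"added_doc_keys": [key_for_path(p) for p in to_insert]}
    return delete_vars, insert_vars
```

Under this scheme the DELETE query has to run before the INSERT query. A modified file produces strings documents with the same _key values as before, so inserting first would collide with the old documents.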

The synchronization program's configuration would accept the INSERT and DELETE queries for each rule.

config.yaml

arango:
  endpoints: http://localhost:8529
  username: test
  password: test
 
rules:
  -
    arango_collection: files
    paths: [root/.*\.json, translation/.*\.json]
    blacklist: []
    INSERT_AQL: |
      FOR doc_key IN @added_doc_keys
        LET doc = DOCUMENT("files", doc_key)
        FOR uid IN ATTRIBUTES(doc.contents)
          INSERT {
            _key: CONCAT(doc_key, '_', uid),
            uid: uid,
            string: doc.contents[uid],
            _file: doc.relative_path
          } INTO strings
      /* To drop the stored contents, run a separate query:
      FOR doc_key IN @added_doc_keys
        UPDATE {_key: doc_key} WITH {contents: null} IN files
      */
    DELETE_AQL: |
      FOR doc IN strings
        FILTER doc._file IN @deleted_files
        REMOVE doc IN strings
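Here is a sketch of how a rule's paths and blacklist might be applied once the YAML has been parsed into a dict. It assumes each entry is a regular expression that must match the whole relative path. The page doesn't say whether patterns are anchored, so that is an assumption. `files_for_rule` is a hypothetical name.

```python
import re


def files_for_rule(rule: dict, relative_paths: list[str]) -> list[str]:
    """Return the paths matched by a rule's paths and not by its blacklist."""
    include = [re.compile(p) for p in rule.get("paths", [])]
    exclude = [re.compile(p) for p in rule.get("blacklist", [])]
    return [
        path
        for path in relative_paths
        # fullmatch: the pattern must cover the entire relative path
        if any(r.fullmatch(path) for r in include)
        and not any(r.fullmatch(path) for r in exclude)
    ]
```

With the rule above, root/pli/ms/sutta/dn/dn1_root-pli-ms.json would be selected, while a file such as README.md would not.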