[QUESTION/ENH] Java / VelocyPack <> Apache Arrow / Graphistry #77

Open
lmeyerov opened this issue Oct 29, 2020 · 7 comments

Comments

@lmeyerov

The Graphistry team is starting to get requests from ArangoDB users to help grow their ArangoDB implementations and use cases, and we're wondering if there is any guidance for getting ArangoDB to interoperate with the broader Apache / Python / etc. data community. Ideally this would be via Parquet/ORC (cold) or, even better, Apache Arrow (in-memory / streaming / etc.).

Most immediately, we're working with one team where the goal is Arango<>their Java app<>Graphistry.

  • V0: The no-thought solution is doing Arango--[json/csv]-->java--[json/csv]-->graphistry, but that means big transfers, losing existing type data, etc. On the plus side, when the customer does know the result column schema, they can send that as part of the graphistry ingest step.

  • V1: To do better, we're thinking Arango---[velocypack]-->Java app--[manually constructed arrow or orc typed columnar format for node+edge property tables]-->Graphistry. We're unsure what such a conversion looks like, though: is there any sample VelocyPack code, especially for automated conversions that take the type/serialization wrangling pain away from Arango users?

  • V2: Longer term, we're thinking direct Arango--[velocypack stream]-->graphistry REST API--[velocypack stream chunk to arrow conversion]-->graphistry internal. Or better, Arango--[apache arrow/parquet/orc]-->Graphistry, if on the roadmap. In both cases, no type wrangling etc. for users.
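One possible shape for the chunk-by-chunk conversion mentioned under V2 is sketched below in Python (chosen only for brevity; the same structure would apply in a Java app). It assumes VelocyPack decoding has already happened and each result row arrives as a plain dict. `column_batches` and `batch_size` are hypothetical names, not part of any ArangoDB or Graphistry API.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def column_batches(
    rows: Iterable[Dict[str, Any]], batch_size: int
) -> Iterator[Dict[str, List[Any]]]:
    """Yield column-oriented batches from a stream of row-oriented dicts.

    Assumption: rows are already decoded (e.g. from VelocyPack or JSON).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        # Union of field names in first-seen order, so sparse rows still line up.
        names = list(dict.fromkeys(name for row in chunk for name in row))
        # Missing fields become None so every column has the same length.
        yield {name: [row.get(name) for row in chunk] for name in names}


# Hypothetical row stream standing in for a query cursor.
stream = ({"id": i, "label": f"n{i}"} for i in range(5))
batches = list(column_batches(stream, batch_size=2))
```

Each yielded dict is already column-oriented, so it could be handed to an Arrow builder without further reshaping (for example `pyarrow.Table.from_pydict`, if pyarrow is available in the environment).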

Any pointers would be appreciated. As a simplifying constraint, users can get a lot of mileage by limiting the initial scope to node/edge queries that return primitively typed columns (string/int/date/etc.). Longer term, for fancier nested types (JSON, ...), the Arrow ecosystem supports an increasing variety.
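To make the "primitively typed columns" constraint concrete, here is a minimal Python sketch of schema inference that accepts only primitive values and rejects nested ones. The function name `infer_schema` and the set of accepted types are assumptions for illustration. It also assumes dates have already been parsed into `datetime` objects, which would not be the case for raw JSON output.

```python
from datetime import datetime
from typing import Any, Dict, Iterable

# Assumed set of "primitive" column types for this sketch.
PRIMITIVE_TYPES = (bool, int, float, str, datetime)


def infer_schema(rows: Iterable[Dict[str, Any]]) -> Dict[str, type]:
    """Infer one primitive type per column from row-oriented results.

    Nulls are allowed in any column. Nested values (lists, dicts) and
    columns with mixed types raise TypeError. Columns that only ever
    contain None are left out of the schema.
    """
    schema: Dict[str, type] = {}
    for row in rows:
        for name, value in row.items():
            if value is None:
                continue
            value_type = type(value)
            if value_type not in PRIMITIVE_TYPES:
                raise TypeError(
                    f"column {name!r}: unsupported type {value_type.__name__}"
                )
            expected = schema.setdefault(name, value_type)
            if expected is not value_type:
                raise TypeError(
                    f"column {name!r}: mixed types "
                    f"{expected.__name__} and {value_type.__name__}"
                )
    return schema


# Hypothetical query result rows.
rows = [
    {"_key": "a", "age": 31, "active": True, "joined": datetime(2020, 10, 29)},
    {"_key": "b", "age": None, "active": False},
]
schema = infer_schema(rows)
```

Failing loudly on mixed or nested columns is a design choice for the limited initial scope; a fuller version might instead coerce such columns to strings or map them to nested Arrow types.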

Thanks!

@lmeyerov
Author

cc @jsteemann as you seem to be the main contact for this :)

Helpful links:

@jsteemann
Contributor

@lmeyerov : Hi, thanks for getting in touch. Let me check who will be the contact on our side. It will not necessarily be me. Need to check it internally first. Will get back once I have more info!

@lmeyerov
Author

lmeyerov commented Nov 2, 2020

Thanks @jsteemann !

If it helps, we're ultimately interested in a few integration points:

-- converting arango query responses into arrow-typed record or arrow-typed node+edge property tables, e.g., https://github.com/graphistry/pygraphistry/blob/master/demos/demos_databases_apis/arango/arango_tutorial.ipynb except with types
-- dispatching 'search' queries (text, pattern, ...)
-- dispatching 'pivot' / 'expand' queries (set of IDs, potentially a pattern expression -> result graph)
-- schema fetch query, ideally also into a subgraph
-- any other graph-y queries, such as all paths between 2 points
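As a rough illustration of the first integration point, the sketch below splits a mixed list of ArangoDB result documents into a node table and an edge table. It relies on ArangoDB's system attributes: every document has `_id`, and edge documents additionally carry `_from` and `_to`. The function name `split_nodes_edges` and the sample documents are hypothetical, and the sketch is in Python for brevity rather than the Java used by the team mentioned above.

```python
from typing import Any, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]


def split_nodes_edges(docs: Iterable[Doc]) -> Tuple[List[Doc], List[Doc]]:
    """Separate documents into (nodes, edges), de-duplicating by `_id`.

    A document is treated as an edge if it has both `_from` and `_to`.
    """
    nodes: Dict[str, Doc] = {}
    edges: Dict[str, Doc] = {}
    for doc in docs:
        target = edges if "_from" in doc and "_to" in doc else nodes
        # Keep the first copy of each document; traversals often repeat vertices.
        target.setdefault(doc["_id"], doc)
    return list(nodes.values()), list(edges.values())


# Hypothetical documents, e.g. flattened from a traversal result.
docs = [
    {"_id": "users/a", "name": "Ann"},
    {"_id": "users/b", "name": "Bob"},
    {"_id": "knows/1", "_from": "users/a", "_to": "users/b", "since": 2019},
    {"_id": "users/a", "name": "Ann"},
]
nodes, edges = split_nodes_edges(docs)
```

In a graph visualization tool, `_from` and `_to` would presumably serve as the edge source and destination columns and `_id` as the node identifier, with the remaining fields becoming property columns.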

We'd love to help the Java-using ArangoDB team be successful now, and are gearing up for a public native ArangoDB connector in Q1 :)

@grepler

grepler commented Aug 15, 2022

@lmeyerov did you ever complete your native Graphistry<->ArangoDB connector?

@lmeyerov
Author

lmeyerov commented Aug 16, 2022

Hi @grepler, we have ArangoDB<>Graphistry users combining the two via PyData environments like Jupyter notebooks and Streamlit dashboards, and via our respective JS APIs. I'm unsure whether anyone does so with our REST API.

no-code/low-code (so no python/js/...) is a longer story. we're starting to do more customer-funded projects around roadmap items, so def something we're watching out for. if relevant, happy to chat!

@grepler

grepler commented Aug 16, 2022

Thanks @lmeyerov, I'll keep experimenting - bi-directional exploration & tagging interaction with the graph model would be amazing, but I will see if I can get by with one-way visualization of our AQL graph for the time being.
We're still in early internal tool development on our end. ArangoDB has some unique functionality and we really like the AQL language for its flexibility, but the third-party tooling ecosystem still seems to be in its very early days.

+1 for more ArangoDB tooling adoption! Will keep your offer in mind as we continue our testing.

@lmeyerov
Author

lmeyerov commented Aug 16, 2022

Great, lmk. Likewise, on the visual side, feel free to shout in our community slack.

Re: bidirectional, a relevant feature request we've heard is exposing custom action buttons in our UI, so that when embedding, you can turn custom calls (tagging, etc.) into an action like tagging a node in the DB. (Relatedly, we're actively working on in-tool "grouping", such as selecting nodes and saving them as a tagged group, and "visual search", where analysts can build up pattern searches without writing Cypher/AQL/etc.)

3 participants