F3( Vortex ) - Potential Storage Format #113

jonrmayer · 2025-10-05T06:45:13Z

This is more to say 'hello' - I am sure that you already have some great ideas for storage formats.

First of all, I want to say that this is such an intriguing project/vision - I think it has serious potential.

I came across this project last week when I was exploring SPARQL ( ignorance ) and looking around for a Rust/Arrow/DF implementation ( beginner ).

I have been aware of Vortex as an alternative to Parquet for a little while now and having just finished reading F3: The Open-Source Data File Format for the Future, I realise that Vortex might be the first to implement F3 capabilities and might be worth exploring.

Regards,
Jonathan

tobixdev · 2025-10-05T11:35:59Z

Maintainer

Thank you for reaching out, your motivating words, and your ideas!

To be honest, I don't have any detailed plans on how we're gonna do persistent storage, so your idea definitely comes at the right time :).

The Parquet-based storage mentioned in #4 was just the option that I've encountered often and that DataFusion already supports well. But there is also a Vortex DataFusion integration that we could use.

From my current, naïve standpoint, I think RDF Fusion's storage format should meet the following criteria:

1. It should be open. I don't think speccing and implementing a custom format makes sense for this project.
2. There should be existing support for reading and writing files in the Arrow/DataFusion ecosystem, so we have battle-tested and performant implementations that we can rely upon.
3. Ideally, the writer implementation would allow us to easily integrate features like transactions and sorting.

I have not researched enough how well Parquet and Vortex cater to these needs but I think that 1. and 2. are fulfilled by both of them. I am not that sure about 3. yet, hopefully I can find the time to research that.

If we can integrate new file formats with relatively little effort (re-using existing implementations), I think these two options are not mutually exclusive. RDF Fusion also aims to be a platform that allows for prototyping new ideas and doing further research. Basically, we hope to enable this by making DataFusion's extensibility more accessible to the SPARQL / Semantic Web community. I think evaluating the suitability of open file formats for SPARQL processing would be a cool thing to do (maybe there is already a comparison out there). :)

I'll link this discussion in the relevant ticket and copy some of my thoughts in there. Feel free to add your thoughts there! I've also added the paper to my reading list. I am a big fan of Wasm so that alone makes it interesting for me.

> I came across this project last week when I was exploring SPARQL ( ignorance ) and looking around for a Rust/Arrow/DF implementation ( beginner ).

Have you by chance read the documentation of the rdf-fusion crate? If you have any further input on how I can make it more accessible, please don't hesitate to just open an issue! My goal was that people with basic knowledge of relational databases get enough information on SPARQL / DataFusion to get an idea of what this project is.

jonrmayer · 2025-10-05T12:16:32Z

Author

I had to edit the title - I mixed up Vortex and Velox - too many similar names out there ;-)

I think this is a great idea and a step in the right direction.

I read and was deeply influenced by The Composable Data Management System Manifesto last year. A more accessible version would be The Composable Codex.

I think that the SPARQL community and the documentation might benefit from some highlighted sections from this as more of an explainer for the Rust/Arrow/DataFusion design choice.

Regarding SPARQL-side documentation, I still don't know what I don't know and am still very ignorant. What you had written was enlightening!

I am looking forward to reading your paper when it is published - "RDF Fusion: An Extensible SPARQL Engine for Hybrid Data Models" - found that reference online but not sure that is correct.

I have a Geospatial background - hence the interest in high performance engines.

Will definitely be following the project.

1 reply

tobixdev Oct 5, 2025
Maintainer

> I think that the SPARQL community and the documentation might benefit from some highlighted sections from this as more of an explainer for the Rust/Arrow/DataFusion design choice.

That's a good point. I'll try to include that!

> I read and was deeply influenced by The Composable Data Management System Manifesto last year. A more accessible version would be The Composable Codex.

These are great resources. I wasn't aware of the website. Thanks!

> I am looking forward to reading your paper when it is published - "RDF Fusion: An Extensible SPARQL Engine for Hybrid Data Models" - found that reference online but not sure that is correct.

Yes, it's currently in review. Fingers crossed 🤞. If you want, I can provide you with an author's version if you send me your e-mail.

> I have a Geospatial background - hence the interest in high performance engines.

Nice! One of my goals is to make it possible to plug in your own encodings for certain RDF literals. I think geospatial data would be a great use case for demonstrating this extension point (e.g., it would allow you to use existing Arrow libraries for geo data types). GeoSPARQL might be of interest to you in that regard.
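
To make the idea of pluggable literal encodings a bit more concrete, here is a rough, std-only sketch. Everything in it is hypothetical and only for illustration: `LiteralEncoding`, `EncodingRegistry`, and `WktPointEncoding` are made-up names and not RDF Fusion's actual API, and the toy encoding only understands two-dimensional WKT points. A real integration would more likely produce Arrow arrays than raw bytes.

```rust
use std::collections::HashMap;

/// Datatype IRI of GeoSPARQL WKT literals.
pub const WKT_LITERAL: &str = "http://www.opengis.net/ont/geosparql#wktLiteral";

/// Hypothetical extension point: converts literals of one datatype
/// between their lexical form and a compact binary form.
pub trait LiteralEncoding {
    /// The datatype IRI this encoding is responsible for.
    fn datatype(&self) -> &str;
    /// Encodes a lexical form; returns `None` if it cannot be parsed.
    fn encode(&self, lexical: &str) -> Option<Vec<u8>>;
    /// Decodes a binary value back into a lexical form.
    fn decode(&self, bytes: &[u8]) -> Option<String>;
}

/// Toy encoding (an assumption for this sketch): stores `POINT(x y)`
/// as two little-endian `f64` values, 16 bytes in total.
pub struct WktPointEncoding;

impl LiteralEncoding for WktPointEncoding {
    fn datatype(&self) -> &str {
        WKT_LITERAL
    }

    fn encode(&self, lexical: &str) -> Option<Vec<u8>> {
        // Accept both "POINT(x y)" and "POINT (x y)".
        let rest = lexical.trim().strip_prefix("POINT")?.trim_start();
        let inner = rest.strip_prefix('(')?.strip_suffix(')')?;
        let mut parts = inner.split_whitespace();
        let x: f64 = parts.next()?.parse().ok()?;
        let y: f64 = parts.next()?.parse().ok()?;
        if parts.next().is_some() {
            return None; // more than two coordinates is out of scope here
        }
        let mut out = Vec::with_capacity(16);
        out.extend_from_slice(&x.to_le_bytes());
        out.extend_from_slice(&y.to_le_bytes());
        Some(out)
    }

    fn decode(&self, bytes: &[u8]) -> Option<String> {
        if bytes.len() != 16 {
            return None;
        }
        let mut xb = [0u8; 8];
        let mut yb = [0u8; 8];
        xb.copy_from_slice(&bytes[..8]);
        yb.copy_from_slice(&bytes[8..]);
        let (x, y) = (f64::from_le_bytes(xb), f64::from_le_bytes(yb));
        Some(format!("POINT({} {})", x, y))
    }
}

/// Hypothetical registry that picks an encoding by datatype IRI.
#[derive(Default)]
pub struct EncodingRegistry {
    encodings: HashMap<String, Box<dyn LiteralEncoding>>,
}

impl EncodingRegistry {
    /// Registers an encoding under the datatype IRI it reports.
    pub fn register(&mut self, encoding: Box<dyn LiteralEncoding>) {
        self.encodings
            .insert(encoding.datatype().to_string(), encoding);
    }

    /// Encodes a literal if an encoding for its datatype is registered
    /// and the lexical form is valid for that encoding.
    pub fn encode(&self, datatype: &str, lexical: &str) -> Option<Vec<u8>> {
        self.encodings.get(datatype)?.encode(lexical)
    }
}
```

In this sketch, a user would register `WktPointEncoding` once and the engine would look up the encoding by the literal's datatype; literals without a registered encoding simply yield `None` and could fall back to a generic representation.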
