Custom datasets #393
Comments
Hey Christopher! Thanks for your super detailed post.

1) Data Viz
I actually think the ability to have extensions for custom visualization would be super interesting, and useful in a bunch of industries. What sort of libraries do you typically use to visualize? Our front end is written in JS and React. We've been trying to think about how we could make renderers more configurable based on the data types on the backend.

2) File types and diff
That's interesting. We sniff the first few bytes of files to detect whether they are UTF-8 or not, so maybe some of those files are false positives. Custom diffs would be interesting too, happy to chat more on that.

3) Generic URLs
Glad you found the download URL (let us know if it could be documented better). We are also working on direct S3 integrations, if that is interesting as well.

4) Oxen Crates
Currently the lib and CLI crates shouldn't depend on Actix. I'd have to double-check the Cargo.toml files, but there's a separate one per module. I actually think it would be awesome if the
Hey Greg,
Cheers |
Hi there,
I just learned about Oxen, and it looks very promising. I have a bit of an unusual use case and I'm trying to figure out whether Oxen could be the appropriate solution here.
I deal with planetary ephemeris files, which store, in a binary format, the trajectories of planets and spacecraft for possibly hundreds of years. This format was originally created by JPL in the 80s -- the specs are here: https://naif.jpl.nasa.gov/pub/naif/toolkit_docs/C/req/daf.html . The main library that reads these files is by NASA itself (through the NAIF division) and is called SPICE ... but I've rewritten it in full in Rust (because the original code is FORTRAN transliterated into C and absolutely not thread safe). This rewrite is called ANISE -- https://github.com/nyx-space/anise.
Every year, NASA releases a new and improved prediction of where the planets will be in the future -- https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/planets/. Every day, NASA also releases the Earth orientation parameters, which specify how the Earth is actually aligned with respect to the stars (crazily enough, we can't predict it very well) -- https://naif.jpl.nasa.gov/pub/naif/generic_kernels/pck/ (specifically the earth_latest_high_prec.bpc file). These files aren't typically big (ranging from single-digit MB to a few hundred MB).
In spacecraft operations, we need to ensure that the whole team of flight dynamics engineers uses either the latest data (for some computations) or a specific agreed-upon version of the data. The way I've solved this in ANISE is with a "MetaFile" structure, which is pretty simple: it stores the URL to the file and optionally its CRC32, so that the file is redownloaded if the CRC is unspecified or does not match the local copy (config file example: https://github.com/nyx-space/anise/blob/master/data/latest.dhall ; basic docs: https://docs.rs/anise/latest/anise/almanac/metaload/struct.MetaFile.html ).

Another related use case is that we need to publish new datasets, namely a new ephemeris file whenever we compute a new trajectory. In this case, we have consumers of this data in other teams who need to be sure that they're using the latest version of all our data prior to whatever work they're up to. At the moment, we use Kedro to organize our workflows and also to version our data on AWS S3. It works perfectly fine, but Oxen's visualization of datasets is very appealing (especially as most people in this industry are not tech savvy).
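To make the redownload rule concrete, here is a minimal Python sketch of the decision described above. It is not ANISE's actual implementation: `MetaFileSketch`, `crc32_of`, and `needs_download` are hypothetical names, and it assumes the standard CRC-32 variant computed by Python's `zlib`, which may or may not match what ANISE uses.

```python
import zlib
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class MetaFileSketch:
    """Hypothetical stand-in for a MetaFile: a URL plus an optional CRC32."""
    uri: str
    crc32: Optional[int] = None


def crc32_of(path: Path) -> int:
    """Compute the CRC32 of a file's contents, reading it in chunks."""
    checksum = 0
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            checksum = zlib.crc32(chunk, checksum)
    return checksum


def needs_download(meta: MetaFileSketch, local_copy: Path) -> bool:
    """Return True when the local copy is missing or cannot be trusted."""
    if not local_copy.exists():
        return True  # nothing cached yet
    if meta.crc32 is None:
        return True  # no pinned version: always fetch the latest
    # Pinned version: fetch only if the local file does not match.
    return crc32_of(local_copy) != meta.crc32
```

Leaving `crc32` unset corresponds to the "always use the latest data" case, while setting it pins the team to one agreed-upon file.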
In other words, versioning of datasets is crucial. Oxen solves the versioning of datasets and provides visualizations. But Oxen doesn't support NASA's DAF format (and it's such a niche case that it probably should not). Hence my questions:
1) Is it possible to upload arbitrary blobs of data on Oxen? (This already works!) If so, I could use ANISE to build a "companion" delivery for new ephemeris files that we deliver, and users could diff the companion version. Interestingly, Oxen tries to parse this as text.

2) Is it possible to download these datasets with generic URLs that include the version, e.g. similar to how AWS S3 can have a unique link? In my experience, it's hard enough to convince the IT teams in my industry to install updates to Python, so I can't imagine the trials and tribulations of convincing them to install a new binary on operational machines, but if all that's needed is curl/wget with a token, that would be fine (especially if it runs a local deployment of Oxen, which you could/should charge for in my view). (This already works too!)

That's all my questions for now, and again, this is a very exciting project, so I'll be keeping a close eye on it regardless.
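For illustration, the "curl/wget with a token" workflow from the second question could look roughly like this in plain Python, with nothing to install beyond the standard library. The URL layout (`/<repo>/file/<revision>/<path>`), the bearer-token header, and the example host are all assumptions made for this sketch, not Oxen's documented API, so check the actual download URL before relying on it.

```python
import shutil
import urllib.parse
import urllib.request


def build_download_request(base_url, repo, revision, file_path, token):
    """Build an authenticated request for one file at one pinned revision.

    The path layout and the Authorization scheme are hypothetical.
    """
    url = "/".join([
        base_url.rstrip("/"),
        urllib.parse.quote(repo, safe="/"),
        "file",
        urllib.parse.quote(revision, safe=""),
        urllib.parse.quote(file_path, safe="/"),
    ])
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def download(request, destination):
    """Stream the response body to a local file."""
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


# Example with placeholder values only:
# req = build_download_request("https://hub.example.invalid", "team/ephemerides",
#                              "v1.2", "spk/trajectory.bsp", "MY_TOKEN")
# download(req, "trajectory.bsp")
```

Because the revision is part of the URL, a consumer who pins the URL also pins the exact version of the data.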
Thanks