Remove dependency on localstack / refactor s3aggregator to parquet aggregator #652
Comments
@vringar @englehardt I recall from our conversations that the above description of the issue is not accurate and that there's an additional piece related to saving content. Can you update the description so we can continue iterating on the conversation?
This seems like a duplicate of #618? We should close that one or this one (this has more context). And yes, see #390 for context. Right now we have:
It seems like we want an option to choose how to save the structured data (e.g., parquet / sql) independently of the unstructured data (e.g., leveldb when local and something else when remote).
+100
Move from leveldb to rocksdb and use https://github.com/rockset/rocksdb-cloud to push to AWS, Google Cloud, etc.?
I don't love this suggestion because I think the path to Windows is easier with leveldb.
One big thing I'd want to be mindful of is the discussion around caching and completion callbacks.
This has come up in a few issues, so I'm gathering the core issue here.
Related: #614, #628
Localstack has caused us a number of headaches:
Its purpose is to mock out s3, but that complexity is driven by the fact that the s3aggregator does two things for structured data:
In addition to that it saves out individual content files to s3.
We can separate these two functions, allowing OpenWPM to generate parquet and then push it to a variety of locations: s3, gcs, or just local.
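
A rough sketch of what that separation could look like, not a proposal for the final API. Every name here (`StorageBackend`, `LocalBackend`, `InMemoryBackend`, `flush_batch`) is hypothetical and does not exist in OpenWPM today. The serializer is passed in as a plain callable, so the parquet step and the "where does it go" step stay independent:

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Protocol

Record = Dict[str, object]
# Hypothetical: turns a batch of records into bytes (parquet in a real version).
Serializer = Callable[[List[Record]], bytes]


class StorageBackend(Protocol):
    """Hypothetical interface: anything that can persist a blob under a key."""

    def store(self, key: str, data: bytes) -> None: ...


class LocalBackend:
    """Writes blobs beneath a local directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def store(self, key: str, data: bytes) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)


class InMemoryBackend:
    """Keeps blobs in a dict, which is enough for tests without mocking s3."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> None:
        self.blobs[key] = data


def flush_batch(
    records: Iterable[Record],
    serialize: Serializer,
    backend: StorageBackend,
    key: str,
) -> int:
    """Serialize a batch once, then hand the bytes to the configured backend.

    Returns the number of records written; empty batches are skipped.
    """
    batch = list(records)
    if not batch:
        return 0
    backend.store(key, serialize(batch))
    return len(batch)
```

Under this assumption, an s3 or gcs backend would only need to implement the same `store` method, and the individual content files could go through `store` directly without touching the structured-data path. Tests could run against `InMemoryBackend` or `LocalBackend` instead of localstack.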