This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Added project overview. #89

Merged: 5 commits, Jun 30, 2022

ryzhyk (Collaborator) commented Jun 26, 2022

I want to create a crates.io release to make it easy for people to try DBSP. In preparation for that I added a project description to the README. Once the crate has been published, I will add links to crates.io and docs.rs.

ryzhyk requested review from gz, mihaibudiu and Kixiron on June 26, 2022 00:53
codecov bot commented Jun 26, 2022

Codecov Report

Merging #89 (6741fab) into main (bf56f56) will increase coverage by 0.05%.
The diff coverage is 65.82%.

@@            Coverage Diff             @@
##             main      #89      +/-   ##
==========================================
+ Coverage   81.39%   81.45%   +0.05%     
==========================================
  Files          63       63              
  Lines       10273    10266       -7     
==========================================
  Hits         8362     8362              
+ Misses       1911     1904       -7     
| Impacted Files | Coverage Δ |
|---|---|
| src/monitor/mod.rs | 71.24% <ø> (ø) |
| src/operator/filter.rs | 0.00% <0.00%> (ø) |
| src/trace/cursor/cursor_pair.rs | 0.00% <0.00%> (ø) |
| src/trace/layers/mod.rs | 43.37% <16.66%> (+1.02%) ⬆️ |
| src/trace/ord/key_batch.rs | 80.63% <43.75%> (-3.32%) ⬇️ |
| src/operator/map.rs | 76.50% <46.66%> (ø) |
| src/trace/cursor/mod.rs | 56.75% <55.00%> (+4.25%) ⬆️ |
| src/trace/ord/val_batch.rs | 83.00% <58.97%> (-1.34%) ⬇️ |
| src/trace/layers/ordered_leaf.rs | 71.30% <61.29%> (+2.13%) ⬆️ |
| src/trace/ord/zset_batch.rs | 74.21% <63.63%> (-3.39%) ⬇️ |

... and 20 more

Kixiron (Contributor) commented Jun 26, 2022

I actually just encountered a funky error in the wild with cargo trying to fetch the spec submodule but me not having proper creds to get it, leaving me unable to use the repo. To fix that I think we should add `exclude = ["doc/spec"]` to our `Cargo.toml` to prevent publishing that as part of our crate.

Additionally, our exports really need to be refined. When opening the crate with `cargo doc --open` it's really really chaotic and hard to figure out what's relevant. We've got macros that maybe shouldn't be exported, no docs, none of the entry points into the crate are exported (`Root`, `Circuit`, etc.), and it's just generally hard to figure out what's going on.

I'd also like to ditch our timely/abomonation dependencies before we publish because otherwise we'll set off every dependency scan in the galaxy because of how cursed abomonation is.

We're also missing a lotta docs on a lotta very critical things like `.index()`, `.join()`, etc. (the docs on `.join_trace()` are also incomplete: they end with "by first assembling traces of both streams:" with nothing after it).

A bunch of names are also fairly unintuitive; `.delta0()` makes more sense as `.import()` or something.

We also just need more examples. Without concrete examples or an understanding of the underlying theory there's almost no discernible difference when looking at `.join()` vs. `.join_incremental()`. I still think we need different flavors of scopes to make it easier to use incremental vs. non-incremental operators so that you can only use incremental operators within incremental scopes.
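
To illustrate the `.join()` vs. `.join_incremental()` distinction, here is a minimal sketch in plain Rust, not the DBSP API. It assumes that a non-incremental join is applied to whatever inputs it is given at each step, while an incremental join consumes changes to its inputs and emits the corresponding change to the result. The `ZSet`, `join`, and `IncrementalJoin` names are hypothetical, and records carry signed weights (+1 for an insertion, -1 for a deletion):

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical Z-set: each record maps to a signed weight.
type ZSet<T> = HashMap<T, i64>;
type Pair = (u32, &'static str);
type Joined = (u32, &'static str, &'static str);

/// Builds a Z-set from (record, weight) pairs; assumes distinct records.
fn zset<T: Hash + Eq>(items: Vec<(T, i64)>) -> ZSet<T> {
    items.into_iter().collect()
}

/// Adds `delta` into `acc`, dropping records whose weight becomes zero.
fn add_into<T: Hash + Eq + Clone>(acc: &mut ZSet<T>, delta: &ZSet<T>) {
    for (rec, w) in delta {
        let new_w = acc.get(rec).copied().unwrap_or(0) + *w;
        if new_w == 0 {
            acc.remove(rec);
        } else {
            acc.insert(rec.clone(), new_w);
        }
    }
}

/// Non-incremental join on the key: joins exactly the inputs it is given.
/// Output weights are the products of the input weights.
fn join(a: &ZSet<Pair>, b: &ZSet<Pair>) -> ZSet<Joined> {
    let mut out: ZSet<Joined> = HashMap::new();
    for (&(ka, va), &wa) in a {
        for (&(kb, vb), &wb) in b {
            if ka == kb {
                *out.entry((ka, va, vb)).or_insert(0) += wa * wb;
            }
        }
    }
    out.retain(|_, w| *w != 0);
    out
}

/// Incremental join: remembers everything seen so far and, given only the
/// changes `da` and `db`, returns the change to the join result.
#[derive(Default)]
struct IncrementalJoin {
    a_total: ZSet<Pair>, // sum of all `a` changes seen so far
    b_total: ZSet<Pair>, // sum of all `b` changes seen so far
}

impl IncrementalJoin {
    fn step(&mut self, da: &ZSet<Pair>, db: &ZSet<Pair>) -> ZSet<Joined> {
        // change = (da join db) + (A_prev join db) + (da join B_prev)
        let mut out = join(da, db);
        add_into(&mut out, &join(&self.a_total, db));
        add_into(&mut out, &join(da, &self.b_total));
        add_into(&mut self.a_total, da);
        add_into(&mut self.b_total, db);
        out
    }
}
```

Summing the outputs of `step` over time gives the same result as calling `join` on the accumulated inputs; each step needs only the new changes plus the state accumulated so far.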

We can also probably write a good, long doc piece in our README and then use `#![doc = include_str!("../README.md")]` at the top of our `lib.rs` to make it part of our crate docs. We should also probably add this so we get even better docs on docs.rs:

// lib.rs
#![cfg_attr(docsrs, feature(doc_cfg))]
# Cargo.toml
[package.metadata.docs.rs]
all-features = true
rustdoc-args = ["--cfg", "docsrs"]

gz (Contributor) commented Jun 27, 2022

It would be nice to have a tutorial with some more explanations aside from the spec PDF. Reading this it seems to be fairly complex? I don't think I could write a program with this from just the docs+spec.

Having a tutorial with a simple program that's incrementally (pun intended) built and explained would be nice as you add complexity, e.g., start with a join/filter/smth else and make it more complex as you explain things...

gz (Contributor) commented Jun 27, 2022

OTOH, if this is just about releasing this on crates.io to reserve the name, that seems reasonable too.

README.md Outdated
1. The complete set of **relational operators**: select, project, join,
aggregate, etc.

1. **Recursion**: Recursive queries allow for instance expressing graph

Review comment (Contributor):

The explanation goes right into an example (graph queries) rather than giving a general explanation, as you do for the {window, per-record} operators.

ryzhyk (Collaborator Author) commented Jun 27, 2022

The PR is only for the README. The public API is currently nonexistent. I'll work on it next, but there are some challenges there; it's not just a matter of cleaning the docs. I don't think this should stop us from releasing on crates.io, but if people feel strongly about it, we can do it later.

Kixiron (Contributor) commented Jun 27, 2022

Gerd does bring up a good point; I didn't think about reserving the crate name.

Removed information about Rust tooling from the README: it is not
specific to DBSP, is a matter of taste, and doesn't help people to get
started with the project.

ryzhyk (Collaborator Author) commented Jun 28, 2022

> I actually just encountered a funky error in the wild with cargo trying to fetch the spec submodule but me not having proper creds to get it, leaving me unable to use the repo. To fix that I think we should add `exclude = ["doc/spec"]` to our `Cargo.toml` to prevent publishing that as part of our crate.

Seems like `exclude` won't do it: rust-lang/cargo#4247 (comment)

ryzhyk (Collaborator Author) commented Jun 28, 2022

> I actually just encountered a funky error in the wild with cargo trying to fetch the spec submodule but me not having proper creds to get it, leaving me unable to use the repo. To fix that I think we should add `exclude = ["doc/spec"]` to our `Cargo.toml` to prevent publishing that as part of our crate.

Seems like `exclude` won't do it: rust-lang/cargo#4247 (comment)

@mbudiu-vmw, is there a way the paper could live in a public repo?

mihaibudiu commented

The PDF is already in this repo.
Apparently Overleaf cannot make the underlying repository public, so we have to copy the sources.

ryzhyk (Collaborator Author) commented Jun 28, 2022

> The PDF is already in this repo. Apparently Overleaf cannot make the underlying repository public, so we have to copy the sources.

I think there is a way Overleaf can work with an existing git repo. But I'm also OK with removing the submodule and just keeping the PDF, which is the important part. Are you OK with that?

mihaibudiu commented

I just made some edits to the sources. We should have the tex sources too.

mihaibudiu commented

If you can send Val instructions on how to make the repo public, perhaps he can do it.

ryzhyk (Collaborator Author) commented Jun 28, 2022

> If you can send Val instructions on how to make the repo public, perhaps he can do it.

I don't know how to do it. More importantly, we don't want to depend on Val's repo being available. If he changes the visibility or deletes the repo in the future (or Overleaf deletes or closes it automatically for whatever reason), it will break the crate.

mihaibudiu commented

Then we should just copy the sources here as well. We need a backup anyway.


Ideally this code should run fine in Linux, MacOs, and Windows.
The code is written in Rust. Here are some tools we found useful for development:
Computing over streaming data is hard. Streaming computations operate over

Review comment:

The word "incremental" does not appear here at all.

README.md Outdated

## Set-up git hooks
1. **Per-record operators** that parse, validate, filter, transform data streams

Review comment:

The notion of "record" is not defined.
I would first introduce the notion of "transaction".
Once we merge this, I will add some diagrams.

Reply (Collaborator Author):

I assume that most people know what records and tables are.

README.md Outdated
1. **Per-record operators** that parse, validate, filter, transform data streams
one record at a time.

1. **Windowing operators** that group time series data into time windows,

Review comment:

Frankly, there isn't anything about time in these records, so I would not call them "time series".
A time series implies that one field of the record is a timestamp.
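
For illustration, a sketch of what a windowing operator over timestamped records could look like, assuming fixed-size, non-overlapping (tumbling) windows. The `Event` struct and `sum_per_window` function are hypothetical and written in plain Rust rather than against DBSP's API:

```rust
use std::collections::BTreeMap;

/// A record with an explicit timestamp field (hypothetical layout).
#[derive(Debug, Clone, PartialEq)]
struct Event {
    timestamp: u64, // e.g. seconds since some epoch
    value: i64,
}

/// Groups events into fixed-size windows and sums `value` per window.
/// The map key is the start time of the window.
fn sum_per_window(events: &[Event], window_size: u64) -> BTreeMap<u64, i64> {
    assert!(window_size > 0, "window_size must be positive");
    let mut sums: BTreeMap<u64, i64> = BTreeMap::new();
    for e in events {
        // Round the timestamp down to the start of its window.
        let window_start = e.timestamp - e.timestamp % window_size;
        *sums.entry(window_start).or_insert(0) += e.value;
    }
    sums
}
```

Without a timestamp field there is nothing to compute `window_start` from, which is the point of the comment above.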

README.md Outdated
1. The complete set of **relational operators**: select, project, join,
aggregate, etc.

1. **Recursion**: Recursive queries express iterative computations, e.g.,

Review comment:

The word "incrementally" appears here for the first time.
Perhaps "fixpoint" is a more accurate description, with the statement that fixpoint computations can be used to implement recursive queries.
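
As a concrete instance of the recursion-as-fixpoint idea, here is a minimal sketch in plain Rust (not the DBSP API; `transitive_closure` and its types are hypothetical names chosen for illustration). It evaluates the recursive graph reachability query by repeating a join step until no new pairs appear:

```rust
use std::collections::BTreeSet;

/// Computes the transitive closure of `edges` by iterating to a fixed point.
/// Each round joins only the pairs discovered in the previous round
/// (`frontier`) against the edge relation.
fn transitive_closure(edges: &BTreeSet<(u32, u32)>) -> BTreeSet<(u32, u32)> {
    let mut reach = edges.clone();
    let mut frontier = edges.clone();
    while !frontier.is_empty() {
        let mut next = BTreeSet::new();
        for &(x, y) in &frontier {
            for &(y2, z) in edges {
                // reach(x, z) :- reach(x, y), edge(y, z)
                if y == y2 && !reach.contains(&(x, z)) {
                    next.insert((x, z));
                }
            }
        }
        reach.extend(next.iter().copied());
        // The fixed point is reached once a round adds nothing new.
        frontier = next;
    }
    reach
}
```

The loop terminates because the set of possible pairs is finite and `reach` only grows.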

Reply (Collaborator Author):

"Recursion" is a more familiar term to database folks than "fixed point". The goal here is not to write a formal design document, precisely specifying each notion, but to give an idea of the capabilities we are implementing.

README.md
- **Change data** represents updates (insertions, deletions, modifications) to
some state modeled as a table of records.

In DBSP, a time series is just a table where records are only ever added and

Review comment:

"table" appears here for the first time.

README.md Outdated
time window queries are updated on the fly as new inputs become available. This
means that DBSP can work with arbitrarily large windows as long as they fit
within available storage. All other operators listed above apply to both time
series and change data.
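
To make the distinction between time series and change data concrete, a minimal sketch, assuming each update is encoded as a record with a signed weight (+1 for an insertion, -1 for a deletion, and a modification as a deletion followed by an insertion). The `Table`, `Change`, `apply_changes`, and `is_append_only` names are hypothetical and not part of DBSP's API:

```rust
use std::collections::BTreeMap;

/// A record: (id, name). Hypothetical layout for illustration.
type Record = (u32, &'static str);

/// A table modeled as records with multiplicities.
type Table = BTreeMap<Record, i64>;

/// One change: a record plus a signed weight.
type Change = (Record, i64);

/// Applies a batch of changes, dropping records whose weight reaches zero.
fn apply_changes(table: &mut Table, changes: &[Change]) {
    for &(rec, w) in changes {
        let new_w = table.get(&rec).copied().unwrap_or(0) + w;
        if new_w == 0 {
            table.remove(&rec);
        } else {
            table.insert(rec, new_w);
        }
    }
}

/// In this encoding, a time series is the special case in which
/// records are only ever added, so every weight is positive.
fn is_append_only(changes: &[Change]) -> bool {
    changes.iter().all(|&(_, w)| w > 0)
}
```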

Review comment:

Perhaps I could take a stab at improving this documentation; after all, I already have tons of slides I could steal material from. But I understand that this text should be more developer-oriented.

Apparently, submodules don't work well with `cargo`:
- #89 (comment)
- #89 (comment)

We will add a copy of the LaTeX source to this repo instead.

ryzhyk (Collaborator Author) commented Jun 30, 2022

I think this README is an improvement over what we have now. I am going to merge it and move on to other stuff. Improvements are welcome.

ryzhyk merged commit 9496d3e into vmware-archive:main on Jun 30, 2022
ryzhyk deleted the crates_release branch on June 30, 2022 06:01
4 participants