This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

Implement a rudimentary Keyword struct and the beginnings of ident/entid. #139

Merged · 2 commits · Jan 9, 2017

Conversation

rnewman (Collaborator) commented Dec 21, 2016

Inline tests! Module files!

bgrins (Member) commented Dec 21, 2016

> Inline tests

Are there supposed to be tests? I don't see any in the diff

/// # extern crate datomish;
/// # use datomish::keyword::Keyword;
/// # fn main() {
/// assert_eq!(":baz", Keyword::new("baz").to_string());
Member commented:

These examples could be converted into tests

Member commented:

+1 to this.

rnewman (Collaborator, author) commented Dec 21, 2016

@bgrins: this is Rust; docstrings containing code automatically generate tests. Try it! cargo test will run those two assertions.
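
(For readers new to Rust doc-tests: a minimal sketch of the mechanism rnewman describes. The crate and function names below are placeholders for illustration, not anything from this PR.)

/// Adds two numbers.
///
/// ```
/// // This example is compiled and run by `cargo test` as a doc-test.
/// // `mylib` is a placeholder crate name for this sketch.
/// assert_eq!(4, mylib::add(2, 2));
/// ```
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}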

bgrins (Member) commented Dec 21, 2016

> docstrings containing code automatically generate tests

Oh, that's nice

// CONDITIONS OF ANY KIND, either express or implied. See the License for the
// specific language governing permissions and limitations under the License.

/// Just like Clojure's Keyword. We'll need this as part of EDN parsing,
Member commented:

This might end up being a cyclic dependency if it's needed for both EDN parsing and core but defined in core. The implementation probably makes sense in a new edn-parser crate, and it can be imported into core from that crate if it's needed here.

If you want to start building out the edn-parser in core and break it into its own crate later, that's fine; just keep in mind that it might become more difficult the longer we wait.

Member commented:

@rnewman any thoughts about this? I may be missing context, but it seems like starting a new edn-parser subfolder / crate would be the right place to put this (repeating what's done in #138)

rnewman (author) commented:

Yeah, pushing this into edn would make sense. I wasn't too fussed about the location — just getting it down on paper.

pub type EntId = u32; // TODO: u64? Not all DB values will be representable in a u32.

/// The ability to transform entity identifiers (entids) into keyword names (idents) and back again.
pub trait Ident {
Member commented:

Is a new type going to be introduced in a later commit/PR that implements this? Not seeing this used other than referencing the module

rnewman (author) commented:

This is just super early days for starting to define rudimentary types. Not used yet, but DB will implement this trait.
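
(The trait body is elided in this excerpt. A rough, hypothetical sketch of what "entids to idents and back" could look like; the method names below are guesses for illustration, not the PR's actual API.)

use datomish::keyword::Keyword;

pub type EntId = u32;

/// Hypothetical sketch only; the real trait body is not shown in this diff.
pub trait Ident {
    /// Look up the keyword name (ident) for an entid, if one is defined.
    fn get_ident(&self, entid: EntId) -> Option<Keyword>;
    /// Look up the entid for a keyword name (ident), if known.
    fn get_entid(&self, ident: &Keyword) -> Option<EntId>;
}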

bgrins (Member) commented Dec 21, 2016

Feel free to do what you think is best regarding modules vs crates here given the comments - we can always rearrange one way or the other

ncalexan (Member) commented:

As we prepare to re-implement Datomish in Rust, I'd like to challenge some of our assumptions. With regards to this ticket, why do we feel like using EDN for either our queries or our transactions is an advantage?

It seems that we'd fit much better with the Firefox codebase (and JavaScript in general!) if we used JSON, perhaps with some conventions. In theory, EDN has richer types (notably a first class set) and is extensible (notably timestamps), but in practice do we really care to write parsers for EDN and teach folks a new grammar for such small improvements? Could we not get similar advantages from, say, YAML (which probably already has a Rust parser implementation)?

At a smaller level, why do we feel that keywords in particular are an advantage? @rnewman and I have discussed interning all strings (like Java), although I only see #32 right now. Could we spend a little time identifying Clojure-style keywords with JavaScript-style Symbols and maybe get a more idiomatic interface for Firefox?

rnewman (Collaborator, author) commented Dec 22, 2016

Re EDN: there are a few advantages.

  • By not diverging from Datomic and Datascript, we get to leverage existing documentation/tutorials/etc. directly. We can just point people at those to set context.
  • Similarly, we get to share tooling — it becomes possible to more directly share replicators, importers, etc.
  • We've briefly discussed, both internally and with other teams and people, the ability to reuse queries/schema/data in Datascript and Datomic. Datomic's licensing is now quite affordable, so this might be useful.

It's also worth noting that the potential saving would only be in reusing an existing Rust JSON/YAML parser, not in unifying data structures with the caller. A big chunk of my motivation for Rust work is that the result would be usable on iOS and Android, but even on desktop I don't think we could accept some kind of existing parsed JSON input through a programmatic API.

Re Symbol: I don't get the impression that Symbol is particularly well-loved in JS. For one thing, Symbol properties are completely omitted from JSON object output, so there's no way to serialize them. That means finding a different serialization format for programmatic JS literals, which kinda defeats the point, no?

Re keywords: I think the point is to be able to identify symbolic key words (haha) in situations in which ambiguity is possible with strings — imagine a query like

[:find ?x :in $ :where [?x _ :some/thing]]

which would, in a trivial non-keywording system, be indistinguishable from

[:find ?x :in $ :where [?x _ "some/thing"]]

It's likely that we'll indeed want some kind of simplified JSON interface for some operations in which ambiguity isn't possible, like the simplified schema work we did. But I expect reworking the schema/datoms format and the query language to be adequately and concisely representable in the general case in a weaker format like JSON to be a bit of a pain, as well as losing interop with other datom systems.

The quoting would also be a pain in the ass for 'flat' (array-formed) queries:

["find" ["?x" "?y"] "in" ["$"] "where" …]

rnewman (Collaborator, author) commented Dec 22, 2016

Clarifying point: there are uses for keywords in data (an interned symbolic string), in queries (to identify keyword data), and in syntax (to call out special words like :find and :in and distinguish them from symbols like ?in and data like "in").

ncalexan (Member) commented:

> Re EDN: there are a few advantages.
>
> By not diverging from Datomic and Datascript, we get to leverage existing documentation/tutorials/etc. directly. We can just point people at those to set context.

I find this compelling -- for a time. We already have some significant differences from Datomic, compounded by the fact that Datomic itself is not specified (other than by its implementation). This gets worse over time. I think it's a good trade-off right now, though.

> Similarly, we get to share tooling — it becomes possible to more directly share replicators, importers, etc.

I don't think this is worth the effort. Datomic, DataScript, and now Datomish, have slightly different semantics and edge cases. I'd be more compelled by a test suite across the implementations, but there's little motivation for Datomic to contribute.

> We've briefly discussed, both internally and with other teams and people, the ability to reuse queries/schema/data in Datascript and Datomic. Datomic's licensing is now quite affordable, so this might be useful.

It might. This is not compelling to me. I've written "inter-op" functions to test behaviour, and already there are enough small differences to convince me that we can't re-use without effort.

> It's also worth noting that the potential saving would only be in reusing an existing Rust JSON/YAML parser, not in unifying data structures with the caller. A big chunk of my motivation for Rust work is that the result would be usable on iOS and Android, but even on desktop I don't think we could accept some kind of existing parsed JSON input through a programmatic API.

I think my point is that EDN makes sense in languages that buy into the "untyped data structures" world view. But Rust is 100% not in that language category; it's centered around statically typed structs. So maybe there's an EDN thing floating around -- but why EDN? It's not standard or likely to be standardized beyond the Clojure implementation (I saw frustration around the lack of a grammar floating around the 'net) and doesn't have any traction in the Rust and JavaScript communities. (Nor the Swift and Java communities, while we're at it.)

I think it's worth trying pretty hard to unify data structures with callers. Some part of our own motivation for Datomish is to avoid the proliferation of "magic SQL statements", hand-loved throughout eternity. They're hard to build programmatically and costly to parse. Maybe there's something richer than JSON that would do for us? Transit comes to mind; it certainly encodes types, and perhaps "just works" with Symbol. But I know little about Transit right now.

> Re Symbol: I don't get the impression that Symbol is particularly well-loved in JS. For one thing, Symbol properties are completely omitted from JSON object output, so there's no way to serialize them. That means finding a different serialization format for programmatic JS literals, which kinda defeats the point, no?

Perhaps? I don't know the details about Symbol. But with wavy hands, Clojure keywords are JavaScript symbols. We should understand exactly why we're brewing our own thing. (Of course, it's possible our own thing is required and is isomorphic to EDN.)

> Re keywords: I think the point is to be able to identify symbolic key words (haha) in situations in which ambiguity is possible with strings — imagine a query like
>
> [:find ?x :in $ :where [?x _ :some/thing]]
>
> which would, in a trivial non-keywording system, be indistinguishable from
>
> [:find ?x :in $ :where [?x _ "some/thing"]]
>
> It's likely that we'll indeed want some kind of simplified JSON interface for some operations in which ambiguity isn't possible, like the simplified schema work we did. But I expect reworking the schema/datoms format and the query language to be adequately and concisely representable in the general case in a weaker format like JSON to be a bit of a pain, as well as losing interop with other datom systems.
>
> The quoting would also be a pain in the ass for 'flat' (array-formed) queries:
>
> ["find" ["?x" "?y"] "in" ["$"] "where" …]

We have the option of defining our own custom DSL, since we're going to be parsing everything anyway. If we were focusing on Firefox's use cases, without reference to the existing Datomish implementation, would we choose a Lisp-y syntax? (And even Clojure's Lisp-y syntax at that!) I know I would not. I would choose something Rust-y, and think long and hard about Rust ergonomics. I would only adopt a string-ly typed representation that our consumers aren't familiar with for some compelling reasons.

rnewman (Collaborator, author) commented Dec 22, 2016

> I think my point is that EDN makes sense in languages that buy into the "untyped data structures" world view. But Rust is 100% not in that language category; it's centered around statically typed structs.

Not really. I mean, there's no difference in "untypedness" between an array of EDN elements and a JSON array, and Rust can express both — you just use an enum, exactly as in Swift, and Vec is perfectly happy to store those things.

Queries and results will both always end up in this situation. Bindings are heterogeneous, query shapes vary and include different types, and results reflect the data in the store.
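
(A minimal sketch of that point, with illustrative names rather than this repo's real value types: one enum lets a Vec hold a heterogeneous, EDN-like sequence while every element stays statically typed.)

// Illustrative only; not the crate's actual EDN value type.
#[derive(Debug)]
enum Value {
    Integer(i64),
    Text(String),
    Keyword(String),    // e.g. ":some/thing"
    Vector(Vec<Value>), // nesting, as in a query form
}

fn main() {
    // A heterogeneous sequence stored in a single Vec.
    let row: Vec<Value> = vec![
        Value::Keyword(":find".to_string()),
        Value::Text("in".to_string()),
        Value::Integer(42),
        Value::Vector(vec![Value::Keyword(":some/thing".to_string())]),
    ];
    println!("{:?}", row);
}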

We do have our own less loosely typed DSL for queries — it's FindRel and friends — that's another parser step away. The question I think you're narrowing in on is: what are the textual formats that humans write for queries, that results are serialized in, and that are used by default for transaction listeners?

Datomic's answer for all of those is EDN, because it's expressive enough for all of the types that Datomic stores, and it's able to express relatively flat heterogeneous sequences as found in queries.

It is frustrating that EDN is not thoroughly defined (though there are implementations around in ANTLR and in Haskell and other concise formats).

We can try to find a different answer for each of those input/output needs, potentially finding ones that are easier to write, easier to serialize from JS or Swift, etc.

I'm a little leery of the effort required to do so: in order to cover all our bases we'd end up reinventing the subset of EDN used by Datomish already. I suspect something like YAML would give us just as much trouble, just in different places. That is: "parse EDN" is a relatively well-scoped problem that @grigoryk is already looking at on the side. "Define query and output formats" is much less well-scoped.

rnewman (Collaborator, author) commented Dec 22, 2016

It's also worth noting that there doesn't have to be only one supported format in each situation: we can support as many formats as we like, so long as they are adequately expressive.

rnewman (Collaborator, author) commented Dec 22, 2016

OBTW, this is a good time to point to https://github.com/joewalker/barnardsstar.

bgrins requested a review from ncalexan December 22, 2016 22:10
bgrins (Member) left a review:

This looks fine to me code-wise with or without moving to a crate (though I'd prefer we did ultimately). Asking @ncalexan to weigh in on the higher level design since he has more context.

ncalexan (Member) commented:

> I think my point is that EDN makes sense in languages that buy into the "untyped data structures" world view. But Rust is 100% not in that language category; it's centered around statically typed structs.

> We do have our own less loosely typed DSL for queries — it's FindRel and friends — that's another parser step away. The question I think you're narrowing in on is: what are the textual formats that humans write for queries, that results are serialized in, and that are used by default for transaction listeners?

I'm really thinking of the API layer beneath text. EDN is natural in Clojure, but it's artificial in all these statically typed languages. (Remember how much we fought JSON in Swift -- we eventually found an idiom, and I'm sure we will here, but there's a cost.)

My experience with Tofino and our existing Datomish implementation is that we don't have heterogeneous data. We support returning the kind of arbitrary queries you sketch below, but we didn't use them in our limited experiment. I personally believe we're unlikely to do so, at least at any scale. So EDN encapsulates the arbitrary case, but doesn't reflect the homogeneity we actually end up using. What would our interface look like if we tried to make the results reflect the query? (I understand there is a hard limit on how much we can do this, because the schema is not known at compile time.)

The textual formats will work themselves out once we decide what we want to allow people to express. EDN says: express anything. Is that right? It certainly lets us not make a bunch of hard decisions, as you point out below.

> Datomic's answer for all of those is EDN, because it's expressive enough for all of the types that Datomic stores, and it's able to express relatively flat heterogeneous sequences as found in queries.

ncalexan (Member) left a review:

I do wonder if the Ident trait is correct. I expect this to be two traits, to allow it to be used in function parameter declarations; and I expect it to be parameterized in some way by a DB-like trait, to specialize on the implementation. That's just my guess based on Haskell intuition, though.

Baby is screaming and I don't want to block this. Roll on!


}

pub fn namespaced(name: &str, namespace: &str) -> Self {
return Keyword { name: name.to_string(), namespace: Some(namespace.to_string()) };
Member commented:

Might as well check that namespace isn't empty, that neither of these bits has / in them (really, that they're valid identifiers), and that we don't accept :: (do we?).
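
(A sketch of the kind of checks being suggested; the assertion style is an assumption, and the actual fix rnewman mentions below in 89cbc3d may differ.)

pub fn namespaced(name: &str, namespace: &str) -> Self {
    // Sketch of the suggested validation, not necessarily what was merged.
    assert!(!name.is_empty(), "keyword name must not be empty");
    assert!(!namespace.is_empty(), "keyword namespace must not be empty");
    assert!(!name.contains('/') && !namespace.contains('/'),
            "keyword components must not contain '/'");
    Keyword { name: name.to_string(), namespace: Some(namespace.to_string()) }
}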

rnewman (Collaborator, author) commented Jan 5, 2017

I added a thorough and much improved set of fixes in 89cbc3d. I'll merge with that.

// combinations in Rust. It must be common in ASTs. Trait objects
// presumably aren't the answer…
pub enum Element {
Variable { variable: Variable },
rnewman (author) commented:

https://doc.rust-lang.org/1.1.0/syntax/ast/enum.Expr_.html

… looks like this is the idiomatic way to express ASTs!
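
(For context, the idiom being pointed at: a single enum whose variants carry their payloads, with Vec or Box where recursion is needed. The types below are illustrative, not this repo's query AST.)

// Illustrative sketch of the enum-as-AST idiom, not the actual query types.
#[derive(Debug)]
enum Expr {
    Literal(i64),
    Variable(String),                      // e.g. "?x"
    Apply { op: String, args: Vec<Expr> }, // recursion via an owned Vec
    Negate(Box<Expr>),                     // recursion via Box for a single child
}

fn main() {
    let expr = Expr::Apply {
        op: "+".to_string(),
        args: vec![
            Expr::Literal(1),
            Expr::Negate(Box::new(Expr::Variable("?x".to_string()))),
        ],
    };
    println!("{:?}", expr);
}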

rnewman deleted the rnewman/idents branch January 9, 2017 20:35