This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

Implement a rudimentary Keyword struct and the beginnings of ident/entid. #139

Merged · 2 commits · Jan 9, 2017

Conversation

rnewman (Collaborator) commented Dec 21, 2016

Inline tests! Module files!

bgrins (Member) commented Dec 21, 2016

> Inline tests

Are there supposed to be tests? I don't see any in the diff

/// # extern crate datomish;
/// # use datomish::keyword::Keyword;
/// # fn main() {
/// assert_eq!(":baz", Keyword::new("baz").to_string());
Member commented:

These examples could be converted into tests

Member commented:

+1 to this.

rnewman (Collaborator, author) commented Dec 21, 2016

@bgrins: this is Rust; docstrings containing code automatically generate tests. Try it! cargo test will run those two assertions.
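
(For readers new to Rust doc-tests: a minimal sketch of the mechanism rnewman describes. The crate and function names below are placeholders for illustration, not anything from this PR.)

/// Adds two numbers.
///
/// ```
/// // This example is compiled and run by `cargo test` as a doc-test.
/// // `mylib` is a placeholder crate name for this sketch.
/// assert_eq!(4, mylib::add(2, 2));
/// ```
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}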

bgrins (Member) commented Dec 21, 2016

> docstrings containing code automatically generate tests

Oh, that's nice

// CONDITIONS OF ANY KIND, either express or implied. See the License for the
// specific language governing permissions and limitations under the License.

/// Just like Clojure's Keyword. We'll need this as part of EDN parsing,
Member commented:

This might end up being a cyclic dependency if it's needed for both EDN parsing and core but defined in core. The implementation probably makes sense in a new edn-parser crate, and it can be imported into core from that crate if it's needed here.

If you want to start building out the edn-parser in core and break it into its own crate later, that's fine; just keep in mind that it might become more difficult the longer we wait.

Member commented:

@rnewman any thoughts about this? I may be missing context, but it seems like starting a new edn-parser subfolder / crate would be the right place to put this (repeating what's done in #138)

rnewman (author) commented:

Yeah, pushing this into edn would make sense. I wasn't too fussed about the location — just getting it down on paper.

pub type EntId = u32; // TODO: u64? Not all DB values will be representable in a u32.

/// The ability to transform entity identifiers (entids) into keyword names (idents) and back again.
pub trait Ident {
Member commented:

Is a new type going to be introduced in a later commit/PR that implements this? Not seeing this used other than referencing the module

rnewman (author) commented:

This is just super early days for starting to define rudimentary types. Not used yet, but DB will implement this trait.
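
(The trait body is elided in this excerpt. A rough, hypothetical sketch of what "entids to idents and back" could look like; the method names below are guesses for illustration, not the PR's actual API.)

use datomish::keyword::Keyword;

pub type EntId = u32;

/// Hypothetical sketch only; the real trait body is not shown in this diff.
pub trait Ident {
    /// Look up the keyword name (ident) for an entid, if one is defined.
    fn get_ident(&self, entid: EntId) -> Option<Keyword>;
    /// Look up the entid for a keyword name (ident), if known.
    fn get_entid(&self, ident: &Keyword) -> Option<EntId>;
}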

bgrins (Member) commented Dec 21, 2016

Feel free to do what you think is best regarding modules vs crates here given the comments - we can always rearrange one way or the other

ncalexan (Member) commented:

As we prepare to re-implement Datomish in Rust, I'd like to challenge some of our assumptions. With regards to this ticket, why do we feel like using EDN for either our queries or our transactions is an advantage?

It seems that we'd fit much better with the Firefox codebase (and JavaScript in general!) if we used JSON, perhaps with some conventions. In theory, EDN has richer types (notably a first class set) and is extensible (notably timestamps), but in practice do we really care to write parsers for EDN and teach folks a new grammar for such small improvements? Could we not get similar advantages from, say, YAML (which probably already has a Rust parser implementation)?

At a smaller level, why do we feel that keywords in particular are an advantage? @rnewman and I have discussed interning all strings (like Java), although I only see #32 right now. Could we spend a little time identifying Clojure-style keywords with JavaScript-style Symbols and maybe get a more idiomatic interface for Firefox?

rnewman (Collaborator, author) commented Dec 22, 2016

Re EDN: there are a few advantages.

  • By not diverging from Datomic and Datascript, we get to leverage existing documentation/tutorials/etc. directly. We can just point people at those to set context.
  • Similarly, we get to share tooling — it becomes possible to more directly share replicators, importers, etc.
  • We've briefly discussed, both internally and with other teams and people, the ability to reuse queries/schema/data in Datascript and Datomic. Datomic's licensing is now quite affordable, so this might be useful.

It's also worth noting that the potential saving would only be in reusing an existing Rust JSON/YAML parser, not in unifying data structures with the caller. A big chunk of my motivation for Rust work is that the result would be usable on iOS and Android, but even on desktop I don't think we could accept some kind of existing parsed JSON input through a programmatic API.

Re Symbol: I don't get the impression that Symbol is particularly well-loved in JS. For one thing, Symbol properties are completely omitted from JSON object output, so there's no way to serialize them. That means finding a different serialization format for programmatic JS literals, which kinda defeats the point, no?

Re keywords: I think the point is to be able to identify symbolic key words (haha) in situations in which ambiguity is possible with strings — imagine a query like

[:find ?x :in $ :where [?x _ :some/thing]]

which would, in a trivial non-keywording system, be indistinguishable from

[:find ?x :in $ :where [?x _ "some/thing"]]

It's likely that we'll indeed want some kind of simplified JSON interface for some operations in which ambiguity isn't possible, like the simplified schema work we did. But I expect reworking the schema/datoms format and the query language to be adequately and concisely representable in the general case in a weaker format like JSON to be a bit of a pain, as well as losing interop with other datom systems.

The quoting would also be a pain in the ass for 'flat' (array-formed) queries:

["find" ["?x" "?y"] "in" ["$"] "where" …]

rnewman (Collaborator, author) commented Dec 22, 2016

Clarifying point: there are uses for keywords in data (an interned symbolic string), in queries (to identify keyword data), and in syntax (to call out special words like :find and :in and distinguish them from symbols like ?in and data like "in").

ncalexan (Member) commented:

> Re EDN: there are a few advantages.
>
> By not diverging from Datomic and Datascript, we get to leverage existing documentation/tutorials/etc. directly. We can just point people at those to set context.

I find this compelling -- for a time. We already have some significant differences from Datomic, compounded by the fact that Datomic itself is not specified (other than by its implementation). This gets worse over time. I think it's a good trade-off right now, though.

> Similarly, we get to share tooling — it becomes possible to more directly share replicators, importers, etc.

I don't think this is worth the effort. Datomic, DataScript, and now Datomish, have slightly different semantics and edge cases. I'd be more compelled by a test suite across the implementations, but there's little motivation for Datomic to contribute.

> We've briefly discussed, both internally and with other teams and people, the ability to reuse queries/schema/data in Datascript and Datomic. Datomic's licensing is now quite affordable, so this might be useful.

It might. This is not compelling to me. I've written "inter-op" functions to test behaviour, and already there are enough small differences to convince me that we can't re-use without effort.

> It's also worth noting that the potential saving would only be in reusing an existing Rust JSON/YAML parser, not in unifying data structures with the caller. A big chunk of my motivation for Rust work is that the result would be usable on iOS and Android, but even on desktop I don't think we could accept some kind of existing parsed JSON input through a programmatic API.

I think my point is that EDN makes sense in languages that buy into the "untyped data structures" world view. But Rust is 100% not in that language category; it's centered around statically typed structs. So maybe there's an EDN thing floating around -- but why EDN? It's not standard or likely to be standardized beyond the Clojure implementation (I saw frustration around the lack of a grammar floating around the 'net) and doesn't have any traction in the Rust and JavaScript communities. (Nor the Swift and Java communities, while we're at it.)

I think it's worth trying pretty hard to unify data structures with callers. Some part of our own motivation for Datomish is to avoid the proliferation of "magic SQL statements", hand-loved throughout eternity. They're hard to build programmatically and costly to parse. Maybe there's something richer than JSON that would do for us? Transit comes to mind; it certainly encodes types, and perhaps "just works" with Symbol. But I know little about Transit right now.

> Re Symbol: I don't get the impression that Symbol is particularly well-loved in JS. For one thing, Symbol properties are completely omitted from JSON object output, so there's no way to serialize them. That means finding a different serialization format for programmatic JS literals, which kinda defeats the point, no?

Perhaps? I don't know the details about Symbol. But with wavy hands, Clojure keywords are JavaScript symbols. We should understand exactly why we're brewing our own thing. (Of course, it's possible our own thing is required and is isomorphic to EDN.)

> Re keywords: I think the point is to be able to identify symbolic key words (haha) in situations in which ambiguity is possible with strings — imagine a query like
>
> [:find ?x :in $ :where [?x _ :some/thing]]
>
> which would, in a trivial non-keywording system, be indistinguishable from
>
> [:find ?x :in $ :where [?x _ "some/thing"]]
>
> It's likely that we'll indeed want some kind of simplified JSON interface for some operations in which ambiguity isn't possible, like the simplified schema work we did. But I expect reworking the schema/datoms format and the query language to be adequately and concisely representable in the general case in a weaker format like JSON to be a bit of a pain, as well as losing interop with other datom systems.
>
> The quoting would also be a pain in the ass for 'flat' (array-formed) queries:
>
> ["find" ["?x" "?y"] "in" ["$"] "where" …]

We have the option of defining our own custom DSL, since we're going to be parsing everything anyway. If we were focusing on Firefox's use cases, without reference to the existing Datomish implementation, would we choose a Lisp-y syntax? (And even Clojure's Lisp-y syntax at that!) I know I would not. I would choose something Rust-y, and think long and hard about Rust ergonomics. I would only adopt a string-ly typed representation that our consumers aren't familiar with for some compelling reasons.

rnewman (Collaborator, author) commented Dec 22, 2016

> I think my point is that EDN makes sense in languages that buy into the "untyped data structures" world view. But Rust is 100% not in that language category; it's centered around statically typed structs.

Not really. I mean, there's no difference in "untypedness" between an array of EDN elements and a JSON array, and Rust can express both — you just use an enum, exactly as in Swift, and Vec is perfectly happy to store those things.

Queries and results will both always end up in this situation. Bindings are heterogeneous, query shapes vary and include different types, and results reflect the data in the store.
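
(A minimal sketch of that point, with illustrative names rather than this repo's real value types: one enum lets a Vec hold a heterogeneous, EDN-like sequence while every element stays statically typed.)

// Illustrative only; not the crate's actual EDN value type.
#[derive(Debug)]
enum Value {
    Integer(i64),
    Text(String),
    Keyword(String),    // e.g. ":some/thing"
    Vector(Vec<Value>), // nesting, as in a query form
}

fn main() {
    // A heterogeneous sequence stored in a single Vec.
    let row: Vec<Value> = vec![
        Value::Keyword(":find".to_string()),
        Value::Text("in".to_string()),
        Value::Integer(42),
        Value::Vector(vec![Value::Keyword(":some/thing".to_string())]),
    ];
    println!("{:?}", row);
}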

We do have our own less loosely typed DSL for queries — it's FindRel and friends — that's another parser step away. The question I think you're narrowing in on is: what are the textual formats that humans write for queries, that results are serialized in, and that are used by default for transaction listeners?

Datomic's answer for all of those is EDN, because it's expressive enough for all of the types that Datomic stores, and it's able to express relatively flat heterogeneous sequences as found in queries.

It is frustrating that EDN is not thoroughly defined (though there are implementations around in ANTLR and in Haskell and other concise formats).

We can try to find a different answer for each of those input/output needs, potentially finding ones that are easier to write, easier to serialize from JS or Swift, etc.

I'm a little leery of the effort required to do so: in order to cover all our bases we'd end up reinventing the subset of EDN used by Datomish already. I suspect something like YAML would give us just as much trouble, just in different places. That is: "parse EDN" is a relatively well-scoped problem that @grigoryk is already looking at on the side. "Define query and output formats" is much less well-scoped.

rnewman (Collaborator, author) commented Dec 22, 2016

It's also worth noting that there doesn't have to be only one supported format in each situation: we can support as many formats as we like, so long as they are adequately expressive.

rnewman (Collaborator, author) commented Dec 22, 2016

OBTW, this is a good time to point to https://github.com/joewalker/barnardsstar.

bgrins requested a review from ncalexan December 22, 2016 22:10
bgrins (Member) left a review:

This looks fine to me code-wise with or without moving to a crate (though I'd prefer we did ultimately). Asking @ncalexan to weigh in on the higher level design since he has more context.

ncalexan (Member) commented:

> I think my point is that EDN makes sense in languages that buy into the "untyped data structures" world view. But Rust is 100% not in that language category; it's centered around statically typed structs.

> We do have our own less loosely typed DSL for queries — it's FindRel and friends — that's another parser step away. The question I think you're narrowing in on is: what are the textual formats that humans write for queries, that results are serialized in, and that are used by default for transaction listeners?

I'm really thinking of the API layer beneath text. EDN is natural in Clojure, but it's artificial in all these statically typed languages. (Remember how much we fought JSON in Swift -- we eventually found an idiom, and I'm sure we will here, but there's a cost.)

My experience with Tofino and our existing Datomish implementation is that we don't have heterogeneous data. We support returning the kind of arbitrary queries you sketch below, but we didn't use them in our limited experiment. I personally believe we're unlikely to do so, at least at any scale. So EDN encapsulates the arbitrary case, but doesn't reflect the homogeneity we actually end up using. What would our interface look like if we tried to make the results reflect the query? (I understand there is a hard limit on how much we can do this, because the schema is not known at compile time.)

The textual formats will work themselves out once we decide what we want to allow people to express. EDN says: express anything. Is that right? It certainly lets us not make a bunch of hard decisions, as you point out below.

> Datomic's answer for all of those is EDN, because it's expressive enough for all of the types that Datomic stores, and it's able to express relatively flat heterogeneous sequences as found in queries.

ncalexan (Member) left a review:

I do wonder if the Ident trait is correct. I expect this to be two traits, to allow it to be used in function parameter declarations; and I expect it to be parameterized in some way by a DB-like trait, to specialize on the implementation. That's just my guess based on Haskell intuition, though.

Baby is screaming and I don't want to block this. Roll on!


}

pub fn namespaced(name: &str, namespace: &str) -> Self {
return Keyword { name: name.to_string(), namespace: Some(namespace.to_string()) };
Member commented:

Might as well check that namespace isn't empty, that neither of these bits has / in them (really, that they're valid identifiers), and that we don't accept :: (do we?).
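
(A sketch of the kind of checks being suggested; the assertion style is an assumption, and the actual fix rnewman mentions below in 89cbc3d may differ.)

pub fn namespaced(name: &str, namespace: &str) -> Self {
    // Sketch of the suggested validation, not necessarily what was merged.
    assert!(!name.is_empty(), "keyword name must not be empty");
    assert!(!namespace.is_empty(), "keyword namespace must not be empty");
    assert!(!name.contains('/') && !namespace.contains('/'),
            "keyword components must not contain '/'");
    Keyword { name: name.to_string(), namespace: Some(namespace.to_string()) }
}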

rnewman (Collaborator, author) commented Jan 5, 2017

I added a thorough and much improved set of fixes in 89cbc3d. I'll merge with that.

// combinations in Rust. It must be common in ASTs. Trait objects
// presumably aren't the answer…
pub enum Element {
Variable { variable: Variable },
rnewman (author) commented:

https://doc.rust-lang.org/1.1.0/syntax/ast/enum.Expr_.html

… looks like this is the idiomatic way to express ASTs!
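
(For context, the idiom being pointed at: a single enum whose variants carry their payloads, with Vec or Box where recursion is needed. The types below are illustrative, not this repo's query AST.)

// Illustrative sketch of the enum-as-AST idiom, not the actual query types.
#[derive(Debug)]
enum Expr {
    Literal(i64),
    Variable(String),                      // e.g. "?x"
    Apply { op: String, args: Vec<Expr> }, // recursion via an owned Vec
    Negate(Box<Expr>),                     // recursion via Box for a single child
}

fn main() {
    let expr = Expr::Apply {
        op: "+".to_string(),
        args: vec![
            Expr::Literal(1),
            Expr::Negate(Box::new(Expr::Variable("?x".to_string()))),
        ],
    };
    println!("{:?}", expr);
}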

rnewman deleted the rnewman/idents branch January 9, 2017 20:35