-
Notifications
You must be signed in to change notification settings - Fork 115
Implement a rudimentary Keyword struct and the beginnings of ident/entid. #139
Conversation
Are there supposed to be tests? I don't see any in the diff |
/// # extern crate datomish; | ||
/// # use datomish::keyword::Keyword; | ||
/// # fn main() { | ||
/// assert_eq!(":baz", Keyword::new("baz").to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These examples could be converted into tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this.
@bgrins: this is Rust; docstrings containing code automatically generate tests. Try it! |
Oh, that's nice |
// CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations under the License. | ||
|
||
/// Just like Clojure's Keyword. We'll need this as part of EDN parsing, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might end up being a cyclic dependency if it's needed for both EDN parsing and core but defined in core. The implementation probably makes sense in a new edn-parser crate and it can be imported into core from that crate if it's needed here.
If you want to start building out the edn-parser in core and break it into it's own crate later that's fine, just keeping in mind that it might become more difficult the longer we wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, pushing this into edn
would make sense. I wasn't too fussed about the location — just getting it down on paper.
pub type EntId = u32; // TODO: u64? Not all DB values will be representable in a u32. | ||
|
||
/// The ability to transform entity identifiers (entids) into keyword names (idents) and back again. | ||
pub trait Ident { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a new type going to be introduced in a later commit/PR that implements this? Not seeing this used other than referencing the module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just super early days for starting to define rudimentary types. Not used yet, but DB
will implement this trait.
Feel free to do what you think is best regarding modules vs crates here given the comments - we can always rearrange one way or the other |
As we prepare to re-implement Datomish in Rust, I'd like to challenge some of our assumptions. With regards to this ticket, why do we feel like using EDN for either our queries or our transactions is an advantage? It seems that we'd fit much better with the Firefox codebase (and JavaScript in general!) if we used JSON, perhaps with some conventions. In theory, EDN has richer types (notably a first class set) and is extensible (notably timestamps), but in practice do we really care to write parsers for EDN and teach folks a new grammar for such small improvements? Could we not get similar advantages from, say, YAML (which probably already has a Rust parser implementation)? At a smaller level, why do we feel that keywords in particular are an advantage? @rnewman and I have discussed interning all strings (like Java), although I only see #32 right now. Could we spend a little time identifying Clojure-style keywords with JavaScript-style Symbols and maybe get a more idiomatic interface for Firefox? |
Re EDN: there are a few advantages.
It's also worth noting that the potential saving would only be in reusing an existing Rust JSON/YAML parser, not in unifying data structures with the caller. A big chunk of my motivation for Rust work is that the result would be usable on iOS and Android, but even on desktop I don't think we could accept some kind of existing parsed JSON input through a programmatic API. Re Symbol: I don't get the impression that Symbol is particularly well-loved in JS. For one thing, Symbol properties are completely omitted from JSON object output, so there's no way to serialize them. That means finding a different serialization format for programmatic JS literals, which kinda defeats the point, no? Re keywords: I think the point is to be able to identify symbolic key words (haha) in situations in which ambiguity is possible with strings — imagine a query like
which would, in a trivial non-keywording system, be indistinguishable from
It's likely that we'll indeed want some kind of simplified JSON interface for some operations in which ambiguity isn't possible, like the simplified schema work we did. But I expect reworking the schema/datoms format and the query language to be adequately and concisely representable in the general case in a weaker format like JSON to be a bit of a pain, as well as losing interop with other datom systems. The quoting would also be a pain in the ass for 'flat' (array-formed) queries:
|
Clarifying point: there are uses for keywords in data (an interned symbolic string), in queries (to identify keyword data), and in syntax (to call out special words like |
I find this compelling -- for a time. We already have some significant differences from Datomic, compounded by the fact that Datomic itself is not specified (other than by its implementation). This gets worse over time. I think it's a good trade-off right now, though.
I don't think this is worth the effort. Datomic, DataScript, and now Datomish, have slightly different semantics and edge cases. I'd be more compelled by a test suite across the implementations, but there's little motivation for Datomic to contribute.
It might. This is not compelling to me. I've written "inter-op" functions to test behaviour, and already there are enough small differences to convince me that we can't re-use without effort.
I think my point is that EDN makes sense in languages that buy into the "untyped data structures" world view. But Rust is 100% not in that language category; it's centered around statically typed structs. So maybe there's an EDN thing floating around -- by why EDN? It's not standard or likely to be standardized beyond the Clojure implementation (I saw frustration around the lack of a grammar floating around the 'net) and doesn't have any traction in the Rust and JavaScript communities. (Nor the Swift and Java communities, while we're at it.) I think it's worth trying pretty hard for unifying data structures with callers. Some part of our own motivation for Datomish is to avoid the proliferation of "magic SQL statements", hand loved through-out eternity. They're hard to build programmatically and costly to parse. Maybe there's something richer than JSON that would do for us? Transit comes to mind; it certainly encodes types, and perhaps "just works" with Symbol. But I know little about Transit right now.
Perhaps? I don't know the details about
We have the option of defining our own custom DSL, since we're going to be parsing everything anyway. If we were focusing on Firefox's use cases, without reference to the existing Datomish implementation, would we choose to Lisp-y syntax? (And even Clojure's Lisp-y syntax at that!) I know I would not. I would choose something Rust-y, and think long and hard about Rust ergonomics. I would only adopt a string-ly typed representation that our consumers wouldn't be familiar with due to some compelling reasons. |
Not really. I mean, there's no difference in "untypedness" between an array of EDN elements and a JSON array, and Rust can express both — you just use an Queries and results will both always end up in this situation. Bindings are heterogeneous, query shapes vary and include different types, and results reflect the data in the store. We do have our own less loosely typed DSL for queries — it's Datomic's answer for all of those is EDN, because it's expressive enough for all of the types that Datomic stores, and it's able to express relatively flat heterogeneous sequences as found in queries. It is frustrating that EDN is not thoroughly defined (though there are implementations around in ANTLR and in Haskell and other concise formats). We can try to find a different answer for each of those input/output needs, potentially finding ones that are easier to write, easier to serialize from JS or Swift, etc. I'm a little leery of the effort required to do so: in order to cover all our bases we'd end up reinventing the subset of EDN used by Datomish already. I suspect something like YAML would give us just as much trouble, just in different places. That is: "parse EDN" is a relatively well-scoped problem that @grigoryk is already looking at on the side. "Define query and output formats" is much less well-scoped. |
It's also worth noting that there doesn't have to be only one supported format in each situation: we can support as many formats as we like, so long as they are adequately expressive. |
OBTW, this is a good time to point to https://github.com/joewalker/barnardsstar. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine to me code-wise with or without moving to a crate (though I'd prefer we did ultimately). Asking @ncalexan to weigh in on the higher level design since he has more context.
I'm really thinking of the API layer beneath text. EDN is natural in Clojure, but it's artificial in all these statically typed languages. (Remember how much we fought JSON in Swift -- we eventually found an idiom, and I'm sure we will here, but there's a cost.) My experience with Tofino and our existing Datomish implementation is that we don't have heterogeneous data. We support returning the kind of arbitrary queries you sketch below, but we didn't utilize them in our limited experiment. I personally believe we're unlikely to do so, at least at any scale. So EDN encapuslates arbitrary, but doesn't reflect the homogeneity we actually end up using. What would our interface look like if we tried to make the results reflect the query? (I understand there is a hard block on how much we can do this because the schema is not known at compile time.) The textual formats will work themselves out once we decide what we want to allow to express. EDN says express anything. Is that right? It certainly lets us not make a bunch of hard decisions, as you point out below.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder if the Ident
trait is correct. I expect this to be two traits, to allow it to be used in function parameter declarations; and I expect it to be paramaterized in some way by a DB
-like trait, to specialize on the implementation. That's just my guess based on Haskell intuition, though.
Baby is screaming and I don't want to block this. Roll on!
/// # extern crate datomish; | ||
/// # use datomish::keyword::Keyword; | ||
/// # fn main() { | ||
/// assert_eq!(":baz", Keyword::new("baz").to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this.
} | ||
|
||
pub fn namespaced(name: &str, namespace: &str) -> Self { | ||
return Keyword { name: name.to_string(), namespace: Some(namespace.to_string()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well check that namespace
isn't empty, and that neither of these bits have /
in them (really, are valid identifiers), and that we don't accept ::
(do we?).
270f296
to
8e538fe
Compare
I added a thorough and much improved set of fixes in 89cbc3d. I'll merge with that. |
// combinations in Rust. It must be common in ASTs. Trait objects | ||
// presumably aren't the answer… | ||
pub enum Element { | ||
Variable { variable: Variable }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://doc.rust-lang.org/1.1.0/syntax/ast/enum.Expr_.html
… looks like this is the idiomatic way to express ASTs!
8e538fe
to
daddfd3
Compare
Inline tests! Module files!