-
Notifications
You must be signed in to change notification settings - Fork 115
Expose line/column/character span position information from parsed EDN streams #282
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this shows that ValueAndSpan -> Value
will be much more pleasant. And there's an Into
/From
conversion in one direction.
edn/src/edn.rustpeg
Outdated
@@ -29,31 +30,60 @@ use types::Value; | |||
// TODO: Support discard | |||
|
|||
pub nil -> Value = | |||
"nil" { Value::Nil } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like rust-peg
probably solves this problem generically, marking up your tokens with spans automatically. Did you investigate this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add a #position myself. This is WIP.
edn/src/types.rs
Outdated
|
||
/// Span represents the current offset into the input string. | ||
#[derive(PartialEq, Eq, Hash, Clone, Debug)] | ||
pub struct Span(usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do what the Rust compiler and other places do, which is that a span is a start and an end (or a length, but I prefer end) into the input buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rustpeg only gives positions. Compute the length myself?
0f430d6
to
bd9012c
Compare
Fine -_- |
bd9012c
to
1a46c33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just looked over this a little, and left some small notes.
As I say in more detail, I think there should be two parallel hierarchies: Value
and ValueAndSpan
. They're disjoint, with a map (a functor!) from ValueAndSpan
onto Value
.
edn/src/types.rs
Outdated
@@ -214,12 +264,6 @@ impl Value { | |||
} | |||
} | |||
|
|||
impl From<f64> for Value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One type can have many From
implementations. If anything, I want edn::Value
to have From
for more types (when there's no ambiguity). Can you explain why you removed this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, you made these macros. For just a few implementations I think this trickery is overkill, but it's your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a leftover from the previous implementation, but makes this a bit more symmetrical/generic so I'll keep it.
edn/src/types.rs
Outdated
@@ -248,6 +292,18 @@ impl Ord for Value { | |||
} | |||
} | |||
|
|||
impl PartialOrd for ValueAndSpan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This surprises me. I don't expect ValueAndSpan
instances with different spans to compare or order equal.
@rnewman, do you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not strictly necessary, I'll remove it if I have to.
edn/src/types.rs
Outdated
#[test] | ||
fn test_value_from() { | ||
assert_eq!(Value::from(42f64), Value::Float(OrderedFloat::from(42f64))); | ||
assert_eq!(Value::from_float(42f64), Value::Float(OrderedFloat::from(42f64))); | ||
assert_eq!(Value::from_ordered_float(OrderedFloat::from(42f64)), Value::Float(OrderedFloat::from(42f64))); | ||
assert_eq!(Value::from_bigint("42").unwrap(), Value::BigInteger(BigInt::from(42))); | ||
} | ||
|
||
#[test] | ||
fn test_print_edn() { | ||
let string = "[ 1 2 ( 3.14 ) #{ 4N } { foo/bar 42 :baz/boz 43 } [ ] :five :six/seven eight nine/ten true false nil #f NaN #f -Infinity #f +Infinity ]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're mapping Value
into ValueAndSpan
, manually. Why not map ValueAndSpan
onto Value
, which we need for all the other consumers? As written, this isn't testing the span business anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I duplicate the hierarchy as described above, then yes, mapping ValueAndSpan onto Value would be possible here. Otherwise not (because Value::Collection was over ValueAndSpan)
edn/src/types.rs
Outdated
// We're using a LinkedList here instead of a Vec or VecDeque because the | ||
// LinkedList is faster for appending (which we do a lot of). | ||
// See https://github.com/mozilla/mentat/issues/231 | ||
List(LinkedList<Value>), | ||
List(LinkedList<ValueAndSpan>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should remain Value
, for current consumers.
Then, we want a parallel hierarchy like you've defined here, which could have two types (ValueAndSpan(inner: InnerValueAndSpan , span: Span)
like you have, and InnerValueAndSpan
where the container values are ValueAndSpan
, just like here.
The map between is clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. That's a lot of code duplication. No way around it?
edn/src/edn.rustpeg
Outdated
start:#position "nil" end:#position { | ||
ValueAndSpan { | ||
inner: Nil, | ||
span: Some(Span(start, end)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Nifty!
74e8c47
to
c733b73
Compare
Addressed comments and rebased. |
ff833b0
to
02cc039
Compare
@ncalexan this is ready for review. |
02cc039
to
17d8771
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks as I expect; I didn't look closely at the details.
Is there a reason to accept an optional span? That's what Value
is for, right?
edn/src/types.rs
Outdated
fn from(src: f64) -> Value { | ||
Value::Float(OrderedFloat::from(src)) | ||
impl ValueAndSpan { | ||
pub fn reduce(self) -> Value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a reduction, but Rust tends to name general reductions fold
and special cases like this into_inner
, to be clear that you're consuming self
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You called this fold
, right? This isn't a general fold
, it's into_inner
.
edn/src/types.rs
Outdated
} | ||
} | ||
|
||
impl From<Value> for ValueAndSpan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Not sure I want this to be trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for this was because it seemed useful to have a conversion between SpannedValue and Value, but in order to have that, we'd also need to have something from ValueAndSpan to Value, because SpannedValue collections have ValueAndSpan children. Not having trivial conversion here meant that a lot of code (like all the from_foo, from_keyword, from_namespace etc. methods would've had to be duplicated. I've ultimately done it with macros to avoid any trivial conversion from Value to ValueAndSpan.
Oh -- and do you think there might be a reason to just accept a generic type |
30989dc
to
f246677
Compare
I don't think there's any value in having that at this time. |
f246677
to
6c77cab
Compare
@ncalexan this is ready for final review. I've addressed all comments. |
3b880b0
to
67fcb6e
Compare
} | ||
|
||
// It's important that float comes before integer or the parser assumes that | ||
// floats are integers and fails to parse | ||
pub value -> Value = | ||
pub value -> ValueAndSpan = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not possible to do all of the #position
stuff here, in value
? That is, capture the posititions before and after the __
markers below, parse Value
in the internal bits, and then convert to ValueAndSpan
? Maybe not, because of the internal whitespace that gets ignored?
Ah, as I think more about this, I think not: the recursive nature of ValueAndSpan
denies that possibility, since the children of a container need to have the right type. So sad. (This might be possible with a ValueAndSpan<S>
parameterized by an optional span type.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you've figured out, not possible due to the fact that SpannedValue has ValueAndSpan children in collections.
edn/src/types.rs
Outdated
@@ -44,12 +43,63 @@ pub enum Value { | |||
Map(BTreeMap<Value, Value>), | |||
} | |||
|
|||
use self::Value::*; | |||
/// SpannedValue is the same as Value but used in ValueAndSpan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: SpannedValue
is parallel to Value
but container types have ValueAndSpan
children.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the comment. Should be obvious though.
edn/src/types.rs
Outdated
|
||
impl Display for Value { | ||
// TODO: Make sure float syntax is correct, handle NaN and escaping. | ||
// See https://github.com/mozilla/mentat/issues/232 | ||
fn fmt(&self, f: &mut Formatter) -> ::std::fmt::Result { | ||
use self::Value::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nifty -- I did not know this was possible in a function scope. TIL!
edn/src/types.rs
Outdated
/// in an Option, like `<Option<i64>`. | ||
/// Creates `as_$TYPE` helper functions for Value or SpannedValue, like | ||
/// `as_integer()`, which returns the underlying value representing the | ||
/// original variable wrapped in an Option, like `<Option<i64>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: here and below, drop the outer <>
. So "like Option<i64>
".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
edn/src/types.rs
Outdated
fn from(src: f64) -> Value { | ||
Value::Float(OrderedFloat::from(src)) | ||
impl ValueAndSpan { | ||
pub fn reduce(self) -> Value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You called this fold
, right? This isn't a general fold
, it's into_inner
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I still have some questions:
- (Important) why this implementation of ordering? If we really want an ordering, I'd much rather have the compiler derive the ordering, first by
inner
and then byspan
. - (Unimportant) Is the test structural change significant, or just a choice you made?
edn/tests/tests.rs
Outdated
|
||
#[test] | ||
fn test_nil() { | ||
assert_eq!(nil("nil").unwrap(), Nil); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throughout this whole file, you've converted a large number of tests from comparing against T
to comparing against the edn::Value
version of T
. That's a valid choice, but why did you choose it? And could you have done this as a mechanical Pre:
patch? (With a commit comment explaining why you wanted this or what it makes easier later, etc, etc.) As it stands I can't see what's really new here and what's just different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I might be able to get away with a bit less changes, but the crux of the issue is that both SpannedValue and Value are imported the the tests file, and the edn parser returns ValueAndSpan now. Without those changes, we would't be checking Values anymore, but SpannedValues, so some kind of conversion needs to be done: this is via folding, or maintaining the previous intent to test Value, or something else. My approach was one of many possible ones. To make life easier for other people, I chose to drop the *
import and make things specific.
tx-parser/tests/parser.rs
Outdated
@@ -26,7 +26,7 @@ fn test_entities() { | |||
let input = r#"[[:db/add 101 :test/a "v"] | |||
[:db/retract 102 :test/b "w"]]"#; | |||
|
|||
let edn = parse::value(input).unwrap(); | |||
let edn = parse::value(input).unwrap().fold(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I suggested into_inner
, but maybe we should be explicit and say without_spans
? That'll certainly read easiest...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
edn/src/types.rs
Outdated
|
||
impl Ord for SpannedValue { | ||
fn cmp(&self, other: &SpannedValue) -> Ordering { | ||
Value::from(self.clone()).cmp(&Value::from(other.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be very expensive. Why do either of these types implement Ord
and PartialOrd
at all? What is the desired ordering? Is it a domain concept? Is it because we're using edn::Value
in some ordered data structure? Maybe the inner BTreeSet
and BTreeMap
? We might not want to do that.
Can we be clear on what we're doing here and why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on Slack, cloning isn't necessary, just a shortcut I took in order to get something that is landable (as in: provides value that doesn't exist yet, is better than nothing etc.). Changing this implementation to avoid cloning is relatively trivial, sufficient for a good followup good first bug, but I'll do it in this PR.
67fcb6e
to
b7cb143
Compare
@ncalexan Addressed comments, avoided cloning in |
5af9892
to
e9ca1d2
Compare
@ncalexan gentle review ping. |
1898ecb
to
fbcf3f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would s/without_span/without_spans/
, like I suggested in #282 (comment), since there are multiple spans getting stripped, not just the top-level span.
And I would bone out the commit comment to explain why we chose to have two types and code duplication, so that the next poor soul doesn't spend a lot of time learning the lessons we did.
Bombs away!
Signed-off-by: Victor Porof <victor.porof@gmail.com>
fbcf3f1
to
b457dad
Compare
Fixes #258
Signed-off-by: Victor Porof victor.porof@gmail.com