Skip to content
This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

Expose line/column/character span position information from parsed EDN streams #282

Closed
wants to merge 1 commit into from

Conversation

victorporof
Copy link
Contributor

@victorporof victorporof commented Feb 10, 2017

Fixes #258

Signed-off-by: Victor Porof victor.porof@gmail.com

Copy link
Member

@ncalexan ncalexan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this shows that ValueAndSpan -> Value will be much more pleasant. And there's an Into/From conversion in one direction.

@@ -29,31 +30,60 @@ use types::Value;
// TODO: Support discard

pub nil -> Value =
"nil" { Value::Nil }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like rust-peg probably solves this problem generically, marking up your tokens with spans automatically. Did you investigate this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add a #position myself. This is WIP.

edn/src/types.rs Outdated

/// Span represents the current offset into the input string.
#[derive(PartialEq, Eq, Hash, Clone, Debug)]
pub struct Span(usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do what the Rust compiler and other places do, which is that a span is a start and an end (or a length, but I prefer end) into the input buffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rustpeg only gives positions. Compute the length myself?

@victorporof
Copy link
Contributor Author

Fine -_-

Copy link
Member

@ncalexan ncalexan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just looked over this a little, and left some small notes.

As I say in more detail, I think there should be two parallel hierarchies: Value and ValueAndSpan. They're disjoint, with a map (a functor!) from ValueAndSpan onto Value.

edn/src/types.rs Outdated
@@ -214,12 +264,6 @@ impl Value {
}
}

impl From<f64> for Value {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One type can have many From implementations. If anything, I want edn::Value to have From for more types (when there's no ambiguity). Can you explain why you removed this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, you made these macros. For just a few implementations I think this trickery is overkill, but it's your call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a leftover from the previous implementation, but makes this a bit more symmetrical/generic so I'll keep it.

edn/src/types.rs Outdated
@@ -248,6 +292,18 @@ impl Ord for Value {
}
}

impl PartialOrd for ValueAndSpan {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This surprises me. I don't expect ValueAndSpan instances with different spans to compare or order equal.

@rnewman, do you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly necessary, I'll remove it if I have to.

edn/src/types.rs Outdated
#[test]
fn test_value_from() {
assert_eq!(Value::from(42f64), Value::Float(OrderedFloat::from(42f64)));
assert_eq!(Value::from_float(42f64), Value::Float(OrderedFloat::from(42f64)));
assert_eq!(Value::from_ordered_float(OrderedFloat::from(42f64)), Value::Float(OrderedFloat::from(42f64)));
assert_eq!(Value::from_bigint("42").unwrap(), Value::BigInteger(BigInt::from(42)));
}

#[test]
fn test_print_edn() {
let string = "[ 1 2 ( 3.14 ) #{ 4N } { foo/bar 42 :baz/boz 43 } [ ] :five :six/seven eight nine/ten true false nil #f NaN #f -Infinity #f +Infinity ]";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're mapping Value into ValueAndSpan, manually. Why not map ValueAndSpan onto Value, which we need for all the other consumers? As written, this isn't testing the span business anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I duplicate the hierarchy as described above, then yes, mapping ValueAndSpan onto Value would be possible here. Otherwise not (because Value::Collection was over ValueAndSpan)

edn/src/types.rs Outdated
// We're using a LinkedList here instead of a Vec or VecDeque because the
// LinkedList is faster for appending (which we do a lot of).
// See https://github.com/mozilla/mentat/issues/231
List(LinkedList<Value>),
List(LinkedList<ValueAndSpan>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should remain Value, for current consumers.

Then, we want a parallel hierarchy like you've defined here, which could have two types (ValueAndSpan(inner: InnerValueAndSpan , span: Span) like you have, and InnerValueAndSpan where the container values are ValueAndSpan, just like here.

The map between is clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That's a lot of code duplication. No way around it?

start:#position "nil" end:#position {
ValueAndSpan {
inner: Nil,
span: Some(Span(start, end))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Nifty!

@victorporof victorporof force-pushed the end-span branch 3 times, most recently from 74e8c47 to c733b73 Compare February 11, 2017 10:19
@victorporof
Copy link
Contributor Author

Addressed comments and rebased.

@victorporof victorporof force-pushed the end-span branch 2 times, most recently from ff833b0 to 02cc039 Compare February 11, 2017 10:35
@victorporof victorporof changed the title [WIP] Add a span component to edn::Value Expose line/column/character span position information from parsed EDN streams Feb 11, 2017
@victorporof
Copy link
Contributor Author

@ncalexan this is ready for review.

Copy link
Member

@ncalexan ncalexan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks as I expect; I didn't look closely at the details.

Is there a reason to accept an optional span? That's what Value is for, right?

edn/src/types.rs Outdated
fn from(src: f64) -> Value {
Value::Float(OrderedFloat::from(src))
impl ValueAndSpan {
pub fn reduce(self) -> Value {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reduction, but Rust tends to name general reductions fold and special cases like this into_inner, to be clear that you're consuming self.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You called this fold, right? This isn't a general fold, it's into_inner.

edn/src/types.rs Outdated
}
}

impl From<Value> for ValueAndSpan {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Not sure I want this to be trivial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Copy link
Contributor Author

@victorporof victorporof Feb 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this was because it seemed useful to have a conversion between SpannedValue and Value, but in order to have that, we'd also need to have something from ValueAndSpan to Value, because SpannedValue collections have ValueAndSpan children. Not having trivial conversion here meant that a lot of code (like all the from_foo, from_keyword, from_namespace etc. methods would've had to be duplicated. I've ultimately done it with macros to avoid any trivial conversion from Value to ValueAndSpan.

@ncalexan
Copy link
Member

Oh -- and do you think there might be a reason to just accept a generic type S for the span? I could imagine other situations where one might want to mark up trees (with weights, scores, metadata) and the structure is there.

@victorporof victorporof force-pushed the end-span branch 2 times, most recently from 30989dc to f246677 Compare February 13, 2017 16:48
@victorporof
Copy link
Contributor Author

Oh -- and do you think there might be a reason to just accept a generic type S for the span? I could imagine other situations where one might want to mark up trees (with weights, scores, metadata) and the structure is there.

I don't think there's any value in having that at this time.

@victorporof
Copy link
Contributor Author

@ncalexan this is ready for final review. I've addressed all comments.

@victorporof victorporof force-pushed the end-span branch 3 times, most recently from 3b880b0 to 67fcb6e Compare February 13, 2017 17:15
}

// It's important that float comes before integer or the parser assumes that
// floats are integers and fails to parse
pub value -> Value =
pub value -> ValueAndSpan =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to do all of the #position stuff here, in value? That is, capture the posititions before and after the __ markers below, parse Value in the internal bits, and then convert to ValueAndSpan? Maybe not, because of the internal whitespace that gets ignored?

Ah, as I think more about this, I think not: the recursive nature of ValueAndSpan denies that possibility, since the children of a container need to have the right type. So sad. (This might be possible with a ValueAndSpan<S> parameterized by an optional span type.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you've figured out, not possible due to the fact that SpannedValue has ValueAndSpan children in collections.

edn/src/types.rs Outdated
@@ -44,12 +43,63 @@ pub enum Value {
Map(BTreeMap<Value, Value>),
}

use self::Value::*;
/// SpannedValue is the same as Value but used in ValueAndSpan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: SpannedValue is parallel to Value but container types have ValueAndSpan children.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the comment. Should be obvious though.

edn/src/types.rs Outdated

impl Display for Value {
// TODO: Make sure float syntax is correct, handle NaN and escaping.
// See https://github.com/mozilla/mentat/issues/232
fn fmt(&self, f: &mut Formatter) -> ::std::fmt::Result {
use self::Value::*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nifty -- I did not know this was possible in a function scope. TIL!

edn/src/types.rs Outdated
/// in an Option, like `<Option<i64>`.
/// Creates `as_$TYPE` helper functions for Value or SpannedValue, like
/// `as_integer()`, which returns the underlying value representing the
/// original variable wrapped in an Option, like `<Option<i64>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here and below, drop the outer <>. So "like Option<i64>".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

edn/src/types.rs Outdated
fn from(src: f64) -> Value {
Value::Float(OrderedFloat::from(src))
impl ValueAndSpan {
pub fn reduce(self) -> Value {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You called this fold, right? This isn't a general fold, it's into_inner.

Copy link
Member

@ncalexan ncalexan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I still have some questions:

  • (Important) why this implementation of ordering? If we really want an ordering, I'd much rather have the compiler derive the ordering, first by inner and then by span.
  • (Unimportant) Is the test structural change significant, or just a choice you made?


#[test]
fn test_nil() {
assert_eq!(nil("nil").unwrap(), Nil);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throughout this whole file, you've converted a large number of tests from comparing against T to comparing against the edn::Value version of T. That's a valid choice, but why did you choose it? And could you have done this as a mechanical Pre: patch? (With a commit comment explaining why you wanted this or what it makes easier later, etc, etc.) As it stands I can't see what's really new here and what's just different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might be able to get away with a bit less changes, but the crux of the issue is that both SpannedValue and Value are imported the the tests file, and the edn parser returns ValueAndSpan now. Without those changes, we would't be checking Values anymore, but SpannedValues, so some kind of conversion needs to be done: this is via folding, or maintaining the previous intent to test Value, or something else. My approach was one of many possible ones. To make life easier for other people, I chose to drop the * import and make things specific.

@@ -26,7 +26,7 @@ fn test_entities() {
let input = r#"[[:db/add 101 :test/a "v"]
[:db/retract 102 :test/b "w"]]"#;

let edn = parse::value(input).unwrap();
let edn = parse::value(input).unwrap().fold();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I suggested into_inner, but maybe we should be explicit and say without_spans? That'll certainly read easiest...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

edn/src/types.rs Outdated

impl Ord for SpannedValue {
fn cmp(&self, other: &SpannedValue) -> Ordering {
Value::from(self.clone()).cmp(&Value::from(other.clone()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be very expensive. Why do either of these types implement Ord and PartialOrd at all? What is the desired ordering? Is it a domain concept? Is it because we're using edn::Value in some ordered data structure? Maybe the inner BTreeSet and BTreeMap? We might not want to do that.

Can we be clear on what we're doing here and why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on Slack, cloning isn't necessary, just a shortcut I took in order to get something that is landable (as in: provides value that doesn't exist yet, is better than nothing etc.). Changing this implementation to avoid cloning is relatively trivial, sufficient for a good followup good first bug, but I'll do it in this PR.

@victorporof
Copy link
Contributor Author

@ncalexan Addressed comments, avoided cloning in Ord implementation, and reduced test code churn as much as I could (although this seems to be strictly worse in terms of readability, because Value and Spanned value both have the same types so it's a bit hard to determine what exactly is being tested. Ready for another review.

@victorporof victorporof force-pushed the end-span branch 10 times, most recently from 5af9892 to e9ca1d2 Compare February 14, 2017 16:18
@victorporof
Copy link
Contributor Author

@ncalexan gentle review ping.

@victorporof victorporof force-pushed the end-span branch 3 times, most recently from 1898ecb to fbcf3f1 Compare February 15, 2017 17:01
Copy link
Member

@ncalexan ncalexan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would s/without_span/without_spans/, like I suggested in #282 (comment), since there are multiple spans getting stripped, not just the top-level span.

And I would bone out the commit comment to explain why we chose to have two types and code duplication, so that the next poor soul doesn't spend a lot of time learning the lessons we did.

Bombs away!

Signed-off-by: Victor Porof <victor.porof@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants