Minor stylistic changes #1069

Merged
merged 5 commits into from
Oct 24, 2016
56 changes: 28 additions & 28 deletions docs/guides/designing-the-delta-format.md
@@ -7,7 +7,7 @@ redirect_from:
- /guides/working-with-deltas/
---

Rich text editors lack a specification to express its own contents. Until recently, most rich text editors did not even know what was in their own edit areas. These editors just passes the user HTML, along with the burden of parsing and interpretting this. At any given time, this interpretation will differ from those of major browser vendors, leading to different editing experiences for users.
Rich text editors lack a specification to express its own contents. Until recently, most rich text editors did not even know what was in their own edit areas. These editors just pass the user HTML, along with the burden of parsing and interpretting this. At any given time, this interpretation will differ from those of major browser vendors, leading to different editing experiences for users.

Quill is the first rich text editor to actually understand its own contents. Key to this is Deltas, the specification describing rich text. Deltas are designed to be easy to understand and use. We will walk through some of the thinking behind Deltas, to shed light on *why* things are the way they are.

@@ -16,7 +16,7 @@ If you are looking for a reference on *what* Deltas are, the [Delta documentatio

## Plain Text

Let's start at the basics with just plain text. There already is a ubiquitous format to store plain text: the string. Now if we want to build upon this and describe formatted text, such as when a range is bold, we need to add additional information.
Let's start at the basics with just plain text. There already is an ubiquitous format to store plain text: the string. Now if we want to build upon this and describe formatted text, such as when a range is bold, we need to add additional information.
Member

Technically 'an' ubiquitous is not wrong, but 'a' is far more common especially nowadays. http://english.stackexchange.com/questions/280921/a-or-an-ubiquitous

Author

Today I've learned something new... the more you know, right?


Arrays are the only other ordered data type available, so we use an array of objects. This also allows us to leverage JSON for compatibility with a breadth of tools.

@@ -27,7 +27,7 @@ var content = [
];
```

If we want to add italics, underline, and other formats, we can add this to the main object, but it is cleaner to separate `text` from all of this so we organize formatting under one field, which we will name `attributes`.
If we want to add italics, underline, and other formats; we can add this to the main object, but it is cleaner to separate `text` from all of this so we organize formatting under one field, which we will name `attributes`.
Member

"If we want to add italics, underline, and other formats" is not a complete sentence so I don't think a semicolon is appropriate.

Author

Member

I did not know you could use semicolons as higher-precedence separators in lists of lists. But that doesn't seem to apply here? There is only one list, followed by a listless independent clause. None of the sources say a semicolon can be used as a general list separator, only in the special case where a list sits inside a larger list of lists.

Author

Fourth item in the first link
screen shot 2016-10-23 at 19 29 41

However, after further research, it seems the initial "if" makes it invalid, as you rightly said: it is not an independent clause... my bad.
Changing it.


```javascript
var content = [
@@ -49,14 +49,14 @@ var content = [
];
```

To solve this, we add the constraint that Deltas must be compact. With this compact constraint, the above representation is not a valid Delta, since it can be represented more compactly by the previous example, where "Hel" and "lo" were not separate. Similarly we cannot have `{ bold: false, italic: true, underline: null }`, because `{ italic: true }` is more compact.
To solve this, we add the constraint that Deltas must be compact. With this constraint, the above representation is not a valid Delta, since it can be represented more compactly by the previous example, where "Hel" and "lo" were not separate. Similarly we cannot have `{ bold: false, italic: true, underline: null }`, because `{ italic: true }` is more compact.
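
As a rough illustration of the compactness constraint in this hunk, the sketch below normalizes an array of `{ text, attributes }` objects by dropping unset attribute values and merging neighbors whose attributes are equal. The helper names (`cleanAttributes`, `sameAttributes`, `compact`) are hypothetical and not part of Quill. Treating `false`, `null`, and `undefined` as "unset" is an assumption based on the `{ bold: false, italic: true, underline: null }` example.

```javascript
// Hypothetical helpers; not part of Quill's API.

// Drop attribute values that mean "not set" (assumed: false, null, undefined).
function cleanAttributes(attributes) {
  var cleaned = {};
  Object.keys(attributes || {}).forEach(function (key) {
    var value = attributes[key];
    if (value !== false && value !== null && value !== undefined) {
      cleaned[key] = value;
    }
  });
  return cleaned;
}

// Shallow comparison; attribute values are assumed to be primitives here.
function sameAttributes(a, b) {
  var aKeys = Object.keys(a);
  var bKeys = Object.keys(b);
  return aKeys.length === bKeys.length && aKeys.every(function (key) {
    return Object.prototype.hasOwnProperty.call(b, key) && a[key] === b[key];
  });
}

// Merge adjacent entries with equal attributes and omit empty `attributes`.
function compact(content) {
  var result = [];
  content.forEach(function (item) {
    var attributes = cleanAttributes(item.attributes);
    var last = result[result.length - 1];
    if (last && sameAttributes(last.attributes || {}, attributes)) {
      last.text += item.text;
      return;
    }
    var entry = { text: item.text };
    if (Object.keys(attributes).length > 0) {
      entry.attributes = attributes;
    }
    result.push(entry);
  });
  return result;
}

var compacted = compact([
  { text: "Hel", attributes: { bold: true } },
  { text: "lo", attributes: { bold: true, italic: false } },
  { text: " World" }
]);
// compacted is [{ text: "Hello", attributes: { bold: true } }, { text: " World" }]
```

A real implementation would also have to decide how to compare non-primitive attribute values, which this sketch sidesteps.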


### Canonical

We have not assigned any meaning to `bold`, just that it describes some formatting for text. We could very well have used different names, such as `weighted` or `strong`, or used a different range of possible values, such as a numerical or descriptive range of weights. An example can be found in CSS, where most of these ambiguities are at play. If we saw bolded text on a page, we cannot predict if its rule set is `font-weight: bold` or `font-weight: 700`. This makes the task of parsing CSS to discern its meaning, much more complex.

We do not define the set of possible attributes, nor their meanings, but we do add an additional contraint that Deltas must be canonical. If two Deltas are equal, the content they represent must be equal, and there cannot be two unequal Deltas that represent the same content. Programmically, this allows you to simply deep compare two Deltas to determine if the content they represent are equal.
We do not define the set of possible attributes, nor their meanings, but we do add an additional contraint that Deltas must be canonical. If two Deltas are equal, the content they represent must be equal, and there cannot be two unequal Deltas that represent the same content. Programmatically, this allows you to simply deep compare two Deltas to determine if the content they represent is equal.
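
To make the "deep compare" claim concrete, here is a minimal sketch of a structural comparison over plain JSON-like values. `deepEqual` is a hypothetical helper written for illustration; an existing deep-equality utility would serve the same purpose.

```javascript
// Hypothetical helper; assumes plain JSON-like data (objects, arrays, primitives).
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  var aKeys = Object.keys(a);
  var bKeys = Object.keys(b);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(function (key) {
    return Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key]);
  });
}

var first = [{ text: "Hello", attributes: { bold: true } }];
var second = [{ attributes: { bold: true }, text: "Hello" }]; // same data, different key order
var third = [{ text: "Hello", attributes: { italic: true } }];

var sameContent = deepEqual(first, second);     // true
var differentContent = deepEqual(first, third); // false
```

Because the canonical constraint rules out two unequal Deltas for the same content, a `false` result here means the content really differs.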

So if we had the following, the only conclusion we can draw is `a` is different from `b`, but not what `a` or `b` means.

@@ -88,14 +88,14 @@ This canonicalization applies to both keys and values, `text` and `attributes`.
- There is only one way to represent a newline which is with `\n`, not `\r` or `\r\n`
- `text: "Hello World"` unambiguously means there are precisely two spaces between "Hello" and "World"

Some of these choices may be customized by the user, but the canonical contraint in Deltas dictate that the choice must be unique.
Some of these choices may be customized by the user, but the canonical constraint in Deltas dictate that the choice must be unique.

This unambiguous predictability makes Deltas easier to work with, both because you have fewer cases to handle, but also because there are no surprises in what a corresponding Delta well look like. Long term, this makes applications using Deltas easier to understand and maintain.
This unambiguous predictability makes Deltas easier to work with, both because you have fewer cases to handle and because there are no surprises in what a corresponding Delta will look like. Long term, this makes applications using Deltas easier to understand and maintain.


## Line Formatting

Line formats affect the contents of the entire line, so it present an interesting challenge for our compact and canonical constraint. A seemingly reasonable way to represent center aligned text is this:
Line formats affect the contents of an entire line, so they present an interesting challenge for our compact and canonical constraints. This is a seemingly reasonable way to represent centered text:
Member

The reason I prefer "A seemingly reasonable way to represent center aligned text is this:" is that the pronoun "this" is at the end, so there's no confusion about what it refers to. When it is at the beginning, the reader may wrongly associate it with a noun in the previous sentence.

Author

I understand, will change to something more along the lines of "see as follows".
"is this" sounds a bit too prosaic.


```javascript
var content = [
@@ -123,7 +123,7 @@ var content = [

But if the answer is yes, then we violate the canonical constraint since any permutation of characters having an align attribute would represent the same content.

So we cannot just naively get rid of the newline character. We have to also either get rid of line attributes, or expand line attributes to fill all characters on the line. But what if we deleted the newline from this:
So we cannot just naively get rid of the newline character. We also have to either get rid of line attributes, or expand them to fill all characters on the line. But what if we removed the _newline_ from it:
Member

Can we use asterisks instead of underscores as it is more consistent with other Quill docs?

Author

Sure. The reason why I've used italics was to change the tone, rather than to apply emphasis. But for the sake of consistency, I'll do as you ask.


```javascript
var content = [
@@ -135,7 +135,7 @@ var content = [

It is not clear if our resulting line is aligned center or right. We could delete both or have some ordering rule to favor one over the other, but our Delta is becoming more complex and harder to work with on this path.

This problem begs for atomicity, and we find this in the newline character itself. But we have an off by one problem in that if we have n lines, we only have n-1 newline characters.
This problem begs for atomicity, and we find this in the _newline_ character itself. But we have an off by one problem in that if we have _n_ lines, we only have _n-1_ newline characters.
Member

Same asterisk vs underscore comment


To solve this, Quill "adds" a newline to all documents and always ends Deltas with "\n".

@@ -152,7 +152,7 @@ var content = [

## Embedded Content

We want to add embedded content like images or video. Strings were natural to use for text but we have a lot more options for embeds. Since there are different types of embeds, our choice just needs to include this type information and then the actual content. There are many reasonable options here but we will use an object whose only key is the embed type and the value is the content representation, which may be any type or value.
We want to add embedded content like images or video. Strings were natural to use for text but we have a lot more options for embeds. Since there are different types of embeds, our choice just needs to include this type information, and then the actual content. There are many reasonable options here but we will use an object whose only key is the embed type and the value is the content representation, which may have any type or value.

```javascript
var img = {
@@ -189,9 +189,9 @@ As the name Delta implies, our format can describe changes to documents, as well

#### Delete

To describe deleting text, we need to know where and how many characters to delete. To delete embeds, there need not be any special treatment, other than to understand the length of an embed. If it is anything other than one, we would need to specify what happens when only part of an embed is deleted. There is currently no such specification, so embeds are all of length one, regardless of how many pixels make up an image, how many minutes long a video is, or how many slides are in a deck.
To describe deleting text, we need to know where and how many characters to delete. To delete embeds, there needs not be any special treatment, other than to understand the length of an embed. If it is anything other than one, we would then need to specify what happens when only part of an embed is deleted. There is currently no such specification, so regardless of how many pixels make up an image, how many minutes long a video is, or how many slides are in a deck; embeds are all of length _one_.

One reasonable way to describe deletion is to explictly store this deletion index and length.
One reasonable way to describe a deletion is to explicitly store its index and length.

```javascript
var delta = [{
@@ -215,7 +215,7 @@ Now that Deltas may be describing changes to a non-empty document, `{ insert: "H

#### Format

Similar to deletes, we need to specify the range of text to format, and format change. Formatting exists in the `attributes` object, so a simple solution is to provide another `attributes` object to merge with the existing `attributes` object. This merge is shallow to keep things simple. A use case that both requires a deep merge and is compelling enough to warrant the added complexity has not been found.
Similar to deletes, we need to specify the range of text to format along with the format change itself. Formatting exists in the `attributes` object, so a simple solution is to provide an additional `attributes` object to merge with the existing one. This merge is shallow to keep things simple. We have not found an use case that is compelling enough to require a deep merge and warrants the added complexity.
Member

Okay with the wording on the first sentence. Could go both ways but I think a comma is appropriate before "along"


```javascript
var delta = [{
@@ -229,9 +229,9 @@ var delta = [{
}];
```

The only special case is when we want to remove formatting. We will use `null` for this purpose, so `{ bold: null }` would mean remove the bold format. We could have specified any falsy value, but there may be legitimate use cases for an attribute value to be `0` or the empty string.
The exceptional case is when we want to remove formatting. We will use `null` for this purpose, so `{ bold: null }` would mean remove the bold format. We could have specified any falsy value, but there may be legitimate use cases for an attribute value to be `0` or an empty string (i.e. `' '`).
Member

I think "special case" and "the empty string" are more idiomatic.

Author
@justincorrigible Oct 23, 2016

"Special case" is indeed more idiomatic, but "the" in this case sounds weird. It implies there's only "that one", some sort of specificity which is not explained or necessary.

Member
@jhchen Oct 23, 2016

The empty string is definitely "a thing" in programming language theory, similar to the empty set in mathematics, and I have seen it used in formal specifications as well. However, "an empty string" is not wrong, nor is just "empty string". The given example of ' ' is not the/an empty string, however, since its length is not zero.

Author

True, you're entirely right about the example.
And there do seem to be some mentions of "the empty string" (albeit, to my understanding, slightly contextual ones). You are most likely right about this one too, as you are better versed in language theory than I am. Changing it.

hahaha so many commits. Want me to squash? Or will you squash at merge?
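
For reference, the shallow merge and the `null` rule discussed in this thread can be sketched as follows. `mergeAttributes` is a hypothetical helper, not Quill's actual implementation, and the `indent` attribute is only an illustrative name chosen to show that a falsy value such as `0` survives the merge.

```javascript
// Hypothetical helper; not Quill's actual implementation.
function mergeAttributes(existing, change) {
  var merged = {};
  Object.keys(existing || {}).forEach(function (key) {
    merged[key] = existing[key];
  });
  Object.keys(change || {}).forEach(function (key) {
    if (change[key] === null) {
      delete merged[key];        // null means "remove this format"
    } else {
      merged[key] = change[key]; // shallow: the new value replaces the old one
    }
  });
  return merged;
}

var merged = mergeAttributes(
  { bold: true, indent: 1 },
  { bold: null, indent: 0, italic: true }
);
// merged is { indent: 0, italic: true }: bold was removed, while the falsy 0 was kept
```

Checking for `null` specifically, rather than any falsy value, is what lets `0` and the empty string remain usable as attribute values.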


We now have to be careful with indexes at the application layer. As noted earlier, Deltas do not ascribe any inherent meaning to any the `attributes`'s key-value pairs, nor any embed types or values. Deltas do not know that images do not have durations, text does not have alternative texts, and videos cannot be bolded. The following is a legal Delta that might have been the result of applying other legal Deltas, by an application that was not careful of format ranges.
**Note:** We now have to be careful with indexes at the application layer. As mentioned earlier, Deltas do not ascribe any inherent meaning to any the `attributes`' key-value pairs, nor any embed types or values. Deltas do not know an image does not have duration, text does not have alternative texts, and videos cannot be bolded. The following is a _legal_ Delta that might have been the result of applying other _legal_ Deltas, by an application not being careful of format ranges.
Member

Same asterisk comment


```javascript
var delta = [{
@@ -258,7 +258,7 @@ var delta = [{

#### Pitfalls

First, we should be clear that this index must refer to the index in the document **before** any Operations are applied. Otherwise, a later Operation may delete a previous insert, unformat a previous format, etc, which would violate compactness.
First, we should be clear that an index must refer to its value in the document **before** any Operations are applied. Otherwise, a later Operation may delete a previous insert, unformat a previous format, etc., which would violate compactness.
Member

I'd prefer "its position" over "its value" since it is technically more correct. I do agree on using a different word than index though.

Author

Fair enough. Position does fit much better. I must have been tired at this point haha.


Operations must also be strictly ordered to satisfy our canonical constraint. Ordering by index, then length, and then type is one valid way this can be accomplished.

@@ -268,17 +268,17 @@ The number of reasons a Delta might be invalid is piling up. A better format wou

#### Retain

If we step back from our compactness formalities for a moment, we can describe a much simpler format to describe inserting, deleting, and formatting:
If we step back from our compactness formalities for a moment, we can describe a much simpler format to express inserting, deleting, and formatting:

- A Delta would have Operations that is at least as long as the document that it is modifying.
- A Delta would have Operations that are at least as long as the document being modified.
- Each Operation would describe what happens to the character at that index.
- Optional insert Operations may make the Delta longer than the document it describes.

This necessitates the creation of a new Operation, that simply means keep this character as is. We call this `retain`.
This necessitates the creation of a new Operation, that will simply mean "keep this character as is". We call this a `retain`.

```javascript
// Starting with "HelloWorld",
// bold "Hello", and insert a space
// bold "Hello", and insert a space right after it
var change = [
{ format: true, attributes: { bold: true } }, // H
{ format: true, attributes: { bold: true } }, // e
@@ -294,9 +294,9 @@ var change = [
]
```

Since every character is described, explicit indexes and lengths are no longer necessary. This makes out of order indexes and overlapping ranges impossible to express.
Since every character is described, explicit indexes and lengths are no longer necessary. This makes overlapping ranges and out-of-order indexes impossible to express.

From this, we can make the easy optimization to merge adjacent equal Operations, re-introducing length. If the last Operation is a `retain`, we can also simply drop this, since it instructs us to "do nothing to the rest of the document".
Therefore, we can make the easy optimization to merge adjacent equal Operations, re-introducing _length_. If the last Operation is a `retain` we can simply drop it, for it simply instructs to "do nothing to the rest of the document".

```javascript
var change = [
@@ -305,7 +305,7 @@ var change = [
]
```
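
The optimization described in this hunk (merge adjacent equal Operations, then drop a trailing `retain`) could look roughly like the sketch below. `mergeOperations` is a hypothetical helper. The per-character shapes `{ format: true }`, `{ retain: true }`, and `{ insert: "..." }` are assumed from the per-character example earlier in this hunk, part of which is collapsed in the diff.

```javascript
// Hypothetical helper; the per-character input shape is an assumption (see above).
function mergeOperations(perCharacter) {
  var merged = [];
  perCharacter.forEach(function (op) {
    var type = "format" in op ? "format" : "retain" in op ? "retain" : "insert";
    var last = merged[merged.length - 1];
    // Attributes are compared via JSON, which assumes keys appear in the same order.
    var sameKind = Boolean(last) && type in last &&
      JSON.stringify(last.attributes || {}) === JSON.stringify(op.attributes || {});
    if (sameKind && type === "insert") {
      last.insert += op.insert; // concatenate adjacent inserted text
    } else if (sameKind) {
      last[type] += 1;          // extend the run, re-introducing length
    } else {
      var next = {};
      next[type] = type === "insert" ? op.insert : 1;
      if (op.attributes) next.attributes = op.attributes;
      merged.push(next);
    }
  });
  // A trailing retain only says "do nothing to the rest of the document".
  if (merged.length > 0 && "retain" in merged[merged.length - 1]) {
    merged.pop();
  }
  return merged;
}

var perCharacter = [];
"Hello".split("").forEach(function () {
  perCharacter.push({ format: true, attributes: { bold: true } });
});
perCharacter.push({ insert: " " });
"World".split("").forEach(function () {
  perCharacter.push({ retain: true });
});

var mergedChange = mergeOperations(perCharacter);
// mergedChange is [{ format: 5, attributes: { bold: true } }, { insert: " " }]
```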

You might notice that a `retain` is in some ways just special case of a `format`. For example, there is no practical difference between `{ format: 1, attributes: {} }` and `{ retain: 1 }`. Compacting would drop the empty `attributes` object leaving us with just `{ format: 1 }`, creating a canonicalization conflict. So we simply combine `format` and `retain`, and keep the name `retain`.
Furthermore, you might notice that a `retain` is in some ways just a special case of `format`. For instance, there is no practical difference between `{ format: 1, attributes: {} }` and `{ retain: 1 }`. Compacting would drop the empty `attributes` object leaving us with just `{ format: 1 }`, creating a canonicalization conflict. Thus, in our example we will simply combine `format` and `retain`, and keep the name `retain`.

```javascript
var change = [
@@ -314,13 +314,13 @@ var change = [
]
```

We have now have a Delta format that is very close to the actual Delta format.
We now have a Delta that is very close to the current standard format.
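
To show how a change in this shape can be consumed, here is a minimal sketch that applies it to a plain string. `applyToText` is a hypothetical helper. It assumes operations of the form `{ retain: n }`, `{ insert: "text" }`, and `{ delete: n }`, and it ignores `attributes`, since a plain string cannot carry formatting.

```javascript
// Hypothetical helper; operation shapes are assumed and attributes are ignored.
function applyToText(text, change) {
  var result = "";
  var index = 0; // position in the original text
  change.forEach(function (op) {
    if (typeof op.insert === "string") {
      result += op.insert;
    } else if (typeof op.retain === "number") {
      result += text.slice(index, index + op.retain);
      index += op.retain;
    } else if (typeof op.delete === "number") {
      index += op.delete; // skip the deleted characters
    }
  });
  return result + text.slice(index); // an omitted trailing retain keeps the rest
}

var updated = applyToText("HelloWorld", [
  { retain: 5, attributes: { bold: true } },
  { insert: " " }
]);
// updated is "Hello World"
```

Since every operation consumes the document in order, no explicit index is needed, and overlapping ranges cannot be expressed.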

#### ops
#### Ops

Right now we have an easy to use JSON Array that describes rich text. This is great at the storage and transport layers, but applications could benefit from more functionality. We can add this by implementing Deltas as a class, that can be easily initialized from or exported to JSON, and providing relevant methods.
Right now we have an easy to use JSON Array that describes rich text. This is great at the storage and transport layers, but applications could benefit from more functionality. We can add this by implementing Deltas as a class, that can be easily initialized from or exported to JSON, and then providing it with relevant methods.

At the time of Delta's inception, it was not possible to sub-class an Array. So Deltas are Objects, with a single property `ops` that stores an array of Operations we have been discussing.
At the time of Delta's inception, it was not possible to sub-class an Array. For this reason Deltas are expressed as Objects, with a single property `ops` that stores an array of Operations like the ones we have been discussing.

```javascript
var delta = {