Change references to "dataflow" in docs to "pipeline" #5272
Co-authored-by: Noah Treuhaft <noah.treuhaft@gmail.com>
docs/language/operators/fork.md
Outdated
merged with an automatically inserted [combine operator](combine.md).

### Examples

- _Copy input to two paths and merge_
+ _Copy input to two pipeline branches and combine with the implied operator_
- _Copy input to two pipeline branches and combine with the implied operator_
+ _Copy input to two pipeline branches and combine with the implied `merge` operator_
### Examples

- _Copy input to two paths and combine_
+ _Copy input to two pipeline branches and merge_
Nit: Make the same change you made to the similar example in docs/language/operators/fork.md.
Merge after you fix #5272 (review) and CI is green.
What's Changing
References to "dataflow" in the user-facing docs are changed to "pipeline" and references to a "leg" or "path" of dataflow are changed to "branches" of pipelines. This is a prerequisite to revisiting the changes proposed in #5264.
If you'd like to review the docs in rendered form, I've published a build from this branch to a personal staging site: https://spiffy-gnome-8f2834.netlify.app/docs/next.
Why
@mattnibs recently reacted to a line of pre-existing docs text that got moved around in the proposed #5264 changes. The line mentioned a "dataflow operator sequence" which indeed seems like a mouthful. In a group discussion about this, there was consensus that the word "pipeline" could be used here instead of "dataflow" and in so doing make the extra word "sequence" unnecessary.
We then discussed the possibility of rippling this kind of change throughout the rest of the docs. While in the past we had what we felt were defensible reasons for citing Unix pipelines one time early in the docs but then preferring the word "dataflow", @mccanne pointed out how several similar technologies in recent years have all settled on "pipeline". That led to consensus that it was worth making a pass through the docs and seeing how it reads with this wider change.
Details
When summarizing the changes from this PR's first commit with the team, I noted how text that used to read "multiple legs of the dataflow path" would now read "multiple legs of the pipeline", which led @mccanne to propose also switching from "legs" to "branches". When making that change I also spotted several places where we spoke of just "paths" in the abstract and so to be consistent I also changed these to "branches" or "pipeline branches" depending on which reads clearer in a given context.
Having made the full set of changes, I'm pretty happy with how it reads. "Leg" by itself was not particularly ambiguous before, but "path" was arguably overloaded since we sometimes use it to refer to a storage location and sometimes to the hierarchical set of names referring to a field in a nested record, and these multiple uses sometimes showed up within the same page (e.g., the `from` operator doc). Therefore starting to say "branch" seems like an improvement. It's true we also use "branch" in multiple ways (e.g., branches of a pool in the Zed lake) but those seem to come up in different contexts.
I poked around to check what links outside the Zed repo might break from this change. Indeed, there's one hyperlink in a blog article on the brimdata.io site that I'd update, which is no biggie. However, something else I noticed at the same time is that the Introducing zq blog article currently says "dataflow" 20+ times but says "pipeline" once just to say how Zed's is "kind of like a Unix pipeline but not really". @mccanne already signed off on leaving that one alone, and I'm inclined to agree since the article speaks of dataflow in a way that seems academic enough (e.g., describing `jq`’s computational model as "stateless dataflow") and it reads fine as a standalone doc.