Change references to "dataflow" in docs to "pipeline" #5272

Merged: 6 commits, Sep 16, 2024

@philrz (Contributor) commented Sep 11, 2024

What's Changing

References to "dataflow" in the user-facing docs are changed to "pipeline" and references to a "leg" or "path" of dataflow are changed to "branches" of pipelines. This is a prerequisite to revisiting the changes proposed in #5264.

If you'd like to review the docs in rendered form, I've published a build from this branch at a personal staging site https://spiffy-gnome-8f2834.netlify.app/docs/next.

Why

@mattnibs recently reacted to a line of pre-existing docs text that got moved around in the changes proposed in #5264. The line mentioned a "dataflow operator sequence", which does seem like a mouthful. In a group discussion about this, there was consensus that the word "pipeline" could be used here instead of "dataflow", and that doing so would make the extra word "sequence" unnecessary.

We then discussed the possibility of rippling this kind of change through the rest of the docs. In the past we had what we felt were defensible reasons for citing Unix pipelines once, early in the docs, but otherwise preferring the word "dataflow". However, @mccanne pointed out that several similar technologies have settled on "pipeline" in recent years. That led to consensus that it was worth making a pass through the docs to see how they read with this wider change.

Details

When I summarized the changes from this PR's first commit for the team, I noted that text that used to read "multiple legs of the dataflow path" would now read "multiple legs of the pipeline", which led @mccanne to propose also switching from "legs" to "branches". While making that change I also spotted several places where we spoke of just "paths" in the abstract, so for consistency I changed these to "branches" or "pipeline branches", depending on which reads more clearly in a given context.

Having made the full set of changes, I'm pretty happy with how it reads. "Leg" by itself was not particularly ambiguous before, but "path" was arguably overloaded: we sometimes use it to refer to a storage location and sometimes to the hierarchical set of names that refers to a field in a nested record, and these different uses sometimes showed up on the same page (e.g., the `from` operator doc). Switching to "branch" therefore seems like an improvement. It's true we also use "branch" in multiple ways (e.g., branches of a pool in the Zed lake), but those seem to come up in different contexts.

I poked around to check which links outside the Zed repo might break because of this change. There's one hyperlink in a blog article on the brimdata.io site that I'll update, which is no biggie. Something else I noticed at the same time is that the "Introducing zq" blog article currently says "dataflow" 20+ times but says "pipeline" only once, just to say how Zed's is "kind of like a Unix pipeline but not really". @mccanne already signed off on leaving that one alone, and I'm inclined to agree, since the article speaks of dataflow in a way that seems academic enough (e.g., describing jq’s computational model as "stateless dataflow") and it reads fine as a standalone doc.

docs/language/expressions.md (outdated, resolved)
docs/language/lateral-subqueries.md (outdated, resolved)
@philrz requested review from nwt and a team on September 16, 2024 at 18:24
merged with an automatically inserted [combine operator](combine.md).

### Examples

Removed: _Copy input to two paths and merge_
Added: _Copy input to two pipeline branches and combine with the implied operator_

Suggested change:
Replace: _Copy input to two pipeline branches and combine with the implied operator_
With: _Copy input to two pipeline branches and combine with the implied `merge` operator_


### Examples

Removed: _Copy input to two paths and combine_
Added: _Copy input to two pipeline branches and merge_

Nit: Make the same change you made to the similar example in docs/language/operators/fork.md.

@nwt left a comment

Merge after you fix #5272 (review) and CI is green.

@philrz merged commit e100767 into main on Sep 16, 2024
4 checks passed
@philrz deleted the dataflow-pipeline branch on September 16, 2024 at 19:03