4.0 arch draft #12880

Merged
merged 12 commits into from
Jan 21, 2019

Contributor

@kalexand-rh kalexand-rh commented Nov 15, 2018

Draft of architecture updates from the arch overview call.

http://file.rdu.redhat.com/kalexand/111618/4.0_arch/architecture/architecture.html

@kalexand-rh kalexand-rh self-assigned this Nov 15, 2018
@openshift-ci-robot openshift-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 15, 2018
@kalexand-rh kalexand-rh force-pushed the 4.0_arch branch 3 times, most recently from 90fff57 to 6f8c6cf Compare November 16, 2018 16:44
@kalexand-rh
Contributor Author

@vikram-redhat, will you PTAL? Do you have a suggestion for someone who'd be willing to do a SME review of this version and look at some graphics markups?

@kalexand-rh kalexand-rh force-pushed the 4.0_arch branch 3 times, most recently from 9b25fcc to 987344c Compare November 16, 2018 20:27
@vikram-redhat
Contributor

@kalexand-rh - thanks. @derekwaynecarr - would you be the best person to review this for docs?

@vikram-redhat
Contributor

@smarterclayton and @eparis - tagging you as well for a review.

@vikram-redhat vikram-redhat added this to the Future Release milestone Nov 24, 2018
Contributor

@smarterclayton smarterclayton left a comment

The group leads need to get more engaged and help flesh out the high-level topics. This is a good start, but I expect more input from them to describe the high-level areas.

Can you add a rough “high level topics” tree to this PR so we can comment and discuss what the top sections should be and ensure they match?


Like {product-title} v3, {product-title} v4 is a layered system designed to
expose underlying container images and Kubernetes concepts as accurately as
possible, with a focus on easy composition of applications by a developer.
Contributor

We need to touch this up; I’m not sure this is the right way to phrase 4. I think this is where we reset expectations for customers, and it’s going to be a combination of the core Kubernetes mission (application infrastructure), the core 4 mission (automated and self-monitoring, flexible infrastructure), and the OpenShift story (giving developers tools to evolve their applications under operational oversight).

as both the fundamental unit of the product and an option for easily deploying
and managing utilities that your apps use.

{product-title} has an Operator-based architecture of smaller, decoupled units
Contributor

I think I’d focus on selling us as Kubernetes first, with a set of standard components that you need (network, ingress, logging, monitoring), made simple to scale, upgrade, and maintain through a focus on automating operations with operators.

objects stored in etcd, a reliable clustered key-value store. Those services are
broken down by function:

* Operators, which run the core {product-title} services.
Contributor

This feels a bit too much like a list - I would hit the three core layers above and give examples of each. Operators are a detail, really, even if they underlie everything (more glue).

Contributor Author

Which three layers? Dev apps, K8 core app management, and the OpenShift management layer?

build working images and react to new images.
** Deployments, which expand support for the software development and deployment
lifecycle.
** Routes, which announce your service to the world.
Contributor

Ingress and Routes

** Deployments, which expand support for the software development and deployment
lifecycle.
** Routes, which announce your service to the world.
** Templates, which allow you to simultaneously create many objects that are
Contributor

It’s time to de-emphasize templates. We need to describe our catalog of supporting application infrastructure as having three parts: operators (expose APIs that automate full component lifecycle, like databases), service bindings (consume services running elsewhere), and templates (simple instant examples).

[id='node-types-{context}']
= Node types in {product-title}

{product-title} uses bootstrap, master, and worker nodes. The bootstrap node
Contributor

Haven’t settled on naming yet for worker nodes. I would however say that we don’t “use ... nodes”, we assign nodes roles within the cluster that define their security boundaries and have specific roles.

It’s unclear to me that this is the best way to communicate this though. Bootstrap nodes are a detail of installation, while master and worker/compute nodes are what an admin deals with every day. I wouldn’t give them the same level of importance.

This page would be better as a discussion of the roles nodes play (use role instead of type; role is the official term). It would be better to connect it to the ways people can change nodes (masters today cannot be changed; machine sets mean adding new node roles is easy), how security boundaries work, and what tools are used to subdivide and group nodes.

Contributor Author

I'll make some updates based on your comments, but I don't have all of this context. Do you have a suggestion for who to talk to for more context?

upgrades your cluster. By accepting automatic updates, you can automatically
keep your cluster up to date with the most recent compatible components.

To allow Cincinnati to provide only compatible updates, a release verification
Contributor

Don’t call it Cincinnati. Say it’s the OpenShift update service. Cincinnati is more of a code name.

Contributor Author

Thanks! I can't remember hearing its true name. Is it the "OpenShift update service" in all versions, or is it the "{product-title} update service?"

[id='security-overview-{context}']
= Security in {product-title}

The {product-title} and Kubernetes APIs authenticate users who present
Contributor

This is not a great set of content (it’s really unbalanced). The auth teams need to provide feedback to correct this, but it should talk about our threat model and the different tools we have for security, and provide a high-level overview of our security subsystems. This page as it is is not useful.

Contributor Author

I dragged it over from the 3.x stream. :)

@enj, @ericavonb, thoughts?

always runs as a `systemd` process.


[id='cluster-version-operator-{context}']
Contributor

I don’t want to focus on these lower level details until we have an architecture topic on “upgrades” (which covers most of the details). I would do link outs to the major functional areas and list a few of the core operators and their roles, but this is too focused on “what” and not “why”. The details below would be maybe a subtopic under specific operators but as is this is too messy.

Contributor Author

There are basically no link outs or xrefs in the 4.0 docs - if more information about a particular topic is required, it has to be included in the same assembly, or we can link from one top-level assembly to another top-level assembly.

Who do you suggest to talk to for the improvements to the upgrade portion?

The cluster version Operator orchestrates all things.

{product-title} 4.0 introduces several new components that support the cluster
version Operator, including Cincinnati and Telemetry.
Contributor

Telemetry should be split out and be part of an “observability” high-level arch topic that then describes the major subsystems and what they do together (monitoring with Prometheus, Alertmanager, and Grafana; logging; and telemetry).

Contributor Author

Do you either have a suggestion for a contact who can explain how these fit together or the outline for the user story? I'm not familiar enough with how monitoring, alerting, grafana, logging, and telemetry fit together to say what that story is, and all of our assemblies need to have one.

@smarterclayton
Contributor

@derekwaynecarr @eparis I’d like to see us get the high-level topics agreed on and help flesh out which topics are under-covered. We need to invest more time with Kathryn and the docs team than we have historically on these docs.

Contributor Author

@kalexand-rh kalexand-rh left a comment

@smarterclayton, thank you! I'm working on updates based on your comments.

I'm not sure who the best contacts are for some of your requested changes - will you tag some more people in for me?

based on customized parameters.
* Controllers, which read those REST APIs, apply changes to other objects, and
report status or write back to the object.

Contributor Author

Who should provide this information?

{product-title} offers two installation options: fully-managed infrastructure and bring your own
infrastructure. In version 4.0, the fully-managed option allows you to install a
cluster in Amazon Web Services (AWS) that runs on Red Hat CoreOS nodes. If you want to
use any other cloud or install your cluster on-premise, use the bring your own
Contributor Author

Do you have a public-facing roadmap I can link to? If there's a good source of information that interested parties can use to track our changes, I'm happy to point them to it.



With both installation types, installation and upgrade both use a controller
that constantly reconciles component versions as if it were any other Kubernetes
Contributor Author

Who's the right person to talk to about "controller" versus "operator," and do you have a suggestion for who might provide better nuance?

modules/machine-api-overview.adoc (outdated, resolved)
both its updates and updates to RHCOS, {product-title} provides an opinionated
lifecycle management experience that simplifies the orchestration of node upgrades.

{product-title} employs three DaemonSets and controllers to simplify node management:
Contributor Author

Who's your SME of choice for this update?

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 16, 2018
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 2, 2019
@kalexand-rh
Contributor Author

@enxebre, @crawford, will you PTAL at this draft architecture overview? @smarterclayton has some great suggestions and requests that I don't have enough information to implement. (If you have other suggestions for stakeholders to review, please tag them in!)

class, which describes the types of compute nodes that are offered for different
cloud platforms. For example, a `machine` type for a worker node on Amazon Web
Services (AWS) might define a specific machine type and required metadata.
`MachineClasses`:: A unit that defines a class of `machines` and facilitates
Member

MachineClasses are not yet integrated with the platform.
MachineDeployments might be landing soon (openshift/installer#990). These are mainly to machines what "deployments" are to pods. I'll keep this thread updated.

IMO, extending this and adding more info like https://www.linux.com/blog/event/kubecon/2018/4/extending-kubernetes-cluster-api will clarify a lot for end users who are not familiar with the Kubernetes Cluster API.

Contributor Author

@enxebre, I'll keep an eye out for your updates and look for additional content to include.

modules/abstraction-layers.adoc (outdated, resolved)
modules/architecture-overview.adoc (resolved)
modules/cloud_installations.adoc (outdated, resolved)
modules/cloud_installations.adoc (outdated, resolved)
modules/operators-overview.adoc (outdated, resolved)
modules/telemetry-service-overview.adoc (resolved)
modules/update-service-overview.adoc (outdated, resolved)
modules/update-service-overview.adoc (resolved)
modules/update-service-overview.adoc (resolved)
modules/update-service-overview.adoc (outdated, resolved)
Contributor Author

@kalexand-rh kalexand-rh left a comment

@wking and @enxebre, thank you! I've made some of the easier updates in your comments and added a few more questions.

modules/abstraction-layers.adoc (outdated, resolved)
modules/architecture-overview.adoc (resolved)
modules/installation-options.adoc (resolved)
modules/machine-api-overview.adoc (outdated, resolved)

or roles the second-level Operator runs as, the CRD and pull secret that drives
the operation of the Operator, and the Operator deployment.

Second-level Operators write out to a CRD resource called the cluster Operator
Contributor Author

@vikram-redhat, what's your opinion on linking versus incorporating upstream docs like this?

modules/operators-overview.adoc (resolved)
Some Red Hat Operators drive the cluster functions, like the scheduler and
problem detectors. Others are provided for you to manage yourself and use in
your applications, like etcd. {product-title} also offers certified Operators,
which the community built around the OLM and maintains. These certified Operators
Contributor Author

@adellape, do you have more information about this process?

your applications, like etcd. {product-title} also offers certified Operators,
which the community built around the OLM and maintains. These certified Operators
are traditional applications that are Kubernetes-aware because they are wrapped
in an Operator.
Contributor Author

@wking, I see where you're going, but I don't think that "wrap" would translate well. Does this sound ok to you?

These certified Operators provide an API layer to traditional applications so you can manage the application through Kubernetes constructs.

modules/update-service-overview.adoc (resolved)
@kalexand-rh
Contributor Author

I've created some follow-up issues and am merging per @vikram-redhat.

@kalexand-rh kalexand-rh merged commit 3e996da into openshift:enterprise-4.0 Jan 21, 2019
@kalexand-rh kalexand-rh deleted the 4.0_arch branch January 21, 2019 15:57
wking added a commit to wking/openshift-docs that referenced this pull request Oct 23, 2020
I'd initially wanted to address "It provides a graph, or diagram that
contain" -> "It provides a graph, or diagram, that contains".  But the
"of component Operators" bit was confusing to me too.  So I've
reworded to lead with "graph of recommended update(s)", and reduce to
a single sentence.  I'd also be fine leading with a description of
release images, and then introducing the recommended update edges that
link them, but that seemed like a bigger refactor.  Wording I'm
replacing is from 29016d7 (draft of early 4.0 architecture updates,
2018-11-08, openshift#12880) and 6e1b894 (Initial add of modularized arch
guide content, 2019-05-21, openshift#14991).
wking added a commit to wking/openshift-docs that referenced this pull request Jun 9, 2021
…hift, including RHCOS"

The RHCOS mention is from way back in 1ac3751 (some updates per
Clayton, 2018-11-26, openshift#12880).  But:

  $ git --no-pager grep -h supported modules/rhcos-about.adoc
  {op-system} is supported only as a component of {product-title} {product-version} for all {product-title} machines....

This commit rephrases the update docs to put RHCOS under OpenShift, so
folks don't get ideas and think it is a stand-alone product.
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/openshift-docs that referenced this pull request Jun 10, 2021
…hift, including RHCOS"

The RHCOS mention is from way back in 1ac3751 (some updates per
Clayton, 2018-11-26, openshift#12880).  But:

  $ git --no-pager grep -h supported modules/rhcos-about.adoc
  {op-system} is supported only as a component of {product-title} {product-version} for all {product-title} machines....

This commit rephrases the update docs to put RHCOS under OpenShift, so
folks don't get ideas and think it is a stand-alone product.
Labels
branch/enterprise-4.1 size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
9 participants