Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add version-info to tablet tags in the topo #8973

Merged
merged 4 commits into from
Nov 9, 2021

Conversation

ajm188
Copy link
Contributor

@ajm188 ajm188 commented Oct 10, 2021

Signed-off-by: Andrew Mason amason@slack-corp.com

Description

This PR is both a feature request and the implementation (I will retcon an issue, but in this case the code was so small I found it was easier to discuss the concrete implementation than the abstract description).

This modifies both servenv.versionInfo and tabletmanager.BuildTabletFromInput to the aim of getting the metadata in the tablet process's servenv.AppVersion into the topo, so that others can access it (without needing to curl $tablet/debug/vars | jq '.BuildHost', for e.g.)

Demo
I1009 18:24:22.974719   75906 main.go:67] I1009 22:24:22.974216 tablet_executor.go:277] Received DDL request. strategy=direct
I1009 18:24:23.374617   75906 main.go:67] I1009 22:24:23.374484 tablet_executor.go:277] Received DDL request. strategy=direct
I1009 18:24:23.913219   75906 main.go:67] I1009 22:24:23.912934 tablet_executor.go:277] Received DDL request. strategy=direct
New VSchema object:
{
  "tables": {
    "corder": {},
    "customer": {},
    "product": {}
  }
}
If this is not what you expected, check the input data (as JSON parsing will skip unexpected fields).
Waiting for vtgate to be up...
vtgate is up!
Access vtgate at http://SFO-M-AMASON02:15001/debug/status
❯ vtctldclient --server ":15999" GetTablet zone1-100
{
  "alias": {
    "cell": "zone1",
    "uid": 100
  },
  "hostname": "localhost",
  "port_map": {
    "grpc": 16100,
    "vt": 15100
  },
  "keyspace": "commerce",
  "shard": "0",
  "key_range": null,
  "type": 1,
  "db_name_override": "",
  "tags": {
    "build_git_branch": "vttablet-buildinfo-tags",
    "build_git_rev": "e524e61559",
    "build_host": "SFO-M-AMASON02",
    "build_time": "Sat Oct  9 18:22:37 EDT 2021",
    "build_user": "amason",
    "go_version": "go1.17",
    "goarch": "amd64",
    "goos": "darwin",
    "jenkins_build_number": "0",
    "version": "12.0.0-SNAPSHOT"
  },
  "mysql_hostname": "localhost",
  "mysql_port": 17100,
  "primary_term_start_time": {
    "seconds": "1633818258",
    "nanoseconds": 985660000
  }
}

Use case(s)

  • I want to build some automation/monitoring/alerting off some of this information without having to curl every tablet in the fleet.
  • There are probably other use cases as well that I havent' thought of.

Open Questions From Me:

  1. First and foremost: is this a good idea? Should we pursue/refine this, or abandon it?
  2. Should servenv.versionInfo have its fields made public? I can't see a reason other than "we're worried someone will write code to modify this at runtime" but I feel like we can catch that in code review.
  3. Should we make this opt-outable?
  4. In my implementation, user-specified init_tags takes precedence over the tags from servenv.AppVersion.ToStringMap(). So, for example, if a user specifies a tag called version, then the vitess build version will not be present in the topo. Should we make the buildinfo tags take precedence instead?
  5. How do we feel about the tag names? I didn't give these a ton of thought, but we should be deliberate about how we name these tags now, because changing them later will be harder due to deprecation cycles.

Related Issue(s)

Issue TBD

Checklist

  • Should this PR be backported? no
  • Tests were added or are not required n/a
  • Documentation was added or is not required - TODO

Deployment Notes

Signed-off-by: Andrew Mason <amason@slack-corp.com>
@ajm188 ajm188 added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Cluster management labels Oct 10, 2021
Signed-off-by: Andrew Mason <amason@slack-corp.com>
@deepthi deepthi requested review from systay and sougou October 11, 2021 21:17
@sougou
Copy link
Contributor

sougou commented Oct 11, 2021

I like this approach overall. It's in line with the topo being a discovery mechanism and enhances that philosophy. The only future concern would be bloat.

As for overrides, there are probably arguments both ways. But I lean towards having the command line options overriding the internals, because it allows you to change behavior you otherwise cannot change.

@ajm188
Copy link
Contributor Author

ajm188 commented Oct 28, 2021

Met with Deepthi about this today; plan is to remove the jenkins_build_info tag from making it to the topo, and separately I'm going to look into our dependency on that field in AppVersion, since we managed to sneak that into upsteram years ago and it should be removed (to be clear, the plan is to remove it, the ambiguity is around how exactly to go about it).

For this:

The only future concern would be bloat.

100% agree. I think two options:

  1. Add a flag that is like -no_buildinfo_tags that completely disables this
  2. Add a flag that is like -skip_buildinfo_tags <tag1, tag2, ...> which would let you skip only the subset of tags

I prefer (2) for the flexibility, and it's not that much extra work on top of (1) to implement, but wanted to document the thought process anyway.

Signed-off-by: Andrew Mason <amason@slack-corp.com>
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Do you plan to add a flag as a separate PR?

…t record

Signed-off-by: Andrew Mason <amason@slack-corp.com>
@ajm188
Copy link
Contributor Author

ajm188 commented Oct 30, 2021

Added in this one; it was pretty straightforward. After implementing, I applied the following diff (to show both regexp and exact modes):

diff --git a/examples/local/scripts/vttablet-up.sh b/examples/local/scripts/vttablet-up.sh
index c3f66fa806..42497b19ac 100755
--- a/examples/local/scripts/vttablet-up.sh
+++ b/examples/local/scripts/vttablet-up.sh
@@ -44,6 +44,7 @@ vttablet \
  -init_keyspace $keyspace \
  -init_shard $shard \
  -init_tablet_type $tablet_type \
+ -vttablet_skip_buildinfo_tags "/.*git.*/,version" \
  -health_check_interval 5s \
  -enable_semi_sync \
  -enable_replication_reporter \

And after starting the local cluster:

❯ vtctldclient --server ":15999" GetTablet zone1-100 | jq '.tags'
{
    "build_host": "SFO-M-AMASON02",
    "build_time": "Sat Oct 30 11:21:47 EDT 2021",
    "build_user": "amason",
    "go_version": "go1.17",
    "goarch": "amd64",
    "goos": "darwin"
}

@ajm188 ajm188 merged commit 131148d into vitessio:main Nov 9, 2021
@ajm188 ajm188 deleted the vttablet-buildinfo-tags branch November 9, 2021 15:48
@ajm188 ajm188 mentioned this pull request Nov 10, 2021
3 tasks
@ajm188
Copy link
Contributor Author

ajm188 commented Nov 10, 2021

@mattlord hello! yes this was intentional, maybe we should update List{All,Shard}Tablets to be able to filter out tags? #9169 (comment)

Note you can also turn this off entirely with the (new) flag set to -vttablet_skip_buildinfo_tags="/.*/"

@mattlord
Copy link
Contributor

@ajm188 my main concerns are that

  • This feels like unnecessary detail for the default output
  • Because the output is not structured, and there's no way to get it via an API call AFAIK, you end up parsing the vtctl output with things like awk and because this has whitespace in it this becomes more challenging

Maybe we could not show them by default but only with a -v or -b option to be more verbose / show build info?

If everyone else is fine with it then I'm ok too. FWIW, I had only ever really parsed out the first 6 columns anyway and used them.

@mattlord
Copy link
Contributor

mattlord commented Nov 10, 2021

And I apologize for bringing this up after the fact. I had skimmed the RFC and I liked the general idea of having this info available in the topo, I just didn't realize it would affect the default vtctl command outputs.

@deepthi
Copy link
Member

deepthi commented Nov 10, 2021

Whether we decide to output it by default or not, any new fields in vtctl command outputs need to go at the end because people have scripts, and we have tried to keep things awk-able.
Sorry I missed this in review, but I also did not catch on that ListAllTablets output would change in this way.

@deepthi
Copy link
Member

deepthi commented Nov 10, 2021

Created vitessio/website#872 to actually create a review guide.

@ajm188
Copy link
Contributor Author

ajm188 commented Nov 10, 2021

@mattlord no problem at all! it's a fair callout!

@deepthi we haven't actually added any new fields here, the format has always been <bunch of stuff> tablet-tags, master-term-start-time, at least as far back as 8.0.

so, how do we go forward? my proposal:

in 13.0

  • default -vttablet_skip_buildinfo_tags to /.*/ which has the effect of turning it off by default
  • add a new flag to all the List/GetTablet vtctl commands called -include_tags, defaulting to true. this will let people update scripts to include the flag (to not break when we default it to false in 14.0) or to explicitly opt out now

in 14.0

  • change -vttablet_skip_buildinfo_tags to default to the empty string, meaning the default is to include all the build info fields
  • change -include_tags to default to false, meaning that tablet tags will not appear in vtctl tablet output without being explicitly requested

i will also point out that this is workable via awk (which is how tablet output has always meant to be parsed) such as vtctlclient ListAllTablets | awk -F'[][]' '{print $1, $3}' (get everything but the tags-map) or awk -F'[][]' '{print $2}' (get just the tags map)

@mattlord
Copy link
Contributor

@ajm188 I think that sounds fine, although I'm unsure why we wouldn't do the 14.0 plan in 13.0 since we're so early on in the 13.0 lifecycle?

@ajm188
Copy link
Contributor Author

ajm188 commented Nov 10, 2021

Because currently the output of ListAllTablets has 7 fields including the tablet tag map, so any scripts doing something like awk '{print $7}' to extract the primary-term-start-time field will break if we remove the tags and make the output only 6 fields by default.

@mattlord
Copy link
Contributor

mattlord commented Nov 10, 2021

IMO we need to keep the column, it's just a matter of whether we show any tags in the column or not. I think that the columns should be stable and any new columns added at the end. That's the general contract for unstructured output like this, otherwise you break lots of random things for users (e.g. SHOW commands in MySQL).

@ajm188
Copy link
Contributor Author

ajm188 commented Nov 10, 2021

I agree in general, but I don't see how we support that in these commands without potentially breaking someone.

Some potential scenarios:

  • we keep the column in position 6, and don't show tags => we break anyone that is currently using tags and handling them via awk -F'[][]')
  • we keep the column in position 6, and just don't show the buildinfo tags => we break anyone that had a tag with the same name as one of those tags by chance
  • we remove the column => we break anyone that expected exactly seven columns
  • and so on

that's why i think we make "7 columns + no buildinfo tags in the topo" the default in 13.0, and provide migration paths (via those flags) for 14.0

@mattlord
Copy link
Contributor

mattlord commented Nov 10, 2021

After thinking about it a bit more... I think we should simply make -vttablet_skip_buildinfo_tags default to /.*/ in 13.0 and keep that the default in perpetuity. This way it's an opt-in feature. Anyone using the -init_flags vttablet flag is already dealing with any parsing issues and this way there's no breakage for upgrades and you don't see the build info stuff unless you explicitly request it (for most people this will be a lot of noise).

@ajm188 ajm188 mentioned this pull request Nov 10, 2021
3 tasks
@deepthi
Copy link
Member

deepthi commented Nov 10, 2021

@deepthi we haven't actually added any new fields here, the format has always been tablet-tags, master-term-start-time, at least as far back as 8.0.

Thank you for pointing this out, I got it wrong :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Cluster management Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants