AWS instance types data refactoring and update #449

AlexanderZagaynov · 2018-05-22T17:11:20Z

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1562062
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1590288

Changes:

The task for fetching instance types was removed, together with the intermediate data files (see further plans below).
The data was moved to a YAML file and is now loaded from it.
Specs were updated to match the new number of types.
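
As a rough illustration of the second change, instance type data kept in YAML can be read into a plain hash keyed by type name. This is only a sketch: the inline document, the attribute values, and the `load_instance_types` helper are assumptions for illustration, not the fixture or loader from this PR.

```ruby
require 'yaml'

# Hypothetical excerpt in the spirit of db/fixtures/aws_instance_types.yml;
# the attribute values here are made up for illustration.
SAMPLE_DOC = <<~DOC
  ---
  c1.medium:
    architecture:
    - i386
    - x86_64
    ebs_only: false
    instance_store_volumes: 1
DOC

# Parse the YAML text into a Hash keyed by instance type name.
def load_instance_types(yaml_text)
  YAML.safe_load(yaml_text)
end

LOADED_TYPES = load_instance_types(SAMPLE_DOC)
puts LOADED_TYPES['c1.medium']['architecture'].inspect
```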

Further plans:

A second PR will update the aws-sdk gem to v3.
A third PR should remove the current YAML file and update the data as part of the standard EMS refresh process.

Old vs New data: https://goo.gl/p2qFKq

Fryguy · 2018-05-29T18:00:42Z

lib/tasks_private/instance_types.rake

+
+    instances = AwsInstanceTypesParser.new(products).instances_list
+
+    # prevent yaml anchors/aliases


This should not be necessary. To avoid aliases, there are multiple approaches. If you are creating the data structures in question, then you can just avoid putting references into other container instances (e.g. don't put the same array into multiple hashes). Instead, use .dup or .clone. If you aren't creating the hashes, and instead they come from Amazon, you can use .deep_clone or Marshal.load(Marshal.dump(obj)), which will ultimately serialize and then deserialize, which should give you what you need.
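
A minimal sketch of the aliasing behaviour discussed here, using made-up data: when the same container object is placed under two keys, `YAML.dump` emits an anchor and an alias; giving each key its own copy avoids that. The `deep_copy` helper and the sample hashes are assumptions for illustration. Note that the copy is taken per insertion, since a single `Marshal` round trip of the whole structure keeps shared references inside it shared.

```ruby
require 'yaml'

# Hypothetical helper: serialize and deserialize to get an independent copy.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

# Made-up data; the same Hash object is reused below.
defaults = { 'network' => %w[ipv4 ipv6] }

# Both keys point at the same object, so the dump contains &anchor / *alias.
aliased  = { 't2.micro' => defaults, 't2.small' => defaults }

# Each key gets its own copy, so the dump repeats the data instead.
separate = { 't2.micro' => deep_copy(defaults), 't2.small' => deep_copy(defaults) }

ALIASED_DUMP  = YAML.dump(aliased)
SEPARATE_DUMP = YAML.dump(separate)

puts ALIASED_DUMP
puts SEPARATE_DUMP
```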

Fryguy · 2018-05-29T18:04:12Z

lib/tasks_private/lib/simple_logging.rb

+    @logger ||= Logger.new(STDOUT)
+  end
+  delegate *%i(info debug warn error fatal), :to => :logger
+end


Why do you need this? If all you are trying to do is call a logger, just set a global $log = Logger.new(STDOUT) and be done with it. I'd prefer that over patching Kernel.
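
For reference, the global logger approach could look roughly like the sketch below. The `build_logger` helper and its `io` parameter are assumptions added so the output target can be swapped; in the rake task itself a plain `$log = Logger.new(STDOUT)` would be enough.

```ruby
require 'logger'

# Hypothetical helper: build a logger writing to the given IO (STDOUT by default).
def build_logger(io = STDOUT)
  Logger.new(io)
end

# A single global logger, with no delegation or Kernel patching needed.
$log = build_logger

$log.info('fetching instance types')
$log.warn('rate limit is close')
```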

Fryguy · 2018-05-29T18:05:44Z

lib/tasks_private/lib/github_file.rb

+require 'net/http'
+require_relative 'simple_logging'
+
+class GithubFile


I think it would be easier to just use Octokit than to build your own. Though I don't understand the purpose of the cache_dir, nor the commits_uri.

AlexanderZagaynov · 2018-07-03T13:47:57Z

@Fryguy @Ladas @bronaghs please review

Fryguy · 2018-07-03T14:09:03Z

lib/tasks_private/instance_types.rake

+    discontinued_types, types_list = isolated do
+      previous_list = []
+      versions = get_gh_data("contents/#{SDK_DATA_DIR}")
+      versions = versions.lazy.select { |v| v['type'] == 'dir' }.map { |v| v['name'] }.sort


If you're ultimately iterating every object, then the lazy is not necessary.

There is select first, then map, so I'm saving one loop.

True...it's still an over-optimization (i.e. this code probably goes from less than a second to also less than a second)

Do you insist on removing .lazy, then? :)

No, but if other changes are being made, I would suggest removing it.
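
To make the trade-off in this thread concrete, here is a small sketch comparing the lazy and eager forms on made-up directory entries. Both produce the same sorted list of names; the eager form builds one intermediate array, which hardly matters for a listing of this size. The entry hashes and method names are assumptions for illustration.

```ruby
# Made-up entries shaped like a directory listing with 'type' and 'name' keys.
ENTRIES = [
  { 'type' => 'dir',  'name' => '2016-11-15' },
  { 'type' => 'file', 'name' => 'README.md' },
  { 'type' => 'dir',  'name' => '2015-10-01' }
].freeze

# Lazy chain: select and map are fused into one pass; sort forces evaluation.
def version_names_lazy(entries)
  entries.lazy.select { |v| v['type'] == 'dir' }.map { |v| v['name'] }.sort
end

# Eager chain: select builds an intermediate array, then map walks it.
def version_names_eager(entries)
  entries.select { |v| v['type'] == 'dir' }.map { |v| v['name'] }.sort
end

puts version_names_lazy(ENTRIES).inspect
puts version_names_eager(ENTRIES).inspect
```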

Fryguy · 2018-07-03T14:10:06Z

lib/tasks_private/instance_types.rake

+        data = get_gh_file("#{SDK_DATA_DIR}/#{version}/#{SDK_FILE_NAME}")
+        data = data['shapes']['InstanceType']['enum']
+        old_minus_new = previous_list - data
+        new_minus_old = data - previous_list


removed/added might be better variable names for readability.

Is that critical?

Others would say no, but to me readability and maintenance are critical. 😄 old_minus_new says nothing about the intent, which causes more cognitive overhead, as compared to a name like removed.
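
A small sketch of the naming suggestion, using made-up type lists: the array differences are the same as in the diff above, only the names now state the intent. The `diff_types` method and the sample values are assumptions for illustration.

```ruby
# Return the types that disappeared and the types that appeared between
# two versions of the list (hypothetical helper).
def diff_types(previous_list, current_list)
  removed = previous_list - current_list # present before, gone now
  added   = current_list - previous_list # new in the current version
  [removed, added]
end

previous_types = %w[m1.small m3.medium t2.micro]
current_types  = %w[m3.medium t2.micro t3.micro]

removed, added = diff_types(previous_types, current_types)
puts "removed: #{removed.inspect}"
puts "added: #{added.inspect}"
```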

Fryguy · 2018-07-03T14:11:57Z

lib/tasks_private/instance_types.rake

+      if instance_data[:current_version] && !instance_data[:current_generation]
+        instance_data[:deprecated] = true
+      end
+      types_list.index(instance_type) || 1_000_000


How do unindexed types get sorted amongst themselves? I would expect by instance_type name or something.

They should keep previous order. And it is impossible to sort by numbers together with the strings.

It's possible, you just need a multi-level sort...

[types_list.index(instance_type) || 1_000_000, instance_data[:instance_type]]

If you return that from the sort_by! block it will do multi-level sort on previous index, then name.

That being said, it's much more intuitive to simplify this to just sort by name instead of previous index. New values will get inserted where they belong and the diff will still only include the new stuff.

It complicates code and readability, and (maybe less important, but...) it will decrease speed and increase memory usage.

Types list order can't be sorted alphabetically (it leads to bad end user experience), that's why I'm taking types list from AWS GitHub repo, to make it consistent with AWS lists.

> It complicates code and readability

Sorting by name makes it less readable? As opposed to finding the index in a previously sorted list, then sorting by that, and then just appending to the end when that isn't found? I fail to see how .sort_by { |x| x.instance_type } is more complicated.

> (maybe less important, but...) it will decrease speed and increase memory usage.

That should not be a concern, unless you are saying it will take hours and use gigs of memory. This is a one-off execution.

> Types list order can't be sorted alphabetically (it leads to bad end user experience), that's why I'm taking types list from AWS GitHub repo, to make it consistent with AWS lists.

I don't understand the bad end user experience. The sorting in the file is strictly for diff purposes and the end user will never see that sorting. Unless you mean the dev running this rake task as the end-user?
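
The multi-level sort mentioned earlier in this thread can be sketched as follows: the block returns a two-element array, so entries are ordered by their position in the reference list first and by name second, and types missing from the list end up at the end in name order. The method name, the constant, and the sample data are assumptions for illustration.

```ruby
# Sort key used for types that are not in the reference list.
UNKNOWN_POSITION = 1_000_000

# Order instance data by position in types_list, then by type name
# (hypothetical helper; instances is a Hash keyed by type name).
def sort_instance_types(instances, types_list)
  sorted = instances.sort_by do |instance_type, _data|
    [types_list.index(instance_type) || UNKNOWN_POSITION, instance_type]
  end
  sorted.to_h
end

types_list = %w[t2.micro m4.large]
instances  = {
  'z1.new'   => {},
  'm4.large' => {},
  'a1.new'   => {},
  't2.micro' => {}
}

puts sort_instance_types(instances, types_list).keys.inspect

# Sorting purely by name, as also suggested in the thread:
puts instances.sort_by { |instance_type, _data| instance_type }.to_h.keys.inspect
```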

Fryguy · 2018-07-03T14:12:36Z

lib/tasks_private/instance_types.rake

+      report << '<body>'
+      report << '<h1>Generated: <script>'
+      report << "document.write(new Date('#{report_date.iso8601}').toString())"
+      report << '</script></h1>'


I don't understand the point of this report

It shows which attributes were changed, as a check. I used it to verify data consistency before and after my PR.
However, this report doesn't go anywhere besides the local tmp dir.

Ok, yeah, then IMO it's completely unnecessary...git diff should be more than sufficient for those purposes.

git diff can't guarantee you a consistent experience.
You can see it here for example: AlexanderZagaynov/manageiq-providers-amazon@3a9a452...a92033e#diff-d8dde9392fe3a31bd3f5580c14571a1cL1681
(I'm attaching a screenshot of this)

The link doesn't work...do you have another link with an example? Wanted to see if git diff --patience made a difference at all (and/or side-by-side)

You need to press "Load diff" (however, that link really doesn't work now because of recent amends)

Fryguy · 2018-07-03T14:29:51Z

I think you can just require a GITHUB_API_TOKEN. I don't see why that needs to be made optional. This should simplify the code a bit.
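
A sketch of what requiring the token could look like: fetch it from the environment and fail with a clear message when it is missing, so no optional code path is needed. The `github_headers` method and the error message are assumptions for illustration; the `env` parameter only exists so a plain Hash can stand in for `ENV`.

```ruby
# Build request headers for the GitHub API, failing early when the token
# is not configured (hypothetical helper).
def github_headers(env = ENV)
  token = env.fetch('GITHUB_API_TOKEN') do
    raise KeyError, 'GITHUB_API_TOKEN must be set to run this task'
  end
  { 'Authorization' => "token #{token}" }
end

# Example with an explicit Hash instead of the real environment:
puts github_headers('GITHUB_API_TOKEN' => 'example-token').inspect
```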

Fryguy · 2018-07-03T14:31:27Z

Unless I'm mistaken, the report is unnecessary. The code can generate the yml file, and git diff is good enough at that point for a report.

AlexanderZagaynov · 2018-07-03T14:40:55Z

@Fryguy

GITHUB_API_TOKEN is only needed in case of a rate limit error, which typically should not arise in the foreseen use case. Getting such a token requires some additional actions from the user, which can complicate things a bit.
git diff can get messy, at the very least because data moves from 'active' to 'discontinued'. Moreover, the report doesn't make the usage experience any worse and doesn't affect overall app performance (because it is a rake task, which is supposed to be run rarely). You can see an example report screenshot in the attachment.

AlexanderZagaynov · 2018-08-02T16:20:29Z

@Fryguy @Ladas
As per our discussion with @bronaghs, this PR only moves the data to the yaml file and updates it with new flavors. I've described further plans in the top comment.
Data updates: https://goo.gl/m9JzCd
@Fryguy I responded to your question about description and other values here: #449 (comment)

Fryguy · 2018-08-02T16:49:14Z

I believe there is a way to ignore db/fixtures/aws_instance_types.yml for the bot to avoid the flood...I think you just edit the .yamllint file at the root.

AlexanderZagaynov · 2018-08-02T17:21:06Z

@Fryguy I didn't get what you wanted to say about the .yamllint file, but just in case, I want to point you to #473

AlexanderZagaynov · 2018-08-02T17:25:23Z

@Fryguy ok, I moved that change from that PR to the current one

Fryguy · 2018-08-10T13:58:43Z

.yamllint

@@ -11,3 +11,4 @@ rules:
    indent-sequences: false
  line-length:
    max: 120
+  trailing-spaces: false


Instead of adding a rule that ignores trailing spaces across the board, I think it's preferable to just add the aws_instance_types.yml to the ignore list above, since it's autogenerated.

Fryguy · 2018-08-10T14:00:38Z

db/fixtures/aws_instance_types.yml

@@ -0,0 +1,4055 @@
+---
+c1.medium:


What is this sorted by? As mentioned earlier, having it sort by name will allow future diffs to be a lot more readable.

yes, exactly, it is sorted by name

Fryguy · 2018-08-10T14:03:57Z

app/models/manageiq/providers/amazon/instance_types.rb

-      :ebs_only                => false,
-      :instance_store_size     => 4000.0.gigabytes,
-      :instance_store_volumes  => 2,
-      :architecture   


No need for the == true

I have no idea why Github thinks I commented here :/ I commented on the lines that say == true

Fryguy · 2018-08-10T14:04:33Z

app/models/manageiq/providers/amazon/instance_types.rb

-      :ebs_only                => false,
-      :instance_store_size     => 4000.0.gigabytes,
-      :instance_store_volumes  => 2,
-      :architecture   


Also don't need the == true

Fryguy · 2018-08-10T14:05:42Z

app/models/manageiq/providers/amazon/instance_types.rb

-      :ebs_only                => false,
-      :instance_store_size     => 4000.0.gigabytes,
-      :instance_store_volumes  => 2,
-      :architecture   


I don't really understand the need to deep_freeze all over the place, but I'll not let it stop the PR.

And here I commented on the line that says .each_value(&:freeze).freeze

I'm freezing that, because it is a constant. I believe that constant should surely be constant.
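
For context on this exchange, a Ruby constant only guards against reassignment (with a warning); the object it refers to stays mutable unless frozen. The sketch below shows what `.each_value(&:freeze).freeze` does on made-up data: it freezes the outer hash and each value one level down, not anything nested deeper. The constant name, the `mutation_blocked?` helper, and the values are assumptions for illustration.

```ruby
# Made-up data; the outer Hash and each inner Hash are frozen.
FROZEN_TYPES = {
  't2.micro' => { :vcpus => 1, :architecture => [:x86_64] }
}.each_value(&:freeze).freeze

# Returns true when the block raises because a frozen object was modified.
# FrozenError (Ruby 2.5+) is a subclass of RuntimeError.
def mutation_blocked?
  yield
  false
rescue RuntimeError
  true
end

puts mutation_blocked? { FROZEN_TYPES['t3.micro'] = {} }
puts mutation_blocked? { FROZEN_TYPES['t2.micro'][:vcpus] = 2 }

# The nested array was not frozen by each_value(&:freeze):
puts FROZEN_TYPES['t2.micro'][:architecture].frozen?
```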

Fryguy · 2018-08-10T14:09:26Z

@AlexanderZagaynov Some comments above. Additionally, please squash old commits as they no longer make sense given the feature is not being added (feature added then removed is not good for the commit history). Can you also edit the original PR post to remove the "Closes #396 and #397" and other details no longer relevant now that this PR has a different meaning?

bronaghs · 2018-08-13T13:38:33Z

@AlexanderZagaynov - no need to backport since this is not a fix for a blocker bug.
I would prefer not to have the freeze all over the place; it isn't necessary.

AlexanderZagaynov · 2018-08-13T16:43:35Z

@Fryguy @bronaghs the remarks are addressed.
However, I didn't squash the commits entirely; I removed the unnecessary ones and organized the rest into meaningful steps.
P.S. Instance types data update diff: https://goo.gl/p2qFKq

Fryguy · 2018-08-14T14:48:28Z

LGTM. @bronaghs Not sure about the description changes as those are user-facing, but if you are good with those, then this is good to merge.

miq-bot · 2018-08-14T15:38:09Z

Checked commits AlexanderZagaynov/manageiq-providers-amazon@c512be6~...2087cb7 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
5 files checked, 0 offenses detected
Everything looks fine. 🏆

miq-bot added the wip label May 22, 2018

AlexanderZagaynov changed the title ~~[WIP] AWS instance types data gathering~~ [skip ci] [WIP] AWS instance types data gathering May 23, 2018

miq-bot removed the wip label May 23, 2018

AlexanderZagaynov changed the title ~~[skip ci] [WIP] AWS instance types data gathering~~ [WIP] AWS instance types data gathering May 23, 2018

miq-bot added the wip label May 23, 2018

AlexanderZagaynov force-pushed the GH-0396 branch from 32e6188 to c7f4eeb Compare May 23, 2018 16:33

Fryguy reviewed May 29, 2018

AlexanderZagaynov force-pushed the GH-0396 branch 2 times, most recently from d03b91c to 1a679b0 Compare June 19, 2018 20:32

AlexanderZagaynov force-pushed the GH-0396 branch 9 times, most recently from 0aade49 to c99db8f Compare July 3, 2018 13:12

AlexanderZagaynov changed the title ~~[WIP] AWS instance types data gathering~~ AWS instance types data gathering Jul 3, 2018

Fryguy reviewed Jul 3, 2018

miq-bot removed the wip label Jul 3, 2018

AlexanderZagaynov changed the title ~~AWS instance types data gathering~~ AWS instance types data refactoring and update Aug 6, 2018

Fryguy reviewed Aug 10, 2018

AlexanderZagaynov force-pushed the GH-0396 branch from c900cd5 to 8a90ea1 Compare August 13, 2018 16:27

Alexander Zagaynov added 6 commits August 13, 2018 18:28

remove old instance types getting task

c512be6

move current instance types data from code to yaml

bc316d4

load instance types data from yaml file

e1086a9

new instance types

b1c75c5

fix failing specs

55be05c

ignore trailing whitespaces in instance types file

7616eca

AlexanderZagaynov force-pushed the GH-0396 branch from 8a90ea1 to 7616eca Compare August 13, 2018 16:30

Fryguy approved these changes Aug 14, 2018

remove all freezing as requested

2087cb7

bronaghs merged commit 3f5a37e into ManageIQ:master Aug 14, 2018

bronaghs added this to the Sprint 93 Ending Aug 27, 2018 milestone Aug 15, 2018

bronaghs added enhancement gaprindashvili/no labels Aug 15, 2018

agrare mentioned this pull request May 28, 2021

Add back an instance_types rake task #707

Merged

