AWS instance types data refactoring and update #449

Merged: 7 commits, Aug 14, 2018

Conversation

@AlexanderZagaynov
Contributor

AlexanderZagaynov commented May 22, 2018

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1562062
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1590288

Changes:

  • The task that fetched instance types was removed, together with the intermediate data files (see further plans below).
  • Data was moved to a YAML file and is now loaded from it.
  • Specs were updated to reflect the new number of types.
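
For illustration, loading the data from YAML could look like the sketch below. The sample document, the attribute names, and the `load_instance_types` helper are assumptions made for this example, not the contents of the actual file.

```ruby
require 'yaml'

# Hypothetical excerpt; the real file and its attribute names may differ.
SAMPLE_YAML = <<~EOS
  ---
  c1.medium:
    vcpu: 2
    ebs_only: false
  c1.xlarge:
    vcpu: 8
    ebs_only: false
EOS

# Parse the YAML text into a plain hash keyed by instance type name.
def load_instance_types(yaml_text)
  YAML.safe_load(yaml_text)
end

types = load_instance_types(SAMPLE_YAML)
puts types.keys.inspect
```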

Further plans:

  • A second PR will update the aws-sdk gem to v3.
  • A third PR should remove the current YAML file and update the data as part of the standard EMS refresh process.

Old vs New data: https://goo.gl/p2qFKq

@miq-bot miq-bot added the wip label May 22, 2018
@AlexanderZagaynov AlexanderZagaynov changed the title [WIP] AWS instance types data gathering [skip ci] [WIP] AWS instance types data gathering May 23, 2018
@miq-bot miq-bot removed the wip label May 23, 2018
@AlexanderZagaynov AlexanderZagaynov changed the title [skip ci] [WIP] AWS instance types data gathering [WIP] AWS instance types data gathering May 23, 2018
@miq-bot miq-bot added the wip label May 23, 2018

instances = AwsInstanceTypesParser.new(products).instances_list

# prevent yaml anchors/aliases
Member

This should not be necessary. There are multiple approaches to avoiding aliases. If you are creating the data structures in question, you can just avoid putting references to the same object into multiple containers (e.g. don't put the same array into multiple hashes); use .dup or .clone instead. If you aren't creating the hashes, and they instead come from Amazon, you can use .deep_clone or Marshal.load(Marshal.dump(obj)), which will serialize and then deserialize, and that should give you what you need.
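
A minimal sketch of the "don't share references" approach, using made-up data. The `unshare` helper is hypothetical, not part of this PR. One caveat: a Marshal round trip keeps references that are shared inside the dumped object, so this sketch copies the structure recursively instead.

```ruby
require 'yaml'

# Hypothetical data: the same array object is referenced from two hashes,
# which makes Psych emit an anchor and an alias for it.
SHARED_ARCH = %w[i386 x86_64]
INSTANCES = {
  'c1.medium' => { 'architecture' => SHARED_ARCH },
  'c1.xlarge' => { 'architecture' => SHARED_ARCH }
}

# Recursively copy hashes, arrays and strings so no object is shared.
def unshare(obj)
  case obj
  when Hash   then obj.each_with_object({}) { |(k, v), h| h[k] = unshare(v) }
  when Array  then obj.map { |v| unshare(v) }
  when String then obj.dup
  else obj
  end
end

WITH_ALIASES    = INSTANCES.to_yaml
WITHOUT_ALIASES = unshare(INSTANCES).to_yaml

puts WITHOUT_ALIASES
```

Both YAML documents describe the same data; only the second one repeats the array contents instead of pointing back to an anchor.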

@logger ||= Logger.new(STDOUT)
end
delegate *%i(info debug warn error fatal), :to => :logger
end
Member

Why do you need this? If all you are trying to do is call a logger, just set a global $log = Logger.new(STDOUT), and be done with it. I'd prefer that over patching Kernel.
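
A sketch of the global logger suggestion. `fetch_instance_types` is a hypothetical stand-in for the task code and is not taken from this PR.

```ruby
require 'logger'

# A single global logger, as suggested above, instead of patching Kernel.
$log = Logger.new($stdout)

# Hypothetical task code that logs through the global.
def fetch_instance_types
  $log.info('Fetching instance types')
  []
end
```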

require 'net/http'
require_relative 'simple_logging'

class GithubFile
Member

I think it would be easier to just use Octokit than to build your own. Though I don't understand the purpose of the cache_dir, nor of the commits_uri.

@AlexanderZagaynov AlexanderZagaynov force-pushed the GH-0396 branch 2 times, most recently from d03b91c to 1a679b0 June 19, 2018 20:32
@AlexanderZagaynov AlexanderZagaynov force-pushed the GH-0396 branch 9 times, most recently from 0aade49 to c99db8f July 3, 2018 13:12
@AlexanderZagaynov AlexanderZagaynov changed the title [WIP] AWS instance types data gathering AWS instance types data gathering Jul 3, 2018
@AlexanderZagaynov
Contributor Author

@Fryguy @Ladas @bronaghs please review

discontinued_types, types_list = isolated do
previous_list = []
versions = get_gh_data("contents/#{SDK_DATA_DIR}")
versions = versions.lazy.select { |v| v['type'] == 'dir' }.map { |v| v['name'] }.sort
Member

If you're ultimately iterating every object, then the lazy is not necessary.

Contributor Author

There is select first, then map, so I'm saving one loop.

Member

True...it's still an over-optimization (i.e. this code probably goes from less than a second to also less than a second)

Contributor Author

Do you insist on removing .lazy then? :)

Member

No, but if other changes are being made, I would suggest removing it.
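
To make the point concrete, here is a sketch with made-up entries shaped like the data in the snippet above. Both variants return the same sorted array; the eager one builds one extra intermediate array. The method names are hypothetical.

```ruby
# Hypothetical sample of directory listing entries.
ENTRIES = [
  { 'type' => 'dir',  'name' => '2016-11-15' },
  { 'type' => 'file', 'name' => 'README.md' },
  { 'type' => 'dir',  'name' => '2015-10-01' }
]

# Lazy chain: select and map run in one pass, sort forces evaluation.
def version_names_lazy(entries)
  entries.lazy.select { |v| v['type'] == 'dir' }.map { |v| v['name'] }.sort
end

# Eager chain: select builds an intermediate array before map runs.
def version_names_eager(entries)
  entries.select { |v| v['type'] == 'dir' }.map { |v| v['name'] }.sort
end
```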

data = get_gh_file("#{SDK_DATA_DIR}/#{version}/#{SDK_FILE_NAME}")
data = data['shapes']['InstanceType']['enum']
old_minus_new = previous_list - data
new_minus_old = data - previous_list
Member

removed/added might be better variable names for readability.

Contributor Author

Is that critical?

Member

Others would say no, but to me readability and maintenance are critical. 😄 old_minus_new says nothing about the intent, which causes more cognitive overhead compared to a name like removed.
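
A sketch of the suggested naming, with a hypothetical `diff_types` helper and made-up type names:

```ruby
# Compare two versions of the instance type list.
# Returns the types that disappeared and the types that appeared.
def diff_types(previous_list, current_list)
  removed = previous_list - current_list
  added   = current_list - previous_list
  [removed, added]
end

removed, added = diff_types(%w[m1.small t1.micro], %w[t1.micro t2.nano])
puts "removed: #{removed.inspect}, added: #{added.inspect}"
```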

if instance_data[:current_version] && !instance_data[:current_generation]
instance_data[:deprecated] = true
end
types_list.index(instance_type) || 1_000_000
Member

How do unindexed types get sorted amongst themselves? I would expect by instance_type name or something.

Contributor Author

They should keep their previous order. And it is impossible to sort by numbers together with strings.

Member

It's possible, you just need a multi-level sort...

[types_list.index(instance_type) || 1_000_000, instance_data[:instance_type]]

If you return that from the sort_by! block it will do multi-level sort on previous index, then name.
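
A self-contained sketch of that multi-level sort. The `sort_types` helper, the `UNKNOWN_POSITION` constant, and the sample data are hypothetical; the real code sorts its own structures in place.

```ruby
# Position used for types that are missing from the reference list.
UNKNOWN_POSITION = 1_000_000

# Sort by position in types_list first, then by name, so that
# unindexed types end up last and ordered among themselves.
def sort_types(instances, types_list)
  instances.sort_by do |instance_type, _data|
    [types_list.index(instance_type) || UNKNOWN_POSITION, instance_type]
  end
end

known = %w[t2.nano t2.micro]
data  = { 'z1.new' => {}, 't2.micro' => {}, 'a1.new' => {}, 't2.nano' => {} }
puts sort_types(data, known).map(&:first).inspect
```

Ruby compares the array keys element by element, so the name only decides the order when two entries share the same position.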

Member

That being said, it's much more intuitive to simplify this to just sort by name instead of previous index. New values will get inserted where they belong and the diff will still only include the new stuff.

Contributor Author

@AlexanderZagaynov AlexanderZagaynov Jul 3, 2018

  1. It complicates code and readability, and (maybe less important, but...) it will decrease speed and increase memory usage.
  2. Types list order can't be sorted alphabetically (it leads to bad end user experience), that's why I'm taking types list from AWS GitHub repo, to make it consistent with AWS lists.

Member

@Fryguy Fryguy Jul 5, 2018

It complicates code and readability

Sorting by name makes it less readable? As opposed to finding the index in a previously sorted list, then sorting by that, and then just appending to the end when that isn't found? I fail to see how .sort_by { |x| x.instance_type } is more complicated.

(maybe less important, but...) it will decrease speed and increase memory usage.

That should not be a concern, unless you are saying it will take hours and use gigs of memory. This is a one-off execution.

Types list order can't be sorted alphabetically (it leads to bad end user experience), that's why I'm taking types list from AWS GitHub repo, to make it consistent with AWS lists.

I don't understand the bad end user experience. The sorting in the file is strictly for diff purposes and the end user will never see that sorting. Unless you mean the dev running this rake task as the end-user?

report << '<body>'
report << '<h1>Generated: <script>'
report << "document.write(new Date('#{report_date.iso8601}').toString())"
report << '</script></h1>'
Member

I don't understand the point of this report.

Contributor Author

@AlexanderZagaynov AlexanderZagaynov Jul 3, 2018

It shows which attributes were changed, as a check. I used it to verify data consistency before and after my PR.
However, this report doesn't go anywhere besides the local tmp dir.

Member

Ok, yeah, then IMO it's completely unnecessary...git diff should be more than sufficient for those purposes.

Contributor Author

git diff can't guarantee a consistent experience.
You can see it here for example: AlexanderZagaynov/manageiq-providers-amazon@3a9a452...a92033e#diff-d8dde9392fe3a31bd3f5580c14571a1cL1681
(I'm attaching a screenshot of this)
screenshot from 2018-07-03 16-53-27

Member

@Fryguy Fryguy Jul 3, 2018

The link doesn't work...do you have another link with an example? I wanted to see if git diff --patience made a difference at all (and/or side-by-side).

Contributor Author

You need to press "Load diff" (however, that link really doesn't work now because of recent amends)
screenshot from 2018-07-09 10-52-22

@Fryguy
Member

Fryguy commented Jul 3, 2018

I think you can just require a GITHUB_API_TOKEN. I don't see why that needs to be made optional. This should simplify the code a bit.
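
A sketch of what requiring the token could look like. The `github_token` helper and its error message are hypothetical; only the GITHUB_API_TOKEN variable name comes from this thread.

```ruby
# Fail fast when the token is missing instead of supporting an
# unauthenticated code path.
def github_token(env = ENV)
  env.fetch('GITHUB_API_TOKEN') do
    raise ArgumentError, 'GITHUB_API_TOKEN must be set'
  end
end
```

Passing the environment as an argument keeps the helper easy to exercise with a plain hash.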

@Fryguy
Member

Fryguy commented Jul 3, 2018

Unless I'm mistaken, the report is unnecessary. The code can generate the yml file, and git diff is good enough at that point for a report.

@AlexanderZagaynov
Contributor Author

AlexanderZagaynov commented Jul 3, 2018

@Fryguy

  • GITHUB_API_TOKEN is only needed in case of a rate limit error, which typically should not arise in the foreseen use case. Getting such a token requires some additional actions from the user, which can complicate things a bit.
  • git diff can get messy, at least due to data moving from 'active' to 'discontinued'. Moreover, the report doesn't make usage worse and doesn't affect overall app performance (because it is a rake task, which is supposed to be run rarely). You can see an example report in the attached screenshot.
    screenshot from 2018-07-03 16-36-35

@miq-bot miq-bot removed the wip label Jul 3, 2018
@AlexanderZagaynov
Contributor Author

@Fryguy @Ladas
As per our discussion with @bronaghs, this PR only moves the data to the YAML file and updates it with new flavors. I've described further plans in the top comment.
Data updates: https://goo.gl/m9JzCd
@Fryguy I responded to your question about description and other values here: #449 (comment)

@Fryguy
Member

Fryguy commented Aug 2, 2018

I believe there is a way to ignore db/fixtures/aws_instance_types.yml for the bot to avoid the flood...I think you just edit the .yamllint file at the root.

@AlexanderZagaynov
Contributor Author

@Fryguy I didn't get what you meant about the .yamllint file, but just in case, I want to point you to #473

@AlexanderZagaynov
Contributor Author

@Fryguy ok, I moved that change from that PR to the current one

@AlexanderZagaynov AlexanderZagaynov changed the title AWS instance types data gathering AWS instance types data refactoring and update Aug 6, 2018
.yamllint Outdated
@@ -11,3 +11,4 @@ rules:
indent-sequences: false
line-length:
max: 120
trailing-spaces: false
Member

Instead of adding a rule that ignores trailing spaces across the board, I think it's preferable to just add the aws_instance_types.yml to the ignore list above, since it's autogenerated.

Contributor Author

done

@@ -0,0 +1,4055 @@
---
c1.medium:
Member

What is this sorted by? As mentioned earlier, having it sort by name will allow future diffs to be a lot more readable.

Contributor Author

Yes, exactly, it is sorted by name.

:ebs_only => false,
:instance_store_size => 4000.0.gigabytes,
:instance_store_volumes => 2,
:architecture
Member

No need for the == true

Member

I have no idea why Github thinks I commented here :/ I commented on the lines that say == true

Contributor Author

done

:ebs_only => false,
:instance_store_size => 4000.0.gigabytes,
:instance_store_volumes => 2,
:architecture
Member

Also don't need the == true

Contributor Author

done

:ebs_only => false,
:instance_store_size => 4000.0.gigabytes,
:instance_store_volumes => 2,
:architecture
Member

I don't really understand the need to deep_freeze all over the place, but I'll not let it stop the PR.

Member

And here I commented on the line that says .each_value(&:freeze).freeze

Contributor Author

I'm freezing that because it is a constant. I believe that a constant should really be constant.
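
For readers following along, here is a sketch of what the `.each_value(&:freeze).freeze` idiom mentioned above does and does not protect. The constant name, its contents, and the `mutation_blocked?` helper are hypothetical.

```ruby
# Hypothetical constant; names and values are for illustration only.
INSTANCE_TYPES = {
  't2.nano'  => { :vcpu => 1, :architecture => [:i386, :x86_64] },
  't2.micro' => { :vcpu => 1, :architecture => [:i386, :x86_64] }
}.each_value(&:freeze).freeze

# Returns true when the block raises because its target is frozen.
# FrozenError is a RuntimeError subclass on Rubies that define it.
def mutation_blocked?
  yield
  false
rescue RuntimeError
  true
end

puts mutation_blocked? { INSTANCE_TYPES['t3.new'] = {} }
puts mutation_blocked? { INSTANCE_TYPES['t2.nano'][:vcpu] = 2 }
```

The outer hash and each per-type hash are frozen, but objects nested one level deeper (the architecture arrays here) are not, which is the gap a deep freeze would close.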

@Fryguy
Member

Fryguy commented Aug 10, 2018

@AlexanderZagaynov Some comments above. Additionally, please squash old commits, as they no longer make sense given that the feature is not being added (feature added then removed is not good for the commit history). Can you also edit the original PR post to remove the "Closes #396 and #397" and other details no longer relevant now that this PR has a different meaning?

@bronaghs

@AlexanderZagaynov - no need to backport since this is not a fix for a blocker bug.
I would prefer not to have the freeze all over the place; it isn't necessary.

@AlexanderZagaynov
Contributor Author

@Fryguy @bronaghs I've addressed the remarks.
However, I didn't squash the commits entirely; I removed the unnecessary ones and organized the rest into meaningful steps.
P.S. Instance types data update diff: https://goo.gl/p2qFKq

@Fryguy
Member

Fryguy commented Aug 14, 2018

LGTM. @bronaghs Not sure about the description changes as those are user-facing, but if you are good with those, then this is good to merge.

@miq-bot
Member

miq-bot commented Aug 14, 2018

Checked commits AlexanderZagaynov/manageiq-providers-amazon@c512be6~...2087cb7 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
5 files checked, 0 offenses detected
Everything looks fine. 🏆

@bronaghs bronaghs merged commit 3f5a37e into ManageIQ:master Aug 14, 2018
@bronaghs bronaghs added this to the Sprint 93 Ending Aug 27, 2018 milestone Aug 15, 2018
6 participants