-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS instance types data refactoring and update #449
Conversation
32e6188
to
c7f4eeb
Compare
|
||
instances = AwsInstanceTypesParser.new(products).instances_list | ||
|
||
# prevent yaml anchors/aliases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be necessary. To avoid aliases, there are multiple approaches. If you are creating the data structures in question, then you can just avoid putting references into other container instances (e.g. don't put the same array into multiple hashes). Instead, use .dup
or .clone
. If you aren't creating the hashes, and instead they come from Amazon, you can use .deep_clone
or Marshal.load(Marshal.dump(obj)
which will ultimately serialize and then deserialize, which should give you what you need
@logger ||= Logger.new(STDOUT) | ||
end | ||
delegate *%i(info debug warn error fatal), :to => :logger | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this? if all you are trying to do is call a logger, just set a global $log = Logger.new(STDOUT)
, and be done with it. I'd prefer that over patching Kernel.
lib/tasks_private/lib/github_file.rb
Outdated
require 'net/http' | ||
require_relative 'simple_logging' | ||
|
||
class GithubFile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be easier to just use Octokit than to build your own. Though I don't understand the purpose of the cache_dir, nor the commits_uri
d03b91c
to
1a679b0
Compare
0aade49
to
c99db8f
Compare
discontinued_types, types_list = isolated do | ||
previous_list = [] | ||
versions = get_gh_data("contents/#{SDK_DATA_DIR}") | ||
versions = versions.lazy.select { |v| v['type'] == 'dir' }.map { |v| v['name'] }.sort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're ultimately iterating every object, then the lazy is not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is select
first, then map
, so I'm saving one loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True...it's still an over-optimization (i.e. this code probably goes from less than a second to also less than a second)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you insist to remove .lazy
then? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but if other changes are being made, I would suggest remove it
data = get_gh_file("#{SDK_DATA_DIR}/#{version}/#{SDK_FILE_NAME}") | ||
data = data['shapes']['InstanceType']['enum'] | ||
old_minus_new = previous_list - data | ||
new_minus_old = data - previous_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed/added might be better variable names for readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that critical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Others would say no, but to me readability and maintenance are crticial. 😄 old_minus_new says nothing about the intent, which causes more cognitive overhead, as compared to a name like removed
if instance_data[:current_version] && !instance_data[:current_generation] | ||
instance_data[:deprecated] = true | ||
end | ||
types_list.index(instance_type) || 1_000_000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do unindexed types get sorted amongst themselves? I would expect by instance_type name or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should keep previous order. And it is impossible to sort by numbers together with the strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible, you just need a multi-level sort...
[types_list.index(instance_type) || 1_000_000, instance_data[:instance_type]
If you return that from the sort_by! block it will do multi-level sort on previous index, then name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That being said, it's much more intuitive to simplify this to just sort by name instead of previous index. New values will get inserted where they belong and the diff will still only include the new stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It complicates code and readability, and (maybe less important, but...) it will decrease speed and increase memory usage.
- Types list order can't be sorted alphabetically (it leads to bad end user experience), that's why I'm taking types list from AWS GitHub repo, to make it consistent with AWS lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It complicates code and readability
Sorting by name makes it less readable? As opposed to finding the index in a previously sorted list, then sorting by that, and then just appending to the end when that isn't found? I fail to see how .sort_by { |x| x.instance_type }
is more complicated.
(maybe less important, but...) it will decrease speed and increase memory usage.
That should not be a concern, unless you are saying it will take hours and use gigs of memory. This is a one-off execution.
Types list order can't be sorted alphabetically (it leads to bad end user experience), that's why I'm taking types list from AWS GitHub repo, to make it consistent with AWS lists.
I don't understand the bad end user experience. The sorting in the file is strictly for diff purposes and the end user will never see that sorting. Unless you mean the dev running this rake task as the end-user?
report << '<body>' | ||
report << '<h1>Generated: <script>' | ||
report << "document.write(new Date('#{report_date.iso8601}').toString())" | ||
report << '</script></h1>' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the point of this report
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shows which attributes was changed - for control. I've used it to control data consistency before my PR and after.
However, this report doesn't go anywhere besides local tmp
dir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yeah, then IMO it's completely unnecessary...git diff
should be more than sufficient for those purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git diff
can't guarantee you consistent experience.
You can see it here for example: AlexanderZagaynov/manageiq-providers-amazon@3a9a452...a92033e#diff-d8dde9392fe3a31bd3f5580c14571a1cL1681
(I'm attaching a screenshot of this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link doesn't work...do you have another link with an example? Wanted to see if git diff --patience
made a difference at all (and/or side-by-side)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just require a GITHUB_API_TOKEN. I don't see why that needs to be made optional. This should simplify the code a bit. |
Unless I'm mistaken, the report in unnecessary. The code can generate the yml file, and |
@Fryguy @Ladas |
I believe there is a way to ignore |
@Fryguy ok, I moved that change from that PR to the current one |
.yamllint
Outdated
@@ -11,3 +11,4 @@ rules: | |||
indent-sequences: false | |||
line-length: | |||
max: 120 | |||
trailing-spaces: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding a rule that ignores trailing spaces across the board, I think it's preferable to just add the aws_instance_types.yml to the ignore list above, since it's autogenerated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -0,0 +1,4055 @@ | |||
--- | |||
c1.medium: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this sorted by? As mentioned earlier, having it sort by name will allow future diffs to be a lot more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, exactly, it is sorted by name
:ebs_only => false, | ||
:instance_store_size => 4000.0.gigabytes, | ||
:instance_store_volumes => 2, | ||
:architecture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the == true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea why Github thinks I commented here :/ I commented on the lines that say == true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
:ebs_only => false, | ||
:instance_store_size => 4000.0.gigabytes, | ||
:instance_store_volumes => 2, | ||
:architecture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also don't need the == true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
:ebs_only => false, | ||
:instance_store_size => 4000.0.gigabytes, | ||
:instance_store_volumes => 2, | ||
:architecture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand the need to deep_freeze all over the place, but I'll not let it stop the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here I commented on the line that says .each_value(&:freeze).freeze
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm freezing that, because it is a constant. I believe that constant should surely be constant.
@AlexanderZagaynov Some comments above. Additionally, please squash old commits as they no longer make sense giving the feature is not being added (feature added then removed is not good for the commit history). Can you also edit the original PR post to remove the "Closes #396 and #397" and other details no longer relevant now that this PR has a different meaning? |
@AlexanderZagaynov - no need to backport since this is a not a fix for a blocker bug. |
c900cd5
to
8a90ea1
Compare
8a90ea1
to
7616eca
Compare
@Fryguy @bronaghs done on remarks. |
LGTM. @bronaghs Not sure about the description changes as those are user-facing, but if you are good with those, then this is good to merge. |
Checked commits AlexanderZagaynov/manageiq-providers-amazon@c512be6~...2087cb7 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0 |
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1562062
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1590288
Changes:
Further plans:
aws-sdk
gem tov3
.Old vs New data: https://goo.gl/p2qFKq