-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix "Error: invalid byte sequence in UTF-8" error when retrieving catalogues with invalid encoding from PuppetDB on Puppet 3 server #38
Conversation
…g-diff-viewer when the JSON-file is generated on a Puppet Master that has server configuration config_version set to generate Epoch numbers. In this case these numbers where stored as numbers instead of strings in the JSON-file which caused puppet-catalog-diff-viewer to throw an exception when it tries to replace strange characters in the version string. I've verified this works in latest version of puppet-catalog-diff-viewer. See also conversation on #puppet here https://puppetcommunity.slack.com/archives/C0W298S9G/p1594642754396700.
…useful and how it's recommended to be used in a Puppet upgrade.
…rb script should work fine on after discussion with raphink on voxpupuli#30.
… a resource parameter contains string data with invalid encoding. This happens when comparing with the stored catalogues in PuppetDB on a Puppet 3 server. The Puppet server in version 3 supports storing cataloges with invalid encoding but gives the warning "Puppet Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB" in the puppetserver.log file. With this fix we allow these invalid catalogues to be compared in puppet-catalog-diff instead of giving error by removing the invalid characters, although this in return may produce a lot of additional differences in the output. This fix is only applied to parameters on resources where the encoding is wrong so wont affect correct UTF-8 encoded parameters.
…version 2.3 used in Puppet 3 in the latest version of puppet-catalog-diff. This is neccesary when you want to use puppet-catalog-diff to compare the cataloges between a Puppet 3 and Puppet 5 Master, as we did when we upgraded to Puppet 5. This code adds a fallback to the older /v4/nodes/?query PuppetDB API if the latest version of the API, that is /pdb/query/v4/nodes?, fails. This means that the code is compatible with both versions and thus allows the latest version of puppet-catalog-diff to still be used together with Puppet 3 servers, which is useful for those who still hasn't upgraded to newer version of Puppet. We needed to use this code modification in combination with voxpupuli#38 in our environment, but this change does not require this pull request as long as the encoding is valid UTF-8 in the cataloges stored in PuppetDB on the Puppet 3 Master server.
…uppetDB version 2.3 used in Puppet 3 in the latest version of puppet-catalog-diff. This is neccesary when you want to use puppet-catalog-diff to compare the cataloges between a Puppet 3 and Puppet 5 Master, as we did when we upgraded to Puppet 5. This code adds a fallback to the older /v4/nodes/?query PuppetDB API if the latest version of the API, that is /pdb/query/v4/nodes?, fails. This means that the code is compatible with both versions and thus allows the latest version of puppet-catalog-diff to still be used together with Puppet 3 servers, which is useful for those who still hasn't upgraded to newer version of Puppet." This reverts commit fad6448.
…version 2.3 used in Puppet 3 in the latest version of puppet-catalog-diff. This is neccesary when you want to use puppet-catalog-diff to compare the cataloges between a Puppet 3 and Puppet 5 Master, as we did when we upgraded to Puppet 5. This code adds a fallback to the older /v4/nodes/?query PuppetDB API if the latest version of the API, that is /pdb/query/v4/nodes?, fails. This means that the code is compatible with both versions and thus allows the latest version of puppet-catalog-diff to still be used together with Puppet 3 servers, which is useful for those who still hasn't upgraded to newer version of Puppet. We needed to use this code modification in combination with voxpupuli#38 in our environment, but this change does not require this pull request as long as the encoding is valid UTF-8 in the cataloges stored in PuppetDB on the Puppet 3 Master server.
…version 2.3 used in Puppet 3 in the latest version of puppet-catalog-diff. This is neccesary when you want to use puppet-catalog-diff to compare the cataloges between a Puppet 3 and Puppet 5 Master, as we did when we upgraded to Puppet 5. This code adds a fallback to the older /v4/nodes/?query PuppetDB API if the latest version of the API, that is /pdb/query/v4/nodes?, fails. This means that the code is compatible with both versions and thus allows the latest version of puppet-catalog-diff to still be used together with Puppet 3 servers, which is useful for those who still hasn't upgraded to newer version of Puppet. We needed to use this code modification in combination with voxpupuli#38 in our environment, but this change does not require this pull request as long as the encoding is valid UTF-8 in the cataloges stored in PuppetDB on the Puppet 3 Master server.
lib/puppet/catalog-diff/comparer.rb
Outdated
unless str2.valid_encoding? | ||
Puppet::debug("content parameter in new resource has invalid #{str2.encoding} encoding.") | ||
str2.encode!('UTF-8', 'UTF-8', :invalid => :replace) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm used to dealing with this by forcing locales in the environment. Does this have any side effects, such as deteriorating characters in a C
environment for ex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a unit test to this PR that reproduces the problem. This test adds latin1 data to the content of a file and test to show the diff. Without this fix the test will fail with mentioned error on line 91.
Even if I try to run the test like LC_ALL=C pdk test unit --verbose
it gives me an error without the fix so I don't think it can be solved only by setting environment variables. I set default encoding to UTF-8 in the test on line 86 and 87 to make the test run consistently independent of LC_ALL
, so to test with other environment variables you might want to remove these lines, but it still failed for me without these while running with LC_ALL=C
.
The fix passes all tests so I think it shouldn't deteriorate characters that has worked fine in earlier versions. The fix only replaces invalid characters with � in diffs where the program else would have crashed. So it makes the program more robust to handle other encodings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
It could be interesting to have a wrapper function for that so we can just call:
validate_encoding(str1)
validate_encoding(str2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored the code into that function now. Much better! Unit test still runs fine and I also tested diffing different branches on a Puppet 5 master as well as diffing between a Puppet 3 and 5 Master and it works fine. So this should be solved now! 🙂
…owing resource diff of file resources with non UTF-8 encoded content. Test demonstrates how fix earlier in str_diff prevents "ArgumentError: invalid byte sequence in US-ASCII>" error and also tests that the invalid character is replaced with Ruby's default character "uFFFD" in the output.
…pecificErrorClass)` risks false positives, since literally any other error would cause the expectation to pass, including those raised by Ruby (e.g. NoMethodError, NameError and ArgumentError), meaning the code you are intending to test may not even get reached. Instead consider using `expect { }.not_to raise_error` or `expect { }.to raise_error(DifferentSpecificErrorClass)`." by making test less specific.
…e_encoding and made debug logging more general and explanatory.
…alogues with invalid encoding from PuppetDB on Puppet 3 server (voxpupuli#38) * Fix problem where a nodes details wouldn't load/show in puppet-catalog-diff-viewer when the JSON-file is generated on a Puppet Master that has server configuration config_version set to generate Epoch numbers. In this case these numbers where stored as numbers instead of strings in the JSON-file which caused puppet-catalog-diff-viewer to throw an exception when it tries to replace strange characters in the version string. I've verified this works in latest version of puppet-catalog-diff-viewer. See also conversation on #puppet here https://puppetcommunity.slack.com/archives/C0W298S9G/p1594642754396700. * Added link to upload_facts.rb script and documentation for when it's useful and how it's recommended to be used in a Puppet upgrade. * Clarified documentation about which Puppet versions the upload_facts.rb script should work fine on after discussion with raphink on voxpupuli#30. * Fix problem "Error: invalid byte sequence in UTF-8" that happens when a resource parameter contains string data with invalid encoding. This happens when comparing with the stored catalogues in PuppetDB on a Puppet 3 server. The Puppet server in version 3 supports storing cataloges with invalid encoding but gives the warning "Puppet Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB" in the puppetserver.log file. With this fix we allow these invalid catalogues to be compared in puppet-catalog-diff instead of giving error by removing the invalid characters, although this in return may produce a lot of additional differences in the output. This fix is only applied to parameters on resources where the encoding is wrong so wont affect correct UTF-8 encoded parameters. * Adds backward compatibility to retrieve catalogs from older PuppetDB version 2.3 used in Puppet 3 in the latest version of puppet-catalog-diff. This is neccesary when you want to use puppet-catalog-diff to compare the cataloges between a Puppet 3 and Puppet 5 Master, as we did when we upgraded to Puppet 5. This code adds a fallback to the older /v4/nodes/?query PuppetDB API if the latest version of the API, that is /pdb/query/v4/nodes?, fails. This means that the code is compatible with both versions and thus allows the latest version of puppet-catalog-diff to still be used together with Puppet 3 servers, which is useful for those who still hasn't upgraded to newer version of Puppet. We needed to use this code modification in combination with voxpupuli#38 in our environment, but this change does not require this pull request as long as the encoding is valid UTF-8 in the cataloges stored in PuppetDB on the Puppet 3 Master server. * Revert "Adds backward compatibility to retrieve catalogs from older PuppetDB version 2.3 used in Puppet 3 in the latest version of puppet-catalog-diff. This is neccesary when you want to use puppet-catalog-diff to compare the cataloges between a Puppet 3 and Puppet 5 Master, as we did when we upgraded to Puppet 5. This code adds a fallback to the older /v4/nodes/?query PuppetDB API if the latest version of the API, that is /pdb/query/v4/nodes?, fails. This means that the code is compatible with both versions and thus allows the latest version of puppet-catalog-diff to still be used together with Puppet 3 servers, which is useful for those who still hasn't upgraded to newer version of Puppet." This reverts commit fad6448. * Add unit test of function compare_resources to recreate error when showing resource diff of file resources with non UTF-8 encoded content. Test demonstrates how fix earlier in str_diff prevents "ArgumentError: invalid byte sequence in US-ASCII>" error and also tests that the invalid character is replaced with Ruby's default character "uFFFD" in the output. * Cleaned up output text from development to make the code cleaner. * Removed Ruby warning "WARNING: Using `expect { }.not_to raise_error(SpecificErrorClass)` risks false positives, since literally any other error would cause the expectation to pass, including those raised by Ruby (e.g. NoMethodError, NameError and ArgumentError), meaning the code you are intending to test may not even get reached. Instead consider using `expect { }.not_to raise_error` or `expect { }.to raise_error(DifferentSpecificErrorClass)`." by making test less specific. * Refactored validate encoding code for string diff to function validate_encoding and made debug logging more general and explanatory.
…ppet 3, to support Puppet 3 to 5 upgrades (#39) * Adds backward compatibility to retrieve catalogs from older PuppetDB version 2.3 used in Puppet 3 in the latest version of puppet-catalog-diff. This is neccesary when you want to use puppet-catalog-diff to compare the cataloges between a Puppet 3 and Puppet 5 Master, as we did when we upgraded to Puppet 5. This code adds a fallback to the older /v4/nodes/?query PuppetDB API if the latest version of the API, that is /pdb/query/v4/nodes?, fails. This means that the code is compatible with both versions and thus allows the latest version of puppet-catalog-diff to still be used together with Puppet 3 servers, which is useful for those who still hasn't upgraded to newer version of Puppet. We needed to use this code modification in combination with #38 in our environment, but this change does not require this pull request as long as the encoding is valid UTF-8 in the cataloges stored in PuppetDB on the Puppet 3 Master server. * Add unit test of function compare_resources to recreate error when showing resource diff of file resources with non UTF-8 encoded content. Test demonstrates how fix earlier in str_diff prevents "ArgumentError: invalid byte sequence in US-ASCII>" error and also tests that the invalid character is replaced with Ruby's default character "uFFFD" in the output. * Cleaned up output text from development to make the code cleaner. * Removed Ruby warning "WARNING: Using `expect { }.not_to raise_error(SpecificErrorClass)` risks false positives, since literally any other error would cause the expectation to pass, including those raised by Ruby (e.g. NoMethodError, NameError and ArgumentError), meaning the code you are intending to test may not even get reached. Instead consider using `expect { }.not_to raise_error` or `expect { }.to raise_error(DifferentSpecificErrorClass)`." by making test less specific. * Revert "Removed Ruby warning "WARNING: Using `expect { }.not_to raise_error(SpecificErrorClass)` risks false positives, since literally any other error would cause the expectation to pass, including those raised by Ruby (e.g. NoMethodError, NameError and ArgumentError), meaning the code you are intending to test may not even get reached. Instead consider using `expect { }.not_to raise_error` or `expect { }.to raise_error(DifferentSpecificErrorClass)`." by making test less specific." This reverts commit 20e7555. Revert "Cleaned up output text from development to make the code cleaner." This reverts commit ab62f19. Revert "Add unit test of function compare_resources to recreate error when showing resource diff of file resources with non UTF-8 encoded content. Test demonstrates how fix earlier in str_diff prevents "ArgumentError: invalid byte sequence in US-ASCII>" error and also tests that the invalid character is replaced with Ruby's default character "uFFFD" in the output." This reverts commit 095335e. * Fix `threads` option in `pull` face (#40) * Fix "Error: invalid byte sequence in UTF-8" error when retrieving catalogues with invalid encoding from PuppetDB on Puppet 3 server (#38) * Fix problem where a nodes details wouldn't load/show in puppet-catalog-diff-viewer when the JSON-file is generated on a Puppet Master that has server configuration config_version set to generate Epoch numbers. In this case these numbers where stored as numbers instead of strings in the JSON-file which caused puppet-catalog-diff-viewer to throw an exception when it tries to replace strange characters in the version string. I've verified this works in latest version of puppet-catalog-diff-viewer. See also conversation on #puppet here https://puppetcommunity.slack.com/archives/C0W298S9G/p1594642754396700. * Added link to upload_facts.rb script and documentation for when it's useful and how it's recommended to be used in a Puppet upgrade. * Clarified documentation about which Puppet versions the upload_facts.rb script should work fine on after discussion with raphink on #30. * Fix problem "Error: invalid byte sequence in UTF-8" that happens when a resource parameter contains string data with invalid encoding. This happens when comparing with the stored catalogues in PuppetDB on a Puppet 3 server. The Puppet server in version 3 supports storing cataloges with invalid encoding but gives the warning "Puppet Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB" in the puppetserver.log file. With this fix we allow these invalid catalogues to be compared in puppet-catalog-diff instead of giving error by removing the invalid characters, although this in return may produce a lot of additional differences in the output. This fix is only applied to parameters on resources where the encoding is wrong so wont affect correct UTF-8 encoded parameters. * Adds backward compatibility to retrieve catalogs from older PuppetDB version 2.3 used in Puppet 3 in the latest version of puppet-catalog-diff. This is neccesary when you want to use puppet-catalog-diff to compare the cataloges between a Puppet 3 and Puppet 5 Master, as we did when we upgraded to Puppet 5. This code adds a fallback to the older /v4/nodes/?query PuppetDB API if the latest version of the API, that is /pdb/query/v4/nodes?, fails. This means that the code is compatible with both versions and thus allows the latest version of puppet-catalog-diff to still be used together with Puppet 3 servers, which is useful for those who still hasn't upgraded to newer version of Puppet. We needed to use this code modification in combination with #38 in our environment, but this change does not require this pull request as long as the encoding is valid UTF-8 in the cataloges stored in PuppetDB on the Puppet 3 Master server. * Revert "Adds backward compatibility to retrieve catalogs from older PuppetDB version 2.3 used in Puppet 3 in the latest version of puppet-catalog-diff. This is neccesary when you want to use puppet-catalog-diff to compare the cataloges between a Puppet 3 and Puppet 5 Master, as we did when we upgraded to Puppet 5. This code adds a fallback to the older /v4/nodes/?query PuppetDB API if the latest version of the API, that is /pdb/query/v4/nodes?, fails. This means that the code is compatible with both versions and thus allows the latest version of puppet-catalog-diff to still be used together with Puppet 3 servers, which is useful for those who still hasn't upgraded to newer version of Puppet." This reverts commit fad6448. * Add unit test of function compare_resources to recreate error when showing resource diff of file resources with non UTF-8 encoded content. Test demonstrates how fix earlier in str_diff prevents "ArgumentError: invalid byte sequence in US-ASCII>" error and also tests that the invalid character is replaced with Ruby's default character "uFFFD" in the output. * Cleaned up output text from development to make the code cleaner. * Removed Ruby warning "WARNING: Using `expect { }.not_to raise_error(SpecificErrorClass)` risks false positives, since literally any other error would cause the expectation to pass, including those raised by Ruby (e.g. NoMethodError, NameError and ArgumentError), meaning the code you are intending to test may not even get reached. Instead consider using `expect { }.not_to raise_error` or `expect { }.to raise_error(DifferentSpecificErrorClass)`." by making test less specific. * Refactored validate encoding code for string diff to function validate_encoding and made debug logging more general and explanatory. * IMPR: Solved all suggestions from raphink in #39. FEAT: The PuppetDB version is now retrieved using HTTP API call to help determine if to use current or older API to retrieve list of nodes to compare. * IMPR: Check with documentation to make sure change in API changed between version 2.3.x and 3.0.x of PuppetDB. Made version compare more explicit by using >= to ensure that also if PuppetDB version would be only 3 the syntax would be correct. PuppetDB versions are usually in the format x.y.z. Co-authored-by: Alexander Fisher <alex@linfratech.co.uk>
We got this problem when we used puppet-catalog-diff to compare catalogues between a Puppet 3 and 5 Master server. Many of our managed files use other encodings than UTF-8, we use ISO8896-1 for example, which results in puppetserver on our Puppet 3 Master storing catalogues with invalid encoding in PuppetDB. This is ok in Puppet 3 but results in the warning "Puppet Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB" in the puppetserver.log file. So we needed to have a fix for this to be able to use puppet-catalog-diff in our Puppet 5 upgrade.