Fix issue #139; Version 1.9.0 (#242)
* more tests; fixed xit tests

* requiring Ruby 2.5 or higher; rubocop fixes; adding tests

* fix issue #139

* renamed MissingHeaders exception to MissingKeys
tilo authored Sep 5, 2023
1 parent 72d5585 commit f66eebc
Showing 23 changed files with 202 additions and 104 deletions.
14 changes: 13 additions & 1 deletion .rubocop.yml
@@ -22,6 +22,9 @@ Metrics/BlockLength:
Metrics/BlockNesting:
Enabled: false

Metrics/ClassLength:
Enabled: false

Metrics/CyclomaticComplexity: # BS rule
Enabled: false

@@ -46,6 +49,9 @@ Naming/VariableNumber:
Style/ClassEqualityComparison:
Enabled: false

Style/ClassMethods:
Enabled: false

Style/ConditionalAssignment:
Enabled: false

@@ -114,6 +120,9 @@ Style/StringLiteralsInInterpolation:
Enabled: false
EnforcedStyle: double_quotes

Style/SymbolArray:
Enabled: false

Style/SymbolProc: # old Ruby versions can't do this
Enabled: false

@@ -123,11 +132,14 @@ Style/TrailingCommaInHashLiteral:
Style/TrailingUnderscoreVariable:
Enabled: false

Style/TrivialAccessors:
Enabled: false

# Style/UnlessModifier:
# Enabled: false

Style/ZeroLengthPredicate:
Enabled: false

Layout/LineLength:
Max: 240
Max: 256
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,27 @@

# SmarterCSV 1.x Change Log

## 1.9.0 (2023-09-04)
* fixed issue #139

* Error `SmarterCSV::MissingHeaders` was renamed to `SmarterCSV::MissingKeys`

* CHANGED BEHAVIOR:
When the `key_mapping` option is used (issue #139):
Previous versions only printed a warning when a CSV header was missing during key mapping.
Versions >= 1.9 raise `SmarterCSV::KeyMappingError`, listing all headers that could not be mapped.

* Notable details for `key_mapping` and `required_headers`:

* `key_mapping` is applied to the headers early in `SmarterCSV.process`, and raises an error if a header to be mapped is missing from the input CSV file and therefore cannot be mapped to its desired name.

Mapping errors can be suppressed by using:
* `silence_missing_keys` set to `true`, which silences all such errors and makes all mapped headers optional.
* `silence_missing_keys` set to an Array of the specific header keys that are optional.
The use case is that some header fields are optional, but we still want them renamed if they are present.

* `required_headers` checks which headers are present **after** `key_mapping` was applied.

## 1.8.5 (2023-06-25)
* fix parsing of escaped quote characters (thanks to JP Camara)
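The `key_mapping` / `silence_missing_keys` rules from the 1.9.0 entry can be sketched in plain Ruby as below. This is a simplified illustration, not the gem's code: `check_key_mapping!` and the local `KeyMappingError` class are hypothetical stand-ins so the snippet runs without SmarterCSV installed, headers are assumed to be already normalized to Symbols, and the `remove_unmapped_keys` option is ignored.

```ruby
# Hypothetical stand-in for SmarterCSV::KeyMappingError, so the sketch runs without the gem.
class KeyMappingError < StandardError; end

# `headers` are the normalized CSV headers, `key_mapping` maps header => new key,
# `silence_missing_keys` is false, true, or an Array of mapped keys that are optional.
def check_key_mapping!(headers, key_mapping, silence_missing_keys = false)
  missing_keys = key_mapping.keys - headers
  # An Array lists the specific mapped keys that may be absent from the CSV.
  missing_keys -= silence_missing_keys if silence_missing_keys.is_a?(Array)

  unless missing_keys.empty? || silence_missing_keys == true
    raise KeyMappingError, "ERROR: can not map headers: #{missing_keys.join(', ')}"
  end

  # Rename mapped headers; unmapped headers are kept unchanged in this sketch.
  headers.map { |h| key_mapping.key?(h) ? key_mapping[h] : h }
end

headers = [:this, :that]
mapping = { this: :a, other: :b }
check_key_mapping!(headers, mapping, [:other]) # => [:a, :that]
check_key_mapping!(headers, mapping, true)     # => [:a, :that]
```

Calling `check_key_mapping!(headers, mapping)` with the default `false` raises `KeyMappingError` naming `other`, which mirrors the "CHANGED BEHAVIOR" described above.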

4 changes: 3 additions & 1 deletion README.md
@@ -296,7 +296,9 @@ And header and data validations will also be supported in 2.x
| Option | Default | Explanation |
---------------------------------------------------------------------------------------------------------------------------------
| :key_mapping | nil | a hash which maps headers from the CSV file to keys in the result hash |
| :silence_missing_key | false | ignore missing keys in `key_mapping` if true |
| :silence_missing_keys | false | ignore missing keys in `key_mapping` |
| | | if set to true: makes all mapped keys optional |
| | | if given an array, makes only the keys listed in it optional |
| :required_keys | nil | An array. Specify the required names AFTER header transformation. |
| :required_headers | nil | (DEPRECATED / renamed) Use `required_keys` instead |
| | | or an exception is raised. No validation if nil is given. |
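Because `required_keys` is checked after header transformation, it must name the mapped keys rather than the raw CSV headers. A minimal sketch of that ordering, assuming a hypothetical `check_required_keys!` helper and a local `MissingKeys` class standing in for `SmarterCSV::MissingKeys`:

```ruby
# Hypothetical stand-in for SmarterCSV::MissingKeys (previously MissingHeaders).
class MissingKeys < StandardError; end

# `mapped_headers` are the headers AFTER key_mapping has been applied.
def check_required_keys!(mapped_headers, required_keys)
  return true if required_keys.nil? # nil means: no validation

  missing_keys = required_keys.reject { |k| mapped_headers.include?(k) }
  unless missing_keys.empty?
    raise MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}"
  end

  true
end

# The raw CSV header :this was mapped to :a, so :a is what must be required.
mapped_headers = [:a, :that]
check_required_keys!(mapped_headers, [:a, :that]) # => true
```

Requiring the raw name `:this` here would raise `MissingKeys`, even though the column is present in the file.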
19 changes: 9 additions & 10 deletions Rakefile
@@ -3,16 +3,15 @@
require "bundler/gem_tasks"
require 'rspec/core/rake_task'


# temp fix for NoMethodError: undefined method `last_comment'
# remove when fixed in Rake 11.x and higher
module TempFixForRakeLastComment
def last_comment
last_description
end
end
Rake::Application.send :include, TempFixForRakeLastComment
### end of tempfix
# # temp fix for NoMethodError: undefined method `last_comment'
# # remove when fixed in Rake 11.x and higher
# module TempFixForRakeLastComment
# def last_comment
# last_description
# end
# end
# Rake::Application.send :include, TempFixForRakeLastComment
# ### end of tempfix

RSpec::Core::RakeTask.new(:spec)

43 changes: 24 additions & 19 deletions lib/smarter_csv.rb
@@ -12,12 +12,12 @@ class HeaderSizeMismatch < SmarterCSVException; end
class IncorrectOption < SmarterCSVException; end
class ValidationError < SmarterCSVException; end
class DuplicateHeaders < SmarterCSVException; end
class MissingHeaders < SmarterCSVException; end
class MissingKeys < SmarterCSVException; end # previously known as MissingHeaders
class NoColSepDetected < SmarterCSVException; end
class KeyMappingError < SmarterCSVException; end # CURRENTLY UNUSED -> version 1.9.0
class KeyMappingError < SmarterCSVException; end

# first parameter: filename or input object which responds to readline method
def SmarterCSV.process(input, options = {}, &block)
def SmarterCSV.process(input, options = {}, &block) # rubocop:disable Lint/UnusedMethodArgument
options = default_options.merge(options)
options[:invalid_byte_sequence] = '' if options[:invalid_byte_sequence].nil?
puts "SmarterCSV OPTIONS: #{options.inspect}" if options[:verbose]
@@ -99,7 +99,7 @@ def SmarterCSV.process(input, options = {}, &block)
hash.delete_if{|_k, v| has_rails ? v.blank? : blank?(v)}
end

hash.delete_if{|_k, v| !v.nil? && v =~ /^(\d+|\d+\.\d+)$/ && v.to_f == 0} if options[:remove_zero_values] # values are typically Strings!
hash.delete_if{|_k, v| !v.nil? && v =~ /^(0+|0+\.0+)$/} if options[:remove_zero_values] # values are Strings
hash.delete_if{|_k, v| v =~ options[:remove_values_matching]} if options[:remove_values_matching]

if options[:convert_values_to_numeric]
@@ -171,15 +171,15 @@ def SmarterCSV.process(input, options = {}, &block)
result << chunk # not sure yet, why anybody would want to do this without a block
end
chunk_count += 1
chunk = [] # initialize for next chunk of data
# chunk = [] # initialize for next chunk of data
end
ensure
fh.close if fh.respond_to?(:close)
end
if block_given?
return chunk_count # when we do processing through a block we only care how many chunks we processed
chunk_count # when we do processing through a block we only care how many chunks we processed
else
return result # returns either an Array of Hashes, or an Array of Arrays of Hashes (if in chunked mode)
result # returns either an Array of Hashes, or an Array of Arrays of Hashes (if in chunked mode)
end
end

@@ -285,11 +285,11 @@ def parse(line, options, header_size = nil)
has_quotes = line =~ /#{options[:quote_char]}/
elements = parse_csv_line_c(line, options[:col_sep], options[:quote_char], header_size)
elements.map!{|x| cleanup_quotes(x, options[:quote_char])} if has_quotes
return [elements, elements.size]
[elements, elements.size]
# :nocov:
else
# puts "WARNING: SmarterCSV is using un-accelerated parsing of lines. Check options[:acceleration]"
return parse_csv_line_ruby(line, options, header_size)
parse_csv_line_ruby(line, options, header_size)
end
end

@@ -402,7 +402,7 @@ def only_or_except_limit_execution(options, option_name, key)
return true unless Array(options[option_name][:only]).include?(key)
end
end
return false
false
end

# If file has headers, then guesses column separator from headers.
@@ -467,8 +467,8 @@ def guess_line_ending(filehandle, options)

counts["\r"] += 1 if last_char == "\r"
# find the most frequent key/value pair:
k, _ = counts.max_by{|_, v| v}
return k
most_frequent_key, _count = counts.max_by{|_, v| v}
most_frequent_key
end

def process_headers(filehandle, options)
@@ -490,6 +490,7 @@ def process_headers(filehandle, options)

file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/, '')}
file_headerA.map!{|x| x.strip} if options[:strip_whitespace]

unless options[:keep_original_headers]
file_headerA.map!{|x| x.gsub(/\s+|-+/, '_')}
file_headerA.map!{|x| x.downcase} if options[:downcase_header]
@@ -523,10 +524,13 @@ def process_headers(filehandle, options)
# do some key mapping on the keys in the file header
# if you want to completely delete a key, then map it to nil or to ''
if !key_mappingH.nil? && key_mappingH.class == Hash && key_mappingH.keys.size > 0
unless options[:silence_missing_keys]
# if silence_missing_keys are not set, raise error if missing header
missing_keys = key_mappingH.keys - headerA
puts "WARNING: missing header(s): #{missing_keys.join(",")}" unless missing_keys.empty?
# if silence_missing_keys are not set, raise error if missing header
missing_keys = key_mappingH.keys - headerA
# if the user passes a list of specific mapped keys that are optional
missing_keys -= options[:silence_missing_keys] if options[:silence_missing_keys].is_a?(Array)

unless missing_keys.empty? || options[:silence_missing_keys] == true
raise SmarterCSV::KeyMappingError, "ERROR: can not map headers: #{missing_keys.join(', ')}"
end

headerA.map!{|x| key_mappingH.has_key?(x) ? (key_mappingH[x].nil? ? nil : key_mappingH[x]) : (options[:remove_unmapped_keys] ? nil : x)}
@@ -544,8 +548,8 @@ def process_headers(filehandle, options)
end

# deprecate required_headers
if !options[:required_headers].nil?
puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required headers'"
unless options[:required_headers].nil?
puts "DEPRECATION WARNING: please use 'required_keys' instead of 'required_headers'"
if options[:required_keys].nil?
options[:required_keys] = options[:required_headers]
options[:required_headers] = nil
@@ -557,7 +561,7 @@ def process_headers(filehandle, options)
options[:required_keys].each do |k|
missing_keys << k unless headerA.include?(k)
end
raise SmarterCSV::MissingHeaders, "ERROR: missing attributes: #{missing_keys.join(',')}" unless missing_keys.empty?
raise SmarterCSV::MissingKeys, "ERROR: missing attributes: #{missing_keys.join(',')}" unless missing_keys.empty?
end

@headers = headerA
@@ -611,6 +615,7 @@ def validate_options!(options)
def option_valid?(str)
return true if str.is_a?(Symbol) && str == :auto
return true if str.is_a?(String) && !str.empty?

false
end
end
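The `remove_zero_values` change swaps a digit match plus `to_f == 0` for a regex that only matches zero-valued strings. For single-line numeric strings such as `"00"` or `"0.0"` the old and new predicates should select the same values; the new one simply skips the Float conversion. A standalone sketch of the new filter, where `remove_zero_values` is a hypothetical helper rather than a public SmarterCSV method:

```ruby
# Matches values such as "0", "000", "0.0" or "00.000" (CSV values arrive as Strings).
ZERO_VALUE = /^(0+|0+\.0+)$/

# Drops zero-valued entries from a row hash in place; nil values are left alone.
def remove_zero_values(row)
  row.delete_if { |_k, v| !v.nil? && v =~ ZERO_VALUE }
end

row = { a: "0", b: "0.00", c: "10", d: "0.5", e: nil }
remove_zero_values(row) # keeps only :c, :d and :e
```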
2 changes: 1 addition & 1 deletion lib/smarter_csv/version.rb
@@ -1,5 +1,5 @@
# frozen_string_literal: true

module SmarterCSV
VERSION = "1.8.5"
VERSION = "1.9.0"
end
13 changes: 8 additions & 5 deletions smarter_csv.gemspec
@@ -1,21 +1,25 @@
# -*- encoding: utf-8 -*-
require File.expand_path('../lib/smarter_csv/version', __FILE__)
# coding: utf-8
# frozen_string_literal: true

require File.expand_path('lib/smarter_csv/version', __dir__)

Gem::Specification.new do |spec|
spec.name = "smarter_csv"
spec.version = SmarterCSV::VERSION
spec.authors = ["Tilo Sloboda"]
spec.email = ["tilo.sloboda@gmail.com"]

spec.summary = %q{Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files}
spec.description = %q{Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys}
spec.summary = "Ruby Gem for smarter importing of CSV Files (and CSV-like files), with lots of optional features, e.g. chunked processing for huge CSV files"
spec.description = "Ruby Gem for smarter importing of CSV Files as Array(s) of Hashes, with optional features for processing large files in parallel, embedded comments, unusual field- and record-separators, flexible mapping of CSV-headers to Hash-keys"
spec.homepage = "https://github.com/tilo/smarter_csv"
spec.license = 'MIT'

spec.metadata["homepage_uri"] = spec.homepage
spec.metadata["source_code_uri"] = spec.homepage
spec.metadata["changelog_uri"] = "https://github.com/tilo/smarter_csv/blob/main/CHANGELOG.md"

spec.required_ruby_version = ">= 2.5.0"

# Specify which files should be added to the gem when it is released.
# The `git ls-files -z` loads the files in the RubyGem that have been added into git.
spec.files = Dir.chdir(__dir__) do
@@ -30,7 +34,6 @@ Gem::Specification.new do |spec|
spec.require_paths = ["lib"] # add ext here?
spec.extensions = ["ext/smarter_csv/extconf.rb"]


spec.add_development_dependency "awesome_print"
spec.add_development_dependency "codecov"
spec.add_development_dependency "pry"
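With `spec.required_ruby_version = ">= 2.5.0"` in the gemspec, RubyGems will not install 1.9.0 on older Rubies. The same constraint can be checked by hand with RubyGems' own `Gem::Requirement` and `Gem::Version` classes; `ruby_supported?` is a hypothetical helper name used only for illustration:

```ruby
require "rubygems"

# Same constraint as the gemspec's required_ruby_version.
REQUIRED_RUBY = Gem::Requirement.new(">= 2.5.0")

def ruby_supported?(version = RUBY_VERSION)
  REQUIRED_RUBY.satisfied_by?(Gem::Version.new(version))
end

ruby_supported?("2.4.10") # => false
ruby_supported?("2.5.0")  # => true
```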
2 changes: 1 addition & 1 deletion spec/fixtures/silence_missing_keys.csv
@@ -1,2 +1,2 @@
THIS,THAT
this,that
1,2
2 changes: 1 addition & 1 deletion spec/smarter_csv/binary_file2_spec.rb
@@ -22,7 +22,7 @@
expect(key.class).to eq String
end
expect(item['timestamp']).to eq 1_381_388_409
expect(item['item_id'].class).to eq Fixnum
expect(item['item_id'].class).to eq Integer
expect(item['name'].size).to be > 0
end
end
2 changes: 1 addition & 1 deletion spec/smarter_csv/binary_file_spec.rb
@@ -20,7 +20,7 @@
expect(key.class).to eq Symbol
end
expect(item[:timestamp]).to eq 1_381_388_409
expect(item[:item_id].class).to eq Fixnum
expect(item[:item_id].class).to eq Integer
expect(item[:name].size).to be > 0
end
end
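The switch from `Fixnum` to `Integer` in both binary-file specs follows Ruby itself: since Ruby 2.4, `Fixnum` and `Bignum` are unified into `Integer`, and the old constants were removed in Ruby 3.2. With the gem now requiring Ruby 2.5 or higher, `Integer` is the class to assert on. A quick check:

```ruby
# Small and arbitrarily large whole numbers both report Integer as their class.
timestamp = 1_381_388_409
huge      = 2**100

timestamp.class # => Integer
huge.class      # => Integer

# is_a?(Integer) is true on every Ruby version, since Fixnum subclassed Integer before 2.4.
timestamp.is_a?(Integer) # => true
```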
2 changes: 1 addition & 1 deletion spec/smarter_csv/carriage_return_spec.rb
@@ -104,7 +104,7 @@
let(:options) { { row_sep: :auto } }

it 'should process a file with more quoted text carriage return characters (\r) than line ending characters (\n)' do
row_sep = "\n"
# row_sep = "\n"
text_sep = "\r"
data = SmarterCSV.process("#{fixture_path}/carriage_returns_quoted.csv", options)
expect(data.flatten.size).to eq 2
2 changes: 1 addition & 1 deletion spec/smarter_csv/chunked_reading_spec.rb
@@ -6,7 +6,7 @@

describe 'be_able_to' do
it 'loads_chunk_cornercase_csv_files' do
(0..5).each do |chunk_size| # test for all chunk-sizes
6.times do |chunk_size| # test for all chunk-sizes
options = {chunk_size: chunk_size, remove_empty_hashes: true}
data = SmarterCSV.process("#{fixture_path}/chunk_cornercase.csv", options)
expect(data.flatten.size).to eq 5 # end-result must always be 5 rows
4 changes: 2 additions & 2 deletions spec/smarter_csv/chunked_spec.rb
@@ -15,7 +15,7 @@ def self.process_chunk(chunk)
end

describe 'chunked processing' do
(0..13).each do |chunk_size|
14.times do |chunk_size|
it "loads all content from CSV file with chunk_size #{chunk_size}" do
options = { chunk_size: chunk_size }
data = SmarterCSV.process("#{fixture_path}/chunked.csv", options)
@@ -24,7 +24,7 @@ def self.process_chunk(chunk)
end

context 'process chunks with a block' do
(0..13).each do |chunk_size|
14.times do |chunk_size|
it "processes with chunk size #{chunk_size}" do
expect(Processor).to receive(:process).exactly(12).times
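The chunked specs replace `(0..n).each` with `(n + 1).times`. `Integer#times` yields `0` up to `n`, so the same chunk sizes are still exercised. A quick equivalence check:

```ruby
# Integer#times yields 0, 1, ..., count - 1, i.e. the same values as (0..count - 1).
old_sizes = (0..13).to_a
new_sizes = 14.times.to_a

old_sizes == new_sizes       # => true
(0..5).to_a == 6.times.to_a  # => true
```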

2 changes: 1 addition & 1 deletion spec/smarter_csv/duplicate_headers_spec.rb
@@ -27,7 +27,7 @@

it 'raises error on duplicate headers, when attempting to do key_mapping' do
# the mapping is right, but the underlying csv file is bad
options = {key_mapping: {email: :a, firstname: :b, lastname: :c, manager_email: :d, age: :e} }
options = {key_mapping: {email: :a, firstname: :b, lastname: :c, age: :e} }
expect do
SmarterCSV.process("#{fixture_path}/duplicate_headers.csv", options)
end.to raise_exception(SmarterCSV::DuplicateHeaders)