Infer datapackage.json from transformed data files #48

Merged
merged 1 commit into from
Jan 14, 2020
79 changes: 65 additions & 14 deletions Gemfile.lock
@@ -2,28 +2,59 @@ PATH
remote: .
specs:
hsds_transformer (0.0.5)
datapackage (~> 1.1.1, >= 1.1.1)
rest-client (~> 2.0.2, >= 2.0.2)
rubyzip (~> 1.3.0, >= 1.3.0)
rubyzip (~> 2.0.0, < 2.1.0)
sinatra (~> 2.0.5, >= 2.0.5)
unf_ext (~> 0.0.7.5, >= 0.0.7.5)
zip-zip (~> 0.3, >= 0.3)

GEM
remote: http://rubygems.org/
specs:
activesupport (5.1.7)
concurrent-ruby (~> 1.0, >= 1.0.2)
i18n (>= 0.7, < 2)
minitest (~> 5.1)
tzinfo (~> 1.1)
addressable (2.7.0)
public_suffix (>= 2.0.2, < 5.0)
coderay (1.1.2)
colorize (0.8.1)
concurrent-ruby (1.1.5)
datapackage (1.1.1)
colorize
json-schema
ruby_dig
rubyzip
tableschema
diff-lcs (1.3)
domain_name (0.5.20190701)
unf (>= 0.0.5, < 1.0.0)
dotenv (2.6.0)
ffi (1.11.3-x64-mingw32)
http-cookie (1.0.3)
domain_name (~> 0.5)
mime-types (3.3)
i18n (1.7.0)
concurrent-ruby (~> 1.0)
json-schema (2.8.1)
addressable (>= 2.4)
macaddr (1.7.2)
systemu (~> 2.6.5)
method_source (0.9.2)
mime-types (3.3.1)
mime-types-data (~> 3.2015)
mime-types-data (3.2019.0904)
mustermann (1.0.3)
mime-types-data (3.2019.1009)
minitest (5.13.0)
mustermann (1.1.1)
ruby2_keywords (~> 0.0.1)
netrc (0.11.0)
rack (2.0.6)
rack-protection (2.0.7)
pry (0.12.2)
coderay (~> 1.1.0)
method_source (~> 0.9.0)
public_suffix (4.0.3)
rack (2.0.8)
rack-protection (2.0.8.1)
rack
rack-test (1.1.0)
rack (>= 1.0, < 3)
@@ -32,29 +63,48 @@ GEM
http-cookie (>= 1.0.2, < 2.0)
mime-types (>= 1.16, < 4.0)
netrc (~> 0.8)
rest-client (2.0.2-x64-mingw32)
ffi (~> 1.9)
http-cookie (>= 1.0.2, < 2.0)
mime-types (>= 1.16, < 4.0)
netrc (~> 0.8)
rspec (3.8.0)
rspec-core (~> 3.8.0)
rspec-expectations (~> 3.8.0)
rspec-mocks (~> 3.8.0)
rspec-core (3.8.0)
rspec-core (3.8.2)
rspec-support (~> 3.8.0)
rspec-expectations (3.8.2)
rspec-expectations (3.8.6)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.8.0)
rspec-mocks (3.8.0)
rspec-mocks (3.8.2)
diff-lcs (>= 1.2.0, < 2.0)
rspec-support (~> 3.8.0)
rspec-support (3.8.0)
rubyzip (1.3.0)
sinatra (2.0.7)
rspec-support (3.8.3)
ruby2_keywords (0.0.1)
ruby_dig (0.0.2)
rubyzip (2.0.0)
sinatra (2.0.8.1)
mustermann (~> 1.0)
rack (~> 2.0)
rack-protection (= 2.0.7)
rack-protection (= 2.0.8.1)
tilt (~> 2.0)
systemu (2.6.5)
tableschema (1.0.2)
activesupport (~> 5.1.0)
json-schema (~> 2.8.0)
tod (~> 2.1.0)
uuid (~> 2.3.8)
thread_safe (0.3.6)
tilt (2.0.10)
tod (2.1.1)
tzinfo (1.2.6)
thread_safe (~> 0.1)
unf (0.1.4)
unf_ext
unf_ext (0.0.7.5)
unf_ext (0.0.7.6)
uuid (2.3.9)
macaddr (~> 1.0)
zip-zip (0.3)
rubyzip (>= 1.0.0)

@@ -65,6 +115,7 @@ PLATFORMS
DEPENDENCIES
dotenv (~> 2.6.0, >= 2.6.0)
hsds_transformer!
pry (~> 0.12.2, >= 0.12.2)
rack-test (~> 1.1.0, >= 1.1.0)
rb-readline (~> 0.5.5, >= 0.5.5)
rspec (~> 3.8.0, >= 3.8.0)
4 changes: 3 additions & 1 deletion hsds_transformer.gemspec
@@ -20,10 +20,12 @@ Gem::Specification.new do |s|
s.add_development_dependency 'rspec', '~> 3.8.0', '>= 3.8.0'
s.add_development_dependency 'rb-readline', '~> 0.5.5', '>= 0.5.5'
s.add_development_dependency 'rack-test', '~> 1.1.0', '>= 1.1.0'
s.add_development_dependency 'pry', '~> 0.12.2', '>= 0.12.2'

s.add_runtime_dependency 'unf_ext', '~> 0.0.7.5', '>= 0.0.7.5'
s.add_runtime_dependency 'rubyzip', '~> 1.3.0', '>= 1.3.0'
s.add_runtime_dependency 'rubyzip', '~> 2.0.0', '< 2.1.0'
s.add_runtime_dependency 'zip-zip', '~> 0.3', '>= 0.3'
s.add_runtime_dependency 'sinatra', '~> 2.0.5', '>= 2.0.5'
s.add_runtime_dependency 'rest-client', '~> 2.0.2', '>= 2.0.2'
s.add_runtime_dependency 'datapackage', '~> 1.1.1', '>= 1.1.1'
end
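
The `rubyzip` runtime dependency moves from the 1.3 series to the 2.0 series in this hunk. As a rough sketch of which versions the new pair of constraints admits, RubyGems' own `Gem::Requirement` can be queried directly. The candidate versions below are illustrative picks to probe the boundaries; only 1.3.0 and 2.0.0 appear in the PR itself.

```ruby
require "rubygems"

# The two constraints from the new rubyzip line in the gemspec.
requirement = Gem::Requirement.new("~> 2.0.0", "< 2.1.0")

# Illustrative candidate versions (assumptions, chosen to probe the boundaries).
candidates = %w[1.3.0 2.0.0 2.0.9 2.1.0]

# Map each candidate to whether the combined requirement accepts it.
results = candidates.map { |v| [v, requirement.satisfied_by?(Gem::Version.new(v))] }.to_h

results.each { |version, ok| puts "#{version}: #{ok ? 'allowed' : 'rejected'}" }

# The pessimistic operator alone already excludes 2.1.0.
pessimistic_only = Gem::Requirement.new("~> 2.0.0")
puts "~> 2.0.0 alone accepts 2.1.0? #{pessimistic_only.satisfied_by?(Gem::Version.new('2.1.0'))}"
```

Since `~> 2.0.0` already stops short of 2.1.0, the extra `< 2.1.0` appears to restate that bound rather than tighten it.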
1 change: 1 addition & 0 deletions lib/hsds_transformer.rb
@@ -4,6 +4,7 @@
require "zip"
require "zip/zip"
require "rest_client"
require "datapackage"

require "hsds_transformer/version"
require "hsds_transformer/file_paths"
31 changes: 29 additions & 2 deletions lib/hsds_transformer/base_transformer.rb
@@ -138,17 +138,44 @@ def write_output_files
path_var = instance_variable_get "@output_#{model}_path"
write_csv path_var, headers(collection_ivar(model).first, model), collection_ivar(model)
end
write_datapackage_json
end

def write_datapackage_json
package = DataPackage::Package.new

# Is the output path in the file tree of the current directory? If so, we can work with it; if not, we can't.
# Due to "safe" file path requirements in the datapackage-rb library
path_chunks = output_datapackage_path.split(Dir.pwd)
if path_chunks[0] == ""
base_dir, remaining_path = parse_path(path_chunks)
descriptor = package.infer(directory: "#{remaining_path}/data", base_path: base_dir)
content_to_write = descriptor.to_json
else
content_to_write = File.read(default_datapackage_json_path)
end
File.open(output_datapackage_file_path, "wb") { |f| f.write(content_to_write) }
end

# Returns for example: ['tmp', 'input/data']
def parse_path(path_chunks)
path = path_chunks[1]
subpath_chunks = path.split("/")
base_dir = subpath_chunks[1]
remaining_path = subpath_chunks[2..-1].join("/")
[base_dir, remaining_path]
end

def zip_output
input_data_files = Dir.glob(File.join(output_data_path, '**/*'))
input_data_files = Dir.glob(File.join(output_data_path, "**/*"))


File.delete(zipfile_name) if File.exists?(zipfile_name)

Zip::File.open(zipfile_name, Zip::File::CREATE) do |zipfile|

# Add datapackage.json
zipfile.add("datapackage.json", datapackage_json_path)
zipfile.add("datapackage.json", output_datapackage_file_path)

# Add data files
input_data_files.each do |file_path|
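
To make the path handling in `write_datapackage_json` and `parse_path` easier to follow, here is a minimal standalone sketch of the same split logic. The helper name `split_output_path`, the explicit `pwd` argument (standing in for `Dir.pwd`), and the example paths are assumptions made so the snippet runs outside the transformer class.

```ruby
# Hypothetical helper mirroring the split logic in write_datapackage_json and
# parse_path. `pwd` is passed in explicitly instead of calling Dir.pwd.
def split_output_path(output_datapackage_path, pwd)
  path_chunks = output_datapackage_path.split(pwd)
  # A leading "" means the path began with pwd, i.e. it lives under it.
  return nil unless path_chunks[0] == ""

  subpath_chunks = path_chunks[1].split("/")
  base_dir = subpath_chunks[1]                      # first directory below pwd
  remaining_path = subpath_chunks[2..-1].join("/")  # everything after base_dir
  [base_dir, remaining_path]
end

# Example paths are assumptions for illustration only.
inside  = split_output_path("/srv/app/tmp/datapackage", "/srv/app")
outside = split_output_path("/var/out/datapackage", "/srv/app")

p inside   # => ["tmp", "datapackage"]
p outside  # => nil
```

In the first case the diff's code would call `infer` with `directory: "datapackage/data"` and `base_path: "tmp"`; in the second it would fall back to the default `datapackage.json`. Note that `String#split` matches every occurrence of the pattern, so a path that repeats the working directory string further along would produce more than two chunks.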
6 changes: 4 additions & 2 deletions lib/hsds_transformer/file_paths.rb
@@ -3,7 +3,8 @@ module FilePaths
DEFAULT_OUTPUT_PATH = "#{ENV["ROOT_PATH"]}/tmp"
DEFAULT_INPUT_PATH = "#{ENV["ROOT_PATH"]}/"

attr_reader :input_path, :output_path, :output_datapackage_path, :output_data_path, :datapackage_json_path,
attr_reader :input_path, :output_path, :output_datapackage_path, :output_datapackage_file_path,
:output_data_path, :default_datapackage_json_path,
:zipfile_name, :output_organizations_path, :output_locations_path, :output_services_path,
:output_phones_path, :output_physical_addresses_path, :output_postal_addresses_path,
:output_services_at_locations_path, :output_eligibilities_path, :output_contacts_path,
@@ -15,6 +16,7 @@ def set_file_paths(args)
@input_path = args[:input_path] || DEFAULT_INPUT_PATH
@output_path = args[:output_path] || DEFAULT_OUTPUT_PATH
@output_datapackage_path = File.join(output_path, "datapackage")
@output_datapackage_file_path = File.join(output_path, "datapackage/datapackage.json")
@output_data_path = File.join(output_datapackage_path, "data")
@zipfile_name = File.join(output_path, "datapackage.zip")

@@ -34,7 +36,7 @@ def set_file_paths(args)
@output_regular_schedules_path = output_data_path + "/regular_schedules.csv"
@output_service_areas_path = output_data_path + "/service_areas.csv"

@datapackage_json_path = File.join(ENV["ROOT_PATH"], "lib/datapackage/datapackage.json")
@default_datapackage_json_path = File.join(ENV["ROOT_PATH"], "lib/datapackage/datapackage.json")
end
end
end
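
As a quick illustration of the layout these path assignments produce, the sketch below repeats the `File.join` calls from `set_file_paths` with local variables. The `output_path` value is an assumed example, not a path taken from the project.

```ruby
# Assumed example root; in the project this comes from args[:output_path]
# or DEFAULT_OUTPUT_PATH.
output_path = "/srv/app/tmp"

# Same File.join calls as in set_file_paths, using locals instead of ivars.
output_datapackage_path      = File.join(output_path, "datapackage")
output_datapackage_file_path = File.join(output_path, "datapackage/datapackage.json")
output_data_path             = File.join(output_datapackage_path, "data")
zipfile_name                 = File.join(output_path, "datapackage.zip")

puts output_datapackage_file_path
puts output_data_path
puts zipfile_name
```

Under these assumptions the inferred `datapackage.json` sits next to the `data` directory inside `datapackage/`, while the zip file is written one level up.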
4 changes: 2 additions & 2 deletions spec/fixtures/base_transformer/mapping.yaml
@@ -76,7 +76,7 @@ services.csv:
required: true
- model: phones
field: service_id
- model: services_at_location
- model: services_at_locations
field: service_id
- model: eligibilities
field: service_id
@@ -124,7 +124,7 @@ services.csv:
- model: services
field: organization_id
Location ID:
- model: services_at_location
- model: services_at_locations
field: location_id
Income standard for eligibility:
model: eligibilities
1 change: 1 addition & 0 deletions spec/fixtures/base_transformer/output/datapackage.json
@@ -0,0 +1 @@
{"profile":"tabular-data-package","resources":[{"format":"csv","mediatype":"text/csv","name":"regular_schedules","path":"tmp/datapackage/data/regular_schedules.csv","schema":{"fields":[{"name":"id","type":"any","format":"default"},{"name":"service_id","type":"any","format":"default"},{"name":"location_id","type":"integer","format":"default"},{"name":"service_at_location_id","type":"any","format":"default"},{"name":"weekday","type":"any","format":"default"},{"name":"opens_at","type":"time","format":"default"},{"name":"closes_at","type":"time","format":"default"}],"missingValues":[""]},"profile":"tabular-data-resource","encoding":"utf-8"},{"format":"csv","mediatype":"text/csv","name":"phones","path":"tmp/datapackage/data/phones.csv","schema":{"fields":[{"name":"id","type":"any","format":"default"},{"name":"location_id","type":"any","format":"default"},{"name":"service_id","type":"any","format":"default"},{"name":"organization_id","type":"any","format":"default"},{"name":"contact_id","type":"any","format":"default"},{"name":"service_at_location_id","type":"any","format":"default"},{"name":"number","type":"any","format":"default"},{"name":"extension","type":"any","format":"default"},{"name":"type","type":"any","format":"default"},{"name":"language","type":"any","format":"default"},{"name":"description","type":"any","format":"default"}],"missingValues":[""]},"profile":"tabular-data-resource","encoding":"utf-8"},{"format":"csv","mediatype":"text/csv","name":"eligibility","path":"tmp/datapackage/data/eligibility.csv","schema":{"fields":[{"name":"id","type":"any","format":"default"},{"name":"service_id","type":"integer","format":"default"},{"name":"eligibility","type":"any","format":"default"}],"missingValues":[""]},"profile":"tabular-data-resource","encoding":"utf-8"},{"format":"csv","mediatype":"text/csv","name":"postal_addresses","path":"tmp/datapackage/data/postal_addresses.csv","schema":{"fields":[{"name":"id","type":"any","format":"default"},{"name":"location_id","type":"any","format":"default"},{"name":"organization_id","type":"integer","format":"default"},{"name":"attention","type":"any","format":"default"},{"name":"address_1","type":"any","format":"default"},{"name":"city","type":"any","format":"default"},{"name":"region","type":"any","format":"default"},{"name":"state_province","type":"any","format":"default"},{"name":"postal_code","type":"integer","format":"default"},{"name":"country","type":"any","format":"default"}],"missingValues":[""]},"profile":"tabular-data-resource","encoding":"utf-8"},{"format":"csv","mediatype":"text/csv","name":"organizations","path":"tmp/datapackage/data/organizations.csv","schema":{"fields":[{"name":"id","type":"integer","format":"default"},{"name":"name","type":"any","format":"default"},{"name":"alternate_name","type":"any","format":"default"},{"name":"description","type":"any","format":"default"},{"name":"email","type":"any","format":"default"},{"name":"url","type":"any","format":"default"},{"name":"tax_status","type":"any","format":"default"},{"name":"tax_id","type":"any","format":"default"},{"name":"year_incorporated","type":"any","format":"default"},{"name":"legal_status","type":"any","format":"default"}],"missingValues":[""]},"profile":"tabular-data-resource","encoding":"utf-8"},{"format":"csv","mediatype":"text/csv","name":"services_at_location","path":"tmp/datapackage/data/services_at_location.csv","schema":{"fields":[{"name":"id","type":"any","format":"default"},{"name":"service_id","type":"integer","format":"default"},{"name":"location_id","type":"integer","format":"default"},{"name":"description","type":"any","format":"default"}],"missingValues":[""]},"profile":"tabular-data-resource","encoding":"utf-8"},{"format":"csv","mediatype":"text/csv","name":"locations","path":"tmp/datapackage/data/locations.csv","schema":{"fields":[{"name":"id","type":"integer","format":"default"},{"name":"organization_id","type":"integer","format":"default"},{"name":"name","type":"any","format":"default"},{"name":"alternate_name","type":"any","format":"default"},{"name":"description","type":"any","format":"default"},{"name":"transportation","type":"any","format":"default"},{"name":"latitude","type":"any","format":"default"},{"name":"longitude","type":"any","format":"default"}],"missingValues":[""]},"profile":"tabular-data-resource","encoding":"utf-8"},{"format":"csv","mediatype":"text/csv","name":"services","path":"tmp/datapackage/data/services.csv","schema":{"fields":[{"name":"id","type":"integer","format":"default"},{"name":"organization_id","type":"integer","format":"default"},{"name":"program_id","type":"any","format":"default"},{"name":"name","type":"any","format":"default"},{"name":"alternate_name","type":"any","format":"default"},{"name":"description","type":"any","format":"default"},{"name":"url","type":"any","format":"default"},{"name":"email","type":"any","format":"default"},{"name":"status","type":"any","format":"default"},{"name":"interpretation_services","type":"any","format":"default"},{"name":"application_process","type":"any","format":"default"},{"name":"wait_time","type":"any","format":"default"},{"name":"fees","type":"any","format":"default"},{"name":"accreditations","type":"any","format":"default"},{"name":"licenses","type":"any","format":"default"}],"missingValues":[""]},"profile":"tabular-data-resource","encoding":"utf-8"}]}
Collaborator Author
I would like to pretty-print this, but I didn't know how to get the tests to ignore whitespace when comparing it with the actual output file.
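
One possible way to make such a comparison insensitive to whitespace is to compare the parsed JSON structures instead of the raw strings. The sketch below uses only Ruby's standard `json` library; the small `compact` document is a stand-in for the fixture, not its real contents.

```ruby
require "json"

# Stand-in for the compact fixture (assumed, shortened contents).
compact = '{"profile":"tabular-data-package","resources":[{"name":"phones","format":"csv"}]}'

# Pretty-printed version of the same document, as a fixture might be stored.
pretty = JSON.pretty_generate(JSON.parse(compact))

# Raw strings differ because of the added newlines and indentation...
strings_equal = compact == pretty
# ...but the parsed hashes and arrays are the same.
parsed_equal = JSON.parse(compact) == JSON.parse(pretty)

puts "raw strings equal: #{strings_equal}"
puts "parsed structures equal: #{parsed_equal}"
```

In an RSpec example this could take the form `expect(JSON.parse(actual)).to eq(JSON.parse(expected))`, where `actual` and `expected` are hypothetical names for the two file contents. Hash key order is ignored by this comparison, but array order (for instance the order of `resources`) still matters.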
