Handle BOMs when loading HRA criteria tables #1461
Changes from 5 commits
753bcb3
dcf897b
3c07a04
b19b946
4dbe7f5
0bc66be
@@ -326,6 +326,41 @@ def test_criteria_table_parsing(self):
        pandas.testing.assert_frame_equal(
            expected_composite_dataframe, composite_dataframe)

    def test_criteria_table_parsing_with_bom(self):
        """HRA: criteria table - parse a BOM."""
        from natcap.invest import hra

        criteria_table_path = os.path.join(self.workspace_dir, 'criteria.csv')
        with open(criteria_table_path, 'w', encoding='utf-8-sig') as criteria_table:
            criteria_table.write(
                textwrap.dedent(
                    """\
                    HABITAT NAME,eelgrass,,,hardbottom,,,CRITERIA TYPE
                    HABITAT RESILIENCE ATTRIBUTES,RATING,DQ,WEIGHT,RATING,DQ,WEIGHT,E/C
                    recruitment rate,2,2,2,2,2,2,C
                    connectivity rate,2,2,2,2,2,2,C
                    ,,,,,,,
                    HABITAT STRESSOR OVERLAP PROPERTIES,,,,,,,
                    oil,RATING,DQ,WEIGHT,RATING,DQ,WEIGHT,E/C
                    frequency of disturbance,2,2,3,2,2,3,C
                    management effectiveness,2,2,1,2,2,1,E
                    ,,,,,,,
                    fishing,RATING,DQ,WEIGHT,RATING,DQ,WEIGHT,E/C
                    frequency of disturbance,2,2,3,2,2,3,C
                    management effectiveness,2,2,1,2,2,1,E
                    """
                ))

        # Sanity check: make sure the file has the expected BOM
        bom_char = "\uFEFF"  # byte-order marker in 16-bit hex value
        with open(criteria_table_path) as criteria_table:
            assert criteria_table.read().startswith(bom_char)

This bit is still failing on Windows and I think it's because of the following from Python's codec docs (https://docs.python.org/3/library/codecs.html#encodings-and-unicode)

So maybe it's being stripped out when opening and decoding? Although this contradicts what I'm seeing in my terminal:

>>> with open("test-bom.csv", 'w', encoding='utf-8-sig') as my_table:
...     my_table.write("HABITAT NAME,eelgrass,,,hardbottom,,,CRITERIA TYPE")
>>>
>>> with open("test-bom.csv") as table:
...     print(table.read())
...
HABITAT NAME,eelgrass,,,hardbottom,,,CRITERIA TYPE

Which makes sense because:

So, I'm not sure why the test assertion error looks like the BOM is being decoded and stripped, whereas when I open and read the file myself I get the funky character mapping...
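
One possible explanation for the platform difference, offered as a sketch rather than a confirmed diagnosis: `open()` without an `encoding` argument decodes with the platform's locale encoding. Assuming that is UTF-8 on the Linux/macOS runners and a legacy code page such as cp1252 on Windows, the three BOM bytes decode to U+FEFF in the first case and to the three characters `ï»¿` in the second, so `startswith("\uFEFF")` would only hold in the first. The snippet below forces each encoding explicitly to show the difference; the file name and the choice of cp1252 are assumptions for illustration only.

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as workspace_dir:
    # Hypothetical file name, mirroring the terminal session above.
    table_path = os.path.join(workspace_dir, 'test-bom.csv')
    with open(table_path, 'w', encoding='utf-8-sig') as table:
        table.write('HABITAT NAME,eelgrass')

    # Raw bytes do not depend on any locale setting.
    with open(table_path, 'rb') as table:
        raw_bytes = table.read()
    has_bom_bytes = raw_bytes.startswith(codecs.BOM_UTF8)

    # Plain UTF-8 keeps the BOM as the single character U+FEFF.
    with open(table_path, encoding='utf-8') as table:
        text_utf8 = table.read()

    # Assumption: cp1252 stands in for a Windows default code page.
    # It maps the bytes EF BB BF to three separate characters.
    with open(table_path, encoding='cp1252') as table:
        text_cp1252 = table.read()

print(has_bom_bytes)
print(text_utf8.startswith('\ufeff'))
print(text_cp1252.startswith('\ufeff'))
```

If that is the cause, comparing the raw bytes against `codecs.BOM_UTF8`, or passing `encoding='utf-8'` in the sanity check, would make the assertion independent of the platform default.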

        target_composite_csv_path = os.path.join(self.workspace_dir,
                                                 'composite.csv')
        hra._parse_criteria_table(criteria_table_path,
                                  target_composite_csv_path)

    def test_criteria_table_file_not_found(self):
        """HRA: criteria table - spatial file not found."""
        from natcap.invest import hra

utf-8-sig includes the BOM when writing the file. Cool!
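
On the reading side, the same codec is tolerant: decoding with `utf-8-sig` drops a leading BOM if one is present and otherwise behaves like plain UTF-8. A minimal sketch of a loader that relies on this follows. `read_header` is a hypothetical helper for illustration and is not how `hra._parse_criteria_table` is implemented.

```python
import csv
import os
import tempfile


def read_header(path):
    """Return the first CSV row, ignoring a leading UTF-8 BOM if present.

    Hypothetical helper, not part of natcap.invest.
    """
    with open(path, encoding='utf-8-sig', newline='') as table:
        return next(csv.reader(table))


with tempfile.TemporaryDirectory() as workspace_dir:
    headers = {}
    # Write the same header row once without and once with a BOM.
    for write_encoding in ('utf-8', 'utf-8-sig'):
        path = os.path.join(workspace_dir, f'{write_encoding}.csv')
        with open(path, 'w', encoding=write_encoding, newline='') as table:
            table.write('HABITAT NAME,eelgrass\n')
        headers[write_encoding] = read_header(path)

# Both files should yield the same first cell, with no stray U+FEFF.
print(headers['utf-8'] == headers['utf-8-sig'])
```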