Skip to content

Commit

Permalink
Merge branch 'qldGovAuToCkan' into feature/badges
Browse files Browse the repository at this point in the history
  • Loading branch information
ThrawnCA authored Oct 15, 2024
2 parents e7eee6d + ded839c commit 2f817af
Show file tree
Hide file tree
Showing 16 changed files with 395 additions and 16 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,8 @@ jobs:

steps:
- uses: actions/checkout@v4
- if: ${{ matrix.ckan-version == 2.9 }}
run: pip install "setuptools>=44.1.0,<71"
- name: Install requirements
run: |
pip install -r requirements.txt
Expand Down
62 changes: 62 additions & 0 deletions CHANGELOG
Original file line number Diff line number Diff line change
@@ -1,3 +1,65 @@
1.1.0 2024-10-15
================


Fixes:

* add README note about running on separate server, #191 by @ThrawnCA in https://github.com/ckan/ckanext-xloader/pull/192
* Use IDomainObjectModification Implementation by @JVickery-TBS in https://github.com/ckan/ckanext-xloader/pull/198
* Hide excessive numbers of resource_data log entries, #187 by @ThrawnCA in https://github.com/ckan/ckanext-xloader/pull/188
* #182 Type guessing fixes by @ThrawnCA in https://github.com/ckan/ckanext-xloader/pull/186
* Document the ckan.download_proxy setting, #176 by @ThrawnCA in https://github.com/ckan/ckanext-xloader/pull/204
* Conditional DataStore Tab in Resource Edit by @JVickery-TBS in https://github.com/ckan/ckanext-xloader/pull/190
* Make locking behaviour more robust by @ThrawnCA in https://github.com/ckan/ckanext-xloader/pull/205
* Delete Datastore Table Button by @JVickery-TBS in https://github.com/ckan/ckanext-xloader/pull/197
* Quality of life improvements by @duttonw in https://github.com/ckan/ckanext-xloader/pull/195
* Clean Datastore Tables Job by @JVickery-TBS in https://github.com/ckan/ckanext-xloader/pull/196
* strip extra space for column name by @mutantsan in https://github.com/ckan/ckanext-xloader/pull/210
* Skip empty lines instead of erroring by @ThrawnCA in https://github.com/ckan/ckanext-xloader/pull/208
* add more options for maintainers to expedite XLoader runs, GitHub #202 by @ThrawnCA in https://github.com/ckan/ckanext-xloader/pull/212
* Add Mixed Integers Type Guessing to NUMERIC Tests by @JVickery-TBS in https://github.com/ckan/ckanext-xloader/pull/217
* PY2 & PY3 String/Binary Fixes by @JVickery-TBS in https://github.com/ckan/ckanext-xloader/pull/203
* In plugin.py, there is an fix of resource format key error by @Nisha1293 in https://github.com/ckan/ckanext-xloader/pull/209
* CKAN 2.11 support by @amercader in https://github.com/ckan/ckanext-xloader/pull/220

New Contributors:

* @JVickery-TBS made their first contribution in https://github.com/ckan/ckanext-xloader/pull/198
* @duttonw made their first contribution in https://github.com/ckan/ckanext-xloader/pull/195
* @mutantsan made their first contribution in https://github.com/ckan/ckanext-xloader/pull/210
* @Nisha1293 made their first contribution in https://github.com/ckan/ckanext-xloader/pull/209

**Full Changelog**: https://github.com/ckan/ckanext-xloader/compare/1.0.1...1.1.0


1.0.1 2024-04-04
================

Fixes:

* Include config_declaration.yaml into MANIFEST by @pdelboca in https://github.com/ckan/ckanext-xloader/pull/183


**Full Changelog**: https://github.com/ckan/ckanext-xloader/compare/1.0.0...1.0.1

1.0.1 2024-04-04
================

Fixes:

* Fixed date parsing while fetching entries for task_status by @muhammed-ajmal in https://github.com/ckan/ckanext-xloader/pull/179
* Drop support for old CKAN versions and add CSRF support by @pdelboca in https://github.com/ckan/ckanext-xloader/pull/180
* Refactor test_jobs.py by @pdelboca in https://github.com/ckan/ckanext-xloader/pull/181

New Contributors:

* @muhammed-ajmal made their first contribution in https://github.com/ckan/ckanext-xloader/pull/179

**Full Changelog**: https://github.com/ckan/ckanext-xloader/compare/0.12.2...1.0.0




0.9.0 2021-10-01
================

Expand Down
2 changes: 1 addition & 1 deletion README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -128,7 +128,7 @@ CKAN version Compatibility
=============== =============
2.7 no longer supported (last supported version: 0.12.2)
2.8 no longer supported (last supported version: 0.12.2)
2.9 yes (Python3) (last supported version for Python 2.7: 0.12.2))
2.9 yes (Python3) (last supported version for Python 2.7: 0.12.2)), Must: ``pip install "setuptools>=44.1.0,<71"``
2.10 yes
2.11 yes
=============== =============
Expand Down
17 changes: 17 additions & 0 deletions ckanext/xloader/config_declaration.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,23 @@ groups:
to True.
type: bool
required: false
- key: ckanext.xloader.validation.requires_successful_report
default: False
example: True
description: |
Resources are required to pass Validation from the ckanext-validation
plugin to be able to get XLoadered.
type: bool
required: false
- key: ckanext.xloader.validation.enforce_schema
default: True
example: False
description: |
Resources are expected to have a Validation Schema, or use the default ones if not.
If this option is set to `False`, Resources that do not have
a Validation Schema will be treated like they do not require Validation.
See https://github.com/frictionlessdata/ckanext-validation?tab=readme-ov-file#data-schema
for more details.
- key: ckanext.xloader.clean_datastore_tables
default: False
example: True
Expand Down
8 changes: 7 additions & 1 deletion ckanext/xloader/jobs.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@ def xloader_data_into_datastore(input):
if tries < MAX_RETRIES:
tries = tries + 1
log.info("Job %s failed due to temporary error [%s], retrying", job_id, e)
logger.info("Job failed due to temporary error [%s], retrying", e)
job_dict['status'] = 'pending'
job_dict['metadata']['tries'] = tries
enqueue_job(
Expand Down Expand Up @@ -245,7 +246,12 @@ def tabulator_load():
logger.info("'use_type_guessing' mode is: %s", use_type_guessing)
try:
if use_type_guessing:
tabulator_load()
try:
tabulator_load()
except JobError as e:
logger.warning('Load using tabulator failed: %s', e)
logger.info('Trying again with direct COPY')
direct_load()
else:
try:
direct_load()
Expand Down
39 changes: 36 additions & 3 deletions ckanext/xloader/loader.py
Original file line number Diff line number Diff line change
Expand Up @@ -118,8 +118,8 @@ def _clear_datastore_resource(resource_id):
'''
engine = get_write_engine()
with engine.begin() as conn:
conn.execute("SET LOCAL lock_timeout = '5s'")
conn.execute('TRUNCATE TABLE "{}"'.format(resource_id))
conn.execute("SET LOCAL lock_timeout = '15s'")
conn.execute('TRUNCATE TABLE "{}" RESTART IDENTITY'.format(resource_id))


def load_csv(csv_filepath, resource_id, mimetype='text/csv', logger=None):
Expand Down Expand Up @@ -339,6 +339,18 @@ def create_column_indexes(fields, resource_id, logger):
logger.info('...column indexes created.')


def _save_type_overrides(headers_dicts):
# copy 'type' to 'type_override' if it's not the default type (text)
# and there isn't already an override in place
for h in headers_dicts:
if h['type'] != 'text':
if 'info' in h:
if 'type_override' not in h['info']:
h['info']['type_override'] = h['type']
else:
h['info'] = {'type_override': h['type']}


def load_table(table_filepath, resource_id, mimetype='text/csv', logger=None):
'''Loads an Excel file (or other tabular data recognized by tabulator)
into Datastore and creates indexes.
Expand Down Expand Up @@ -399,7 +411,14 @@ def load_table(table_filepath, resource_id, mimetype='text/csv', logger=None):
}.get(existing_info.get(h, {}).get('type_override'), t)
for t, h in zip(types, headers)]

headers = [header.strip()[:MAX_COLUMN_LENGTH] for header in headers if header.strip()]
# Strip leading and trailing whitespace, then truncate to maximum length,
# then strip again in case the truncation exposed a space.
headers = [
header.strip()[:MAX_COLUMN_LENGTH].strip()
for header in headers
if header and header.strip()
]
header_count = len(headers)
type_converter = TypeConverter(types=types)

with UnknownEncodingStream(table_filepath, file_format, decoding_result,
Expand All @@ -409,6 +428,17 @@ def row_iterator():
for row in stream:
data_row = {}
for index, cell in enumerate(row):
# Handle files that have extra blank cells in heading and body
# eg from Microsoft Excel adding lots of empty cells on export.
# Blank header cells won't generate a column,
# so row length won't match column count.
if index >= header_count:
# error if there's actual data out of bounds, otherwise ignore
if cell:
raise LoaderError("Found data in column %s but resource only has %s header(s)",
index + 1, header_count)
else:
continue
data_row[headers[index]] = cell
yield data_row
result = row_iterator()
Expand All @@ -426,6 +456,9 @@ def row_iterator():
if type_override in list(_TYPE_MAPPING.values()):
h['type'] = type_override

# preserve any types that we have sniffed unless told otherwise
_save_type_overrides(headers_dicts)

logger.info('Determined headers and types: %s', headers_dicts)

'''
Expand Down
40 changes: 38 additions & 2 deletions ckanext/xloader/plugin.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,11 +7,16 @@

from ckan.model.domain_object import DomainObjectOperation
from ckan.model.resource import Resource
from ckan.model.package import Package

from . import action, auth, helpers as xloader_helpers, utils
from ckanext.xloader.utils import XLoaderFormats

try:
from ckanext.validation.interfaces import IPipeValidation
HAS_IPIPE_VALIDATION = True
except ImportError:
HAS_IPIPE_VALIDATION = False

try:
config_declarations = toolkit.blanket.config_declarations
except AttributeError:
Expand All @@ -34,6 +39,8 @@ class xloaderPlugin(plugins.SingletonPlugin):
plugins.implements(plugins.IResourceController, inherit=True)
plugins.implements(plugins.IClick)
plugins.implements(plugins.IBlueprint)
if HAS_IPIPE_VALIDATION:
plugins.implements(IPipeValidation)

# IClick
def get_commands(self):
Expand Down Expand Up @@ -69,6 +76,21 @@ def configure(self, config_):
)
)

# IPipeValidation

def receive_validation_report(self, validation_report):
if utils.requires_successful_validation_report():
res_dict = toolkit.get_action('resource_show')({'ignore_auth': True},
{'id': validation_report.get('resource_id')})
if (toolkit.asbool(toolkit.config.get('ckanext.xloader.validation.enforce_schema', True))
or res_dict.get('schema', None)) and validation_report.get('status') != 'success':
# A schema is present, or required to be present
return
# if validation is running in async mode, it is running from the redis workers.
# thus we need to do sync=True to have Xloader put the job at the front of the queue.
sync = toolkit.asbool(toolkit.config.get(u'ckanext.validation.run_on_update_async', True))
self._submit_to_xloader(res_dict, sync=sync)

# IDomainObjectModification

def notify(self, entity, operation):
Expand Down Expand Up @@ -96,7 +118,16 @@ def notify(self, entity, operation):
if _should_remove_unsupported_resource_from_datastore(resource_dict):
toolkit.enqueue_job(fn=_remove_unsupported_resource_from_datastore, args=[entity.id])

if not getattr(entity, 'url_changed', False):
if utils.requires_successful_validation_report():
# If the resource requires validation, stop here if validation
# has not been performed or did not succeed. The Validation
# extension will call resource_patch and this method should
# be called again. However, url_changed will not be in the entity
# once Validation does the patch.
log.debug("Deferring xloading resource %s because the "
"resource did not pass validation yet.", resource_dict.get('id'))
return
elif not getattr(entity, 'url_changed', False):
# do not submit to xloader if the url has not changed.
return

Expand All @@ -105,6 +136,11 @@ def notify(self, entity, operation):
# IResourceController

def after_resource_create(self, context, resource_dict):
if utils.requires_successful_validation_report():
log.debug("Deferring xloading resource %s because the "
"resource did not pass validation yet.", resource_dict.get('id'))
return

self._submit_to_xloader(resource_dict)

def before_resource_show(self, resource_dict):
Expand Down
2 changes: 1 addition & 1 deletion ckanext/xloader/templates/package/resource_read.html
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
{{ super() }}
{% endblock %}

{% block action_manage_inner %}
{% block action_manage %}
{{ super() }}
{% if h.is_resource_supported_by_xloader(res) %}
<li>{% link_for _('DataStore'), named_route='xloader.resource_data', id=pkg.name, resource_id=res.id, class_='btn btn-light', icon='cloud-upload' %}</li>
Expand Down
8 changes: 4 additions & 4 deletions ckanext/xloader/tests/samples/sample_with_blanks.csv
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
Funding agency,Program title,Opening date,Service ID
DTIS,Visitor First Experiences Fund,23/03/2023,63039
DTIS,First Nations Sport and Recreation Program Round 2,22/03/2023,63040
,,,63041
Funding agency,Program title,Opening date,Service ID
DTIS,Visitor First Experiences Fund,23/03/2023,63039
DTIS,First Nations Sport and Recreation Program Round 2,22/03/2023,63040
,,,63041
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Agency (Dept or Stat Body),Agency address,Contract description/name,Award contract date,Contract value,Supplier name,Supplier address,Variation to contract (Yes/No),Specific confidentiality provision used,Procurement method,Reason for Limited tender,Form of contract,Number of offers sought,Evaluation criteria and weightings,Deliverables,Contract milestones,Contract performance management,,,,,,,,,,,,,,,
State-wide Operations,"111 Easy St, Duckburg, 40000",con_12345-Social services,01/01/1970,"$123,456",LexCorp,123 Example St ELEMENT CITY 4444,No,No,Selective,,,,,,,,,,,,,,,,,,,,,,
61 changes: 61 additions & 0 deletions ckanext/xloader/tests/test_loader.py
Original file line number Diff line number Diff line change
Expand Up @@ -961,6 +961,31 @@ def test_simple(self, Session):
u"numeric",
u"text",
]
# Check that the sniffed types have been recorded as overrides
rec = p.toolkit.get_action("datastore_search")(
None, {"resource_id": resource_id, "limit": 0}
)
fields = [f for f in rec["fields"] if not f["id"].startswith("_")]
assert fields[0].get("info", {}).get("type_override", "") == "timestamp"
assert fields[1].get("info", {}).get("type_override", "") == "numeric"
assert fields[2].get("info", {}).get("type_override", "") == ""

def test_simple_large_file(self, Session):
csv_filepath = get_sample_filepath("simple-large.csv")
resource = factories.Resource()
resource_id = resource['id']
loader.load_table(
csv_filepath,
resource_id=resource_id,
mimetype="text/csv",
logger=logger,
)
assert self._get_column_types(Session, resource_id) == [
u"int4",
u"tsvector",
u"numeric",
u"text",
]

def test_simple_large_file(self, Session):
csv_filepath = get_sample_filepath("simple-large.csv")
Expand Down Expand Up @@ -1242,6 +1267,30 @@ def test_no_entries(self):
logger=logger,
)

def test_with_blanks(self, Session):
csv_filepath = get_sample_filepath("sample_with_blanks.csv")
resource = factories.Resource()
resource_id = resource['id']
loader.load_table(
csv_filepath,
resource_id=resource_id,
mimetype="text/csv",
logger=logger,
)
assert len(self._get_records(Session, resource_id)) == 3

def test_with_empty_lines(self, Session):
csv_filepath = get_sample_filepath("sample_with_empty_lines.csv")
resource = factories.Resource()
resource_id = resource['id']
loader.load_table(
csv_filepath,
resource_id=resource_id,
mimetype="text/csv",
logger=logger,
)
assert len(self._get_records(Session, resource_id)) == 6

def test_with_quoted_commas(self, Session):
csv_filepath = get_sample_filepath("sample_with_quoted_commas.csv")
resource = factories.Resource()
Expand All @@ -1266,6 +1315,18 @@ def test_with_iso_8859_1(self, Session):
)
assert len(self._get_records(Session, resource_id)) == 266

def test_with_extra_blank_cells(self, Session):
csv_filepath = get_sample_filepath("sample_with_extra_blank_cells.csv")
resource = factories.Resource()
resource_id = resource['id']
loader.load_table(
csv_filepath,
resource_id=resource_id,
mimetype="text/csv",
logger=logger,
)
assert len(self._get_records(Session, resource_id)) == 1

def test_with_mixed_quotes(self, Session):
csv_filepath = get_sample_filepath("sample_with_mixed_quotes.csv")
resource = factories.Resource()
Expand Down
Loading

0 comments on commit 2f817af

Please sign in to comment.