Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid Sqlalchemy errors from prior harvest rollbacks #445

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

bonnland
Copy link
Contributor

@bonnland bonnland commented May 18, 2021

I'm leaving the original description below the divider. I realize now that the proposed code change doesn't really address the root problem: why are rollbacks happening in the first place?

I managed to test harvesting of our records in a remote debugging session using the harvester run_test command. No rollbacks happen when I do this. It makes me wonder if the queue states are not quite right.

I have one theory:

  • Invalid records (those marked as ERROR) are currently being resubmitted to the fetch queue.
  • The next harvest will add all WAF records, so the fetch queue will have two instances of the invalid record in the queue.
  • Somehow processing the same record twice creates this rollback exception.

Does this sound plausible? If so, the simple solution is to stop re-submitting invalid records to the fetch queue.


Original Description of PR

This may be addressing the same issue as #312 and may be related to #151.

The WAFs that we harvest often contain metadata records that fail our extra validation rules.

Here's the behavior I'm seeing with ckanext-harvest version 1.3.3, when we harvest a WAF containing records that fail validation:

  • When the harvest processes have been newly started, the first harvest job is successfully marked as finished and an error email is sent.
  • A second harvest of the same WAF causes this stack trace in gather_err.log:
2021-05-18 09:31:21,153 DEBUG [ckanext.harvest.queue] Received harvest job id: 04c1c1db-087d-436b-88b1-ea9e910db1c3
Traceback (most recent call last):
  File "/usr/lib/ckan/default/bin/paster", line 8, in <module>
    sys.exit(run())
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
    result = self.command()
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 235, in command
    utils.gather_consumer()
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/utils.py", line 336, in gather_consumer
    gather_callback(consumer, method, header, body)
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/queue.py", line 347, in gather_callback
    job = HarvestJob.get(id)
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/model/__init__.py", line 116, in get
    o = cls.filter(**kwds).first()
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2755, in first
    ret = list(self[0:1])
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2547, in __getitem__
    return list(res)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2855, in __iter__
    return self._execute_and_instances(context)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2876, in _execute_and_instances
    close_with_result=True)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2885, in _get_bind_args
    **kw
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2867, in _connection_from_session
    conn = self.session.connection(**kw)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 998, in connection
    execution_options=execution_options)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1003, in _connection_for_bind
    engine, execution_options)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 382, in _connection_for_bind
    self._assert_active()
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 276, in _assert_active
    % self._rollback_exception
sqlalchemy.exc.InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.DatabaseError) could not receive data from server: Connection timed out
 [SQL: 'INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)'] [parameters: {'content': u'Received harvest job id: 04c1c1db-087d-436b-88b1-ea9e910db1c3', 'created': datetime.datetime(2021, 5, 18, 15, 31, 21, 155237), 'id': u'c4225513-9d7f-48ff-b9c5-4363d549054e', 'level': 'DEBUG'}]

This harvest job eventually times out, though sometimes many hours after the timeout value of 5 minutes that we have set. We are 6 hours behind UTC time zone, I don't know if this matters.

If there are no invalid records in the WAF, then the second harvest job completes successfully. It is only when re-harvesting a WAF with one or more invalid records that the second harvest produces this stack trace and harvest job that remains stuck for hours.

@bonnland
Copy link
Contributor Author

bonnland commented May 19, 2021

This seems related:

#151 (comment)

If validation errors cause a session rollback, then is there a problem with adding a Session.close() right after it?

@bonnland bonnland changed the title Validate before DB transaction boundary Avoid Sqlalchemy errors from rollback May 20, 2021
@bonnland bonnland changed the title Avoid Sqlalchemy errors from rollback Avoid Sqlalchemy errors from prior harvest rollbacks May 20, 2021
@bonnland
Copy link
Contributor Author

bonnland commented May 20, 2021

Sorry for the extra commits; I'm relatively new to PRs.

@@ -260,6 +260,7 @@ def create(cls, message, object, stage=u'Fetch', line=None):
# Clear any in-progress sqlalchemy transactions
try:
Session.rollback()
Session.close()
Copy link
Contributor Author

@bonnland bonnland May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if Session.remove() is somehow different/better than Session.close().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant