Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDK: allow repeated cache file removals & cleanup cache files in test #19533

Closed
wants to merge 3 commits into from

Conversation

sherifnada
Copy link
Contributor

@sherifnada sherifnada commented Nov 17, 2022

What

There was a bunch of cache files generated in unit tests which polluted the git workspace:

(.venv) ➜  python git:(sherif/gitignore-cdk-tmp-files) ✗ git status
On branch sherif/gitignore-cdk-tmp-files
Your branch is up to date with 'origin/sherif/gitignore-cdk-tmp-files'.

...
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	cache_http_stream.sqlite
	cache_http_stream_with_slices.sqlite

This PR cleans them up.

In the process, it removes a restriction (which I'm not sure why it existed) that only allowed a single removal of a stream's cache file during the lifetime of the Python runtime

@sherifnada sherifnada requested a review from a team as a code owner November 17, 2022 07:20
@octavia-squidington-iv octavia-squidington-iv added the CDK Connector Development Kit label Nov 17, 2022
@sherifnada sherifnada requested a review from grubberr November 17, 2022 07:22
@@ -1,4 +1,6 @@
# Changelog
## 0.9.5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.9.5 since there is another PR claiming 0.9.4 already

@@ -68,11 +68,9 @@ def clear_cache(self):
"""
remove cache file only once
"""
STREAM_CACHE_FILES = globals().setdefault("STREAM_CACHE_FILES", set())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grubberr tagging you since you had the git blame: why did this block only allow a single removal of the cache file? the current change is passing unit tests. Is that an issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada oops I have answered on below

STREAM_CACHE_FILES.add(self.cache_filename)
with suppress(FileNotFoundError):
os.remove(self.cache_filename)
print(f"Removed {self.cache_filename}")
Copy link
Contributor

@grubberr grubberr Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada

This can be dangerous, for example:

  1. We create Teams stream, on start it removes cache file and create it - file-inode-1
  2. We create TeamMembers(parent=Teams), on start it again removes cache file and create it file-inode-2
  3. Teams assume it has access to file-inode-1 but in realy it's already file-inode-2

I definitely remember there were some side-effects if we removed file not once

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We start to catch sqlite3 runtime exceptions

Copy link
Contributor

@grubberr grubberr Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada let me cover that runtime side-effects with unit_tests today

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada example of problem from source-github:

File "/home/user/airbyte/airbyte-integrations/connectors/source-github/.venv/lib/python3.9/site-packages/requests_cache/backends/base.py", line 100, in save_response
    self.responses[cache_key] = cached_response
  File "/home/user/airbyte/airbyte-integrations/connectors/source-github/.venv/lib/python3.9/site-packages/requests_cache/backends/sqlite.py", line 268, in __setitem__
    super().__setitem__(key, serialized_value)
  File "/home/user/airbyte/airbyte-integrations/connectors/source-github/.venv/lib/python3.9/site-packages/requests_cache/backends/sqlite.py", line 220, in __setitem__
    con.execute(
sqlite3.OperationalError: attempt to write a readonly database

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada

I have dived into request_cache internals and it seems it happens something like this under the hood:

#!/home/user/airbyte/airbyte-integrations/connectors/source-github/.venv/bin/python3

import os
import sqlite3

try:
    os.unlink("cache.sqlite")
except FileNotFoundError:
    pass

con = sqlite3.connect("cache.sqlite")
row = con.execute('CREATE TABLE t (col VARCHAR)')
con.commit()

os.unlink("cache.sqlite") # <- ATTENTION HERE

con.execute("INSERT INTO t (col) VALUES ('value')");
Traceback (most recent call last):
  File "/home/user/airbyte/airbyte-integrations/connectors/source-github/./t.py", line 28, in <module>
    con.execute("INSERT INTO t (col) VALUES ('value')");
sqlite3.OperationalError: attempt to write a readonly database

@sherifnada sherifnada marked this pull request as draft January 10, 2023 00:19
@sherifnada sherifnada removed the request for review from a team January 10, 2023 00:20
@keu keu self-assigned this Sep 20, 2023
@keu
Copy link
Contributor

keu commented Oct 4, 2023

fixed with #30719

@keu keu closed this Oct 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants