chore: test cli not lambda handler #204

Contributor

@Askir Askir commented Nov 7, 2024

https://linear.app/timescale/issue/AI-83/vectorizer-tests-target-the-lambda-handler-and-not-the-cli

Also removing the lambda handler from the code base in the process.
I honestly think the fixtures are trying to be a bit too generic, which makes them harder to read than they need to be, but I didn't change that model for now.

@Askir Askir force-pushed the jascha/ai-83-vectorizer-tests-target-the-lambda-handler-and-not-the-cli branch from 61e8443 to 6b80e1c Compare November 8, 2024 02:06
@Askir Askir marked this pull request as ready for review November 8, 2024 02:13
@Askir Askir requested a review from a team as a code owner November 8, 2024 02:13
@Askir Askir force-pushed the jascha/ai-83-vectorizer-tests-target-the-lambda-handler-and-not-the-cli branch from 6b80e1c to f815594 Compare November 8, 2024 02:19
Comment on lines 20 to 21
with PostgresContainer("timescale/timescaledb-ha:pg16", driver=None) as postgres:
    yield postgres
Member

You could already configure the username, password, and db when creating the container. This streamlines some of the later stuff (will highlight).

Suggested change
- with PostgresContainer("timescale/timescaledb-ha:pg16", driver=None) as postgres:
-     yield postgres
+ with PostgresContainer(
+     "timescale/timescaledb-ha:pg16",
+     username="tsdbquerier",
+     password="my-password",
+     dbname="tsdb",
+     driver=None) as postgres:
+     yield postgres

"""Creates a test database with pgai installed"""
role = "tsdbquerier"
password = "my-password"
db_host = cli_postgres_container._docker.host() # type: ignore
Member

Do you know if this is a docker for mac workaround, and if it's necessary? I'm not sure it is.

Contributor

We need that to run the tests from the GitHub Actions runner containers.

Comment on lines 29 to 30
role = "tsdbquerier"
password = "my-password"
Member

These could be removed with the change above

Comment on lines 52 to 61
"event_db_config": {
    "host": db_host,
    "port": int(
        cli_postgres_container.get_exposed_port(cli_postgres_container.port)
    ),
    "db_name": cli_postgres_container.dbname,
    "ssl_mode": "disable",
    "role": role,
    "password": password,
},
Member

The whole event_db_config field could be dropped. It's only used in cli_db_url, which could use cli_db["container"].get_connection_url().

Comment on lines 39 to 44
conn.execute(sql.SQL("DROP USER IF EXISTS {}").format(sql.Identifier(role)))
conn.execute(
    sql.SQL("CREATE USER {} WITH SUPERUSER PASSWORD {}").format(
        sql.Identifier(role), sql.Literal(password)
    )
)
Member

Not necessary if we create the user with the container.

Comment on lines 68 to 69
config = cli_db["event_db_config"]
return f"postgres://{config['role']}:{config['password']}@{config['host']}:{config['port']}/{config['db_name']}?sslmode={config['ssl_mode']}"
Member

As mentioned above, this could become:

Suggested change
- config = cli_db["event_db_config"]
- return f"postgres://{config['role']}:{config['password']}@{config['host']}:{config['port']}/{config['db_name']}?sslmode={config['ssl_mode']}"
+ return cli_db["container"].get_connection_url()

Or (if we want to keep the custom "host" thing which I don't understand):

Suggested change
- config = cli_db["event_db_config"]
- return f"postgres://{config['role']}:{config['password']}@{config['host']}:{config['port']}/{config['db_name']}?sslmode={config['ssl_mode']}"
+ container = cli_db["container"]
+ return container.get_connection_url(container.get_docker_client().host())

Comment on lines +90 to +98
# Cleanup from previous runs
cur.execute("SELECT id FROM ai.vectorizer")
for row in cur.fetchall():
    cur.execute("SELECT ai.drop_vectorizer(%s)", (row["id"],))

# Drop tables if they exist
cur.execute("DROP VIEW IF EXISTS blog_embedding")
cur.execute("DROP TABLE IF EXISTS blog_embedding_store")
cur.execute("DROP TABLE IF EXISTS blog")
Member

I wish we didn't have to manually clean up. One idea would be to create a new database for each test, so we know that we're dealing with a fresh db. We can do it as a small follow-up PR though.
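
For illustration, the database-per-test idea could look roughly like the sketch below. It is not code from this PR: the names `fresh_db_name`, `url_for_database`, and `temporary_database` are hypothetical, and the `execute` callable is assumed to run a single SQL statement on an autocommit connection to the container (PostgreSQL does not allow `CREATE DATABASE` inside a transaction block). A function-scoped pytest fixture could wrap the context manager and yield the URL to the CLI under test.

```python
import uuid
from contextlib import contextmanager
from urllib.parse import urlsplit, urlunsplit


def fresh_db_name(prefix="test"):
    """Return a unique, identifier-safe database name (hypothetical helper)."""
    return f"{prefix}_{uuid.uuid4().hex[:12]}"


def url_for_database(base_url, dbname):
    """Point an existing connection URL at a different database.

    Only the path component changes; credentials, host, port, and query
    parameters such as sslmode are preserved.
    """
    parts = urlsplit(base_url)
    return urlunsplit(parts._replace(path=f"/{dbname}"))


@contextmanager
def temporary_database(execute, base_url, prefix="test"):
    """Create a throwaway database, yield its URL, and drop it afterwards.

    `execute` is an assumed callable that runs one SQL statement on an
    autocommit connection to the server behind `base_url`.
    """
    name = fresh_db_name(prefix)
    # The name is generated from a fixed prefix plus hex digits, so quoting
    # it directly is safe here; do not pass untrusted input as `prefix`.
    execute(f'CREATE DATABASE "{name}"')
    try:
        yield url_for_database(base_url, name)
    finally:
        # WITH (FORCE) needs PostgreSQL 13 or newer; it terminates leftover
        # connections so the drop does not fail after a crashed test.
        execute(f'DROP DATABASE IF EXISTS "{name}" WITH (FORCE)')
```

With this shape, the manual cleanup of vectorizers and the `blog` tables would no longer be needed, at the cost of installing the extension into each new database.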



@pytest.fixture(scope="session")
def cli_postgres_container():
Contributor

Why not use the postgres_container fixture from conftest, or replace it?


# When running the worker with cassette matching original test params
cassette = (
f"openai-character_text_splitter-chunk_value-"
Contributor

I noticed now that we are not testing the recursive text splitter. We can do that in another PR.

Contributor Author

Yes, I also noticed that the original fixture seemed to be configured to behave essentially the same as the character text splitter right now. But I agree, we should add one for actual recursive text splitting.
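
To make the point above concrete, here is a toy sketch of why a recursive splitter configured with a single separator ends up behaving like a character splitter. This is a simplified illustration, not pgai's or LangChain's implementation; `split_on` and `split_recursive` are hypothetical names, and chunk overlap is ignored.

```python
def split_on(text, separator, chunk_size):
    """Character-style splitting: split on one separator, then greedily
    merge neighbouring pieces while the merged chunk fits in chunk_size."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def split_recursive(text, separators, chunk_size):
    """Recursive-style splitting: split on the first separator and re-split
    any chunk that is still too large using the remaining separators."""
    if not separators:
        return [text]
    first, rest = separators[0], separators[1:]
    result = []
    for chunk in split_on(text, first, chunk_size):
        if len(chunk) > chunk_size:
            result.extend(split_recursive(chunk, rest, chunk_size))
        else:
            result.append(chunk)
    return result


text = "aaaa bbbb\n\ncccc dddd eeee"

# One separator: the recursive splitter has nothing to fall back on,
# so it produces the same chunks as the character splitter.
same = split_recursive(text, ["\n\n"], 9) == split_on(text, "\n\n", 9)

# Two separators: the oversized second paragraph is re-split on spaces.
recursive_chunks = split_recursive(text, ["\n\n", " "], 9)
```

A test for real recursive splitting would need an input like the one above, where at least one chunk exceeds the size limit after the first separator is applied.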

@Askir Askir force-pushed the jascha/ai-83-vectorizer-tests-target-the-lambda-handler-and-not-the-cli branch 4 times, most recently from d9e8fe0 to aae4500 Compare November 8, 2024 17:08
@Askir Askir force-pushed the jascha/ai-83-vectorizer-tests-target-the-lambda-handler-and-not-the-cli branch from aae4500 to ea828d9 Compare November 8, 2024 17:11
@Askir Askir merged commit 3a48f82 into main Nov 8, 2024
6 checks passed
@Askir Askir deleted the jascha/ai-83-vectorizer-tests-target-the-lambda-handler-and-not-the-cli branch November 8, 2024 17:14