Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow retries for insertions #4026

Merged
merged 14 commits into from
Jul 30, 2024
Merged

feat: allow retries for insertions #4026

merged 14 commits into from
Jul 30, 2024

Conversation

RogerHYang
Copy link
Contributor

resolves #3828

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jul 26, 2024
@RogerHYang RogerHYang changed the title feat: add retries for insertions feat: allow retries for insertions Jul 26, 2024
tests/conftest.py Outdated Show resolved Hide resolved
Copy link
Contributor

@axiomofjoy axiomofjoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm having trouble grokking the difference between insertables and precursors/ parcels.

Comment on lines +187 to +192
time_cutoff = time.time() + 1
while not fut.done() and time.time() < time_cutoff:
time.sleep(0.01)
if fut.done():
return fut.result()
raise TimeoutError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, is it possible to make the timeout shorter to prevent any individual test from taking too long?

Copy link
Contributor Author

@RogerHYang RogerHYang Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at 0.1, the next order of magnitude, i have run into some flakiness, so i think 1 is probably fine for now

for column in entity.__table__.c:
if column.name in ["created_at", "updated_at"]:
continue
k = "metadata_" if column.name == "metadata" else column.name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work to use column.key instead?

from phoenix.db import models

column = models.Experiment.metadata_
print(f"{column.key=}")
print(f"{column.name=}")

Prints:

column.key='metadata_'
column.name='metadata'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column.key is actually still "metadata"

src/phoenix/server/api/routers/v1/evaluations.py Outdated Show resolved Hide resolved
src/phoenix/server/api/routers/v1/evaluations.py Outdated Show resolved Hide resolved
src/phoenix/server/api/routers/v1/evaluations.py Outdated Show resolved Hide resolved
src/phoenix/db/insertion/span_annotation.py Show resolved Hide resolved
src/phoenix/db/insertion/span_annotation.py Outdated Show resolved Hide resolved
Comment on lines 103 to 109
return select(
existing_traces.c.id,
existing_traces.c.trace_id,
table.id,
table.name,
table.updated_at,
).outerjoin_from(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strikes me as a good place to use with_only_columns to simplify the return type of the function. Same for trace and doc annotations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after reading the docs, i'm still not clear how it can be used

src/phoenix/db/insertion/types.py Outdated Show resolved Hide resolved
Copy link
Contributor

@axiomofjoy axiomofjoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the complexity of this PR is understanding the types and terminology (Received and Postponed as a generic types, Precursor vs. Insertable, parcel as an instance of Received). It would make sense to me if we had stateful classes for the entities to be inserted (e.g., TraceAnnotation, SpanAnnotation) that had a state attribute (RECEIVED, POSTPONED), an add_foreign_key method, and a boolean ready_for_insertion attribute. Sort of feels like the types are being used to make everything immutable and avoid stateful objects, but at the cost of introducing a lot of concepts and terminology.

@RogerHYang
Copy link
Contributor Author

It would make sense to me if we had stateful classes [...] [T]he types are being used to [...] avoid stateful objects, but at the cost of introducing a lot of concepts and terminology.

Consider the following.

  1. Types and states are isomorphic. Having objects A(), B(), C() is equivalent to having objects Letter(name="A"), Letter(name="B"), Letter(name="C"), which in turn is equivalent to having objects Thing(name="Letter A"), Thing(name="Letter B"), Thing(name="Letter C"). They're equivalent because from the point of view of the actual computation, all classes and objects are just purely syntactical contrivances.
  2. Removing classes does not remove the underlying concepts and complexities. Is having objects A(), B(), C() unnecessarily more complicated than having objects Thing(name="Letter A"), Thing(name="Letter B"), Thing(name="Letter C")? Yes, it is indeed unnecessarily more complicated because, from the point of view of the actual computation, it is totally irrelevant and extraneous.
  3. So what's the point of having A(), B(), C() instead of having Thing(name="Letter A"), Thing(name="Letter B"), Thing(name="Letter C")? Isn't having just one Thing better than having three things? It is because now we can get the type-checker involved: type-checkers can't check states. In general, having the type-checker involved is better than not having it involved, and that's one of the key reasons for why we want to have objects in the first place. Therefore, having A(), B(), C() is a logical extension of this line of reasoning.

@RogerHYang RogerHYang merged commit 13af3b5 into main Jul 30, 2024
11 checks passed
@RogerHYang RogerHYang deleted the insertion-retries branch July 30, 2024 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feedback] in-memory backup queue processing of annotations
2 participants