-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
serialize ContextItems in SerializedChatMessages #5910
base: main
Are you sure you want to change the base?
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we instead allow subtypes more generally, and not add an explicit list like this?
@@ -65,7 +65,7 @@ export async function evaluateChatStrategy( | |||
const concisenessScore = await llm.judge(concisenessPrompt({ response: llmResponse })) | |||
const contextItems = query?.contextFiles?.map(i => ({ | |||
source: i.source, | |||
file: i.uri.path + `:${i.range?.start.line}-${i.range?.end.line}`, | |||
file: i.uri + `:${i.range?.start.line}-${i.range?.end.line}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What consumes this? Because if it looks for that :
separator for line ranges, it is going to encounter a :
in file:///
now.
public isStringType(type: scip.Type): boolean { | ||
public isStringType( | ||
type: scip.Type, | ||
options: TypecheckerOptions = { allowSubtyping: false } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unclear to me what this means.
Consider the case immediately below:
if (type.has_constant_type) {
return type.constant_type.constant.has_string_constant
}
In that sense, "some literal" : string
has type "some literal"
and is a subtype of string
. So if allowSubtyping
is false
then we are returning an inaccurate result there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesmcnamara wrote this. @jamesmcnamara, can you pls reply?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option allows us consider the a union type of T | undefined | null
(or some subset) as assignable to T
. Perhaps allowNullableSupertyping
would be better?
const sym = type.type_ref.symbol | ||
if ( | ||
options.allowSubtyping && | ||
[typescriptKeyword('undefined'), typescriptKeyword('null')].includes(sym) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about the TypeScript AST here: Is a type like T?
desugared to a representation like T | undefined
by this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesmcnamara wrote this. @jamesmcnamara, can you pls reply?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, can confirm.
options.allowSubtyping && | ||
[typescriptKeyword('undefined'), typescriptKeyword('null')].includes(sym) | ||
) { | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If allow sub typing is true, this doesn't make sense to me, because string | undefined
is a supertype of string
... all values of string
inhabit string | undefined
...
Would it make sense to recast the function not as isStringType
but something about a bound on a type involving string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesmcnamara wrote this. @jamesmcnamara, can you pls reply?
}) | ||
) { | ||
diagnostic.message = dedent`${diagnostic.message} Note that these types are | ||
sub-type compatible, so if you want to allow subtyping for this class, add the symbol to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this were safe, why wouldn't we just do it everywhere all the time?
If this is not safe, then it seems dubious to encourage people to enable it without any provisos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesmcnamara wrote this. @jamesmcnamara, can you pls reply?
) | ||
} | ||
|
||
return false | ||
} | ||
|
||
public compatibleSignatures(a: scip.SymbolInformation, b: scip.SymbolInformation): boolean { | ||
if (this.isStringTypeInfo(a) && this.isStringTypeInfo(b)) { | ||
public compatibleSignatures( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am uncertain about this approach.
IF I understand it, you're saying that
Item = {
title?: string
description?: string
} & (HasTitle | HasDescription)
is safe because HasTitle
has a string field to discriminate on, and HasDescription
has a string field to discriminate on? But IIRC the downstream doesn't work that way... it discriminates on a specific field.
And in practice the escape hatch works for SerializedContextItem because there's no discrimination happening... the consumer just gets that big union type. But would things blow up if we had a method with a type like SerializedContextItemTypeA | SerializedContextItemTypeB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesmcnamara wrote this. @jamesmcnamara, can you pls reply?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There shouldn't be any method descriptions in here because this is used to generate code from the agent protocol so it should only contain immediate data types.
A type alias like type AorB = SerializedContextItemTypeA | SerializedContextItemTypeB
should generate a discriminated union without any trouble.
The only issue we're trying to resolve here is the combination of a union type and an intersection type which is treated at the top level as an intersection type (as described in this comment). The current behavior is to check that every shared field in the sub-discriminated unions are identical and we throw an exception if they differ (even on nullability).
(Note that there is a bug in the existing implementation in that the output intersection type has all fields from all discriminated union members included, so:
type Intersect = {
name: string
} & ({title: string} | {description: string})
would produce:
type Intersect = {
name: string
title: string
description: string
}
instead of
type Intersect = {
name: string
title: string | undefined
description: string | undefined
}
and that is not changed in this implementation)
The only case we're trying to handle here is where the the types of a field on the intersection type and the sub-union types differ, but they differ only in their nullability (as with the above Item
type). 99% of the time, I think this was a mistake by the programmer, or this type should not be included in the agent protocol, hence it not being handled by default.
However in the 1% of cases where it is intentional, as with SerializedContextItem
which takes advantage of the type-refinedness on the TypeScript side and is reasonably included in the protocol for chat migrations and exports, we have 4 options:
- Define a protocol specific variant of
SerializedContextItem
(like we do forProtocolAuthStatus
) and convert every method coming out of the agent that uses that type to the simpler protocol variant. - Simplify the type on the TypeScript side to be consistent with the current code generation (would require inserting null handling in various parts of the app which would be misleading as they were never reachable)
- Generate the type as an union type, inlining all properties of the intersection type onto each variant of the union.
- Generate the type as an intersection type with each field using it's highest upper-bound supertype from the unions (or null).
1 is the most practical and probably the best practice but introduces an annoying maintenance burden of maintaining two types that are semantically identical but duplicated. 3 feels to me the most correct way but would require a careful, non-trivial implementation. 4 is what the current implementation does only for white-listed types (and indeed we see there is exactly one instance of this in the entire protocol so it's not likely a valuable case to invest in.)
@olafurpg's opinion was to use option 1 and in general to avoid allowing complex types to leak into the protocol definition. I implemented 4 because it was quick and allowed us to avoid the conversion types. But with this much pushback and the bug noted above, perhaps 1 is really the way we should go.
Previously, a SerializedChatMessage contained ContextItems, which had URI instances. This meant that the JSON serialization of SerializedChatMessage was (1) not stable, since
URI.toJSON
andvscode.Uri.toJSON
differ and are not standardized, and (2) not readable without special re-hydration. This seems to not have caused major problems, but it absolutely could have, such as if we tried to call.fsPath
(a dynamic getter) on a URI in a ChatMessage that we deserialized.This also paves the way to make it so that dehydration/hydration are not needed across the postMessage boundary.
Test plan
CI & e2e