Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serialize ContextItems in SerializedChatMessages #5910

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

sqs
Copy link
Member

@sqs sqs commented Oct 15, 2024

Previously, a SerializedChatMessage contained ContextItems, which had URI instances. This meant that the JSON serialization of SerializedChatMessage was (1) not stable, since URI.toJSON and vscode.Uri.toJSON differ and are not standardized, and (2) not readable without special re-hydration. This seems to not have caused major problems, but it absolutely could have, such as if we tried to call .fsPath (a dynamic getter) on a URI in a ChatMessage that we deserialized.

This also paves the way to make it so that dehydration/hydration are not needed across the postMessage boundary.

Test plan

CI & e2e

@sqs sqs requested review from dominiccooney and a team October 15, 2024 18:02
Copy link

‼️ Hey @sourcegraph/cody-security, please review this PR carefully as it introduces the usage of an unsafe_ function or abuses PromptString.

Copy link
Contributor

@dominiccooney dominiccooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we instead allow subtypes more generally, and not add an explicit list like this?

@@ -65,7 +65,7 @@ export async function evaluateChatStrategy(
const concisenessScore = await llm.judge(concisenessPrompt({ response: llmResponse }))
const contextItems = query?.contextFiles?.map(i => ({
source: i.source,
file: i.uri.path + `:${i.range?.start.line}-${i.range?.end.line}`,
file: i.uri + `:${i.range?.start.line}-${i.range?.end.line}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What consumes this? Because if it looks for that : separator for line ranges, it is going to encounter a : in file:/// now.

public isStringType(type: scip.Type): boolean {
public isStringType(
type: scip.Type,
options: TypecheckerOptions = { allowSubtyping: false }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear to me what this means.

Consider the case immediately below:

if (type.has_constant_type) {
  return type.constant_type.constant.has_string_constant
}

In that sense, "some literal" : string has type "some literal" and is a subtype of string. So if allowSubtyping is false then we are returning an inaccurate result there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesmcnamara wrote this. @jamesmcnamara, can you pls reply?

Copy link
Contributor

@jamesmcnamara jamesmcnamara Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option allows us consider the a union type of T | undefined | null (or some subset) as assignable to T. Perhaps allowNullableSupertyping would be better?

const sym = type.type_ref.symbol
if (
options.allowSubtyping &&
[typescriptKeyword('undefined'), typescriptKeyword('null')].includes(sym)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about the TypeScript AST here: Is a type like T? desugared to a representation like T | undefined by this point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesmcnamara wrote this. @jamesmcnamara, can you pls reply?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, can confirm.

options.allowSubtyping &&
[typescriptKeyword('undefined'), typescriptKeyword('null')].includes(sym)
) {
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If allow sub typing is true, this doesn't make sense to me, because string | undefined is a supertype of string... all values of string inhabit string | undefined...

Would it make sense to recast the function not as isStringType but something about a bound on a type involving string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesmcnamara wrote this. @jamesmcnamara, can you pls reply?

})
) {
diagnostic.message = dedent`${diagnostic.message} Note that these types are
sub-type compatible, so if you want to allow subtyping for this class, add the symbol to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this were safe, why wouldn't we just do it everywhere all the time?

If this is not safe, then it seems dubious to encourage people to enable it without any provisos.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesmcnamara wrote this. @jamesmcnamara, can you pls reply?

)
}

return false
}

public compatibleSignatures(a: scip.SymbolInformation, b: scip.SymbolInformation): boolean {
if (this.isStringTypeInfo(a) && this.isStringTypeInfo(b)) {
public compatibleSignatures(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am uncertain about this approach.

IF I understand it, you're saying that

Item = {
    title?: string
    description?: string
} & (HasTitle | HasDescription)

is safe because HasTitle has a string field to discriminate on, and HasDescription has a string field to discriminate on? But IIRC the downstream doesn't work that way... it discriminates on a specific field.

And in practice the escape hatch works for SerializedContextItem because there's no discrimination happening... the consumer just gets that big union type. But would things blow up if we had a method with a type like SerializedContextItemTypeA | SerializedContextItemTypeB?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesmcnamara wrote this. @jamesmcnamara, can you pls reply?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be any method descriptions in here because this is used to generate code from the agent protocol so it should only contain immediate data types.

A type alias like type AorB = SerializedContextItemTypeA | SerializedContextItemTypeB should generate a discriminated union without any trouble.

The only issue we're trying to resolve here is the combination of a union type and an intersection type which is treated at the top level as an intersection type (as described in this comment). The current behavior is to check that every shared field in the sub-discriminated unions are identical and we throw an exception if they differ (even on nullability).

(Note that there is a bug in the existing implementation in that the output intersection type has all fields from all discriminated union members included, so:

type Intersect = {
  name: string
} & ({title: string} | {description: string})

would produce:

type Intersect = {
  name: string
  title: string
  description: string
}

instead of

type Intersect = {
  name: string
  title: string | undefined
  description: string | undefined
}

and that is not changed in this implementation)

The only case we're trying to handle here is where the the types of a field on the intersection type and the sub-union types differ, but they differ only in their nullability (as with the above Item type). 99% of the time, I think this was a mistake by the programmer, or this type should not be included in the agent protocol, hence it not being handled by default.

However in the 1% of cases where it is intentional, as with SerializedContextItem which takes advantage of the type-refinedness on the TypeScript side and is reasonably included in the protocol for chat migrations and exports, we have 4 options:

  1. Define a protocol specific variant of SerializedContextItem (like we do for ProtocolAuthStatus) and convert every method coming out of the agent that uses that type to the simpler protocol variant.
  2. Simplify the type on the TypeScript side to be consistent with the current code generation (would require inserting null handling in various parts of the app which would be misleading as they were never reachable)
  3. Generate the type as an union type, inlining all properties of the intersection type onto each variant of the union.
  4. Generate the type as an intersection type with each field using it's highest upper-bound supertype from the unions (or null).

1 is the most practical and probably the best practice but introduces an annoying maintenance burden of maintaining two types that are semantically identical but duplicated. 3 feels to me the most correct way but would require a careful, non-trivial implementation. 4 is what the current implementation does only for white-listed types (and indeed we see there is exactly one instance of this in the entire protocol so it's not likely a valuable case to invest in.)

@olafurpg's opinion was to use option 1 and in general to avoid allowing complex types to leak into the protocol definition. I implemented 4 because it was quick and allowed us to avoid the conversion types. But with this much pushback and the bug noted above, perhaps 1 is really the way we should go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants