Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new wasm rewriter w/ orchestrion #5494

Draft
wants to merge 16 commits into
base: master
Choose a base branch
from
Draft

Conversation

bengl
Copy link
Collaborator

@bengl bengl commented Mar 27, 2025

What does this PR do?

Adds support for langchain via Orchestrion-JS, by way of the new wasm-js-rewriter, which replaces the old native-iast-rewriter. The new rewriter contains both Orchestrion-JS and IAST rewriting for taint tracking.

Motivation

Prior to Orchestrion-JS, we were unable to support langchain when used in an ESM app.

Additional Notes

Right now this PR depends on a branch of the rewriter. Once the branch is merged on the rewriter, a cleanup commit will be added that removes things like yarn link and git dependencies.

getRewriterOriginalPathAndLineFromSourceMap =
getGetOriginalPathAndLineFromSourceMapFunction(chainSourceMap, getOriginalPathAndLineFromSourceMap)
}
setGetOriginalPathAndLineFromSourceMapFunction(chainSourceMap, iastRewriter)

rewriter = new Rewriter({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to move this outside IAST ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but not in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we supposed to have one single orchestrion.yml file or one for each library?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One single one.

That said, I can have it generated by having a separate one for each library and just combining them. That's relatively easy. Would you prefer that?

Copy link

@dubloom dubloom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only reviewed the part related to the rewriter, main comment:

  • the rewriter part should be extracted as soon as possible from the taint-tracking (I get this is like that to release it quickly). I can help if needed.

telemetryVerbosity,
chainSourceMap
}

try {
Module.register('./rewriter-esm.mjs', {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this is not going to be specific thing for iast anymore, should we move this or at least import this from initialize.mjs or loader-hooks.mjs?

For iast we can assume that the rewriter is not working if node version is <v20.6.0, but can whole orchestrion assume that? Asking because module.register was added in 20.6.0.

Keep in mind that this code is not executed if there is no loader configured before, we added it for iast because there isn't any need to rewrite esm customer code when esm is not properly configured.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking about this with @rochdev and the biggest issue with doing this early on in the loading process is that we don't have config. We need to see if config.iast.enabled before initializing the rewriter. That might be a breaking change. We can probably figure out a way of initializing at least that config early on, but changing how all that works is still a bigger change, and I'm not sure yet if it's breaking. What I've done instead is ensure that the existing langchain integration (i.e. by traditional RITM instrumentation) still works, so we're not reliant on Orchestrion for that. That way this PR is not a breaking change, and we can figure out how to enable Orchestrion for older Node.js at a later time, maybe in v6. How does that sound?

Copy link
Collaborator

@uurien uurien Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to see if config.iast.enabled before initializing the rewriter.

oh, I'm stupid, yes. It was also one of the reasons why I wrote this here originally.

Keep in mind that this code is not executed if there is no loader configured before, we added it for iast because there isn't any need to rewrite esm customer code when esm is not properly configured.

I emphasize on this, this code is not executed if customer has not --loader configured or initialize.mjs loaded.
Is something that you can assume for now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is something that you can assume for now?

That's fine for now. We can adjust this later if needed, but for now we just assume this way of enabling ESM support was used.

@bengl
Copy link
Collaborator Author

bengl commented Mar 28, 2025

@dubloom

the rewriter part should be extracted as soon as possible from the taint-tracking (I get this is like that to release it quickly).

To reduce PR noise, I'd like to do that as a separate PR after this one. Is that alright?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants