Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid unnecessary string join and split #129370

Closed
rebornix opened this issue Jul 26, 2021 · 6 comments · Fixed by #131035
Closed

Avoid unnecessary string join and split #129370

rebornix opened this issue Jul 26, 2021 · 6 comments · Fixed by #131035
Assignees
Labels
debt Code quality issues insiders-released Patch has been released in VS Code Insiders notebook-ipynb
Milestone

Comments

@rebornix
Copy link
Member

rebornix commented Jul 26, 2021

function convertStreamOutput(output: NotebookCellOutput): JupyterOutput {
const outputs = output.items
.filter((opit) => opit.mime === CellOutputMimeTypes.stderr || opit.mime === CellOutputMimeTypes.stdout)
.map((opit) => convertOutputMimeToJupyterOutput(opit.mime, opit.data as Uint8Array) as string)
.reduceRight<string[]>((prev, curr) => (Array.isArray(curr) ? prev.concat(...curr) : prev.concat(curr)), []);
const streamType = getOutputStreamType(output) || 'stdout';
return {
output_type: 'stream',
name: streamType,
text: splitMultilineString(outputs.join(''))
};
}

Currently the stream output conversion (from VS Code types to Jupyter) does unnecessary V8 string concatenation and split, which slows down the conversion (using more memory and gc):

  • Stream output is either CellOutputMimeTypes.stderr or CellOutputMimeTypes.stdout, so convertOutputMimeToJupyterOutput will always return string. Using prev.concat(curr) will keep creating arrays
  • splitMultilineString(outputs.join('')) can slow down the process significantly. It firstly joins all the string, and then split by line breaks, this will trigger v8 to flatten the concatenated string (outputs.join('')) and double the memory usage.

We can probably run splitMultilineString on each output and concatenate last line of each output with the first line of next output (split first, then join).

@rebornix
Copy link
Member Author

const arr = str.split('\n');

Do we also want to split \r\n?

@roblourens
Copy link
Member

This way would preserve the \r if they exist, if they are expected to be preserved, I don't know

@roblourens
Copy link
Member

Discussed with @DonJayamanne, he plans to update this during debt week

@rebornix
Copy link
Member Author

image

Another potential improvement we can make for the serializer is indentation detection. Not sure how it's implemented but it has caused quite a lot of GC (which I think can be improved by checking whitespace and line break char codes).

@DonJayamanne
Copy link
Contributor

DonJayamanne commented Jul 26, 2021

can make for the serializer is indentation detection.

I agree, we can just look at the string immediately after the first { & before the next character.
Ideally it would be of the form:

{
    cell:....

Thus all we need to know is the white space between the first { and the next non-whiespace character.
& if there are no line feeds, then I think we can ignore indents completely.
Thoughts on that, @rebornix /cc

@rebornix rebornix mentioned this issue Aug 2, 2021
11 tasks
@roblourens
Copy link
Member

Discussed with @DonJayamanne, he plans to update this during debt week

Tbh I don't remember talking about this @DonJayamanne, but are you planning on looking into this? I also can. The string | string[] stuff here is a little confusing to me

I agree, we can just look at the string immediately after the first { & before the next character.

What does the indentation detection library do beyond that? This sounds fine to me.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
debt Code quality issues insiders-released Patch has been released in VS Code Insiders notebook-ipynb
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants
@roblourens @rebornix @DonJayamanne @joyceerhl and others