Avoid unnecessary string join and split #129370

rebornix · 2021-07-26T05:14:51Z

Lines 364 to 377 in 4e5d793

    
    function convertStreamOutput(output: NotebookCellOutput): JupyterOutput {
        const outputs = output.items
            .filter((opit) => opit.mime === CellOutputMimeTypes.stderr || opit.mime === CellOutputMimeTypes.stdout)
            .map((opit) => convertOutputMimeToJupyterOutput(opit.mime, opit.data as Uint8Array) as string)
            .reduceRight<string[]>((prev, curr) => (Array.isArray(curr) ? prev.concat(...curr) : prev.concat(curr)), []);
        const streamType = getOutputStreamType(output) || 'stdout';
        return {
            output_type: 'stream',
            name: streamType,
            text: splitMultilineString(outputs.join(''))
        };
    }

Currently, the stream output conversion (from VS Code types to Jupyter) does unnecessary V8 string concatenation and splitting, which slows down the conversion (more memory use and more GC):

Stream output is either CellOutputMimeTypes.stderr or CellOutputMimeTypes.stdout, so convertOutputMimeToJupyterOutput will always return a string. Using prev.concat(curr) keeps creating new arrays.
splitMultilineString(outputs.join('')) can slow down the process significantly. It first joins all the strings and then splits the result on line breaks, which triggers V8 to flatten the concatenated string (outputs.join('')) and doubles the memory usage.

We can probably run splitMultilineString on each output and concatenate the last line of each output with the first line of the next output (split first, then join).
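
The "split first, then join" idea could look roughly like the sketch below. It is only an illustration: `splitKeepingNewlines` and `splitChunksIntoLines` are hypothetical names, and the helper assumes that each line keeps its trailing `\n`, which may not match what `splitMultilineString` actually does.

```typescript
// Hypothetical helper (not the real splitMultilineString): splits on '\n'
// and keeps the '\n' at the end of every terminated line.
function splitKeepingNewlines(text: string): string[] {
    const lines: string[] = [];
    let start = 0;
    let idx = text.indexOf('\n', start);
    while (idx !== -1) {
        lines.push(text.slice(start, idx + 1));
        start = idx + 1;
        idx = text.indexOf('\n', start);
    }
    if (start < text.length) {
        // Trailing text without a line break.
        lines.push(text.slice(start));
    }
    return lines;
}

// Splits each chunk on its own, then stitches an unterminated last line
// onto the first line of the next chunk. The chunks are never joined into
// one large string.
function splitChunksIntoLines(chunks: string[]): string[] {
    const result: string[] = [];
    for (const chunk of chunks) {
        const lines = splitKeepingNewlines(chunk);
        if (lines.length === 0) {
            continue; // empty chunk
        }
        const last = result.length - 1;
        let first = 0;
        if (last >= 0 && !result[last].endsWith('\n')) {
            // The previous chunk ended mid-line: finish that line here.
            result[last] += lines[0];
            first = 1;
        }
        for (let i = first; i < lines.length; i++) {
            result.push(lines[i]);
        }
    }
    return result;
}

// Example: three chunks where a line spans two chunks.
const exampleLines = splitChunksIntoLines(['a\nb', 'c\n', 'd']);
// exampleLines is ['a\n', 'bc\n', 'd']
```

Only short per-line strings are concatenated here, so the large intermediate string from `outputs.join('')` is avoided.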


rebornix · 2021-07-26T05:18:18Z

vscode/extensions/ipynb/src/helpers.ts

Line 171 in 4e5d793

const arr = str.split('\n');

Do we also want to split on \r\n?

roblourens · 2021-07-26T16:37:07Z

Splitting this way would preserve the \r characters if they exist. I don't know whether they are expected to be preserved.
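
To make the difference concrete, here is a small standalone comparison (the sample text and variable names are made up for illustration). Splitting on `'\n'` leaves each `\r` attached to the end of its line, while a regular expression that also matches an optional `\r` drops it.

```typescript
const sampleText = 'first\r\nsecond\r\nthird';

// Splitting on '\n' only: the '\r' stays at the end of each line.
const splitOnLf = sampleText.split('\n');
// ['first\r', 'second\r', 'third']

// Splitting on an optional '\r' followed by '\n': the '\r' is removed.
const splitOnCrLf = sampleText.split(/\r?\n/);
// ['first', 'second', 'third']
```

Which behavior is right depends on whether the serialized notebook is expected to round-trip the original line endings.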

roblourens · 2021-07-26T19:37:20Z

Discussed with @DonJayamanne, he plans to update this during debt week

rebornix · 2021-07-26T20:56:32Z

Another potential improvement we can make for the serializer is indentation detection. Not sure how it's implemented but it has caused quite a lot of GC (which I think can be improved by checking whitespace and line break char codes).

DonJayamanne · 2021-07-26T21:14:46Z

can make for the serializer is indentation detection.

I agree, we can just look at the string immediately after the first { and before the next non-whitespace character.
Ideally it would be of the form:

{
    cell:....

Thus all we need to know is the whitespace between the first { and the next non-whitespace character.
And if there are no line feeds, then I think we can ignore indents completely.
Thoughts on that, @rebornix /cc
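
A minimal sketch of that check, combined with the earlier suggestion of comparing char codes, might look like this. The function name `detectIndentAfterFirstBrace` is hypothetical and this is not the existing indentation detection; it only assumes the file starts with a JSON object.

```typescript
// Char codes for the whitespace characters we care about.
const LINE_FEED = 10;
const CARRIAGE_RETURN = 13;
const TAB = 9;
const SPACE = 32;

// Returns the whitespace between the last line feed after the first '{'
// and the next non-whitespace character. Returns '' when there is no '{'
// or no line feed, meaning indentation can be ignored.
function detectIndentAfterFirstBrace(text: string): string {
    const open = text.indexOf('{');
    if (open === -1) {
        return '';
    }
    let lineStart = -1; // index just after the most recent line feed
    let i = open + 1;
    for (; i < text.length; i++) {
        const code = text.charCodeAt(i);
        if (code === LINE_FEED) {
            lineStart = i + 1;
        } else if (code !== SPACE && code !== TAB && code !== CARRIAGE_RETURN) {
            break; // first non-whitespace character
        }
    }
    if (lineStart === -1) {
        return ''; // everything is on one line
    }
    return text.slice(lineStart, i);
}

const fourSpaces = detectIndentAfterFirstBrace('{\n    "cells": []\n}'); // '    '
const noIndent = detectIndentAfterFirstBrace('{"cells":[]}'); // ''
```

The scan stops at the first non-whitespace character, so it only reads a handful of characters and allocates a single short string.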

roblourens · 2021-08-06T04:03:39Z

Discussed with @DonJayamanne, he plans to update this during debt week

Tbh I don't remember talking about this @DonJayamanne, but are you planning on looking into this? I also can. The string | string[] stuff here is a little confusing to me

I agree, we can just look at the string immediately after the first { & before the next character.

What does the indentation detection library do beyond that? This sounds fine to me.

rebornix assigned joyceerhl and roblourens Jul 26, 2021

rebornix mentioned this issue Jul 26, 2021

Ensure ipynb serializer supports node environments and large # of outputs #129282

Merged

rebornix added the debt Code quality issues label Jul 26, 2021

roblourens assigned DonJayamanne Jul 26, 2021

roblourens added this to the August 2021 milestone Jul 26, 2021

rebornix added the notebook-ipynb label Jul 26, 2021

rebornix mentioned this issue Aug 2, 2021

Notebook performance #129972

Closed

11 tasks

kieferrm mentioned this issue Aug 8, 2021

Jupyter: Iteration Plan for August 2021 microsoft/vscode-jupyter#7008

Closed

12 tasks

DonJayamanne mentioned this issue Aug 17, 2021

Perf improvements to the ipynb serializer #131035

Merged

DonJayamanne closed this as completed in #131035 Aug 24, 2021

github-actions bot locked and limited conversation to collaborators Oct 8, 2021
