Performance fixes for #659 #684

Merged
merged 31 commits into from
Feb 24, 2024
31 commits
037ca14 refactor: construct children collection once (Ellpeck, Feb 23, 2024)
5ddf704 lint-fix: specify return type of isRoot directly (Ellpeck, Feb 23, 2024)
65cb830 refactor: further performance & style improvements (Ellpeck, Feb 23, 2024)
2253b82 refactor: helpless panic (EagleoutIce, Feb 23, 2024)
e0bf428 refactor: another iteration of urgh (EagleoutIce, Feb 23, 2024)
63c19a2 Revert "refactor: another iteration of urgh" (EagleoutIce, Feb 23, 2024)
b77cebe refactor: swapping tries for gods sake (EagleoutIce, Feb 23, 2024)
9fb0a12 Revert "refactor: swapping tries for gods sake" (EagleoutIce, Feb 23, 2024)
bac3419 Reapply "refactor: another iteration of urgh" (EagleoutIce, Feb 23, 2024)
f07a90f refactor(wip): interim mental breakdown (EagleoutIce, Feb 23, 2024)
857751e refactor: at least... happy tests? (EagleoutIce, Feb 24, 2024)
65f5563 refactor: lock jsonlite options (EagleoutIce, Feb 24, 2024)
7d919c9 refactor(cfg-test): remove fake test (EagleoutIce, Feb 24, 2024)
578c64e refactor: try manual conversion, is worse (EagleoutIce, Feb 24, 2024)
9e71efa Revert "refactor: try manual conversion, is worse" (EagleoutIce, Feb 24, 2024)
85ac86b refactor: simplify json conversion (EagleoutIce, Feb 24, 2024)
22dcb66 refactor: several minor performance tunes :3 (EagleoutIce, Feb 24, 2024)
0c913c8 refactor: ensure the executor dies directly (and gracefully) (EagleoutIce, Feb 24, 2024)
96c56fa refactor: dynamic `flowr_get` compile (EagleoutIce, Feb 24, 2024)
55fcc92 refactor: fine-tune cmp retrieval (EagleoutIce, Feb 24, 2024)
150378e refactor: reduce size of json produce, defer obj map to flowR (EagleoutIce, Feb 24, 2024)
19228b5 doc(test): remove dead jsdoc comment (EagleoutIce, Feb 24, 2024)
abf3036 refactor: clean up tmp logfiles (EagleoutIce, Feb 24, 2024)
50483cb lint-fix: handle minor linter problems (EagleoutIce, Feb 24, 2024)
e1d92b7 refactor: wordly clean-up (EagleoutIce, Feb 24, 2024)
f95735d feat: jsonlite begone! (EagleoutIce, Feb 24, 2024)
86950af refactor: further improve print-out-performance (EagleoutIce, Feb 24, 2024)
91c1edc refactor: always compile seems to be faster! (EagleoutIce, Feb 24, 2024)
a942ae4 refactor: use build for benchmark run (EagleoutIce, Feb 24, 2024)
b521086 refactor: switch back to `sort` (EagleoutIce, Feb 24, 2024)
de3b7d0 feat(tests): cover for optional xmlparsedata availability (EagleoutIce, Feb 24, 2024)
5 changes: 5 additions & 0 deletions .github/actions/setup/action.yaml
@@ -29,3 +29,8 @@ runs:
uses: r-lib/actions/setup-r@v2
with:
r-version: ${{ inputs.r-version }}
+
+ - name: Install R packages
+ if: ${{ inputs.r-version != '' }}
+ shell: Rscript {0}
+ run: install.packages("xmlparsedata", repos="https://cloud.r-project.org/")
2 changes: 1 addition & 1 deletion package.json
@@ -19,7 +19,7 @@
"slicer": "ts-node src/cli/slicer-app.ts",
"release": "release-it --ci",
"benchmark-helper": "ts-node src/cli/benchmark-helper-app.ts",
- "benchmark": "ts-node src/cli/benchmark-app.ts",
+ "benchmark": "npm run build && node dist/src/cli/benchmark-app.js",
"summarizer": "ts-node src/cli/summarizer-app.ts",
"export-quads": "ts-node src/cli/export-quads-app.ts",
"build": "tsc --project .",
1 change: 0 additions & 1 deletion src/benchmark/slicer.ts
@@ -117,7 +117,6 @@ export class BenchmarkSlicer {
this.loadedXml = await this.measureCommonStep('parse', 'retrieve AST from R code')
this.normalizedAst = await this.measureCommonStep('normalize', 'normalize R AST')
this.dataflow = await this.measureCommonStep('dataflow', 'produce dataflow information')
- this.ai = await this.measureCommonStep('ai', 'run abstract interpretation')

this.stepper.switchToSliceStage()

5 changes: 2 additions & 3 deletions src/benchmark/stats/stats.ts
@@ -1,8 +1,7 @@
- import type { SingleSlicingCriterion, SlicingCriteria } from '../../slicing'
+ import type { ReconstructionResult, SingleSlicingCriterion, SlicingCriteria } from '../../slicing'
import type { NodeId, RParseRequestFromFile, RParseRequestFromText } from '../../r-bridge'
- import type { ReconstructionResult } from '../../slicing'

- export const CommonSlicerMeasurements = ['initialize R session', 'retrieve AST from R code', 'normalize R AST', 'produce dataflow information', 'run abstract interpretation', 'close R session', 'total'] as const
+ export const CommonSlicerMeasurements = ['initialize R session', 'retrieve AST from R code', 'normalize R AST', 'produce dataflow information', 'close R session', 'total'] as const
export type CommonSlicerMeasurements = typeof CommonSlicerMeasurements[number]

export const PerSliceMeasurements = ['static slicing', 'reconstruct code', 'total'] as const
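Because `CommonSlicerMeasurements` is declared `as const` and the type of the same name is derived with `typeof ...[number]`, dropping `'run abstract interpretation'` from the array also drops it from the union type, so no separate type edit is needed. A minimal sketch of that pattern follows; `Measurements`, `Measurement`, and `emptyTimings` are hypothetical names that do not appear in this PR.

```typescript
// Hypothetical names; only the `as const` + `typeof X[number]` pattern is taken from the diff.
const Measurements = ['parse', 'normalize', 'dataflow', 'total'] as const
type Measurement = typeof Measurements[number] // 'parse' | 'normalize' | 'dataflow' | 'total'

// Builds a record with one zeroed slot per measurement name.
function emptyTimings(): Record<Measurement, number> {
	const timings = {} as Record<Measurement, number>
	for(const name of Measurements) {
		timings[name] = 0
	}
	return timings
}

const timings = emptyTimings()
console.log(Object.keys(timings))
```

Removing an entry from `Measurements` narrows `Measurement` automatically, and any code that still refers to the removed name stops compiling.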
2 changes: 1 addition & 1 deletion src/cli/benchmark-helper-app.ts
@@ -42,7 +42,7 @@ async function benchmark() {
const fileStat = fs.statSync(options.input)
guard(fileStat.isFile(), `File ${options.input} does not exist or is no file`)

- const request = { request: 'file', content: options.input } as RParseRequestFromFile
+ const request: RParseRequestFromFile = { request: 'file', content: options.input }

const slicer = new BenchmarkSlicer()
try {
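Switching from `as RParseRequestFromFile` to a type annotation is more than cosmetic: an annotation requires the literal to be assignable to the type, while an assertion only requires the two types to overlap. The sketch below illustrates the difference with a hypothetical `FileRequest` interface standing in for `RParseRequestFromFile`, whose real definition is not shown in this diff.

```typescript
// Hypothetical stand-in for RParseRequestFromFile; the real interface may have other fields.
interface FileRequest {
	request: 'file'
	content: string
}

// With an annotation, omitting `content` or misspelling 'file' is a compile error.
const annotated: FileRequest = { request: 'file', content: 'example.R' }

// With an assertion, an incomplete literal still compiles, and the gap only shows at runtime.
const asserted = { request: 'file' } as FileRequest

console.log(annotated.content, asserted.content)
```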
9 changes: 3 additions & 6 deletions src/cli/repl/commands/parse.ts
@@ -1,9 +1,6 @@
import type { XmlBasedJson} from '../../../r-bridge'
import {childrenKey} from '../../../r-bridge'
import {attributesKey, contentKey} from '../../../r-bridge'
- import {
- parseCSV
- } from '../../../r-bridge'
import {getKeysGuarded, RawRType, requestFromInput} from '../../../r-bridge'
import {
extractLocation,
@@ -14,8 +11,8 @@
import { FontStyles } from '../../../statistics'
import type { ReplCommand } from './main'
import { SteppingSlicer } from '../../../core'
- import {csvToRecord} from '../../../r-bridge/lang-4.x/ast/parser/csv/format'
- import {convertToXmlBasedJson} from '../../../r-bridge/lang-4.x/ast/parser/csv/parser'
+ import {prepareParsedData} from '../../../r-bridge/lang-4.x/ast/parser/json/format'
+ import {convertPreparedParsedData} from '../../../r-bridge/lang-4.x/ast/parser/json/parser'

type DepthList = { depth: number, node: XmlBasedJson, leaf: boolean }[]

@@ -132,7 +129,7 @@
request: requestFromInput(remainingLine.trim())
}).allRemainingSteps()

- const object = convertToXmlBasedJson(csvToRecord(parseCSV(result.parse)))
+ const object = convertPreparedParsedData(prepareParsedData(result.parse))

Codecov / codecov/patch warning: added line #L132 in src/cli/repl/commands/parse.ts was not covered by tests.

output.stdout(depthListToTextTree(toDepthMap(object), output.formatter))
}
3 changes: 1 addition & 2 deletions src/cli/repl/server/connection.ts
@@ -140,8 +140,7 @@ export class FlowRServerConnection {
results: {
parse: await printStepResult('parse', results.parse as string, StepOutputFormat.RdfQuads, config()),
normalize: await printStepResult('normalize', results.normalize as NormalizedAst, StepOutputFormat.RdfQuads, config()),
- dataflow: await printStepResult('dataflow', results.dataflow as DataflowInformation, StepOutputFormat.RdfQuads, config()),
- ai: ''
+ dataflow: await printStepResult('dataflow', results.dataflow as DataflowInformation, StepOutputFormat.RdfQuads, config())
}
})
} else {
2 changes: 1 addition & 1 deletion src/core/input.ts
@@ -29,7 +29,7 @@ interface BaseSteppingSlicerInput<InterestedIn extends StepName | undefined> ext
autoSelectIf?: AutoSelectPredicate
}

- interface NormalizeSteppingSlicerInput<InterestedIn extends 'ai' | 'dataflow' | 'normalize'> extends BaseSteppingSlicerInput<InterestedIn> {
+ interface NormalizeSteppingSlicerInput<InterestedIn extends 'dataflow' | 'normalize'> extends BaseSteppingSlicerInput<InterestedIn> {
stepOfInterest: InterestedIn
}

3 changes: 1 addition & 2 deletions src/core/output.ts
@@ -14,7 +14,6 @@ type StepResultsHelper<InterestedIn extends StepName> = {
'parse': Out<'parse'>
'normalize': StepResultsHelper<'parse'> & Out<'normalize'>
'dataflow': StepResultsHelper<'normalize'> & Out<'dataflow'>
- 'ai': StepResultsHelper<'dataflow'> & Out<'ai'>
- 'slice': StepResultsHelper<'ai'> & Out<'slice'>
+ 'slice': StepResultsHelper<'dataflow'> & Out<'slice'>
'reconstruct': StepResultsHelper<'slice'> & Out<'reconstruct'>
}[InterestedIn]
7 changes: 3 additions & 4 deletions src/core/print/parse-printer.ts
@@ -2,9 +2,8 @@
import { serialize2quads } from '../../util/quads'
import type { XmlBasedJson} from '../../r-bridge'
import {attributesKey, childrenKey, contentKey} from '../../r-bridge'
- import {parseCSV} from '../../r-bridge'
- import {csvToRecord} from '../../r-bridge/lang-4.x/ast/parser/csv/format'
- import {convertToXmlBasedJson} from '../../r-bridge/lang-4.x/ast/parser/csv/parser'
+ import {prepareParsedData} from '../../r-bridge/lang-4.x/ast/parser/json/format'
+ import {convertPreparedParsedData} from '../../r-bridge/lang-4.x/ast/parser/json/parser'

function filterObject(obj: XmlBasedJson, keys: Set<string>): XmlBasedJson[] | XmlBasedJson {
if(typeof obj !== 'object') {
@@ -28,7 +27,7 @@
}

export function parseToQuads(code: string, config: QuadSerializationConfiguration): string{
- const obj = convertToXmlBasedJson(csvToRecord(parseCSV(code)))
+ const obj = convertPreparedParsedData(prepareParsedData(code))

Codecov / codecov/patch warning: added line #L30 in src/core/print/parse-printer.ts was not covered by tests.
// recursively filter so that if the object contains one of the keys 'a', 'b' or 'c', all other keys are ignored
return serialize2quads(
filterObject(obj, new Set([attributesKey, childrenKey, contentKey])) as XmlBasedJson,
8 changes: 2 additions & 6 deletions src/core/slicer.ts
@@ -211,15 +211,11 @@ export class SteppingSlicer<InterestedIn extends StepName | undefined = typeof L
result = executeSingleSubStep(step, this.request, this.results.normalize as NormalizedAst)
break
case 3:
- step = guardStep('ai')
- result = executeSingleSubStep(step, this.results.normalize as NormalizedAst, this.results.dataflow as DataflowInformation)
- break
- case 4:
guard(this.criterion !== undefined, 'Cannot decode criteria without a criterion')
step = guardStep('slice')
- result = executeSingleSubStep(step, (this.results.ai as DataflowInformation).graph, this.results.normalize as NormalizedAst, this.criterion)
+ result = executeSingleSubStep(step, (this.results.dataflow as DataflowInformation).graph, this.results.normalize as NormalizedAst, this.criterion)
break
- case 5:
+ case 4:
step = guardStep('reconstruct')
result = executeSingleSubStep(step, this.results.normalize as NormalizedAst<NoInfo>, (this.results.slice as SliceResult).result)
break
22 changes: 6 additions & 16 deletions src/core/steps.ts
@@ -14,7 +14,7 @@
*/

import type { MergeableRecord } from '../util/objects'
- import { retrieveCsvFromRCode } from '../r-bridge'
+ import { retrieveParseDataFromRCode } from '../r-bridge'
import { produceDataFlowGraph } from '../dataflow'
import { reconstructToCode, staticSlicing } from '../slicing'
import type { IStepPrinter} from './print/print'
@@ -33,9 +33,7 @@ import {
dataflowGraphToMermaidUrl,
dataflowGraphToQuads
} from './print/dataflow-printer'
- import type {DataflowInformation} from '../dataflow/internal/info'
- import type {runAbstractInterpretation} from '../abstract-interpretation/processor'
- import {normalize} from '../r-bridge/lang-4.x/ast/parser/csv/parser'
+ import {normalize} from '../r-bridge/lang-4.x/ast/parser/json/parser'

/**
* This represents close a function that we know completely nothing about.
@@ -72,14 +70,14 @@ export interface IStep<
export const STEPS_PER_FILE = {
'parse': {
description: 'Parse the given R code into an AST',
- processor: retrieveCsvFromRCode,
+ processor: retrieveParseDataFromRCode,
required: 'once-per-file',
printer: {
[StepOutputFormat.Internal]: internalPrinter,
[StepOutputFormat.Json]: text => text,
[StepOutputFormat.RdfQuads]: parseToQuads
}
- } satisfies IStep<typeof retrieveCsvFromRCode>,
+ } satisfies IStep<typeof retrieveParseDataFromRCode>,
'normalize': {
description: 'Normalize the AST to flowR\'s AST (first step of the normalization)',
processor: normalize,
@@ -103,15 +101,7 @@ export const STEPS_PER_FILE = {
[StepOutputFormat.Mermaid]: dataflowGraphToMermaid,
[StepOutputFormat.MermaidUrl]: dataflowGraphToMermaidUrl
}
- } satisfies IStep<typeof produceDataFlowGraph>,
- 'ai': {
- description: 'Run abstract interpretation',
- processor: (_, dfInfo: DataflowInformation) => dfInfo, // Use runAbstractInterpretation here when it's ready
- required: 'once-per-file',
- printer: {
- [StepOutputFormat.Internal]: internalPrinter
- }
- } satisfies IStep<typeof runAbstractInterpretation>
+ } satisfies IStep<typeof produceDataFlowGraph>
} as const

export const STEPS_PER_SLICE = {
Expand All @@ -134,7 +124,7 @@ export const STEPS_PER_SLICE = {
} as const

export const STEPS = { ...STEPS_PER_FILE, ...STEPS_PER_SLICE } as const
- export const LAST_PER_FILE_STEP = 'ai' as const
+ export const LAST_PER_FILE_STEP = 'dataflow' as const
export const LAST_STEP = 'reconstruct' as const

export type StepName = keyof typeof STEPS
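Since `StepName` is `keyof typeof STEPS`, deleting the `'ai'` entry from `STEPS_PER_FILE` removes `'ai'` from the step-name union, which is why `LAST_PER_FILE_STEP` and the step-indexed types elsewhere in this PR have to change with it. A reduced sketch of that mechanism follows. The `Step` interface and the `PER_FILE`, `PER_SLICE`, `ALL`, and `Name` identifiers are hypothetical simplifications, and the sketch applies `satisfies` (TypeScript 4.9 or later) to the whole object rather than per entry as the diff does.

```typescript
// Hypothetical, simplified step shape; the real IStep interface has more fields.
interface Step {
	description: string
	required: 'once-per-file' | 'once-per-slice'
}

const PER_FILE = {
	parse:    { description: 'Parse the given R code', required: 'once-per-file' },
	dataflow: { description: 'Produce dataflow information', required: 'once-per-file' }
} as const satisfies Record<string, Step>

const PER_SLICE = {
	slice: { description: 'Compute the slice', required: 'once-per-slice' }
} as const satisfies Record<string, Step>

const ALL = { ...PER_FILE, ...PER_SLICE } as const
type Name = keyof typeof ALL // 'parse' | 'dataflow' | 'slice'

// Assigning a name that is not a key of ALL (for example 'ai') would not compile.
const LAST_PER_FILE: Name = 'dataflow'
const names = Object.keys(ALL) as Name[]
console.log(names, LAST_PER_FILE)
```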
2 changes: 1 addition & 1 deletion src/r-bridge/lang-4.x/ast/index.ts
@@ -1,3 +1,3 @@
export * from './model'
export * from './parser/xml'
- export {parseLog} from './parser/csv/parser'
+ export {parseLog} from './parser/json/parser'
36 changes: 0 additions & 36 deletions src/r-bridge/lang-4.x/ast/parser/csv/format.ts

This file was deleted.

41 changes: 41 additions & 0 deletions src/r-bridge/lang-4.x/ast/parser/json/format.ts
@@ -0,0 +1,41 @@
import { removeTokenMapQuotationMarks } from '../../../../retriever'
import { guard } from '../../../../../util/assert'

export const RootId = 0

export interface Entry extends Record<string, unknown> {
line1: number,
col1: number,
line2: number,
col2: number,
id: number,
parent: number,
token: string,
terminal: boolean,
text: string,
children?: Entry[]
}

type ParsedDataRow = [line1: number, col1: number, line2: number, col2: number, id: number, parent: number, token: string, terminal: boolean, text: string]

export function prepareParsedData(data: string): Map<number, Entry> {
const json: unknown = JSON.parse(data)
guard(Array.isArray(json), () => `Expected ${data} to be an array but was not`)

const ret = new Map<number, Entry>((json as ParsedDataRow[]).map(([line1, col1, line2, col2, id, parent, token, terminal, text]) => {
return [id, { line1, col1, line2, col2, id, parent, token: removeTokenMapQuotationMarks(token), terminal, text }] satisfies [number, Entry]
}))

// iterate a second time to set parent-child relations (since they may be out of order in the csv)
for(const entry of ret.values()) {
if(entry.parent != RootId) {
const parent = ret.get(entry.parent)
if(parent) {
parent.children ??= []
parent.children.push(entry)
}
}
}

return ret
}
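`prepareParsedData` works in two passes: it first indexes every row by `id`, then attaches each non-root entry to its parent, so the rows may arrive in any order. The sketch below shows the same idea on a much smaller row layout. `Row`, `TreeNode`, `ROOT`, and `linkRows` are hypothetical names, and the three-column rows are an assumption made for brevity, not the nine-column format used in the diff.

```typescript
// Simplified sketch with hypothetical names; rows are [id, parent, text] only.
type Row = [id: number, parent: number, text: string]

interface TreeNode {
	id: number
	parent: number
	text: string
	children?: TreeNode[]
}

const ROOT = 0

function linkRows(data: string): Map<number, TreeNode> {
	const json: unknown = JSON.parse(data)
	if(!Array.isArray(json)) {
		throw new Error('expected a JSON array of rows')
	}
	// first pass: index every row by its id
	const nodes = new Map<number, TreeNode>(
		(json as Row[]).map(([id, parent, text]): [number, TreeNode] => [id, { id, parent, text }])
	)
	// second pass: attach children, so a child may be listed before its parent
	nodes.forEach(node => {
		const parent = node.parent !== ROOT ? nodes.get(node.parent) : undefined
		if(parent) {
			(parent.children ??= []).push(node)
		}
	})
	return nodes
}

// The child with id 2 is listed before its parent (id 1) on purpose.
const linked = linkRows('[[2, 1, "x"], [1, 0, "x <- 1"], [3, 1, "1"]]')
console.log(linked.get(1)?.children?.map(c => c.id))
```

Storing the children on the entry itself is what lets `convertEntry` drop its second `csv` parameter in the parser changes.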
src/r-bridge/lang-4.x/ast/parser/json/parser.ts
@@ -7,34 +7,36 @@
import type {IdGenerator, NoInfo} from '../../model'
import {decorateAst, deterministicCountingIdGenerator, type NormalizedAst} from '../../model'
import {deepMergeObject} from '../../../../../util/objects'
- import type {CsvEntry} from './format'
- import {csvToRecord, getChildren, type ParsedCsv} from './format'
- import {parseCSV} from '../../../values'
+ import type { Entry} from './format'
+ import { RootId, prepareParsedData } from './format'
import {parseRootObjToAst} from '../xml/internal'
import {log} from '../../../../../util/log'

export const parseLog = log.getSubLogger({name: 'ast-parser'})

- export function normalize(csvString: string, hooks?: DeepPartial<XmlParserHooks>, getId: IdGenerator<NoInfo> = deterministicCountingIdGenerator(0)): NormalizedAst {
+ export function normalize(jsonString: string, hooks?: DeepPartial<XmlParserHooks>, getId: IdGenerator<NoInfo> = deterministicCountingIdGenerator(0)): NormalizedAst {
const hooksWithDefaults = deepMergeObject(DEFAULT_PARSER_HOOKS, hooks) as XmlParserHooks

const data: ParserData = { hooks: hooksWithDefaults, currentRange: undefined, currentLexeme: undefined }
- const object = convertToXmlBasedJson(csvToRecord(parseCSV(csvString)))
+ const object = convertPreparedParsedData(prepareParsedData(jsonString))

return decorateAst(parseRootObjToAst(data, object), getId)
}

- export function convertToXmlBasedJson(csv: ParsedCsv): XmlBasedJson{
+ export function convertPreparedParsedData(valueMapping: Map<number, Entry>): XmlBasedJson {
const exprlist: XmlBasedJson = {}
exprlist[nameKey] = 'exprlist'
- exprlist[childrenKey] = Object.values(csv)
- // we convert all roots, which are entries with parent 0
- .filter(v => v.parent == 0)
- .map(v => convertEntry(v, csv))
+ const children = []
+ for(const entry of valueMapping.values()) {
+ if(entry.parent == RootId) {
+ children.push(convertEntry(entry))
+ }
+ }
+ exprlist[childrenKey] = children
return {'exprlist': exprlist}
}

- function convertEntry(csvEntry: CsvEntry, csv: ParsedCsv): XmlBasedJson {
+ function convertEntry(csvEntry: Entry): XmlBasedJson {
const xmlEntry: XmlBasedJson = {}

xmlEntry[attributesKey] = {
@@ -49,19 +51,18 @@
}

// check and recursively iterate children
- const children = getChildren(csv, csvEntry)
- if(children && children.length > 0){
- xmlEntry[childrenKey] = children
+ if(csvEntry.children && csvEntry.children.length > 0){
+ xmlEntry[childrenKey] = csvEntry.children
// we sort children the same way xmlparsedata does (by line, by column, by inverse end line, by inverse end column, by terminal state, by combined "start" tiebreaker value)
// (https://github.com/r-lib/xmlparsedata/blob/main/R/package.R#L153C72-L153C78)
.sort((c1,c2) => c1.line1-c2.line1 || c1.col1-c2.col1 || c2.line2-c1.line2 || c2.col2-c1.col2 || Number(c1.terminal)-Number(c2.terminal) || sortTiebreak(c1)-sortTiebreak(c2))
- .map(c => convertEntry(c, csv))
+ .map(convertEntry)
}

return xmlEntry
}

- function sortTiebreak(entry: CsvEntry){
+ function sortTiebreak(entry: Entry) {

Codecov / codecov/patch warning: added line #L65 in src/r-bridge/lang-4.x/ast/parser/json/parser.ts was not covered by tests.
// see https://github.com/r-lib/xmlparsedata/blob/main/R/package.R#L110C5-L110C11
return entry.line1 * (Math.max(entry.col1, entry.col2) + 1) + entry.col1
}
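The child ordering in `convertEntry` relies on chaining numeric differences with `||`: each term decides the order only when every earlier term is a tie (evaluates to `0`). The sketch below isolates that comparator on a hypothetical, reduced `Token` shape. It deliberately leaves out the final `sortTiebreak` term, and the sample tokens are invented for illustration rather than taken from real R parse data.

```typescript
// Hypothetical, reduced token shape; field names follow the Entry interface in the diff.
interface Token {
	line1: number
	col1: number
	line2: number
	col2: number
	terminal: boolean
	text: string
}

// Earlier start first; for equal starts, the later end first;
// then non-terminals before terminals. `||` falls through only on a tie (0).
function compareTokens(a: Token, b: Token): number {
	return a.line1 - b.line1
		|| a.col1 - b.col1
		|| b.line2 - a.line2
		|| b.col2 - a.col2
		|| Number(a.terminal) - Number(b.terminal)
}

const tokens: Token[] = [
	{ line1: 1, col1: 5, line2: 1, col2: 5, terminal: true,  text: '1' },
	{ line1: 1, col1: 1, line2: 1, col2: 1, terminal: true,  text: 'x' },
	{ line1: 1, col1: 1, line2: 1, col2: 5, terminal: false, text: 'x <- 1' }
]

// Copy before sorting so the input array stays untouched.
const sorted = tokens.slice().sort(compareTokens)
console.log(sorted.map(t => t.text))
```

With a comparator of this form, a node that starts at the same position as another but ends later sorts first, so enclosing nodes precede the nodes they contain.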