Feat: Follow up on source calls (#609)
* wip: added testfile & started on environment (??)
* wip: some todos
* wip: start of source argument resolve
* wip: added testfile & started on environment (??)
* wip: some todos
* wip: start of source argument resolve
* refactor: start using sync r shell implementation
* wip: rebase
* refactor: start using sync r shell implementation
* wip: use executeSingleSubStep for parsing sourced code
* wip: added testfile & started on environment (??)
* wip: some todos
* wip: start of source argument resolve
* wip: rebase
* refactor: start using sync r shell implementation
* wip: some todos
* wip: use executeSingleSubStep for parsing sourced code
* wip: fix merge issues
* feat-fix: avoid cyclic dependency when using step executor
* wip: run normalize and dataflow on sourced file
* wip: some work on source dataflowing
* refactor: remove print
* refactor: clean up todos and move source to its own function
* refactor: explicitly as in processSourceCall
* refactor: damn u typescript
* feat-fix: ensure we only parse built-in source calls
* refactor: remove todo
* feat: allow overriding the source file provider
* test: start on source tests
* refactor: overhaul source providers
* refactor: generify source providers to RParseRequestProvider
* test: added test for conditional source
* refactor: properly handle missing/invalid sourced files
* wip: test for recursive sources
* feat: skip dataflow analysis for re-sourced references
* wip: add another todo
* refactor: use parse requests in dataflow processor info
* refactor: first pass of reference chain impl
* feat-fix: also catch normalize and dataflow errors
* test: finished recursive source test
* test: added test for non-constant source argument
* test: added multi-source test
* feat-fix: sourcing multiple files works correctly now
* refactor: resolve review comments
* test: reset the source provider to the default value after each describe
* test-fix: reset the source provider in the source describe instead

---------
Co-authored-by: Florian Sihler <florian.sihler@uni-ulm.de>
1 parent 2a72e25, commit ec104b8
Showing 15 changed files with 467 additions and 46 deletions.
@@ -0,0 +1,80 @@
import type {IdGenerator, NoInfo, RArgument, RParseRequest, RParseRequestProvider} from '../../../../r-bridge'
import { requestFingerprint} from '../../../../r-bridge'
import { sourcedDeterministicCountingIdGenerator} from '../../../../r-bridge'
import {requestProviderFromFile} from '../../../../r-bridge'
import {type NormalizedAst, type ParentInformation, removeTokenMapQuotationMarks, type RFunctionCall, RType} from '../../../../r-bridge'
import {RShellExecutor} from '../../../../r-bridge/shell-executor'
import {executeSingleSubStep} from '../../../../core'
import {type DataflowProcessorInformation, processDataflowFor} from '../../../processor'
import {type DataflowScopeName, type Identifier, overwriteEnvironments, type REnvironmentInformation, resolveByName} from '../../../environments'
import type {DataflowInformation} from '../../info'
import {dataflowLogger} from '../../../index'

let sourceProvider = requestProviderFromFile()

export function setSourceProvider(provider: RParseRequestProvider): void {
	sourceProvider = provider
}

export function isSourceCall(name: Identifier, scope: DataflowScopeName, environments: REnvironmentInformation): boolean {
	const definitions = resolveByName(name, scope, environments)
	if(definitions === undefined) {
		return false
	}
	// fail if there are multiple definitions because then we must treat the complete import as a maybe because it might do something different
	if(definitions.length !== 1) {
		return false
	}
	const def = definitions[0]
	return def.name == 'source' && def.kind == 'built-in-function'
}

export function processSourceCall<OtherInfo>(functionCall: RFunctionCall<OtherInfo & ParentInformation>, data: DataflowProcessorInformation<OtherInfo & ParentInformation>, information: DataflowInformation): DataflowInformation {
	const sourceFile = functionCall.arguments[0] as RArgument<ParentInformation> | undefined
	if(sourceFile?.value?.type == RType.String) {
		const path = removeTokenMapQuotationMarks(sourceFile.lexeme)
		const request = sourceProvider.createRequest(path)

		// check if the sourced file has already been dataflow analyzed, and if so, skip it
		if(data.referenceChain.includes(requestFingerprint(request))) {
			dataflowLogger.info(`Found loop in dataflow analysis for ${JSON.stringify(request)}: ${JSON.stringify(data.referenceChain)}, skipping further dataflow analysis`)
			return information
		}

		return sourceRequest(request, data, information, sourcedDeterministicCountingIdGenerator(path, functionCall.location))
	} else {
		dataflowLogger.info(`Non-constant argument ${JSON.stringify(sourceFile)} for source is currently not supported, skipping`)
		return information
	}
}

export function sourceRequest<OtherInfo>(request: RParseRequest, data: DataflowProcessorInformation<OtherInfo & ParentInformation>, information: DataflowInformation, getId: IdGenerator<NoInfo>): DataflowInformation {
	const executor = new RShellExecutor()

	// parse, normalize and dataflow the sourced file
	let normalized: NormalizedAst<OtherInfo & ParentInformation>
	let dataflow: DataflowInformation
	try {
		const parsed = executeSingleSubStep('parse', request, executor) as string
		normalized = executeSingleSubStep('normalize', parsed, executor.getTokenMap(), undefined, getId) as NormalizedAst<OtherInfo & ParentInformation>
		dataflow = processDataflowFor(normalized.ast, {
			...data,
			currentRequest: request,
			environments: information.environments,
			referenceChain: [...data.referenceChain, requestFingerprint(request)]
		})
	} catch(e) {
		dataflowLogger.warn(`Failed to analyze sourced file ${JSON.stringify(request)}, skipping: ${(e as Error).message}`)
		return information
	}

	// update our graph with the sourced file's information
	const newInformation = {...information}
	newInformation.environments = overwriteEnvironments(information.environments, dataflow.environments)
	newInformation.graph.mergeWith(dataflow.graph)
	// this can be improved, see issue #628
	for(const [k, v] of normalized.idMap) {
		data.completeAst.idMap.set(k, v)
	}
	return newInformation
}
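
The loop check in `processSourceCall` relies on a chain of request fingerprints that grows with every nested `source` call. The following is a minimal, self-contained sketch of that idea only; `SourceRequest`, `fingerprint`, `files`, and `analyze` are hypothetical stand-ins invented for illustration and are not flowR's actual types or functions. It assumes a fingerprint is simply a stable string derived from the request.

```typescript
// Hypothetical, simplified request shape (not flowR's RParseRequest).
type SourceRequest = { request: 'file', content: string }

// Assumption: a fingerprint is any stable string identifying the request.
function fingerprint(req: SourceRequest): string {
	return `${req.request}:${req.content}`
}

// Toy stand-in for the file system: each file lists the files it sources.
const files = new Map<string, string[]>([
	['a.R', ['b.R']],
	['b.R', ['a.R', 'c.R']], // b.R sources a.R again, which forms a loop
	['c.R', []]
])

// Returns the files in the order they are analyzed.
// A request whose fingerprint is already on the chain is skipped.
function analyze(path: string, referenceChain: readonly string[] = []): string[] {
	const fp = fingerprint({ request: 'file', content: path })
	if(referenceChain.includes(fp)) {
		return [] // loop detected, skip further analysis
	}
	const visited = [path]
	for(const sourced of files.get(path) ?? []) {
		// extend the chain by copying, so sibling calls do not see each other's entries
		visited.push(...analyze(sourced, [...referenceChain, fp]))
	}
	return visited
}

const order = analyze('a.R')
console.log(order)
```

Because the chain is copied rather than mutated, it only records the current ancestry. In this sketch, a file sourced twice from unrelated places would be analyzed twice, while a file that sources one of its own ancestors is skipped.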
ec104b8
"artificial" Benchmark Suite
| Benchmark | Current | Previous | Ratio |
|---|---|---|---|
| Total per-file | 1512.2355823181817 ms (3694.360896380434) | 1511.494083 ms (3708.3225553463017) | 1.00 |
| Retrieve AST from R code | 63.72869486363637 ms (123.81549012137785) | 64.46248863636363 ms (125.66414120100016) | 0.99 |
| Normalize R AST | 94.4223031818182 ms (153.7545470057254) | 94.99519236363636 ms (152.9376581920758) | 0.99 |
| Produce dataflow information | 67.760007 ms (172.61951563539108) | 65.2556795909091 ms (167.18441854609554) | 1.04 |
| Run abstract interpretation | 0.036419181818181816 ms (0.018520681738474185) | 0.03478995454545455 ms (0.01086253065474186) | 1.05 |
| Total per-slice | 1.9578501304386124 ms (1.4212083640995328) | 1.8724288794806876 ms (1.3873679811565907) | 1.05 |
| Static slicing | 1.494833880817364 ms (1.3461684991098852) | 1.4074784311593942 ms (1.3118563756339259) | 1.06 |
| Reconstruct code | 0.44712880173053754 ms (0.21014848413193918) | 0.4524929302663976 ms (0.22636683004337768) | 0.99 |
| failed to reconstruct/re-parse | 0 # | 0 # | 1 |
| times hit threshold | 0 # | 0 # | 1 |
| reduction (characters) | 0.7329390759026896 # | 0.7329390759026896 # | 1 |
| reduction (normalized tokens) | 0.720988345209971 # | 0.720988345209971 # | 1 |
This comment was automatically generated by workflow using github-action-benchmark.
ec104b8
"social-science" Benchmark Suite
| Benchmark | Current | Previous | Ratio |
|---|---|---|---|
| Total per-file | 3780.83924088 ms (6262.620791394061) | 3566.86774236 ms (5920.286185213901) | 1.06 |
| Retrieve AST from R code | 72.47843676000001 ms (60.60667043532563) | 72.22227936 ms (60.97026629229811) | 1.00 |
| Normalize R AST | 112.02863244 ms (69.56951604022248) | 113.02594858 ms (70.71306906384982) | 0.99 |
| Produce dataflow information | 182.53878808000002 ms (284.4251735862355) | 163.44175874 ms (276.9623037407309) | 1.12 |
| Run abstract interpretation | 0.05759796 ms (0.02710947569390452) | 0.06162254 ms (0.02971958909151336) | 0.93 |
| Total per-slice | 9.230454502244271 ms (15.112408308435244) | 8.599255365044066 ms (14.312877376595168) | 1.07 |
| Static slicing | 8.704662440375671 ms (14.982894834915669) | 8.071953766135923 ms (14.188089279803133) | 1.08 |
| Reconstruct code | 0.5164380502435043 ms (0.26249066514714825) | 0.5187709959800451 ms (0.27627204677573897) | 1.00 |
| failed to reconstruct/re-parse | 9 # | 9 # | 1 |
| times hit threshold | 967 # | 967 # | 1 |
| reduction (characters) | 0.898713819973478 # | 0.898713819973478 # | 1 |
| reduction (normalized tokens) | 0.8579790415512589 # | 0.8579790415512589 # | 1 |
This comment was automatically generated by workflow using github-action-benchmark.
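
The ratio figures reported for the timing rows in both benchmark comments are consistent with dividing the first timing by the second and rounding to two decimals. The sketch below only checks that reading against a few of the reported numbers. The helper `ratio` is a hypothetical name, and this is an inference from the figures, not the benchmark tool's actual implementation. The rows with unit `#`, which report a ratio of `1`, are not covered.

```typescript
// Hypothetical helper: first timing divided by second, rounded to two decimals.
function ratio(first: number, second: number): string {
	return (first / second).toFixed(2)
}

// "Produce dataflow information", artificial suite (reported ratio: 1.04)
const artificialDataflow = ratio(67.760007, 65.2556795909091)

// "Produce dataflow information", social-science suite (reported ratio: 1.12)
const socialScienceDataflow = ratio(182.53878808000002, 163.44175874)

// "Run abstract interpretation", social-science suite (reported ratio: 0.93)
const socialScienceAbsint = ratio(0.05759796, 0.06162254)

console.log(artificialDataflow, socialScienceDataflow, socialScienceAbsint)
```

Under this reading, a ratio above 1 means the first timing is the slower of the two, so the dataflow step on the social-science suite is the largest change reported here.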