
Feat: Follow up on source calls #609

Merged · 51 commits · Feb 7, 2024

Commits
b11decd
wip: added testfile & started on environment (??)
Ellpeck Jan 22, 2024
006aa10
wip: some todos
Ellpeck Jan 22, 2024
e5268f5
wip: start of source argument resolve
Ellpeck Jan 22, 2024
969815e
wip: added testfile & started on environment (??)
Ellpeck Jan 22, 2024
50a461b
wip: some todos
Ellpeck Jan 22, 2024
552bf5d
wip: start of source argument resolve
Ellpeck Jan 22, 2024
74d71da
Merge remote-tracking branch 'origin/605-feat-follow-up-on-source-cal…
Ellpeck Jan 25, 2024
df033ab
refactor: start using sync r shell implementation
Ellpeck Jan 25, 2024
97761a7
wip: rebase
Ellpeck Jan 22, 2024
e6ec40f
refactor: start using sync r shell implementation
Ellpeck Jan 25, 2024
7f5a006
Merge remote-tracking branch 'origin/605-feat-follow-up-on-source-cal…
Ellpeck Jan 25, 2024
c1d1346
wip: use executeSingleSubStep for parsing sourced code
Ellpeck Jan 25, 2024
1eae7f6
wip: added testfile & started on environment (??)
Ellpeck Jan 22, 2024
a88f8bb
wip: some todos
Ellpeck Jan 22, 2024
83cb725
wip: start of source argument resolve
Ellpeck Jan 22, 2024
10699e8
wip: rebase
Ellpeck Jan 22, 2024
5f4f31e
refactor: start using sync r shell implementation
Ellpeck Jan 25, 2024
559d80c
wip: some todos
Ellpeck Jan 22, 2024
061aaea
wip: use executeSingleSubStep for parsing sourced code
Ellpeck Jan 25, 2024
3d74428
Merge remote-tracking branch 'origin/605-feat-follow-up-on-source-cal…
Ellpeck Jan 30, 2024
4f9e9ea
wip: fix merge issues
Ellpeck Jan 30, 2024
a0f8f00
feat-fix: avoid cyclic dependency when using step executor
Ellpeck Jan 30, 2024
9fc51f0
wip: run normalize and dataflow on sourced file
Ellpeck Jan 30, 2024
32b049e
wip: some work on source dataflowing
Ellpeck Jan 31, 2024
0392ad6
refactor: remove print
Ellpeck Jan 31, 2024
ac7eb1d
Merge branch 'main' into 605-feat-follow-up-on-source-calls
EagleoutIce Feb 1, 2024
b7add0c
refactor: clean up todos and move source to its own function
Ellpeck Feb 1, 2024
92c96ac
Merge remote-tracking branch 'origin/605-feat-follow-up-on-source-cal…
Ellpeck Feb 1, 2024
787fe00
refactor: explicitly as in processSourceCall
Ellpeck Feb 1, 2024
197c418
refactor: damn u typescript
Ellpeck Feb 1, 2024
136a8eb
feat-fix: ensure we only parse built-in source calls
Ellpeck Feb 1, 2024
f335eee
refactor: remove todo
Ellpeck Feb 1, 2024
1bdbe44
feat: allow overriding the source file provider
Ellpeck Feb 1, 2024
019d49c
test: start on source tests
Ellpeck Feb 1, 2024
345bf4c
refactor: overhaul source providers
Ellpeck Feb 1, 2024
7507a18
refactor: generify source providers to RParseRequestProvider
Ellpeck Feb 1, 2024
911d349
test: added test for conditional source
Ellpeck Feb 5, 2024
ba6dce2
refactor: properly handle missing/invalid sourced files
Ellpeck Feb 5, 2024
48c7928
wip: test for recursive sources
Ellpeck Feb 5, 2024
3f21bcf
feat: skip dataflow analysis for re-sourced references
Ellpeck Feb 5, 2024
53d69de
wip: add another todo
Ellpeck Feb 5, 2024
c0eb3fc
refactor: use parse requests in dataflow processor info
Ellpeck Feb 6, 2024
5bc6d08
refactor: first pass of reference chain impl
Ellpeck Feb 6, 2024
56a4047
feat-fix: also catch normalize and dataflow errors
Ellpeck Feb 6, 2024
11b625b
test: finished recursive source test
Ellpeck Feb 6, 2024
85dd0fd
test: added test for non-constant source argument
Ellpeck Feb 6, 2024
0c239af
test: added multi-source test
Ellpeck Feb 6, 2024
f6323c6
feat-fix: sourcing multiple files works correctly now
Ellpeck Feb 6, 2024
03b4618
refactor: resolve review comments
Ellpeck Feb 7, 2024
d1ea24a
test: reset the source provider to the default value after each describe
Ellpeck Feb 7, 2024
b5ddd9a
test-fix: reset the source provider in the source describe instead
Ellpeck Feb 7, 2024
2 changes: 1 addition & 1 deletion src/cli/repl/commands/parse.ts
@@ -135,7 +135,7 @@
}).allRemainingSteps()

const config = deepMergeObject<XmlParserConfig>(DEFAULT_XML_PARSER_CONFIG, { tokenMap: await shell.tokenMap() })
const object = await xlm2jsonObject(config, result.parse)
const object = xlm2jsonObject(config, result.parse)

Codecov warning: added line src/cli/repl/commands/parse.ts#L138 was not covered by tests

output.stdout(depthListToTextTree(toDepthMap(object, config), config, output.formatter))
}
4 changes: 2 additions & 2 deletions src/core/print/parse-printer.ts
@@ -24,8 +24,8 @@

}

export async function parseToQuads(code: string, config: QuadSerializationConfiguration, parseConfig: XmlParserConfig): Promise<string> {
const obj = await xlm2jsonObject(parseConfig, code)
export function parseToQuads(code: string, config: QuadSerializationConfiguration, parseConfig: XmlParserConfig): string {
const obj = xlm2jsonObject(parseConfig, code)

Codecov warning: added line src/core/print/parse-printer.ts#L28 was not covered by tests
// recursively filter so that if the object contains one of the keys 'a', 'b' or 'c', all other keys are ignored
return serialize2quads(
filterObject(obj, new Set([parseConfig.attributeName, parseConfig.childrenName, parseConfig.contentName])) as XmlBasedJson,
4 changes: 2 additions & 2 deletions src/core/slicer.ts
@@ -204,11 +204,11 @@ export class SteppingSlicer<InterestedIn extends StepName | undefined = typeof L
break
case 1:
step = guardStep('normalize')
result = await executeSingleSubStep(step, this.results.parse as string, await this.shell.tokenMap(), this.hooks, this.getId)
result = executeSingleSubStep(step, this.results.parse as string, await this.shell.tokenMap(), this.hooks, this.getId)
break
case 2:
step = guardStep('dataflow')
result = executeSingleSubStep(step, this.results.normalize as NormalizedAst)
result = executeSingleSubStep(step, this.request, this.results.normalize as NormalizedAst)
break
case 3:
step = guardStep('ai')
2 changes: 1 addition & 1 deletion src/core/steps.ts
@@ -96,7 +96,7 @@ export const STEPS_PER_FILE = {
} satisfies IStep<typeof normalize>,
'dataflow': {
description: 'Construct the dataflow graph',
processor: produceDataFlowGraph,
processor: (r, a) => produceDataFlowGraph(r, a),
required: 'once-per-file',
printer: {
[StepOutputFormat.Internal]: internalPrinter,
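The `steps.ts` change above wraps the dataflow processor in an arrow function instead of storing `produceDataFlowGraph` directly — the usual fix for the circular-import initialization issue the commit `a0f8f00` ("avoid cyclic dependency when using step executor") refers to. A minimal single-file simulation (illustrative only, not flowr's actual module layout):

```typescript
// Simulate a circular-import scenario: the step table is built before the
// processor binding has been initialized (as happens when two modules import
// each other). An eager reference would capture `undefined`; wrapping the
// call in an arrow function defers the lookup to call time.

let produceDataFlowGraph: (request: string, ast: string) => string;

// built "too early", while produceDataFlowGraph is still uninitialized
const steps = {
	dataflow: {
		// processor: produceDataFlowGraph  // would store `undefined` here
		processor: (request: string, ast: string) => produceDataFlowGraph(request, ast),
	},
};

// the binding is initialized later, e.g. once the cyclic module finishes loading
produceDataFlowGraph = (request, ast) => `dataflow(${request}, ${ast})`;

console.log(steps.dataflow.processor('request', 'ast')); // → dataflow(request, ast)
```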
8 changes: 8 additions & 0 deletions src/dataflow/environments/environment.ts
@@ -147,6 +147,14 @@ export const DefaultEnvironmentMemory = new Map<Identifier, IdentifierDefinition
definedAt: BuiltIn,
name: 'print',
nodeId: BuiltIn
}]],
['source', [{
kind: 'built-in-function',
scope: GlobalScope,
used: 'always',
definedAt: BuiltIn,
name: 'source',
nodeId: BuiltIn
}]]
])

14 changes: 11 additions & 3 deletions src/dataflow/extractor.ts
@@ -1,4 +1,5 @@
import type { NormalizedAst, ParentInformation, RAssignmentOp, RBinaryOp} from '../r-bridge'
import type {NormalizedAst, ParentInformation, RAssignmentOp, RBinaryOp, RParseRequest} from '../r-bridge'
import { requestFingerprint} from '../r-bridge'
import { RType } from '../r-bridge'
import type { DataflowInformation } from './internal/info'
import type { DataflowProcessorInformation, DataflowProcessors} from './processor'
@@ -48,8 +49,15 @@ const processors: DataflowProcessors<any> = {
[RType.ExpressionList]: processExpressionList,
}

export function produceDataFlowGraph<OtherInfo>(ast: NormalizedAst<OtherInfo & ParentInformation>, initialScope: DataflowScopeName = LocalScope): DataflowInformation {
return processDataflowFor<OtherInfo>(ast.ast, { completeAst: ast, activeScope: initialScope, environments: initializeCleanEnvironments(), processors: processors as DataflowProcessors<OtherInfo & ParentInformation> })
export function produceDataFlowGraph<OtherInfo>(request: RParseRequest, ast: NormalizedAst<OtherInfo & ParentInformation>, initialScope: DataflowScopeName = LocalScope): DataflowInformation {
return processDataflowFor<OtherInfo>(ast.ast, {
completeAst: ast,
activeScope: initialScope,
environments: initializeCleanEnvironments(),
processors: processors as DataflowProcessors<OtherInfo & ParentInformation>,
currentRequest: request,
referenceChain: [requestFingerprint(request)]
})
}

export function processBinaryOp<OtherInfo>(node: RBinaryOp<OtherInfo & ParentInformation>, data: DataflowProcessorInformation<OtherInfo & ParentInformation>) {
22 changes: 14 additions & 8 deletions src/dataflow/internal/process/functions/function-call.ts
@@ -1,14 +1,15 @@
import type { DataflowInformation } from '../../info'
import type { DataflowProcessorInformation} from '../../../processor'
import type {DataflowProcessorInformation} from '../../../processor'
import { processDataflowFor } from '../../../processor'
import { define, overwriteEnvironments, resolveByName } from '../../../environments'
import type { ParentInformation, RFunctionCall} from '../../../../r-bridge'
import { RType } from '../../../../r-bridge'
import {define, overwriteEnvironments, resolveByName} from '../../../environments'
import type {ParentInformation, RFunctionCall} from '../../../../r-bridge'
import { RType} from '../../../../r-bridge'
import { guard } from '../../../../util/assert'
import type { FunctionArgument } from '../../../index'
import type {FunctionArgument} from '../../../index'
import { DataflowGraph, dataflowLogger, EdgeType } from '../../../index'
import { linkArgumentsOnCall } from '../../linker'
import { LocalScope } from '../../../environments/scopes'
import {isSourceCall, processSourceCall} from './source'

export const UnnamedFunctionCallPrefix = 'unnamed-function-call-'

@@ -40,7 +41,6 @@ export function processFunctionCall<OtherInfo>(functionCall: RFunctionCall<Other
finalGraph.mergeWith(functionName.graph)
}


for(const arg of functionCall.arguments) {
if(arg === undefined) {
callArgs.push('empty')
@@ -107,13 +107,19 @@ export function processFunctionCall<OtherInfo>(functionCall: RFunctionCall<Other
inIds.push(...functionName.in, ...functionName.unknownReferences)
}

return {
let info: DataflowInformation = {
unknownReferences: [],
in: inIds,
out: functionName.out, // we do not keep argument out as it has been linked by the function
graph: finalGraph,
environments: finalEnv,
scope: data.activeScope
}
}

// parse a source call and analyze the referenced code
if(isSourceCall(functionCallName, data.activeScope,finalEnv)) {
info = processSourceCall(functionCall, data, info)
}

return info
}
80 changes: 80 additions & 0 deletions src/dataflow/internal/process/functions/source.ts
@@ -0,0 +1,80 @@
import type {IdGenerator, NoInfo, RArgument, RParseRequest, RParseRequestProvider} from '../../../../r-bridge'
import { requestFingerprint} from '../../../../r-bridge'
import { sourcedDeterministicCountingIdGenerator} from '../../../../r-bridge'
import {requestProviderFromFile} from '../../../../r-bridge'
import {type NormalizedAst, type ParentInformation, removeTokenMapQuotationMarks, type RFunctionCall, RType} from '../../../../r-bridge'
import {RShellExecutor} from '../../../../r-bridge/shell-executor'
import {executeSingleSubStep} from '../../../../core'
import {type DataflowProcessorInformation, processDataflowFor} from '../../../processor'
import {type DataflowScopeName, type Identifier, overwriteEnvironments, type REnvironmentInformation, resolveByName} from '../../../environments'
import type {DataflowInformation} from '../../info'
import {dataflowLogger} from '../../../index'

let sourceProvider = requestProviderFromFile()

export function setSourceProvider(provider: RParseRequestProvider): void {
sourceProvider = provider
}

export function isSourceCall(name: Identifier, scope: DataflowScopeName, environments: REnvironmentInformation): boolean {
const definitions = resolveByName(name, scope, environments)
if(definitions === undefined) {
return false
}
// fail if there are multiple definitions because then we must treat the complete import as a maybe because it might do something different
if(definitions.length !== 1) {
return false

Codecov warning: added line src/dataflow/internal/process/functions/source.ts#L26 was not covered by tests
}
const def = definitions[0]
return def.name == 'source' && def.kind == 'built-in-function'
}

export function processSourceCall<OtherInfo>(functionCall: RFunctionCall<OtherInfo & ParentInformation>, data: DataflowProcessorInformation<OtherInfo & ParentInformation>, information: DataflowInformation): DataflowInformation {
const sourceFile = functionCall.arguments[0] as RArgument<ParentInformation> | undefined
if(sourceFile?.value?.type == RType.String) {
const path = removeTokenMapQuotationMarks(sourceFile.lexeme)
const request = sourceProvider.createRequest(path)

// check if the sourced file has already been dataflow analyzed, and if so, skip it
if(data.referenceChain.includes(requestFingerprint(request))) {
dataflowLogger.info(`Found loop in dataflow analysis for ${JSON.stringify(request)}: ${JSON.stringify(data.referenceChain)}, skipping further dataflow analysis`)
return information
}

return sourceRequest(request, data, information, sourcedDeterministicCountingIdGenerator(path, functionCall.location))
} else {
dataflowLogger.info(`Non-constant argument ${JSON.stringify(sourceFile)} for source is currently not supported, skipping`)
return information
}
}

export function sourceRequest<OtherInfo>(request: RParseRequest, data: DataflowProcessorInformation<OtherInfo & ParentInformation>, information: DataflowInformation, getId: IdGenerator<NoInfo>): DataflowInformation {
const executor = new RShellExecutor()

// parse, normalize and dataflow the sourced file
let normalized: NormalizedAst<OtherInfo & ParentInformation>
let dataflow: DataflowInformation
try {
const parsed = executeSingleSubStep('parse', request, executor) as string
normalized = executeSingleSubStep('normalize', parsed, executor.getTokenMap(), undefined, getId) as NormalizedAst<OtherInfo & ParentInformation>
dataflow = processDataflowFor(normalized.ast, {
...data,
currentRequest: request,
environments: information.environments,
referenceChain: [...data.referenceChain, requestFingerprint(request)]
})
} catch(e) {
dataflowLogger.warn(`Failed to analyze sourced file ${JSON.stringify(request)}, skipping: ${(e as Error).message}`)
return information
}

// update our graph with the sourced file's information
const newInformation = {...information}
newInformation.environments = overwriteEnvironments(information.environments, dataflow.environments)
newInformation.graph.mergeWith(dataflow.graph)
// this can be improved, see issue #628
for(const [k, v] of normalized.idMap) {
data.completeAst.idMap.set(k, v)
}
return newInformation
}
22 changes: 14 additions & 8 deletions src/dataflow/processor.ts
@@ -4,7 +4,7 @@
import type {
NormalizedAst,
ParentInformation, RNode,
RNodeWithParent
RNodeWithParent, RParseRequest
} from '../r-bridge'
import type { DataflowInformation } from './internal/info'
import type { DataflowScopeName, REnvironmentInformation } from './environments'
@@ -13,20 +13,29 @@ export interface DataflowProcessorInformation<OtherInfo> {
/**
* Initial and frozen ast-information
*/
readonly completeAst: NormalizedAst<OtherInfo>
readonly completeAst: NormalizedAst<OtherInfo>
/**
* Correctly contains pushed local scopes introduced by `function` scopes.
* Will by default *not* contain any symbol-bindings introduced along the way; they have to be decorated when moving up the tree.
*/
readonly environments: REnvironmentInformation
readonly environments: REnvironmentInformation
/**
* Name of the currently active scope, (hopefully) always {@link LocalScope | Local}
*/
readonly activeScope: DataflowScopeName
readonly activeScope: DataflowScopeName
/**
* Other processors to be called by the given functions
*/
readonly processors: DataflowProcessors<OtherInfo>
readonly processors: DataflowProcessors<OtherInfo>
/**
* The {@link RParseRequest} that is currently being parsed
*/
readonly currentRequest: RParseRequest
/**
* The chain of {@link RParseRequest} fingerprints ({@link requestFingerprint}) that lead to the {@link currentRequest}.
* The most recent (last) entry is expected to always be the {@link currentRequest}.
*/
readonly referenceChain: string[]
}

export type DataflowProcessor<OtherInfo, NodeType extends RNodeWithParent<OtherInfo>> = (node: NodeType, data: DataflowProcessorInformation<OtherInfo>) => DataflowInformation
@@ -55,6 +64,3 @@ export type DataflowProcessors<OtherInfo> = {
export function processDataflowFor<OtherInfo>(current: RNodeWithParent<OtherInfo>, data: DataflowProcessorInformation<OtherInfo & ParentInformation>): DataflowInformation {
return data.processors[current.type](current as never, data)
}



5 changes: 5 additions & 0 deletions src/r-bridge/lang-4.x/ast/model/processing/decorate.ts
@@ -46,6 +46,11 @@ export function deterministicCountingIdGenerator(start = 0): () => NodeId {
return () => `${id++}`
}

export function sourcedDeterministicCountingIdGenerator(path: string, location: SourceRange, start = 0): () => NodeId {
let id = start
return () => `${path}-${loc2Id(location)}-${id++}`
}

function loc2Id(loc: SourceRange) {
return `${loc.start.line}:${loc.start.column}-${loc.end.line}:${loc.end.column}`
}
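The new generator namespaces ids by the sourcing path and call location, so nodes created for a sourced file cannot collide with ids of the main file (or of the same file sourced from a different call site). A self-contained version of the logic from the diff, with the range types simplified for illustration:

```typescript
// Simplified stand-ins for flowr's SourceRange (illustrative shapes only)
interface SourcePosition { line: number; column: number }
interface SourceRange { start: SourcePosition; end: SourcePosition }

function loc2Id(loc: SourceRange): string {
	return `${loc.start.line}:${loc.start.column}-${loc.end.line}:${loc.end.column}`;
}

// ids are prefixed with the sourced path and the source() call's location,
// then count up deterministically within that namespace
function sourcedDeterministicCountingIdGenerator(path: string, location: SourceRange, start = 0): () => string {
	let id = start;
	return () => `${path}-${loc2Id(location)}-${id++}`;
}

const gen = sourcedDeterministicCountingIdGenerator('lib.R', { start: { line: 3, column: 1 }, end: { line: 3, column: 14 } });
console.log(gen()); // → lib.R-3:1-3:14-0
console.log(gen()); // → lib.R-3:1-3:14-1
```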
10 changes: 7 additions & 3 deletions src/r-bridge/lang-4.x/ast/parser/xml/internal/xml-to-json.ts
@@ -8,8 +8,11 @@ import type { XmlBasedJson } from '../input-format'
* @param config - The configuration to use (i.e., what names should be used for the attributes, children, ...)
* @param xmlString - The xml input to parse
*/
export function xlm2jsonObject(config: XmlParserConfig, xmlString: string): Promise<XmlBasedJson> {
return xml2js.parseStringPromise(xmlString, {
export function xlm2jsonObject(config: XmlParserConfig, xmlString: string): XmlBasedJson {
let result: XmlBasedJson = {}
xml2js.parseString(xmlString, {
// we want this to be strictly synchronous so that the result can be returned immediately below!
async: false,
attrkey: config.attributeName,
charkey: config.contentName,
childkey: config.childrenName,
@@ -22,5 +25,6 @@ export function xlm2jsonObject(config: XmlParserConfig, xmlString: string): Prom
includeWhiteChars: true,
normalize: false,
strict: true
}) as Promise<XmlBasedJson>
}, (_, r)=> result = r as XmlBasedJson)
return result
}
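The rewrite above depends on xml2js invoking its callback synchronously when `async: false` is set, so the result can be captured into a local variable and returned immediately. The same pattern with a stand-in parser (not xml2js itself), to show why the returned value is already populated — and why it would silently stay empty if the callback ever ran asynchronously:

```typescript
// A callback-style API that, like xml2js.parseString with async: false,
// invokes its callback before returning
function parseSync(input: string, callback: (err: Error | null, result: string) => void): void {
	callback(null, input.toUpperCase());
}

// Wrap the callback API into a plain synchronous function by capturing the
// callback's argument into a local variable
function toUpperViaCallback(input: string): string {
	let result = '';
	parseSync(input, (_, r) => { result = r; });
	return result; // already populated: the callback ran synchronously
}

console.log(toUpperViaCallback('abc')); // → ABC
```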
4 changes: 2 additions & 2 deletions src/r-bridge/lang-4.x/ast/parser/xml/parser.ts
@@ -30,12 +30,12 @@
*
* @returns The normalized and decorated AST (i.e., as a doubly linked tree)
*/
export async function normalize(xmlString: string, tokenMap: TokenMap, hooks?: DeepPartial<XmlParserHooks>, getId: IdGenerator<NoInfo> = deterministicCountingIdGenerator(0)): Promise<NormalizedAst> {
export function normalize(xmlString: string, tokenMap: TokenMap, hooks?: DeepPartial<XmlParserHooks>, getId: IdGenerator<NoInfo> = deterministicCountingIdGenerator(0)): NormalizedAst {
const config = { ...DEFAULT_XML_PARSER_CONFIG, tokenMap }
const hooksWithDefaults = deepMergeObject(DEFAULT_PARSER_HOOKS, hooks) as XmlParserHooks

const data: ParserData = { config, hooks: hooksWithDefaults, currentRange: undefined, currentLexeme: undefined }
const object = await xlm2jsonObject(config, xmlString)
const object = xlm2jsonObject(config, xmlString)

return decorateAst(parseRootObjToAst(data, object), getId)
}