Feat: Follow up on source calls #609

Merged
51 commits merged on Feb 7, 2024
Commits
b11decd
wip: added testfile & started on environment (??)
Ellpeck Jan 22, 2024
006aa10
wip: some todos
Ellpeck Jan 22, 2024
e5268f5
wip: start of source argument resolve
Ellpeck Jan 22, 2024
969815e
wip: added testfile & started on environment (??)
Ellpeck Jan 22, 2024
50a461b
wip: some todos
Ellpeck Jan 22, 2024
552bf5d
wip: start of source argument resolve
Ellpeck Jan 22, 2024
74d71da
Merge remote-tracking branch 'origin/605-feat-follow-up-on-source-cal…
Ellpeck Jan 25, 2024
df033ab
refactor: start using sync r shell implementation
Ellpeck Jan 25, 2024
97761a7
wip: rebase
Ellpeck Jan 22, 2024
e6ec40f
refactor: start using sync r shell implementation
Ellpeck Jan 25, 2024
7f5a006
Merge remote-tracking branch 'origin/605-feat-follow-up-on-source-cal…
Ellpeck Jan 25, 2024
c1d1346
wip: use executeSingleSubStep for parsing sourced code
Ellpeck Jan 25, 2024
1eae7f6
wip: added testfile & started on environment (??)
Ellpeck Jan 22, 2024
a88f8bb
wip: some todos
Ellpeck Jan 22, 2024
83cb725
wip: start of source argument resolve
Ellpeck Jan 22, 2024
10699e8
wip: rebase
Ellpeck Jan 22, 2024
5f4f31e
refactor: start using sync r shell implementation
Ellpeck Jan 25, 2024
559d80c
wip: some todos
Ellpeck Jan 22, 2024
061aaea
wip: use executeSingleSubStep for parsing sourced code
Ellpeck Jan 25, 2024
3d74428
Merge remote-tracking branch 'origin/605-feat-follow-up-on-source-cal…
Ellpeck Jan 30, 2024
4f9e9ea
wip: fix merge issues
Ellpeck Jan 30, 2024
a0f8f00
feat-fix: avoid cyclic dependency when using step executor
Ellpeck Jan 30, 2024
9fc51f0
wip: run normalize and dataflow on sourced file
Ellpeck Jan 30, 2024
32b049e
wip: some work on source dataflowing
Ellpeck Jan 31, 2024
0392ad6
refactor: remove print
Ellpeck Jan 31, 2024
ac7eb1d
Merge branch 'main' into 605-feat-follow-up-on-source-calls
EagleoutIce Feb 1, 2024
b7add0c
refactor: clean up todos and move source to its own function
Ellpeck Feb 1, 2024
92c96ac
Merge remote-tracking branch 'origin/605-feat-follow-up-on-source-cal…
Ellpeck Feb 1, 2024
787fe00
refactor: explicitly as in processSourceCall
Ellpeck Feb 1, 2024
197c418
refactor: damn u typescript
Ellpeck Feb 1, 2024
136a8eb
feat-fix: ensure we only parse built-in source calls
Ellpeck Feb 1, 2024
f335eee
refactor: remove todo
Ellpeck Feb 1, 2024
1bdbe44
feat: allow overriding the source file provider
Ellpeck Feb 1, 2024
019d49c
test: start on source tests
Ellpeck Feb 1, 2024
345bf4c
refactor: overhaul source providers
Ellpeck Feb 1, 2024
7507a18
refactor: generify source providers to RParseRequestProvider
Ellpeck Feb 1, 2024
911d349
test: added test for conditional source
Ellpeck Feb 5, 2024
ba6dce2
refactor: properly handle missing/invalid sourced files
Ellpeck Feb 5, 2024
48c7928
wip: test for recursive sources
Ellpeck Feb 5, 2024
3f21bcf
feat: skip dataflow analysis for re-sourced references
Ellpeck Feb 5, 2024
53d69de
wip: add another todo
Ellpeck Feb 5, 2024
c0eb3fc
refactor: use parse requests in dataflow processor info
Ellpeck Feb 6, 2024
5bc6d08
refactor: first pass of reference chain impl
Ellpeck Feb 6, 2024
56a4047
feat-fix: also catch normalize and dataflow errors
Ellpeck Feb 6, 2024
11b625b
test: finished recursive source test
Ellpeck Feb 6, 2024
85dd0fd
test: added test for non-constant source argument
Ellpeck Feb 6, 2024
0c239af
test: added multi-source test
Ellpeck Feb 6, 2024
f6323c6
feat-fix: sourcing multiple files works correctly now
Ellpeck Feb 6, 2024
03b4618
refactor: resolve review comments
Ellpeck Feb 7, 2024
d1ea24a
test: reset the source provider to the default value after each describe
Ellpeck Feb 7, 2024
b5ddd9a
test-fix: reset the source provider in the source describe instead
Ellpeck Feb 7, 2024
2 changes: 1 addition & 1 deletion src/cli/repl/commands/parse.ts
@@ -135,7 +135,7 @@ export const parseCommand: ReplCommand = {
}).allRemainingSteps()

const config = deepMergeObject<XmlParserConfig>(DEFAULT_XML_PARSER_CONFIG, { tokenMap: await shell.tokenMap() })
const object = await xlm2jsonObject(config, result.parse)
const object = xlm2jsonObject(config, result.parse)

output.stdout(depthListToTextTree(toDepthMap(object, config), config, output.formatter))
}
4 changes: 2 additions & 2 deletions src/core/print/parse-printer.ts
@@ -24,8 +24,8 @@ function filterObject(obj: XmlBasedJson, keys: Set<string>): XmlBasedJson[] | Xm

}

export async function parseToQuads(code: string, config: QuadSerializationConfiguration, parseConfig: XmlParserConfig): Promise<string> {
const obj = await xlm2jsonObject(parseConfig, code)
export function parseToQuads(code: string, config: QuadSerializationConfiguration, parseConfig: XmlParserConfig): string{
const obj = xlm2jsonObject(parseConfig, code)
// recursively filter so that if the object contains one of the keys 'a', 'b' or 'c', all other keys are ignored
return serialize2quads(
filterObject(obj, new Set([parseConfig.attributeName, parseConfig.childrenName, parseConfig.contentName])) as XmlBasedJson,
2 changes: 1 addition & 1 deletion src/core/slicer.ts
@@ -204,7 +204,7 @@ export class SteppingSlicer<InterestedIn extends StepName | undefined = typeof L
break
case 1:
step = guardStep('normalize')
result = await executeSingleSubStep(step, this.results.parse as string, await this.shell.tokenMap(), this.hooks, this.getId)
result = executeSingleSubStep(step, this.results.parse as string, await this.shell.tokenMap(), this.hooks, this.getId)
break
case 2:
step = guardStep('dataflow')
2 changes: 1 addition & 1 deletion src/core/steps.ts
@@ -96,7 +96,7 @@ export const STEPS_PER_FILE = {
} satisfies IStep<typeof normalize>,
'dataflow': {
description: 'Construct the dataflow graph',
processor: produceDataFlowGraph,
processor: a => produceDataFlowGraph(a),
required: 'once-per-file',
printer: {
[StepOutputFormat.Internal]: internalPrinter,
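The change from `processor: produceDataFlowGraph` to `processor: a => produceDataFlowGraph(a)` looks cosmetic but matters under cyclic imports: the lambda defers the lookup of `produceDataFlowGraph` until call time, so the module need not be fully initialized when `STEPS_PER_FILE` is built. A minimal sketch of the difference (names are illustrative, not flowR's API):

```typescript
// At module-initialization time, `impl` may not yet hold its final value
// (as happens when two modules import each other cyclically).
let impl: (x: number) => number = () => { throw new Error('not initialized') }

// Capturing the function value directly freezes whatever `impl` is right now.
const direct = impl
// Eta-expanding defers the lookup until the call actually happens.
const lazy = (x: number) => impl(x)

// Later, initialization completes:
impl = (x: number) => x * 2

console.log(lazy(21)) // 42 — sees the initialized implementation
// direct(21) would still throw, since it captured the placeholder
```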
8 changes: 8 additions & 0 deletions src/dataflow/environments/environment.ts
@@ -147,6 +147,14 @@ export const DefaultEnvironmentMemory = new Map<Identifier, IdentifierDefinition
definedAt: BuiltIn,
name: 'print',
nodeId: BuiltIn
}]],
['source', [{
kind: 'built-in-function',
scope: GlobalScope,
used: 'always',
definedAt: BuiltIn,
name: 'source',
nodeId: BuiltIn
}]]
])

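Registering `source` in the default environment memory is what lets the dataflow step later verify, via name resolution, that a call to `source` still refers to the built-in rather than a user redefinition. A condensed stand-in for that lookup (types and names here are illustrative, not flowR's actual environment API):

```typescript
// Identifiers map to a list of definitions; a call is only treated as the
// built-in `source` if no user definition has shadowed it.
interface Definition { kind: 'built-in-function' | 'function' }

const memory = new Map<string, Definition[]>([
  ['print',  [{ kind: 'built-in-function' }]],
  ['source', [{ kind: 'built-in-function' }]]
])

function isBuiltInSource(name: string, env: Map<string, Definition[]>): boolean {
  if(name !== 'source') return false
  const defs = env.get(name)
  return defs !== undefined && defs.some(d => d.kind === 'built-in-function')
}

console.log(isBuiltInSource('source', memory)) // true
// after something like `source <- function(...) ...`, the built-in is shadowed:
const shadowed = new Map(memory)
shadowed.set('source', [{ kind: 'function' }])
console.log(isBuiltInSource('source', shadowed)) // false
```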
9 changes: 8 additions & 1 deletion src/dataflow/extractor.ts
@@ -49,7 +49,14 @@ const processors: DataflowProcessors<any> = {
}

export function produceDataFlowGraph<OtherInfo>(ast: NormalizedAst<OtherInfo & ParentInformation>, initialScope: DataflowScopeName = LocalScope): DataflowInformation {
return processDataflowFor<OtherInfo>(ast.ast, { completeAst: ast, activeScope: initialScope, environments: initializeCleanEnvironments(), processors: processors as DataflowProcessors<OtherInfo & ParentInformation> })
return processDataflowFor<OtherInfo>(ast.ast, {
completeAst: ast,
activeScope: initialScope,
environments: initializeCleanEnvironments(),
processors: processors as DataflowProcessors<OtherInfo & ParentInformation>,
currentPath: 'initial',
sourceReferences: new Map<string, string[]>()
})
}

export function processBinaryOp<OtherInfo>(node: RBinaryOp<OtherInfo & ParentInformation>, data: DataflowProcessorInformation<OtherInfo & ParentInformation>) {
20 changes: 13 additions & 7 deletions src/dataflow/internal/process/functions/function-call.ts
@@ -1,14 +1,15 @@
import type { DataflowInformation } from '../../info'
import type { DataflowProcessorInformation} from '../../../processor'
import type {DataflowProcessorInformation} from '../../../processor'
import { processDataflowFor } from '../../../processor'
import { define, overwriteEnvironments, resolveByName } from '../../../environments'
import type { ParentInformation, RFunctionCall} from '../../../../r-bridge'
import { RType } from '../../../../r-bridge'
import {define, overwriteEnvironments, resolveByName} from '../../../environments'
import type {ParentInformation, RFunctionCall} from '../../../../r-bridge'
import { RType} from '../../../../r-bridge'
import { guard } from '../../../../util/assert'
import type { FunctionArgument } from '../../../index'
import type {FunctionArgument} from '../../../index'
import { DataflowGraph, dataflowLogger, EdgeType } from '../../../index'
import { linkArgumentsOnCall } from '../../linker'
import { LocalScope } from '../../../environments/scopes'
import {isSourceCall, processSourceCall} from './source'

export const UnnamedFunctionCallPrefix = 'unnamed-function-call-'

@@ -40,7 +41,6 @@ export function processFunctionCall<OtherInfo>(functionCall: RFunctionCall<Other
finalGraph.mergeWith(functionName.graph)
}


for(const arg of functionCall.arguments) {
if(arg === undefined) {
callArgs.push('empty')
@@ -107,13 +107,19 @@ export function processFunctionCall<OtherInfo>(functionCall: RFunctionCall<Other
inIds.push(...functionName.in, ...functionName.unknownReferences)
}

return {
let info: DataflowInformation = {
unknownReferences: [],
in: inIds,
out: functionName.out, // we do not keep argument out as it has been linked by the function
graph: finalGraph,
environments: finalEnv,
scope: data.activeScope
}

// parse a source call and analyze the referenced code
if(isSourceCall(functionCallName, data.activeScope,finalEnv))
info = processSourceCall(functionCall, data, info)

return info
}

63 changes: 63 additions & 0 deletions src/dataflow/internal/process/functions/source.ts
@@ -0,0 +1,63 @@
import type {RArgument, RParseRequestProvider} from '../../../../r-bridge'
import {requestProviderFromFile} from '../../../../r-bridge'
import {fileNameDeterministicCountingIdGenerator, type NormalizedAst, type ParentInformation, removeTokenMapQuotationMarks, type RFunctionCall, RType} from '../../../../r-bridge'
import {RShellExecutor} from '../../../../r-bridge/shell-executor'
import {executeSingleSubStep} from '../../../../core'
import {type DataflowProcessorInformation, processDataflowFor} from '../../../processor'
import {type DataflowScopeName, type Identifier, overwriteEnvironments, type REnvironmentInformation, resolveByName} from '../../../environments'
import type {DataflowInformation} from '../../info'
import {dataflowLogger} from '../../../index'

let sourceProvider = requestProviderFromFile()

export function setSourceProvider(provider: RParseRequestProvider): void {
sourceProvider = provider
}

export function isSourceCall(name: Identifier, scope: DataflowScopeName, environments: REnvironmentInformation): boolean {
if(name != 'source')
return false
const definitions = resolveByName(name, scope, environments)
return definitions !== undefined && definitions.some(d => d.kind == 'built-in-function')
}

export function processSourceCall<OtherInfo>(functionCall: RFunctionCall<OtherInfo & ParentInformation>, data: DataflowProcessorInformation<OtherInfo & ParentInformation>, information: DataflowInformation): DataflowInformation {
const sourceFile = functionCall.arguments[0] as RArgument<ParentInformation> | undefined
if(sourceFile?.value?.type == RType.String) {
const executor = new RShellExecutor()
const path = removeTokenMapQuotationMarks(sourceFile.lexeme)
const request = sourceProvider.createRequest(path)

// TODO we shouldn't skip a re-analysis *always*, just when it's a cycle - right?
// check if the sourced file has already been dataflow analyzed, and if so, skip it
if(data.sourceReferences.has(path)) {
dataflowLogger.info(`Sourced file ${path} was already dataflow analyzed, skipping`)
return information
}

// parse, normalize and dataflow the sourced file
let parsed: string
try {
parsed = executeSingleSubStep('parse', request, executor) as string
} catch(e) {
dataflowLogger.warn(`Failed to parse sourced file ${path}, ignoring: ${(e as Error).message}`)
return information
}

// make the currently analyzed file remember that it already referenced the path
data.sourceReferences.set(data.currentPath, [...(data.sourceReferences.get(data.currentPath) ?? []), path])

const normalized = executeSingleSubStep('normalize', parsed, executor.getTokenMap(), undefined, fileNameDeterministicCountingIdGenerator(path)) as NormalizedAst<OtherInfo & ParentInformation>
const dataflow = processDataflowFor(normalized.ast, {...data, currentPath: path, environments: information.environments})

// update our graph with the sourced file's information
const newInformation = {...information}
newInformation.environments = overwriteEnvironments(information.environments, dataflow.environments)
newInformation.graph.mergeWith(dataflow.graph)
// this can be improved, see issue #628
for(const [k, v] of normalized.idMap)
data.completeAst.idMap.set(k, v)
return newInformation
}
return information
}
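The `sourceReferences` check above is what keeps recursive `source` chains (`a.R` sourcing `b.R` sourcing `a.R`) from looping forever, as the PR's TODO notes, the current guard skips any already-seen path, not only true cycles. The PR keys the map per sourcing file; this sketch condenses the guard to a visited set to show the core idea:

```typescript
// a.R sources b.R, and b.R sources a.R — without the guard this recurses forever.
const sources = new Map<string, string[]>([
  ['a.R', ['b.R']],
  ['b.R', ['a.R']]
])

function analyze(path: string, visited: Set<string>, order: string[]): void {
  if(visited.has(path)) return // the guard: skip files already analyzed
  visited.add(path)
  order.push(path)
  for(const next of sources.get(path) ?? [])
    analyze(next, visited, order)
}

const order: string[] = []
analyze('a.R', new Set(), order)
console.log(order) // ['a.R', 'b.R'] — each file analyzed exactly once
```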
14 changes: 7 additions & 7 deletions src/dataflow/processor.ts
@@ -13,20 +13,23 @@ export interface DataflowProcessorInformation<OtherInfo> {
/**
* Initial and frozen ast-information
*/
readonly completeAst: NormalizedAst<OtherInfo>
readonly completeAst: NormalizedAst<OtherInfo>
/**
* Correctly contains pushed local scopes introduced by `function` scopes.
* Will by default *not* contain any symbol-bindings introduces along the way, they have to be decorated when moving up the tree.
*/
readonly environments: REnvironmentInformation
readonly environments: REnvironmentInformation
/**
* Name of the currently active scope, (hopefully) always {@link LocalScope | Local}
*/
readonly activeScope: DataflowScopeName
readonly activeScope: DataflowScopeName
/**
* Other processors to be called by the given functions
*/
readonly processors: DataflowProcessors<OtherInfo>
readonly processors: DataflowProcessors<OtherInfo>
// TODO using "initial" as the default path doesn't allow us to skip re-sourcing the initial file - how do we find out the initial file's name/path?
readonly currentPath: string | 'initial'
readonly sourceReferences: Map<string, string[]>
}

export type DataflowProcessor<OtherInfo, NodeType extends RNodeWithParent<OtherInfo>> = (node: NodeType, data: DataflowProcessorInformation<OtherInfo>) => DataflowInformation
@@ -55,6 +58,3 @@ export type DataflowProcessors<OtherInfo> = {
export function processDataflowFor<OtherInfo>(current: RNodeWithParent<OtherInfo>, data: DataflowProcessorInformation<OtherInfo & ParentInformation>): DataflowInformation {
return data.processors[current.type](current as never, data)
}



5 changes: 5 additions & 0 deletions src/r-bridge/lang-4.x/ast/model/processing/decorate.ts
@@ -46,6 +46,11 @@ export function deterministicCountingIdGenerator(start = 0): () => NodeId {
return () => `${id++}`
}

export function fileNameDeterministicCountingIdGenerator(filename: string, start = 0): () => NodeId {
let id = start
return () => `${filename}-${id++}`
}

function loc2Id(loc: SourceRange) {
return `${loc.start.line}:${loc.start.column}-${loc.end.line}:${loc.end.column}`
}
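Prefixing generated node ids with the sourced file's name keeps ids from two different sourced files from colliding once both are merged into the shared `idMap`. A self-contained sketch of that generator's contract (the function name mirrors the PR, the rest is illustrative):

```typescript
// Each sourced file gets its own counter, namespaced by filename, so
// 'helper.R-0' can never clash with 'other.R-0' in the merged id map.
function fileNameDeterministicCountingIdGenerator(filename: string, start = 0): () => string {
  let id = start
  return () => `${filename}-${id++}`
}

const next = fileNameDeterministicCountingIdGenerator('helper.R')
console.log(next()) // 'helper.R-0'
console.log(next()) // 'helper.R-1'
```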
10 changes: 7 additions & 3 deletions src/r-bridge/lang-4.x/ast/parser/xml/internal/xml-to-json.ts
@@ -8,8 +8,11 @@ import type { XmlBasedJson } from '../input-format'
* @param config - The configuration to use (i.e., what names should be used for the attributes, children, ...)
* @param xmlString - The xml input to parse
*/
export function xlm2jsonObject(config: XmlParserConfig, xmlString: string): Promise<XmlBasedJson> {
return xml2js.parseStringPromise(xmlString, {
export function xlm2jsonObject(config: XmlParserConfig, xmlString: string): XmlBasedJson {
let result: XmlBasedJson = {}
xml2js.parseString(xmlString, {
// we want this to be strictly synchronous so that the result can be returned immediately below!
async: false,
attrkey: config.attributeName,
charkey: config.contentName,
childkey: config.childrenName,
@@ -22,5 +25,6 @@ export function xlm2jsonObject(config: XmlParserConfig, xmlString: string): Prom
includeWhiteChars: true,
normalize: false,
strict: true
}) as Promise<XmlBasedJson>
}, (_, r)=> result = r as XmlBasedJson)
return result
}
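The rewrite above only works because xml2js invokes the `parseString` callback synchronously when `async` is false, which lets the result be captured into a local variable and returned immediately. A stand-in sketch of that pattern (`parseSync` is a hypothetical placeholder for the library call, not part of xml2js):

```typescript
// Stand-in for a callback-style API that invokes its callback before returning.
function parseSync(input: string, cb: (err: Error | null, result: string) => void): void {
  cb(null, input.toUpperCase()) // invoked synchronously, before parseSync returns
}

function toUpper(input: string): string {
  let result = ''
  parseSync(input, (_, r) => result = r)
  return result // already populated: the callback has run by this point
}

console.log(toUpper('ast')) // 'AST'
```

Note that this pattern silently returns the initial value if the callback were ever deferred, which is why the diff comments that the call must stay strictly synchronous.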
4 changes: 2 additions & 2 deletions src/r-bridge/lang-4.x/ast/parser/xml/parser.ts
@@ -30,12 +30,12 @@ export const parseLog = log.getSubLogger({ name: 'ast-parser' })
*
* @returns The normalized and decorated AST (i.e., as a doubly linked tree)
*/
export async function normalize(xmlString: string, tokenMap: TokenMap, hooks?: DeepPartial<XmlParserHooks>, getId: IdGenerator<NoInfo> = deterministicCountingIdGenerator(0)): Promise<NormalizedAst> {
export function normalize(xmlString: string, tokenMap: TokenMap, hooks?: DeepPartial<XmlParserHooks>, getId: IdGenerator<NoInfo> = deterministicCountingIdGenerator(0)): NormalizedAst {
const config = { ...DEFAULT_XML_PARSER_CONFIG, tokenMap }
const hooksWithDefaults = deepMergeObject(DEFAULT_PARSER_HOOKS, hooks) as XmlParserHooks

const data: ParserData = { config, hooks: hooksWithDefaults, currentRange: undefined, currentLexeme: undefined }
const object = await xlm2jsonObject(config, xmlString)
const object = xlm2jsonObject(config, xmlString)

return decorateAst(parseRootObjToAst(data, object), getId)
}
73 changes: 58 additions & 15 deletions src/r-bridge/retriever.ts
@@ -2,8 +2,9 @@ import { type RShell } from './shell'
import type { XmlParserHooks, NormalizedAst } from './lang-4.x'
import { ts2r, normalize } from './lang-4.x'
import { startAndEndsWith } from '../util/strings'
import type { DeepPartial, DeepReadonly } from 'ts-essentials'
import type {AsyncOrSync, DeepPartial, DeepReadonly} from 'ts-essentials'
import { guard } from '../util/assert'
import {RShellExecutor} from './shell-executor'

export interface RParseRequestFromFile {
request: 'file';
@@ -25,6 +26,10 @@ interface RParseRequestBase {
ensurePackageInstalled: boolean
}

export interface RParseRequestProvider {
createRequest(path: string): RParseRequest
}

/**
* A request that can be passed along to {@link retrieveXmlFromRCode}.
*/
@@ -45,6 +50,29 @@ export function requestFromInput(input: `file://${string}` | string): RParseRequ
}
}

export function requestProviderFromFile(): RParseRequestProvider {
return {
createRequest(path: string): RParseRequest {
return {
request: 'file',
content: path,
ensurePackageInstalled: false}
}
}
}

export function requestProviderFromText(text: {[path: string]: string}): RParseRequestProvider{
return {
createRequest(path: string): RParseRequest {
return {
request: 'text',
content: text[path],
ensurePackageInstalled: false
}
}
}
}

const ErrorMarker = 'err'

/**
@@ -54,22 +82,37 @@
* Throws if the file could not be parsed.
* If successful, allows to further query the last result with {@link retrieveNumberOfRTokensOfLastParse}.
*/
export async function retrieveXmlFromRCode(request: RParseRequest, shell: RShell): Promise<string> {
if(request.ensurePackageInstalled) {
await shell.ensurePackageInstalled('xmlparsedata', true)
}

export function retrieveXmlFromRCode(request: RParseRequest, shell: (RShell | RShellExecutor)): AsyncOrSync<string> {
const suffix = request.request === 'file' ? ', encoding="utf-8"' : ''

shell.sendCommands(`flowr_output <- flowr_parsed <- "${ErrorMarker}"`,
const setupCommands = [
`flowr_output <- flowr_parsed <- "${ErrorMarker}"`,
// now, try to retrieve the ast
`try(flowr_parsed<-parse(${request.request}=${JSON.stringify(request.content)},keep.source=TRUE${suffix}),silent=FALSE)`,
'try(flowr_output<-xmlparsedata::xml_parse_data(flowr_parsed,includeText=TRUE,pretty=FALSE),silent=FALSE)'
)
const xml = await shell.sendCommandWithOutput(`cat(flowr_output,${ts2r(shell.options.eol)})`)
const output = xml.join(shell.options.eol)
guard(output !== ErrorMarker, () => `unable to parse R code (see the log for more information) for request ${JSON.stringify(request)}}`)
return output
'try(flowr_output<-xmlparsedata::xml_parse_data(flowr_parsed,includeText=TRUE,pretty=FALSE),silent=FALSE)',
]
const outputCommand = `cat(flowr_output,${ts2r(shell.options.eol)})`

if(shell instanceof RShellExecutor){
if(request.ensurePackageInstalled)
shell.ensurePackageInstalled('xmlparsedata',true)

shell.addPrerequisites(setupCommands)
return guardOutput(shell.run(outputCommand))
} else {
const run = async() => {
if(request.ensurePackageInstalled)
await shell.ensurePackageInstalled('xmlparsedata', true)

shell.sendCommands(...setupCommands)
return guardOutput((await shell.sendCommandWithOutput(outputCommand)).join(shell.options.eol))
}
return run()
}

function guardOutput(output: string): string {
guard(output !== ErrorMarker, () => `unable to parse R code (see the log for more information) for request ${JSON.stringify(request)}}`)
return output
}
}

/**
@@ -78,7 +121,7 @@
*/
export async function retrieveNormalizedAstFromRCode(request: RParseRequest, shell: RShell, hooks?: DeepPartial<XmlParserHooks>): Promise<NormalizedAst> {
const xml = await retrieveXmlFromRCode(request, shell)
return await normalize(xml, await shell.tokenMap(), hooks)
return normalize(xml, await shell.tokenMap(), hooks)
}

/**
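The generified `RParseRequestProvider` is what makes the new source tests possible: a text-backed provider resolves `source("b.R")` against an in-memory map instead of the filesystem. A condensed sketch of how the text provider is used (interfaces mirror the shapes in the diff, slightly simplified):

```typescript
// Simplified shapes of the request/provider pair from retriever.ts.
interface ParseRequest { request: 'file' | 'text'; content: string }
interface ParseRequestProvider { createRequest(path: string): ParseRequest }

// Map sourced paths to R code held in memory — no files needed in tests.
function providerFromText(texts: { [path: string]: string }): ParseRequestProvider {
  return {
    createRequest: (path: string) => ({ request: 'text', content: texts[path] })
  }
}

const provider = providerFromText({ 'b.R': 'x <- 1' })
console.log(provider.createRequest('b.R').content) // 'x <- 1'
```

This also explains the test-only commits near the end of the PR: because the provider is module-level mutable state (`setSourceProvider`), tests must reset it to the file-based default afterwards.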