Unquoted argument parsing issue #18

Open
mahtab-nejati opened this issue Aug 14, 2023 · 6 comments

@mahtab-nejati

Hi,

I've noticed that in unquoted arguments such as the one in the following snippet (quite a common use case)

single_argument_command(var_name_${varA}_is_compound)

the unquoted text payload ("var_name_" and "_is_compound") is lost.

I've tried unhiding the _unquoted_text rule, but then each letter is parsed as a separate node. Another approach would be to define some external rules.

Do you think there is a cleaner solution to this?

@uyha
Owner

uyha commented Aug 15, 2023

I'm not sure what you mean by lost here. Could you give a snippet, its parse result (using tree-sitter parse) and the expected parse result?

@mahtab-nejati
Author

mahtab-nejati commented Aug 15, 2023

For example, when parsing the following (note that all arguments are valid syntax according to CMake documentation and used in real projects):

include_directories(
  ${outer_${inner_variable}_variable}
  "${BASE_DIR}/sub/directory"
  ${OTHER_DIR}/${SUB_DIR}/included/ir
  )

the output tree looks like this:

<?xml version="1.0" ?>
<tree type="source_file" pos="0" length="128">
	<tree type="normal_command" pos="0" length="127">
		<tree type="identifier" pos="0" length="19" label="include_directories"/>
		<tree type="(" pos="19" length="1" label="("/>
		<tree type="argument_list" pos="20" length="106">
			<tree type="argument" pos="23" length="35">
				<tree type="unquoted_argument" pos="23" length="35">
					<tree type="variable_ref" pos="23" length="35">
						<tree type="normal_var" pos="23" length="35">
							<tree type="$" pos="23" length="1" label="$"/>
							<tree type="{" pos="24" length="1" label="{"/>
							<tree type="variable" pos="25" length="32">
								<tree type="variable_ref" pos="31" length="17">
									<tree type="normal_var" pos="31" length="17">
										<tree type="$" pos="31" length="1" label="$"/>
										<tree type="{" pos="32" length="1" label="{"/>
										<tree type="variable" pos="33" length="14" label="inner_variable"/>
										<tree type="}" pos="47" length="1" label="}"/>
									</tree>
								</tree>
							</tree>
							<tree type="}" pos="57" length="1" label="}"/>
						</tree>
					</tree>
				</tree>
			</tree>
			<tree type="argument" pos="61" length="32">
				<tree type="quoted_argument" pos="61" length="32">
					<tree type="&quot;" pos="61" length="1" label="&quot;"/>
					<tree type="quoted_element" pos="62" length="30">
						<tree type="variable_ref" pos="62" length="14">
							<tree type="normal_var" pos="62" length="14">
								<tree type="$" pos="62" length="1" label="$"/>
								<tree type="{" pos="63" length="1" label="{"/>
								<tree type="variable" pos="64" length="11" label="ZEPHYR_BASE"/>
								<tree type="}" pos="75" length="1" label="}"/>
							</tree>
						</tree>
						<tree type="$" pos="84" length="1" label="$"/>
					</tree>
					<tree type="&quot;" pos="92" length="1" label="&quot;"/>
				</tree>
			</tree>
			<tree type="argument" pos="96" length="27">
				<tree type="unquoted_argument" pos="96" length="27">
					<tree type="variable_ref" pos="96" length="11">
						<tree type="normal_var" pos="96" length="11">
							<tree type="$" pos="96" length="1" label="$"/>
							<tree type="{" pos="97" length="1" label="{"/>
							<tree type="variable" pos="98" length="8" label="ARCH_DIR"/>
							<tree type="}" pos="106" length="1" label="}"/>
						</tree>
					</tree>
					<tree type="variable_ref" pos="108" length="7">
						<tree type="normal_var" pos="108" length="7">
							<tree type="$" pos="108" length="1" label="$"/>
							<tree type="{" pos="109" length="1" label="{"/>
							<tree type="variable" pos="110" length="4" label="ARCH"/>
							<tree type="}" pos="114" length="1" label="}"/>
						</tree>
					</tree>
				</tree>
			</tree>
		</tree>
		<tree type=")" pos="126" length="1" label=")"/>
	</tree>
</tree>

The problem is that text immediately adjacent to a variable_ref node, e.g., outer_ and _variable from ${outer_${inner_variable}_variable}, or /sub/directory from "${BASE_DIR}/sub/directory", is lost in the tree: no node has a payload (label) with these contents.

Another issue with unquoted argument parsing is that, according to the documentation, the character ' is allowed, but your rule excludes it. I have seen it used in CMake scripts, and the parser reports an error when it encounters one.

@uyha
Owner

uyha commented Aug 15, 2023

So you want them to be named nodes instead of anonymous nodes? I remember having difficulties trying to do that, which is why I hid those nodes; it was good enough for me. Could you share why you want that information?

@mahtab-nejati
Author

Yes, I'd like to have them as named nodes. I am trying to verify that the referenced CMake scripts exist in the code base when an include or add_subdirectory command is invoked. As you can see in the last two arguments, losing this information makes it impossible to find the scripts...

@uyha
Owner

uyha commented Aug 15, 2023

They are not really lost, they are just not explicitly named. You have to do the interpretation yourself. Looking at your output

<tree type="argument" pos="96" length="27">
  <tree type="unquoted_argument" pos="96" length="27">
    <tree type="variable_ref" pos="96" length="11">
      <tree type="normal_var" pos="96" length="11">
        <tree type="$" pos="96" length="1" label="$"/>
        <tree type="{" pos="97" length="1" label="{"/>
        <tree type="variable" pos="98" length="8" label="ARCH_DIR"/>
        <tree type="}" pos="106" length="1" label="}"/>
      </tree>
    </tree>
    <tree type="variable_ref" pos="108" length="7">
      <tree type="normal_var" pos="108" length="7">
        <tree type="$" pos="108" length="1" label="$"/>
        <tree type="{" pos="109" length="1" label="{"/>
        <tree type="variable" pos="110" length="4" label="ARCH"/>
        <tree type="}" pos="114" length="1" label="}"/>
      </tree>
    </tree>
  </tree>
</tree>

you can take the 27 characters starting at position 96, process further the parts that are covered by nested nodes, and assume everything else is plain text. Like I said, I couldn't figure out how to do this easily using just tree-sitter, so I can't help you further.
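The interpretation described above could look roughly like the following Python sketch. It is not part of the grammar or of tree-sitter: split_argument is a hypothetical helper, and it assumes the pos/length pairs from the XML output are byte offsets into the source and that the spans of the argument's direct variable_ref children have already been collected. Any gap between those children is treated as plain text.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (pos, length), as reported in the XML output


def split_argument(source: bytes, arg: Span, refs: List[Span]) -> List[Tuple[str, str]]:
    """Split one argument into ("variable_ref", s) and ("text", s) pieces.

    `refs` holds the spans of the argument's direct variable_ref children.
    Everything inside the argument that no child covers is plain text.
    """
    start, length = arg
    end = start + length
    pieces = []
    cursor = start
    for ref_pos, ref_len in sorted(refs):
        if ref_pos > cursor:
            # Gap before this reference: text that has no named node.
            pieces.append(("text", source[cursor:ref_pos].decode("utf-8")))
        pieces.append(("variable_ref", source[ref_pos:ref_pos + ref_len].decode("utf-8")))
        cursor = ref_pos + ref_len
    if cursor < end:
        # Trailing text after the last reference.
        pieces.append(("text", source[cursor:end].decode("utf-8")))
    return pieces


# The third argument from the include_directories example, taken on its own.
source = b"${OTHER_DIR}/${SUB_DIR}/included/ir"
pieces = split_argument(source, (0, len(source)), [(0, 12), (13, 10)])
print(pieces)
# [('variable_ref', '${OTHER_DIR}'), ('text', '/'),
#  ('variable_ref', '${SUB_DIR}'), ('text', '/included/ir')]
```

For a nested reference such as ${outer_${inner_variable}_variable}, the same helper could be applied once more to the span of the variable node and its nested variable_ref children to recover outer_ and _variable.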

@uyha
Owner

uyha commented Aug 15, 2023

And thank you for reporting the erroneous exclusion of the single quote in unquoted arguments.
