chore: Add script for generating HANDLED_RULES.md #588

algomaster99 · 2021-08-10T21:57:39Z

Fixes #367

Since this PR is quite huge, I will explain the flow of the script I have written.

Usage

python3 -m sorald.handled_rules -o <output_file>

This script invokes the main function of sorald/experimentation/tools/sorald/handled_rules.py which in turn executes sorald/experimentation/tools/scripts/GetKeyAndDescription.java.

Purpose of `GetKeyAndDescription.java`

This script builds a Spoon model of the entire processor package and returns a JSON array of objects, each holding a rule key and its repair description.

[
{"rule_key": 1068, "repair_description": "A long elaborate description of repair as outlined in HANDLED_RULES.md"},
...
]

We could also have used Python alone to extract the docstrings (descriptions) and rule keys, perhaps with RegEx. However, I felt that would be error-prone, since processors have either one or two annotations (ProcessorAnnotation and IncompleteProcessor). Moreover, we would have to parse the docstrings carefully to strip the leading "*" characters. Such inconsistencies and technicalities made me want to use Spoon to extract this information from the AST.

`handled_rules.py`

Takes in the JSON object and renders HANDLED_RULES.md. The HANDLED_RULES.md in this PR is automatically generated. :)
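As a rough illustration of the rendering step, the sketch below turns a list shaped like the JSON above into markdown. Only the `rule_key` and `repair_description` fields come from the example output; the function name `render_rules`, the heading layout, and the placeholder description are assumptions made for this sketch, not the script's actual code.

```python
import json
from typing import Dict, List


def render_rules(entries: List[Dict]) -> str:
    """Render one markdown section per rule (hypothetical layout)."""
    sections = []
    for entry in entries:
        # The heading format is an assumption made for this sketch.
        heading = f"#### Rule {entry['rule_key']}"
        sections.append(f"{heading}\n\n{entry['repair_description']}")
    return "\n\n".join(sections) + "\n"


# Placeholder input in the same shape as the JSON shown above.
raw = '[{"rule_key": 1068, "repair_description": "Placeholder description."}]'
markdown = render_rules(json.loads(raw))
```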

Enhancements and ToDos

Write tests for this script
Render the headings of HANDLED_RULES.md in the for-loop itself. In other words, the Bug heading shouldn't be rendered if Sorald doesn't have any repairs for bugs. This mainly matters if we ever implement a repair for one of the Security Hotspots in the future.
Update README for tools package.
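The heading-in-a-loop item above could be sketched roughly as follows. The category list, the `rule_type` and `title` field names, and the bullet layout are all hypothetical; the point is only that a heading is emitted when at least one repair of that type exists.

```python
from typing import Dict, List

# Hypothetical category names and order; the real script may differ.
RULE_TYPES = ["Bug", "Code Smell", "Vulnerability", "Security Hotspot"]


def render_by_type(rules: List[Dict[str, str]]) -> str:
    """Emit one heading per rule type, skipping types with no repairs."""
    lines: List[str] = []
    for rule_type in RULE_TYPES:
        titles = [r["title"] for r in rules if r["rule_type"] == rule_type]
        if not titles:
            continue  # no repairs of this type, so no heading is rendered
        lines.append(f"### {rule_type}")
        lines.extend(f"* {title}" for title in titles)
        lines.append("")
    return "\n".join(lines)
```

With this shape, adding a Security Hotspot repair later would make its heading appear without touching the rendering code.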

experimentation/tools/scripts/GetKeyAndDescription.java

experimentation/tools/sorald/handled_rules.py

algomaster99 · 2021-08-11T06:57:15Z

I think we would also need to add an exemption for docstrings in spotless. Spotless is adding HTML tags that escape the markdown syntax.

Signed-off-by: Aman Sharma <mannu.poski10@gmail.com>

slarse · 2021-08-11T12:08:01Z

You probably need to put the markdown stuff into <code> tags to avoid Javadoc freaking out about bad HTML.

slarse · 2021-08-11T12:10:18Z

Alternatively, an easier solution to avoid problems with Spotless and Javadoc might be to just create a markdown file for each processor to contain the documentation. For example, DeadStoreProcessor.java would be accompanied by DeadStoreProcessor.md. That also eliminates the need to use Spoon in the script, which simplifies the CI.
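A minimal sketch of that pairing convention, assuming every processor lives in a file ending in `Processor.java` and its description sits next to it with the same stem. The function name `missing_descriptions` and the `ExampleProcessor` file are hypothetical and only serve the demonstration.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_descriptions(processor_dir: Path) -> List[str]:
    """Return names of *Processor.java files that lack a sibling .md file."""
    missing = []
    for java_file in sorted(processor_dir.glob("*Processor.java")):
        if not java_file.with_suffix(".md").is_file():
            missing.append(java_file.name)
    return missing


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "DeadStoreProcessor.java").touch()
    (root / "DeadStoreProcessor.md").touch()
    (root / "ExampleProcessor.java").touch()  # hypothetical, has no description
    result = missing_descriptions(root)
```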

algomaster99 · 2021-08-11T12:17:06Z

> That also eliminates the need to use Spoon in the script, which simplifies the CI.

Right. We can then get the rule key corresponding to each processor using simple RegEx.

I initially wanted to avoid RegEx, which is why I wrote the Spoon script: RegEx is tough to write and prone to error.
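For illustration, a regex-based lookup of the rule key might look like the sketch below. It assumes the annotation is written as `@ProcessorAnnotation(key = <digits>, ...)`; that shape, the attributes shown in the sample source, and the names `KEY_PATTERN` and `extract_rule_key` are assumptions, so the pattern would need adjusting to the real annotation.

```python
import re
from typing import Optional

# Assumed annotation shape: @ProcessorAnnotation(key = 1068, ...)
KEY_PATTERN = re.compile(r"@ProcessorAnnotation\(\s*key\s*=\s*(\d+)")


def extract_rule_key(java_source: str) -> Optional[int]:
    """Return the numeric rule key, or None if no annotation is found."""
    match = KEY_PATTERN.search(java_source)
    return int(match.group(1)) if match else None


# Placeholder source with two annotations; only ProcessorAnnotation is matched.
source = """
@IncompleteProcessor(description = "placeholder")
@ProcessorAnnotation(key = 1068, description = "placeholder")
public class ExampleProcessor {}
"""
key = extract_rule_key(source)
```

Anchoring the pattern on the annotation name is what keeps a second annotation on the same class from interfering.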

This reverts commit 91de73c.

algomaster99 · 2021-08-11T21:24:29Z

@slarse Hold the review. There are some things I need to refactor in the script because I have hard-coded a few things in there.

Signed-off-by: Aman Sharma <mannu.poski10@gmail.com>

slarse · 2021-08-12T13:30:23Z

Will have a look tomorrow morning!

algomaster99 · 2021-08-12T13:31:06Z

@slarse I have made the required changes. I have sorted the output using Python's default string comparison, after lowercasing the titles.
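A small sketch of why the lowercasing matters, assuming each rule is a dict with a `title` field (a hypothetical shape): Python's default comparison orders all uppercase letters before lowercase ones, so a lowercased sort key gives a case-insensitive order.

```python
from typing import Dict, List


def sort_by_title(rules: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort rules by title, ignoring case."""
    return sorted(rules, key=lambda rule: rule["title"].lower())


unsorted_rules = [{"title": "b"}, {"title": "A"}, {"title": "C"}]
titles = [rule["title"] for rule in sort_by_title(unsorted_rules)]
# For comparison: the default ordering without lowercasing.
default_order = sorted(rule["title"] for rule in unsorted_rules)
```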

I wanted some advice regarding tests. Should we create a dummy processor package in tests/resources and then test the entire thing at once? I would prefer to test the individual components of the script because that would be easier and more efficient. Also, if they are individually tested, we can assume the script as a whole behaves correctly, because I don't think there are any side effects in my program now.

Signed-off-by: Aman Sharma <mannu.poski10@gmail.com>

algomaster99 · 2021-08-12T16:42:30Z

@slarse Which python formatter has been used to maintain the tools package? I think I would need to use it given the amount of code I have written.

slarse

Mostly looks great! I have a few notes on simplifications and improvements, but not much at all.

In addition to my comments on the code, the following would also be good:

Add a comment (i.e. a quote or something, with >) at the top of HANDLED_RULES.md saying that it is a generated file that should not be edited manually, and link to the script.
Update CONTRIBUTING.md with the fact that one should add a BlaBla.md file when creating a BlaBla.java processor.

As we don't want Java contributors to have to deal with Python, we'll simply update HANDLED_RULES.md in CI like we update ACHIEVEMENTS.md, see .github/workflows/support.yml. But we'll add that in a separate PR.

experimentation/tools/sorald/handled_rules.py

src/main/java/sorald/processor/EqualsArgumentTypeProcessor.md

src/test/java/sorald/processor/ProcessorDescriptionTest.java

slarse · 2021-08-13T08:58:15Z

> @slarse Which python formatter has been used to maintain the tools package? I think I would need to use it given the amount of code I have written.

Missed this question. I use black for everything.

pip install black
black experimentation/tools

Co-authored-by: Simon Larsén <slarse@kth.se>

slarse

I suggest one clarification in the docs and a final improvement to the test (as much to show you another possibility as for making the test cleaner), then it's ready to merge.

slarse · 2021-08-13T14:52:17Z

docs/CONTRIBUTING.md

+`sorald.processor` package. An example name of such file could be `CastArithmeticOperandProcessor.md` if your
+processor's name is `CastArithmeticOperandProcessor`.


This makes it sound like there are multiple choices of name for the description file, but there is only one correct name.

Suggested change

-`sorald.processor` package. An example name of such file could be `CastArithmeticOperandProcessor.md` if your
-processor's name is `CastArithmeticOperandProcessor`.
+`sorald.processor` package. For example, if your processor is in `CastArithmeticOperandProcessor.java`, then
+the description file should be called `CastArithmeticOperandProcessor.md`.

Since we are enforcing the requirement, we should use 'must' instead of 'should'. The rest of the wording sounds good to me.

slarse · 2021-08-16T06:30:33Z

src/test/java/sorald/processor/ProcessorDescriptionTest.java

+    @Test
+    public void test_eachProcessorIsAccompaniedByDescription() {
+        List<File> processors =
+                Processors.getAllProcessors().stream()
+                        .map(Class::getSimpleName)
+                        .map(procName -> PROCESSOR_PACKAGE.resolve(procName + ".java"))
+                        .map(Path::toFile)
+                        .collect(Collectors.toList());
+        processors.forEach(
+                processor ->
+                        assertTrue(
+                                getDescription(processor).isFile(),
+                                "Description corresponding to "
+                                        + processor.getName()
+                                        + " does not exist."));
+    }


As a final improvement here, I'd suggest making this a parameterized test instead of a single test with a loop. IMO, it's almost always preferable to do this when feasible as there's zero risk of for example there being no matching files (a parameterized test with empty parameterization throws an exception). In this case, I also think it increases the readability of the test.

A simple way to do it would be like so:

Suggested change

-    @Test
-    public void test_eachProcessorIsAccompaniedByDescription() {
-        List<File> processors =
-                Processors.getAllProcessors().stream()
-                        .map(Class::getSimpleName)
-                        .map(procName -> PROCESSOR_PACKAGE.resolve(procName + ".java"))
-                        .map(Path::toFile)
-                        .collect(Collectors.toList());
-        processors.forEach(
-                processor ->
-                        assertTrue(
-                                getDescription(processor).isFile(),
-                                "Description corresponding to "
-                                        + processor.getName()
-                                        + " does not exist."));
-    }
+    @ParameterizedTest
+    @MethodSource("processorFileProvider")
+    public void test_eachProcessorIsAccompaniedByDescription(File processor) {
+        assertTrue(
+                getDescription(processor).isFile(),
+                "Description corresponding to " + processor.getName() + " does not exist.");
+    }
+
+    private static Stream<Arguments> processorFileProvider() {
+        return Processors.getAllProcessors().stream()
+                .map(Class::getSimpleName)
+                .map(procName -> PROCESSOR_PACKAGE.resolve(procName + ".java"))
+                .map(Path::toFile)
+                .map(Arguments::of);
+    }

> for example there being no matching files (a parameterized test with empty parameterization throws an exception)

Just to clarify: do you mean that if there are no processors, the test with argument source would throw an error?

When asserting in a loop (or stream) inside a test, you always run the risk of a vacuously passing test if the loop/stream is empty. That's why, when asserting in a loop, you should always first assert that the collection you're iterating over is non-empty (or better, has the expected number of entries).
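The same pitfall exists outside JUnit. As a language-neutral illustration, here is a small Python sketch of the guard described above; the helper name `check_all` is hypothetical and not part of this PR.

```python
from typing import Callable, Iterable, List


def check_all(items: Iterable[str], predicate: Callable[[str], bool]) -> None:
    """Assert the predicate for every item, refusing an empty collection."""
    materialized: List[str] = list(items)
    # Without this guard, an empty input would pass without checking anything.
    assert materialized, "expected at least one item to check"
    for item in materialized:
        assert predicate(item), f"check failed for {item}"
```

Asserting the expected number of entries instead of mere non-emptiness would be stricter still, at the cost of updating the count whenever a processor is added.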

algomaster99 · 2021-08-16T09:01:39Z

@slarse Any idea how we could customise the display name of the parameterized test? Currently, it shows the full path of the processor from the project root (the string representation of the processor's File object). Showing just the processor's name would be clearer. However, the name parameter of @ParameterizedTest doesn't let us invoke functions.

slarse · 2021-08-16T09:04:25Z

@algomaster99 Easiest way is to do something like this: https://stackoverflow.com/a/57894201

IMO it's overkill for this test and the fully qualified name to the file is fine.

algomaster99 · 2021-08-16T10:06:26Z

@slarse I think we can proceed with merging if there are no more suggestions.

The SO post would introduce a clearer display name but at the expense of clarity of the test function since we won't use the nicer representation inside the test.

slarse

LGTM, thanks @algomaster99

algomaster99 added 4 commits August 10, 2021 18:14

Refactor tools to get sonar rule metadata

d7e0690

Add scripts to extract and render HANDLED_RULES.md

fb1acb7

Write description of each violation type in the processor itself

3f24c88

Fix linking of Bug heading

1fcbca2