chore: Add script for generating HANDLED_RULES.md #588

Merged
merged 25 commits into from
Aug 16, 2021

Conversation

algomaster99
Member

@algomaster99 algomaster99 commented Aug 10, 2021

Fixes #367

Since this PR is quite huge, I will explain the flow of the script I have written.

Usage

python3 -m sorald.handled_rules -o <output_file>

This script invokes the main function of sorald/experimentation/tools/sorald/handled_rules.py which in turn executes sorald/experimentation/tools/scripts/GetKeyAndDescription.java.

Purpose of GetKeyAndDescription.java

This script creates a Spoon model for the entire processor package and returns a JSON array in which each object holds a rule key and its repair description.

[
{"rule_key": 1068, "repair_description": "A long elaborate description of repair as outlined in HANDLED_RULES.md"},
...
]

We could also have used Python alone to extract the docstrings (descriptions) and rule keys, perhaps with RegEx. However, I felt that would be error-prone since processors have either one or two annotations (ProcessorAnnotation and IncompleteProcessor). Moreover, we would have to parse the docstrings carefully to strip the leading "*". Such inconsistencies and technicalities made me want to use Spoon to extract this information from the AST.

handled_rules.py

Takes in the JSON object and renders HANDLED_RULES.md. The HANDLED_RULES.md in this PR is automatically generated. :)
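
A minimal sketch of this rendering step, assuming the JSON shape shown above. The function name `render_handled_rules`, the heading layout, and the placeholder description are illustrative assumptions, not the code in handled_rules.py:

```python
import json


def render_handled_rules(raw_json: str) -> str:
    """Render a JSON array of rule entries as a markdown document."""
    rules = json.loads(raw_json)
    lines = ["# Handled rules", ""]
    for rule in rules:
        # Field names follow the JSON example in the PR description.
        lines.append(f"### Rule {rule['rule_key']}")
        lines.append("")
        lines.append(rule["repair_description"])
        lines.append("")
    return "\n".join(lines)


# Placeholder input; a real run would read the output of the Java script.
example = json.dumps(
    [{"rule_key": 1068, "repair_description": "Placeholder repair description."}]
)
rendered = render_handled_rules(example)
print(rendered)
```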

Enhancements and ToDos

  • Write tests for this script
  • Render the headings of HANDLED_RULES.md in the for-loop itself. In other words, the Bug heading shouldn't be rendered if Sorald doesn't have any repairs for that kind of rule. This will matter mainly if we ever implement a repair for one of the Security Hotspots.
  • Update README for tools package.
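
The heading item above could look roughly like this. It is only a sketch: the `rule_type` and `title` fields and the category names are assumptions made for illustration, since the JSON example in this PR only shows `rule_key` and `repair_description`:

```python
from collections import defaultdict

# Hypothetical category order; the real script may use different names.
CATEGORIES = ["Bug", "Code Smell", "Vulnerability", "Security Hotspot"]


def render_sections(rules):
    """Render one heading per category, skipping categories without repairs."""
    by_type = defaultdict(list)
    for rule in rules:
        by_type[rule["rule_type"]].append(rule)

    lines = []
    for category in CATEGORIES:
        entries = by_type.get(category, [])
        if not entries:
            # No repairs of this kind, so no heading is emitted.
            continue
        lines.append(f"## {category}")
        for rule in entries:
            lines.append(f"- {rule['title']}")
        lines.append("")
    return "\n".join(lines)


sections = render_sections(
    [{"rule_type": "Code Smell", "title": "Hypothetical rule title"}]
)
print(sections)
```

With this shape, adding a first Security Hotspot repair would make its heading appear without touching the rendering code.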

@algomaster99
Member Author

I think we would also need to add an exemption for docstrings in spotless. Spotless is adding HTML tags that escape the markdown syntax.

@slarse
Collaborator

slarse commented Aug 11, 2021

You probably need to put the markdown stuff into <code> tags to avoid Javadoc freaking out about bad HTML.

@slarse
Collaborator

slarse commented Aug 11, 2021

Alternatively, an easier solution to avoid problems with Spotless and Javadoc might be to just create a markdown file for each processor to contain the documentation. For example, DeadStoreProcessor.java would be accompanied by DeadStoreProcessor.md. That also eliminates the need to use Spoon in the script, which simplifies the CI.
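
A rough sketch of that naming convention on the Python side. The helper names and the `*Processor.java` glob are assumptions for illustration only:

```python
from pathlib import Path


def description_path(processor_file: Path) -> Path:
    """Map e.g. DeadStoreProcessor.java to DeadStoreProcessor.md in the same directory."""
    return processor_file.with_suffix(".md")


def processors_missing_description(processor_dir: Path):
    """List processor source files that have no accompanying markdown file."""
    return sorted(
        source.name
        for source in processor_dir.glob("*Processor.java")
        if not description_path(source).is_file()
    )


print(description_path(Path("DeadStoreProcessor.java")))
```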

@algomaster99
Member Author

algomaster99 commented Aug 11, 2021

That also eliminates the need to use Spoon in the script, which simplifies the CI.

Right. We can then get the rule key corresponding to each processor using simple RegEx.

I initially wanted to avoid RegEx, which is why I wrote the Spoon script: RegEx is tough to write and error-prone.
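
For reference, a sketch of what such a RegEx could look like. It assumes the annotation is written as `@ProcessorAnnotation(key = 1068, ...)` with a numeric key; the pattern, function name, and sample source are illustrative and would need adjusting if the annotation format differs:

```python
import re

# Assumption: the rule key is the numeric `key` attribute of @ProcessorAnnotation.
KEY_PATTERN = re.compile(r"@ProcessorAnnotation\(\s*key\s*=\s*(\d+)")


def extract_rule_key(java_source: str):
    """Return the rule key as an int, or None if no annotation is found."""
    match = KEY_PATTERN.search(java_source)
    return int(match.group(1)) if match else None


# Hypothetical processor source with both annotations present.
sample = """
@IncompleteProcessor(description = "placeholder")
@ProcessorAnnotation(key = 1068, description = "placeholder")
public class ExampleProcessor {}
"""
print(extract_rule_key(sample))
```

Because the pattern matches only on `@ProcessorAnnotation`, the presence or absence of a second annotation does not affect the result.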

@algomaster99
Member Author

@slarse Hold the review. There are some things I need to refactor in the script because I have hard-coded a few things in there.

@algomaster99 algomaster99 requested a review from slarse August 12, 2021 13:25
@slarse
Collaborator

slarse commented Aug 12, 2021

Will have a look tomorrow morning!

@algomaster99
Member Author

algomaster99 commented Aug 12, 2021

@slarse I have made the required changes. I have sorted the output using Python's default string comparison, but I lowercased the titles before comparing.
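
A small illustration of why the lowercasing matters, using made-up titles. Python's default string comparison orders by code point, so all uppercase letters sort before lowercase ones:

```python
# Hypothetical rule titles, chosen only to show the ordering difference.
titles = ["apple rule", "Zebra rule", "Banana rule"]

default_order = sorted(titles)
case_insensitive = sorted(titles, key=str.lower)

print(default_order)     # ['Banana rule', 'Zebra rule', 'apple rule']
print(case_insensitive)  # ['apple rule', 'Banana rule', 'Zebra rule']
```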

I wanted some advice regarding tests. Should we create a dummy processor package in tests/resources and then test the entire thing at once? I would rather test the individual components of the script because that would be easier and more efficient. Also, if they are tested individually, we can assume the script as a whole behaves correctly, because I don't think there are any side effects in my program now.

@algomaster99
Member Author

@slarse Which python formatter has been used to maintain the tools package? I think I would need to use it given the amount of code I have written.

Collaborator

@slarse slarse left a comment

Mostly looks great! I have a few notes on simplifications and improvements, but not much at all.

In addition to my comments on the code, the following would also be good:

  1. Add a comment (i.e. a quote or something, with >) at the top of HANDLED_RULES.md saying that it is a generated file that should not be edited manually, and link to the script.
  2. Update CONTRIBUTING.md with the fact that one should add a BlaBla.md file when creating a BlaBla.java processor.

As we don't want Java contributors to have to deal with Python, we'll simply update HANDLED_RULES.md in CI like we update ACHIEVEMENTS.md, see .github/workflows/support.yml. But we'll add that in a separate PR.

@slarse
Collaborator

slarse commented Aug 13, 2021

@slarse Which python formatter has been used to maintain the tools package? I think I would need to use it given the amount of code I have written.

Missed this question. I use black for everything.

pip install black
black experimentation/tools

@algomaster99 algomaster99 requested a review from slarse August 13, 2021 14:46
Collaborator

@slarse slarse left a comment

I suggest one clarification in the docs and a final improvement to the test (as much to show you another possibility as for making the test cleaner), then it's ready to merge.

Comment on lines 108 to 109
`sorald.processor` package. An example name of such file could be `CastArithmeticOperandProcessor.md` if your
processor's name is `CastArithmeticOperandProcessor`.
Collaborator

This makes it sound like there are multiple choices of name for the description file, but there is only one correct name.

Suggested change
- `sorald.processor` package. An example name of such file could be `CastArithmeticOperandProcessor.md` if your
- processor's name is `CastArithmeticOperandProcessor`.
+ `sorald.processor` package. For example, if your processor is in `CastArithmeticOperandProcessor.java`, then
+ the description file should be called `CastArithmeticOperandProcessor.md`.

Member Author

Since we are enforcing the requirement, we should use 'must' instead of 'should'. The rest of the wording sounds good to me.

Collaborator

Sure.

Comment on lines 17 to 32
@Test
public void test_eachProcessorIsAccompaniedByDescription() {
    List<File> processors =
            Processors.getAllProcessors().stream()
                    .map(Class::getSimpleName)
                    .map(procName -> PROCESSOR_PACKAGE.resolve(procName + ".java"))
                    .map(Path::toFile)
                    .collect(Collectors.toList());
    processors.forEach(
            processor ->
                    assertTrue(
                            getDescription(processor).isFile(),
                            "Description corresponding to "
                                    + processor.getName()
                                    + " does not exist."));
}
Collaborator

As a final improvement here, I'd suggest making this a parameterized test instead of a single test with a loop. IMO, it's almost always preferable to do this when feasible as there's zero risk of, for example, there being no matching files (a parameterized test with empty parameterization throws an exception). In this case, I also think it increases the readability of the test.

A simple way to do it would be like so:

Suggested change
- @Test
- public void test_eachProcessorIsAccompaniedByDescription() {
-     List<File> processors =
-             Processors.getAllProcessors().stream()
-                     .map(Class::getSimpleName)
-                     .map(procName -> PROCESSOR_PACKAGE.resolve(procName + ".java"))
-                     .map(Path::toFile)
-                     .collect(Collectors.toList());
-     processors.forEach(
-             processor ->
-                     assertTrue(
-                             getDescription(processor).isFile(),
-                             "Description corresponding to "
-                                     + processor.getName()
-                                     + " does not exist."));
- }
+ @ParameterizedTest
+ @MethodSource("processorFileProvider")
+ public void test_eachProcessorIsAccompaniedByDescription(File processor) {
+     assertTrue(
+             getDescription(processor).isFile(),
+             "Description corresponding to " + processor.getName() + " does not exist.");
+ }
+
+ private static Stream<Arguments> processorFileProvider() {
+     return Processors.getAllProcessors().stream()
+             .map(Class::getSimpleName)
+             .map(procName -> PROCESSOR_PACKAGE.resolve(procName + ".java"))
+             .map(Path::toFile)
+             .map(Arguments::of);
+ }

Member Author

@algomaster99 algomaster99 Aug 16, 2021

for example there being no matching files (a parameterized test with empty parameterization throws an exception)

Just to clarify: do you mean that if there are no processors, the test with argument source would throw an error?

Collaborator

Yes.

Collaborator

@slarse slarse Aug 16, 2021

When asserting in a loop (or stream) inside a test, you always run the risk of a vacuously passing test if the loop/stream is empty. That's why, when asserting in a loop, you should always first assert that the collection you're iterating over is non-empty (or better, has the expected number of entries).
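
The same guard applies to tests for the Python script. A minimal sketch, where the function name and the `has_description` callback are hypothetical stand-ins:

```python
def check_all_have_descriptions(processors, has_description):
    """Assert that every processor has a description, and that there is at least one."""
    # Guard first: with an empty collection the loop below would never run,
    # and the check would pass without verifying anything.
    assert processors, "expected at least one processor"
    for processor in processors:
        assert has_description(processor), (
            f"Description corresponding to {processor} does not exist."
        )


# Passes: one processor, and the stand-in callback reports a description.
check_all_have_descriptions(["DeadStoreProcessor"], lambda name: True)
```

Calling the function with an empty list raises an AssertionError instead of silently passing.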

@algomaster99
Member Author

@slarse Any idea how we could customise the display name of the parameterized test? Currently, it shows the full path of the processor from the project root (the string representation of the processor's File object). Showing just the name of the processor would be clearer. However, the name parameter of @ParameterizedTest doesn't let us invoke functions.

@slarse
Collaborator

slarse commented Aug 16, 2021

@algomaster99 Easiest way is to do something like this: https://stackoverflow.com/a/57894201

IMO it's overkill for this test and the fully qualified name to the file is fine.

@algomaster99
Member Author

@slarse I think we can proceed with merging if there are no more suggestions.

The SO post would introduce a clearer display name but at the expense of clarity of the test function since we won't use the nicer representation inside the test.

Collaborator

@slarse slarse left a comment

LGTM, thanks @algomaster99

@slarse slarse merged commit 7dc9037 into ASSERT-KTH:master Aug 16, 2021
@algomaster99 algomaster99 deleted the generate-handled_rules branch August 16, 2021 11:52
Development

Successfully merging this pull request may close these issues.

Automatically generate HANDLED_RULES.md
2 participants