Feature/normalization option #1479

Merged
merged 18 commits into from
Feb 16, 2024
Changes from 7 commits
36 changes: 18 additions & 18 deletions README.md
@@ -145,24 +145,24 @@ Clustering

--cluster-skip Skips the clustering (default: false)
Commands:
- cpp
- cpp2
- csharp
- emf
- emf-model
- go
- java
- kotlin
- llvmir
- python3
- rlang
- rust
- scala
- scheme
- scxml
- swift
- text
- typescript
+ cpp supports normalization: false
+ cpp2 supports normalization: true
+ csharp supports normalization: false
+ emf supports normalization: false
+ emf-model supports normalization: false
+ go supports normalization: false
+ java supports normalization: true
+ kotlin supports normalization: false
+ llvmir supports normalization: false
+ python3 supports normalization: false
+ rlang supports normalization: false
+ rust supports normalization: false
+ scala supports normalization: false
+ scheme supports normalization: false
+ scxml supports normalization: false
+ swift supports normalization: false
+ text supports normalization: false
+ typescript supports normalization: false
```

### Java API
4 changes: 3 additions & 1 deletion cli/src/main/java/de/jplag/cli/CLI.java
@@ -115,6 +115,8 @@ public CLI() {
private List<CommandSpec> buildSubcommands() {
return LanguageLoader.getAllAvailableLanguages().values().stream().map(language -> {
CommandSpec command = CommandSpec.create().name(language.getIdentifier());
command.usageMessage(new CommandLine.Model.UsageMessageSpec()
.description(String.format("supports normalization: %s", language.supportsNormalization())));

for (LanguageOption<?> option : language.getOptions().getOptionsAsList()) {
command.addOption(OptionSpec.builder(option.getNameAsUnixParameter()).type(option.getType().getJavaType())
@@ -173,7 +175,7 @@ public JPlagOptions buildOptionsFromArguments(ParseResult parseResult) throws Cl
JPlagOptions jPlagOptions = new JPlagOptions(loadLanguage(parseResult), this.options.minTokenMatch, submissionDirectories,
oldSubmissionDirectories, null, this.options.advanced.subdirectory, suffixes, this.options.advanced.exclusionFileName,
JPlagOptions.DEFAULT_SIMILARITY_METRIC, this.options.advanced.similarityThreshold, this.options.shownComparisons, clusteringOptions,
- this.options.advanced.debug, mergingOptions);
+ this.options.advanced.debug, mergingOptions, this.options.advanced.normalize);

String baseCodePath = this.options.baseCode;
File baseCodeDirectory = baseCodePath == null ? null : new File(baseCodePath);
3 changes: 3 additions & 0 deletions cli/src/main/java/de/jplag/cli/CliOptions.java
@@ -91,6 +91,9 @@ public static class Advanced {

@Option(names = "--csv-export", description = "If present, a csv export will be generated in addition to the zip file.")
public boolean csvExport = false;

@Option(names = {"--normalize"}, description = "Activate the normalization of tokens. Only allowed if the language supports it.")
public boolean normalize = false;
}

public static class Clustering {
11 changes: 11 additions & 0 deletions core/src/main/java/de/jplag/JPlag.java
@@ -8,6 +8,7 @@
import org.slf4j.LoggerFactory;

import de.jplag.clustering.ClusteringFactory;
import de.jplag.exceptions.ConfigurationException;
import de.jplag.exceptions.ExitException;
import de.jplag.exceptions.SubmissionException;
import de.jplag.merging.MatchMerging;
@@ -61,11 +62,15 @@ public JPlagResult run() throws ExitException {
* @throws ExitException if JPlag exits preemptively.
*/
public static JPlagResult run(JPlagOptions options) throws ExitException {
checkForConfigurationConsistency(options);
GreedyStringTiling coreAlgorithm = new GreedyStringTiling(options);
ComparisonStrategy comparisonStrategy = new ParallelComparisonStrategy(options, coreAlgorithm);
// Parse and validate submissions.
SubmissionSetBuilder builder = new SubmissionSetBuilder(options);
SubmissionSet submissionSet = builder.buildSubmissionSet();
if (options.normalize() && options.language().supportsNormalization()) {
submissionSet.normalizeSubmissions();
// Review comment (Member): Good for now. When we enable it for EMF, we need a language method that can be called for the normalization.
}
int submissionCount = submissionSet.numberOfSubmissions();
if (submissionCount < 2)
throw new SubmissionException("Not enough valid submissions! (found " + submissionCount + " valid submissions)");
@@ -96,4 +101,10 @@ private static void logSkippedSubmissions(SubmissionSet submissionSet, JPlagOpti
}
}
}

private static void checkForConfigurationConsistency(JPlagOptions options) throws ConfigurationException {
if (options.normalize() && !options.language().supportsNormalization()) {
throw new ConfigurationException(String.format("The language %s cannot be used with normalization.", options.language().getName()));
}
}
}
10 changes: 10 additions & 0 deletions core/src/main/java/de/jplag/exceptions/ConfigurationException.java
@@ -0,0 +1,10 @@
package de.jplag.exceptions;

/**
* Exceptions used if configuration is wrong.
*/
public class ConfigurationException extends ExitException {
public ConfigurationException(String message) {
super(message);
}
}
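
The guard added to `JPlag.run` can be tried in isolation with a minimal sketch. It assumes simplified stand-ins for the real types: `Language`, `Options`, and the `ConfigurationException` below are hypothetical miniatures written for this example, not JPlag's classes (the exception extends `Exception` here rather than `ExitException`). Only the condition and the message format are taken from the diff.

```java
// Hypothetical, simplified stand-ins for JPlag's types; not the real classes.
class NormalizationCheckSketch {
    /** Stand-in for de.jplag.Language, reduced to the two methods used here. */
    interface Language {
        String getName();

        default boolean supportsNormalization() {
            return false;
        }
    }

    /** Stand-in for the new exception type (assumption: plain Exception as the parent). */
    static class ConfigurationException extends Exception {
        ConfigurationException(String message) {
            super(message);
        }
    }

    /** Stand-in for JPlagOptions, reduced to the two components the check reads. */
    record Options(Language language, boolean normalize) {
    }

    // Same condition as in the diff: reject normalization for languages that do not support it.
    static void checkForConfigurationConsistency(Options options) throws ConfigurationException {
        if (options.normalize() && !options.language().supportsNormalization()) {
            throw new ConfigurationException(
                    String.format("The language %s cannot be used with normalization.", options.language().getName()));
        }
    }

    public static void main(String[] args) {
        Language text = () -> "text"; // keeps the default: no normalization support
        try {
            checkForConfigurationConsistency(new Options(text, false)); // passes, normalization not requested
            checkForConfigurationConsistency(new Options(text, true)); // throws
        } catch (ConfigurationException e) {
            System.out.println(e.getMessage()); // prints "The language text cannot be used with normalization."
        }
    }
}
```

Running the check before any submissions are parsed means a misconfigured run fails fast instead of silently skipping normalization.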
43 changes: 25 additions & 18 deletions core/src/main/java/de/jplag/options/JPlagOptions.java
@@ -48,7 +48,7 @@
public record JPlagOptions(Language language, Integer minimumTokenMatch, Set<File> submissionDirectories, Set<File> oldSubmissionDirectories,
File baseCodeSubmissionDirectory, String subdirectoryName, List<String> fileSuffixes, String exclusionFileName,
SimilarityMetric similarityMetric, double similarityThreshold, int maximumNumberOfComparisons, ClusteringOptions clusteringOptions,
- boolean debugParser, MergingOptions mergingOptions) {
+ boolean debugParser, MergingOptions mergingOptions, boolean normalize) {

public static final double DEFAULT_SIMILARITY_THRESHOLD = 0;
public static final int DEFAULT_SHOWN_COMPARISONS = 500;
@@ -61,13 +61,13 @@ public record JPlagOptions(Language language, Integer minimumTokenMatch, Set<Fil

public JPlagOptions(Language language, Set<File> submissionDirectories, Set<File> oldSubmissionDirectories) {
this(language, null, submissionDirectories, oldSubmissionDirectories, null, null, null, null, DEFAULT_SIMILARITY_METRIC,
- DEFAULT_SIMILARITY_THRESHOLD, DEFAULT_SHOWN_COMPARISONS, new ClusteringOptions(), false, new MergingOptions());
+ DEFAULT_SIMILARITY_THRESHOLD, DEFAULT_SHOWN_COMPARISONS, new ClusteringOptions(), false, new MergingOptions(), false);
}

public JPlagOptions(Language language, Integer minimumTokenMatch, Set<File> submissionDirectories, Set<File> oldSubmissionDirectories,
File baseCodeSubmissionDirectory, String subdirectoryName, List<String> fileSuffixes, String exclusionFileName,
SimilarityMetric similarityMetric, double similarityThreshold, int maximumNumberOfComparisons, ClusteringOptions clusteringOptions,
- boolean debugParser, MergingOptions mergingOptions) {
+ boolean debugParser, MergingOptions mergingOptions, boolean normalize) {
this.language = language;
this.debugParser = debugParser;
this.fileSuffixes = fileSuffixes == null || fileSuffixes.isEmpty() ? null : Collections.unmodifiableList(fileSuffixes);
@@ -82,90 +82,97 @@ public JPlagOptions(Language language, Integer minimumTokenMatch, Set<File> subm
this.subdirectoryName = subdirectoryName;
this.clusteringOptions = clusteringOptions;
this.mergingOptions = mergingOptions;
this.normalize = normalize;
}

public JPlagOptions withLanguageOption(Language language) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withDebugParser(boolean debugParser) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withFileSuffixes(List<String> fileSuffixes) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withSimilarityThreshold(double similarityThreshold) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withMaximumNumberOfComparisons(int maximumNumberOfComparisons) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withSimilarityMetric(SimilarityMetric similarityMetric) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withMinimumTokenMatch(Integer minimumTokenMatch) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withExclusionFileName(String exclusionFileName) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withSubmissionDirectories(Set<File> submissionDirectories) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withOldSubmissionDirectories(Set<File> oldSubmissionDirectories) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withBaseCodeSubmissionDirectory(File baseCodeSubmissionDirectory) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withSubdirectoryName(String subdirectoryName) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withClusteringOptions(ClusteringOptions clusteringOptions) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withMergingOptions(MergingOptions mergingOptions) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
- clusteringOptions, debugParser, mergingOptions);
+ clusteringOptions, debugParser, mergingOptions, normalize);
}

public JPlagOptions withNormalize(boolean normalize) {
return new JPlagOptions(language, minimumTokenMatch, submissionDirectories, oldSubmissionDirectories, baseCodeSubmissionDirectory,
subdirectoryName, fileSuffixes, exclusionFileName, similarityMetric, similarityThreshold, maximumNumberOfComparisons,
clusteringOptions, debugParser, mergingOptions, normalize);
}

public boolean hasBaseCode() {
@@ -257,7 +264,7 @@ public JPlagOptions(Language language, Integer minimumTokenMatch, File submissio
boolean debugParser, MergingOptions mergingOptions) throws BasecodeException {
this(language, minimumTokenMatch, Set.of(submissionDirectory), oldSubmissionDirectories,
convertLegacyBaseCodeToFile(baseCodeSubmissionName, submissionDirectory), subdirectoryName, fileSuffixes, exclusionFileName,
- similarityMetric, similarityThreshold, maximumNumberOfComparisons, clusteringOptions, debugParser, mergingOptions);
+ similarityMetric, similarityThreshold, maximumNumberOfComparisons, clusteringOptions, debugParser, mergingOptions, false);
}

/**
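
The `with...` methods changed above follow the usual copy-on-change pattern for immutable records: each returns a new instance with one component replaced, which is why adding the `normalize` component touches every one of them. A minimal sketch of that pattern, assuming a hypothetical three-component `Options` record (the real `JPlagOptions` has many more components, and the default of 9 below is arbitrary):

```java
class WitherSketch {
    // Hypothetical miniature of an options record, for illustration only.
    record Options(String languageName, int minimumTokenMatch, boolean normalize) {

        // Convenience constructor that fills in defaults (values chosen arbitrarily for this sketch).
        Options(String languageName) {
            this(languageName, 9, false);
        }

        // Each "wither" copies all components and replaces exactly one.
        Options withMinimumTokenMatch(int minimumTokenMatch) {
            return new Options(languageName, minimumTokenMatch, normalize);
        }

        Options withNormalize(boolean normalize) {
            return new Options(languageName, minimumTokenMatch, normalize);
        }
    }

    public static void main(String[] args) {
        Options base = new Options("java");
        Options normalized = base.withNormalize(true).withMinimumTokenMatch(12);
        System.out.println(base.normalize()); // the original instance is unchanged
        System.out.println(normalized.normalize() + " " + normalized.minimumTokenMatch());
    }
}
```

With the API from the diff, enabling the feature programmatically would presumably be a matter of calling `withNormalize(true)` on an existing `JPlagOptions` instance.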
34 changes: 18 additions & 16 deletions docs/1.-How-to-Use-JPlag.md
@@ -84,22 +84,24 @@ Clustering

--cluster-skip Skips the clustering (default: false)
Commands:
- cpp
- cpp2
- csharp
- emf
- emf-model
- go
- java
- kotlin
- python3
- rlang
- rust
- scala
- scheme
- scxml
- swift
- text
+ cpp supports normalization: false
+ cpp2 supports normalization: true
+ csharp supports normalization: false
+ emf supports normalization: false
+ emf-model supports normalization: false
+ go supports normalization: false
+ java supports normalization: true
+ kotlin supports normalization: false
+ llvmir supports normalization: false
+ python3 supports normalization: false
+ rlang supports normalization: false
+ rust supports normalization: false
+ scala supports normalization: false
+ scheme supports normalization: false
+ scxml supports normalization: false
+ swift supports normalization: false
+ text supports normalization: false
+ typescript supports normalization: false
```

*Note that the [legacy CLI](https://github.com/jplag/jplag/blob/legacy/README.md) is varying slightly.*
7 changes: 7 additions & 0 deletions language-api/src/main/java/de/jplag/Language.java
@@ -93,4 +93,11 @@ default boolean expectsSubmissionOrder() {
default List<File> customizeSubmissionOrder(List<File> submissions) {
return submissions;
}

/**
* @return True, if tokens for this language can be normalized
*/
default boolean supportsNormalization() {
return false;
}
}
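
Because `supportsNormalization()` is a default method returning `false`, existing language modules compile unchanged and only the modules that opt in need an override. The sketch below illustrates this with a hypothetical two-method `Language` interface and two toy implementations (not the real JPlag modules). The `describe` helper reuses the format string from `CLI.java`; the way the identifier is printed in front of it is this example's own choice, not picocli's help layout.

```java
import java.util.List;

class LanguageDefaultSketch {
    /** Hypothetical miniature of the Language interface. */
    interface Language {
        String getIdentifier();

        // Opt-in capability: languages that do nothing inherit "false".
        default boolean supportsNormalization() {
            return false;
        }
    }

    /** Toy language that relies on the default. */
    static class TextLanguage implements Language {
        public String getIdentifier() {
            return "text";
        }
    }

    /** Toy language that opts in, like the java and cpp2 modules in the diff. */
    static class JavaLanguage implements Language {
        public String getIdentifier() {
            return "java";
        }

        @Override
        public boolean supportsNormalization() {
            return true;
        }
    }

    // Same format string as the subcommand description built in CLI.java.
    static String describe(Language language) {
        return String.format("supports normalization: %s", language.supportsNormalization());
    }

    public static void main(String[] args) {
        for (Language language : List.of(new TextLanguage(), new JavaLanguage())) {
            System.out.println(language.getIdentifier() + " " + describe(language));
        }
        // prints:
        // text supports normalization: false
        // java supports normalization: true
    }
}
```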
5 changes: 5 additions & 0 deletions languages/cpp2/src/main/java/de/jplag/cpp2/CPPLanguage.java
@@ -40,4 +40,9 @@ public int minimumTokenMatch() {
public boolean tokensHaveSemantics() {
return true;
}

@Override
public boolean supportsNormalization() {
return true;
}
}
5 changes: 5 additions & 0 deletions languages/java/src/main/java/de/jplag/java/JavaLanguage.java
@@ -53,6 +53,11 @@ public boolean tokensHaveSemantics() {
return true;
}

@Override
public boolean supportsNormalization() {
return true;
}

@Override
public String toString() {
return this.getIdentifier();