A metric analysis framework for Scala used to research multi-paradigm metrics. This framework has been developed as part of a Master's thesis.
- Title: Source Code Metrics for Combined Functional and Object-Oriented Programming in Scala
- Author: Sven Konings
- Year: 2020
- URL: essay.utwente.nl/85223
Source code metrics are used to measure and evaluate the code quality of software projects. Metrics are available for both Object-Oriented Programming (OOP) and Functional Programming (FP). However, there is little research on source code metrics for the combination of OOP and FP. Furthermore, existing OOP and FP metrics are not always applicable. For example, the usage of mutable class variables (OOP) in lambda functions (FP) is a combination that does not occur in either paradigm on its own. Existing OOP and FP metrics are therefore unsuitable to give an indication of quality regarding these combined constructs.
Scala is a programming language which features an extensive combination of OOP and FP constructs. The goal of this thesis is to research metrics for Scala which can detect potential faults when combining OOP and FP. We have implemented a framework for defining and analysing Scala metrics. Using this framework, we have measured whether code was written using mostly OOP- or FP-style constructs and analysed whether this affected the occurrence of potential faults. Next, we implemented a baseline model of existing OOP and FP metrics. Candidate metrics were added to this baseline model to verify whether they improve the fault detection performance.
In the analysed projects, there were relatively more faults when mixing OOP- and FP-style code. Furthermore, most OOP metrics perform well on FP-style Scala code. The baseline model was often able to detect faulty code. Therefore, the candidate metrics did not significantly improve the fault detection performance of the baseline model. However, the candidate metrics did help to indicate why code contained faults. Constructs were found for which over half of the objects using them contained faults.
The GitClient module is responsible for managing the Git project and its issues. The module can retrieve all commits that refer to issues labelled as fault, it can calculate the changes between two versions and it can retrieve all files of a certain version of the code.
The CodeAnalysis module is responsible for analysing the code using metrics. Given a set of files, it can parse the code, run the metrics and return the results in a tree-like format based on the structure of the code. It contains all the metrics and the utilities needed to define them.
The Validator module is responsible for running the validation methodology workflow (see paper). It uses the GitClient module to retrieve files for analysis, get the bugfix commits and get the changes made by those commits. Files for analysis are passed to the CodeAnalysis module, which returns the results to the Validator module. The results are then processed and stored in CSV files.
The ResultAnalysis module is responsible for running logistic regression on the Validator results. It also includes functionality to calculate statistics of the Validator results. The statistics and logistic regression results are stored in CSV files.
The data used and produced during the research is included in this repository.
- data/projects contains the Scala projects that have been analysed.
- data/gitCache contains a cached set of ids of the issues labelled as bug and the pull requests that refer to those issues.
- data/metricResults contains the measurement result CSVs produced by the Validator.
- data/analysisResults contains the analysis result CSVs produced by the ResultAnalysis.
The metric analysis process consists of the following steps:
- Defining metrics
- Selecting projects
- Running the validation methodology
- Running the result analysis
Metrics can be defined in the CodeAnalysis module. A basic metric definition looks as follows:
import codeAnalysis.analyser.Compiler
import codeAnalysis.analyser.metric._
object ExampleMetric extends MetricProducer {
override def apply(compiler: Compiler): Metric = new ExampleMetric(compiler)
}
class ExampleMetric(override val compiler: Compiler)
extends FileMetric
with ObjectMetric
with MethodMetric {
import global.{TreeExtensions, SymbolExtensions}
// File metrics
override def run(tree: global.PackageDef): List[MetricResult] = List(
MetricResult("MetricName", metricValue)
)
// Object metrics
override def run(tree: global.ImplDef): List[MetricResult] = List(
MetricResult("MetricName", metricValue)
)
// Method metrics
override def run(tree: global.DefDef): List[MetricResult] = List(
MetricResult("MetricName", metricValue)
)
}
A metric consists of two components: a MetricProducer object and the metric class itself. The MetricProducer is an object that will be used to instantiate the metric during a compiler run.
The metric class can implement one or more of the following traits based on the metric type: FileMetric, ObjectMetric and MethodMetric. Each trait has a run method that receives a compiler tree matching the metric type. By implementing one of these traits, the metric class gains access to the compiler and global instances.
The global instance contains the tree types associated with the current compiler run. The most important types are:
- global.Tree - the top-level tree type.
- global.PackageDef - a packaging tree, which contains all statements in the file.
- global.ImplDef - the tree supertype used for classes/traits (global.ClassDef) and objects (global.ModuleDef).
- global.DefDef - the tree type used for method definitions.
For more information about the different Scala tree types and their members, consult the Scala reference and API documentation, or take a look at this overview image (credits to Mirko Stocker).
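As an illustration of matching on these tree types, the following sketch defines a hypothetical object-level metric (not part of the framework's metric set; the names ImplKind, IsObject and MethodCount are made up) that records whether an ImplDef is an object (global.ModuleDef) or a class/trait (global.ClassDef) and counts the method definitions declared directly in its body:
import codeAnalysis.analyser.Compiler
import codeAnalysis.analyser.metric._
object ImplKind extends MetricProducer {
  override def apply(compiler: Compiler): Metric = new ImplKind(compiler)
}
class ImplKind(override val compiler: Compiler) extends ObjectMetric {
  // 1 for object definitions, 0 for class/trait definitions
  def isObject(tree: global.ImplDef): Int = tree match {
    case _: global.ModuleDef => 1
    case _ => 0 // ClassDef (class or trait)
  }
  // Number of method definitions declared directly in the template body
  def methodCount(tree: global.ImplDef): Int =
    tree.impl.body.count(_.isInstanceOf[global.DefDef])
  override def run(tree: global.ImplDef): List[MetricResult] = List(
    MetricResult("IsObject", isObject(tree)),
    MetricResult("MethodCount", methodCount(tree))
  )
}
Note that, depending on the compiler phase, the primary constructor may also appear as a DefDef in the template body, so the count can include it.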
The global instance also contains helper and tree traversal methods, which can be accessed using import global.{TreeExtensions, SymbolExtensions}. For a list of the available methods, see the Source.
The implementation of the OutDegree metric is shown below:
package codeAnalysis.metrics.baseline
import codeAnalysis.analyser.Compiler
import codeAnalysis.analyser.metric.{MethodMetric, Metric, MetricProducer, MetricResult}
object OutDegree extends MetricProducer {
override def apply(compiler: Compiler): Metric = new OutDegree(compiler)
}
class OutDegree(override val compiler: Compiler) extends MethodMetric {
import global.TreeExtensions
// Counts the number of method and function calls
def outDegree(tree: global.DefDef): Int = tree.countTraverse {
case _: global.Apply => true
}
// Counts the number of unique method and function calls
def outDegreeDistinct(tree: global.DefDef): Int = tree.collectTraverse {
case tree: global.Apply => tree.fun.symbol
}.toSet.size
override def run(tree: global.DefDef): List[MetricResult] = List(
MetricResult("OutDegree", outDegree(tree)),
MetricResult("OutDegreeDistinct", outDegreeDistinct(tree))
)
}
The countTraverse method is one of the methods that can be accessed by importing TreeExtensions. Given a PartialFunction that returns a boolean, the countTraverse method counts all true matches. The collectTraverse method is similar, but it collects the matches instead of counting them.
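As another illustration of these traversal helpers, a hypothetical method-level metric could count the lambda functions defined in a method by matching on global.Function, the scalac tree type for anonymous functions. The metric below is not part of the framework and only shows countTraverse applied to a different tree type:
import codeAnalysis.analyser.Compiler
import codeAnalysis.analyser.metric._
object LambdaCount extends MetricProducer {
  override def apply(compiler: Compiler): Metric = new LambdaCount(compiler)
}
class LambdaCount(override val compiler: Compiler) extends MethodMetric {
  import global.TreeExtensions
  // Counts the anonymous functions (lambdas) defined within the method
  def lambdaCount(tree: global.DefDef): Int = tree.countTraverse {
    case _: global.Function => true
  }
  override def run(tree: global.DefDef): List[MetricResult] = List(
    MetricResult("LambdaCount", lambdaCount(tree))
  )
}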
The compiler instance can be used to check which files are currently loaded and, if needed, to retrieve the compiler trees of those files. For an example, see the CouplingBetweenObjects or NumberOfChildren implementations.
All metric definitions used in the research can be found in the CodeAnalysis metrics package.
Metrics are measured using the Validator. The measurements are executed by defining test cases. Test cases for the included projects are defined in the Validator UnitSpec. To add new metrics, create a test that extends the UnitSpec and passes the MetricProducers as follows:
class ExampleValidatorTest extends UnitSpec("output-folder", List(
ExampleMetric // Add metric producers in this list
))
Run the test to gather results in the provided output folder.
To add projects to analyse, add a test case to the Validator UnitSpec as follows, filling in the placeholders between the angle brackets:
test("<Test name>") {
val validator = new Validator(
"<repository owner>",
"<repository name>",
"<branch>",
new File("data/projects/<git_folder_name>"),
new File(s"data/metricResults/$folder/<output_folder_name>"),
List("<issue_labels_used_for_bugs>"),
metrics
)
validator.run()
}
All measurements used in the research can be found in the Validator test folder.
In the ResultAnalysis folder, run python analysis with the following arguments:
- --folder - the name of the output folder to analyse (required)
- --exclude-columns - a space-separated list of columns to exclude from the multivariate regression analysis (optional)
- --split-paradigm-score - split the analysed methods/objects by paradigm score (optional)
- --multivariate-baseline - run the multivariate regression with the baseline metrics included (optional)
The analysis commands used in the research are as follows:
- Paradigm score analysis
python analysis --folder paradigmScoreBool
python analysis --folder paradigmScoreCount
python analysis --folder paradigmScoreFraction
python analysis --folder paradigmScoreLandkroon
- Baseline model analysis
python analysis --folder baseline --exclude-columns HasPointsFraction ParadigmScoreFraction
python analysis --folder baseline --split-paradigm-score --exclude-columns HasPointsFraction ParadigmScoreFraction
- Candidate metric analysis
python analysis --folder multiparadigm-zuilhof --multivariate-baseline --exclude-columns HasPointsFraction ParadigmScoreFraction
python analysis --folder multiparadigm-constructs --multivariate-baseline --exclude-columns HasPointsFraction ParadigmScoreFraction
The HasPointsFraction and ParadigmScoreFraction columns are included in the baseline results to be able to split the results per paradigm. However, they should themselves be excluded from the baseline analysis and are therefore passed to the --exclude-columns option for each analysis that uses the baseline results.
Several plots have been made to visualise the paradigm score and baseline results. The plots can be found in the ResultAnalysis plots folder. For the plots in the research, the following commands were used:
python plots/paradigm_score_plots.py --scatter-color --hist --write
python plots/baseline_paradigm_plots.py --write
python plots/baseline_candidate_metric_plots.py --write
The initial design of this framework has been inspired by the SSCA project.