---
author: [Molly Suppo, Daniel Bekele, Rosa Ruiz, Darius Googe, Gabriel Salvatore]
title: "Investigating test prioritization with traditional and multi-objective sorting algorithms"
page-layout: full
categories: [post, sorting, comparison]
date: "2025-03-27"
date-format: long
toc: true
---

# Repository Link

Below is the link to the GitHub repository needed to run our experiment on
your own device:
<https://github.com/suppo01/Algorithm-Analysis-All-Hands-Module-2>

## Introduction

During our team's exploration of sorting algorithms and their performance
characteristics, an interesting research question emerged: How can we
effectively sort test cases while considering multiple factors simultaneously?
Traditional sorting algorithms excel at sorting based on a single criterion, but
real-world test case prioritization often requires balancing multiple
objectives, such as execution time and code coverage.

This research question led us to investigate the potential of multi-objective
optimization algorithms, specifically NSGA-II (Non-dominated Sorting Genetic
Algorithm II), in comparison to traditional sorting approaches. While
traditional algorithms like Quick Sort and Bucket Sort can sort test cases based
on one factor at a time, NSGA-II offers the advantage of considering multiple
objectives simultaneously, potentially providing more nuanced and practical test
case prioritization.

Our research aims to answer the question: How does NSGA-II compare to
traditional sorting algorithms in terms of running time when prioritizing test
cases by execution speed and code coverage? The traditional algorithms sort
according to one factor at a time, while NSGA-II sorts according to both
factors at once.

### Data Collection and Gathering

For our research, we selected the Chasten project, a publicly available GitHub
repository developed as part of a course in our department. This choice was
strategic for several reasons:

1. **Reproducibility**: Since Chasten was developed within our department, we
have direct access to its development history and can ensure that others can
replicate our study.

2. **Test Infrastructure**: The project includes a comprehensive test suite,
making it an ideal candidate for analyzing test case execution times and
coverage metrics.

3. **Tool Development**: As a tool developed in an academic setting, Chasten
provides a controlled environment for our research, with well-defined test
cases and clear execution patterns.

To collect our data, we developed a custom script (`collect_test_metrics.py`)
that leverages `pytest-cov`, a pytest plugin for measuring code coverage. The
script executes the test suite using Poetry's task runner with specific
`pytest-cov` configurations to generate detailed coverage reports. The coverage
data is collected using the following command structure:

```bash
pytest --cov=chasten tests/ --cov-report=json
```

This command generates a `coverage.json` file that contains detailed coverage
information for each module in the project. The JSON structure includes:

- Executed lines for each file
- Summary statistics, including covered lines, total statements, and coverage percentages
- Missing and excluded lines
- Context information for each covered line

The `coverage.json` file is structured hierarchically, with each module's data
organized under the `"files"` key. For example:

```json
{
  "files": {
    "chasten/checks.py": {
      "executed_lines": [...],
      "summary": {
        "covered_lines": 50,
        "num_statements": 51,
        "percent_covered": 98,
        "missing_lines": 1,
        "excluded_lines": 0
      }
    }
  }
}
```
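
To give a concrete sense of how this report is consumed, here is a minimal
sketch that reads a `coverage.json` file with the structure shown above and
prints each module's summary statistics. It only assumes the keys visible in
the example; our actual `collect_test_metrics.py` script does more than this.

```python
import json

# Load the pytest-cov JSON report produced by the command above
with open("coverage.json") as f:
    report = json.load(f)

# Each module's data lives under the "files" key
for module, data in report["files"].items():
    summary = data["summary"]
    print(
        f"{module}: {summary['covered_lines']}/{summary['num_statements']} "
        f"lines covered ({summary['percent_covered']}%)"
    )
```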

To prepare this data for analysis, we developed a mapping script (`mapper.py`)
that serves two key purposes:

1. **Traditional Analysis Format**: The script reads both the `coverage.json`
and `test_metrics.json` files, creating a mapping between test cases and their
corresponding module coverage data. It computes a ratio of covered lines to
test duration for each test case, which helps identify tests that provide the
best coverage per unit of time.

2. **NSGA-II Format**: The script also transforms the data into a specialized
format required by the NSGA-II algorithm. Each test case is represented as a
list of `[test name, duration, coverage]`, where coverage is the raw number of
covered lines from the `coverage.json` file. This format enables
multi-objective optimization, allowing us to simultaneously consider both test
execution time and code coverage.
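
The sketch below illustrates the second transformation under some assumptions:
the exact shape of `test_metrics.json` (a list of records with `name`,
`duration`, and `module` fields) is hypothetical, so this outlines the idea
rather than reproducing the exact code in `mapper.py`.

```python
import json

# Assumed inputs; the real record shapes in the repository may differ
with open("coverage.json") as f:
    coverage = json.load(f)["files"]
with open("test_metrics.json") as f:
    tests = json.load(f)

nsga2_records = []
for test in tests:
    # Look up the covered-line count for the module this test exercises
    module_data = coverage.get(test["module"], {})
    covered = module_data.get("summary", {}).get("covered_lines", 0)
    # Each record becomes [test name, duration, coverage]
    nsga2_records.append([test["name"], test["duration"], covered])
```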

The mapping process ensures proper handling of edge cases:

- Failed or skipped tests are assigned zero coverage values
- Tests with zero duration are handled gracefully
- Each test case is correctly associated with its corresponding module's coverage data
- The data is structured appropriately for both traditional and multi-objective analysis approaches

This data preparation pipeline enables us to:

- Compare the effectiveness of different test prioritization approaches
- Analyze the trade-offs between test execution time and coverage
- Generate reproducible results for both traditional and NSGA-II algorithms
- Maintain data consistency across different analysis methods

## Implementation

The implementation of this project required several algorithms, since we were
comparing traditional algorithms with a multi-objective algorithm. We decided
upon two traditional algorithms, Quick Sort and Bucket Sort. As for the
multi-objective algorithm, Professor Kapfhammer recommended the NSGA-II sorting
algorithm to us, so we decided to look into that one. More specifics about each
algorithm are below.

### The Quick Sort Algorithm

We selected the Quick Sort algorithm for this experiment due to its efficiency
in handling large datasets. With an average time complexity of `O(n log n)`,
Quick Sort is faster than simpler algorithms like Bubble Sort, making it ideal
for optimizing test prioritization. The algorithm selects a random pivot to
avoid the worst-case performance of `O(n²)` and recursively sorts the elements
less than and greater than the pivot.

In this implementation, Quick Sort organizes the test cases based on the
duration-to-coverage ratio. The data is partitioned around the pivot, with
elements smaller than the pivot in the left partition and those greater in the
right. The function recursively sorts both partitions, and once sorted, the
pivot is placed between them to produce the final sorted list. This ensures
that the most efficient tests (those with the lowest duration-to-coverage
ratio) are prioritized.

```python
import json  # Used elsewhere in the full script for JSON file operations
import time  # Used elsewhere in the full script to measure execution time
import random  # Used for selecting a random pivot in Quick Sort
from typing import Any, List


def quicksort(arr: List[Any]) -> List[Any]:
    """Sort a list using the Quick Sort algorithm with a random pivot."""
    if len(arr) <= 1:  # Base case: a list with 1 or no elements is already sorted
        return arr
    pivot = random.choice(arr)  # Select a random pivot element from the list
    left = [x for x in arr if x < pivot]  # Elements less than the pivot
    middle = [x for x in arr if x == pivot]  # Keep duplicates equal to the pivot
    right = [x for x in arr if x > pivot]  # Elements greater than the pivot
    # Recursively sort the partitions and combine them around the pivot values
    return quicksort(left) + middle + quicksort(right)
```

This approach minimizes the risk of worst-case performance and ensures that the
most efficient tests are prioritized, optimizing the testing process by
focusing on the best results in the least amount of time.
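
As a small usage example, the snippet below applies `quicksort` to a handful of
hypothetical duration-to-coverage ratios; the values are made up for
illustration and include a duplicate to show that equal elements are preserved.

```python
# Hypothetical duration/coverage ratios for five test cases
ratios = [0.042, 0.007, 0.115, 0.007, 0.031]
print(quicksort(ratios))  # [0.007, 0.007, 0.031, 0.042, 0.115]
```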

### The Bucket Sort Algorithm

We selected the Bucket Sort algorithm for this experiment due to its efficiency
in distributing and sorting data across multiple buckets. With an average time
complexity of `O(n + k)`, Bucket Sort is well suited to handling large
datasets, especially when the input is uniformly distributed. Unlike
comparison-based algorithms like Quick Sort, it categorizes elements into
buckets and sorts them individually, reducing the overall sorting overhead.

In this implementation, Bucket Sort organizes test cases based on the
duration-to-coverage ratio. The algorithm first distributes the test cases into
buckets according to their values, ensuring that similar elements are grouped
together. Each bucket is then sorted individually before being concatenated to
form the final sorted list. This process ensures that test cases with the
lowest duration-to-coverage ratio are prioritized, optimizing test execution
order.

```python
import json
import os
from typing import Any, Dict, List


# Function to perform bucket sort on test cases based on a given attribute
def bucket_sort(data: List[Dict[str, Any]], attribute: str) -> List[Dict[str, Any]]:
    """Sort a list of dictionaries using bucket sort based on a given attribute."""
    max_value = max(item[attribute] for item in data)  # Find the maximum attribute value
    buckets: List[List[Dict[str, Any]]] = [[] for _ in range(int(max_value) + 1)]  # Create buckets

    for item in data:  # Place each item in the appropriate bucket
        buckets[int(item[attribute])].append(item)

    sorted_data = []  # Concatenate the individually sorted buckets into one list
    for bucket in buckets:
        sorted_data.extend(sorted(bucket, key=lambda item: item[attribute]))

    return sorted_data  # Return the sorted list


# Function to load data from a JSON file
def load_data(file_path: str) -> List[Dict[str, Any]]:
    """Load data from a JSON file."""
    if not os.path.exists(file_path):  # Check if the file exists
        raise FileNotFoundError(f"File not found: {file_path}")
    with open(file_path, "r") as f:
        data = json.load(f)
    return data  # Return the loaded data


# Function to find the test case with the highest coverage
def find_highest_coverage_test_case(sorted_tests: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Find the test case with the highest coverage."""
    highest_coverage_test: Dict[str, Any] = sorted_tests[0]  # Start with the first test case
    for test in sorted_tests:  # Loop through all test cases
        if test["coverage"] > highest_coverage_test["coverage"]:
            highest_coverage_test = test  # Update the test case with the highest coverage
    return highest_coverage_test  # Return the test case with the largest coverage


# Main function to execute the script
def main():
    file_path = "data/newtryingToCompute.json"  # Path to the test metrics file

    # Debugging: print the absolute path being used
    print(f"Looking for file at: {os.path.abspath(file_path)}")

    try:
        data = load_data(file_path)  # Load the test metrics data
    except FileNotFoundError as e:
        print(e)
        return

    sorted_tests_by_coverage: List[Dict[str, Any]] = bucket_sort(data, "coverage")  # Sort by coverage
    highest_coverage_test_case: Dict[str, Any] = find_highest_coverage_test_case(
        sorted_tests_by_coverage
    )  # Find the highest coverage

    # Print the results
    print("\n🌟 Results 🌟")
    print("\n🚀 Test Case with Highest Coverage:")
    print(f"Test Name: {highest_coverage_test_case['name']}")
    print(f"Coverage: {highest_coverage_test_case['coverage']}")


# Entry point of the script
if __name__ == "__main__":
    main()
```

This approach reduces the likelihood of uneven data distribution, ensuring
efficient sorting and prioritization of test cases. By grouping similar values
into buckets and sorting them individually, the testing process is optimized,
focusing on the most effective test cases with minimal execution time.

### The NSGA-II Multi-Objective Algorithm

The NSGA-II multi-objective sorting algorithm can be broken down into a variety
of approaches. We utilized the binary tournament approach and slightly adapted
it to suit our needs. The file that runs this part of the experiment has two
main parts: the `binary_tournament` function and `main`.

The `binary_tournament` function runs the bulk of the experiment. It takes `P`,
a list of index pairs indicating the opponents in each tournament, and `pop`,
the population object storing all the individuals to be pitted against each
other. From there, the tournaments are run until all of them have been
completed. In our implementation, the function also maintains a list of winner
names, using a companion list of losers to keep the winner list up to date. At
the end, the final winner list is printed. It is worth noting that the outcomes
differ slightly from run to run; this could be due to slightly different
evaluations occurring each time, as several aspects go into running the
algorithm even with a limited number of factors to consider. It is also worth
noting that the variable `S` holds the result returned by the function, a list
of the population indices of all the winners. As that is not as helpful for our
purposes, it does not appear in our results.

```python
for i in range(n_tournaments):
    a, b = P[i]

    # if the first individual is better, choose it
    if pop[a].F < pop[b].F:
        S[i] = a
        loser = pop[b].name
        winner = pop[a].name
    # otherwise take the other individual
    else:
        S[i] = b
        loser = pop[a].name
        winner = pop[b].name

    # record the winner only if it has never lost a tournament
    if winner not in winner_list and winner not in loser_list:
        winner_list.append(winner)
    # a test that loses can no longer be considered ideal
    if loser in winner_list:
        winner_list.remove(loser)
    if loser not in loser_list:
        loser_list.append(loser)

# print the names of the ideal tests and return the winner indices
print(f"The Ideal Tests Are: {winner_list}")
return S
```

`main`, on the other hand, generates the list of competitor index pairs using
nested `for` loops, since that produces the result as a list of lists instead
of a list of tuples, which is not the right format for the `binary_tournament`
function. `main` also generates the Population object. First, a 2D NumPy array
is created from the JSON file designated for use by the NSGA-II algorithm, as
its formatting is slightly different in order to accommodate the
`binary_tournament` function. Then, a list of Individual objects is created
from the information in the array, and that list is passed into a brand-new
Population object. Finally, `main` runs the tournaments by calling the
`binary_tournament` function with the Population object and the array of
competitor index pairs passed in.

The results produced by this algorithm are the best test cases according to a
fitness factor, `F`, which is calculated similarly to the values used for the
two more traditional sorting algorithms: the duration of the test case is
divided by the number of lines it covers, and those values are compared in each
tournament.
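
To make this setup concrete, here is a minimal, self-contained sketch of the
preparation step. The `Individual` class below is a plain stand-in for its
pymoo counterpart, and the file name is hypothetical, so this outlines the idea
rather than reproducing our exact `main`.

```python
import json
from typing import List


class Individual:
    """Stand-in for pymoo's Individual: a test name and a scalar fitness F."""

    def __init__(self, name: str, duration: float, coverage: float) -> None:
        self.name = name
        # Fitness: duration per covered line (lower is better)
        self.F = duration / coverage if coverage > 0 else float("inf")


def build_population(path: str) -> List[Individual]:
    """Create the population from the NSGA-II formatted JSON records."""
    with open(path) as f:
        records = json.load(f)  # each record: [test name, duration, coverage]
    return [Individual(name, dur, cov) for name, dur, cov in records]


pop = build_population("data/nsga2_input.json")  # hypothetical file name
# Competitor index pairs as a list of lists, built with nested for loops
P = [[a, b] for a in range(len(pop)) for b in range(len(pop)) if a != b]
```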

## The Results

In this experiment, we focused on comparing the runtime performance of three
algorithms (NSGA-II, Quick Sort, and Bucket Sort) by measuring their runtime
with a single factor in mind: coverage. We conducted tests on a single dataset
and recorded the time taken by each algorithm to complete the task.
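
We measured wall-clock runtime in milliseconds. As a minimal sketch of how such
a timing can be taken (our repository's actual measurement code may differ), a
timer wrapped around each algorithm call is enough:

```python
import time


def time_algorithm(func, *args):
    """Run func(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms
```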

The results from the experiment are summarized in the following table:

| Dataset Size | NSGA-II (ms) | Quick Sort (ms) | Bucket Sort (ms) |
|--------------|--------------|-----------------|------------------|
| 92 lines     | 7.38         | 0.3             | 0.13             |

**Observations:**

NSGA-II had the highest runtime, which is expected given its complexity and the
nature of multi-objective optimization tasks. Its process of evolving solutions
requires significant computational overhead, making it less efficient for
simple tasks like sorting or coverage evaluation. One benefit of the algorithm,
however, is that it prioritizes the best tests to run by considering multiple
factors at once, so it is well suited to problems where a solution must balance
multiple factors simultaneously. The results from this algorithm also often
contain more than one test case, so the user is given a short list of optimal
test cases to run instead of just one.

Quick Sort, a well-known sorting algorithm, performed significantly faster than
NSGA-II, reflecting its efficiency as a general-purpose comparison sort. With
its average time complexity of `O(n log n)`, Quick Sort proved well suited to
the task, even though the dataset was relatively small. This algorithm produces
the top test case according to a ratio of duration divided by the number of
lines covered. Since that ratio is all that is used to sort the test cases, the
results may differ from those produced by NSGA-II. Quick Sort is effective for
simple sorting tasks such as ordering tests by the ratio described above.

Bucket Sort, with its near-linear time complexity under optimal conditions,
demonstrated the fastest performance in this experiment, significantly
outperforming both NSGA-II and Quick Sort on the given dataset. This algorithm
also produces the top test case according to a ratio of duration divided by the
number of lines covered. As with Quick Sort, since that ratio is all that is
used to sort the test cases, the algorithm's results may differ from those
produced by NSGA-II. Bucket Sort is likewise effective for simple sorting
tasks, and because it avoids the recursion that Quick Sort relies on, it was
the fastest sorting algorithm in our experiment.

## Conclusion

The results of this experiment indicate that, under the tested scenario,
NSGA-II did not outperform the algorithms we compared it to (Quick Sort and
Bucket Sort). Given that these algorithms are designed for fundamentally
different purposes, the performance discrepancy is expected. Our tests were
conducted in a context that favored Quick Sort and Bucket Sort, which are
inherently more efficient for the sorting tasks at hand. Consequently, while
NSGA-II excels in multi-objective optimization, it is not suited for tasks
where traditional sorting algorithms like Quick Sort are more appropriate.
