---
author: [Molly Suppo, Daniel Bekele, Rosa Ruiz, Darius Googe, Gabriel Salvatore]
title: "Investigating test prioritization with traditional and multi-objective sorting algorithms"
page-layout: full
categories: [post, sorting, comparison]
date: "2025-03-27"
date-format: long
toc: true
---

# Repository Link

Below is the link that will direct you to our GitHub repository, which is
needed to run our experiment on your personal device:
<https://github.com/suppo01/Algorithm-Analysis-All-Hands-Module-2>

## Introduction

During our team's exploration of sorting algorithms and their performance
characteristics, an interesting research question emerged: how can we
effectively sort test cases while considering multiple factors simultaneously?
Traditional sorting algorithms excel at sorting based on a single criterion, but
real-world test case prioritization often requires balancing multiple
objectives, such as execution time and code coverage.

This research question led us to investigate the potential of multi-objective
optimization algorithms, specifically NSGA-II (Non-dominated Sorting Genetic
Algorithm II), in comparison to traditional sorting approaches. While
traditional algorithms like Quick Sort and Bucket Sort can sort test cases based
on one factor at a time, NSGA-II offers the advantage of considering multiple
objectives simultaneously, potentially providing more nuanced and practical test
case prioritization.

Our research aims to answer the question: how does NSGA-II compare to
traditional sorting algorithms in running time when prioritizing test cases by
execution speed and code coverage? The traditional algorithms sort according to
one factor at a time, while NSGA-II sorts according to both factors at once.

### Data Collection and Gathering

For our research, we selected the Chasten project, a publicly available GitHub
repository developed as part of a course in our department. This choice was
strategic for several reasons:

1. **Reproducibility**: Since Chasten was developed within our department, we
   have direct access to its development history and can ensure that others can
   replicate our study.

2. **Test Infrastructure**: The project includes a comprehensive test suite,
   making it an ideal candidate for analyzing test case execution times and
   coverage metrics.

3. **Tool Development**: As a tool developed in an academic setting, Chasten
   provides a controlled environment for our research, with well-defined test
   cases and clear execution patterns.

To collect our data, we developed a custom script (`collect_test_metrics.py`)
that leverages `pytest-cov`, a pytest plugin for measuring code coverage. The
script executes the test suite using Poetry's task runner with specific
`pytest-cov` configurations to generate detailed coverage reports. The coverage
data is collected using the following command structure:

```bash
pytest --cov=chasten tests/ --cov-report=json
```

This command generates a `coverage.json` file that contains detailed coverage
information for each module in the project. The JSON structure includes:

- Executed lines for each file
- Summary statistics, including covered lines, total statements, and coverage percentages
- Missing and excluded lines
- Context information for each covered line

The `coverage.json` file is structured hierarchically, with each module's data
organized under the `"files"` key. For example:

```json
{
  "files": {
    "chasten/checks.py": {
      "executed_lines": [...],
      "summary": {
        "covered_lines": 50,
        "num_statements": 51,
        "percent_covered": 98,
        "missing_lines": 1,
        "excluded_lines": 0
      }
    }
  }
}
```
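
To sketch how such a report can be consumed downstream, the helper below
follows the `coverage.json` layout shown above and pulls out the covered-line
count per module (the file path in the usage comment is hypothetical):

```python
import json
from typing import Dict


def summarize_coverage(path: str) -> Dict[str, int]:
    """Map each module in a coverage.json report to its number of covered lines."""
    with open(path, "r") as f:
        report = json.load(f)
    # Each module's data lives under the top-level "files" key
    return {
        module: info["summary"]["covered_lines"]
        for module, info in report["files"].items()
    }


# Hypothetical usage:
# summarize_coverage("coverage.json")  # e.g. {"chasten/checks.py": 50, ...}
```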

To prepare this data for analysis, we developed a mapping script (`mapper.py`)
that serves two key purposes:

1. **Traditional Analysis Format**: The script reads both the `coverage.json`
   and `test_metrics.json` files, creating a mapping between test cases and
   their corresponding module coverage data. It computes a ratio of covered
   lines to test duration for each test case, which helps identify tests that
   provide the best coverage per unit of time.

2. **NSGA-II Format**: The script also transforms the data into a specialized
   format required by the NSGA-II algorithm. Each test case is represented as a
   list of `[test name, duration, coverage]`, where coverage is the raw number
   of covered lines from the `coverage.json` file. This format enables
   multi-objective optimization, allowing us to simultaneously consider both
   test execution time and code coverage.

The mapping process ensures proper handling of edge cases:

- Failed or skipped tests are assigned zero coverage values
- Tests with zero duration are handled gracefully
- Each test case is correctly associated with its corresponding module's coverage data
- The data is structured appropriately for both traditional and multi-objective analysis approaches
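
A minimal sketch of this mapping step is shown below. The `outcome` and
`module` field names and the overall `test_metrics.json` shape are our
assumptions for illustration; the real `mapper.py` may differ in its details:

```python
from typing import Any, Dict, List


def map_tests(test_metrics: List[Dict[str, Any]],
              module_coverage: Dict[str, int]) -> List[List[Any]]:
    """Build NSGA-II rows of [test name, duration, covered lines].

    Failed or skipped tests, and tests whose module is unknown, get zero coverage.
    """
    rows = []
    for test in test_metrics:
        covered = 0
        if test.get("outcome") == "passed":
            covered = module_coverage.get(test.get("module", ""), 0)
        rows.append([test["name"], test["duration"], covered])
    return rows


def coverage_ratio(covered: int, duration: float) -> float:
    """Covered lines per second of test time; zero-duration tests map to 0.0."""
    return covered / duration if duration > 0 else 0.0
```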

This data preparation pipeline enables us to:

- Compare the effectiveness of different test prioritization approaches
- Analyze the trade-offs between test execution time and coverage
- Generate reproducible results for both the traditional and NSGA-II algorithms
- Maintain data consistency across different analysis methods

## Implementation

The implementation of this project required several algorithms, since we were
comparing traditional algorithms with a multi-objective algorithm. We decided
upon two different traditional algorithms, Quick Sort and Bucket Sort. As for
the multi-objective algorithm, Professor Kapfhammer recommended the NSGA-II
sorting algorithm to us, so we decided to investigate it. More specifics about
each algorithm are below.

### The Quick Sort Algorithm

We selected the Quick Sort algorithm for this experiment due to its efficiency
in handling large datasets. With an average time complexity of `O(n log n)`,
Quick Sort is faster than simpler algorithms like Bubble Sort, making it ideal
for optimizing test prioritization. The algorithm selects a random pivot to
reduce the risk of the worst-case `O(n²)` performance and recursively sorts the
elements less than and greater than the pivot.

In this implementation, Quick Sort organizes the test cases based on the
coverage/time ratio. The data is partitioned around the pivot, with elements
smaller than the pivot in the left partition and those greater in the right. The
function recursively sorts both partitions, and once sorted, the pivot is placed
between them to produce the final sorted list. This ensures that the most
efficient tests (those with the lowest coverage/time ratio) are prioritized.

```python
import random  # Import random for selecting a random pivot in QuickSort
from typing import Any, List


def quicksort(arr: List[Any]) -> List[Any]:
    """Sorts a list using the QuickSort algorithm with a random pivot."""
    if len(arr) <= 1:  # Base case: a list with 0 or 1 elements is already sorted
        return arr
    pivot = random.choice(arr)  # Select a random pivot element from the list
    left = [x for x in arr if x < pivot]  # Elements less than the pivot
    middle = [x for x in arr if x == pivot]  # The pivot and any duplicates of it
    right = [x for x in arr if x > pivot]  # Elements greater than the pivot
    # Recursively sort the left and right partitions and combine them with the pivot values
    return quicksort(left) + middle + quicksort(right)
```

This approach minimizes the risk of worst-case performance and ensures that the
most efficient tests are prioritized, optimizing the testing process by focusing
on the best results in the least amount of time.
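
As a small, self-contained illustration of this partitioning scheme, the same
function can order `(ratio, test name)` pairs, since Python tuples compare by
their first element. The ratios and test names here are made up for
illustration:

```python
import random
from typing import Any, List


def quicksort(arr: List[Any]) -> List[Any]:
    """QuickSort with a random pivot; duplicates of the pivot are preserved."""
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)
    left = [x for x in arr if x < pivot]     # elements less than the pivot
    middle = [x for x in arr if x == pivot]  # the pivot and its duplicates
    right = [x for x in arr if x > pivot]    # elements greater than the pivot
    return quicksort(left) + middle + quicksort(right)


# Hypothetical coverage/time ratios paired with made-up test names
tests = [(0.8, "test_parse"), (0.2, "test_config"), (0.5, "test_main")]
print(quicksort(tests))  # ascending by ratio
```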

### The Bucket Sort Algorithm

We selected the Bucket Sort algorithm for this experiment due to its efficiency
in distributing and sorting data across multiple buckets. With an average time
complexity of `O(n + k)`, Bucket Sort is well-suited for handling large
datasets, especially when the input is uniformly distributed. Unlike
comparison-based algorithms such as Quick Sort, it categorizes elements into
buckets and sorts them individually, reducing the overall sorting overhead.

In this implementation, Bucket Sort organizes test cases based on the
coverage/time ratio. The algorithm first distributes the test cases into buckets
according to their values, ensuring that similar elements are grouped together.
In the general algorithm, each bucket is then sorted individually using a simple
method such as Insertion Sort; in our implementation the bucket index is the
integer attribute value itself, so the items within a bucket share a key and
need no further sorting. The buckets are then concatenated to form the final
sorted list. This process ensures that test cases with the lowest coverage/time
ratio are prioritized, optimizing test execution order.

```python
import json
import os
from typing import Any, Dict, List


def bucket_sort(data: List[Dict[str, Any]], attribute: str) -> List[Dict[str, Any]]:
    """Sort a list of dictionaries using bucket sort based on a given integer attribute."""
    if not data:  # Guard against an empty dataset
        return []
    max_value = max(item[attribute] for item in data)  # Find the maximum attribute value
    buckets: List[List[Dict[str, Any]]] = [[] for _ in range(int(max_value) + 1)]  # Create buckets

    for item in data:  # Place each item in the bucket indexed by its attribute value
        buckets[int(item[attribute])].append(item)

    sorted_data: List[Dict[str, Any]] = []  # Concatenate all buckets into a sorted list
    for bucket in buckets:
        sorted_data.extend(bucket)

    return sorted_data  # Return the sorted list


def load_data(file_path: str) -> List[Dict[str, Any]]:
    """Load data from a JSON file."""
    if not os.path.exists(file_path):  # Check that the file exists
        raise FileNotFoundError(f"File not found: {file_path}")
    with open(file_path, "r") as f:
        return json.load(f)  # Return the loaded data


def find_highest_coverage_test_case(sorted_tests: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Find the test case with the highest coverage."""
    highest_coverage_test: Dict[str, Any] = sorted_tests[0]  # Start with the first test case
    for test in sorted_tests:  # Scan all test cases
        if test["coverage"] > highest_coverage_test["coverage"]:
            highest_coverage_test = test  # Track the test case with the most coverage
    return highest_coverage_test  # Return the test case with the largest coverage


def main() -> None:
    file_path = "data/newtryingToCompute.json"  # Path to the test metrics file

    # Debugging: print the absolute path being used
    print(f"Looking for file at: {os.path.abspath(file_path)}")

    try:
        data = load_data(file_path)  # Load the test metrics data
    except FileNotFoundError as e:
        print(e)
        return

    sorted_tests_by_coverage = bucket_sort(data, "coverage")  # Sort by coverage
    highest_coverage_test_case = find_highest_coverage_test_case(sorted_tests_by_coverage)

    # Print the results
    print("\n🌟 Results 🌟")
    print("\n🚀 Test Case with Highest Coverage:")
    print(f"Test Name: {highest_coverage_test_case['name']}")
    print(f"Coverage: {highest_coverage_test_case['coverage']}")


# Entry point of the script
if __name__ == "__main__":
    main()
```

This approach reduces the likelihood of uneven data distribution, ensuring
efficient sorting and prioritization of test cases. By grouping similar values
into buckets and sorting them individually, the testing process is optimized,
focusing on the most effective test cases with minimal execution time.
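
The implementation above buckets on integer attribute values; a coverage/time
*ratio*, however, is fractional. A common variant handles this by normalizing
each value into a fixed number of buckets and sorting within each bucket. The
sketch below shows that variant under those assumptions; it is not the script
used in our experiment:

```python
from typing import List


def bucket_sort_floats(values: List[float], n_buckets: int = 10) -> List[float]:
    """Bucket sort for floats: scatter into normalized buckets, sort each, concatenate."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:  # all values equal; nothing to order
        return list(values)
    buckets: List[List[float]] = [[] for _ in range(n_buckets)]
    for v in values:
        # Normalize v into a bucket index in [0, n_buckets - 1]
        idx = int((v - lo) / (hi - lo) * (n_buckets - 1))
        buckets[idx].append(v)
    result: List[float] = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # stand-in for the per-bucket insertion sort
    return result
```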

### The NSGA-II Multi-Objective Algorithm

The NSGA-II multi-objective sorting algorithm can be broken down into a variety
of approaches. We utilized the binary tournament approach and slightly adapted
it to suit our needs. The file that runs this part of the experiment has two
main parts, the `binary_tournament` function and `main`.

The `binary_tournament` function runs the bulk of the experiment. It takes a
list of index pairs that indicate the opponents for each tournament to be
performed, `P`, and a population object, `pop`, storing all the individuals to
be pitted against each other. From there, the tournaments are run until all of
them have been completed. As it runs, the function also maintains a list of the
names of the winners, along with a list of the losers that is used to keep the
winner list accurate. At the end, the final list of winners is printed. It is
worth noting that the outcomes differ slightly between runs; this could be due
to slightly different evaluations occurring each time, as several stochastic
aspects go into running the algorithm, even with a limited number of factors to
consider. It is also worth noting that the variable `S` refers to the result
returned by the function, a list of the indices of all the winners. As that is
not as helpful for our purposes, it does not appear in our results.

```python
# Core loop of our adapted binary_tournament: P holds the opponent index
# pairs, pop is the population, and S records the winning index per tournament.
for i in range(n_tournaments):
    a, b = P[i]

    # If the first individual has the better (lower) fitness, choose it
    if pop[a].F < pop[b].F:
        S[i] = a
        loser = pop[b].name
        winner = pop[a].name
    # Otherwise take the other individual
    else:
        S[i] = b
        loser = pop[a].name
        winner = pop[b].name

    # Update the lists of winner and loser names
    if winner not in winner_list:
        if winner not in loser_list:
            winner_list.append(winner)
        elif loser in winner_list:  # Guard: only remove a name that is present
            winner_list.remove(loser)
    if loser not in loser_list:
        loser_list.append(loser)

# After all tournaments, report the names of the ideal tests
print(f"The Ideal Tests Are: {winner_list}")
return S
```

`main`, on the other hand, generates the list of competitor indices using nested
`for` loops, since that produces a list of lists rather than a list of tuples,
which is not the right format for the `binary_tournament` function. `main` also
builds the population object. First, a 2D NumPy array is created from the JSON
file designated for the NSGA-II algorithm, whose formatting is slightly
different in order to accommodate the `binary_tournament` function. Then, a list
of `Individual` objects is created from the information in the array, and that
list is passed into a new `Population` object. Finally, `main` runs the
tournaments by calling the `binary_tournament` function with the population
object and the array of competitor index pairs.

The results produced by this algorithm are the best test cases according to a
fitness factor, `F`, which is calculated similarly to the values used by the two
more traditional sorting algorithms: the duration of each test case is divided
by the number of lines it covers, and those values are compared in each
tournament.
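
This fitness comparison can be sketched independently of the library's
population machinery. The test names and numbers below are illustrative, not
taken from our dataset:

```python
from typing import Tuple


def fitness(duration: float, covered: int) -> float:
    """Duration per covered line; lower is better. Zero coverage is penalized."""
    return duration / covered if covered > 0 else float("inf")


def binary_tournament(a: Tuple[str, float, int], b: Tuple[str, float, int]) -> str:
    """Return the name of the test case with the better (lower) fitness."""
    return a[0] if fitness(a[1], a[2]) < fitness(b[1], b[2]) else b[0]


# Illustrative (test name, duration, covered lines) entries
t1 = ("test_checks", 0.10, 50)  # fitness 0.002
t2 = ("test_main", 0.40, 60)    # fitness ~0.0067
print(binary_tournament(t1, t2))  # test_checks wins: lower duration per line
```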

## The Results

In this experiment, we focused on comparing the runtime performance of three
algorithms, NSGA-II, QuickSort, and Bucket Sort, by measuring their runtime with
a single factor in mind: coverage. We conducted tests on a single dataset and
recorded the time taken by each algorithm to complete the task.
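
The runtimes reported below were collected by timing each algorithm's run; a
minimal harness of this kind can be sketched with `time.perf_counter`, though
the exact instrumentation in our scripts may differ:

```python
import time
from typing import Any, Callable


def time_ms(fn: Callable[[], Any]) -> float:
    """Run fn once and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


# Example: time Python's built-in sort on a small synthetic list
elapsed = time_ms(lambda: sorted(range(10_000), reverse=True))
print(f"sorted: {elapsed:.3f} ms")
```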

The results from the experiment are summarized in the following table:

| Dataset Size | NSGA-II (ms) | QuickSort (ms) | Bucket Sort (ms) |
|--------------|--------------|----------------|------------------|
| 92 lines     | 7.38         | 0.3            | 0.13             |

**Observations:**

NSGA-II had the highest runtime, which is expected given its complexity and the
nature of multi-objective optimization tasks. Its process of evolving solutions
requires significant computational overhead, making it less efficient for simple
tasks like sorting or coverage evaluation. However, one benefit of the algorithm
is that it prioritizes the best tests to run by considering multiple factors at
once, so it is well suited to problems where the best solution across multiple
factors is needed. Also, this algorithm often returns more than one test case,
so the user is given a short list of optimal test cases to run instead of just
one.

QuickSort, a well-known sorting algorithm, performed significantly faster than
NSGA-II, reflecting its efficiency in handling ordered data. With its average
time complexity of `O(n log n)`, QuickSort proved well-suited for the task, even
though the dataset was relatively small. This algorithm produces the top test
case according to a ratio of duration divided by the number of lines covered.
Since that ratio is all that is used to sort the test cases, the results may
differ from those produced by NSGA-II. This algorithm is effective for simple
sorting tasks, such as ordering by the ratio mentioned above.

Bucket Sort, with its near-linear time complexity under optimal conditions,
demonstrated the fastest performance in this experiment, significantly
outperforming both NSGA-II and QuickSort on the given dataset. Like QuickSort,
this algorithm produces the top test case according to a ratio of duration
divided by the number of lines covered, so its results may also differ from
those produced by NSGA-II, and it is likewise effective for simple sorting
tasks. In addition, it avoids the recursion that QuickSort relies on, which
helped make it the fastest sorting algorithm in our experiment.


## Conclusion

The results of this experiment indicate that, under the tested scenario, NSGA-II
did not outperform the algorithms we compared it to (QuickSort and Bucket Sort).
Given that these algorithms are designed for fundamentally different purposes,
the performance discrepancy is expected. Our tests were conducted in a context
that favored QuickSort and Bucket Sort, which are inherently more efficient for
the sorting tasks at hand. Consequently, while NSGA-II excels in multi-objective
optimization, it is not suited for tasks where traditional sorting algorithms
like QuickSort are more appropriate.