Autocommit from makefile
rahlk committed Jul 24, 2018
1 parent 2dbca77 commit 8e9aa0e
Showing 196 changed files with 619 additions and 981 deletions.
22 changes: 22 additions & 0 deletions Makefile
@@ -0,0 +1,22 @@
TEST_PATH=./

all: test clean git

test:
	@echo "Running unit tests."
	@echo ""
	@nosetests -s $(TEST_PATH)
	@echo ""

clean:
	@echo "Cleaning *.pyc and *.pyo junk files..."
	@- find . -name '*.pyc' -exec rm --force {} +
	@- find . -name '*.pyo' -exec rm --force {} +
	@echo ""

git: clean
	@echo "Syncing with repository"
	@echo ""
	@- git add --all .
	@- git commit -am "Autocommit from makefile"
	@- git push origin master
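In the recipes above, the leading @ keeps make from echoing each command, and a leading - tells make to ignore a failing command and continue (so a failed git push, for instance, does not abort the run). As a minimal, hypothetical sketch of a test that make test would discover (nose collects test_* functions from test_*.py modules under TEST_PATH), consider:

# test_sanity.py -- hypothetical example, not a file in this commit.
def test_numpy_sum():
    # Trivial check that the numpy dependency from requirements.txt imports.
    import numpy as np
    assert np.array([1, 2, 3]).sum() == 6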
2 changes: 1 addition & 1 deletion docs/1708.05442/References.bib
@@ -1020,7 +1020,7 @@ @article{Ma2012
knowledge from different-distribution training data on feature level may
help. We are optimistic that our transfer learning method can guide
optimal resource allocation strategies, which may reduce software testing
- cost and increase effectiveness of software testing process.
+ cost and increase _effectiveness of software testing process.
{\textcopyright} 2011 Elsevier B.V. All rights reserved.},
author = {Ma, Ying and Luo, Guangchun and Zeng, Xue and Chen, Aiguo},
doi = {10.1016/j.infsof.2011.09.007},
26 changes: 13 additions & 13 deletions docs/1708.05442/manuscript.tex
@@ -126,7 +126,7 @@ \subsubsection*{RQ1: Is within-project planning with XTREE comparatively more ef
In this research question, we explore what happens when XTREE is trained on past data from \textit{within} a project. XTREE uses
historical logs from past releases of a project to recommend changes that might reduce defects in the next version of the software. Since such effects might not actually be causal,
our first research question compares
- the effectiveness of XTREE's recommendations against alternative planning methods. Recent work by Shatnawi~\cite{shatnawi}, Alves et al.~\cite{alves}, and Oliveira et al.~\cite{oliveira} assumes that unusually large measurements in source code metrics point to a larger likelihood of defects. Those
+ the _effectiveness of XTREE's recommendations against alternative planning methods. Recent work by Shatnawi~\cite{shatnawi}, Alves et al.~\cite{alves}, and Oliveira et al.~\cite{oliveira} assumes that unusually large measurements in source code metrics point to a larger likelihood of defects. Those
planners recommend changing all such unusual code, on the assumption that leaving it unchanged leads to defect-prone code. When XTREE is compared to the methods
of Shatnawi, Alves, and Oliveira et al., we find that:

@@ -150,11 +150,11 @@ \subsubsection*{RQ2: Is cross-project planning with BELLTREE effective?}

\subsubsection*{RQ3: Are cross-project plans generated by BELLTREE as effective as within-project plans of XTREE?}

- In this research question, we compare the effectiveness of plans obtained with BELLTREE (cross-project) to plans obtained with XTREE (within-project).
+ In this research question, we compare the _effectiveness of plans obtained with BELLTREE (cross-project) to plans obtained with XTREE (within-project).

This is an important result---when local data is missing, projects can use lessons learned from other projects.

- \result{The effectiveness of BELLTREE is comparable to the effectiveness of XTREE.}
+ \result{The _effectiveness of BELLTREE is comparable to the _effectiveness of XTREE.}

\subsubsection*{RQ4: How many changes do the planners propose?}

@@ -448,9 +448,9 @@ \subsubsection{The \ktest}
\item Finally, on version $\mathcal{P}_k$, we measure the OO metrics for each class in $\mathcal{P}_j$, then we (a) measure the overlap between plans recommended by the planner and the developer's actions; (b) count the number of defects reduced/increased when compared to the previous release as a result of implementing these plans.
\ee

- As the outcome of the {\ktest}, we obtain the number of defects (increased or decreased) and the extent of overlap (from 0\% to 100\%). These two measures enable us to plot the operating characteristic curve for the planners (referred to henceforth as the planner effectiveness curve). The operating characteristic (OC) curve depicts the effectiveness of a planner with respect to its ability to reduce defects. The OC curve plots the overlap of developer changes with the planner's recommendations versus the number of defects reduced. A sample curve for one of our datasets is shown in \fig{report_sample}.
+ As the outcome of the {\ktest}, we obtain the number of defects (increased or decreased) and the extent of overlap (from 0\% to 100\%). These two measures enable us to plot the operating characteristic curve for the planners (referred to henceforth as the planner _effectiveness curve). The operating characteristic (OC) curve depicts the _effectiveness of a planner with respect to its ability to reduce defects. The OC curve plots the overlap of developer changes with the planner's recommendations versus the number of defects reduced. A sample curve for one of our datasets is shown in \fig{report_sample}.

- For each of the datasets with versions $i, j, k$ we (1)~train the planner on version $i$; (2) deploy the planner to recommend plans for version $j$; and (3) validate those plans on version $k$. Following this, we plot the ``planner effectiveness curve''. Finally, as in \fig{report_sample}, we compute the area under the planner effectiveness curve (AUPEC) using Simpson's rule~\cite{Burden:1988}.
+ For each of the datasets with versions $i, j, k$ we (1)~train the planner on version $i$; (2) deploy the planner to recommend plans for version $j$; and (3) validate those plans on version $k$. Following this, we plot the ``planner _effectiveness curve''. Finally, as in \fig{report_sample}, we compute the area under the planner _effectiveness curve (AUPEC) using Simpson's rule~\cite{Burden:1988}.
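As a rough sketch of how AUPEC might be computed (the overlap and defect values below are made up for illustration; scipy is already listed in requirements.txt):

# aupec_sketch.py -- illustrative only, not code from this commit.
import numpy as np
from scipy.integrate import simpson  # Simpson's rule; named simps in older SciPy releases

# Hypothetical k-test output: percent overlap between developer actions and
# planner recommendations (x-axis), and defects reduced at that overlap (y-axis).
overlap = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
defects_reduced = np.array([0.0, 3.0, 9.0, 14.0, 21.0])

# Area under the planner effectiveness curve, via Simpson's rule.
aupec = simpson(defects_reduced, x=overlap)
print(f"AUPEC = {aupec:.1f}")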


% In addition to this, we count the frequency with which a certain code metric is changed. This is expressed as percentage change, measured using the following equation:
@@ -471,13 +471,13 @@ \section{Experimental Results}
\subsection*{{\bf RQ1: Is within-project planning with XTREE comparatively
more effective?}}

- We answer this question in two parts: (a) First, we assess the effectiveness of XTREE (using the Area Under the Planner Effectiveness Curve); (b) Next, we compare XTREE with other threshold-based planners. In each case, we split the available data into training, testing, and validation. That is, given versions $v_1, v_2, ..., v_K$, we
+ We answer this question in two parts: (a) First, we assess the _effectiveness of XTREE (using the Area Under the Planner Effectiveness Curve); (b) Next, we compare XTREE with other threshold-based planners. In each case, we split the available data into training, testing, and validation. That is, given versions $v_1, v_2, ..., v_K$, we
{\em train} the planners on version $v_1$; then
{\em generate plans} using the planners for version $v_2$;
- then {\em validate} the effectiveness of those plans on $v_3$ using the \ktest.
+ then {\em validate} the _effectiveness of those plans on $v_3$ using the \ktest.
Then, we repeat the process by training on $v_2$, testing on $v_3$, and validating on version $v_4$, and so on.
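Concretely, this sliding window over releases can be sketched as follows (train_planner and k_test are hypothetical stand-ins, not functions from this repository):

# sliding_window.py -- illustrative sketch of the {train, test, validation} split.
def train_planner(release):
    # Hypothetical stand-in for building XTREE from version v_i.
    return lambda data: ["plan for " + data]

def k_test(plans, release):
    # Hypothetical stand-in for the k-test: returns (overlap %, defect delta).
    return 50.0, -3

versions = ["v1", "v2", "v3", "v4"]  # chronologically ordered releases

# Train on v_i, generate plans for v_(i+1), validate those plans on v_(i+2).
for train, test, validate in zip(versions, versions[1:], versions[2:]):
    planner = train_planner(train)
    plans = planner(test)
    overlap, defect_delta = k_test(plans, validate)
    print(train, test, validate, overlap, defect_delta)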

- For each of these $\{train, test, validation\}$ sets, we generate performance statistics as per \fig{report_sample}; i.e., plot the planner effectiveness curve to measure the number of defects reduced (and increased) as a function of the extent of overlap. Then, we measure the Area-Under the Planner Effectiveness Curve (AUPEC).
+ For each of these $\{train, test, validation\}$ sets, we generate performance statistics as per \fig{report_sample}; i.e., plot the planner _effectiveness curve to measure the number of defects reduced (and increased) as a function of the extent of overlap. Then, we measure the Area-Under the Planner Effectiveness Curve (AUPEC).

\fig{results}\protect\subref{subfig:wp} shows the results of planning with XTREE (see the column labeled XTREE). The column consists of two parts (labeled $\bigtriangledown$ and $\bigtriangleup$) where:
\begin{itemize}[leftmargin=-1pt]
@@ -495,17 +495,17 @@ \subsection*{{\bf RQ1: Is within-project planning with XTREE comparatively
\input{deltas.tex}
\subsection*{{\bf RQ2: Is cross-project planning with BELLTREE effective?}}

- In the previous research question, we construct XTREE using historical logs of previous releases of a project. However, when such logs are not available, we may seek to generate plans using data from across software projects. To do this, we offer BELLTREE, a planner that makes use of the \textit{Bellwether Effect} in conjunction with XTREE to perform cross-project planning. In this research question, we assess the effectiveness of BELLTREE. For details on the construction of BELLTREE, see \tion{CPXTREE}.
+ In the previous research question, we construct XTREE using historical logs of previous releases of a project. However, when such logs are not available, we may seek to generate plans using data from across software projects. To do this, we offer BELLTREE, a planner that makes use of the \textit{Bellwether Effect} in conjunction with XTREE to perform cross-project planning. In this research question, we assess the _effectiveness of BELLTREE. For details on the construction of BELLTREE, see \tion{CPXTREE}.

Our experimental methodology for answering this research question is as follows:
\be
\item We first discover the bellwether data from the available projects. For the projects studied here, we discovered that $Lucene$ was the bellwether (in accordance with our previous findings~\cite{krishna16, krishna17b}).
\item Next, we construct XTREE, but we do this using the bellwether dataset. We call this variant BELLTREE.
\item For each of the other projects, we use BELLTREE constructed above to recommend plans.
- \item Then, we use the subsequent releases of the above projects to validate the effectiveness of those plans.
+ \item Then, we use the subsequent releases of the above projects to validate the _effectiveness of those plans.
\ee
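This workflow can be sketched as follows (find_bellwether and train_belltree are hypothetical helpers that only name the steps above, not functions from this repository):

# belltree_sketch.py -- illustrative only.
projects = {"lucene": "lucene.csv", "ant": "ant.csv", "poi": "poi.csv"}

def find_bellwether(projects):
    # Step 1: the bellwether is discovered empirically; hard-coded here
    # to Lucene, the bellwether reported above.
    return "lucene"

def train_belltree(bellwether_data):
    # Step 2: BELLTREE is XTREE trained on the bellwether dataset.
    return lambda target: ["plans for " + target]

bell = find_bellwether(projects)
belltree = train_belltree(projects[bell])
for name, data in projects.items():
    if name != bell:
        plans = belltree(data)  # Step 3: recommend plans for the other projects
        print(name, plans)      # Step 4 would validate these on the next release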

- Finally, we generate performance statistics as per \fig{report_sample}; i.e., plot the planner effectiveness curve to measure the number of defects reduced (and increased) as a function of the extent of overlap. Then, we measure the Area-Under the Planner Effectiveness Curve (AUPEC). Figure~\ref{fig:results}\protect\subref{subfig:cp} shows the AUPEC scores that were the outcome of cross-project planning with BELLTREE (see the column labeled BELLTREE). Similar to our findings in RQ1, we note that, in 15 out of 18 cases, the AUPEC of defects reduced is much larger than the AUPEC of defects increased. This indicates that cross-project planning with BELLTREE is also very effective in generating plans that reduce the number of defects. Further, when we train each of the other planners with the bellwether dataset and compare them with BELLTREE, we note that, as with RQ1, BELLTREE outperforms the other threshold-based planners for cross-project planning.
+ Finally, we generate performance statistics as per \fig{report_sample}; i.e., plot the planner _effectiveness curve to measure the number of defects reduced (and increased) as a function of the extent of overlap. Then, we measure the Area-Under the Planner Effectiveness Curve (AUPEC). Figure~\ref{fig:results}\protect\subref{subfig:cp} shows the AUPEC scores that were the outcome of cross-project planning with BELLTREE (see the column labeled BELLTREE). Similar to our findings in RQ1, we note that, in 15 out of 18 cases, the AUPEC of defects reduced is much larger than the AUPEC of defects increased. This indicates that cross-project planning with BELLTREE is also very effective in generating plans that reduce the number of defects. Further, when we train each of the other planners with the bellwether dataset and compare them with BELLTREE, we note that, as with RQ1, BELLTREE outperforms the other threshold-based planners for cross-project planning.

\result{BELLTREE helps reduce a large number of defects in 15 out of 18 datasets (9 out of 10 projects). Plans generated by BELLTREE were also significantly superior to those generated by other planners in all 10 projects.}

@@ -515,7 +515,7 @@ \subsection*{{\bf RQ3: Are cross-project plans generated by BELLTREE as effectiv

\fig{results} tabulates the AUPEC scores for the comparison between the use of within-project XTREE (see \fig{results}\protect\subref{subfig:wp}) and cross-project BELLTREE (see \fig{results}\protect\subref{subfig:cp}) for reducing the number of defects. We note that, out of 18 datasets from 10 projects, the AUPEC scores are quite comparable. In 5 cases XTREE performs better than BELLTREE, in 7 cases BELLTREE outperforms XTREE, and in 6 cases the performance is the same. In summary, we make the following observations:

- \result{The effectiveness of BELLTREE and XTREE is similar. If within-project data is available, we recommend using XTREE. If not, BELLTREE is a viable alternative.}
+ \result{The _effectiveness of BELLTREE and XTREE is similar. If within-project data is available, we recommend using XTREE. If not, BELLTREE is a viable alternative.}

\subsection*{RQ4: How many changes do the planners propose?}

@@ -546,7 +546,7 @@ \section{Threats to Validity}
For example, data sets in this study come from several sources, but they were all supplied by individuals. Thus, we have documented our selection procedure for data and suggest that researchers
try a broader range of data.
\item[] \textit{Evaluation Bias}:
- This paper uses one measure for the quality of the planners: AUPEC (see~\fig{report_sample}). Other quality measures may be used to quantify the effectiveness of a planner. A comprehensive analysis using these measures may be performed with our replication package. Additionally, other measures can easily be added to extend this replication package.
+ This paper uses one measure for the quality of the planners: AUPEC (see~\fig{report_sample}). Other quality measures may be used to quantify the _effectiveness of a planner. A comprehensive analysis using these measures may be performed with our replication package. Additionally, other measures can easily be added to extend this replication package.

\item[] \textit{Order Bias}:
Theoretically, with prediction tasks involving learners such as random forests, there is invariably some degree of randomness that is introduced by the algorithm. To mitigate these biases, researchers, including ourselves in our other work, report the central tendency and variations over those runs with some statistical test. However, in this case, all our approaches are \textit{deterministic}. Hence, there is no need to repeat the experiments or run statistical tests. Thus, we conclude that while order bias is theoretically a problem,
5 changes: 5 additions & 0 deletions requirements.txt
@@ -0,0 +1,5 @@
pandas
sklearn
tabulate
scipy
numpy
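None of these entries pins a version; pip install -r requirements.txt resolves them, and note that sklearn is the (since deprecated) PyPI alias that installs scikit-learn.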
138 changes: 0 additions & 138 deletions src/RQ1.py

This file was deleted.

