Commandline Parameters

PaulBredl edited this page Aug 30, 2021 · 18 revisions

Commandline parameters for DiffSearch

Introduction

To simplify execution, define the following shell function:

dfs() { mvn exec:java -Dexec.mainClass=research.diffsearch.main.App -Dexec.args="$1" ; }

Then, DiffSearch can be run using

dfs "args"

Setup

To create an index, perform the following steps:

  1. Clone all repositories using dfs "-clone <path>". The path must point to a text file containing the links to the GitHub repositories (one link per line).
  2. Extract and parse the code changes of the cloned repositories using dfs "-d -lang <language>", where <language> is java, javascript, or python.
  3. Extract feature vectors and create an index using dfs "-fe -lang <language>". Additional parameters that modify how features are extracted are listed under "Other parameters".
  4. DiffSearch can now accept queries.
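
The setup steps above can be chained in a small shell script. The following is a minimal sketch, not part of DiffSearch itself: the helper name build_index, the file name repos.txt, and the DRY_RUN switch are assumptions made for illustration. It reuses the dfs wrapper from the introduction and assumes Maven is run from the DiffSearch project directory.

```shell
# Wrapper from the introduction. DRY_RUN is a hypothetical switch added here:
# when set to 1, the call is only printed instead of being passed to Maven.
dfs() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'dfs %s\n' "$1"
  else
    mvn exec:java -Dexec.mainClass=research.diffsearch.main.App -Dexec.args="$1"
  fi
}

# Hypothetical helper: runs setup steps 1 to 3 in order and stops at the first failure.
#   $1 = text file with one GitHub repository link per line
#   $2 = target language (java, javascript, or python)
build_index() {
  local repo_list="$1" lang="$2"
  dfs "-clone $repo_list" &&
  dfs "-d -lang $lang" &&
  dfs "-fe -lang $lang"
}
```

For example, `DRY_RUN=1 build_index repos.txt java` prints the three dfs calls without running them; dropping DRY_RUN executes them through Maven.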

DiffSearch Modes

| Parameter | Usage | Description |
| --- | --- | --- |
| n | -n | Runs DiffSearch in console mode. Queries are entered in the console, where the results are also shown. |
| g | -g | Runs DiffSearch as the web server for the DiffSearch UI. |
| w | -w | Runs DiffSearch as the web server for the old DiffSearch UI (deprecated). |
| q | -q query | Performs a search for the given query. |
| b | -b inputpath outputpath | Processes all queries in the text file at inputpath and saves the results to the file at outputpath. |
| fe | -fe | Feature extraction mode: creates feature vectors from code changes and indexes them. |
| clone | -clone repository-list | Clones the listed git repositories. The parameter must be a path to a text file with the links to the GitHub repositories. |
| d | -d | Extracts and parses the code changes of the cloned git repositories. |
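
As an illustration of how a mode and its arguments end up in the single string passed to dfs, here is a small sketch for batch mode. The helper name batch_args and the file names are hypothetical; whether further parameters (such as -lang) have to be combined with -b is not stated here, so the sketch only covers the two paths from the table.

```shell
# Hypothetical helper: builds the argument string for batch mode
# (-b inputpath outputpath) after checking that the query file is readable.
# Paths containing spaces are not handled; how DiffSearch splits the
# argument string is not covered here.
batch_args() {
  local input="$1" output="$2"
  if [ ! -r "$input" ]; then
    echo "cannot read query file: $input" >&2
    return 1
  fi
  printf '%s %s %s' -b "$input" "$output"
}
```

With the dfs wrapper from the introduction defined, a call could look like `dfs "$(batch_args queries.txt results.txt)"`.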

Other parameters

| Parameter | Usage | Description |
| --- | --- | --- |
| pyc | -pyc pycommand | Sets the command that starts the Python environment. On Windows, the Windows Subsystem for Linux (WSL) is required; with WSL, this parameter should usually be "wsl python3". |
| lang | -lang (java\|javascript\|python) | Sets the target programming language. |
| l | -l | Saves the server log to a log file (WIP). |
| oj | -oj | Runs only the Java server and does not invoke Python. Only relevant in "-fe", "-q", "-n", "-w", "-g", and "-b" mode. |
| p | -p port | Sets the port of the web interface. |
| r | -r | Enables recall measurement. This takes a lot of time. |
| silent | --silent | If given, DiffSearch omits large console outputs, such as the results of a query. |
| py_port | -py_port port | Sets the port of the Python server. |
| k | -k number | Sets the value of k, the number of candidate changes. |
| vl | -vl number | Size of each partition of the feature vectors. |
| cb | -cb number | Number of count bits (default: 1). |
| t | -t number | Number of threads to use. |
| extractors | -extractors (name(:length)?;)+ | Defines the extractors DiffSearch uses, e.g. -extractors parentchild:2000;triangle:2000. Valid extractors are node, triangle, parentchild, sibling, rulecount, and editscript. |
| mt | -mt seconds | Maximum matching time for a single code change. |
| extract-query-placeholders | --extract-query-placeholders | Extracts query placeholders such as EXPR. Default is false. |
| tfidf | -tfidf | Uses TF-IDF weights in the feature vectors. |
| noquerymultiplication | -noquerymultiplication | Query vectors are not multiplied. |
| nondividedextraction | -nondividedextraction | Feature extraction is not divided into the old and the new part. |
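
Because the -extractors value contains semicolons, it has to stay inside the quoted argument string; otherwise the shell treats each semicolon as a command separator. The sketch below checks such a value before it is passed on. The helper name valid_extractors is hypothetical, and two points are assumptions: that length is a plain non-negative integer (as in the example 2000), and that the final semicolon is optional (the example in the table omits it).

```shell
# Hypothetical helper: returns 0 if the given -extractors value uses only the
# extractor names listed in the table, each optionally followed by ":<digits>".
valid_extractors() {
  local spec="${1%;}"          # tolerate one trailing semicolon
  local entry name length
  [ -n "$spec" ] || return 1   # an empty specification is rejected
  local IFS=';'
  for entry in $spec; do       # split the specification at semicolons
    name="${entry%%:*}"
    case "$name" in
      node|triangle|parentchild|sibling|rulecount|editscript) ;;
      *) return 1 ;;
    esac
    if [ "$entry" != "$name" ]; then
      length="${entry#*:}"
      case "$length" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric length
      esac
    fi
  done
  return 0
}
```

A guarded call could then look like `valid_extractors "$ex" && dfs "-fe -lang java -extractors $ex"`, with ex set to, for instance, parentchild:2000;triangle:2000 and the dfs wrapper from the introduction defined.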