DGA Giraph Options
jonathanostinowsky edited this page Jun 16, 2014 · 7 revisions
Each DGA analytic has its own properties, which are set either with -ca on the command line or as "custom" properties in the XML configuration file. Most InputFormats provided by DGA have configurable settings as well.
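As an illustration of the -ca route, the sketch below turns a dictionary of properties into a list of command-line tokens. It assumes that each property is passed as a separate `-ca key=value` pair, as Giraph's own runner accepts; the helper name `build_custom_args` is hypothetical and is not part of DGA.

```python
def build_custom_args(properties):
    """Turn a mapping of property names to values into repeated -ca key=value tokens.

    Assumption: the runner accepts one "-ca key=value" pair per property.
    """
    args = []
    for name, value in properties.items():
        # Booleans are written in lowercase to match the true/false used on this page.
        if isinstance(value, bool):
            value = "true" if value else "false"
        args.extend(["-ca", f"{name}={value}"])
    return args


if __name__ == "__main__":
    example = {"damping.factor": 0.85, "io.edge.reverse.duplicator": True}
    print(" ".join(build_custom_args(example)))
```

The resulting tokens would be appended to whatever command launches the analytic. Values containing characters that Apache Commons CLI ignores may be easier to set in the XML configuration file.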
The expected input format is always an implementation of com.soteradefense.dga.io.formats.DGAAbstractEdgeInputFormat. All of the configuration settings below are defined in that abstract class, not in its child classes.
Property | Required | Description |
---|---|---|
simple.edge.delimiter | N | The string delimiter to use when tokenizing each row of data (default is \t). Note: many characters are ignored by the Apache Commons CLI library. |
simple.edge.value.default | N | The default InputFormat expects 2 or 3 columns per row; if there is no 3rd column, this value is used as the edge weight for that row. For Text the default is an empty String; for Long it is 1. |
io.edge.reverse.duplicator | N | Many datasets are undirected, while many analytics require directed graphs. The default is false; specifying true explicitly converts the provided graph into a directed graph. |
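The way these three settings interact can be sketched in a few lines. This is an illustration of the behaviour described in the table, not DGA's implementation: the function name `parse_edge_line` is hypothetical, and the sketch assumes that the reverse duplicator emits one extra edge in the opposite direction carrying the same value.

```python
def parse_edge_line(line, delimiter="\t", default_value="", reverse_duplicator=False):
    """Return a list of (source, target, value) tuples for one input row."""
    columns = line.rstrip("\n").split(delimiter)
    if len(columns) not in (2, 3):
        raise ValueError(f"expected 2 or 3 columns, got {len(columns)}")
    source, target = columns[0], columns[1]
    # Fall back to the configured default when the row has no 3rd column.
    value = columns[2] if len(columns) == 3 else default_value
    edges = [(source, target, value)]
    if reverse_duplicator:
        # Assumption: the reverse edge reuses the value of the original edge.
        edges.append((target, source, value))
    return edges


if __name__ == "__main__":
    print(parse_edge_line("a\tb"))
    print(parse_edge_line("a,b,5", delimiter=",", reverse_duplicator=True))
```

With the defaults, a two-column row yields a single edge with an empty value, which mirrors the Text case in the table.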
Property | Required | Description |
---|---|---|
edge.delimiter | N | The string delimiter to use when tokenizing each row of data (default is ,). Note: many characters are ignored by the Apache Commons CLI library. |
write.vertex.value | N | The default value is false. |
write.edge.value | N | The default value is false. |
Property | Required | Description |
---|---|---|
betweenness.output.dir | Y | Sets the betweenness set output directory. |
betweenness.shortest.path.phases | N | Sets the number of shortest path phases that the algorithm should run through. The default is 1. |
betweenness.set.stability | N | Sets the stability cutoff point. Defaults to 0. |
betweenness.set.stability.counter | N | Counter for the stability cutoff point. |
betweenness.set.maxSize | Y | Sets the maximum number of nodes in a betweenness set. |
pivot.batch.size | Y | The fraction of nodes to select as pivots. Must be a decimal between 0.0 and 1.0. |
pivot.batch.size.initial | N | The fraction of nodes to select as pivots initially. Must be a decimal between 0.0 and 1.0. |
pivot.batch.random.seed | N | Seed the random number generator for pivot selection. |
vertex.count | Y | Sets the total number of vertices to perform the algorithm on. |
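Because four of these properties are required and two are range-restricted, it can help to check a configuration before submitting a job. The following is a minimal sketch of such a check; the function `check_betweenness_properties` is hypothetical, and treating the 0.0 and 1.0 bounds as inclusive is an assumption.

```python
REQUIRED = (
    "betweenness.output.dir",
    "betweenness.set.maxSize",
    "pivot.batch.size",
    "vertex.count",
)
FRACTIONS = ("pivot.batch.size", "pivot.batch.size.initial")


def check_betweenness_properties(props):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    for name in REQUIRED:
        if name not in props:
            problems.append(f"missing required property: {name}")
    for name in FRACTIONS:
        if name not in props:
            continue
        try:
            fraction = float(props[name])
        except (TypeError, ValueError):
            problems.append(f"{name} is not a decimal: {props[name]!r}")
            continue
        # Assumption: both bounds are inclusive.
        if not 0.0 <= fraction <= 1.0:
            problems.append(f"{name} must be between 0.0 and 1.0, got {fraction}")
    return problems


if __name__ == "__main__":
    print(check_betweenness_properties({"pivot.batch.size": "1.5"}))
```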
There are no properties to specify for leaf compression.
Property | Required | Description |
---|---|---|
minimum.progress | N | (Default: 0) The minimum delta X required to be considered progress, where X is the number of nodes that have changed their community on a particular pass. Delta X is then the difference in number of nodes that changed communities on the current pass compared to the previous pass. Using the default of 0 means that any delta is considered progress. |
progress.tries | N | (Default: 1) Number of times the minimum.progress setting is not met before exiting from the current level and compressing the graph. The default of 1 means that the algorithm exits the first time minimum.progress is not met. |
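The interplay of minimum.progress and progress.tries can be sketched as a small stopping rule. This is one reading of the descriptions above, not the analytic's actual code: it assumes delta X is the previous pass's count minus the current pass's count, that a pass makes progress when delta X is at least minimum.progress, and that misses are counted cumulatively rather than consecutively. The function name is hypothetical.

```python
def passes_before_compression(changed_per_pass, minimum_progress=0, progress_tries=1):
    """Return how many passes run at the current level before the graph is compressed.

    changed_per_pass[i] is the number of nodes that changed community on pass i.
    """
    misses = 0
    for index in range(1, len(changed_per_pass)):
        # Assumption: delta X = previous count minus current count.
        delta = changed_per_pass[index - 1] - changed_per_pass[index]
        if delta < minimum_progress:
            misses += 1
            if misses >= progress_tries:
                return index + 1  # number of passes completed, counting from 1
    return len(changed_per_pass)


if __name__ == "__main__":
    history = [100, 60, 70, 10]
    print(passes_before_compression(history))
    print(passes_before_compression(history, progress_tries=2))
```

Under these assumptions, raising progress.tries lets the level survive an occasional pass in which more nodes move than on the pass before.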
Property | Required | Description |
---|---|---|
damping.factor | N | (Default: 0.85f) The PageRank theory holds that an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is a damping factor d. Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85. |
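To show where the damping factor d enters the computation, here is a minimal sketch of the textbook PageRank update, rank(v) = (1 - d) / N + d * sum(rank(u) / outdegree(u)) over the in-neighbours u of v. It is not DGA's Giraph implementation: the function name and the fixed iteration count are illustrative, and the sketch assumes that every link target also appears as a key of the input dictionary.

```python
def pagerank(out_links, damping=0.85, iterations=30):
    """Iterate the textbook PageRank update on a dict of node -> list of link targets."""
    nodes = list(out_links)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        incoming = {node: 0.0 for node in nodes}
        for node, targets in out_links.items():
            if targets:
                # Each node splits its current rank evenly across its out-links.
                share = ranks[node] / len(targets)
                for target in targets:
                    incoming[target] += share
        # With probability `damping` the surfer follows a link; otherwise it jumps at random.
        ranks = {
            node: (1.0 - damping) / n + damping * incoming[node]
            for node in nodes
        }
    return ranks


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for node, rank in sorted(pagerank(graph).items()):
        print(node, round(rank, 4))
```

A lower damping factor pulls all ranks toward the uniform value 1/N, while a value closer to 1 lets the link structure dominate.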
There are no properties to specify for weakly connected components.