-
Notifications
You must be signed in to change notification settings - Fork 1
Probabilistic Generalized Frequent Itemset Mining (PGFIM) in Uncertain Databases
License
erichpeterson/pgfim
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
IMPORTANT: Two pieces of software must be installed to compile PGFIM: 1. The high-precision static library ARPREC, which one can download and install from the following website: http://crd-legacy.lbl.gov/~dhbailey/mpdist/ 2. The mathematics library Boost, which one can download and install from the following website: http://www.boost.org/ COMPILING: To compile on a Linux-based machine, modify the Makefile file, to your environment and locations of the two software libraries installed above. One can find where the locations are set in the Makefile, next to the CXX_CFLAGS and CXX_LDFLAGS variables. Once your environment is set correctly, simply type 'make' at the command-line in the directory of the source code, and the executable PGFIM should be created. RUNNING: ./PGFIM -tau <TAU> -inputDB <INPUT_DB_FILENAME> -inputXML <INPUT_XML_FILENAME> [optional_args] Sample Execution: ./PGFIM -tau 0.9 -minsup 3000 -inputDB input.data -inputXML taxonomy.xml Command Line Options: -minsup minimum support threshold: [0.0-Size of DB] default: 1 -tau Tau frequent probability threshold: (0.0-1.0] (Required) -inputDB Input DB file name (Required) -inputXML Input XML taxonomy (Required) -calc Frequentness calculation method: {dynamic} default: dynamic -output Output file name -exec Execution stats file name -print Print found itemsets: {0 | 1} (0 = false, 1 = true) default 0 DATABASE FILE & TAXONOMY FORMAT: Attributes in the database file are to be space-delimited, and transactions/instances should be new-line delimited. Currently, only the natural numbers can be used as attribute names / items. Following each attribute name is a colon and then the probability of that attribute occurring. IMPORTANT: The last item and probability of each transaction must be followed by a space. Example database with 3 transactions: 1:0.9 2:0.23 3:0.6 2:0.2 4:0.8 1:0.4 2:0.54 An XML file is used to represent the taxonomy to be used. See a sample taxonomy in the datasets directory, as well as, example databases. CITATION: Erich A. Peterson and Peiyi Tang. Mining Probabilistic Generalized Frequent Itemsets in Uncertain Databases. In Proceedings of the 51tst ACM Southeast Conference (ACMSE), Savannah, GA, USA, April 2013.
About
Probabilistic Generalized Frequent Itemset Mining (PGFIM) in Uncertain Databases
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published