Project-K23a

Team Members

Katsouli Ioanna (1115201400067)
Konomi Marina (1115201700054)
Leonidas-Panagiotis Papadakos (1115201700117)

A hashtable is used because it splits the data into smaller lists (one per bucket), which makes searching faster.
Search, insertion and deletion take O(1) time on average (and O(n) in the worst case).
By comparison, a singly linked list has O(1) insertion and deletion at a known position, but O(n) search.
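
The following is a minimal sketch of a hashtable with separate chaining, to illustrate why lookups only touch one short list. All names here (struct node, ht_create(), ht_insert(), ht_search(), ht_destroy()) and the djb2 hash are assumptions for illustration; they are not taken from the project's hashtable.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical types and names; not the project's actual implementation. */
struct node {
    char *key;
    struct node *next;
};

struct hashtable {
    size_t buckets;
    struct node **table; /* one singly linked list per bucket */
};

/* djb2 string hash (an assumed choice), reduced modulo the bucket count. */
static size_t hash(const char *key, size_t buckets) {
    unsigned long h = 5381;
    for (; *key; key++)
        h = h * 33 + (unsigned char)*key;
    return (size_t)(h % buckets);
}

struct hashtable *ht_create(size_t buckets) {
    struct hashtable *ht = malloc(sizeof(*ht));
    if (!ht)
        return NULL;
    ht->table = calloc(buckets, sizeof(*ht->table));
    if (!ht->table) {
        free(ht);
        return NULL;
    }
    ht->buckets = buckets;
    return ht;
}

/* Inserts at the head of the bucket's list: O(1). Returns 0 on success. */
int ht_insert(struct hashtable *ht, const char *key) {
    size_t i = hash(key, ht->buckets);
    struct node *n = malloc(sizeof(*n));
    if (!n)
        return -1;
    n->key = malloc(strlen(key) + 1);
    if (!n->key) {
        free(n);
        return -1;
    }
    strcpy(n->key, key);
    n->next = ht->table[i];
    ht->table[i] = n;
    return 0;
}

/* Walks only the list of one bucket. Returns 1 if found, 0 otherwise. */
int ht_search(const struct hashtable *ht, const char *key) {
    const struct node *n = ht->table[hash(key, ht->buckets)];
    for (; n; n = n->next)
        if (strcmp(n->key, key) == 0)
            return 1;
    return 0;
}

void ht_destroy(struct hashtable *ht) {
    for (size_t i = 0; i < ht->buckets; i++) {
        struct node *n = ht->table[i];
        while (n) {
            struct node *next = n->next;
            free(n->key);
            free(n);
            n = next;
        }
    }
    free(ht->table);
    free(ht);
}
```

With a prime bucket count such as the default 10007, keys tend to spread evenly, so each list stays short and the average cost stays close to O(1).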

Build

Run make to build the executables and unit tests.
To run the test executables (found in tests/), run make check.

Usage

./specs [-e buckets (default: 10007)] -x dataset_x -w dataset_w [-o FILE (default: stdout)]

Notes/Assumptions

The unit testing library used is acutest.
It is assumed that within the directory of Dataset X, there will be only site folders (e.g. www.ebay.com) and .json files.
The way we print the cliques as distinct pairs of specs is outlined in print_pairs(), within hashtable.c. This was discussed in this Piazza question and verbally, in the weekly Zoom meetings.
The program outputs to stdout by default, which is also suggested in the aforementioned Piazza discussion. Use -o somefile.csv or shell redirection to send the data elsewhere.
Stopword list: english.txt from Alir3z4/stop-words.
The log10() function is used, because the numeric results in the exercise explanation point to it rather than log() or log2().
It is assumed that the sets (training, testing, etc.) are at least as big as BATCH_SIZE.
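
To illustrate the note on printing cliques as distinct pairs, the sketch below enumerates every unordered pair of a clique exactly once. It is not the project's print_pairs(); the function name print_clique_pairs(), the array-of-strings representation of a clique and the comma-separated output format are all assumptions.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes every unordered pair (i < j) of the n spec ids
 * to out, one "left,right" pair per line, and returns the number of pairs.
 */
size_t print_clique_pairs(FILE *out, const char *const ids[], size_t n) {
    size_t pairs = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            fprintf(out, "%s,%s\n", ids[i], ids[j]);
            pairs++;
        }
    }
    return pairs;
}
```

Starting the inner loop at i + 1 is what keeps the pairs distinct: a clique of n specs yields n(n-1)/2 lines, with no pair repeated in reverse order and no spec paired with itself.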

Validation

At the beginning of every round, validation() creates new clique structures for the sake of the algorithm and destroys them at the end of the round.
These cliques are built according to the predictions of the model.
The idea is: we insert a relation between specs if and only if it doesn't cause a conflict in these structures. When a relation does cause a conflict, we keep a record of the specs that caused it, the type of the relation and the prediction of the model (i.e. the probability). These records are kept in a list.
In every round, once everything has been inserted into the structures, we resolve the conflicts by traversing the list of conflicts. For each conflict, we call the function that corresponds to its type.
Whether we keep the relation that caused the conflict is decided according to the probabilities and the number of relations, which should be less than or equal to 5 (this number was chosen empirically, weighing the delay of the program against the reduction of conflicts).
Every time we change a relation, we update the weights.
validation() continues resolving conflicts until the number of conflicts has dropped by 45%.
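
The bookkeeping described above could look roughly like the following sketch. The struct conflict layout, the function names and the 0/1 encoding of the relation type are assumptions, and should_stop() encodes only one possible reading of "drops by 45%" (at most 55% of the initial conflicts remain); the actual resolution functions are not reproduced here.

```c
#include <stdlib.h>

/* Hypothetical conflict record: the two specs, the relation and the prediction. */
struct conflict {
    const char *spec_a;
    const char *spec_b;
    int relation;      /* assumed encoding: 1 = same product, 0 = different */
    double prediction; /* model output for this pair */
    struct conflict *next;
};

/* Prepends a record to the list in O(1). Returns 0 on success, -1 on failure. */
int conflict_push(struct conflict **head, const char *a, const char *b,
                  int relation, double prediction) {
    struct conflict *c = malloc(sizeof(*c));
    if (!c)
        return -1;
    c->spec_a = a;
    c->spec_b = b;
    c->relation = relation;
    c->prediction = prediction;
    c->next = *head;
    *head = c;
    return 0;
}

size_t conflict_count(const struct conflict *head) {
    size_t n = 0;
    for (; head; head = head->next)
        n++;
    return n;
}

void conflict_free(struct conflict *head) {
    while (head) {
        struct conflict *next = head->next;
        free(head);
        head = next;
    }
}

/* Assumed stop rule: true once at most 55% of the initial conflicts remain. */
int should_stop(size_t initial, size_t current) {
    return current * 100 <= initial * 55;
}
```

Integer arithmetic is used in should_stop() to avoid floating-point rounding at the boundary, e.g. 55 remaining conflicts out of 100 satisfies the rule while 56 does not.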
