An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines

We present novel evaluation metrics targeted at fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate the query planners of five different cost-based federated SPARQL query engines using LargeRDFBench queries.

Reproducing Results

Please follow the steps to reproduce our results.

  • First, you need to set up LargeRDFBench. Complete details can be found on the LargeRDFBench home page.
  • Download the runnable jar files of the selected cost-based federation engines from here. Odyssey is the exception: it involves many dependencies, and its classes are run using the scripts provided in the scripts folder of the project zip file. Detailed instructions for running the engine are provided on the Odyssey home page; the code updated with our metric is available here.

1. Generating Results From Jars

After the above setup, the next step is to generate the summaries (not needed for engines using VoID descriptions, as these are already provided along with the source code) and then run each engine using the jar files we provide. Running queries on the engines produces similarity files, which contain the actual and estimated cardinalities as well as the overall similarity values of the query plans. You can run the jar files from the CLI, replacing the arguments as shown in the following commands:

### CostFed: Generating summaries ###

java -jar costfed-summaries.jar [path-of-(summary.n3)-file] [path-of-endpoints-text-file-folder]
example:
java -jar costfed-summaries.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/costfed/summaries/summary.n3 /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/endpoints

Note: the endpoints file should contain the URLs of all endpoints.
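
For illustration, assuming one URL per line, an endpoints file might look like this (hypothetical local endpoints; use the URLs where your LargeRDFBench datasets are actually hosted):

http://localhost:8890/sparql
http://localhost:8891/sparql
http://localhost:8892/sparql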

### CostFed: Executing Queries and Generating plan similarity and cardinality values ###

java -jar costfed-core.jar [path-of-(costfed.props)-file] [path-of-query-results-folder] [path-of-queries-folder] [path-of-endpoints-file-folder]  [path-of-similarity-results-folder]
example:
java -jar costfed-core.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/costfed/costfed.props /home/MuhammadSaleem/umair/evaluation/experiments/query_results /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries /home/MuhammadSaleem/umair/evaluation/experiments/endpoints  /home/MuhammadSaleem/umair/evaluation/experiments/queries/results

Note: an example costfed.props file is included in the source code folder. The Relative_Error variable must be set to "true" in the costfed.props file. More details about the properties and index files are given on the project [page](https://github.com/dice-group/CostFed).
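
For illustration, the relevant entry would look like this (a sketch assuming standard Java properties syntax; verify the exact key against the shipped example file):

Relative_Error=true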

### SemaGrow: Generating summaries ###

java -jar semagrow-summary-1.4.1.jar [path-of-endpoints-file-folder] [path-of-SemaGrow-index-file]
example:
java -jar semagrow-summary-1.4.1.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/semagrow/semagrow4.ttl


### SemaGrow: Executing Queries and Generating plan similarity and cardinality values ###

java -jar semagrow-core-1.4.1.jar [path-to-(results.csv)-file] [path-to-queries-file] [path-to-similarity-error-folder] [path-to-(repository-index.ttl)-file] true

example:
java -jar semagrow-core-1.4.1.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/results/results.csv /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/queries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/similarityResults /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/semagrow/repositoryindex.ttl true


### SPLENDID: Executing Queries and Generating plan similarity and cardinality values ###

Note: SPLENDID uses VoID statistics, so no summary-generation step is needed.

java -jar splendid-orignal.jar [path-to-file-(federation-test.properties)] [path-to-splendid-output-file] [path-to-queries-folder] [path-to-similarity-results-file] [true]

example:
java -jar splendid-orignal.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/splendid/eval/federation-test.properties /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/res/splendid-output.txt /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/queries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/similarityResults true



### LHD: Executing Queries and Generating plan similarity and cardinality values ###

Note: LHD uses VoID statistics, so no summary-generation step is needed.

java -jar LHD.jar [path-to-stats-file] [path-to-queries-file] [path-to-similarity-results-folder] [true]

example:
java -jar LHD.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/lhd/stats /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/lhdqueries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/results true


Note that whenever an argument refers to a file, the path must point to that exact file; whenever it refers to a folder, the path must point to the folder containing the respective file(s).
Also note that, for all engines except LHD, the queries folder contains each query in a separate file, whereas LHD expects all queries in a single file. A sample is available [here](queries/lhdqueries.txt).

### Odyssey ###

For Odyssey, first extract the [project](federatedOptimizer.rar), then compile the code in the code folder, and finally run the script (executeQueriesOdyssey.sh) in the scripts folder after replacing some paths in the script file. For complete instructions, refer to the project readme [file](https://github.com/gmontoya/federatedOptimizer/blob/master/README.md) and the [issue page](https://github.com/gmontoya/federatedOptimizer/issues/2) that we posted in order to run the engine successfully.

2. Generating Results From Source Code

The source code is available here. Import each engine as a separate project. The code contains five Java projects -- CostFed, LHD, SemaGrow, splendid-test, and Odyssey -- each of which can be compiled and run separately. The main classes are as follows (the arguments are the same as for the jar files discussed above):

//Execute Queries on SemaGrow from 
package org.semagrow.semagrow.org.aksw.simba.start.semagrow
public class QueryEvaluation 

//Execute Queries on CostFed from 
package org.aksw.simba.start
public class QueryEvaluation 

//Execute Queries on LHD from 
package trunk
public class lhd 

//Execute Queries on SPLENDID from 
package de.uni_koblenz.west.evaluation
public class QueryProcessingEval

To run Odyssey, follow the instructions discussed above.

Reusability

Future SPARQL federation engines can make use of the EvaluationMetric class to generate results for the presented metrics. This class provides methods pertaining to the accuracy of the cardinality estimators of cost/cardinality-based engines. In particular, it contains methods to calculate the relative error, q-error, and cosine similarity error of the individual triple patterns, the joins, and the complete query execution plan generated by the underlying query processing engine. The complete description of the class is given in the Javadoc available here.
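
For orientation, here is a minimal, self-contained Java sketch of these measures using their standard definitions (the actual EvaluationMetric class may use different method names, signatures, and conventions; the Javadoc is authoritative):

```java
import java.util.List;

// Sketch of the cardinality-estimation accuracy measures (standard
// definitions; the repository's EvaluationMetric class may differ).
public class MetricsSketch {

    // Relative error: |estimated - actual| / actual (assumes actual > 0).
    static double relativeError(double actual, double estimated) {
        return Math.abs(estimated - actual) / actual;
    }

    // q-error: max(estimated/actual, actual/estimated), defined for positive values.
    static double qError(double actual, double estimated) {
        return Math.max(estimated / actual, actual / estimated);
    }

    // Cosine similarity between the vectors of actual and estimated
    // cardinalities of a plan (e.g., one entry per triple pattern or join).
    // A similarity error can be derived, e.g., as 1 - similarity; see the
    // paper for the exact definition used.
    static double cosineSimilarity(List<Double> actual, List<Double> estimated) {
        double dot = 0, normA = 0, normE = 0;
        for (int i = 0; i < actual.size(); i++) {
            dot += actual.get(i) * estimated.get(i);
            normA += actual.get(i) * actual.get(i);
            normE += estimated.get(i) * estimated.get(i);
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normE));
    }

    public static void main(String[] args) {
        System.out.println(relativeError(100, 80));                // 0.2
        System.out.println(qError(100, 80));                       // 1.25
        System.out.println(cosineSimilarity(List.of(100.0, 10.0),
                                            List.of(80.0, 20.0))); // ~0.99
    }
}
```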

Loading results into Virtuoso and calculating similarity errors:

The similarity results calculated in our experiments are available here.

After generating the similarity results, we load them into a Virtuoso server and then obtain the required output via SPARQL queries, using the similarity calculation formula discussed in the paper. Our complete evaluation results are here.
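
As a minimal sketch of querying the loaded results (the endpoint URL and the placeholder query are assumptions; the actual queries encode the similarity formula from the paper and depend on the vocabulary used when loading the results), the Virtuoso SPARQL endpoint can be queried with RDF4J as follows:

```java
import org.eclipse.rdf4j.query.BindingSet;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

public class QueryVirtuosoSketch {
    public static void main(String[] args) {
        // Hypothetical local Virtuoso SPARQL endpoint.
        SPARQLRepository repo = new SPARQLRepository("http://localhost:8890/sparql");
        repo.init();
        try (RepositoryConnection conn = repo.getConnection()) {
            // Placeholder query to inspect the loaded similarity results;
            // replace it with the SPARQL queries implementing the
            // similarity formula from the paper.
            String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
            try (TupleQueryResult result = conn.prepareTupleQuery(query).evaluate()) {
                while (result.hasNext()) {
                    BindingSet bs = result.next();
                    System.out.println(bs.getValue("s") + " "
                            + bs.getValue("p") + " " + bs.getValue("o"));
                }
            }
        } finally {
            repo.shutDown();
        }
    }
}
```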

Complete Evaluation Results

We have compared five state-of-the-art SPARQL endpoint federation systems -- CostFed, SPLENDID, LHD, Odyssey, and SemaGrow -- using LargeRDFBench on our proposed metrics. Our complete evaluation results can be found here.

Canonical Citations

This resource has no predecessor paper, so there are no direct canonical citations associated with it. However, the resource presents an evaluation of the following contributions:

M. Saleem, A. Potocki, T. Soru, O. Hartig, and A.-C. Ngonga Ngomo. CostFed: Cost-based query optimization for SPARQL endpoint federation. June 2018.

G. Montoya, H. Skaf-Molli, and K. Hose. The Odyssey approach for optimizing federated SPARQL queries. In C. d'Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudré-Mauroux, J. Sequeda, C. Lange, and J. Heflin, editors, The Semantic Web – ISWC 2017, pages 471–489, Cham, 2017. Springer International Publishing.

A. Charalambidis, A. Troumpoukis, and S. Konstantopoulos. SemaGrow: Optimizing federated SPARQL queries. In Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS '15, pages 121–128, New York, NY, USA, 2015. ACM.

X. Wang, T. Tiropanis, and H. Davis. LHD: Optimising linked data query processing using parallelisation. CEUR Workshop Proceedings, 996, May 2013.

O. Görlitz and S. Staab. SPLENDID: SPARQL endpoint federation exploiting VoID descriptions. In Proceedings of the Second International Conference on Consuming Linked Data - Volume 782, COLD'11, pages 13–24, Aachen, Germany, 2010. CEUR-WS.org.

Future Plan

We will add the resource results to RdfStoreBenchmarking, as we did for our other published benchmarking results such as the DBpedia SPARQL Benchmark, FEASIBLE, and the federation evaluation.

Acknowledgement

The work has been supported by the EU H2020 Marie Skłodowska-Curie project KnowGraphs.

How to cite

@article{qudus2020empirical,
  author = {Qudus, Umair and Saleem, Muhammad and Ngonga Ngomo, Axel-Cyrille and Lee, Young-koo},
  biburl = {https://www.bibsonomy.org/bibtex/2f89a005fbe4ab3882ee190610de0e5e7/dice-research},
  journal = {Semantic Web},
  keywords = {dice group_aksw ngonga qudus saleem simba},
  number = {Preprint},
  pages = {1--26},
  publisher = {IOS Press},
  title = {An empirical evaluation of cost-based federated SPARQL query processing engines},
  url = {http://www.semantic-web-journal.net/system/files/swj2604.pdf},
  year = 2021
}

For any further questions or suggestions, please contact: uqudus@mail.uni-paderborn.de
