We present novel evaluation metrics targeted at fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate the query planners of five cost-based federated SPARQL query engines using LargeRDFBench queries.
Please follow the steps below to reproduce our results.
- First, you need to set up LargeRDFBench. Complete details can be found on the LargeRDFBench home page.
- Download the runnable jar files of the selected cost-based federation engines from here, except for Odyssey: Odyssey involves many dependencies, and its classes are run using the scripts provided in the scripts folder of the project zip file. Detailed instructions for running the engine are provided on the Odyssey home page; the code updated with our metric is available here.
After completing the above setup, the next step is to generate the summaries (not needed for engines that use VoID descriptions, as these are already provided along with the source code) and then run each engine using the jar files we provide. Running queries on the engines produces similarity files, which contain the actual and estimated cardinalities as well as the overall similarity values of the query plans. You can run the jar files from the CLI using the following commands, replacing the arguments as needed:
### CostFed: Generating summaries ###
java -jar costfed-summaries.jar [path-of-(summary.n3)-file] [path-of-endpoints-text-file-folder]
example:
java -jar costfed-summaries.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/costfed/summaries/summary.n3 /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/endpoints
Note: the endpoints file should contain the URLs of all endpoints; a sketch is shown below.
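A minimal sketch of such an endpoints file, assuming one SPARQL endpoint URL per line; the URLs below are placeholders, not the actual benchmark endpoints:

```text
http://localhost:8890/sparql
http://localhost:8891/sparql
http://localhost:8892/sparql
```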
### CostFed: Executing Queries and Generating plan similarity and cardinality values ###
java -jar costfed-core.jar [path-of-(costfed.props)-file] [path-of-query-results-folder] [path-of-queries-folder] [path-of-endpoints-file-folder] [path-of-similarity-results-folder]
example:
java -jar costfed-core.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/costfed/costfed.props /home/MuhammadSaleem/umair/evaluation/experiments/query_results /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries /home/MuhammadSaleem/umair/evaluation/experiments/endpoints /home/MuhammadSaleem/umair/evaluation/experiments/queries/results
Note: an example costfed.props file is provided in the source code folder. The Relative_Error variable must be set to "true" in the costfed.props file, as in the excerpt below. More details about the properties and index files can be found on the project [page](https://github.com/dice-group/CostFed).
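An excerpt of the relevant setting, assuming the standard Java properties syntax; only the Relative_Error line is confirmed by this guide, and the rest of the file should follow the example shipped with the source code:

```properties
# costfed.props (excerpt); enables relative-error reporting for our metrics
Relative_Error=true
```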
### SemaGrow: Generating summaries ###
java -jar semagrow-summary-1.4.1.jar [path-of-endpoints-file-folder] [path-of-SemaGrow-index-file]
example:
java -jar semagrow-summary-1.4.1.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/semagrow/semagrow4.ttl
### SemaGrow: Executing Queries and Generating plan similarity and cardinality values ###
java -jar semagrow-core-1.4.1.jar [path-to-(results.csv)-file] [path-to-queries-folder] [path-to-similarity-error-folder] [path-to-(repository-index.ttl)-file] true
example:
java -jar semagrow-core-1.4.1.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/results/results.csv /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/queries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/similarityResults /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/semagrow/repositoryindex.ttl true
### SPLENDID: Executing Queries and Generating plan similarity and cardinality values ###
Note: SPLENDID uses VoID statistics; a minimal sketch of such a VoID description is shown after the example below.
java -jar splendid-orignal.jar [path-to-file-(federation-test.properties)] [path-to-splendid-output-file] [path-to-queries-folder] [path-to-similarity-results-folder] true
example:
java -jar splendid-orignal.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/splendid/eval/federation-test.properties /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/res/splendid-output.txt /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/queries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/similarityResults true
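For reference, a minimal sketch of the kind of VoID description SPLENDID consumes; the dataset URI, endpoint URL, property, and counts below are placeholders, not the actual benchmark statistics:

```turtle
@prefix void: <http://rdfs.org/ns/void#> .

<http://example.org/dataset> a void:Dataset ;
    void:sparqlEndpoint <http://localhost:8890/sparql> ;
    void:triples 1000000 ;
    void:propertyPartition [
        void:property <http://xmlns.com/foaf/0.1/name> ;
        void:triples 50000
    ] .
```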
### LHD: Executing Queries and Generating plan similarity and cardinality values ###
Note: LHD uses VoID
java -jar LHD.jar [path-to-stats-file] [path-to-queries-file] [path-to-similarity-results-folder] true
example:
java -jar LHD.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/lhd/stats /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/lhdqueries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/results true
Note that for the arguments above, if a file is mentioned, the path must point to the exact file; if a folder is mentioned, the path must point to the folder containing the respective file(s).
A second important point: for all engines except LHD, the queries folder contains each query in a separate file, whereas for LHD all queries are placed in a single file. A sample is available [here](queries/lhdqueries.txt).
### Odyssey ###
For Odyssey, first extract the [project](federatedOptimizer.rar), then compile the code in the code folder, and finally run the script (executeQueriesOdyssey.sh) in the scripts folder after replacing some paths in the script file; a hedged sketch of these steps is given below. For complete instructions, refer to the project readme [file](https://github.com/gmontoya/federatedOptimizer/blob/master/README.md) and the [issue page](https://github.com/gmontoya/federatedOptimizer/issues/2) that we posted in order to run the engine successfully.
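A minimal sketch of these steps, assuming unrar is available and that the folder layout matches the project readme; the classpath is an assumption, and the exact commands should be taken from the readme and issue page linked above:

```bash
# Extract the project archive (assumes unrar is installed)
unrar x federatedOptimizer.rar
cd federatedOptimizer
# Compile the sources in the code folder (classpath is an assumption)
javac -cp "lib/*" code/*.java
# Adjust the paths inside the script, then run it
bash scripts/executeQueriesOdyssey.sh
```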
Source code is available here. Import each engine as a separate project. The source contains five Java projects -- CostFed, LHD, SemaGrow, splendid-test, and Odyssey -- each of which can be compiled and run separately. The main classes are as follows (the arguments are the same as for the jar files discussed above):
- Execute queries on SemaGrow from class `QueryEvaluation` in package `org.semagrow.semagrow.org.aksw.simba.start.semagrow`
- Execute queries on CostFed from class `QueryEvaluation` in package `org.aksw.simba.start`
- Execute queries on LHD from class `lhd` in package `trunk`
- Execute queries on SPLENDID from class `QueryProcessingEval` in package `de.uni_koblenz.west.evaluation`
For Odyssey, the instructions are the same as discussed above.
Future SPARQL federation engines can make use of the EvaluationMetric class to generate results for the presented metrics. This class provides methods pertaining to the accuracy of the cardinality estimators of cost/cardinality-based engines. In particular, it contains methods to calculate the relative error, q-error, and cosine similarity error of the individual triple patterns, the joins, and the complete query execution plan generated by the underlying query processing engine. The complete description of the class is given in the Javadoc available here.
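A hedged sketch of how a future engine might call this class; the method names, static call style, and values below are assumptions for illustration, not the actual API, so consult the Javadoc for the real signatures:

```java
import java.util.List;

// Hypothetical usage of the EvaluationMetric class; the method names and the
// static call style are assumptions, not the actual API.
public class MetricExample {
    public static void main(String[] args) {
        // Estimated vs. actual cardinalities for the triple patterns of one plan.
        List<Double> estimated = List.of(120.0, 45.0, 3000.0);
        List<Double> actual = List.of(100.0, 50.0, 2500.0);

        // Assumed method names; see the Javadoc for the real signatures.
        double relativeError = EvaluationMetric.relativeError(estimated, actual);
        double qError = EvaluationMetric.qError(estimated, actual);
        double cosineError = EvaluationMetric.cosineSimilarityError(estimated, actual);

        System.out.printf("relative=%.3f q=%.3f cosine=%.3f%n",
                relativeError, qError, cosineError);
    }
}
```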
The similarity results that we calculated in our experiments are available here.
After the similarity results are generated, they are loaded into a Virtuoso server; the required output is then obtained via SPARQL queries that apply the similarity calculation formula discussed in the paper (a hedged sketch is given below). Our complete evaluation results are available here.
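A minimal sketch of such an aggregation query; the predicates are placeholders, since the actual schema of the loaded similarity data and the exact formula follow the paper:

```sparql
# Hypothetical schema: the sim: predicates below are placeholders, not the
# actual vocabulary of our similarity files.
PREFIX sim: <http://example.org/similarity#>
SELECT ?engine (AVG(?similarity) AS ?meanSimilarity)
WHERE {
  ?plan sim:engine ?engine ;
        sim:overallSimilarity ?similarity .
}
GROUP BY ?engine
```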
We compared five state-of-the-art SPARQL endpoint federation systems (CostFed, SPLENDID, LHD, Odyssey, and SemaGrow) using LargeRDFBench on our proposed metrics. Our complete evaluation results can be found here.
This resource has no predecessor paper, so there are no direct canonical citations associated with it. However, the resource presents an evaluation of the following contributions:
M. Saleem, A. Potocki, T. Soru, O. Hartig, and A.-C. Ngonga Ngomo. CostFed: Cost-based query optimization for SPARQL endpoint federation. June 2018.
G. Montoya, H. Skaf-Molli, and K. Hose. The Odyssey approach for optimizing federated SPARQL queries. In C. d'Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudré-Mauroux, J. Sequeda, C. Lange, and J. Heflin, editors, The Semantic Web – ISWC 2017, pages 471–489, Cham, 2017. Springer International Publishing.
A. Charalambidis, A. Troumpoukis, and S. Konstantopoulos. SemaGrow: Optimizing federated SPARQL queries. In Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS '15, pages 121–128, New York, NY, USA, 2015. ACM.
X. Wang, T. Tiropanis, and H. Davis. LHD: Optimising linked data query processing using parallelisation. CEUR Workshop Proceedings, 996, May 2013.
O. Görlitz and S. Staab. SPLENDID: SPARQL endpoint federation exploiting VoID descriptions. In Proceedings of the Second International Conference on Consuming Linked Data - Volume 782, COLD '11, pages 13–24, Aachen, Germany, 2010. CEUR-WS.org.
We will add the resource results to RdfStoreBenchmarking, as we did for our other published benchmarking results such as the DBpedia SPARQL Benchmark, FEASIBLE, and the federation evaluation.
The work has been supported by the EU H2020 Marie Skłodowska-Curie project KnowGraphs.
@article{qudus2020empirical,
author = {Qudus, Umair and Saleem, Muhammad and Ngonga Ngomo, Axel-Cyrille and Lee, Young-koo},
biburl = {https://www.bibsonomy.org/bibtex/2f89a005fbe4ab3882ee190610de0e5e7/dice-research},
journal = {Semantic Web},
keywords = {dice group_aksw ngonga qudus saleem simba},
number = {Preprint},
pages = {1--26},
publisher = {IOS Press},
title = {An empirical evaluation of cost-based federated SPARQL query processing engines},
url = {http://www.semantic-web-journal.net/system/files/swj2604.pdf},
year = 2021
}
For any further questions or suggestions, please contact: uqudus@mail.uni-paderborn.de
- Umair Qudus (DICE, Paderborn University)
- Muhammad Saleem (AKSW, University of Leipzig)
- Axel-Cyrille Ngonga Ngomo (AKSW, University of Leipzig)
- Young-Koo Lee (DKE, Kyung Hee University)