An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines

We present novel evaluation metrics targeted at fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate the query planners of five different cost-based federated SPARQL query engines using LargeRDFBench queries.

Reproducing Results

Please follow the steps to reproduce our results.

  • First, you need to set up LargeRDFBench. Complete details can be found on the LargeRDFBench home page.
  • Download the runnable jar files of the selected cost-based federation engines from here. Odyssey is the exception: it involves many dependencies, and its classes are run using the scripts provided in the scripts folder of the project zip file. Detailed instructions for running the engine are provided on the Odyssey home page; the code updated with our metric is available here.

1. Generating Results From Jars

After the above setup, the next step is to generate the summaries (not needed for engines using VoID descriptions, as these are already provided along with the source code) and then run each engine using the jar files we provide. Running queries on the engines produces similarity files, which contain the actual and estimated cardinalities as well as the overall similarity values of the query plans. You can run the jar files from the CLI, replacing the arguments as shown in the following commands:

### CostFed: Generating summaries ###

java -jar costfed-summaries.jar [path-of-(summary.n3)-file] [path-of-endpoints-text-file-folder]
example:
java -jar costfed-summaries.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/costfed/summaries/summary.n3 /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/endpoints

Note: the endpoints file should contain the URLs of all endpoints.
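
For illustration, assuming one URL per line, an endpoints file might look like this (hypothetical local endpoints; use the URLs where your LargeRDFBench datasets are actually hosted):

http://localhost:8890/sparql
http://localhost:8891/sparql
http://localhost:8892/sparql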

### CostFed: Executing Queries and Generating plan similarity and cardinality values ###

java -jar costfed-core.jar [path-of-(costfed.props)-file] [path-of-query-results-folder] [path-of-queries-folder] [path-of-endpoints-file-folder]  [path-of-similarity-results-folder]
example:
java -jar costfed-core.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/costfed/costfed.props /home/MuhammadSaleem/umair/evaluation/experiments/query_results /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries /home/MuhammadSaleem/umair/evaluation/experiments/endpoints  /home/MuhammadSaleem/umair/evaluation/experiments/queries/results

Note: an example costfed.props file is included in the source code folder. The Relative_Error variable must be set to "true" in the costfed.props file. More details about the properties and index files are given on the project [page](https://github.com/dice-group/CostFed).
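
For illustration, the relevant entry would look like this (a sketch assuming standard Java properties syntax; verify the exact key against the shipped example file):

Relative_Error=true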

### SemaGrow: Generating summaries ###

java -jar semagrow-summary-1.4.1.jar [path-of-endpoints-file-folder] [path-of-SemaGrow-index-file]
example:
java -jar semagrow-summary-1.4.1.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/semagrow/semagrow4.ttl


### SemaGrow: Executing Queries and Generating plan similarity and cardinality values ###

java -jar semagrow-core-1.4.1.jar [path-to-(results.csv)-file] [path-to-queries-file] [path-to-similarity-error-folder] [path-to-(repository-index.ttl)-file] true

example:
java -jar semagrow-core-1.4.1.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/results/results.csv /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/queries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/similarityResults /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/semagrow/repositoryindex.ttl true


### SPLENDID: Executing Queries and Generating plan similarity and cardinality values ###

Note: SPLENDID uses VoID statistics, so no summary-generation step is needed.

java -jar splendid-orignal.jar [path-to-file-(federation-test.properties)] [path-to-splendid-output-file] [path-to-queries-folder] [path-to-similarity-results-file] [true]

example:
java -jar splendid-orignal.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/splendid/eval/federation-test.properties /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/res/splendid-output.txt /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/queries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/similarityResults true



### LHD: Executing Queries and Generating plan similarity and cardinality values ###

Note: LHD uses VoID statistics, so no summary-generation step is needed.

java -jar LHD.jar [path-to-stats-file] [path-to-queries-file] [path-to-similarity-results-folder] [true]

example:
java -jar LHD.jar /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/index/lhd/stats /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/lhdqueries /home/MuhammadSaleem/umair/evaluation/experiments/LargeRDFBenchQueries/queries/results true


Note that whenever an argument refers to a file, the path must point to that exact file; whenever it refers to a folder, the path must point to the folder containing the respective file(s).
Also note that, for all engines except LHD, the queries folder contains each query in a separate file, whereas LHD expects all queries in a single file. A sample is available [here](queries/lhdqueries.txt).

### Odyssey ###

For Odyssey, first extract the [project](federatedOptimizer.rar), then compile the code in the code folder, and finally run the script (executeQueriesOdyssey.sh) in the scripts folder after replacing some paths in the script file. For complete instructions, refer to the project readme [file](https://github.com/gmontoya/federatedOptimizer/blob/master/README.md) and the [issue page](https://github.com/gmontoya/federatedOptimizer/issues/2) that we posted in order to run the engine successfully.

2. Generating Results From Source Code

The source code is available here. Import each engine as a separate project. The code contains five Java projects -- CostFed, LHD, SemaGrow, splendid-test, and Odyssey -- each of which can be compiled and run separately. The main classes are as follows (the arguments are the same as for the jar files discussed above):

//Execute Queries on SemaGrow from 
package org.semagrow.semagrow.org.aksw.simba.start.semagrow
public class QueryEvaluation 

//Execute Queries on CostFed from 
package org.aksw.simba.start
public class QueryEvaluation 

//Execute Queries on LHD from 
package trunk
public class lhd 

//Execute Queries on SPLENDID from 
package de.uni_koblenz.west.evaluation
public class QueryProcessingEval

To run Odyssey, follow the instructions discussed above.

Reusability

Future SPARQL federation engines can make use of the EvaluationMetric class to generate results for the presented metrics. This class provides methods pertaining to the accuracy of the cardinality estimators of cost/cardinality-based engines. In particular, it contains methods to calculate the relative error, q-error, and cosine similarity error of the individual triple patterns, the joins, and the complete query execution plan generated by the underlying query processing engine. The complete description of the class is given in the Javadoc available here.
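
For orientation, here is a minimal, self-contained Java sketch of these measures using their standard definitions (the actual EvaluationMetric class may use different method names, signatures, and conventions; the Javadoc is authoritative):

```java
import java.util.List;

// Sketch of the cardinality-estimation accuracy measures (standard
// definitions; the repository's EvaluationMetric class may differ).
public class MetricsSketch {

    // Relative error: |estimated - actual| / actual (assumes actual > 0).
    static double relativeError(double actual, double estimated) {
        return Math.abs(estimated - actual) / actual;
    }

    // q-error: max(estimated/actual, actual/estimated), defined for positive values.
    static double qError(double actual, double estimated) {
        return Math.max(estimated / actual, actual / estimated);
    }

    // Cosine similarity between the vectors of actual and estimated
    // cardinalities of a plan (e.g., one entry per triple pattern or join).
    // A similarity error can be derived, e.g., as 1 - similarity; see the
    // paper for the exact definition used.
    static double cosineSimilarity(List<Double> actual, List<Double> estimated) {
        double dot = 0, normA = 0, normE = 0;
        for (int i = 0; i < actual.size(); i++) {
            dot += actual.get(i) * estimated.get(i);
            normA += actual.get(i) * actual.get(i);
            normE += estimated.get(i) * estimated.get(i);
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normE));
    }

    public static void main(String[] args) {
        System.out.println(relativeError(100, 80));                // 0.2
        System.out.println(qError(100, 80));                       // 1.25
        System.out.println(cosineSimilarity(List.of(100.0, 10.0),
                                            List.of(80.0, 20.0))); // ~0.99
    }
}
```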

Loading results into Virtuoso and calculating similarity errors:

The similarity results calculated in our experiments are available here.

After generating the similarity results, we load them into a Virtuoso server and then obtain the required output via SPARQL queries, using the similarity calculation formula discussed in the paper. Our complete evaluation results are here.
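
As a minimal sketch of querying the loaded results (the endpoint URL and the placeholder query are assumptions; the actual queries encode the similarity formula from the paper and depend on the vocabulary used when loading the results), the Virtuoso SPARQL endpoint can be queried with RDF4J as follows:

```java
import org.eclipse.rdf4j.query.BindingSet;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

public class QueryVirtuosoSketch {
    public static void main(String[] args) {
        // Hypothetical local Virtuoso SPARQL endpoint.
        SPARQLRepository repo = new SPARQLRepository("http://localhost:8890/sparql");
        repo.init();
        try (RepositoryConnection conn = repo.getConnection()) {
            // Placeholder query to inspect the loaded similarity results;
            // replace it with the SPARQL queries implementing the
            // similarity formula from the paper.
            String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
            try (TupleQueryResult result = conn.prepareTupleQuery(query).evaluate()) {
                while (result.hasNext()) {
                    BindingSet bs = result.next();
                    System.out.println(bs.getValue("s") + " "
                            + bs.getValue("p") + " " + bs.getValue("o"));
                }
            }
        } finally {
            repo.shutDown();
        }
    }
}
```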

Complete Evaluation Results

We have compared five state-of-the-art SPARQL endpoint federation systems -- CostFed, SPLENDID, LHD, Odyssey, and SemaGrow -- using LargeRDFBench on our proposed metrics. Our complete evaluation results can be found here.

Canonical Citations

This resource has no predecessor paper, so there are no direct canonical citations associated with it. However, the resource presents an evaluation of the following contributions:

M. Saleem, A. Potocki, T. Soru, O. Hartig, and A.-C. Ngonga Ngomo. CostFed: Cost-based query optimization for SPARQL endpoint federation. June 2018.

G. Montoya, H. Skaf-Molli, and K. Hose. The Odyssey approach for optimizing federated SPARQL queries. In C. d'Amato, M. Fernandez, V. Tamma, F. Lecue, P. Cudré-Mauroux, J. Sequeda, C. Lange, and J. Heflin, editors, The Semantic Web – ISWC 2017, pages 471–489, Cham, 2017. Springer International Publishing.

A. Charalambidis, A. Troumpoukis, and S. Konstantopoulos. SemaGrow: Optimizing federated SPARQL queries. In Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS '15, pages 121–128, New York, NY, USA, 2015. ACM.

X. Wang, T. Tiropanis, and H. Davis. LHD: Optimising linked data query processing using parallelisation. CEUR Workshop Proceedings, 996, May 2013.

O. Görlitz and S. Staab. SPLENDID: SPARQL endpoint federation exploiting VoID descriptions. In Proceedings of the Second International Conference on Consuming Linked Data - Volume 782, COLD'11, pages 13–24, Aachen, Germany, 2010. CEUR-WS.org.

Future Plan

We will add the resource results to RdfStoreBenchmarking, as we did for our other published benchmarking results such as the DBpedia SPARQL Benchmark, FEASIBLE, and the federation evaluation.

Acknowledgement

The work has been supported by the EU H2020 Marie Skłodowska-Curie project KnowGraphs.

How to cite

@article{qudus2020empirical,
  author = {Qudus, Umair and Saleem, Muhammad and Ngonga Ngomo, Axel-Cyrille and Lee, Young-koo},
  biburl = {https://www.bibsonomy.org/bibtex/2f89a005fbe4ab3882ee190610de0e5e7/dice-research},
  journal = {Semantic Web},
  keywords = {dice group_aksw ngonga qudus saleem simba},
  number = {Preprint},
  pages = {1--26},
  publisher = {IOS Press},
  title = {An empirical evaluation of cost-based federated SPARQL query processing engines},
  url = {http://www.semantic-web-journal.net/system/files/swj2604.pdf},
  year = 2021
}

For any further questions or suggestions, please contact: uqudus@mail.uni-paderborn.de
