Examples running existing algorithms on existing data #6
Hi Nikos, thanks for pointing these issues out. I think having examples in the README is an excellent idea! That being said, let me see if I can help you troubleshoot the problem at hand:
Last but not least, Metanome CLI does not support multiple database connections (although technically that should be possible). |
Hi,
Thanks for the prompt response. I am now able to run DUCC. Apparently I had the class name wrong, so the constructor was missing; correcting it fixed the problem. I also created a pgpass file, and I am able to process relational data.
Now the problem is with BINDERDatabase: to my understanding, the algorithm needs a set of tables as input, which I can provide by passing the argument to metanome-cli:
The issue here is with the last line of the above. BINDERDatabase requires the table names to be specified as a parameter, so I would like to ask whether there is a way to specify multi-valued algorithm parameters on the command line.
Thanks for your support. Best, |
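For readers following along, here is a minimal sketch of what such a pgpass file could look like, assuming it follows PostgreSQL's documented password-file layout (`hostname:port:database:username:password`). The thread does not show the actual file or how it was passed to metanome-cli, so the file name, host, port, database, user, and password below are all placeholders, and `write_pgpass` is a hypothetical helper.

```shell
#!/bin/sh
# All values below are placeholders; none of them come from this thread.
PGPASS_FILE="${PGPASS_FILE:-./example.pgpass}"

# Write one connection entry in the standard PostgreSQL layout:
#   hostname:port:database:username:password
write_pgpass() {
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" > "$PGPASS_FILE"
    chmod 600 "$PGPASS_FILE"   # restrict the file to its owner, since it holds a password
}

write_pgpass localhost 5432 mydb myuser secret
cat "$PGPASS_FILE"
```

PostgreSQL's own client tools ignore a password file with looser permissions than 0600; whether metanome-cli checks this is not stated here.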
I am glad to hear that DUCC is already running! I had a look at BINDERDatabase to see how it's configured.
Unfortunately, multiple values are not supported in the current version. But I have just seen that this pull request might introduce the functionality by supporting repeated parameters. I would be happy to hear from you whether the PR works fine for you! |
Hi, sorry for hijacking this issue; I didn't know where else to ask this. When running metanome-cli, I faced two problems:
1.) When running
was returned, despite using the latest algorithm and a (hopefully correct) configuration. I can't quite see how I should change my configuration to fix this.
2.) When running
Thanks in advance! |
No worries, let's see if I can help you there.
Feel free to reach out if you have further questions! |
Hello,
|
Hello, My cmd: Is there any configuration problem?
Thank you! |
Thank you! It works! |
Hello sekruse, |
Hi Ryang326 and sorry for the delayed response. Essentially,
So you can try:
java -cp metanome-cli-1.1.jar:fun_for_metanome-0.0.2-SNAPSHOT.jar de.metanome.cli.App --algorithm de.uni_potsdam.hpi.metanome.algorithms.fun.Fun --file-key "Relational Input" --files iris.csv
|
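The command above can be wrapped in a small script so that the jar names and algorithm class are set in one place. This is only a sketch: the jar names, class name, and flags are copied from the reply above, while the variable names and the `build_fun_command` helper are made up for illustration. The helper prints the command rather than running it, so the line can be checked first.

```shell
#!/bin/sh
# Jar names, algorithm class, and flags are copied from the reply above;
# the variable and function names are illustrative only.
CLI_JAR="metanome-cli-1.1.jar"
ALGO_JAR="fun_for_metanome-0.0.2-SNAPSHOT.jar"
ALGO_CLASS="de.uni_potsdam.hpi.metanome.algorithms.fun.Fun"

# Print the command for one CSV file instead of executing it,
# so it can be inspected before it is run.
build_fun_command() {
    printf 'java -cp %s:%s de.metanome.cli.App --algorithm %s --file-key "Relational Input" --files %s\n' \
        "$CLI_JAR" "$ALGO_JAR" "$ALGO_CLASS" "$1"
}

build_fun_command iris.csv
```

To execute the printed line, something like `eval "$(build_fun_command iris.csv)"` should work from the directory that contains both jars. Note that `:` is the classpath separator on Linux and macOS; on Windows the JVM expects `;` instead.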
When using the DC algorithm: |
Since this line is crashing, I think it really is the missing result receiver. In fact, it appears that this method is lacking the necessary code to configure a receiver for DC results. Unfortunately, I don't have the time to fix this. Do you want to send a PR with a fix? |
OK. After adding some code to the "configureResultReceiver" method, it supports DC now. |
hello @sekruse, |
@faisal-ksolves – That depends on the algorithm, your hardware, and various dataset properties besides its size. Most often, RAM is the limiting factor, especially for datasets with many columns. Please refer to the research papers of the individual algorithms for a detailed evaluation. |
@sekruse can I get some quick links to those papers? |
@faisal-ksolves – https://hpi.de/naumann/projects/data-profiling-and-analytics/metanome-data-profiling.html should contain most links. The BINDER paper is called Divide & Conquer-based Inclusion Dependency Discovery. |
Hello, |
Hello sekruse,
I found that CFD's Receiver has been implemented in the 1.2 cli version, but the code |
Hi,
I am trying to invoke the main method in de.metanome.cli.App from Eclipse, in order to get one of the existing algorithms to produce profiling data about some data I have locally. Looking at the documentation (user/developer guides), it wasn't obvious how to do this, so hopefully this issue will be of help to others as well.
My starting point would be to use DUCC to discover keys in a set of CSV files.
The arguments I tried using look like the following, but no luck yet (I keep getting 'Could not initialize algorithm.').
So, I believe some examples using metanome-cli would be useful, e.g.:
a) on top of existing CSV files (say, DUCC, for key discovery) and,
b) on top of a relational backend (say, BINDER-Database, to discover foreign keys in 3 tables. What happens if these are in different schemas/databases?). It is also not clear how to store database connection settings (is a ProfileDB necessary? If so, what would an example look like?).
Thank you in advance.
Best,
Nikos