Similarities calculated by Silk differ for the same two projects #41

Closed
OndrejZamazal opened this issue Mar 1, 2017 · 9 comments

@OndrejZamazal

I have encountered strange behaviour in how Silk calculates similarities. I created the same project several times (with the same project settings and Silk's default configuration), and the similarities calculated by Silk were sometimes different. Is this possible?

@skarampatakis
Member

I don't think this is possible. Could you describe how to reproduce it?

@OndrejZamazal
Author

This can be seen by comparing the computed similarities for the projects LuSe-cz-eu1 and JaZb-cz-eu1. Take, for example, the suggested links for Transport. The first project, LuSe-cz-eu1, has one link (after rerunning the similarity calculation there is no link for Transport at all; see below), while the second project, JaZb-cz-eu1, has three links for the Transport entity. The two projects have the same project settings, the same input files, and the same default Silk configuration.

However, the calculated similarities can differ on each run after selecting "calculate similarities". For Transport in LuSe-cz-eu1, one run produces no calculated similarity, another produces just one suggested link, and another produces three.

I also encountered a situation in which those two projects somehow (accidentally) share the calculated similarities.
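
To make the discrepancy easier to pin down, the suggested links from two runs can be compared as sets. This is a minimal sketch, assuming each run's suggested links can be exported as (source, target) pairs; the function name `diff_link_runs` and the entity identifiers below are hypothetical and do not come from the projects above.

```python
def diff_link_runs(run_a, run_b):
    """Split the links of two runs into: only in A, only in B, in both."""
    a, b = set(run_a), set(run_b)
    return {"only_a": a - b, "only_b": b - a, "both": a & b}


# Hypothetical exports of two runs over identical inputs and settings.
run_1 = [("cz:Transport", "eu:Transport")]
run_2 = [
    ("cz:Transport", "eu:Transport"),
    ("cz:Transport", "eu:TransportPolicy"),
    ("cz:Transport", "eu:Mobility"),
]

result = diff_link_runs(run_1, run_2)
for name, links in result.items():
    print(name, sorted(links))
```

If the calculation were deterministic, `only_a` and `only_b` would both be empty for two runs with the same inputs and configuration.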

@skarampatakis
Member

It seems that Silk's blocking feature was responsible for this behaviour. I have disabled it by default for now. Please check.

I don't understand the last part of your comment. What do you mean by "I also encountered a situation in which those two projects somehow (accidentally) share the calculated similarities"?

@OndrejZamazal
Author

I checked it and it now seems to behave correctly. Thanks. My comment about shared similarities was just an attempt to explain the behaviour; it is no longer relevant.

Let me ask one question about the Silk configuration. Is it possible to change the minimum threshold? It is currently set to 0.3. But I guess this is a separate issue.

@skarampatakis
Member

This is related to the ability to change the Silk configuration or import custom configurations. I am working on this and I believe I can have a fix by the end of the week. At the moment, the only way to change the configuration is to edit the default Silk-LSL file, which is possible on a local installation.
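
On a local installation, editing the default file could be scripted roughly as follows. This is only a sketch: it assumes the minimum threshold is stored as a `threshold` attribute on a `Filter` element, which should be checked against the actual default Silk-LSL file. The sample XML is a simplified, hypothetical stand-in for that file, and `set_attribute` is an illustrative helper, not part of Silk or Alignment.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the default Silk-LSL file.
SAMPLE_LSL = """<Silk>
  <Interlinks>
    <Interlink id="example">
      <Filter threshold="0.3"/>
    </Interlink>
  </Interlinks>
</Silk>"""


def set_attribute(xml_text, tag, attribute, value):
    """Set `attribute` on every `tag` element that already carries it.

    Returns the updated XML text and the number of elements changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for element in root.iter(tag):
        if attribute in element.attrib:
            element.set(attribute, str(value))
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


updated, count = set_attribute(SAMPLE_LSL, "Filter", "threshold", 0.5)
print(count)
print(updated)
```

Passing the tag and attribute names as parameters keeps the helper usable if the threshold turns out to be stored elsewhere in the real file.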

@OndrejZamazal
Author

Being able to change the Silk configuration would be very helpful. I have prepared some test cases for domain experts, and I think their testing should start once this new feature is available. Is there any estimate of when it will be available?

@skarampatakis
Member

This would be separated into two main use cases.

  1. Users develop a Silk LSL settings file in the Silk Workbench or manually and then simply import it into Alignment. This is the easy part; we can have it today. Silk already provides a nice, user-friendly environment for developing such configuration files.

  2. Users develop a Silk LSL settings file from within Alignment, or edit an existing one there and change some features, for instance by copying the default settings and changing the stop words or the comparison algorithm. This is already partially implemented, with some features missing, as it was just ported from the initial version of Alignment about a year ago. It is a bit tricky to implement with a user-friendly result.

If 1 is enough for your use case, I will try to have it ready and tested ASAP. For 2, I will need more time to make it fully functional.

@skarampatakis
Member

This is related to #3 and #4.

@OndrejZamazal
Author

I think that case 1 (import) should be enough for us for now. I can work with Alignment and continue with our test case at the end of this week or at the beginning of next week. Thanks.
