This repository has been archived by the owner on Jun 20, 2019. It is now read-only.

Sample Data for Visualisation Groups #21

Closed
ansjin opened this issue Mar 23, 2017 · 12 comments

@ansjin
Member

ansjin commented Mar 23, 2017

Attached below are the results of the different algorithms in JSON format (rename the files from .txt to .json). Most of the data is kinda junk, though, as the sentences are picked up directly from the wiki page.

The other.txt file contains some useful results which we got after training one of the algorithms on some training data. But it also contains false relationships 🤐

All of this data and these relationships are about Mozart.

We will keep posting the updated outputs here!!! Do tell us if you need anything else.
@MusicConnectionMachine/group-5 @MusicConnectionMachine/group-6

input.txt
ollie.txt
open_ie.txt
other.txt

@martomi

martomi commented Mar 27, 2017

Thanks!

You've probably noticed that some relationships express the same thing in different words. One example is "was born in" and "was born at". Maybe this is already in your feature backlog, but we would also need you to merge those kinds of similar relationships. One simple heuristic that would work in most cases is to compare the entities in the relationships: if two relationships connect the same entities, then most likely the relationships are the same too.
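
A minimal sketch of that heuristic, assuming each relationship is available as an `(entity1, relation, entity2)` tuple. The function names and the sample rows are illustrative assumptions, not taken from the attached files:

```python
from collections import defaultdict


def group_by_entity_pair(triples):
    """Collect the relation phrases seen for each (entity1, entity2) pair."""
    groups = defaultdict(set)
    for entity1, relation, entity2 in triples:
        groups[(entity1, entity2)].add(relation)
    return dict(groups)


def merge_candidates(triples):
    """Return only the entity pairs linked by more than one relation phrase."""
    groups = group_by_entity_pair(triples)
    return {pair: rels for pair, rels in groups.items() if len(rels) > 1}


# Illustrative rows only; the real files may use a different layout.
triples = [
    ("Mozart", "was born in", "Salzburg"),
    ("Mozart", "was born at", "Salzburg"),
    ("Mozart", "died in", "Vienna"),
]
candidates = merge_candidates(triples)
```

The result only flags candidates for merging. Two phrases that share an entity pair are not guaranteed to mean the same thing, so the output would still need a review step.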

Also, I'd like to point out that there are quite a few odd bugs in the outputs: sometimes the quality metric is not a number but some random string, sometimes sentences don't correspond to the extracted relationships, and sometimes the sentences are a single letter. You've probably noticed it too, and I realize it's a first output, so please keep us posted with new versions 👍
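
The three problems listed above could be caught with a small check like the following. This is only a sketch: the field names `entity1`, `entity2`, `quality` and `sentence` are assumptions about the JSON layout, and the 10-character threshold is arbitrary:

```python
def validate_record(record, min_sentence_chars=10):
    """Return a list of problems found in one extracted relationship."""
    problems = []

    quality = record.get("quality")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(quality, bool) or not isinstance(quality, (int, float)):
        problems.append("quality is not a number")

    sentence = str(record.get("sentence", "")).strip()
    if len(sentence) < min_sentence_chars:
        problems.append("sentence is too short")
    else:
        # Crude correspondence check: both entities should occur in the sentence.
        lowered = sentence.lower()
        for key in ("entity1", "entity2"):
            value = str(record.get(key, "")).strip().lower()
            if not value or value not in lowered:
                problems.append(f"{key} does not appear in sentence")

    return problems


# Hypothetical records for illustration.
good = {"entity1": "Mozart", "relation": "was born in", "entity2": "Salzburg",
        "quality": 0.8, "sentence": "Mozart was born in Salzburg."}
bad = {"entity1": "Mozart", "relation": "was born in", "entity2": "Salzburg",
       "quality": "abc", "sentence": "M"}
```

Here `validate_record(good)` returns an empty list, while `validate_record(bad)` reports the non-numeric quality and the too-short sentence.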

@ansjin
Member Author

ansjin commented Mar 27, 2017

Yes, true. I am currently trying to map those relationships to their normalized forms, which should solve the issue of getting several relationships of a similar type.
However, we cannot simply compare the entities across different relationships and merge them. For example:

<Entity1, Composed a piece for, Entity2>
<Entity1, Taught violin to, Entity2>

As these show, there can be several different relationships between the same two entities, so merging based on the entities alone would be a problem.
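
One way to sketch the normalization approach is to merge only when the normalized relation and both entities match. The `CANONICAL` lookup table and its labels are hypothetical; the thread does not say how the actual normalization is done:

```python
# Hypothetical mapping from surface phrases to a canonical relation label.
CANONICAL = {
    "was born in": "born_in",
    "was born at": "born_in",
}


def normalize(relation):
    """Lowercase, collapse whitespace, then look up a canonical label."""
    key = " ".join(relation.lower().split())
    return CANONICAL.get(key, key)


def merge_triples(triples):
    """Drop duplicates after normalization, keeping first-seen order."""
    seen = set()
    merged = []
    for entity1, relation, entity2 in triples:
        triple = (entity1, normalize(relation), entity2)
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    return merged


merged = merge_triples([
    ("Mozart", "was born in", "Salzburg"),
    ("Mozart", "was born at", "Salzburg"),
    ("Entity1", "Composed a piece for", "Entity2"),
    ("Entity1", "Taught violin to", "Entity2"),
])
```

With this rule the two "born" phrasings collapse into one triple, while the two distinct relationships between Entity1 and Entity2 stay separate.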

And yes, thanks for pointing out the other mistakes. The text used was a direct copy and paste of Wikipedia text without any pre-processing, which is why some sentences consisted of a single word or even just a few letters. We are building a pre-processing step for the text, so that should solve it.
Regarding the random string in the quality field: yes, that should not be there. I will check it!
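
A pre-processing step of the kind described could look roughly like this. It is a sketch under stated assumptions, not the group's actual implementation: the regex-based sentence split is naive (it breaks on abbreviations), and the citation-marker pattern and the `min_words` threshold are guesses:

```python
import re


def clean_sentences(text, min_words=3):
    """Split raw text into sentences and drop very short fragments."""
    # Remove Wikipedia-style citation markers such as [1] or [23].
    text = re.sub(r"\[\d+\]", "", text)
    # Naive split: break on whitespace that follows ., ! or ?.
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = []
    for part in parts:
        sentence = " ".join(part.split())  # collapse internal whitespace
        if len(sentence.split()) >= min_words:
            sentences.append(sentence)
    return sentences


raw = "Mozart was born in Salzburg.[1] K. He composed more than 600 works."
cleaned = clean_sentences(raw)
```

In this example the stray fragment "K." is dropped and the two full sentences are kept.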

Hopefully in a day or so you will get another version of the relationships! :)

@kordianbruck
Contributor

kordianbruck commented Mar 28, 2017

@ansjin please supply the sample data by the deadline tomorrow at the very latest - we need this urgently! ⏩ 💨

@lustoykov

See #27

@ansjin
Member Author

ansjin commented Mar 29, 2017

I have provided sample timeline data here: MusicConnectionMachine/api#22
Working on getting the other relationship data as well!

@vviro

vviro commented Apr 4, 2017

How is this issue progressing? Do you have any unmet dependencies blocking this work? How much data can you provide to the visualization groups today or tomorrow?

@ansjin
Member Author

ansjin commented Apr 4, 2017

@vviro On the algorithm side, everything is currently in place.

It is getting delayed by the database connection part (get the data file -> provide it to the algorithm -> store the algorithm's result back). This is the part we are currently working on.

Also, hosting the algorithm in the cloud is still pending, but it will not take much time: the image of the algorithm is already built, so it only needs to be pulled and started on the VM.

Once the connection part is done, we can provide the complete data to them!

@vviro

vviro commented Apr 4, 2017

@ansjin good! Are there any issues blocking the connection part from working or is it simply a matter of implementing it?

@Henni
Contributor

Henni commented Apr 4, 2017

It's mostly a matter of implementing it. Take a look at our issue tracker in the new Relationships project to see our current progress.

@chaoran-chen
Member

@kordianbruck Why waiting? Nobody needs sample data anymore..?

@pfent

pfent commented Apr 25, 2017

I think this issue is outdated anyway…

Closing for now

@pfent pfent closed this as completed Apr 25, 2017
@kordianbruck
Contributor

Yea, I wasn't sure about it. Thanks for closing it.


8 participants