[dataset] WSDM2012 wrapper #46

RicardoUsbeck · 2014-11-04T15:58:47Z

Write a wrapper for the WSDM2012 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets

TortugaAttack · 2016-06-30T10:55:29Z

The dataset is no longer available at the original site:
http://ilps.science.uva.nl/resources/wsdm2012-adding-semantics-to-microblog-posts/
Via archive.org (https://web.archive.org/web/20120331023708/http://ilps.science.uva.nl/resources/wsdm2012-adding-semantics-to-microblog-posts/) I found out that the dataset used to be at http://ilps.science.uva.nl/sites/default/files/wsdm2012-adding-semantics-microblog-posts-annotations.zip
But the file itself is not there.
Does anyone have the dataset? And if it is not publicly available, should we include it?

MichaelRoeder · 2016-06-30T11:14:48Z

Maybe you can find some information at http://edgar.meij.pro/dataset-adding-semantics-microblog-posts/?utm_source=bit.ly&utm_medium=linked&utm_campaign=myblog

TortugaAttack · 2016-06-30T11:48:21Z

Yup, they refer to the link I mentioned above. :/

MichaelRoeder · 2016-06-30T12:07:57Z

@RicardoUsbeck in your role as project leader, you might want to write an email via http://edgar.meij.pro/contact/ asking for the dataset.

RicardoUsbeck · 2016-11-02T10:01:53Z

final.zip
wsdm2012_annotations.txt

TortugaAttack · 2016-11-21T13:56:06Z

This dataset is just a huuuuge pain in the a#!
Am I missing something, or do they link against some KB without providing URIs, and without even the markings as they appear in the tweet, only the already linked markings?

For example: "Arab countries" is annotated as "Arab World", etc.
This makes it very difficult to work out where the annotation starts and where it ends. (This is an example I can handle, but there are worse ones!)

Does anybody have an idea how to properly match an annotation to a start and length in the actual tweet?
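
To make the problem concrete, here is a minimal sketch. The tweet text and the `findSpan` helper are invented for illustration and are not taken from the dataset or from GERBIL. It shows that a naive exact-match search for the annotation label only works when the tweet happens to contain the label verbatim:

```java
public class SpanLookup {

    /**
     * Returns {start, length} of the first exact occurrence of the label
     * in the text, or null if the label does not occur verbatim.
     */
    static int[] findSpan(String text, String label) {
        int start = text.indexOf(label);
        return start < 0 ? null : new int[] { start, label.length() };
    }

    public static void main(String[] args) {
        // Invented example tweet, not from the WSDM2012 data.
        String tweet = "Protests spread across Arab countries";

        // The linked label differs from the surface form, so no span is found.
        System.out.println(findSpan(tweet, "Arab World") == null);

        // Only the actual surface form yields a usable position.
        int[] span = findSpan(tweet, "Arab countries");
        System.out.println(span[0] + ", " + span[1]);
    }
}
```

Anything beyond exact matching (fuzzy matching, redirects, synonyms) would be guesswork about what the annotators meant.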

RicardoUsbeck · 2016-11-21T13:58:45Z

Does the second-to-last paragraph here help? http://edgar.meij.pro/dataset-adding-semantics-microblog-posts/?utm_source=bit.ly&utm_medium=linked&utm_campaign=myblog

TortugaAttack · 2016-11-21T14:05:48Z

Okay, so I can get the wiki URI. That's cool.
But the problem with the markings remains.
I cannot get "correct" markings (start, length) out of the tweets to create NamedEntities :/

RicardoUsbeck · 2016-11-21T14:19:44Z

Then it is only suitable for the C2KB task, if that still exists, @MichaelRoeder?

MichaelRoeder · 2016-11-21T15:24:53Z

Yes, looks like C2KB to me.

TortugaAttack · 2016-11-22T10:29:34Z

I have never done a C2KB-only dataset before. Do I use NamedEntity as well?
If so, the dataset will be finished in no time ;)

MichaelRoeder · 2016-11-22T12:22:03Z

No, please use org.aksw.gerbil.transfer.nif.data.Annotation objects for that. They don't have a position. A description of the markings that are available can be found in the wiki article "Document Markings in gerbil.nif.transfer".
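
Roughly, the idea looks like the sketch below. To keep it self-contained it uses a stand-in class, `DocAnnotation`, instead of the real GERBIL `Annotation` class, and the `toAnnotations` helper and the `http://en.wikipedia.org/wiki/` URI prefix are assumptions for illustration only. Each linked concept becomes a marking that carries a URI but no start or length:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MarkingSketch {

    /** Stand-in for a position-free marking: it only knows its URI. */
    static final class DocAnnotation {
        final String uri;

        DocAnnotation(String uri) {
            this.uri = uri;
        }
    }

    /** Builds one position-free marking per Wikipedia title (assumed URI scheme). */
    static List<DocAnnotation> toAnnotations(List<String> wikiTitles) {
        List<DocAnnotation> markings = new ArrayList<>();
        for (String title : wikiTitles) {
            // Assumption: titles map to URIs by replacing spaces with underscores.
            markings.add(new DocAnnotation(
                    "http://en.wikipedia.org/wiki/" + title.replace(' ', '_')));
        }
        return markings;
    }

    public static void main(String[] args) {
        for (DocAnnotation a : toAnnotations(Arrays.asList("Arab World"))) {
            System.out.println(a.uri);
        }
    }
}
```

Because no position is stored, the mismatch between the tweet's surface form and the linked label no longer matters for C2KB.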

fixed

TortugaAttack · 2016-11-22T21:56:56Z

done

RicardoUsbeck added the type:enhancement label Nov 4, 2014

RicardoUsbeck added this to the Version 2 - new core and better logging milestone Nov 4, 2014

MichaelRoeder removed this from the Version 1.2 - new core and better logging milestone Nov 9, 2015

MichaelRoeder assigned TortugaAttack May 17, 2016

TortugaAttack pushed a commit that referenced this issue Nov 22, 2016

#46 WSDM Dataset added, senseval properties fixed, minor spelling error

6a3d59d

fixed

TortugaAttack closed this as completed Nov 22, 2016

MichaelRoeder unassigned TortugaAttack Jul 27, 2017
