Microposts are a highly popular medium for sharing facts, opinions, and emotions. They are an invaluable wealth of data, ready to be mined for training predictive models. Following the success of the previous three years, we are pleased to announce the NEEL challenge, which will be part of the #Microposts2016 Workshop at the World Wide Web 2016 conference.
The task of the challenge is to automatically recognise entities and their types from English microposts, and link them to the corresponding English DBpedia 2014 resources (if the resources exist) or NIL identifiers. Participants will have to automatically extract expressions that are formed by discrete (and typically short) sequences of words (e.g., Obama, London, Rakuten) and recognise their types (e.g., Person, Location, Organisation) from a collection of microposts. In the linking stage, the aim is to disambiguate the spotted entity to the corresponding DBpedia resource, or to a NIL reference if the spotted named entity does not match any resource in DBpedia.
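As a purely illustrative sketch (the tweet and annotations below are made up for this description, not drawn from the challenge data), the expected output for a micropost pairs each spotted mention with a type and either a DBpedia resource or a NIL identifier:

  # Hypothetical example (Python), not from the challenge data: expected
  # annotations for one micropost as (mention, type, link-or-NIL) tuples.
  tweet = "Obama met Cameron in London"
  expected_annotations = [
      ("Obama",   "Person",   "http://dbpedia.org/resource/Barack_Obama"),
      ("Cameron", "Person",   "http://dbpedia.org/resource/David_Cameron"),
      ("London",  "Location", "http://dbpedia.org/resource/London"),
      # A mention with no matching DBpedia resource would instead receive a NIL identifier, e.g. "NIL1".
  ]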
We welcome participants from previous NEEL Challenges and from the TREC, TAC KBP, and ERD shared tasks to take part in this year's challenge.
The final dataset (gold standard) is available for download.
The dataset consists of tweets extracted from a collection of over 18 million tweets. It includes event-annotated tweets provided by the Redites project (http://demeter.inf.ed.ac.uk/redites/), covering multiple noteworthy events from 2011 and 2013 (including the death of Amy Winehouse, the London Riots, the Oslo bombing, and the Westgate Shopping Mall shootout), as well as tweets extracted from the Twitter firehose in 2014 and 2015 via a selection of hashtags. Since the task of this challenge is to automatically recognise and link entities, we have built our dataset from both event and non-event tweets. While event tweets are likely to contain entities, non-event tweets enable us to evaluate how well a system avoids false positives in the entity extraction phase. The training set is built on top of the entire corpus of the NEEL 2014 and 2015 Challenges.
The training set will be released as a TSV file following the TAC KBP format, where each line contains a fixed set of tab-separated columns. We will advertise the release of the data sets on the workshop mailing list. To be informed, please subscribe to https://groups.google.com/forum/#!forum/neelchallenge.
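As a purely illustrative sketch, and assuming (our assumption, to be confirmed against the released data) that each line carries a tweet identifier, mention offsets, a DBpedia link or NIL identifier, a confidence score, and a type, an annotation line could be read as follows:

  # Hypothetical sketch of parsing one annotation line; the column order shown
  # here is an assumption, and the authoritative layout is defined by the released data.
  line = "123456789\t21\t26\thttp://dbpedia.org/resource/Barack_Obama\t1.0\tPerson"
  tweet_id, start, end, link, score, entity_type = line.rstrip("\n").split("\t")
  print(tweet_id, start, end, link, score, entity_type)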
Participants are allowed to submit up to 3 runs of their system as TSV files. An example of the submission format will be released with the development set. We encourage participants to make their systems available to the community to facilitate reuse, and we will acknowledge the systems whose source code is shared or that are otherwise made accessible for reuse.
We will use the TAC KBP scorer (https://github.com/wikilinks/neleval/wiki/Evaluation) to evaluate the results, focusing in particular on how well systems detect, type, and link entity mentions.
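For intuition only, the sketch below computes mention-level precision, recall, and F1 over exact-match (tweet id, offsets, link, type) tuples; it is not the official scorer, which is the neleval toolkit linked above.

  # Illustrative precision/recall/F1 over exact-match annotation tuples
  # (not the official neleval scorer).
  def prf(gold, predicted):
      gold, predicted = set(gold), set(predicted)
      tp = len(gold & predicted)
      p = tp / len(predicted) if predicted else 0.0
      r = tp / len(gold) if gold else 0.0
      f = 2 * p * r / (p + r) if p + r else 0.0
      return p, r, f

  gold = {("1", 0, 5, "dbr:Barack_Obama", "Person")}
  pred = {("1", 0, 5, "dbr:Barack_Obama", "Person"),
          ("1", 10, 16, "NIL1", "Location")}
  print(prf(gold, pred))  # -> (0.5, 1.0, 0.666...)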
A prize of €750, generously sponsored by the FREME Project, will be awarded to the best submission.
Submissions should consist of a 3-page paper describing your approach, how you tuned and tested it using the training data, and your results on the development set. All submissions must be in English. Submissions should be prepared according to the ACM SIG Proceedings Template (see http://www.acm.org/sigs/publications/proceedings-templates) and should include author names and affiliations, and 3-5 author-selected keywords. Along with the paper, authors will submit up to 3 runs of their system computed over the test set. The submission should be made as a single, unencrypted zip file that includes a plain text file listing its contents. Submission is via EasyChair, at https://easychair.org/conferences/?conf=microposts2016. Each submission will receive at least 2 peer reviews.
We aim to publish the #Microposts2016 proceedings via CEUR as a single volume containing all three tracks.
Giuseppe Rizzo, Istituto Superiore Mario Boella, Italy
Marieke van Erp, Vrije Universiteit Amsterdam, The Netherlands