#Microposts2016 — 6th Workshop on Making Sense of Microposts

Named Entity rEcognition and Linking (NEEL) Challenge

Motivation

Microposts are a highly popular medium for sharing facts, opinions and emotions. They are an invaluable wealth of data, ready to be mined for training predictive models. Following the success of the challenge over the previous three years, we are pleased to announce the NEEL challenge, which will be part of the #Microposts2016 Workshop at the World Wide Web 2016 conference.

The task of the challenge is to automatically recognise entities and their types from English microposts, and link them to the corresponding English DBpedia 2014 resources (if the resources exist) or NIL identifiers. Participants will have to automatically extract expressions that are formed by discrete (and typically short) sequences of words (e.g., Obama, London, Rakuten) and recognise their types (e.g., Person, Location, Organisation) from a collection of microposts. In the linking stage, the aim is to disambiguate the spotted entity to the corresponding DBpedia resource, or to a NIL reference if the spotted named entity does not match any resource in DBpedia.

We welcome participants from previous NEEL challenges and from the TREC, TAC KBP and ERD shared tasks to take part in this year's challenge.

Dataset

The final dataset (gold standard) is available for download.

The dataset consists of tweets extracted from a collection of over 18 million tweets. It includes event-annotated tweets provided by the Redites project (http://demeter.inf.ed.ac.uk/redites/), covering multiple noteworthy events from 2011 and 2013 (including the death of Amy Winehouse, the London Riots, the Oslo bombing and the Westgate Shopping Mall shootout), as well as tweets extracted from the Twitter firehose in 2014 and 2015 via a selection of hashtags. Since the task of this challenge is to automatically recognise and link entities, we built the dataset from both event and non-event tweets. While event tweets are likely to contain entities, non-event tweets let us evaluate how well a system avoids false positives in the entity extraction phase. The training set is built on top of the entire corpus of the NEEL 2014 and 2015 Challenges.

The training set will be released as TSV files following the TAC KBP format, where each line contains the following columns:

1st: tweet identifier [alphanumeric]
2nd, 3rd: start/end offsets, expressed as the number of UTF-8 characters counted from 0 (the beginning of the tweet); spaces are counted too [integer]
4th: link to the DBpedia resource, or a NIL identifier. The corpus may contain several distinct NIL identifiers, and the same NIL identifier is reused when multiple mentions in the text refer to the same entity [alphanumeric]
5th: salience (confidence score). This field can be assigned randomly, since it will not be used to rank the submissions [double]
6th: type [alphanumeric]

Fields are separated by tabs. We will announce the release of the datasets on the workshop mailing list; to be informed, please subscribe to https://groups.google.com/forum/#!forum/neelchallenge.
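To make the column layout concrete, here is a minimal reader sketch, not official challenge tooling. It assumes that each line has exactly the six tab-separated columns listed above, that "UTF-8 characters" means Unicode code points (so offsets index a decoded Python `str`, not its UTF-8 bytes), and that the end offset is exclusive; the source does not say whether the end offset is inclusive, so check this against the released data. The names `Annotation`, `read_annotations` and `mention_text`, and the sample tweet ID, are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Annotation:
    tweet_id: str
    start: int        # offset of the first character, counted from 0
    end: int          # ASSUMPTION: exclusive end offset, like a Python slice
    link: str         # DBpedia resource URI or a NIL identifier
    salience: float   # confidence score; not used to rank submissions
    entity_type: str  # e.g. Person, Location, Organisation


def read_annotations(tsv_text):
    """Parse six tab-separated columns per line into Annotation records."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    annotations = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        tweet_id, start, end, link, salience, entity_type = row
        annotations.append(
            Annotation(tweet_id, int(start), int(end), link, float(salience), entity_type)
        )
    return annotations


def mention_text(tweet, annotation):
    """Recover the mention by slicing the decoded tweet on character offsets."""
    return tweet[annotation.start:annotation.end]


# Hypothetical sample in the six-column layout.
sample = (
    "123\t0\t5\thttp://dbpedia.org/resource/Barack_Obama\t1.0\tPerson\n"
    "123\t13\t19\thttp://dbpedia.org/resource/London\t1.0\tLocation\n"
)
tweet = "Obama visits London"
for a in read_annotations(sample):
    print(a.entity_type, mention_text(tweet, a))
```

Slicing the `str` rather than the encoded `bytes` matters for tweets with accented letters or emoji. In "Café in München", for example, "München" starts at character 8 but at byte 9 in UTF-8.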

Evaluation

Participants may submit up to 3 runs of their system as TSV files. An example of the submission format will be released with the development set. We encourage participants to make their system available to the community to facilitate reuse, and we will acknowledge the systems whose source code was shared or that were otherwise made accessible for reuse.
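Until the official example is released with the development set, the sketch below shows one way to serialise runs. It assumes, without confirmation from the source, that runs use the same six columns as the training data. It also enforces the limit of three runs and rejects offsets that are negative or empty. The function names and `MAX_RUNS` are hypothetical.

```python
MAX_RUNS = 3  # at most three runs per team, per the rules above


def format_run(rows):
    """Render (tweet_id, start, end, link, salience, type) tuples as TSV lines.

    ASSUMPTION: the submission format mirrors the six training-data columns.
    """
    lines = []
    for tweet_id, start, end, link, salience, entity_type in sorted(
        rows, key=lambda r: (r[0], r[1], r[2])
    ):
        if not 0 <= start < end:
            raise ValueError(f"invalid offsets {start}-{end} for tweet {tweet_id}")
        lines.append(
            "\t".join([tweet_id, str(start), str(end), link, f"{salience:.3f}", entity_type])
        )
    return "".join(line + "\n" for line in lines)


def format_runs(runs):
    """Format every run, refusing more than MAX_RUNS of them."""
    if len(runs) > MAX_RUNS:
        raise ValueError(f"at most {MAX_RUNS} runs may be submitted, got {len(runs)}")
    return [format_run(run) for run in runs]
```

Salience is written with a fixed number of decimals only for readability. Because the field is not used for ranking, a constant such as 1.0 is enough.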

We will use the TAC KBP scorer (https://github.com/wikilinks/neleval/wiki/Evaluation) to evaluate the results, focusing in particular on the following measures:

[tagging] strong_typed_mention_match (checks the mention boundaries and the entity type)
[linking] strong_link_match
[clustering] mention_ceaf (NIL detection)
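As a rough illustration of the tagging measure, the sketch below approximates strong_typed_mention_match. A system mention counts as correct only when the tweet ID, both offsets and the type all exactly match a gold mention; precision, recall and F1 are then micro-averaged. This is a simplified re-implementation written for this page, not the neleval code, and official scores come only from the scorer linked above. The function names and sample data are hypothetical. Keying on the link column instead of the type would give a similar approximation for linking, but the official scorer's handling of NIL links may differ.

```python
def typed_mention_keys(rows):
    """Key each mention by (tweet_id, start, end, type); link and salience are ignored."""
    return {
        (tweet_id, start, end, entity_type)
        for tweet_id, start, end, _link, _salience, entity_type in rows
    }


def precision_recall_f1(gold_keys, system_keys):
    """Micro-averaged P/R/F1 where only exact key matches are true positives."""
    true_positives = len(gold_keys & system_keys)
    precision = true_positives / len(system_keys) if system_keys else 0.0
    recall = true_positives / len(gold_keys) if gold_keys else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [
    ("123", 0, 5, "http://dbpedia.org/resource/Barack_Obama", 1.0, "Person"),
    ("123", 13, 19, "http://dbpedia.org/resource/London", 1.0, "Location"),
]
system = [
    ("123", 0, 5, "http://dbpedia.org/resource/Barack_Obama", 0.9, "Person"),
    ("123", 13, 19, "http://dbpedia.org/resource/London", 0.7, "Organisation"),  # wrong type
]
print(precision_recall_f1(typed_mention_keys(gold), typed_mention_keys(system)))
# (0.5, 0.5, 0.5)
```

The second system mention has the right boundaries and even the right link. It still counts as an error because its type is wrong, which is what distinguishes the typed measure from a pure boundary match.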

Award Sponsor

An award of €750, generously sponsored by the FREME Project, will be given to the best submission.

Paper Submission

Submit a 3-page paper describing your approach, how you tuned and tested it on the training data, and your results on the dev set. All submissions must be in English. Submissions should be prepared according to the ACM SIG Proceedings Template (see http://www.acm.org/sigs/publications/proceedings-templates) and should include author names and affiliations, and 3-5 author-selected keywords. Along with the paper, authors will submit up to 3 runs of their systems computed over the test set. The submission should be made as a single, unencrypted zip file that includes a plain text file listing its contents. Submission is via EasyChair, at: https://easychair.org/conferences/?conf=microposts2016. Each submission will receive at least 2 peer reviews.

We aim to publish the #Microposts2016 proceedings via CEUR as a single volume containing all three tracks.

Want to Join the Challenge?

register your team at http://goo.gl/forms/2R7zagtUJZ
download the agreement https://goo.gl/idFdyP, sign it, and send the pdf to giuseppe.rizzo@ismb.it and marieke.van.erp@vu.nl
download the challenge guidelines at https://goo.gl/XGmpuY; shortly afterwards, you will receive instructions on how to obtain the dataset
check the challenge timeline below and follow the mailing list for updates

Important Dates

Release of training set: 7 Dec 2015
Release of dev set: 30 Dec 2015
Release of test set: 31 Jan 2016
Submission of results: 7 Feb 2016
Submission of reports: 7 Feb 2016
Challenge Notification: 18 Feb 2016
Challenge camera-ready deadline: 28 Feb 2016
Workshop: 11/12 Apr 2016 (Registration open to all)

Contact

Mailing list: https://groups.google.com/forum/#!forum/neelchallenge
Twitter hashtags: #neel #microposts2016
Twitter account: @Microposts2016
W3C Microposts Community Group: http://www.w3.org/community/microposts

Challenge Chairs

Giuseppe Rizzo, Istituto Superiore Mario Boella, Italy
Marieke van Erp, Vrije Universiteit Amsterdam, The Netherlands

Challenge Committee

Ebrahim Bagheri, Ryerson University, Canada
Pierpaolo Basile, University of Bari, Italy
David Corney, Signal Media, UK
Grégoire Burel, KMi, Open University, UK
Milan Dojchinovski, Leipzig University, Germany/Czech Technical University, Czech Republic
Guillaume Erétéo, Vigiglobe, France
Anna Lisa Gentile, Universität Mannheim, Germany
Filip Ilievski, Vrije Universiteit Amsterdam, The Netherlands
José M. Morales del Castillo, El Colegio de México, México
Enrico Palumbo, Istituto Superiore Mario Boella, Italy
Bianca Pereira, Insight Centre for Data Analytics, NUIG, Ireland
Bernardo Pereira Nunes, PUC-Rio / UNIRIO, Brazil
Julien Plu, EURECOM, France
Giles Reger, The University of Manchester, UK
Irina Temnikova, Qatar Computing Research Institute, Qatar
Victoria Uren, Aston University, UK
