This task evaluates the effectiveness of word sense disambiguation (WSD) in information retrieval (IR). Participants will be provided with a collection of 150 documents and 25 topics, each containing a title, description and narrative. Every content word in the documents and in the topics will be sense-marked with the contextually appropriate synset from Princeton WordNet. Participants are expected to develop algorithms that use this additional information (i.e., the sense labels) to improve IR performance. The current offering covers only one language, English. In subsequent years, we will add documents and topics from other languages.

Important Dates

Corpus Release Aug 01 2011 Download
Query Release Aug 16 2011 Download
Run Submission Sep 01 2011
Qrel Release Nov 15 2011
Working Note Due Nov 28 2011


Please download the corpus from the following URL. Each sentence in the corpus is sense-marked using sense ids from Princeton WordNet 2.1. Each word in the corpus has the format <word>_<digits>, e.g., city_18406385. The first digit after the underscore indicates the POS tag and the remaining digits indicate the sense id. The POS tag can take the following 4 values.

  • 1 = Noun
  • 2 = Verb
  • 3 = Adverb
  • 4 = Adjective
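
As an illustration of the token format described above, the following is a minimal Python sketch that splits a sense-marked token into its word, POS tag and sense id. The function name parse_sense_token and the handling of tokens without a sense suffix are assumptions made for this example and are not part of the official corpus tools. The POS mapping is the one listed above.

```python
# POS codes exactly as listed in the task description.
POS_TAGS = {"1": "Noun", "2": "Verb", "3": "Adverb", "4": "Adjective"}


def parse_sense_token(token):
    """Split 'word_<pos digit><sense id>' into (word, pos, sense_id).

    The sense id is kept as a string so that any leading zeros survive.
    """
    # Split on the last underscore, in case the word itself contains one.
    word, sep, code = token.rpartition("_")
    valid = sep and len(code) >= 2 and code.isdigit() and code[0] in POS_TAGS
    if not valid:
        # Assumption: tokens without a valid sense suffix are returned unlabelled.
        return token, None, None
    return word, POS_TAGS[code[0]], code[1:]


print(parse_sense_token("city_18406385"))  # ('city', 'Noun', '8406385')
```

For city_18406385, the leading 1 is read as the POS tag (Noun) and 8406385 as the sense id.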


Please mail to for any queries.


Each submission file should contain 150 documents per topic, ranked 0-149, in the usual TREC/CLEF submission format; that is, each line in the file should have the following fields:

<Query id> Q0 <DOCNO> <RANK> <SIMILARITY> <Run-ID>

Participants will need to submit a gzipped file containing retrieval results in the above format.
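
The submission format above can be produced with a few lines of Python. This is only a sketch under stated assumptions: the function names, the shape of the results dictionary, the four-decimal score formatting and the single-space field separator are choices made for this example, not requirements stated by the organisers. Ranks start at 0, as specified above.

```python
import gzip


def format_run_line(query_id, docno, rank, similarity, run_id):
    """Build one line: <Query id> Q0 <DOCNO> <RANK> <SIMILARITY> <Run-ID>."""
    return f"{query_id} Q0 {docno} {rank} {similarity:.4f} {run_id}"


def write_run(path, results, run_id, depth=150):
    """Write a gzipped run file.

    results is assumed to map each query id to a list of (docno, score) pairs.
    """
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for query_id in sorted(results):
            # Highest similarity first; keep at most `depth` documents.
            ranked = sorted(results[query_id], key=lambda pair: pair[1], reverse=True)
            for rank, (docno, score) in enumerate(ranked[:depth]):
                out.write(format_run_line(query_id, docno, rank, score, run_id) + "\n")
```

A call such as write_run("myrun.txt.gz", results, "myrun") would then produce a single gzipped file, where myrun is a placeholder run id.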

All participants are required to submit at least one run that uses only the title and description fields (no narrative) of the topics. There is no upper limit on the number of submitted runs. However, please assign a priority to each of your submissions. Runs will be included in the pooling process in order of priority.

Please mail your submissions to miteshk [at]