Overview of BioCreAtIvE task 1B: Normalized Gene Lists

By Marc Colosimo, Alexander Morgan, Alexander Yeh

Our goal in BioCreAtIvE has been to assess the state of the art in text mining, with an emphasis on tasks that reflect real biological applications. To this end, we have focused on the curation process for model organism databases. This paper summarizes BioCreAtIvE task 1B, the "Gene Identifier List" task, which is inspired by the gene list typically supplied for each curated paper in a model organism database. For the assessment, systems were given a set of abstracts from each of three model organism databases (Yeast, Fly, and Mouse), along with a synonym list for each organism that defines the correspondence between unique gene identifiers and the mentions of those genes and gene products in the curated literature. Systems were evaluated on their ability to produce the correct list of unique gene identifiers for the genes and gene products mentioned in the abstracts for each organism. For the evaluation, we prepared three data sets per organism: a training set of 5000 abstracts with (noisy) gene lists derived automatically from the gene lists for the full-text articles; a development test set of 100-200 abstracts with hand-corrected gene lists; and a blind test set of 250 abstracts with carefully annotated gene lists.
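The task reduces to two steps: mapping mentions in an abstract to unique gene identifiers through the organism's synonym list, and comparing the resulting identifier list with the gold list. The Python sketch below is a naive exact-match dictionary baseline, not the method of any participating system. Its scoring assumes micro-averaged precision, recall and F-measure over identifier sets, which this summary does not specify. The function names (`build_lookup`, `gene_list`, `micro_prf`) and the toy synonym list with `GENE:000n` identifiers are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

TOKEN = re.compile(r"[A-Za-z0-9]+")


def tokens(text: str) -> List[str]:
    # Lower-cased alphanumeric tokens; punctuation and hyphens act as separators.
    return [t.lower() for t in TOKEN.findall(text)]


def build_lookup(synonyms: Dict[str, List[str]]) -> Tuple[Dict[str, Set[str]], int]:
    """Invert a synonym list (gene ID -> names) into normalized name -> gene IDs.
    A name shared by several genes maps to all of them (ambiguity is kept).
    Also returns the longest name length in tokens, used as the n-gram limit."""
    lookup: Dict[str, Set[str]] = {}
    max_len = 1
    for gene_id, names in synonyms.items():
        for name in names:
            toks = tokens(name)
            if not toks:
                continue
            lookup.setdefault(" ".join(toks), set()).add(gene_id)
            max_len = max(max_len, len(toks))
    return lookup, max_len


def gene_list(abstract: str, lookup: Dict[str, Set[str]], max_len: int) -> Set[str]:
    """Hypothetical baseline: every token n-gram (n <= max_len) that exactly
    matches a normalized synonym contributes all of that synonym's gene IDs."""
    toks = tokens(abstract)
    found: Set[str] = set()
    for i in range(len(toks)):
        for n in range(1, min(max_len, len(toks) - i) + 1):
            found |= lookup.get(" ".join(toks[i:i + n]), set())
    return found


def micro_prf(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (predicted, gold) ID sets,
    summing true positives, false positives and false negatives across abstracts."""
    tp = fp = fn = 0
    for predicted, gold in pairs:
        tp += len(predicted & gold)
        fp += len(predicted - gold)
        fn += len(gold - predicted)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Toy synonym list; the identifiers are hypothetical placeholders.
    synonyms = {
        "GENE:0001": ["CDC28", "cell division control protein 28"],
        "GENE:0002": ["CLN3", "DAF1"],
        "GENE:0003": ["CLN2"],
    }
    lookup, max_len = build_lookup(synonyms)
    docs = [
        ("Cdc28 associates with Cln3 (also called DAF1) in G1.", {"GENE:0001", "GENE:0002"}),
        ("CLN2 transcription peaks in late G1.", {"GENE:0001", "GENE:0003"}),
    ]
    pairs = [(gene_list(text, lookup, max_len), gold) for text, gold in docs]
    # Precision 1.0, recall 0.75 (one gold ID missed), F1 = 6/7.
    print(micro_prf(pairs))
```

Exact lookup keeps every identifier that an ambiguous name maps to, which favors recall over precision; a practical system would also need to handle spelling variants and ambiguous names, which this sketch ignores.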