New algorithm ranks scientific literature

Keeping up with current scientific literature is a daunting task, considering that hundreds to thousands of papers are published each day. Now researchers from North Carolina State University have developed a computer program to help them evaluate and rank scientific articles in their field.

The researchers use a text-mining algorithm to prioritize which research papers to read and include in their Comparative Toxicogenomics Database (CTD), a public database of manually curated and coded data from the scientific literature describing how environmental chemicals interact with genes to affect human health.

To help select the most relevant papers for inclusion in the CTD, Thomas Wiegers, a research bioinformatician at NC State and co-lead author of the report, developed a sophisticated algorithm as part of a text-mining process. The application evaluates the text from thousands of papers and assigns a relevancy score to each document.
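
The report does not spell out the scoring formula, so the following is only a minimal sketch of how a text-based relevancy score could be assigned, assuming a simple weighted-keyword scheme; the categories, weights, and toy vocabulary below are hypothetical placeholders, not CTD's actual term lists or method.

```python
# Hypothetical keyword-weighted relevancy scoring (illustrative only).
# CTD's real scorer is more sophisticated; weights and vocabulary are made up.

TERM_WEIGHTS = {
    "chemical": 2.0,     # environmental chemical names
    "gene": 2.0,         # gene/protein symbols
    "disease": 1.5,      # disease terms
    "interaction": 1.0,  # interaction verbs such as "induces" or "inhibits"
}

def relevancy_score(text: str, vocab: dict) -> float:
    """Sum the weights of recognized terms; vocab maps a lowercase term
    (e.g. "arsenic") to its category (e.g. "chemical")."""
    score = 0.0
    for token in text.lower().split():
        category = vocab.get(token.strip(".,;:()"))
        if category:
            score += TERM_WEIGHTS.get(category, 0.0)
    return score

# Toy example: a sentence mentioning a chemical, a gene, and a disease
# scores higher than one that does not.
vocab = {"arsenic": "chemical", "tp53": "gene",
         "carcinoma": "disease", "induces": "interaction"}
print(relevancy_score("Arsenic induces TP53 changes linked to carcinoma.", vocab))  # 6.5
print(relevancy_score("A review of laboratory safety procedures.", vocab))          # 0.0
```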

But how good is the algorithm at determining the best papers? To test that, the researchers text-mined 15,000 articles and sent a representative sample to their team of biocurators to read and evaluate manually, blind to the computer’s score. The biocurators concurred with the algorithm 85 percent of the time with respect to the highest-scored papers.
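
The paper reports that agreement figure directly; purely to illustrate the kind of check involved, the sketch below compares curator yes/no judgments with the algorithm's top-ranked documents. All identifiers, scores, and judgments here are invented.

```python
# Hypothetical agreement check between an algorithm's ranking and curator review.
# Document IDs, scores, and judgments are invented for illustration.

def agreement_on_top(scores: dict, curator_relevant: dict, top_n: int) -> float:
    """Fraction of the top_n highest-scored documents that curators also
    judged relevant."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_n]
    hits = sum(1 for doc_id in ranked if curator_relevant.get(doc_id, False))
    return hits / top_n

scores = {"doc1": 9.1, "doc2": 7.4, "doc3": 3.2, "doc4": 1.0}
curator = {"doc1": True, "doc2": True, "doc3": False, "doc4": False}
print(agreement_on_top(scores, curator, top_n=2))  # 1.0 for this toy sample
```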

Using the algorithm to rank papers allowed biocurators to focus on the most relevant papers, increasing productivity by 27 percent and novel data content by 100 percent.

There are always outliers in these types of experiments: occasions where the algorithm assigns a very high score to an article that a human biocurator quickly dismisses as irrelevant. When the team looked at those outliers, it was often able to see a pattern explaining why the algorithm had mistakenly identified a paper as important.

(The paper, “Text mining effectively scores and ranks the literature for improving chemical-gene-disease curation at the Comparative Toxicogenomics Database,” was published online April 17 in PLOS ONE. Co-authors are Dr. Cindy Murphy, a biocurator scientist at NC State; Dr. Carolyn Mattingly, associate professor of biology at NC State; and Drs. Robin Johnson, Jean Lay, Kelley Lennon-Hopkins, Cindy Saraceni-Richards and Daniela Sciaky from The Mount Desert Island Biological Laboratory.)

New model for speech and sound recognition

People are adept at recognizing sensations such as sounds or smells, even when many stimuli appear simultaneously. But how the brain associates a current sensory event with memory is still poorly understood. Scientists at the Bernstein Center and the Ludwig-Maximilians-Universität (LMU) München have developed a mathematical model that accurately mimics this process with little computational effort and may explain experimental findings that have so far remained unclear. (PLoS ONE, September 14, 2011)

The so-called ‘cocktail party problem’ has kept scientists busy for decades: how is it possible for the brain to filter familiar voices out of background noise? A long-standing hypothesis holds that we build a kind of sound library in the auditory cortex of the brain over the course of our lives. Professor Christian Leibold and Dr. Gonzalo Otazu of LMU Munich, both also members of the Bernstein Center Munich, now show in a new model how the brain can compare stored and perceived sounds in a particularly efficient manner.

Figuratively speaking, current models operate on the following principle: an archivist (possibly the brain region thalamus) compares the incoming sound with each individual entry in the library and receives a degree of matching for every entry. Usually, however, several entries fit similarly well, so the archivist does not know which result is actually the right one.
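
As a rough illustration of the ‘current models’ described above, the sketch below matches an incoming sound against every stored template and reports a similarity for each one. The fixed-length vector representation and cosine similarity are simplifying assumptions, not the authors' formulation.

```python
# Illustration of the exhaustive-matching scheme described above.
# Vector representation and cosine similarity are assumptions for the sketch.

import numpy as np

def match_all(sound: np.ndarray, library: dict) -> dict:
    """Return a degree of matching for every library entry."""
    matches = {}
    for name, template in library.items():
        sim = float(np.dot(sound, template) /
                    (np.linalg.norm(sound) * np.linalg.norm(template) + 1e-12))
        matches[name] = sim
    return matches

rng = np.random.default_rng(0)
library = {f"voice_{i}": rng.normal(size=64) for i in range(5)}
heard = library["voice_2"] + 0.3 * rng.normal(size=64)  # a familiar voice plus noise
# Every entry comes back with a score, whether it is relevant or not.
print(match_all(heard, library))
```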

The new model is different: the archivist still compares the sound with the library entries, but this time gets back only a few truly relevant records, together with information about how much the archived and the heard sounds differ. Large amounts of data are therefore sent back only for unknown or poorly matching inputs.
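
The published model is more involved, but the core idea — report only the strongest matches plus a residual, so that little data flows back for familiar inputs — can be sketched roughly as follows, again with made-up vectors and a simplified matching rule.

```python
# Rough sketch of "few relevant records plus a residual" (not the published model).

import numpy as np

def sparse_report(sound: np.ndarray, library: dict, k: int = 2):
    """Return the k best-matching entries with their coefficients, plus the
    residual between the heard sound and what those entries explain."""
    names = list(library)
    templates = np.stack([library[n] for n in names])             # (N, D)
    coeffs = templates @ sound / (np.sum(templates**2, axis=1) + 1e-12)
    best = np.argsort(np.abs(coeffs))[::-1][:k]                   # k strongest matches
    residual = sound - coeffs[best] @ templates[best]
    return {names[i]: float(coeffs[i]) for i in best}, residual

rng = np.random.default_rng(1)
library = {f"sound_{i}": rng.normal(size=64) for i in range(5)}
heard = library["sound_3"] + 0.5 * library["sound_0"]             # mix of two known sounds
matches, residual = sparse_report(heard, library)
print(matches)                                           # expected: sound_3 and sound_0 dominate
print(np.linalg.norm(residual), np.linalg.norm(heard))   # residual norm is small vs. the input
```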

The researchers now want to incorporate their findings into other, more biologically detailed models and, ultimately, test the approach in psychoacoustic experiments.