[info] eurekalert: 'most representative' estimators
Alejandro Dubrovsky
<alito at organicrobot.com> on
Sun Mar 2 06:02:59 UTC 2008
(
these things tend to be important. could someone that knows/understands
the details of what is actually new here care to expand?
http://www.brown.edu/Administration/News_Bureau/2007-08/07-109.html
)
Brown Mathematicians Prove New Way To Build a Better Estimate
Brown University applied mathematicians have found a new way to sift
through mountains of data and draw reliable inferences from it – a Holy
Grail in science and technology. Their pioneering work, the development
of a new class of statistical estimators, could lead to better methods
for analyzing the large data sets that are increasingly common in fields
from biology to business. Results are published in the Proceedings of
the National Academy of Sciences.
PROVIDENCE, R.I. [Brown University] — How do you sift through hundreds
of billions of bits of information and make accurate inferences from
such gargantuan sets of data? Brown University mathematician Charles
“Chip” Lawrence and graduate student Luis Carvalho have arrived at a
fresh answer with broad applications in science, technology and
business.
In new work published in the Proceedings of the National Academy of
Sciences, Lawrence and Carvalho describe a new class of statistical
estimators and prove four theorems concerning their properties. Their
work shows that these “centroid” estimators allow for better statistical
predictions – and, as a result, better ways to extract information from
the immense data sets used in computational biology, information
technology, banking and finance, medicine and engineering.
“What’s exciting about this work – what makes it every scientist’s dream
– is that it’s so fundamental,” Lawrence said. “These new estimators
have applications in biology and beyond and they advance a statistical
method that’s been around for decades.”
For more than 80 years, one of the most common methods of statistical
prediction has been maximum likelihood estimation (MLE). This method is
used to find the single most probable solution, or estimate, from a set
of data.
But new technologies that capture enormous amounts of data – human
genome sequencing, Internet transaction tracking, instruments that beam
high-resolution images from outer space – have opened opportunities to
predict discrete “high-dimensional” or “high-D” unknowns. The huge
number of combinations of these “high-D” unknowns produces enormous
statistical uncertainty. Data has outgrown data analysis.
This discrepancy creates a paradox. Instead of producing more precise
predictions about gene activity, shopping habits or the presence of
faraway stars, these large data sets are producing more unreliable
predictions, given current procedures. That’s because maximum likelihood
estimators use data to identify the single most probable solution. But
because any one solution swims in an increasingly immense sea of
alternatives, it’s not likely to be representative.
Lawrence, a professor of applied mathematics and a faculty member in the
Center for Computational Molecular Biology at Brown, first came upon
this paradox and a potential way around it while working on predicting
the structure of RNA molecules. If you want to predict the structure of
these molecules – how the molecule will look when it folds onto itself –
you’d have billions and billions of possible shapes to choose from.
“Using maximum likelihood estimation, the most likely outcome would be
very, very, very unlikely,” Lawrence said, “so we knew we needed a
better estimation method.”
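
A rough illustration of the point in that quote, not taken from the paper: assume a hypothetical model with n independent yes/no unknowns, each taking its more likely value with probability p. The function name and the numbers below are made up for the sketch. Even when every individual unknown is fairly certain, the probability of the single most probable joint configuration shrinks geometrically as n grows.

```python
# Hypothetical toy model (an assumption for illustration, not from the paper):
# n independent yes/no unknowns, each taking its more likely value with
# probability p.
def mode_probability(p: float, n: int) -> float:
    """Probability of the single most probable joint configuration."""
    return max(p, 1.0 - p) ** n


# With p = 0.9 each unknown is individually quite certain, yet the best
# joint guess becomes vanishingly improbable as n grows.
for n in (1, 10, 100, 1000):
    print(f"n={n:5d}  P(most probable configuration) = {mode_probability(0.9, n):.3g}")
```

Real high-D problems such as RNA folding have dependent unknowns, so this only sketches the scaling, not the actual numbers.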
Lawrence and Carvalho used statistical decision theory to understand the
limitations of the old procedure when faced with new “high-D” problems.
They also used it to find an estimation procedure that applies to a
broad range of statistical problems. These
“centroid” estimators identify not the single most probable solution,
but the solution that is most representative of all the data in a set.
Lawrence and Carvalho went on to prove four theorems that illustrate the
favorable properties of these estimators and show that they can be
easily computed in many important applications.
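
The release does not define "most representative". One plausible reading, which is an assumption here, is the configuration that minimizes the expected Hamming distance (number of differing positions) to a configuration drawn from the probability distribution. For unconstrained binary vectors that minimizer can be found by setting each position to 1 exactly when its marginal probability exceeds 1/2. The sketch below uses a made-up distribution over 3-bit configurations to show that this "centroid" can differ from the single most probable configuration.

```python
from itertools import product

# Hypothetical probabilities over 3-bit configurations (made-up numbers;
# configurations not listed have probability zero).
posterior = {
    (0, 0, 0): 0.28,
    (1, 1, 1): 0.26,
    (1, 1, 0): 0.26,
    (0, 1, 1): 0.20,
}
n = 3


def hamming(a, b):
    """Number of positions at which two configurations differ."""
    return sum(x != y for x, y in zip(a, b))


def expected_loss(candidate):
    """Expected Hamming distance from candidate to a random configuration."""
    return sum(p * hamming(candidate, s) for s, p in posterior.items())


# Single most probable configuration.
mode = max(posterior, key=posterior.get)

# Centroid under Hamming loss: threshold each position's marginal at 1/2.
marginals = [sum(p for s, p in posterior.items() if s[i] == 1) for i in range(n)]
centroid = tuple(int(m > 0.5) for m in marginals)

# Brute-force check over all 2**n candidates (only feasible for tiny n).
brute_force = min(product((0, 1), repeat=n), key=expected_loss)

print("mode:    ", mode, "expected loss:", round(expected_loss(mode), 2))
print("centroid:", centroid, "expected loss:", round(expected_loss(centroid), 2))
print("brute-force minimizer:", brute_force)
```

Here the most probable configuration is (0, 0, 0), but most of the probability mass sits on configurations that share the first two bits set, so (1, 1, 0) has the lower expected loss (about 1.22 versus 1.70). The thresholding step needs only the marginals, which is one way such an estimator could be cheap to compute; whether this matches the authors' construction in detail is not confirmed by the release.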
“This new procedure should benefit any field that needs to reliably make
predictions of large-scale, high-D unknowns,” Lawrence said.
The U.S. Department of Energy and the National Institutes of Health
funded the work.
Editors: Brown University has a fiber link television studio available
for domestic and international live and taped interviews and maintains
an ISDN line for radio interviews. For more information, call the Office
of Media Relations at (401) 863-2476.
######