Data drives the digital world. Much has been written about the pervasiveness of technology in the world
and the promise of big data. We’ve all heard
All speculation aside, the growing
volume of data is a fact, and one that
can’t be ignored. International Data
Corporation (IDC) estimates the
amount of data in the world will
reach 163 trillion gigabytes by
2025. Every industry, from transportation and manufacturing to healthcare, consumer products, financial
services, and research and development, is looking for new
ways to harness and use this growing
volume of data.
Scientists understand that data is
the fuel that powers insight, discovery
and innovation. The Institute of Cancer Research (ICR), for example, says
big data analytics plays an important
role in the discovery of cancer drugs.
Scientists are analyzing vast amounts of data—from patient
samples, genomic sequencing, medical images, lab results, experimental data, pharmacological data, and many other sources—to
help in their efforts.
According to Bissan Al-Lazikani, head of data science at ICR,
more data is better.
“The more data we are gathering,” she says, “the more patients
we are profiling, the smarter the computer algorithms: the better
we are becoming at discovering drugs for cancer.”
According to Illumina, a manufacturer of DNA sequencing solutions, it cost $300,000 to sequence a human genome
in 2006. Today, with the company's high-end sequencers, the cost has
dropped to $1,000, and with its new generation of machines it
could eventually drop to as little as $100.
As genomic sequencing has become faster and
more affordable, researchers are running more
sequencing operations and generating more data.
For example, the Swiss Institute of Bioinformatics
(SIB), a non-profit founded in 1998, is a leading
research organization in Switzerland. Comprising
60 bioinformatics research and services groups and
approximately 700 scientists from Swiss schools
of higher education and research institutes, the
organization is a leader in applying
computational methodologies and large-scale
data analysis to genomic, proteomic
and other bioinformatic research.
SIB supports projects from
active research teams (about
300 currently) at their six
different sequencing centers.
The organization handles
about five separate projects
in a week. Data grows
rapidly with sequencing runs
generating up to 30 terabytes
In another example,
GWDG (Gesellschaft für wissenschaftliche Datenverarbeitung
mbH Göttingen), a computing center shared by the University
of Göttingen and the Max Planck Society, has seen data volumes
steadily grow over the years. Today, the center supports some
40,000 users engaged in research and training, manages billions
of files, and stewards about 7 petabytes of data.
For research organizations, the ability to collect and analyze
more data is essential to making breakthrough discoveries. But
handling more data has its challenges.
Operating at petabyte levels
Data is not stagnant. It has a lifecycle; it grows and ages. In
addition, it must be managed. Once data is created, it must be
stored, accessed for computational analysis and collaboration,
archived for future use, and protected at every step against the
risk of loss. As the amount of scientific data at research insti-
Managing the Growth
of Scientific Data
The ability to collect and analyze more data is essential to
breakthrough discoveries, but it doesn’t come without challenges.
by Mark Pastor, Director, Product and Solution Marketing, Quantum
Data powers insight, discovery and innovation in the sciences—even
when there’s an excess of it.