FRAMINGHAM (10/06/2003) - Sarah, a 60-year-old woman in remission from colon cancer, moves from San Francisco to Boston. She celebrates reuniting with her grandkids by eating a delicious Indian dinner. But it's a little spicy. During the next few days, she suffers from a dull pain in her lower abdomen. Her son, concerned, sets up an appointment with a new internist.
After hearing Sarah's complaint and history, the doctor orders a series of tests including a CT scan. The image shows a 2-centimeter, ill-defined nodule in her pelvic area. Other than that, the scan is unremarkable.
The radiologist compares the new scan with a series of digitally stored CT scans from the office of Sarah's San Francisco-based oncologist. The internist is relieved to see that the nodule is unchanged. Furthermore, a series of gene chip analyses, which indicate which genes are over-expressed and which are under-expressed, shows that Sarah's loss-of-imprinting markers are low, suggesting that the cancer has most likely not returned. After considering an ever-larger data set, the internist sits Sarah down and makes his only suggestion: Avoid spicy foods.
A few years ago, Sarah probably would have had to endure further treatments and perhaps exploratory surgery. Now, with a rich and accessible trove of data from a variety of digital and genomic sources, health-care providers can make better decisions faster. This is good. It is not, however, the whole story.
For most of medicine's history, the average patient has generated very little data. Most patients see their doctors a couple of times per year, if that, and their diagnoses and treatment regimens are simply written down. A typical patient file consists of about 1.5 pounds of paper. This includes not just clinical data but also insurance, pharmaceutical and other administrative information. Except for people who end up in intensive care and are hooked up to monitors, individual medical records still consist of relatively small data sets. Now, however, new technologies are generating massive and increasingly personalized health-care data sets.
Simply sequencing one version of the human genome required that a new company build the world's largest private computer and maximize the parallel processing capacity of Hewlett-Packard's Alpha chips. Genomic code is four letters in two dimensions. Proteins have a 21-letter code that has to be modeled in three dimensions. Never mind imaging or modeling whole organs.
The volume of data flowing through a doctor's office will explode in the not-too-distant future, as will the need to coordinate treatments and specialists. Hospitals, pharmaceutical companies, doctors' practices and all types of IT companies will increasingly seek life-science-literate CIOs.
And once we can modify the genomic code for medical purposes, we can apply the same technologies to various other industries including chemicals, cosmetics, food, drinks, energy, insurance, IT and military applications. Bioinformatic skills will be required across major swaths of the global economy.
We get a sense of how this could occur by looking at the expansion of data in historic terms. A project at the University of California at Berkeley attempted to measure the impact of going digital on the volume of data being generated by humans. It estimated that in 1999, the total of all human knowledge, music, images and words amounted to about 12 exabytes.
About 1.5 of those exabytes were generated during 1999 alone.
Some have argued that the study overstates the speed with which data accumulates. But even if that were true, the volume of bytes is growing so quickly that it would take only a few more years to achieve the data volume predicted by the study. As the cost of computing drops to one-trillionth of what it was in 1940, and as ever more powerful machines come to market, we could soon record everything we read, hear or see throughout a lifetime.
The massive expansion of databases is not word-driven. Storing and transmitting paragraphs, even books, is a relatively low-bandwidth affair. A small book can be stored on one diskette. Music represents a leap in storage and transmission requirements. You can see the impact in the rise of music-swapping programs. Few colleges were prepared for Napster; by the time 1 million downloads had occurred nationally, some colleges had used up all their bandwidth.
High-resolution photographs are another order of magnitude. They say a picture is worth a thousand words. Actually, a 4MB picture can be worth 400,000 words.
This is a significant change in terms of data flow. To put this new data volume in context, according to the Berkeley study, in 1999, the upper estimate of books being digitized annually was 8 terabytes, periodicals 12 terabytes, newspapers 25 terabytes, and all office documents 195 terabytes. This is trivial compared to the upper estimate of photographs; if digitized, the 80 billion photographs taken in 2000 would require about 400 petabytes of storage space. That is around 1,700 times the storage required for all text generated during the same period.
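For readers who want to check the arithmetic, the comparison above can be reproduced in a few lines. This is only a back-of-the-envelope sketch: the figures are the ones quoted in this column, the function and variable names are invented for illustration, and it assumes decimal units (1 petabyte = 1,000 terabytes), which the Berkeley study may or may not have used.

```python
# Back-of-the-envelope check of the storage figures quoted above.
# Assumption: decimal units (1 PB = 1,000 TB, 1 TB = 10**12 bytes).

TB = 10**12
PB = 10**15

# Upper estimates for each text category, in terabytes (as quoted in the column).
text_tb = {
    "books": 8,
    "periodicals": 12,
    "newspapers": 25,
    "office documents": 195,
}

photo_storage_pb = 400      # estimated storage for all photographs, in petabytes
photo_count = 80 * 10**9    # 80 billion photographs


def text_total_tb(categories):
    """Sum the per-category text estimates, in terabytes."""
    return sum(categories.values())


def photo_to_text_ratio(photo_pb, categories):
    """How many times larger the photo estimate is than all text combined."""
    return (photo_pb * PB) / (text_total_tb(categories) * TB)


def bytes_per_photo(photo_pb, count):
    """Average size implied for one digitized photograph, in bytes."""
    return (photo_pb * PB) / count


if __name__ == "__main__":
    ratio = photo_to_text_ratio(photo_storage_pb, text_tb)
    per_photo_mb = bytes_per_photo(photo_storage_pb, photo_count) / 10**6
    print(f"All text: {text_total_tb(text_tb)} TB")
    print(f"Photos vs. text: {ratio:,.0f}x")
    print(f"Implied size per photo: {per_photo_mb:.0f} MB")
```

Under these assumptions the four text categories add up to 240 terabytes, the photographs come to roughly 1,667 times that (rounded to 1,700 above), and each photograph averages about 5MB, in line with the 4MB picture mentioned earlier.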
The life sciences will top that. Until very recently, there was little need to digitize and manage a lot of bits per patient. High-density, data-intensive applications, such as X-ray images, were printed rather than digitized. Unlike Sarah, patients had to seek and then carry the images physically from place to place.
But as the number of specialties grows, and as people increasingly move and change health plans, it becomes harder to know a patient's history. A typical hospital patient now has 11 medical charts and takes 14 medicines. Keeping track of even this relatively trivial volume of data is overwhelming doctors. Medical errors are proliferating and killing thousands.
As data from genomic and proteomic applications migrate from researchers' lab benches and become standard patient treatment protocols, we can expect to see much more tailored medical diagnosis, prescription and treatment profiles. Silicon chips covered with little strands of DNA sometimes are able to show which genes are turned on or off during various normal and disease states. Their use is growing 65 percent per year. Given that many experiments have to be continuously repeated because findings are dependent on age, time and environment, and given that a single experiment can generate a half billion data points, a great deal of data can accumulate very quickly and overwhelm researchers if they lack strong IT-analysis support.
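To get a feel for what 65 percent annual growth means, a simple compounding calculation helps. The sketch below is illustrative only: it assumes the rate stays constant from year to year, which nothing here guarantees, and the function names are invented for the example.

```python
import math

# Assumption: the 65 percent annual growth rate quoted above holds steady.
ANNUAL_GROWTH = 0.65


def growth_factor(years, rate=ANNUAL_GROWTH):
    """Multiple of today's usage after the given number of years."""
    return (1 + rate) ** years


def doubling_time_years(rate=ANNUAL_GROWTH):
    """Years needed for usage to double at a constant annual rate."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    print(f"Doubling time: {doubling_time_years():.1f} years")
    print(f"Growth over 5 years: {growth_factor(5):.1f}x")
```

If the rate held, usage would double roughly every 1.4 years and grow about twelvefold in five years.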
Hospitals, pharmacists, doctors and patients will have to manage and triage ever-growing volumes of complex data in cost-effective ways. As profiles become more predictive and personalized, privacy will become a key issue. We will face complex questions as large databases go online and become more accessible. Who should have access to data about potentially disabling diseases, ones that we may contract at some time in the future, but that have not yet appeared within our bodies? For instance, if you carry a mutation in the BRCA-1 or BRCA-2 gene, you may be more predisposed than someone else to breast cancer. Do you want your employer and your insurance company to know that? Probably not. So who should manage and have access to your data? And what do you do if your risk profile changes significantly because of tests you have already taken, whose results were not seen as key predictors under the genetic knowledge of the time?
Until quite recently, life sciences had not been large-scale drivers of global data generation and storage. However, several computer, Internet, software and storage companies are beginning to see life sciences as a key opportunity for growth. Those who get literate in life code are likely to have many more job opportunities, and they may be equipped to make more intelligent decisions in areas as disparate as portfolio management, personal health, insurance and computer networking possibilities.
Are you ready?
What He Thinks About: He is recognized as one of the world's leading authorities on the economic and political impacts of life sciences.
Where He Thinks: He is currently chairman and CEO of Biotechonomy, a company that funds enabling technologies in the areas of genomics, proteomics and medical devices.
What He's Written: As the Future Catches You (2001).
Bio Bit: He was the founding director of the Harvard Business School Life Sciences Project.