When Computers Meet Cell Biology

The sequencing of the human genome has given rise to an enormously important new branch of the biotechnological sciences. It is most commonly called bioinformatics or computational biology.

You may have read about the recent discovery of a new and radically more effective mosquito repellent. Based on molecules found in black pepper, it was not discovered using traditional laboratory methods. Instead, it came about through computer simulations based on knowledge of mosquito cell biology. This is just the tip of the bioinformatics iceberg.

Until recently, cell biology has been something of a “black box.” We could observe how cells functioned, but had little insight into the actual mechanisms. Now, though, scientists are learning how cells work on the molecular level.

Using mathematical models and new technologies for detecting molecular processes, researchers are extracting raw data from DNA and modeling the ways genes work and interact. To understand this field, you should view your own genome as a giant software program for manufacturing proteins.
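To make the software analogy concrete, here is a minimal sketch in Python of how a stretch of DNA can be read three letters at a time, with each three-letter "codon" standing for one amino acid in a protein. It is an illustration only: the function name translate and the sample sequence are hypothetical, the table holds just a handful of entries from the standard genetic code, and the intermediate step in which the cell copies DNA into RNA is skipped.

```python
# Partial codon table from the standard genetic code.
# Only the few codons needed for this illustration are included.
CODON_TABLE = {
    "ATG": "M",  # methionine, the usual "start" signal
    "GCC": "A",  # alanine
    "AAA": "K",  # lysine
    "TGG": "W",  # tryptophan
    "TAA": "*",  # stop
    "TAG": "*",  # stop
    "TGA": "*",  # stop
}


def translate(dna: str) -> str:
    """Read DNA three letters at a time and return one-letter amino acid codes.

    Hypothetical helper for illustration; codons missing from the
    partial table are shown as "?".
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        amino_acid = CODON_TABLE.get(codon, "?")
        if amino_acid == "*":  # a stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up sample sequence: start, three amino acids, then stop.
print(translate("ATGGCCAAATGGTAA"))
```

A real genome is billions of letters long and real tools handle far more than this, but the basic idea of reading instructions in fixed-size chunks is the same.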

The process of unraveling and decoding the DNA software involves massive amounts of data collection. Once the data are collected, correlation and other forms of computer analysis are performed on them to figure out cause and effect. How big is this challenge?

Consider this: Each human cell contains about 3 billion DNA base pairs, or roughly 3 gigabytes (3 billion bytes) of pure data and instructions if each base is counted as one byte. If this information were written in book form, it would require 5,000 volumes, each 300 pages long. That’s 120 times larger than the kernel of the Windows operating system, which is about 25 megabytes of code. These data reside, of course, in each cell’s pinpoint-sized nucleus. The human body, in turn, has approximately 100 trillion of these 3-gig cells.
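The arithmetic behind these comparisons can be checked in a few lines of Python. This is a rough sketch that takes the figures quoted above at face value and assumes a genome of about 3 billion DNA letters. The one-byte-per-letter count is what yields 3 gigabytes; because DNA has only four letters, a tighter encoding of two bits per letter would need a quarter of that. The variable names are illustrative.

```python
# Figures quoted in the article, taken at face value.
GENOME_LETTERS = 3_000_000_000   # about 3 billion DNA letters (assumed)
KERNEL_BYTES = 25_000_000        # "about 25 megabytes of code"
VOLUMES = 5_000
PAGES_PER_VOLUME = 300

# One byte per letter gives the article's 3 gigabytes.
bytes_one_per_letter = GENOME_LETTERS

# DNA has four letters (A, C, G, T), so two bits per letter is enough.
bytes_two_bits_per_letter = GENOME_LETTERS * 2 // 8

# How many Windows kernels fit in 3 gigabytes?
kernel_ratio = bytes_one_per_letter // KERNEL_BYTES

# How many letters would each printed page have to hold?
letters_per_page = GENOME_LETTERS // (VOLUMES * PAGES_PER_VOLUME)

print(f"One byte per letter: {bytes_one_per_letter:,} bytes")
print(f"Two bits per letter: {bytes_two_bits_per_letter:,} bytes")
print(f"Ratio to a 25 MB kernel: {kernel_ratio}")
print(f"Letters per printed page: {letters_per_page:,}")
```

The ratio comes out to 120 and the page count to 2,000 letters per page, so the comparisons are consistent with each other under the one-byte-per-letter assumption.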

Add to this complexity about 5,000 different proteins expressed by each cell. Different cells, however, express different proteins. These proteins, the proteome, behave as computer commands and serve to communicate between cells.

The decoding of all these systems is, obviously, a huge computational challenge. It has only just begun and it would not be possible, in fact, without recent advances in computer technologies. As more powerful computing comes online, the pace of bioinformatics discovery will accelerate. Quantum computing, because it is particularly suited to sorting out cell biology, will enable a “quantum” leap in understanding.

Today, there are three main areas of research in computational biology. These are genome analysis, protein structure prediction and drug design.

  • Genome analysis is, as you would expect, the statistical analysis of genes. As more and more DNA is analyzed in conjunction with individual medical information, more is learned about how genes relate to health. Among other goals, scientists are looking for the genes that cause or contribute to diseases.
  • Protein structure predictions are based on computer models that integrate information about the function of these proteins. This is an immense task, as there are tens of thousands of proteins. Ultimately, understanding the proteome will enable truly personalized medicine, with minimal side effects for patients.
  • Drug design builds on both. With the knowledge gained from understanding the genome and proteome, computer models of target proteins can be created. Using these virtual proteins, drugs can be designed and tested in silico (in computer simulation) before being tested in the lab.

The development of these virtual molecules, the heart of computational biology, is ending the practice of shooting blindfolded while hunting for drug candidates. Instead of randomly testing different drug candidates and analyzing the results, the field of candidates can be significantly narrowed using simulations. This radically improves the “hit rate,” increasing the speed of drug discovery and lowering costs.

Moreover, computer cell simulations improve as additional data are collected and integrated back into the models. Significant advances have already taken place in this transformational space. Medicine, incidentally, is only one area that is benefiting from bioinformatics. Many of the benefits are appearing in agriculture. The genetic engineering of microorganisms is another area of enormous potential.

This new science of building and experimenting on virtual molecules may be the most important new experimental tool since the scientific method was codified by John Stuart Mill in the 1840s. As long as Moore’s law (the exponentially increasing power and cost-effectiveness of computers) continues to hold, the power and importance of bioinformatics will keep growing.

Regards,

Patrick Cox
for The Daily Reckoning
