Computational Biology for Discovering Protein Function – as of 2008

“The vast majority of known proteins have not yet been characterized experimentally, and there is very little that is known about their function.” [1]

A paper just published (Nov 2008) in PLoS Computational Biology describes a fundamental problem in protein biology: figuring out what a protein actually does. It is “the dogma” that the DNA sequence is transcribed and translated into a protein sequence. Related to this is the rule of thumb that a protein’s sequence dictates its structure (physical shape), and its structure dictates its function. Want something biological to work? Find the protein shape that, for example, physically fits the drug.
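To make the first step of “the dogma” concrete, here is a minimal Python sketch of transcription and translation using the standard genetic code. It assumes a clean coding-strand DNA sequence (only A, C, G, T) that starts in frame; the function names `transcribe` and `translate` are my own illustration, not anything from [1]. Notice what it can’t do: nothing here tells you the protein’s shape or function, which is exactly the hard part.

```python
# Standard genetic code (NCBI translation table 1). Codons are ordered by
# first, second, then third base, each cycling through U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA matching a DNA coding strand (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str, to_stop: bool = True) -> str:
    """Translate mRNA codon by codon; '*' marks a stop codon.

    Assumes the sequence is in frame and contains only A, C, G, U
    (any other codon raises KeyError). A trailing partial codon is ignored.
    """
    protein = []
    for pos in range(0, len(mrna) - len(mrna) % 3, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*" and to_stop:
            break  # stop codon ends the protein
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
    print(translate(mrna))  # MAIVMGR
```

Going from sequence to protein is a lookup table. Going from protein to function is where the trouble starts.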

Except there are big problems here: the structure can’t easily be predicted from the sequence; similar sequences can have very different structures (homologous vs. non-homologous); some proteins have multiple functions in different environments even with the same structure; similar structures can have very different functions; and different functions often hinge on very small changes in structure! Meanwhile, the catalog of known proteins grows every day, and the labs can’t keep up with the experiments needed to determine each protein’s function. So what’s a bioengineer to do?

The article [1], “The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function,” is a good summary of the issue. Some highlights:

“the most common way to infer homology is by detecting sequence similarity”
(Sequence similarity is usually measured by aligning the two sequences.)
“homology (both orthology and paralogy) does not guarantee conservation of function”
“databases contain incorrect annotations, mostly caused by erroneous automatic annotation transfer by homology”
“homology between two proteins does not guarantee that they have the same function, not even when sequence similarity is very high”
“a relatively small sequence signature may suffice to conserve the function of a protein even if the rest of the protein has changed considerably”
“Residues that have similar function in different proteins are likely to possess similar physicochemical characteristics.”
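Since these quotes lean so heavily on “sequence similarity,” here is a minimal sketch of what that usually means in practice: a global alignment (the Needleman-Wunsch dynamic-programming algorithm) followed by a percent-identity score. The scoring values (match +1, mismatch -1, gap -2) are a toy assumption of mine; real searches use tools like BLAST with substitution matrices such as BLOSUM62. And per the quotes, a high number here is evidence of homology, not proof of shared function.

```python
# Toy scoring scheme (an assumption for illustration only).
MATCH, MISMATCH, GAP = 1, -1, -2


def _pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> tuple[str, str]:
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical non-gap columns as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    top, bottom = global_align("ACGT", "AGT")
    print(top)                            # ACGT
    print(bottom)                         # A-GT
    print(percent_identity(top, bottom))  # 75.0
```

Other ways of counting identity (e.g. dividing by the shorter sequence’s length instead of the alignment length) give different numbers for the same pair, which is one more reason to treat these scores as hints.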

Biology research also can’t yet explain why nature uses some proteins but not others, even though experiments have shown that artificially created (synthesized) proteins can be substituted successfully into natural processes [2]. So if no rules dictate the mapping between protein and function, how can functions or structures be reliably predicted?

The current shotgun approach is to combine several measures of similarity (sequence, structure, binding sites). Although it isn’t mentioned in [1], I would have expected electrostatic mapping to be used as well, though perhaps it hasn’t yet produced useful results.
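Here is a rough sketch of that shotgun idea: score a pair of proteins several ways and blend the results. Everything here is my own assumption for illustration rather than the method in [1]. The two sequences are assumed to be already aligned to equal length (`-` marks a gap); the four-way physicochemical grouping is one coarse scheme among many (it echoes the quote that functionally similar residues tend to share physicochemical characteristics); binding sites are given as hypothetical sets of alignment-column indices; and the weights are arbitrary. Structural similarity is left out because it needs 3D coordinates.

```python
# Hypothetical coarse grouping of the 20 amino acids by side-chain chemistry.
PHYSCHEM_CLASS = {
    residue: group
    for group, residues in {
        "hydrophobic": "AVLIMFWC",
        "polar": "STNQYGP",
        "positive": "KRH",
        "negative": "DE",
    }.items()
    for residue in residues
}


def _paired_columns(a: str, b: str) -> list[tuple[str, str]]:
    """Alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]


def sequence_identity(a: str, b: str) -> float:
    """Fraction of gap-free columns with identical residues."""
    cols = _paired_columns(a, b)
    return sum(x == y for x, y in cols) / len(cols) if cols else 0.0


def physchem_similarity(a: str, b: str) -> float:
    """Fraction of gap-free columns whose residues share a chemistry class."""
    cols = _paired_columns(a, b)
    same = sum(PHYSCHEM_CLASS[x] == PHYSCHEM_CLASS[y] for x, y in cols)
    return same / len(cols) if cols else 0.0


def binding_site_overlap(sites_a: set[int], sites_b: set[int]) -> float:
    """Jaccard index of two sets of binding-site alignment columns."""
    union = sites_a | sites_b
    return len(sites_a & sites_b) / len(union) if union else 0.0


def consensus_score(a: str, b: str, sites_a: set[int], sites_b: set[int],
                    weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted average of the three similarity measures (weights are arbitrary)."""
    scores = (
        sequence_identity(a, b),
        physchem_similarity(a, b),
        binding_site_overlap(sites_a, sites_b),
    )
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    a, b = "MKVLDE", "MRILDQ"
    print(sequence_identity(a, b))                          # 0.5
    print(round(consensus_score(a, b, {1, 4}, {1, 4, 5}), 2))  # 0.65
```

In this toy pair only half the residues are identical, but K/R and V/I are chemically conservative swaps, so the physicochemical term (5/6) is more forgiving than raw identity. How to weight such measures against each other is exactly the open question.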

[1] Punta M, Ofran Y (2008) The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function. PLoS Computational Biology 4(10): e1000160. doi:10.1371/journal.pcbi.1000160

[2] Chiarabelli C, De Lucrezia D (2007) Question 3: The Worlds of the Prebiotic and Never Born Proteins. Origins of Life and Evolution of Biospheres 37: 357-361. See also Pier Luigi Luisi, The Emergence of Life: From Chemical Origins to Synthetic Biology.

88 Proof Synth Bio Blog

Genetically Engineered Organisms, Systems Biology, and Synthetic Biology from an Engineer's Viewpoint