As a specialized field, bioinformatics is rather young, and it can be difficult to find universities that teach it. The term can refer to many different kinds of tasks: using programs and data without any computer science knowledge, implementing database or web software, writing data conversion programs that translate file formats between database storage methods, and writing algorithms for modeling and visualizing research problems. Most of this work is best described as “computational biology”.
In the context of Perl (a scripting language long used for text processing and server-side web programming), bioinformatics means processing text data retrieved from biological databases.
The book Beginning Perl for Bioinformatics, by James Tisdall, teaches introductory software techniques in Perl, with a very brief biology review. For biologists who have rarely programmed and need a first language, or who need to learn Perl specifically, this is a good place to start. Technologists should note the copyright date on the book to judge how dated the information may be; since bioinformatics is still a young field, standards and technology are evolving rapidly.
Tisdall: “A large part of what you, the Perl bioinformatics programmer, will spend your time doing amounts to variations on the same theme as Examples 4-1 and 4-2. You’ll get some data, be it DNA, proteins, GenBank entries, or what have you; you’ll manipulate the data; and you’ll print out some results.” (Chapter 4)
For software engineers or computer programmers, biology is also a completely new realm that is tough to get a handle on, and it has its own language. Biology as a field (at least to me) has not yet settled whether it is a “soft, life science” or an engineering science. For example, for a software engineer, the most basic software question is, “I need to write a look-up table for these elements; what are all the possible strings for the field values?” Yet this simple question can be very difficult to answer by consulting a biology textbook. It is important to keep in mind that data manipulation for biology can involve massive amounts of information, in other words very, very large strings; the strings represent DNA sequences, which in practical usage may range from 10k to 100k characters.
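As a rough sketch of the kind of look-up table described above, assume the field is a single DNA base and that only the four standard letters A, C, G, and T are allowed. That is a simplification: real data files may also contain other symbols, such as N for an unknown base. The names %base_name and unknown_symbols are made up for this illustration and do not come from the book.

```perl
use strict;
use warnings;

# Hypothetical look-up table: one-letter DNA codes mapped to base names.
# Assumption: only the four standard bases count as valid field values.
my %base_name = (
    A => 'adenine',
    C => 'cytosine',
    G => 'guanine',
    T => 'thymine',
);

# Hypothetical helper: return each character of the sequence that is not
# in the table, once each, in order of first appearance.
sub unknown_symbols {
    my ($sequence) = @_;
    my %seen;
    return grep { !exists $base_name{$_} && !$seen{$_}++ } split //, uc $sequence;
}

my @bad = unknown_symbols('ACGTNNACGX');
print @bad ? "Unknown symbols: @bad\n" : "All symbols recognized\n";
```

Running a check like this over a sample of the real data is one practical way to discover the actual set of field values when the textbook does not list them.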
Perl Bioinformatics Introductory Examples
The author poses the question that motivates the examples:
Tisdall: “How do you start from scratch and come up with a program that counts the regulatory elements in some DNA? Read on.”
Chapter 4 contains the first simple Perl examples, such as converting a DNA sequence to the corresponding RNA sequence. In biology, DNA is written with the letters A, T, G, and C (abbreviating the names of the bases adenine, thymine, guanine, and cytosine), whereas RNA uses U (uracil) instead of T. A simple substitution on the string provides the answer: s/T/U/g;
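Wrapped into a small script, the substitution might look like the following. This is a minimal sketch rather than the book's own listing: the helper name dna_to_rna and the sample sequence are made up for illustration, and the input is assumed to be upper-case.

```perl
use strict;
use warnings;

# Hypothetical helper: transcribe a DNA string to RNA by replacing
# every T with U. Assumes the sequence is upper-case.
sub dna_to_rna {
    my ($dna) = @_;
    (my $rna = $dna) =~ s/T/U/g;   # copy first so the caller's string is unchanged
    return $rna;
}

my $dna = 'GATTACA';               # made-up sample sequence
print "DNA: $dna\n";
print "RNA: ", dna_to_rna($dna), "\n";   # prints "RNA: GAUUACA"
```

If the data may contain lower-case letters, the transliteration operator tr/Tt/Uu/ is one way to handle both cases in a single pass.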