Month: November 2008

Average Americans are Scared of “Synthetic Biology”

Posted by – November 20, 2008

Yes, believe it, non-synthetic biologists have poor, even fearful, associations when synthetic biology is described to them:

Q: How do the descriptions of these technologies [synthetic biology] make you feel?

Female Respondent: I really thought of sci-fi movies, where, um, something is created in a laboratory, and it always seems great in the beginning, um, but, down the line, something goes wrong because they didn’t think about this particular situation or things turning this way.

Male Respondent: The “Jurassic Park” movie came to mind.

Female Respondent: It’s scary, why do we need to have new organisms? Why do we need to have, you know, you know, genetic engineering? Does it really help with anything? It’s really, it’s not going to help a common person like us. I don’t think, it’s not going to be for helping any of us.

Watch the video for yourself — promise, though, that you won’t throw your mouse at your screen:
Nanotech and Synbio: Americans Don’t Know What’s Coming: “This survey was informed by two focus groups (video – focus groups) conducted in August [2008] in suburban Baltimore [by The Project on Emerging Nanotechnologies Synbio Poll]. This is the first time—to the pollsters’ knowledge—that synthetic biology has been the subject of a representative national telephone survey.”

One of the men states he’s a biologist, and later says, “Who’s playing god here? Who are we as humans to think we can design or redesign life? It’s nice to be able to do it but is it right?”

While watching the video, keep in mind the benefits and limitations of focus groups (wikipedia: Focus groups).

Genetically Engineer Bacteria and/or Yeast using Sound (Ultrasound, Sonoporation)

Posted by – November 20, 2008

Almost everyone in the BioBricks realm seems to use a standard method for modifying their organisms: chemical transformation. Yet there is another method which is very promising.

In chemical transformation [3], some standard bacteria are grown, purified, and mixed with chemicals which cut open the bacteria; the new DNA plasmid is added and flows through the cut into the bacteria, creating modified bacteria; everything is mixed with some more chemicals, allowed to heal & grow, and purified. (My naive translation of the process)

Note all the chemicals used? Chemicals can be expensive. And the amount of modified bacteria which results from this process is pretty low.

There’s an alternative method used more for yeast than bacteria: voltage-based transformation, electroporation [1]. With electroporation, some standard bacteria are grown and mixed with some simple chemicals, the new DNA plasmid is added, everything is given a quick high-voltage zap (like lightning) which cuts open the bacteria, the new DNA plasmid flows through the cut into the bacteria, and the bacteria are allowed to heal & grow, then purified.  (Again, my naive translation of the process)

This eliminates some chemicals, although the process still requires some specialized equipment which can be troublesome (and expensive) — the voltage can be as high as 5 kV at 20 mA. (As high as the internal components of a CRT television, which, if accidentally touched, can be easily fatal.)

There’s another method, though, that I haven’t seen mentioned: sonic transformation, sonoporation [2]. In sonic transformation, some standard bacteria are grown, some chemicals are added (optionally producing small bubbles), the new DNA plasmid is added, everything is given a loud blast of ultrasound (for example, at 40 kHz) which cuts open the bacteria, the new DNA plasmid flows through the cut into the bacteria, and the bacteria are allowed to heal & grow, then purified.

In the research cited in [2], sonoporation has been shown to be much more effective at modifying bacteria than either chemical transformation or electroporation; plus, this is done without the expensive chemicals necessary for chemical transformation and without the high-voltage equipment necessary for electroporation.


Skunkworks Bioengineering — Prerequisites to Success?

Posted by – November 13, 2008

“Despite all the support and money evident in the projects, there is absolutely no reason this work could not be done in a garage. And all of the parts for these projects are now available from the Registry.” Rob Carlson, iGEM 2008: Surprise — The Future is Here Already, Nov 2008.

The question which should be posed is:

  • What does it really take to actually do this in a garage?

Of course I’m interested in the answer.  I actually want to do this in my garage.

(Let’s ignore for a moment the fact that many of the iGEM competition projects don’t generate experimental results due to lack of time in the schedule, so actual project results don’t mirror the project prospectus.)

Here is my short list of what is required:

  • Education (all at university level)
  • Experience
    • 1 year of industry or grad-level engineering lab research & design
    • 1 year of wet lab in synthesis
    • 2 more years of wet lab in synthesis if it’s desired to have a high probability of success on the project (see my SB4.0 notes for where this came from)
  • Equipment
    • Most lab equipment is generally unnecessary, since significant work can be outsourced.
    • Thermocycler
    • Incubator
    • Centrifuge
    • Glassware
    • Example setup: See Making a Biological Counter, Katherine Aull, 2008 (Home bio-lab created for under $500.)
    • Laptop or desktop computer
    • Internet connection
  • Capital
    • About $10k to $20k cash (?) to throw at a problem for outsourced labor, materials, and equipment (this cost decreases on a yearly basis).
  • Time (Work effort)
    • Depends on experience, on the scope of the problem, on project feasibility — of course.
    • 4 to 7 man-months to either obtain a working prototype or scrap the project.

Although some student members of iGEM teams are random majors such as economics or music, somehow I’m not sure they qualify towards the “anyone can do this” mantra.  Of the iGEM competition teams who placed well for their work, all of the members were 3rd year or 4th year undergrads or higher.  The issue isn’t the equipment or ability to outsource — it’s the human capital, the mind-matter, that counts: education and experience.  (Which, in the “I want to DIY my Bio!” crowd, is a rare find.)

With all that covered, it seems anyone can have their very own glowing bacteria.

“Biology is hard, and expensive, and most people trained enough to make a go of it have a lab already — one that pays them to work.”   — Katherine Aull (see above ref.)

Modifying Yeast for Drug Production in Beer – BioBeer

Posted by – November 13, 2008

How synthetic biology gets done in iGEM competition:


Jam08 Live: Rice – BioBeer from mac cowell on Vimeo. [1]

Before getting too excited though, keep in mind:

  • The experiment hasn’t been verified to work. The yeast “seems to be consuming some intermediate products”; however, the drug production hasn’t been verified.
  • The benefits of resveratrol may be dramatically overstated. It may take very large quantities of resveratrol to have any health benefits [2].
  • The public-at-large has responded very enthusiastically to this idea (same with the modified yeast for Bio Yogurt), which may signal the tempering of the typical U.S. “No GMO!” paranoia. Random people have proclaimed: “I want to drink this beer!”, casting aside concerns of consuming genetically engineered products.
  • The most remarkable health benefits in both wine and beer may be due to the alcohol (reducing psychological stress); it seems no one (?) has a good study on this because no one studies non-alcoholic wine or non-alcoholic beer.
[1] Jam08 Live: Rice – BioBeer from mac cowell on Vimeo. Filmed by http://www.vimeo.com/macowel
[2] Beer: The Best Beverage in the World. Charlie Bamforth, Ph.D., D.Sc. of University of California, Davis, at PARC Forum. March 22, 2007. Watch the Video as wmv  Charlie Bamforth is Fellow of the Institute of Brewing & Distilling and Fellow of the Institute of Biology, Editor in Chief of the Journal of the American Society of Brewing Chemists and has published innumerable papers, articles and books on beer and brewing.

Computational Biology for Discovering Protein Function – as of 2008

Posted by – November 12, 2008

“The vast majority of known proteins have not yet been characterized experimentally, and there is very little that is known about their function.” [1]

A paper just published (Nov 2008) in PLoS Computational Biology describes the fundamental problem of proteins in biology.  It is “the dogma” that the DNA sequence is transcribed and translated to protein sequence; related to this, the protein sequence dictates a protein structure, and the protein’s structure (physical shape) dictates the protein’s function.  Want something biological to work?  Find the right protein shape that, for example, physically fits to the drug.

Except there are big problems here:  the structure can’t be easily predicted from the sequence, similar sequences can have vastly different structures (homologous -vs- non-homologous), some proteins have multiple functions in different environments even with the same structure, similar structures can have vastly different functions, and the different functions are often related to very small changes in structure!  Meanwhile, the catalog of proteins is growing every day, and the labs can’t keep up with experiments which highlight a protein’s function. So what’s a bioengineer to do?

The article, The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function, is a good summary of the issue.

  1. “the most common way to infer homology is by detecting sequence similarity”
  2. Sequence similarity is usually done with sequence alignment.
  3. “homology (both orthology and paralogy) does not guarantee conservation of function”
  4. “databases contain incorrect annotations, mostly caused by erroneous automatic annotation transfer by homology”
  5. “homology between two proteins does not guarantee that they have the same function, not even when sequence similarity is very high”
  6. “a relatively small sequence signature may suffice to conserve the function of a protein even if the rest of the protein has changed considerably”
  7. “Residues that have similar function in different proteins are likely to possess similar physicochemical characteristics.”
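To make items 1 and 2 a bit more concrete, here is a minimal Perl sketch of the simplest similarity measure: percent identity between two sequences that have already been aligned. This is my own illustration and not something taken from [1]; the subroutine name percent_identity and the toy sequences are hypothetical, and real tools score alignments with substitution matrices rather than plain identity.

```perl
use strict;
use warnings;

# Percent identity between two pre-aligned sequences of equal length.
# Assumption for this sketch: gap characters ('-') count as mismatches.
sub percent_identity {
    my ($seq_a, $seq_b) = @_;
    die "sequences must be aligned to the same length\n"
        unless length($seq_a) == length($seq_b);
    return 0 if length($seq_a) == 0;

    my $matches = 0;
    for my $i (0 .. length($seq_a) - 1) {
        my $res_a = substr($seq_a, $i, 1);
        my $res_b = substr($seq_b, $i, 1);
        $matches++ if $res_a eq $res_b && $res_a ne '-';
    }
    return 100 * $matches / length($seq_a);
}

# Hypothetical toy alignment, not real protein data.
my $identity = percent_identity('MKTAYIAK-QR', 'MKTAHIAKEQR');
printf "%.1f%% identical\n", $identity;    # prints "81.8% identical"
```

A high score here only says the two strings look alike. As items 3 and 5 warn, it says nothing certain about whether the two proteins share a function.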

Biology research also currently can’t figure out why nature has employed some proteins but not others, even when it has been experimentally verified that artificially created (synthesized) proteins can be substituted into natural processes successfully [2].  So if no rules dictate the mapping between protein and function, then how can functions or structures be reliably predicted?

The current shotgun method is to use multiple methods of determining similarity (sequence, structure, binding sites).  Although not mentioned in [1], I assumed electrostatic mapping might also be used, though maybe this hasn’t provided useful results.

[1] Punta M, Ofran Y 2008 The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function. PLoS Computational Biology 4(10): e1000160 doi:10.1371/journal.pcbi.1000160
[2] Chiarabelli, C.; De Lucrezia, D., Question 3: The Worlds of the Prebiotic and Never Born Proteins, Origins of Life and Evolution of Biospheres 2007, 37, 357-361.  See also, The Emergence of Life: From Chemical Origins to Synthetic Biology by Pier Luigi Luisi.

Share-Alike Genetic Engineering Intellectual Property Licenses

Posted by – November 9, 2008

A draft legal license for BioBricks was created early in 2008, though as far as I know, it has not been “tested” by industry use of the intellectual property (anyone know?).  Surprisingly, to me, the draft BioBrick license doesn’t contain any liability statements.  The BioBrick license attempts to solidify the “open source”ness of biological components.

Compare the BioBrick license to the original open source software license from MIT, below.

MIT License for Software (circa 1992?)

Copyright (c) [year] [copyright holders]

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

The BioBricks license is simple and mandates only the sharing intent of the license.  Whereas the original MIT and GNU Copyleft licenses contain a significant statement of liability-reduction (which, as far as I know, hasn’t actually been tested in court, though is generally accepted), the BioBricks discussions don’t seem to mention liability at all.

Draft of the BioBricks Legal Scheme (January 2008)

1. You are free to modify, improve, and use all BioBrick parts, in systems with other BioBricks parts or non-BioBrick genetic material.
2. If you release a product, commercially or otherwise, that contains BioBrick parts or was produced using BioBrick parts, then you must make freely available the information about all BioBrick parts used in the product, or in producing the product, both for preexisting BioBrick parts and any new or improved BioBrick parts. You do not need to release information about any non-BioBrick material used in the system.
3. By using BioBrick parts, you agree to not encumber the use of BioBrick parts, individually or in combination, by others.

The BioBrick license seems similar to the Creative Commons Share-Alike license.  The legal scheme is based on the latest legal meetings organized by the BioBrick Foundation:

BioBrick Foundation / Samuelson Clinic Materials from March 2008 UCSF Workshop

  1. Legal Options Backgrounder & Draft BBF Legal Scheme: PDF
  2. Executive Summary of Findings: PDF
  3. Slides from March UCSF Workshop: PPT

Further BioBrick related legal documents are at Open Wet Ware: http://openwetware.org/wiki/The_BioBricks_Foundation:Legal

Open questions as of this writing:

  • Has liability been addressed in BioBricks?  (Especially considering the biosafety implications that surround the field.)   By liability, this means a license term which boils down to: “The author of this BioBrick is not responsible if anything bad happens when/if anyone creates/clones/uses/modifies it.”
  • Has industry brought BioBrick technology to market which would “test” the BioBrick license?
  • What is the roadmap for future license drafts/official versions?

2008’s Thinking on Biological Engineering Business

Posted by – November 8, 2008

One set of perspectives on systems biology startup business for 2008.

Institute of Biological Engineering’s

Bio-Business Nexus 2008

From OpenWetWare

Presenter | Title | Presentation
Dr. Rob Whitehead | North Carolina State University Office of Technology Transfer: putting ideas to work | Media:1.Whitehead – IBE NCSU March2008.pdf
Michael Batalia, Ph.D. | Avant-Garde Technology Transfer: Leading Innovation at Wake Forest University Health Sciences | Media:2. Batalia – 2008 IBE BioBusiness Nexus_MAB.pdf
John C. Draper, President, First Flight Venture Center | Business Incubation, A Research Triangle Park Resource | Media:3. Draper – IBE 13thAnnualConf-03062008c.pdf
Lister Delgado | NC IDEA Grants Program | Media:4. Delgado – NCIDEA Grants Program Overview – IBE Conference.pdf
Rob Lindberg, PhD, RAC | The North Carolina Biotechnology Center | Media:5. Lindberg – IBE 2008 BTD presentation 030708.pdf


2007’s Thinking on Biological Engineering Business

Posted by – November 7, 2008

The presentations below were given at the  Institute of Biological Engineering annual meeting March 30, 2007 in St. Louis, Missouri, under the topic of BioBusiness.

The Mellitz presentation is very good reading.

BioBusiness Nexus Presentations 2007

Mellitz presentation: Commercialization of University IP: Translational Research in BME Leading to Company Formation

Nidus Center presentation

BioGenerator presentation: Bridging the Gap Between Technologies and Viable Companies

Akermin presentation: Biofuel Cells for Portable Electronic Applications

Chlorogen presentation: Production of a Human TGF-beta Family Protein with Potential as an anti-Cancer Therapeutic Protein From Plant Chloroplast

Kereos presentation: Targeted Imaging / Targeted Therapy

Apath presentation: Automated Antiviral Drug Screening Using Engineered Replication Systems

Orion Genomics presentation: DNA Methylation & Cancer

Sequoia Sciences presentation: Bringing Back Nature to Drug Discovery Natural Molecules in an Antibacterial Program

Somark Innovations presentation: BIOCOMPATIBLE RFID INK TATTOO

Web Seminar on Systems Biology: New Approaches, New Tools

Posted by – November 5, 2008

A 90-minute webinar (web seminar) on Systems Biology as a “new approach” in solving today’s health problems — includes summary of what systems biology is, how it is emerging as a current technology, and methods for addressing “systems medicine”.

Systems Biology: New Approaches, New Tools and Implications for Human Health Care

The original event was broadcast on:
Date: Thursday, October 30, 2008
Time: 1:00 PM EDT
Duration: 90 minutes

http://w.on24.com/r.htm?e=124050&s=1&k=7AE6968040C980E22D45F3994A29F4DA

Presentation is performed by the Institute for Systems Biology.  Partially sponsored by Agilent Technologies.

My brief overview

Initial introduction includes overview of technical & market drivers, and convergence of science & technology to create the emerging field.

“P4” Medicine: predictive, preventive, participatory and personalized medicine.

What are the top revolutionary technologies today which will drive the next several years of research?

  • High throughput and inexpensive DNA sequencing
  • Using nanotechnology to measure proteins from small amounts of blood
  • Information technology

Systems Biology’s largest “unsolved problems”

  • Validated datasets are unavailable – the massive amount of data is not fully used and systems for analysis are not fully developed
    • Open-to-the-public databases for research information are very important
      • “it must be freely and publicly available to all, including industry. […] Those publishing articles must take the extra step to continue improving the data after simply being published” – Akhilesh Pandey, M.D., Ph. D., Associate Professor, Johns Hopkins University
  • Proteomics is still in its infancy
  • Sharing data is difficult
  • Building blocks are still being built; building blocks means specific databases for specialized data
  • Drug industry develops many new drugs; “but what we really need is” to identify specific drug targets for specific diseases/conditions, for screening
  • Lack of qualified people working in the field
    • “Biologists have to take second major in an engineering science” — Leroy Hood, M.D., Ph. D., President, Institute for Systems Biology

A typical Proteomics and Metabolomics workflow

  • Reduce complexity – Fractionation (mRP/UV); immunodepletion; MRP, OGE; sample preparation for membrane proteins
  • Profiling differences – Profiling (ToF); glycan profiling; metabolite profiling; biomarker discovery
  • Identifying Compounds – Identification (IT, QToF); PTM; glycan ID; metabolite ID; biomarker discovery
  • Characterizing differences – Characterization (QToF); intact protein; de-novo sequencing; protein complexes; membrane proteins; metalloproteins
  • Targeted quantitation – Validation (QQQ); biomarker screening by MRM; metabolite screening (UV, GC/MS, QQQ, ToF, QToF, IT, CE-MS, ICP-MS)


Towards a Market Model for Synthetic Biology

Posted by – November 4, 2008

If you ask most incumbents in the field of biology, they’ll likely say: “What exactly is synthetic biology?”

Maybe they should watch Drew Endy’s video on YouTube.

However, really, synthetic biology is the simple extension of modern biology.  Not too long ago, it wasn’t possible to “make” biology.  Now, it is possible (also known as: synthesis).  And the cost of synthesis keeps getting lower every year.  Some say the drop in the cost of synthesis looks curiously like the curves to Moore’s Law: doubling in technological capability every X months (where X is sometimes debated, usually quoted at 18 months, often misquoted as “every year”).
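As a quick back-of-the-envelope check on why the value of X matters so much, here is a small Perl sketch comparing doubling periods over five years. The subroutine name growth_factor and the 12/18/24-month choices are my own illustrative assumptions, not measured synthesis-cost data.

```perl
use strict;
use warnings;

# Capability multiple after $months, assuming one doubling every $period months.
sub growth_factor {
    my ($months, $period) = @_;
    return 2 ** ($months / $period);
}

# Compare three assumed doubling periods over a 60-month (5-year) span.
for my $period (12, 18, 24) {
    printf "doubling every %2d months -> %.1fx after 5 years\n",
        $period, growth_factor(60, $period);
}
# doubling every 12 months -> 32.0x after 5 years
# doubling every 18 months -> 10.1x after 5 years
# doubling every 24 months -> 5.7x after 5 years
```

So the “every year” misquote overstates five years of progress by roughly a factor of three compared with the 18-month figure.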

Synthetic biology is often compared to the computer industry, to leverage the historical perspective.

In the computer industry, there are three big pieces of the pie (usually seen as two; I want to purposely highlight as three).

  • Hardware companies
  • Software companies that sell source code (“source software companies” for the purposes of this article)
  • Software companies that sell binaries (“binary software companies” for the purposes of this article)

In the early days of the personal computer revolution, some bright guys saw that the hardware companies had a great product, but that software could be a much, much more profitable product:  with software, the cost of manufacturing is ZERO.  With hardware, the cost of manufacturing weighs down profits, so the maximum margin might be 20% to 30% for very glamorous products, and maybe 5% to 10% for less glamorous products.  These bright guys immediately bluffed their way into IBM’s business center and negotiated what turned out to be one of the most profitable deals (if not the most profitable deal!) in the history of the world (Microsoft’s model).  In parallel to this, some other bright guys decided that they could instantly boost their overall profits by both building hardware and including all the fundamental software: hence, the first “personal computer systems company” (hardware plus all necessary software) was created (Apple’s model).

It’s worth keeping in mind at all times that the computer revolution existed before the “personal” computer revolution.  At that time, there were only mainframes (IBM: “big blue”).  During that time, though I’m not totally sure, I believe the market likely segmented like this:

  • Mainframe system companies (hardware + software)
  • Mainframe service companies (people required to run & maintain the machines)

Mainframe system companies charged hefty prices because they could: the only purchasers were governments and incredibly large (deep pocket) companies.  Yet the mainframe hardware business was killed by the personal computer market, which offered enough technology to the mass market to undercut most of the need for mainframes.  Of course, a mainframe company would never want to make a personal computer, since it would erode their own profit potential (eventually, IBM caved in and created the IBM PC, but it was originally unsuccessful and only the reverse-engineered clones from other companies were accepted by the market).

The innovation in computer technology occurred so rapidly that unhealthy monopolies were created as a result. (Microsoft, AT&T, IBM)  In the case of AT&T, they were forced to split into different operations and allow more market competition (both the short and long term benefits of this forced split are still debated).  Microsoft avoided being split through government ignorance, entrenchment, lawyers, and luck.

Biology is different from the story above. Biology does have “soft” ware, of a sort: it’s DNA.  The software is sometimes distributed as “source” code, of a sort: as genes, protocols, primers and vectors.  The software is sometimes distributed as “binary” code, of a sort, too: it’s the modified microbes that “just run” when placed in the right environment.  But after this, the analogy kind of breaks down; the cost of manufacturing is never near zero.  Additionally, the fundamental “source” code can’t be protected under copyright, because it’s DNA.  And the government has a heavy hand in determining what “software binaries” you can get ahold of in order to run.

Of course, I’m still a rank amateur at biology, though, currently, this is what others seem to see in biology.  And of course, I’m predicting the future, so maybe no one can definitely claim I’m incorrect.

  • Hardware companies, supplying machines and tools.
  • “Software” companies: supplying digital DNA sequences, cellular models (like BioBricks), and bioinformatics programs which simulate & verify the cellular models for fabrication.  Additionally, much of the intellectual property here will be public domain or Share-Alike licensed.
  • Fabrication companies: supplying physical biological material based on the digital sequences.  Most people will outsource fabrication to these companies and only the “large pharmas” will perform fabrication in-house.

Does this fit reality?  I say, no.  The fabrication companies will quickly starve, since the prices continue to fall — just like the DRAM computer companies closed with the falling prices of the transistor and transistor memory (Intel bailed out of manufacturing DRAM as Moore’s Law eroded their profits beyond repair).  The idealized “Software” companies can’t actually operate in the prescribed manner, because biology consists of chemicals, and such a company is not set up as a physical laboratory; the Share-Alike licensing will remove profit potential; and the company that sells the chemicals isn’t even on the map.

Here’s what seems to mirror the current market more closely.

  • Hardware companies: supply machines and lots of glass hardware.  Presumably lower profit margin except for large equipment sold to big pharma.
  • Wet Lab companies (biological engineers): supplying primers, enzymes, reagents, chemicals.  High profit margins, due to patent protection and high barrier to entry (requires highly specialized education and some number of years of experience).
  • Dry Lab companies (bioinformatics engineers): Design and supply digital DNA and cellular models, via computational models, and design bioinformatics programs and wet lab protocols for use.  Funky profit margin, because, if design is made Share-Alike, then profits don’t exist; if design is kept secret, then standards may not evolve well; and, the DNA intellectual property is already mandated as public domain.
  • Fabrication service companies: encompass limited range of Wet Lab + Dry Lab, but don’t create their own protocols.  Margins vary, depending on level of the service.

The big winners right now seem to be the Wet Lab guys and the Hardware guys.  By leveraging patent protection, the Wet Lab companies lock their competition out.  Although no one in the industry has anything nice to say about patents, everyone files them, and all investors demand them.  The Hardware guys currently have big profits, high prices, and little competition, as no one is forcing the prices down. Sound familiar?  It should; it’s the same phenomenon that occurred in the mainframe days.

The shakeout seems to be that the Dry Lab guys, the Hardware guys, and the Fabrication guys will need to get together in some way.

Yet, there’s another interesting aspect of biology: organisms are different.  Each organism has its own unique pathways and incompatibilities.  It is not possible, in general, to run “software” from one genetically engineered machine on another genetically engineered machine.  In fact, that’s why biologists usually argue against synthetic biology, claiming it will never work.

So rather than the universal “PC platform” that exists in the computer world (a derivative of both unhealthy monopolistic practices and the market requiring a common environment), the biological environments will number in the thousands.  Yeast grows differently than E. coli, and both Hardware and Dry Lab are customized to individual species.  That could be the market segmentation: biological compatibility itself, creating multiple competitive hardware and “software” markets, with some market segments Share-Alike, and some not.

If someone has a crystal ball, let me borrow it for a second.

In-Depth Review, Part 3 of 5: “Beginning Perl for Bioinformatics” by James Tisdall

Posted by – November 3, 2008

In my previous write-ups of Part 1 and Part 2, I traced the Perl code and examples in the first half of the book, Beginning Perl for Bioinformatics, by James Tisdall, highlighting different approaches to bioinformatics in Perl.  As I mentioned before, Perl provides many different (and often stylistic) methods of solving a software problem.  The different methods usually differ in execution speed, code size, code scalability, readability / maintainability, simplicity, and advanced Perl semantics.  Since this is a beginning text, the advanced Perl isn’t covered; that means templates, which could be useful for parsing bioinformatics data, are one of the topics not included here.

Often, the fastest code is the smallest code, and contains subtle code tricks for optimization. This is a perfect setup, because, in Chapter 8, Tisdall starts parsing FASTA files.  With Perl’s parsing engine, the subtlety of the tricks leaves a lot of room for optimizing software.

FASTA & Sequence Translation

Tisdall offers a software problem based on the FASTA data, so time to solve it:

Tisdall: When you try to print the “raw” sequence data, it can be a problem if the data is much longer than the width of the page. For most practical purposes, 80 characters is about the maximum length you should try to fit across a page. Let’s write a print_sequence subroutine that takes as its arguments some sequence and a line length and prints out the sequence, breaking it up into lines of that length.

Compare his solution to mine:

# Solution by Tisdall
# print_sequence
#
# A subroutine to format and print sequence data

sub print_sequence {

    my($sequence, $length) = @_;

    use strict;
    use warnings;

    # Print sequence in lines of $length
    for ( my $pos = 0 ; $pos < length($sequence) ; $pos += $length ) {
        print substr($sequence, $pos, $length), "\n";
    }
}

The above is a straightforward, strings-based approach. I chose a regex approach, which took a couple minutes to work out, though should be faster during run-time:

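sub dna_print {
  my ($str, $length) = @_;
  $length ||= 25;    # default line width, as hard-coded in my first version

  # Strip up to $length characters off the front on each pass.
  # Matching "." with /s (rather than \w) guarantees progress on any
  # character, so the loop cannot stall on non-word characters.
  while (length $str) {
    $str =~ s/^(.{1,$length})//s;
    print "$1\n";
  }
}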
