Category: Bioinformatics

“ELISA Redux” 96-Well Plate Cryptography Challenge

Posted by – November 11, 2009

The publication GEN is running a contest, with $1,500 plus fancy biotechnology equipment as the prize, for the first person who can decode the cryptographic message hidden in this 96-well plate:

GEN's ELISA Redux contest well plate

"A message has been encrypted into the ELISA plate image, called ELISA Redux, based on the color of each well."


Add Streaming Video to any Bio-lab!

Posted by – October 16, 2009

Combining an inexpensive (under $15) USB webcam with the free VLC media player software, it is simple to add password-protected internet streaming video for remote users to any lab.  VLC can capture from a local webcam, transcode the video data, and stream the video over the web.  It’s available for Mac OS X, Unix, Linux, and Microsoft Windows.

Hint: Video formats are confusing.  Even video professionals have a tricky time figuring out the standards and compatibility issues.  Today’s web browsers also have limitations in what they can display (MIME types and such), which simply means both sides need to use VLC.  Figuring all this out from the VLC documentation takes some work: transcoding the video is required, and a proper container must be used to encapsulate both video and audio.  Once debugged, it’s good to go.
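To make the "transcode, then wrap in a container" step concrete, here is a small Perl sketch that assembles a VLC stream-output chain: transcode to H.264 video plus MPEG audio, multiplex both into an MPEG-TS container, and serve it over HTTP. It assumes a Linux machine with a V4L2 webcam at `/dev/video0` and the console binary `cvlc`; on Mac OS X or Windows the capture input and binary name differ. Password protection is deliberately left out because the relevant options vary between VLC versions, and the bitrates and port are illustrative defaults, not the settings used in the lab.

```perl
use strict;
use warnings;

# Build the --sout chain: transcode, then mux into MPEG-TS and serve over HTTP.
sub vlc_sout_chain {
    my (%opt) = @_;
    my $port = $opt{port} // 8080;   # HTTP port to listen on (illustrative)
    my $vbit = $opt{vbit} // 800;    # video bitrate in kbit/s (illustrative)
    my $abit = $opt{abit} // 128;    # audio bitrate in kbit/s (illustrative)
    return "#transcode{vcodec=h264,vb=$vbit,acodec=mpga,ab=$abit}"
         . ":std{access=http,mux=ts,dst=:$port}";
}

# Full command as a list, suitable for the list form of system().
sub vlc_command {
    my (%opt) = @_;
    # Assumption: Linux V4L2 webcam; other platforms use a different input MRL.
    my $input = $opt{input} // 'v4l2:///dev/video0';
    return ('cvlc', $input, '--sout', vlc_sout_chain(%opt));
}

my @cmd = vlc_command(port => 8080);
print join(' ', @cmd), "\n";
# system(@cmd) == 0 or die "vlc failed: $?";   # uncomment to actually stream
```

On the viewing side, the remote user would open `http://<lab-host>:8080` as a network stream in VLC. Returning a list rather than one string lets `system(@cmd)` run VLC without a shell, so the braces in the chain need no extra quoting.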

Here’s how it worked in the lab:

Webcam for Biotech Lab Automation

See the setup below to get it running.


More on Bio-lab Automation – Software for Controlling FIAlab Devices for Microfluidics

Posted by – October 9, 2009

Perl software to control a lab syringe pump and valve device for biology automation: the initial version was finished today, and it works great.  Next, I need to add the network code so that it can be controlled remotely and in synchronization with other laboratory devices, including the bio-robot.  This software will be used in the microfluidics project.  It is also part of the larger Perl Robotics project, and a new release will be posted to CPAN next week.
FIAlab MicroSIA Valve and Syringe System
FIAlab MicroSIA Experimental Setup
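The planned network code boils down to turning a line of text from a remote client into a device action and a reply. Below is a minimal sketch of that dispatch layer, assuming a simple one-command-per-line text protocol. The command names (`valve`, `aspirate`, `dispense`) and reply formats are hypothetical placeholders, not the FIAlab command set or the Perl Robotics API, and the handlers only validate and echo instead of driving hardware.

```perl
use strict;
use warnings;

# Hypothetical command table: each handler validates its arguments and returns
# a reply string. A real handler would send the device command here.
my %HANDLERS = (
    valve => sub {
        my ($port) = @_;
        die "bad port\n" unless defined $port && $port =~ /^\d+$/;
        return "VALVE->$port";
    },
    aspirate => sub {
        my ($ul) = @_;
        die "bad volume\n" unless defined $ul && $ul =~ /^\d+(?:\.\d+)?$/;
        return "ASPIRATE $ul uL";
    },
    dispense => sub {
        my ($ul) = @_;
        die "bad volume\n" unless defined $ul && $ul =~ /^\d+(?:\.\d+)?$/;
        return "DISPENSE $ul uL";
    },
);

# Turn one line of text (as it might arrive over a socket) into a reply.
sub dispatch_line {
    my ($line) = @_;
    my ($cmd, @args) = split ' ', $line;
    return 'ERR empty command' unless defined $cmd;
    my $handler = $HANDLERS{ lc $cmd } or return "ERR unknown command '$cmd'";
    my $reply = eval { $handler->(@args) };
    if (!defined $reply) {
        (my $err = $@) =~ s/\s+$//;    # trim the newline from die()
        return "ERR $err";
    }
    return "OK $reply";
}

print dispatch_line($_), "\n" for ('valve 3', 'aspirate 250', 'dispense x', 'prime');
```

Keeping the dispatch table separate from the transport means the same `dispatch_line` could sit behind a socket server (for example one built on the core `IO::Socket::INET` module) or be driven from a local script for testing.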

More details on the software follow:


Software for Biohackers

Posted by – July 30, 2009

Some open source software collections of interest to biologists are listed here. I’ll update this list as time goes on. If you would like to have your project listed too, leave a comment with all the fields of the table and I’ll add your project. If any of these links do not work, let me know too.

| Name | Status | Field | Language | Description |
| Eclipse | Stable | Programming, editing, building, debugging | Java, C, C++, Perl, ... | Eclipse is the most widely adopted software development environment in terms of language support, corporate support, and user plugin support. It is open source. It’s the “Office” suite for programming. |
| BioPerl | Stable | Bioinformatics | Perl, C | BioPerl has many modules for genomic sequence analysis/matching, genomic searches to databases, file format conversion, etc. |
| BioPython | Stable | Bioinformatics | Python, C | BioPython has many modules for computational biology. |
| BioJava | Stable | Bioinformatics | Java | BioJava has many modules for computational biology. |
| BioLib | Stable | Bioinformatics | C, C++ | BioLib has many modules for file format conversion, integration with other Bio* language projects, genomic sequence matching, etc. |
| Bio-Linux | Stable | Operating system with bundled bioinformatics applications | Many | “A dedicated bioinformatics workstation – install it or run it live” |
| DNA Linux | Stable | Operating system with bundled bioinformatics applications | Many | “DNALinux is a Virtual Machine with bioinformatic software preinstalled.” |
| Several synthetic biology editors, simulators, or suites listed at OpenWetWare Computational Tools, such as Synthetic Biology Software Suite (SynBioSS), BioJADE, GenoCAD, BioStudio, BioCad, TinkerCell, Clotho | Work in progress | Synthetic biology | Mostly Java, some web-based, some Microsoft .NET | Pathway modeling and simulation for synthetic biology, genetic engineering, editing, parts databases, etc. |
| APE (A Plasmid Editor) | Stable | Genetic engineering | Java | DNA sequence and translation editor |

3G Cellphone as Biotech Tool: “Cellular Phone Enabled Non-Invasive Tissue Classifier”

Posted by – July 5, 2009

A recent paper in PLoS ONE describes a diagnostic system which uses a common 3G cellphone with Bluetooth to assist in point-of-care measurement of previously taken tissue samples, with remote data analysis [1].  The hope, of course, is that this could be used for distinguishing cancer tissue from non-cancer tissue.  In general this technological approach is important for the following reasons: it allows data analysis across large populations, with server-side storage of the data for later refinement; not all towns or cities will have expert medical staff to classify tissues at a hospital; and sending the sample to another city for classification takes time and creates measurement risk (mishandling, contamination, data entry error, biological degradation, etc.).  Since the tissues are measured by a digital networked device, the results can be quickly sent to a central database for further analysis, or, as I hint below, for geographically mapping medical data for bioinformatics.

From my interpretation, the complete system looks like this:

The probe electronics are described in [2]; unfortunately that article is not open access, so I can’t read it.  The probes located around the sample are switched to conduct in various patterns and a learning algorithm is used to isolate the probe pair with the optimal signal.  The sample is placed at the center of the petri dish and covered in saline.

Sending the raw data to a central server for analysis allows for complex pattern recognition across all samples collected; thus, the data analysis and the result can improve over time (better fitting algorithms, or better weighting in the same algorithm).  The impedance analysis fits the data according to magnitude, phase, frequency, and probe pair.
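As a rough illustration of what "fitting according to magnitude and phase" can mean, the Perl sketch below converts a complex impedance Z = R + jX into magnitude |Z| = sqrt(R² + X²) and phase atan2(X, R), then assigns the sample to the nearest reference class. Nearest-centroid classification is a stand-in chosen for simplicity, not the learning algorithm from [1], and the reference values are made up for illustration. A real system would use many frequencies and probe pairs and would normalize the features, since ohms and degrees are not on comparable scales.

```perl
use strict;
use warnings;

use constant PI => 4 * atan2(1, 1);

# Convert Z = R + jX (ohms) to magnitude |Z| (ohms) and phase (degrees).
sub impedance_polar {
    my ($r, $x) = @_;
    return (sqrt($r**2 + $x**2), atan2($x, $r) * 180 / PI);
}

# Nearest-centroid classification: pick the label whose reference feature
# vector is closest (squared Euclidean distance) to the measured one.
sub classify {
    my ($features, $centroids) = @_;
    my ($best_label, $best_dist);
    for my $label (sort keys %$centroids) {
        my $center = $centroids->{$label};
        my $dist   = 0;
        $dist += ($features->[$_] - $center->[$_])**2 for 0 .. $#$center;
        ($best_label, $best_dist) = ($label, $dist)
            if !defined $best_dist || $dist < $best_dist;
    }
    return $best_label;
}

# Illustrative reference values only: [|Z| in ohms, phase in degrees] for one
# probe pair at one frequency. These numbers are invented, not from [1] or [2].
my %centroids = (normal => [120, -10], suspect => [80, -25]);

my ($mag, $phase) = impedance_polar(75, -35);
printf "|Z| = %.1f ohm, phase = %.1f deg -> %s\n",
    $mag, $phase, classify([$mag, $phase], \%centroids);
```

Because the centroids live on the server in this kind of design, they could be re-estimated as more labelled samples arrive, which is the "improve over time" property described above.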

The article does not explain the technologies used to communicate between the measurement side and the cellular side (USB or Bluetooth link, Java, e-mail application, etc.).  Although these technologies are phone-specific, they are part of the method, and they are not described.  The iPhone would be a good candidate for this project as well.  A cellphone with integrated GPS could send location data to the server, which might let the data-processing algorithms recognize geographic regions with high risk.

References:
[1] Laufer S, Rubinsky B (2009) Cellular Phone Enabled Non-Invasive Tissue Classifier. PLoS ONE 4(4): e5178. doi:10.1371/journal.pone.0005178

[2] Ivorra A, Rubinsky B (2007) In vivo electrical impedance measurements during and after electroporation of rat liver. Bioelectrochemistry 70: 287–295.

Playing with the $100K Robots for Biology Automation

Posted by – June 26, 2009

The Tecan Genesis Workstation 200: It’s an industrial benchtop robot for liquid handling with multiple arms for tray handling and pipetting.

The robot’s operations are complex, so an integrated development environment is used to program it (though biologists wouldn’t call it an integrated development environment; maybe they’d call it a scripting application?), with a custom graphical (GUI-based) scripting language and script verification/compilation. Luckily, though, the application allows third-party software access and can control the robotics hardware using a minimal command set. So what to do? Hack it, of course; in this case, with Perl. The only headache was Microsoft Windows incompatibilities and limitations (rarely is anything on Windows as straightforward as on Unix), so, as usual with Windows software, it took about three times longer than normal to figure out Microsoft’s quirks. Give me OS X (a real Unix) any day. Now, on to the source code!


Don’t Train the Biology Robot: Have the Machine Read the Protocol and Automate Itself

Posted by – June 3, 2009

Imagine reading these kinds of instructions and performing such a task for a few hours: “Resuspend pelleted bacterial cells in 250 µl Buffer P1 and transfer to a micro-centrifuge tube. Ensure that RNase A has been added to Buffer P1. No cell clumps should be visible after resuspension of the pellet. If LyseBlue reagent has been added to Buffer P1, vigorously shake the buffer bottle to ensure LyseBlue particles are completely dissolved. The bacteria should be resuspended completely by vortexing or pipetting up and down until no cell clumps remain. Add 250 µl Buffer P2 and mix thoroughly by inverting the tube 4–6 times. Mix gently by inverting the tube. Do not vortex, as this will result in…” (The protocol examples used here are from Qiagen’s Miniprep kit, QIAPrep.)

Wait a minute!  Isn’t that what robots are for?  Unfortunately, programming a bioscience robot to do a task might take half a day or a full day (or more, if it hasn’t been calibrated recently, or needs some equipment moved around).   If this task has to be performed 100 or 10,000 times then it is a good idea to use a robot.  If it only has to be done twice or 10 times, it may be more trouble than it’s worth.  Is there a middle ground here?

If regular English-language biology protocols could be fed directly into a machine, and the machine could learn what to do on its own, wouldn’t that be great?  What if these biology protocols could be downloaded from the web, from a site like protocol-online.org?   It’s possible! (Within the limited range of tasks that are required in a biology lab, and the limited range of language expected in a biology protocol.)

Biology Protocol Lexical Analyzer converts biology protocols to machine code for a robot or microfluidic system to carry out

Biology Protocol Lexical Analyzer converts biology protocols to machine code for a robot or microfluidic system to carry out

The point of this prototype project is this: there are thousands of biology protocols in existence, and biologists won’t quickly learn enough engineering to write automation code themselves (and even using an “easy-to-use GUI” to train a robot takes more effort than it should). The computer itself should be used to bridge the language gap. Microfluidics automation platforms (Lab on Chip) may be able to carry out the bulk of the busy work without excessive “training”.
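To show the flavor of such a translator, here is a toy Perl sketch that splits protocol text into sentences and matches each sentence against a small table of regex rules, emitting machine-level commands. This is a hypothetical rule-based illustration, not the prototype's actual grammar: the command names (`RESUSPEND`, `DISPENSE`, `INVERT`, `FORBID`) and the choice to take the upper bound of a range like "4–6" are assumptions. It handles only the phrasing in the QIAprep excerpt above.

```perl
use strict;
use warnings;
use utf8;   # the protocol text below contains µ and an en dash

# Hypothetical rule table: each entry pairs a regex with a builder that turns
# the captured words into a machine-level command (a hash reference).
my @RULES = (
    [ qr/^resuspend\b.*?\bin\s+(\d+)\s*(?:µl|ul)\s+(.+?)(?:\s+and\b|$)/i,
      sub { +{ op => 'RESUSPEND', volume_ul => $_[0], reagent => $_[1] } } ],
    [ qr/^add\s+(\d+)\s*(?:µl|ul)\s+(.+?)(?:\s+and\b|$)/i,
      sub { +{ op => 'DISPENSE', volume_ul => $_[0], reagent => $_[1] } } ],
    [ qr/\binverting\s+the\s+tube\s+(\d+)(?:\s*[–-]\s*(\d+))?\s+times/i,
      sub { +{ op => 'INVERT', times => $_[1] // $_[0] } } ],  # upper bound of "4–6"
    [ qr/^do\s+not\s+vortex/i,
      sub { +{ op => 'FORBID', action => 'VORTEX' } } ],
);

# Split the protocol into sentences and apply every rule to each sentence,
# so one sentence ("Add ... and mix ...") can yield several commands.
sub parse_protocol {
    my ($text) = @_;
    my @commands;
    for my $sentence (split /(?<=\.)\s+/, $text) {
        $sentence =~ s/\.\s*$//;               # drop the sentence-final period
        for my $rule (@RULES) {
            my ($re, $build) = @$rule;
            if (my @caps = $sentence =~ $re) {
                push @commands, $build->(@caps);
            }
        }
    }
    return @commands;
}

my $protocol = 'Resuspend pelleted bacterial cells in 250 µl Buffer P1 and transfer '
             . 'to a micro-centrifuge tube. Add 250 µl Buffer P2 and mix thoroughly '
             . 'by inverting the tube 4–6 times. Do not vortex.';

for my $cmd (parse_protocol($protocol)) {
    print join(' ', map { "$_=$cmd->{$_}" } sort keys %$cmd), "\n";
}
```

A rule table like this is easy to extend one phrase at a time, which suits the "limited range of language expected in a biology protocol"; the obvious cost is that anything the rules do not anticipate is silently skipped.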


Apple iPhone 3.0 as next generation Biomedical device

Posted by – March 17, 2009

Apple’s developer preview of the iPhone 3.0 software today included the interesting news of support for external accessories, connected either through the physical dock connector or through Bluetooth wireless.


A spokesman from Johnson & Johnson announced an iPhone blood glucose monitor accessory, which provides health biometrics and allows the biometrics to be sent over the iPhone’s network connection as an emergency alert.  Their goal is to make diabetes monitoring easier.

The details of the new iPhone interface are in a thin draft document, External Accessory Framework Reference. This doesn’t include the hardware details necessary to connect arbitrary devices, though once it does, I’ll be hooking lots of different devices to the “iPhone-smart-phone-turned-general-purpose-minicomputer”.

I’m sure the game companies already have external joysticks in the works. A recent interview with the owner of Pangea Software revealed earnings of $1.5 million from downloads of a single iPhone game (Enigmo), with over 800,000 downloads. His biggest complaint: “no D-pad game controller.” Rest assured, that will be solved soon.

Games aside, the iPhone (or iPod touch) offers a solid software environment which includes graphical presentation, ease of data entry, network support, wireless roaming, audio support, and now external device data accessories. This is exactly the kind of tool that medicine and the biosciences need to help with a deluge of patients.

Stanford University: Programmable Microfluidics (2007) – Video

Posted by – March 2, 2009

October 3, 2007 lecture by Bill Thies for the Stanford University Computer Systems Colloquium (EE 380). Bill Thies provides an overview of microfluidic technologies from a computer science perspective, highlights areas in which computer science researchers can contribute to this field, and describes recent work in developing new architectures, programming languages, and CAD tools for the microfluidic domain.



EE 380 | Computer Systems Colloquium:
http://www.stanford.edu/class/ee380/

Play Fold.it, the “Tetris-On-Steroids” game that solves protein folding

Posted by – January 29, 2009

“Protein folding” is what again?

It’s this: Foldit (curiously, at the web address: “fold.it”).  And it’s fun to play.  Addictive, really.  Check out the picture:

After I had been playing a while, my 8-year-old niece came over to my laptop to see what the cute sound effects were all about.  After a minute of watching, she said:  “Tell me the web site, I want to play too!”   Yeah, no kidding.


Computational Biology for Discovering Protein Function – as of 2008

Posted by – November 12, 2008

“The vast majority of known proteins have not yet been characterized experimentally, and there is very little that is known about their function.” [1]

A paper just published (Nov 2008) in PLoS Computational Biology describes a fundamental problem with proteins in biology.  It is “the dogma” that the DNA sequence is transcribed and translated into a protein sequence; the protein sequence in turn dictates a protein structure, and the protein’s structure (physical shape) dictates its function.  Want something biological to work?  Find the right protein shape that, for example, physically fits the drug.

Except there are big problems here:  the structure can’t easily be predicted from the sequence; similar sequences can have vastly different structures; some proteins have multiple functions in different environments even with the same structure; similar structures can have vastly different functions; and different functions often hinge on very small changes in structure!  Meanwhile, the catalog of proteins grows every day, and labs can’t keep up with the experiments needed to determine each protein’s function. So what’s a bioengineer to do?

The article, The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function, is a good summary of the issue.

  1. “the most common way to infer homology is by detecting sequence similarity”
  2. Sequence similarity is usually done with sequence alignment.
  3. “homology (both orthology and paralogy) does not guarantee conservation of function”
  4. “databases contain incorrect annotations, mostly caused by erroneous automatic annotation transfer by homology”
  5. “homology between two proteins does not guarantee that they have the same function, not even when sequence similarity is very high”
  6. “a relatively small sequence signature may suffice to conserve the function of a protein even if the rest of the protein has changed considerably”
  7. “Residues that have similar function in different proteins are likely to possess similar physicochemical characteristics.”
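Point 2 is the workhorse, so here is a minimal Perl sketch of how an alignment turns two sequences into a similarity score: a Needleman-Wunsch global alignment that keeps only one row of the dynamic-programming matrix at a time. The scoring scheme (match +1, mismatch -1, linear gap -1) is an illustrative assumption; real protein comparisons use substitution matrices such as BLOSUM62 and affine gap penalties, and the method described in [1] is not necessarily this one.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Needleman-Wunsch global alignment score with a linear gap penalty.
# Only the previous DP row is kept, so memory is O(length of $t).
sub nw_score {
    my ($s, $t, %opt) = @_;
    my $match    = $opt{match}    // 1;
    my $mismatch = $opt{mismatch} // -1;
    my $gap      = $opt{gap}      // -1;
    my @a = split //, $s;
    my @b = split //, $t;

    # Row 0: aligning a prefix of $t against nothing costs one gap per residue.
    my @prev = map { $_ * $gap } 0 .. scalar(@b);
    for my $i (1 .. scalar(@a)) {
        my @cur = ($i * $gap);               # column 0: prefix of $s vs nothing
        for my $j (1 .. scalar(@b)) {
            my $diag = $prev[$j - 1]
                     + ($a[$i - 1] eq $b[$j - 1] ? $match : $mismatch);
            $cur[$j] = max($diag, $prev[$j] + $gap, $cur[$j - 1] + $gap);
        }
        @prev = @cur;
    }
    return $prev[-1];                        # bottom-right cell of the matrix
}

printf "%s vs %s: %d\n", @$_, nw_score(@$_)
    for ['ACGT', 'AGT'], ['GATTACA', 'GATTACA'];
```

A high score only says the sequences align well; as points 3 and 5 above stress, it does not by itself guarantee shared function.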

Biology research also currently can’t explain why nature has employed some proteins but not others, even though it has been experimentally verified that artificially created (synthesized) proteins can be substituted into natural processes successfully [2].  So if no rules dictate the mapping between protein and function, then how can functions or structures be reliably predicted?

The current shotgun approach is to combine multiple ways of determining similarity (sequence, structure, binding sites).  Although it is not mentioned in [1], I would have expected electrostatic mapping to be used as well; perhaps it hasn’t provided useful results.

[1] Punta M, Ofran Y (2008) The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function. PLoS Computational Biology 4(10): e1000160. doi:10.1371/journal.pcbi.1000160
[2] Chiarabelli C, De Lucrezia D (2007) Question 3: The Worlds of the Prebiotic and Never Born Proteins. Origins of Life and Evolution of Biospheres 37: 357–361.  See also The Emergence of Life: From Chemical Origins to Synthetic Biology by Pier Luigi Luisi.

In-Depth Review, Part 3 of 5: “Beginning Perl for Bioinformatics” by James Tisdall

Posted by – November 3, 2008

In my previous write-ups of Part 1 and Part 2, I traced the Perl code and examples in the first half of the book, Beginning Perl for Bioinformatics, by James Tisdall, highlighting different approaches to bioinformatics in Perl.  As I mentioned before, Perl provides many different (and often stylistic) ways of solving a software problem.  The methods usually differ in execution speed, code size, scalability, readability/maintainability, simplicity, and use of advanced Perl semantics.  Since this is a beginning text, advanced Perl isn’t covered; templates, for example, which could be useful for parsing bioinformatics data, are not included here.

Often the fastest code is also the smallest, and it relies on subtle tricks for optimization. This is a perfect setup, because in Chapter 8 Tisdall starts parsing FASTA files, and with Perl’s regular-expression engine, the subtlety of such tricks leaves a lot of room for optimization.
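For readers who have not seen the format: a FASTA record is a header line starting with `>` followed by one or more sequence lines. A minimal Perl parser, written here as an illustrative sketch rather than Tisdall's Chapter 8 code, collects records as `[header, sequence]` pairs and joins the wrapped sequence lines:

```perl
use strict;
use warnings;

# Parse FASTA text into a list of [header, sequence] pairs.
# Lines before the first '>' header are ignored.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;                # skip blank lines
        if ($line =~ /^>(.*)/) {                 # '>' starts a new record
            push @records, [ $1, '' ];
        } elsif (@records) {
            (my $chunk = $line) =~ s/\s+//g;     # sequence line: drop whitespace
            $records[-1][1] .= $chunk;
        }
    }
    return @records;
}

my $fasta = ">seq1 example record\nACGTACGT\nTTGA\n\n>seq2\nGGCC\n";
for my $rec (parse_fasta($fasta)) {
    printf "%s: %s\n", @$rec;
}
```

One of the "subtle tricks" in this space is to set Perl's input record separator with `local $/ = ">";` so that each read from a filehandle returns a whole FASTA record at once instead of a single line.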

FASTA & Sequence Translation

Tisdall offers a software problem based on the FASTA data, so time to solve it:

Tisdall: When you try to print the “raw” sequence data, it can be a problem if the data is much longer than the width of the page. For most practical purposes, 80 characters is about the maximum length you should try to fit across a page. Let’s write a print_sequence subroutine that takes as its arguments some sequence and a line length and prints out the sequence, breaking it up into lines of that length.

Compare his solution to mine:

# Solution by Tisdall
# print_sequence
#
# A subroutine to format and print sequence data

sub print_sequence {

    my($sequence, $length) = @_;

    use strict;
    use warnings;

    # Print sequence in lines of $length
    for ( my $pos = 0 ; $pos < length($sequence) ; $pos += $length ) {
        print substr($sequence, $pos, $length), "\n";
    }
}

The above is a straightforward, strings-based approach. I chose a regex approach, which took a couple of minutes to work out and which I expected to be faster at run time:

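sub dna_print {
  my ($str, $len) = @_;
  $len ||= 25;                     # line length; 25 was the original fixed width
  while (length $str) {
    # '.' with /s matches any character (newlines too) and {1,...} always
    # consumes at least one, so the loop cannot spin on non-word characters.
    $str =~ s/^(.{1,$len})//s;
    print "$1\n";
  }
}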

The above relies on the following method:


“SynBioSS: The Synthetic Biology Modeling Suite”

Posted by – October 20, 2008

SynBioSS (Synthetic Biology Software Suite) is a suite of software for the modeling and simulation of synthetic genetic constructs. SynBioSS utilizes the registry of standard biological parts, a database of kinetic parameters, and both graphical and command-line interfaces to multiscale simulation algorithms. SynBioSS is available under the GNU General Public License. Anthony D. Hill, Jonathan R. Tomshine, Emma M. B. Weeding, Vassilios Sotiropoulos, and Yiannis N. Kaznessis, Bioinformatics 2008 24(21):2551-2553; doi:10.1093/bioinformatics/btn468

Sounds neat, let’s try it. Interestingly, iGEM participants and biologists, in discussions of modeling, have thrown up their hands and stated that it is difficult or impossible to model biology. Maybe SynBioSS can do the impossible?  Except: there is no specific installer available for OS X (as of this writing), and many assorted packages seem to be required.

Here are my install summary/notes/fixes for getting SynBioSS (version 1.0.1) running on Mac OS X (Leopard 10.5.5):

In-Depth Review, Part 2 of 5: “Beginning Perl for Bioinformatics” by James Tisdall

Posted by – September 8, 2008

My Part 1 of 5 review of the book, Beginning Perl for Bioinformatics, by James Tisdall, left off at Chapter 8, just before Tisdall explains associative arrays, gene expression, FASTA files, genomic databases, and restriction sites.

Tisdall: “For simplicity, let’s say you have the names for all the genes in the organism and a number for the expressed genes indicating the level of the expression in your experiment; the unexpressed genes have the number 0. Now let’s suppose you want to know if the genes were expressed, but not the expression levels, and you want to solve this programming problem using arrays. After all, you are somewhat familiar with arrays by this point. How do you proceed?”

Perl’s associative arrays (hashes) are one of the most powerful aspects of the language, and this is a good problem to examine using them.  Solutions to this kind of problem in other languages (C or MATLAB) might create an N-dimensional (or NxM) array as a matrix representation of the problem.  In C, it might be solved with a lookup table, possibly built on a linked list, and the code to drive it has to be written from scratch or borrowed from an external library.  Perl has a built-in way to solve these kinds of problems.

The solution is to use a hash:

use strict;
use warnings;
my %expression_levels;                     # gene name => expression level
my ($gene_name, $level) = ("triA", 10);
$expression_levels{$gene_name} = $level;  # save 'level' on per-gene basis
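With the levels stored in a hash, Tisdall's actual question ("were the genes expressed?") becomes a direct lookup instead of an array scan. A small sketch, with made-up gene names and levels apart from triA:

```perl
use strict;
use warnings;

# Illustrative data: gene name => expression level (0 = not expressed).
my %expression_levels = (triA => 10, lacZ => 0, gal4 => 3, ura3 => 0);

# Was a given gene expressed? Unknown genes count as not expressed.
sub is_expressed {
    my ($levels, $gene) = @_;
    return exists $levels->{$gene} && $levels->{$gene} > 0;
}

# All expressed genes, in sorted order for stable output.
my @expressed = grep { $expression_levels{$_} > 0 } sort keys %expression_levels;
print "Expressed: @expressed\n";                    # prints "Expressed: gal4 triA"
print "lacZ expressed? ", (is_expressed(\%expression_levels, 'lacZ') ? 'yes' : 'no'), "\n";
```

Each lookup takes roughly constant time no matter how many genes the organism has, which is the advantage over searching a parallel pair of arrays.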

This leads Tisdall to review biological transcription and translation, including code for DNA->RNA and RNA->protein data conversion.  The code is given in long form and then optimized in further examples for speed using associative arrays.  Recall the central dogma of biology:


In-Depth Review, Part 1 of 5: “Beginning Perl for Bioinformatics” by James Tisdall

Posted by – September 6, 2008

As a specialized field, Bioinformatics is rather young.  It can be difficult to find universities which teach bioinformatics.  Bioinformatics can refer to many different types of tasks — from using programs and data without any computer science knowledge, to implementing database or web software, to writing data conversion programs which modify file formats between database storage methods, to writing algorithms for modeling and visualizing research problems.  Most of the work is described best as “computational biology”.

In the context of Perl (the well-known language behind many websites), bioinformatics mostly means processing text data retrieved from biological databases.

The book, Beginning Perl for Bioinformatics, by James Tisdall, is for learning introductory software techniques in Perl, with a very brief biology review.  For biologists who have rarely programmed and need a starting language or need to learn Perl, this is a good place to start.  For technologists, note the copyright date on the book, to see how dated the information may be; since bioinformatics is still a young field, standards and technology are evolving rapidly.

Tisdall: “A large part of what you, the Perl bioinformatics programmer, will spend your time doing amounts to variations on the same theme as Examples 4-1 and 4-2. You’ll get some data, be it DNA, proteins, GenBank entries, or what have you; you’ll manipulate the data; and you’ll print out some results.”   (Chapter 4)

For software engineers or computer programmers, biology is also a completely new realm which is tough to get a handle on, and it has its own language. Biology as a field (at least to me) has not yet decided whether it is a “soft” life science or an engineering science.  For example, as a software engineer, the most basic question is, “I need to write a look-up table for these elements; what are all the possible strings for the field values?”  Yet this simple question can be very difficult to answer by consulting a biology textbook.  It is important to keep in mind that data manipulation for biology can involve massive amounts of information, in other words very, very large strings; the strings represent DNA sequences, which in practical usage may range from 10,000 to 100,000 characters (bases).
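For nucleotide sequences, one concrete answer to the look-up table question is the IUPAC nucleotide code set: the four bases (plus U for RNA) and the ambiguity codes that stand for sets of bases. The Perl sketch below uses that set, which is my choice of reference and not something taken from Tisdall's book, to validate a sequence and to expand a degenerate motif into a regular expression. Real files may also contain gap characters such as `-`, which this sketch treats as invalid.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes: each symbol => the bases it can stand for.
my %IUPAC = (
    A => 'A', C => 'C', G => 'G', T => 'T', U => 'U',
    R => 'AG', Y => 'CT', S => 'CG', W => 'AT', K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Return each distinct character in $seq that is not an IUPAC code.
sub invalid_symbols {
    my ($seq) = @_;
    my %seen;
    return grep { !exists $IUPAC{$_} && !$seen{$_}++ } split //, uc $seq;
}

# Expand a degenerate motif (e.g. "TATAWA") into a regex string.
sub iupac_to_regex {
    my ($pattern) = @_;
    return join '', map {
        my $bases = $IUPAC{ uc $_ } // die "not an IUPAC code: $_\n";
        length($bases) > 1 ? "[$bases]" : $bases;
    } split //, $pattern;
}

my @bad = invalid_symbols('ACGTNXZ');
print "invalid: @bad\n";
my $re = iupac_to_regex('TATAWA');
print "TATAWA -> $re\n";
print "match\n" if 'GGTATAAAGG' =~ /$re/;
```
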

Perl Bioinformatics Introductory Examples

The author states,

Tisdall: “How do you start from scratch and come up with a program that counts the regulatory elements in some DNA? Read on.”

Chapter 4 has the first simple Perl examples:  convert a DNA sequence to the corresponding RNA sequence.  In biology, DNA uses A, T, G, C (representing the chemical names, of course), whereas RNA uses U instead of T.  Simple string manipulation provides the answer:  s/T/U/g;
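Here is a small sketch of both ideas from this section: transcription, and counting an element in DNA. It uses `tr/T/U/`, a single-character transliteration that does the same job as `s/T/U/g`, and counts motif occurrences with a zero-width lookahead so overlapping hits are included. The motif in the usage lines is just an example string, not a curated regulatory element.

```perl
use strict;
use warnings;

# Transcribe DNA to RNA: uppercase the input, then swap every T for U.
sub transcribe {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ tr/T/U/;
    return $rna;
}

# Count occurrences of $motif in $seq, including overlapping ones:
# the lookahead matches zero characters, so the next search starts one base later.
sub count_motif {
    my ($seq, $motif) = @_;
    my $count = () = $seq =~ /(?=\Q$motif\E)/g;
    return $count;
}

print transcribe('ATGGTTACT'), "\n";               # prints "AUGGUUACU"
print count_motif('GGTATATAAAGG', 'TATA'), "\n";   # prints "2"
```

For a one-letter substitution, `tr///` is the idiomatic choice because it skips the regex engine entirely; `s///` becomes necessary once the pattern is longer than a single character.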
