Tag: Bioinformatics

Perl Bio-Robotics module, Robotics.pm and Robotics::Tecan

Posted by – July 30, 2009

FYI for Bioperl developers:

I am developing a module for communication with biology robotics, as discussed recently on #bioperl, and I invite your comments. Currently this module talks to a Tecan genesis workstation robot. Other vendors are Beckman Biomek, Agilent, etc. No such modules exist anywhere on the ‘net with the exception of some visual basic and labview scripts which I have found. There are some computational biologists who program for robots via high level s/w, but these scripts are not distributed as OSS.

With Tecan, there is a datapipe interface for hardware communication, as an added $$ option from the vendor. I haven’t checked other vendors to see if they likewise have an open communication path for third party software. By allowing third-party communication, then naturally the next step is to create a socket client-server; especially as the robot vendor only support MS Win and using the local machine has typical Microsoft issues (like losing real time communication with the hardware due to GUI animation, bad operating system stability, no unix except cygwin, etc).

On Namespace:

I have chosen Robotics and Robotics::Tecan. (After discussion regarding the potential name of Bio::Robotics.)  There are many s/w modules already called ‘robots’ (web spider robots, chat bots, www automate, etc) so I chose the longer name “robotics” to differentiate this module as manipulating real hardware. Robotics is the abstraction for generic robotics and Robotics::(vendor) is the manufacturer-specific implementation. Robot control is made more complex due to the very configurable nature of the work table (placement of equipment, type of equipment, type of attached arm, etc). The abstraction has to be careful not to generalize or assume too much. In some cases, the Robotics modules may expand to arbitrary equipment such as thermocyclers, tray holders, imagers, etc – that could be a future roadmap plan.

Here is some theoretical example usage below, subject to change. At this time I am deciding how much state to keep within the Perl module. By keeping state, some robot programming might be simplified (avoiding deadlock or tracking tips). In general I am aiming for a more “protocol friendly” method implementation.

To use this software with locally-connected robotics hardware:

    use Robotics;
    use Robotics::Tecan;

    my %hardware = Robotics::query();
    if ($hardware{"Tecan-Genesis"} eq "ok") {
    	print "Found locally-connected Tecan Genesis robotics!\n";
    }
    elsif ($hardware{"Tecan-Genesis"} eq "busy") {
    	print "Found locally-connected Tecan Genesis robotics but it is busy moving!\n";
    	exit -2;
    }
    else {
    	print "No robotics hardware connected\n";
    	exit -3;
    }
    my $tecan = Robotics->new("Tecan") || die;
    $tecan->attach() || die;    # initiate communications
    $tecan->home("roma0");      # move robotics arm
    $tecan->move("roma0", "platestack", "e");    # move robotics arm to vector's end
    # TBD $tecan->fetch_tips($tip, $tip_rack);   # move liquid handling arm to get tips
    # TBD $tecan->liquid_move($aspiratevol, $dispensevol, $from, $to);
    ...

To use this software with remote robotics hardware over the network:

  # On the local machine, run:
    use Robotics;
    use Robotics::Tecan;

    my @connected_hardware = Robotics->query();
    my $tecan = Robotics->new("Tecan") || die "no tecan found in @connected_hardware\n";
    $tecan->attach() || die;
    # TBD $tecan->configure("my work table configuration file") || die;
    # Run the server and process commands
    while (1) {
      $error = $tecan->server(passwordplaintext => "0xd290"); # start the server
      # Internally runs communications between client->server->robotics
      if ($tecan->lastClientCommand() =~ /^shutdown/) {
        last;

    }
    $tecan->detach();   # stop server, end robotics communciations
    exit(0);

  # On the remote machine (the client), run:
    use Robotics;
    use Robotics::Tecan;

    my $server = "heavybio.dyndns.org:8080";
    my $password = "0xd290";
    my $tecan = Robotics->new("Tecan");
    $tecan->connect($server, $mypassword) || die;
    $tecan->home();

    ... same as first example with communication automatically routing over network ...
    $tecan->detach();   # end communications
    exit(0);

Some notes for those who may also want to create Perl modules for general or BioPerl use:

  • Use search.cpan.org to get Module-Starter
  • Run Module-Starter to create new module from module template
  • Read Module::Build::Authoring
  • Read Bioperl guide for authoring new modules
  • Copy/write perl code into the new module
  • Add POD, perl documentation
  • Add unit tests into the new module
  • Register for CPAN account (see CPAN wiki), register namespace
  • Verify all files are in standard CPAN directory structure
  • Commit & Release

3G Cellphone as Biotech Tool: “Cellular Phone Enabled Non-Invasive Tissue Classifier”

Posted by – July 5, 2009

A recent paper in PLoS ONE describes a diagnostic system which uses a common 3G cellphone with bluetooth to assist in point-of-care measurement of tissues, from tissue samples previously taken, with remote data analysis [1].  The hope, of course, is that this could be used for detecting cancer tissue vs. non-cancer tissue.  In general this technological approach is important for the following reasons: it allows data analysis across large populations with server-side storage of the data for later refinement; not all towns or cities will have expert medical staff to classify tissues at a hospital; and sending the sample to another city for classification takes time and creates measurement risk (mishandling, contamination, data entry error, biological degredation, etc).  Since the tissues are measured by a digital networked device, the results can be quickly sent to a central database for further analysis, or as I hint below, for geographically mapping medical data for bioinformatics.

From my interpretation, the complete system looks like this:

The probe electronics are described in [2]; unfortunately that article is not open access, so I can’t read it.  The probes located around the sample are switched to conduct in various patterns and a learning algorithm is used to isolate the probe pair with the optimal signal.  The sample is placed at the center of the petri dish and covered in saline.

Sending the raw data to a central server for analysis allows for complex pattern recognition across all samples collected; thus, the data analysis and the result can improve over time (better fitting algorithms or better weighting in the same algorithm).  The impedance analysis fits according to the magnitude, phase, frequency, and the probe pair.

The article does not explain the technologies used with the cell phone for communicating between the measurement side and the cellular side (USB / Bluetooth communication link, Java, E-mail application link, etc).  Though these technologies are cellphone specific, it is part of the method, and it is not described.  The iPhone would be a good candidate for this project as well.  A cellphone with integrated GPS would allow for location data to be sent to the server, which may be able to provide better number-crunching in the data processing algorithms, for recognition of geographic regions with high risk.

References:
[1] Laufer S, Rubinsky B, 2009 Cellular Phone Enabled Non-Invasive Tissue Classifier. PLoS ONE 4(4): e5178. doi:10.1371/journal.pone.0005178

[2] Ivorra A, Rubinsky B (2007) In vivo electrical impedance measurements during and after electroporation of rat liver. Bioelectrochemistry 70: 287–295.

Playing with the $100K Robots for Biology Automation

Posted by – June 26, 2009

The Tecan Genesis Workstation 200: It’s an industrial benchtop robot for liquid handling with multiple arms for tray handling and pipetting.

The robot’s operations are complex, so an integrated development environment is used to program it (though biologists wouldn’t call it an integrated development environment; maybe they’d call it a scripting application?), with custom graphical scripting language (GUI-based) and script verification/compilation. Luckily though, the application allows third party software access and has the ability to control the robotics hardware using a minimal command set. So what to do? Hack it, of course; in this case, with Perl. This is only a headache due to Microsoft Windows incompatibilities & limitations — rarely is anything on Windows as straightforward as Unix — so as usual with Microsoft Windows software, it took about three times longer than normal to figure out Microsoft’s quirks. Give me OS/X (a real Unix) any day. Now, on to the source code!

More

Don’t Train the Biology Robot: Have the Machine Read the Protocol and Automate Itself

Posted by – June 3, 2009

Imagine reading these kinds of instructions and performing such a task for a few hours: “Resuspend pelleted bacterial cells in 250 µl Buffer P1 and transfer to a micro-centrifuge tube. Ensure that RNase A has been added to Buffer P1. No cell clumps should be visible after resuspension of the pellet. If LyseBlue reagent has been added to Buffer P1, vigorously shake the buffer bottle to ensure LyseBlue particles are completely dissolved. The bacteria should be resuspended completely by vortexing or pipetting up and down until no cell clumps remain. Add 250 µl Buffer P2 and mix thoroughly by inverting the tube 4–6 times. Mix gently by inverting the tube. Do not vortex, as this will result in…” (The protocol examples used here are from Qiagen’s Miniprep kit, QIAPrep.)

Wait a minute!  Isn’t that what robots are for?  Unfortunately, programming a bioscience robot to do a task might take half a day or a full day (or more, if it hasn’t been calibrated recently, or needs some equipment moved around).   If this task has to be performed 100 or 10,000 times then it is a good idea to use a robot.  If it only has to be done twice or 10 times, it may be more trouble than it’s worth.  Is there a middle ground here?

If regular English-language biology protocols could be fed directly into a machine, and the machine could learn what to do on it’s own, wouldn’t that be great?  What if these biology protocols could be downloaded from the web, from a site like protocol-online.org ?   It’s possible! (Within the limited range of tasks that are required in a biology lab, and the limited range of language expected in a biology protocol.)

Biology Protocol Lexical Analyzer converts biology protocols to machine code for a robot or microfluidic system to carry out

Biology Protocol Lexical Analyzer converts biology protocols to machine code for a robot or microfluidic system to carry out

The point of this prototype project is this: there are thousands of biology protocols in existence, and biologists won’t quickly transition to learning enough engineering to write automated language themselves (and it is also more effort than should be necessary to use a “easy-to-use GUI” for training a robot). The computer itself should be used to bridge the language gap. Microfluidics automation platforms (Lab on Chip) may be able to carry out the bulk of busy work without excessive “training” required.

More

Play Fold.it, the “Tetris-On-Steroids” game that solves protein folding

Posted by – January 29, 2009

“Protein folding” is what again?

It’s this: Foldit (curiously, at the web address: “fold.it”).  And it’s fun to play.  Addictive, really.  Check out the picture:

After I had been playing a while, my 8-year old niece came over to my laptop to see what the cute sound-effects were all about.  After a minute of watching, she said:  “Tell me the web site, I want to play too!”   Yeah, no kidding.

More

In-Depth Review, Part 3 of 5: “Beginning Perl for Bioinformatics” by James Tisdall

Posted by – November 3, 2008

In my previous write-ups of Part 1 and Part 2, I traced the Perl code and examples in the first half of the book, Beginning Perl for Bioinformatics, by James Tisdall, highlighting different approaches to bioinformatics in Perl.  As I mentioned before, Perl provides many different (and often stylistic) methods to solving a software problem.  The different methods usually differ in execution speed, code size, code scalability, readability / maintainability, simplicity, and advanced Perl symantics.  Since this is a beginning text, the advanced Perl isn’t covered.. that means templates, which could be useful for parsing bioinformatics data, are one of the topics not included here.

Often, the fastest code is the smallest code, and contains subtle code tricks for optimization. This is a perfect setup, because, in Chapter 8, Tisdall starts parsing FASTA files.  With Perl’s parsing engine, the subtly of the tricks leaves a lot of room for optimizing software.

FASTA & Sequence Translation

Tisdall offers a software problem based on the FASTA data, so time to solve it:

Tisdall: When you try to print the “raw” sequence data, it can be a problem if the data is much longer than the width of the page. For most practical purposes, 80 characters is about the maximum length you should try to fit across a page. Let’s write a print_sequence subroutine that takes as its arguments some sequence and a line length and prints out the sequence, breaking it up into lines of that length.

Compare his solution to mine:

# Solution by Tisdall
# print_sequence
#
# A subroutine to format and print sequence data

sub print_sequence {

    my($sequence, $length) = @_;

    use strict;
    use warnings;

    # Print sequence in lines of $length
    for ( my $pos = 0 ; $pos < length($sequence) ; $pos += $length ) {
        print substr($sequence, $pos, $length), "\n";
    }
}

The above is a straightforward, strings-based approach. I chose a regex approach, which took a couple minutes to work out, though should be faster during run-time:

sub dna_print {
  my $str = $_[0];
  do {
    $str =~ s/^([\w]{0,25})//;
    print "$1\n";
  } until (!length($str));
}

The above relies on the following method:

More

In-Depth Review, Part 2 of 5: “Beginning Perl for Bioinformatics” by James Tisdall

Posted by – September 8, 2008

My Part 1 of 5 review of the book, Beginning Perl for Bioinformatics, by James Tisdall, left off at Chapter 8, just before Tisdall explains associative arrays, gene expression, FASTA files, genomic databases, and restriction sites.

Tisdall: “For simplicity, let’s say you have the names for all the genes in the organism and a number for the expressed genes indicating the level of the expression in your experiment; the unexpressed genes have the number 0. Now let’s suppose you want to know if the genes were expressed, but not the expression levels, and you want to solve this programming problem using arrays. After all, you are somewhat familiar with arrays by this point. How do you proceed?”

Perl’s associative arrays are one of the most powerful aspects of the language.  This is a good problem to examine using hashes.  Solutions to this kind of problem in other languages (C or matlab) might create an N-dimensional array (or even NxM) as a matrix representation of the problem.  In C, it might be solved using a lookup table possibly using a linked list, and the code to drive that needs to be written from scratch or borrowed from an external library.  Perl has a built-in method to solve these kinds of problems.

The solution is to use a hash:

$gene_name = "triA";
$level = 10;
$expression_levels{$gene_name} = $level;  # save 'level' on per-gene basis

This leads Tisdall to review biological transcription and translation, including code for DNA->RNA and RNA->protein data conversion.  The code is given in long form and then optimized in further examples for speed using associative arrays.  Recall the central dogma of biology:

More

In-Depth Review, Part 1 of 5: “Beginning Perl for Bioinformatics” by James Tisdall

Posted by – September 6, 2008

As a specialized field, Bioinformatics is rather young.  It can be difficult to find universities which teach bioinformatics.  Bioinformatics can refer to many different types of tasks — from using programs and data without any computer science knowledge, to implementing database or web software, to writing data conversion programs which modify file formats between database storage methods, to writing algorithms for modeling and visualizing research problems.  Most of the work is described best as “computational biology”.

In the context of Perl (the famous computer language which runs underneath most web pages), Bioinformatics means computing text data retreived from biological databases.

The book, Beginning Perl for Bioinformatics, by James Tisdall, is for learning introductory software techniques in Perl, with a very brief biology review.  For biologists who have rarely programmed and need a starting language or need to learn Perl, this is a good place to start.  For technologists, note the copyright date on the book, to see how dated the information may be; since bioinformatics is still a young field, standards and technology are evolving rapidly.

Tisdall: “A large part of what you, the Perl bioinformatics programmer, will spend your time doing amounts to variations on the same theme as Examples 4-1 and 4-2. You’ll get some data, be it DNA, proteins, GenBank entries, or what have you; you’ll manipulate the data; and you’ll print out some results.”   (Chapter 4)

For software engineers or computer programmers, the biology field is also a completely new realm which is tough to get a handle on, and has it’s own language: Biology as a field (at least to me) has not yet differentiated itself between “soft, life science” and an engineering science.  For example, as a software engineer, the most basic software question is, “I need to write a look-up table for these elements, what are the all the possible strings for the field values?”  Yet this simple question can be very difficult to answer by consulting a biology textbook.  It is important to keep in mind that data manipulation for biology can involve massive amounts of information: also known as, very, very large strings; the strings represent DNA sequences which may range in practical usage from 10k to 100k.

Perl Bioinformatics Introductory Examples

The author states,

Tisdall: How do you start from scratch and come up with a program that counts the regulatory elements in some DNA? Read on.”

In chapter 4, there are the first simple Perl examples:  convert the DNA sequence to the corresponding RNA sequence.  In biology, the DNA uses A, T, G, C (representing the chemical names, of course); whereas RNA uses U instead of T.  Simple string manipulation provides the answer:  s/T/U/g;

More