Don’t Always Trust Open Source Software. Why Trust Open Source Biology?

Posted by – August 7, 2009

The software you are happily using may be unnecessarily brittle. Recently I’ve been developing a little bit of high-level software using open source libraries.  Sometimes it amazes me that open source software works at all.  Here’s an excerpt from the internals of one open source library, which I found when I looked into why it might not be working properly:

        if (Pipe){
                while(iFlag){
                        vpData = Pipe->Read(&dLen);
                        iFlag = 0;
                                //      If we have more data to read then for God's sake, do it!

                                //      I don't know if this will work ... it would return an
                                //      array. This may not be good. Hmmmm.
                        if(!vpData && GetLastError() == ERROR_MORE_DATA){
                                iFlag = 1;
                        }

                        if(dLen){
                                XPUSHs(sv_2mortal(newSVpv((char *)vpData, dLen)));
                        }else{
                                sv_setsv(ST(0), (SV*) &PL_sv_undef);
                        }
                }

The standard responses from the “open source rah-rah crowd” are something like the following:

  • “Yeah, that’s a crazy comment, but at least you can see it!  In proprietary software, there are the same problems; they’re just hidden!”
  • “At least you’re given the source code so you can fix it!  In proprietary software, you’re never given access to the source code so you couldn’t fix it if you wanted to!”

These responses miss the larger point: commercial software is often much more fully tested for its specific environment, and undergoes a much more rigorous design process.  (Beyond the designed-for environment, things might break.  However, that environment is usually described.)

Having something that works, even if it isn’t “great” software, is better than not having anything at all; so on the whole, we can’t complain too much.  Open source is expected to evolve over the long term (meaning decades) into a better system: it’s assumed that eventually most of the oddities will be ironed out.  The Linux kernel itself contains similar comments (I’ve seen them while debugging the UDP/IP stack), which is astounding considering that non-professionals consider Linux to be “stable”.  Kernel hackers know the truth: it “mostly” works (with “mostly” being better than “nothing”).  Next time someone offers you “free software”, take a moment to think:  how much do I have to trust that software to work in a situation that may differ from the author’s original working environment?  How much of the code’s architecture might carry comments such as “Hmm.. This isn’t supposed to work or might not work..”?  How much is it going to cost ($$$) to find the oddities and dig into the internals to fix them?

The connection to Biology here is that these crazy design comments like “Hmm.. It really isn’t proper design to build it this way.. but it seems to work” in synthetic life will be too small to ever read (in the RNA or DNA).  At least with open source software, there’s a big anti-warranty statement: don’t use the software if there is liability involved.  As I posted last year, the “Open Biology License” hasn’t touched on liability issues at all, only patent issues.  How much can Open Biology be trusted, and how much might it cost ($$$) to dig in, find the strange biological behavior, and attempt to fix it?  Debugging biology is much, much harder than debugging software.

7 Comments on Don’t Always Trust Open Source Software. Why Trust Open Source Biology?

  1. Cathal says:

    I’d skip the “Rah Rah” and instead point out that with commercial software, you wouldn’t be allowed to do your high-level programming as freely as you are.

    Bad software is universal, and I’ve seen far more terrible commercial solutions (usually in niche markets where alternatives don’t exist to drive competition) than open-source ones. Not to mention, an increasing number of commercial solutions are driven at their core by a bunch of open-source libraries and bundles.

    For example: the MacOS I’m using, the browser I’m typing in, (chances are) even the software on your server backend, and the database software used to drive this blog.

    Don’t *depend* on unproven open-source. But it’s perfectly trustworthy; you can interrogate it when it fails, and find out why. That’s a pretty sound definition of “trust”.

    • JonathanCline says:

      Cathal – the question is how a user or developer of Open Biology “stuff” can dig down into the “bio source code” when the device fails, or to make an improvement. Interrogating biology is hard enough, and seeing a comment like “/* FIXME – This code is a workaround and might break in the future */” in a re-usable bio part is (currently) impossible. Synthetic parts can use genetic markers, though if I understand, these can go missing over time (as if the comments in code slowly disappear while the program runs). Kind of a far-futuristic-and-currently-moot point of course, since such reusable parts themselves don’t really exist yet.

  2. Cathal says:

    Well, it’s really a question of how things are going in the present; most DNA already *is* open source (in the sense that you have free access to the sequence data), and if something doesn’t work for some reason it’s usually documented in a paper or in the FASTA file that accompanies it.

    If there’s a problem and you *only* have the source code, there are incredible resources available to the would-be debugger that computer programmers might only dream about; you can BLAST your sequence to find similar or identical sequences in the vast DNA databases covering hundreds of species. You can align your gene with similar ones to see the differences. You can use freely available tools to optimise your sequence. You can cut and paste in different regulatory sequences to control it differently.

    And finally, if it’s supposed to be advantageous in nature to the organism you’re making it for, you can expect natural selection to debug most of the little issues for you.

    Besides, sequencing and synthesis are getting so cheap now that pretty soon you’ll be able to debug and recompile any DNA you want on the cheap. The free availability of the “source code”, though generally expected, won’t even be strictly necessary.
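
    As a rough illustration of the "align your gene with similar ones to see the differences" step mentioned above, here is a minimal Python sketch. It is only a plain string comparison built on the standard library's difflib, not a biological alignment: there is no scoring matrix and no gap penalty, and real tools such as BLAST do far more. The function name and both sequences are invented for illustration.

```python
from difflib import SequenceMatcher


def sequence_differences(reference: str, candidate: str):
    """Return (operation, reference_part, candidate_part) for each region that differs.

    This is a naive text diff, assumed here purely for illustration; it does
    not model substitution scores or gap penalties as a real aligner would.
    """
    matcher = SequenceMatcher(None, reference, candidate, autojunk=False)
    return [
        (tag, reference[i1:i2], candidate[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sequences, made up for this example.
reference = "AAAAGTTTT"
candidate = "AAAACTTTT"
for tag, ref_part, cand_part in sequence_differences(reference, candidate):
    print(tag, ref_part or "-", cand_part or "-")
```

    Even this toy version shows the limitation raised in the post: the comparison reports where two sequences differ, but nothing in the sequence says why the difference is there.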

  3. gioby says:

    A typical response from an open-source community (a real one, not the one you have drawn here) would be: Thank you for pointing that out, can you open a bug report for that?
    Does the bug that you pointed out give you wrong results in a scenario that we can easily reproduce? Can you tell us how you use our software, or describe your use cases?

    The whole point of this post is misleading, because it seems you are saying that closed source code doesn’t have comments like this, and that closed source programmers don’t use ‘FIXME’ or ‘TODO’ comments :-(

    The difference between open and closed source is that if you find an error or a wrong behaviour you are able to say so, and the programmers will at least answer you and fix it without charging you for a new version. If you expose an error in the program on a public mailing list, all the users of the software will be able to read it, and if none of the core developers is able to fix the problem they will at least be aware of it (which is something you can’t have with proprietary software).

    • JonathanCline says:

      The code I posted is from a real open source project, though it does not have community support, which describes the majority of open source software: unsupported. Even my own open source code, I don’t really support; I put it out there a decade ago in some cases, and it’s buyer beware or buyer improve, whichever is the apt description. Open source gains from having many eyeballs to find & fix bugs, in a thriving community, and often skimps on the unit testing; or, it’s a design to mimic a commercial interface, so the unit tests only obtain coverage where the commercial interface has been previously reverse-engineered.

      I use both, and develop both. Specific products have better commercial or better open source packages. The consumer thankfully has a choice in a growing number of arenas now. The choice is not always driven by cost. I will choose a commercial non-linear video editing software product (even at $1500+) way before an open source alternative – with today’s technology.

      Now what happens when there is open source bioengineering to act as an alternative to commercial bioengineering? How will you choose? In open source software, it’s possible to look at the code and choose based on design quality and test metrics (for example: compare Linux IP Stack to BSD IP Stack to Microsoft Winsock; Linux is the loser, BSD the winner with even commercial systems adopting BSD, leaving Winsock as the necessary evil, to be avoided if at all possible). In bioengineering, is it possible to look at the code of the design? That remains to be seen, right? The NA sequence is not the code and doesn’t contain comments; it is only the binary operation. The protocols for generating the NA are often closed (unpublished or esoteric), even with open source designs. How is that going to shake out? Especially with so many drawing parallels between open source software and open source biology, often with flawed analogies, it’s an interesting question to consider. At least software can only eat my data. Biology might eat me.

      • gioby says:

        But how would you judge the quality of a DNA sequence produced with a closed protocol? How can you say that a sequence database or a public tool is good or bad if you can’t access what the community says about it, or if you can’t see how quickly bug reports are closed?

        Just a naive question: did you write to the author of the library that you posted to tell him that he forgot to fix his code?
        It is not fair if you complain about an open source library that you are using but don’t send feedback to the author. You are using a free library, it saves you a lot of time and work and nobody is charging you, so you are at least expected to point out errors when you find them.

        This is a mentality that would be very useful if scientists adopted it. Let’s say that there is an error in a protein-protein interactions database: an open-source minded biologist would write to the authors to tell them about the error and how to fix it, while a closed-source biologist would just look for an alternative database, and the error would remain there so other scientists could fall for it.

        p.s.: I know most of the software written by biologists is crap, no matter whether it is released freely or not :-)

        • JonathanCline says:

          “How would you judge the quality of a DNA sequence produced with a closed protocol?” That is exactly the issue I am posing here: how to trust open source biology, given that open source software needs to be judged based on the code itself, and often the code has these obvious “FIXME: this is broken but seems to work for now” problems. I don’t have an answer for this; I am raising the question. There is a false analogy made between open biology and open software: with biology, the DNA is not the software and does not contain comments; the DNA is more like machine code, and having only the machine code of open source software similarly gains little. Not many experts have the time or knowledge to debug and patch binary machine code. The software used to create the DNA sequences has to also be open, and the protocol has to be open, and the means of replicating the biopart also has to be open (reagents, etc), and so on, and only then can trust begin from having many eyeballs looking at the design & implementation.

          The code snippet I posted is from an open source package that has no active maintainer and no active bug tracking system. The majority of open source software has no active maintainers and no bug tracking system. There is no way to annotate the code: it is buyer beware or buyer-fix-the-bugs or “buyer trust the code assuming it is in 100% working order.” This is why commercial products go through end-of-life cycles when they are purposely no longer sold, since a company does not want to spend resources to support maintenance costs. Companies are usually not rewarded for that in investors’ eyes either (investors reward risk and innovation, not so much support contracts). I could also point to comments in the Linux kernel source code that have the same issue (“FIXME This is broken”), specifically in the IP stack, and yet that code has very active maintainers and tracking systems.

          One big innovation in the open biology realm is the PLoS journals, which allow research publications to be publicly annotated. This should provide more trust from having “many eyeballs” looking at and commenting on published research. Openwetware.org has attempted to create open bioprotocols, though a wiki format still needs work as a biology collaboration tool to allow better “bug fixing” of the content. Few biologists currently trust protocols posted on Openwetware.org.