Category: Databases

Microbe Wiki

Posted by – November 9, 2007

MicrobeWiki is a free wiki resource on microbes and microbiology, edited by students and monitored by microbiologists at Kenyon College.

MicrobeWiki includes these modules:

Plasmid Information Database

Posted by – November 9, 2007

Look up plasmids for wet lab work and get them delivered right to your door…

Harvard Plasmid Information Database

Plasmid Information Database (PlasmID) is a central repository for plasmid clone collections and distribution based at the Harvard Institute of Proteomics (HIP), which also hosts the Protein Structure Initiative (PSI) Material Repository.

To search for plasmids, go to Plasmid Request or find a list of the collections at

The plasmid repository was established in 2004 at the Harvard Institute of Proteomics (HIP) at Harvard Medical School. Our repository holds collections of sequence-verified open reading frame (ORF) clones made by HIP researchers, all clones made by researchers as part of the Protein Structure Initiative (PSI), and various clones from other researchers.

There are many ways to search for plasmids on PlasmID.  You can search by specific terms, such as vector, insert name, CloneID, depositor, TargetDB ID, PDB ID, protein expression, solubility or purification.

The most flexible search tool at PlasmID is the Advanced Text Search tool. There, you can search for plasmids using gene names and synonyms, author names, part or all of a vector name, species name, PDB ID, TargetDB/PepcDB ID, or combinations of these. It also helps to use “official” gene names and identifiers as they appear in databases such as NCBI Entrez Gene or organism-specific databases like SGD or FlyBase. Try “view empty vectors” to see a list of vectors that take inserts, act as helper vectors, etc. You can enter more than one ID in a field at “search by gene” to search for multiple genes at one time.

Protein Database, Swiss-Prot / UniProt

Posted by – November 8, 2007

Protein knowledgebase

The UniProt Knowledgebase consists of:

  • UniProtKB/Swiss-Prot; a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.
  • UniProtKB/TrEMBL; a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot.

These databases are developed by the Swiss-Prot groups at SIB and at EBI.

Cyanosite, Cyanobase, Marine Genome Project for Cyanobacteria

Posted by – November 6, 2007


Cyanosite is dedicated to information transfer within the cyanobacterial research community. This site will work to maintain archives of experimental protocols, taxonomic information, comprehensive bibliographic information, educational resources for college and secondary school teachers, general information about blue-green algae, and links to other cyanobacterial, prochlorophyte, and cyanelle sites on the web.

Marine Picocyanobacteria Genome Project

The Marine Picocyanobacteria Genome Project is an international initiative for sequencing the small cyanobacteria. A web interface, “Cyanorak”, has been developed by Dr. A. Dufresne in order to retrieve and annotate the clusters of orthologous proteins common to 11 Synechococcus genomes as well as the first three published Prochlorococcus genomes: P. marinus SS120 (Dufresne et al., 2003), and MED4 and Prochlorococcus sp. MIT9313 (Rocap et al., 2003). A read-only version is accessible at


CyanoBase (and the New CyanoBase) provides an easy way of accessing the sequences and all-inclusive annotation data on the structures of the cyanobacterial genomes. This database was originally developed by Makoto Hirosawa, Takakazu Kaneko and Satoshi Tabata, and the current version of CyanoBase has been developed and maintained by Yasukazu Nakamura, Takakazu Kaneko, and Satoshi Tabata at Kazusa DNA Research Institute.

EcoliWiki – everything related to E. coli K-12, its phages, plasmids, and mobile genetic elements.

Posted by – November 6, 2007

Escherichia coli has a wiki!


EcoliWiki is for anyone who studies E. coli or who wants to know what’s known about E. coli, which is one of the best studied model organisms.

About EcoliHub

Sixty years of research have made Escherichia coli K-12 the most deeply understood organism at the molecular level. Much of what we know about cellular processes can be traced to fundamental discoveries in E. coli. In spite of its importance as a model organism, information about E. coli is distributed among many online resources. EcoliHub is being developed to make information now housed in multiple information resources, databases, and websites readily available to our user community.

EcoliHub Databases

EcoliHub Databases are largely or wholly supported by the EcoliHub award, including:

EcoliLiterature is being developed as a comprehensive database of all articles, book chapters, and books with basic information on E. coli, its phages, and plasmids (EcoliHub Core Project).

EcoliPredict is a comprehensive database of computationally predicted and experimentally determined structures of proteins encoded by E. coli K-12 (Project Leader: Daisuke Kihara, Purdue).

EcoliWiki is being developed as a community annotation system for EcoliHub. The goal is to create community-based pages about E. coli K-12, its phages, plasmids, and mobile genetic elements for community annotation at Texas A&M University (Project Leader: James C. Hu, TAMU).

GenExpDB is a comprehensive database of publicly deposited E. coli gene expression data, which is hosted at the University of Oklahoma (Project Leader: Tyrrell Conway, OU).

GenoBase is an E. coli database developed at the Nara Institute of Science and Technology (NAIST) in Japan that contains a wealth of information on comprehensive resources such as the Keio single-gene knockout collection, the ASKA ORFeome clone set, and results from high-throughput and systematic experimentation. GenoBase is now being further developed at EcoliHub (Project Leader: Hirotada Mori, NAIST).

Participating Databases

Participating Databases actively collaborate with EcoliHub. While these database providers are independently supported, they may receive support from EcoliHub for specific joint projects.

EcoCyc is a professionally curated encyclopedic source of information on the genome, metabolic pathways, and regulatory network of E. coli K-12 at SRI International (Peter D. Karp, principal investigator (PI)).

EcoGene is a knowledgebase derived from extensive literature surveys and bioinformatics research that document the functions of DNA, protein and RNA in E. coli K-12 at the University of Miami (Kenneth E. Rudd, PI).

RegulonDB is the source of highly curated knowledge on regulation of transcription initiation, operon organization, and regulatory networks in E. coli at the National Autonomous University of Mexico, Cuernavaca (UNAM; Julio Collado-Vides, PI).

The Restriction Enzyme Database

Posted by – November 6, 2007


The Restriction Enzyme Database – Restriction Enzyme dataBASE (REBASE)

A collection of information about restriction enzymes and related proteins. It contains published and unpublished references, recognition and cleavage sites, isoschizomers, commercial availability, methylation sensitivity, crystal, genome, and sequence data. DNA methyltransferases, homing endonucleases, nicking enzymes, specificity subunits and control proteins are also included. Putative DNA methyltransferases and restriction enzymes, as predicted from analysis of genomic sequences, are also listed. REBASE is updated daily and is constantly expanding.

The four basic types of restriction systems (I-IV)

Type I

The key characteristics of the Type I R-M systems are that these enzymes are multisubunit proteins that function as a single protein complex and usually contain two R subunits, two M subunits, and one S subunit. After locating their recognition site they serve as molecular motors to translocate DNA until a collision occurs that triggers cleavage. The resulting fragments thus tend to be fairly random.

Type II

The Type II restriction systems typically contain individual restriction enzymes and modification enzymes encoded by separate genes. The Type II restriction enzymes typically recognize specific DNA sequences and cleave at constant positions at or close to that sequence to produce 5′-phosphates and 3′-hydroxyls. Usually they require Mg2+ ions as a cofactor, although some have more exotic requirements. The methyltransferases usually recognize the same sequence, although some are more promiscuous. Three types of DNA methyltransferases have been found as part of Type II R-M systems, forming either C5-methylcytosine, N4-methylcytosine or N6-methyladenine.
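
Because Type II enzymes cut at a constant position relative to a specific sequence, an in silico digest is easy to sketch. The Python below is a minimal illustration, not REBASE code: the function name digest, the example sequence, and the simplification to a linear molecule cut on the top strand only are all assumptions. It uses the well-known EcoRI site GAATTC, cut after the first G.

```python
def digest(seq, site, cut_offset):
    """Split a linear sequence at every occurrence of a fixed recognition site.

    cut_offset is the number of bases of the site that stay on the left
    side of the cut (top strand only; the bottom strand is ignored here).
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # whatever remains after the last cut
    return fragments


ECORI_SITE = "GAATTC"
ECORI_CUT = 1  # G^AATTC: the cut falls after the first base of the site

example = "AAGAATTCTTTTGAATTCGG"  # hypothetical test sequence
fragments = digest(example, ECORI_SITE, ECORI_CUT)
print(fragments)
```

Joining the fragments back together reproduces the input, which is a quick sanity check for any cut-position logic of this kind.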

Type III

These systems are composed of two genes (mod and res) encoding protein subunits that function either in DNA recognition and modification (Mod) or restriction (Res). Both subunits are required for restriction, which also has an absolute requirement for ATP hydrolysis. For DNA cleavage, the enzyme must interact with two copies of a non-palindromic recognition sequence and the sites must be in an inverse orientation in the substrate DNA molecule. Cleavage is preceded by ATP-dependent DNA translocation as with the Type I REases. The enzymes cleave at a specific distance away from one of the two copies of their recognition sequence. The Mod subunit can function independently of the Res subunit to methylate DNA: in all known cases the methylated base formed is N6-methyladenine and full modification is actually hemi-methylation.
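
The two-site requirement can be screened for computationally. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the site CAGCAG is used purely as an example of a non-palindromic sequence, and "inverse orientation" is approximated as one copy on the top strand plus one copy on the bottom strand (found as the reverse complement on the top strand). It does not distinguish head-to-head from tail-to-tail arrangements.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_all(seq, motif):
    """Return the start positions of every (possibly overlapping) match."""
    positions = []
    pos = seq.find(motif)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(motif, pos + 1)
    return positions


def has_inverse_pair(seq, site):
    """True if the site occurs at least once on each strand of seq."""
    seq = seq.upper()
    top_strand = find_all(seq, site)
    bottom_strand = find_all(seq, reverse_complement(site))
    return bool(top_strand) and bool(bottom_strand)


SITE = "CAGCAG"  # illustrative non-palindromic site (an assumption)
print(has_inverse_pair("AACAGCAGTTTTCTGCTGAA", SITE))  # copies on both strands
print(has_inverse_pair("AACAGCAGTTTTCAGCAGAA", SITE))  # both on the same strand
```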

Type IV

These systems are composed of one or two genes encoding proteins that cleave only modified DNA, including methylated, hydroxymethylated and glucosyl-hydroxymethylated bases. Their recognition sequences have usually not been well defined, except for EcoKMcrBC, which recognizes two dinucleotides of the general form RmC (a purine followed by a methylated cytosine, either m4C or m5C) separated by anywhere from 40 to 3000 bases. Cleavage takes place approximately 30 bp away from one of the sites.
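
The EcoKMcrBC rule can be turned into a rough candidate search. The sketch below rests on several assumptions that are not part of the description above: methylation is supplied as a set of 0-based cytosine positions, only the top strand is considered, and the separation is measured between the two methylated cytosines. The function names and the test sequence are hypothetical.

```python
def rmc_sites(seq, methylated):
    """Return sorted positions of methylated C bases preceded by a purine."""
    seq = seq.upper()
    return sorted(
        i for i in methylated
        if 0 < i < len(seq) and seq[i] == "C" and seq[i - 1] in "AG"
    )


def candidate_pairs(seq, methylated, min_gap=40, max_gap=3000):
    """Return pairs of RmC positions whose spacing lies in the allowed range."""
    sites = rmc_sites(seq, methylated)
    return [
        (first, second)
        for n, first in enumerate(sites)
        for second in sites[n + 1:]
        if min_gap <= second - first <= max_gap
    ]


# Hypothetical sequence: AC at 0-1, GC at 50-51, TC at 62-63.
seq = "AC" + "T" * 48 + "GC" + "T" * 10 + "TC"
methylated = {1, 51, 63}  # assumed methylation calls

print(rmc_sites(seq, methylated))
print(candidate_pairs(seq, methylated))
```

The cytosine at position 63 is dropped because it follows a pyrimidine, so only the pair 50 bases apart is reported.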

The standard nomenclature for restriction enzymes, DNA methyltransferases and related proteins can be found in Roberts et al. 2003 Nucl. Acids Res. 31: 1805-1812 (REBASE ref 7998).


REBASE recognition sequence representations use the standard abbreviations
(Eur. J. Biochem. 150: 1-5, 1985) to represent ambiguity:

                        R = G or A
                        Y = C or T
                        M = A or C
                        K = G or T
                        S = G or C
                        W = A or T
                        B = not A (C or G or T)
                        D = not C (A or G or T)
                        H = not G (A or C or T)
                        V = not T (A or C or G)
                        N = A or C or G or T

These are written from 5′ to 3′, when only one strand is shown.

Typically, the recognition sequences are oriented so that the cleavage sites lie on their 3′ side.
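
The ambiguity codes listed above map directly onto regular-expression character classes, which gives a simple way to scan a sequence for a degenerate recognition site. This Python sketch is illustrative only: the names IUPAC, site_to_regex and find_sites are hypothetical, the site GTYRAC is used as an example of a degenerate pattern, and only the strand as written (5′ to 3′) is searched.

```python
import re

# Ambiguity codes from the table above, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[GA]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[GC]", "W": "[AT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def site_to_regex(site):
    """Translate a recognition sequence with ambiguity codes into a regex."""
    return re.compile("".join(IUPAC[base] for base in site.upper()))


def find_sites(seq, site):
    """Return start positions of all matches, including overlapping ones."""
    # A lookahead consumes no characters, so overlapping sites are reported.
    lookahead = re.compile("(?=(" + site_to_regex(site).pattern + "))")
    return [match.start() for match in lookahead.finditer(seq.upper())]


example = "AAGTCGACTTGTTAACGGGTAGAC"  # hypothetical test sequence
print(site_to_regex("GTYRAC").pattern)
print(find_sites(example, "GTYRAC"))
```

In the example, GTCGAC and GTTAAC both match GTYRAC, while GTAGAC does not because A is not a pyrimidine.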


Homing endonucleases do not really have recognition sequences in the way that restriction enzymes do. The recognition sequence listed is one site that is known to be recognized and cleaved. In general, single base changes merely change the efficiency of cleavage and the precise boundary of required bases is not known.

For putative enzymes, the recognition sequences are predicted.

Protein Data Bank

Posted by – November 5, 2007

Search for protein information on the PDB!

The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease.

The RCSB is a member of the wwPDB whose mission is to ensure that the PDB archive remains an international resource with uniform data.

This site offers tools for browsing, searching, and reporting that utilize the data resulting from ongoing efforts to create a more consistent and comprehensive archive.

The RCSB PDB is supported by funds from the National Science Foundation (NSF), the National Institute of General Medical Sciences (NIGMS), the Office of Science, Department of Energy (DOE), the National Library of Medicine (NLM), the National Cancer Institute (NCI), the National Center for Research Resources (NCRR), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the National Institute of Neurological Disorders and Stroke (NINDS), and the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK).

The RCSB PDB Advisory Committee, an international team of experts in X-ray crystallography, cryoEM, NMR, bioinformatics and education, provides feedback and advice on an ongoing basis.

Data files contained in the PDB archive are free of all copyright restrictions and made fully and freely available for both non-commercial and commercial use.