Monday, April 16, 2007

broadcast of the day:

Macaque Genome Analysis Will Help Find Human Disease Genes
The rhesus macaque (Macaca mulatta) is physiologically similar to humans. Its genome (2.9 billion DNA base pairs) was sequenced in 2005. Humans and chimpanzees are so closely related (their lineages split about 6 million years ago) that a comparative genomic study of the two is less informative than one that also uses the macaque. The studies look at the genes common to these 3 genomes and at the differences between Indian and Chinese macaques (for example, Chinese macaques develop AIDS-like symptoms more slowly than Indian macaques).

Full Article
Medicinal leeches have been misclassified for centuries
Until now, medicinal leeches were assumed to be the species Hirudo medicinalis, but new research reveals that they are actually a closely related but genetically distinct species, Hirudo verbana. Wild European medicinal leeches are at least three distinct species, not one.
Full Article

Human sperm made from bone marrow

Stem cells from the bone marrow have been used to create immature sperm cells. It is expected that this research can be used in the future to find a cure for male infertility. So far, mature sperm have not been created. Of course, given the bans and the moral and ethical issues involved in stem cell research, in addition to the scientific fact that manipulating stem cells can cause lasting genetic changes that may not all be desirable, it's too early to jump to any conclusions.

Wednesday, April 11, 2007

paper of the day:

"A Systems Biology Dynamical Model of Mammalian G1 Cell Cycle Progression"
Thomas Haberichter, Britta Mädge, Renee A Christopher, Naohisa Yoshioka, Anjali Dhiman, Robert Miller, Rina Gendelman, Sergej V Aksenov, Iya G Khalil & Steven F Dowdy

The paper describes a combined experimental and computational approach used to understand progression through the G1 phase of the mammalian cell cycle, one of the phases in mammalian cell reproduction and tumor growth.
The GNS software was used to quantitatively model cell cycle progression, and the model was then verified experimentally using cultured cells. It is an excellent example of the power of the combined approach.


broadcast of the day

Researchers at the University of Pittsburgh School of Medicine and Children's Hospital have recently made a startling discovery: female stem cells are better at regenerating muscle, that is, at making muscle cells, than male stem cells.
Possible implications of this finding:
- it may influence treatment approaches for Duchenne muscular dystrophy (a genetic condition found in boys that causes progressive weakening of muscles)
- it may help explain why some therapies work better in women than in men
- it should make scientists more aware of whether stem cells are collected from, or injected into, males or females

GLOSSARY: from Wikipedia
1. Stem cells: primal cells common to all multi-cellular organisms that retain the ability to renew themselves through cell division and can differentiate into a wide range of specialized cell types. (more on stem cells to follow in future posts)

Tuesday, April 10, 2007

broadcast of the day:

  • The genome of Streptococcus sanguinis (2.4 million bp) has been sequenced. This bacterium lives in the healthy human mouth but can cause a deadly heart infection (bacterial endocarditis) if it enters the bloodstream (through a minor cut or wound). It also plays a role in the formation of dental plaque.

  • Symbiosis of the fungus Rhizopus microsporus and the Burkholderia bacteria that live within its cells: the two species effectively team up to break down young rice plants for their nutrients, causing a plant disease known as rice seedling blight. The latest research shows that reproduction (spore formation) of the fungus depends on the bacteria, which live inside its cytoplasm.

Sunday, March 25, 2007

Software Tool: GENIUS

GENIUS: a new tool for gene networks visualization
Paolo Ciccarese, Stefano Mazzocchi, Fulvia Ferrazzi, Lucia Sacchi

Methods for gene network reconstruction (reverse engineering methods) are based on:
  • Boolean networks
  • Bayesian networks
  • Differential Equations
INPUT:
For n genes in the network, an n x n matrix such that
aij = 1 if there is a connection between genes i and j
aij = 0 if there is no connection between genes i and j
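
To make the input format concrete, here is a minimal Python sketch of such a matrix and of how it maps to the edges that get drawn. The gene names, the toy connections and the helper matrix_to_edges are made up for illustration; this is not the actual GENIUS file format or code.

```python
# Toy 4-gene network; names and connections are hypothetical.
genes = ["g1", "g2", "g3", "g4"]

# adjacency[i][j] == 1 means there is a connection between gene i and gene j.
adjacency = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]


def matrix_to_edges(matrix, names):
    """Return a (gene_i, gene_j) pair for every matrix entry equal to 1."""
    n = len(matrix)
    if len(names) != n or any(len(row) != n for row in matrix):
        raise ValueError("expected an n x n matrix and n gene names")
    return [
        (names[i], names[j])
        for i in range(n)
        for j in range(n)
        if matrix[i][j] == 1
    ]


edges = matrix_to_edges(adjacency, genes)
print(edges)
```

Each 1 in the matrix becomes one edge, and each gene becomes one node.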

GENIUS visualizes
Genes = nodes
connections = edges

Two types of visualizations:

AGORA STYLE
Algorithm used assumes that every individual can be treated exactly the same.
Simulation paradigm: "PRIVATE SPACE"
This mathematical model uses a repulsive force field and a basic attractive force field.
- Repulsive force field --> infinity as distance between objects --> 0
- The repulsive force field then rapidly decreases to 0 over a short distance.
- Attractive force field starts at 0 and increases to infinity.
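
The post does not give the actual formulas, so the following Python sketch only shows one pair of functions with the behavior described above. The functional forms, the cutoff and stiffness parameters, and the assumption that the attractive force grows with distance are all my own choices for illustration, not the ones used by the Agora view.

```python
import math


def repulsive_force(distance, cutoff=1.0):
    """Unbounded as distance -> 0, and exactly 0 at or beyond `cutoff`.

    The 1/d form and the cutoff are assumptions made for this sketch.
    """
    if distance <= 0:
        return math.inf
    return max(0.0, 1.0 / distance - 1.0 / cutoff)


def attractive_force(distance, stiffness=0.5):
    """0 at distance 0, then grows without bound (a spring-like assumption)."""
    return stiffness * distance


for d in (0.01, 0.5, 1.0, 4.0):
    print(d, repulsive_force(d), attractive_force(d))
```

With forces like these, nodes that get too close are pushed apart (each keeps its "private space"), while nodes that drift far away are pulled back.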

The Agora view tool has been extended so that a connection between two genes is directed such that the 'from node' is the regulator and the 'to node' is the regulated gene.

TOUCHGRAPH STYLE
This view is useful for showing the relationships between nodes up to a maximum level, where the level is the number of edges in the minimum-length path connecting the nodes in the graph.

The paper then examines the network visualization of the cDNA microarray data set analyzed in [1], analyzing the temporal profiles of its 517 genes with the Reveal algorithm described in [2]. Data set available here.

Brief description of Reveal algorithm
- For every gene x, find its set of regulators (the minimal set of input genes that can univocally explain the behavior of output gene x).
- Based on the use of Entropy and Mutual Information scores: if, for 2 genes x and y, Mutual Information(x,y) = Entropy(x), then y univocally determines x.
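
The entropy test above can be sketched in a few lines of Python. This only checks a single candidate regulator on toy profiles that I made up; the full algorithm in [2] also searches over sets of regulators, which is not shown here. The function names are my own.

```python
from collections import Counter
from math import isclose, log2


def entropy(values):
    """Shannon entropy (in bits) of a discretized expression profile."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """M(x, y) = H(x) + H(y) - H(x, y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def determines(y, x):
    """True if y univocally determines x, i.e. M(x, y) == H(x)."""
    return isclose(mutual_information(x, y), entropy(x), abs_tol=1e-9)


# Toy binary profiles (hypothetical data): x is simply NOT y.
y = [0, 0, 1, 1, 0, 1, 0, 1]
x = [1, 1, 0, 0, 1, 0, 1, 0]
z = [0, 1, 0, 1, 0, 0, 1, 1]  # unrelated to x

print(determines(y, x))  # y explains x completely
print(determines(z, x))  # z tells us nothing about x
```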

Use of Reveal in GENIUS
They extend the algorithm to use 3 discretization levels instead of 2:
-1 : under-expression
0 : equal expression
+1 : over-expression
Each level describes the expression of a gene in serum-stimulated cells relative to the expression value of the same gene measured in non-stimulated cells.
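
A rough Python sketch of such a three-level discretization is given below. The use of a log2 ratio and the two-fold threshold are assumptions on my part; the post does not say which cutoffs the authors used.

```python
from math import log2


def discretize(stimulated, unstimulated, threshold=1.0):
    """Map an expression pair to -1, 0 or +1.

    `threshold` is in log2 units, so 1.0 means a two-fold change.
    Both the log ratio and the threshold are illustrative assumptions.
    """
    log_ratio = log2(stimulated / unstimulated)
    if log_ratio >= threshold:
        return 1    # over-expression
    if log_ratio <= -threshold:
        return -1   # under-expression
    return 0        # roughly equal expression


# Hypothetical intensities: (stimulated, non-stimulated)
profile = [(400, 100), (120, 100), (100, 400)]
print([discretize(s, u) for s, u in profile])
```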

179 groups (pseudo-genes) were recognized, and the extended algorithm was applied to them.

You can check out the example given in the paper to see how the output looks.

REFERENCES
1. Iyer V. R. et al. (1999): The transcriptional program in the response of human fibroblasts to serum. Science: 283: 83-87
2. Liang S, Fuhrman S, Somogyi R. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pacific Symp. Biocomp. 1998: 98 (3):18-29.

Thursday, March 22, 2007

Paper summarized - Principles of microRNA regulation of a human cellular signaling network

Principles of microRNA regulation of a human cellular signaling network
Qinghua Cui, Zhenbao Yu, Enrico O Purisima and Edwin Wang

What are microRNAs?
  • ~22 nucleotides long non-coding RNAs
  • responsible for RNA-based gene regulation
  • act as post-transcriptional and translational regulators
  • base-pair with target mRNAs
  • ~1% of predicted genes in the human genome
  • but regulate 10–30% of genes
  • Targets
    • signaling proteins
    • enzymes
    • transcription factors
It is unclear if and how miRNAs might orchestrate their regulation of cellular signaling networks and how regulation of these networks might contribute to the biological functions of miRNAs.

What are signalling networks?
Signaling networks make the decisions about whether a cell should grow, differentiate, move or die. Their components are proteins. They are represented as graphs where the nodes represent the proteins and the links between the nodes represent the interactions between the proteins.

Hypothesis the paper is based on
miRNAs play a role in the strength and specificity of signaling networks through direct control of proteins at the post-transcriptional and translational levels.

Signaling network used:
Signal transduction processes from multiple cell surface receptors to various cellular machines in a mammalian hippocampal CA1 neuron consisting of:
540 nodes
1258 links
- 689 activating (positive) links
- 306 inhibitory (negative) links
- 263 neutral links (protein interactions)
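
As a small illustration of how such a signed network can be stored, here is a Python sketch using a plain list of links. The protein names and the five toy links are invented; only the three counts at the end come from the figures above.

```python
from collections import Counter

# Each link is (source, target, sign): "+" activating, "-" inhibitory,
# "0" neutral. All names are hypothetical placeholders.
links = [
    ("receptor_A", "adaptor_B", "+"),
    ("adaptor_B", "kinase_C", "+"),
    ("kinase_C", "phosphatase_D", "+"),
    ("phosphatase_D", "kinase_C", "-"),
    ("adaptor_B", "scaffold_E", "0"),
]

nodes = {name for source, target, _ in links for name in (source, target)}
sign_counts = Counter(sign for _, _, sign in links)
print(len(nodes), dict(sign_counts))

# Sanity check on the reported numbers: the three link types add up to 1258.
reported = {"+": 689, "-": 306, "0": 263}
print(sum(reported.values()))
```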

Results stated (Glossary for the terms given below)
  • MiRNAs more frequently target network downstream signaling components than ligands and cell surface receptors
  • MiRNAs preferentially target the downstream components of the adaptors, which have potential to recruit more downstream components
  • MiRNAs more frequently target positively linked network motifs
  • MiRNAs avoid targeting common components of cellular machines in the network

Glossary
  1. adaptor proteins: The function of these proteins is to recruit downstream signaling components to the vicinity of receptors. They have no enzyme activity; they physically interact with upstream and downstream signaling proteins.
  2. network motif: A complex signaling network can be broken down into distinct regulatory patterns, or network motifs, typically comprised of three to four interacting components capable of signal processing. The function of a motif also depends on whether the links are positive or negative.
  3. scaffold proteins: Unlike adaptors, scaffold proteins do not directly activate or inhibit other proteins but provide regional organization for activation or inhibition between other proteins.
  4. functional modules: represent a set of proteins that are always present in various cellular conditions.

Tuesday, March 20, 2007

Sequence Alignment - An introduction


Sequence alignment is one of the most important and basic concepts in computational genomics. Given 2 sequences, what is the best way to arrange the letters of the sequences one below the other so that there are maximum matches between them? Putting it another way, given 2 sequences s and t, find s' and t' such that
1. by removing the gaps from s' and t', we can retrieve s & t.
2. there is no i such that s'[i] and t'[i] are both gaps.
3. length(s') = length(t') >= max(length(s),length(t)) and <= sum(length(s),length(t))

In the above definition,
s'[i] means the letter in sequence s' at the ith position.
length(s') means the length of sequence s'.
max(length(s),length(t)) means the maximum of the length of s and the length of t.
sum(length(s),length(t)) means the sum of the lengths of s and t.

I think things will get absolutely clear with some examples here. Mind you, I am trying to explain keeping a layman in mind. Some of the terms used might sound too technical to some, while to others my explanations might seem too childish. I hope to attain the right balance. Let us assume we have 2 DNA sequences s = ACCT and t = ACGT. For those who don't know about DNA click here.

When we try to align these 2 sequences,

A C C T
| | : |
A C G T

The A,C,T match whereas there is a mismatch at position 3. Naturally, there are many alignments possible for any such pair. (Real data would consist of much longer sequences). The alignment chosen is the one with the highest alignment score.

The score is calculated by adding the scores for the matches and mismatches (usually negative) and subtracting a penalty for each gap. For example, take match (M) = +1, mismatch (m) = -1 and gap penalty (g) = 2, and consider a second possible alignment of the same sequences:

A C C - T
| |     |
A C - G T

"-" are gaps. The alignment scores for the 2 alignments are

1. M+M+m+M = 1+1-1+1 = 2
2. M+M-g-g+M = 1+1-2-2+1 = -1

Clearly, the first option is preferable.
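
Here is a small Python sketch that checks the alignment conditions from the definition and recomputes the two scores above. The function names are my own, and the scoring values are just the example values from this post (match +1, mismatch -1, gap penalty 2).

```python
GAP = "-"


def is_valid_alignment(s, t, s_aligned, t_aligned):
    """Check the alignment conditions for (s', t') against (s, t)."""
    same_length = len(s_aligned) == len(t_aligned)
    recovers_inputs = (
        s_aligned.replace(GAP, "") == s and t_aligned.replace(GAP, "") == t
    )
    no_double_gap = all(
        not (a == GAP and b == GAP) for a, b in zip(s_aligned, t_aligned)
    )
    return same_length and recovers_inputs and no_double_gap


def alignment_score(s_aligned, t_aligned, match=1, mismatch=-1, gap=2):
    """Sum column scores: +match, +mismatch, or -gap for a gapped column."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(s_aligned, t_aligned):
        if a == GAP or b == GAP:
            score -= gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


print(alignment_score("ACCT", "ACGT"))    # first alignment
print(alignment_score("ACC-T", "AC-GT"))  # second alignment, with gaps
```

Note that when conditions 1 and 2 hold and both aligned strings have the same length, the length bounds in condition 3 follow automatically, which is why the sketch does not test them separately.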

There are 2 types of alignments based on number of sequences to align:
1. Pairwise alignments
2. Multiple Sequence Alignments

There are 2 types of alignments based on parts of the sequences aligned:
1. Global alignments - align entire sequences
2. Local Alignments - align regions of the sequences

I will cover each of these types in later posts.


References and Further Reading:
1. Bioinformatics - Sequence Analysis by David Mount
2. Needleman, S.B. and Wunsch, C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48:443-453

Friday, March 16, 2007

today's video

This is an amazing short video and article on


Diagnosing Alzheimer's Early

Sunday, March 11, 2007

Saturday, March 10, 2007

DNA...

I know this post has come much later but I hope it helps anyways.
So what's DNA?
Two strands of linked nucleotides, each nucleotide carrying one of the four bases adenine (A), thymine (T), guanine (G) and cytosine (C).
Every cell in an organism has chromosomes in its nucleus, which are made up of DNA strands coiled together very densely. Think of this DNA as a string of letters from an alphabet. The English alphabet has 26 letters. Similarly, the DNA alphabet has 4 letters: A,C,T,G. Just like English has sentences made of words which are made of the 26 letters, DNA strands have strings made of "codons" (explained in a moment) which are made of the 4 letters.

What are these codons?
Codons are equivalent to the words in English. But English has words of different lengths. Here's where codons differ. They ONLY consist of 3 nucleotides (A,C,T,G). After transcription (conversion of DNA to RNA), each codon gets translated by a complex mechanism into an amino acid. Again using analogies from the English language, just as every word has a specific meaning (stupid e.g. you can't use "man" when you mean "car"), every codon can be translated only to a specific amino acid. Strings of amino acids form proteins. Since there are 4 nucleotides and codons are 3 letters long, there are 4^3 = 64 possible codons. But there are only 20 amino acids (more about them later), so there is some redundancy. Just as man, male and fellow refer to the same thing, many codons code for the same amino acid. Apart from the codons for the 20 amino acids, there is one START codon (which also codes for an amino acid, methionine) and 3 STOP codons. The codon table is given below (taken from [1])



What is transcription?
Transcription is the process of converting DNA to RNA. RNA has the 4 letters A,C,G,U in its alphabet. More about transcription later.
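
To tie the codon counting and the DNA to RNA step together, here is a short Python sketch. It builds the standard genetic code table from a compact string ("*" marks a STOP codon), and it treats transcription as simply rewriting the coding strand with U in place of T, which is a simplification I am assuming for illustration. The function names are my own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order; "*" marks a STOP codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: same letters, with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Read codons from the start of `dna` until a STOP codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE), stop_codons)
print(transcribe("ATGGCCTAA"), translate("ATGGCCTAA"))
```

The table has 64 entries but only 20 distinct amino acids plus STOP, which is exactly the redundancy mentioned above.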


Fig: from Bioinformatics: from data to biological knowledge by Dena Leshkowitz

References:
1. http://www.cs.cmu.edu/~blmt/Seminar/SeminarMaterials/?N=A

Further Reading:
1. Molecular Biology by Robert Weaver for details on transcription and translation.

Wednesday, March 07, 2007

microRNA target recognition - 2

Experimental identification of miRNA targets is not an easy task, especially using conventional tools. The principal challenge in target recognition of miRNAs stems from the small size of their targets (18-24 nucleotides (nt)). Also, every human miRNA has hundreds of targets with limited complementarity, unlike plant miRNAs. The affinity and specificity needed for their recognition require highly precise tools, as the difference between a true target and a false positive might be a single base.
There has been an explosion in computational biology algorithms for human miRNA target prediction. We propose their identification and verification for human miRNAs using a combinatorial approach involving computational and molecular biology. We intend to make extensions to well-established tools and techniques to verify the miRNA targets predicted using the computational methods.
Prediction of miRNA targets provides an alternative approach to assign biological functions. This is simpler in plants due to their high complementarity and limited targets per miRNA but functional duplexes can be more variable in structure in humans [1]. Thus, we propose the use of more than one method to verify these targets.
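
To give a flavor of what the simplest computational check for complementarity looks like, here is a toy Python sketch that scans a target sequence for perfect matches to a miRNA "seed". Treating nucleotides 2-8 of the miRNA as the seed is a common convention that I am assuming here; the sequences are made up, and this is not the method of the proposal or of any particular prediction tool.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, target, seed_start=1, seed_end=8):
    """Return start positions in `target` that pair perfectly with the seed.

    The default slice [1:8] corresponds to nucleotides 2-8 of the miRNA
    (an assumed convention for this sketch).
    """
    site = reverse_complement(mirna[seed_start:seed_end])
    hits = []
    position = target.find(site)
    while position != -1:
        hits.append(position)
        position = target.find(site, position + 1)
    return hits


# Hypothetical sequences, invented for illustration only.
toy_mirna = "AUGCUAGCUAGGCUAACGUAGC"
toy_target = "CCCGCUAGCAUUU"
print(seed_sites(toy_mirna, toy_target))
```

Real prediction methods go well beyond this, which is part of why a single mismatched base can separate a true target from a false positive.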
For accurate and sensitive means to measure the expression levels of miRNAs without need for RNA size fractionation and/or RNA amplification, we intend to optimize RNA preparation protocols, as well as labeling and hybridization protocols.
A Harvard University researcher and pioneer of miRNA research, Gary Ruvkun, has called miRNAs "the biological equivalent of dark matter, all around us but almost escaping detection." It has been well established that miRNAs have a role in cancer development and tissue differentiation. They regulate almost one third of the genes in the human genome [2], although it is still not known why miRNAs regulate some genes and not others. Some of their other functions include cell proliferation, apoptosis, oncogenesis and anti-viral defense. These RNAs, previously considered “junk”, have implications for the treatment of cancer, diabetes and brain disorders.
The necessity to study these tiny pieces of RNA also stems from the fact that they comprise 1% of the genes in animals and are highly conserved across species. The understanding of miRNA function is very limited, which makes even target prediction an extremely challenging task.
Once miRNA targets are known, it might help understand complicated gene regulation, especially in gene networks. It has been shown that genes with higher cis-regulation complexity are more coordinately regulated by trans-acting factors at the transcriptional level and by miRNAs at the post-transcriptional level [3]. Thus, understanding the miRNA regulation pattern might fill gaps in the studies of gene networks and regulation.

References:
1. Brennecke J, Stark A, Russell RB, Cohen SM (2005) Principles of microRNA–target recognition. PLoS Biol 3(3): e85.
2. Lewis BP, Burge CB, Bartel DP (2005) Conserved Seed Pairing, Often Flanked by Adenosines, Indicates that Thousands of Human Genes are MicroRNA Targets. Cell 120: 15–20.
3. Cui Q, Yu Z, Pan Y, Purisima EO, Wang E; MicroRNAs preferentially target the genes with high transcriptional regulation complexity; Biochem Biophys Res Commun. 2006 Nov 27

Saturday, March 03, 2007

microRNA target recognition - 1

This is part of an original proposal I came up with for one of my courses. I will be following up on it soon with another project and hopefully get some answers some day :D

This article will be posted in parts. Today, I will be giving an introduction to miRNAs.

MicroRNAs are small (~22 nucleotides long) non-coding RNAs that form part of a highly conserved system of RNA-based gene regulation in eukaryotes. Mature miRNAs are found in the cytoplasm, where they act as post-transcriptional regulators of gene expression by base-pairing with target mRNAs.

miRNAs are transcribed as regions of longer RNA molecules that are processed in the nucleus into hairpin RNAs (70-100 nt) by the dsRNA-specific ribonuclease Drosha. The hairpin RNAs are transported to the cytoplasm via an exportin-5 dependent mechanism, where they are digested by a second dsRNA-specific ribonuclease called Dicer. The resulting ~22mer is bound to a complex called the RNA-induced Silencing Complex (RISC). RISC is responsible for RNAi. In humans, these miRNAs bind to mRNA through limited complementarity and thus cause reduced or blocked gene expression, through mechanisms not yet completely understood. miRNAs are bound to proteins that belong to the Argonaute family and, in humans, may also assemble with other proteins, including the Gemin3 and Gemin4 proteins, to form micro-ribonucleoprotein complexes [1].

The first miRNA, lin-4, was discovered in Caenorhabditis elegans in 1993 (Lee et al. 1993) in a genetic screen focused on identifying genes involved in the heterochronic pathway. For almost a decade, miRNAs were considered relatively unimportant. But they have stirred much enthusiasm in the biological and medical communities since 2004, when the work of many laboratories highlighted their function of stifling the production of proteins, in contrast to their close relatives, the mRNAs, and uncovered their roles in brain development, HIV resistance, blood cell development, obstruction of genes causing certain types of cancer, etc.

References:
1. Kiriakidou M, Nelson PT, Kouranov A, et al; A combined computational-experimental approach predicts human microRNA targets. Genes Dev. 2004 May 15;18(10):1165-78

Fig. Biogenesis of miRNAs
from http://www.gurdon.cam.ac.uk/~miskalab/research.php

Tuesday, February 27, 2007

Bioinformatics....

In case you are wondering what Bioinformatics experts do.

Biological questions can be explored through wetlab experimental work - the traditional arena of biologists - or through modeling and simulation in virtual environments, also known as drylab research or computational biology. The latter is more generally the domain of mathematicians and algorithm researchers. Of course, wetlab research is used to develop better models to describe our understanding of biology, while drylab research needs to validate its results through wetlab experimentation. Thus wet- and drylab biology are closely related.

Bioinformatics is about improving the methods and technologies for the management and manipulation of data used by people trying to answer biological questions.

So what do I do or what do I intend to do? I intend to be a computational biologist who not only models systems computationally but also verifies those results experimentally in the laboratory. I don't think there can be a more satisfying feeling for a computational designer.

revamped...restart....

I started this blog as a hobby, posted some very basic bioinformatics posts and stopped... Well, today I am a Computational Biologist and am restarting this wonderful journey of blogging again... I want to share my knowledge, my thoughts and facts... I hope you enjoy reading this blog as much as I love writing here... It's a slow start so give me a few days to catch up and revamp this site...