Sunday, March 25, 2007

Software Tool: GENIUS

GENIUS: a new tool for gene networks visualization
Paolo Ciccarese, Stefano Mazzocchi, Fulvia Ferrazzi, Lucia Sacchi

Reverse-engineering methods for gene network reconstruction are based on:
  • Boolean networks
  • Bayesian networks
  • Differential Equations
INPUT:
For n genes in the network, an n x n matrix A = (aij) such that
aij = 1 if there is a connection between genes i and j
aij = 0 if there is no connection between genes i and j

GENIUS visualizes
Genes = nodes
connections = edges
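
As a rough illustration of this input format, the sketch below turns a small matrix into a list of edges. The gene names and matrix values are made up, and reading row i as the 'from' gene and column j as the 'to' gene is an assumption made for this example, not something the input description states.

```python
# Hypothetical 4-gene example; names and values are invented for illustration.
genes = ["g1", "g2", "g3", "g4"]
adjacency = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]


def matrix_to_edges(names, matrix):
    """Return (from_gene, to_gene) pairs for every entry a_ij == 1.

    Assumption: row index = 'from' gene, column index = 'to' gene.
    """
    n = len(names)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("expected an n x n matrix")
    return [
        (names[i], names[j])
        for i in range(n)
        for j in range(n)
        if matrix[i][j] == 1
    ]


edges = matrix_to_edges(genes, adjacency)
for source, target in edges:
    print(source, "->", target)
```

Each pair is one edge of the graph; the genes themselves are the nodes.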

Two types of visualizations:

AGORA STYLE
The algorithm assumes that every individual can be treated in exactly the same way.
Simulation paradigm: "PRIVATE SPACE"
The mathematical model combines a repulsive force field with a basic attractive force field.
- The repulsive force goes to infinity as the distance between objects goes to 0, then rapidly decreases to 0 over a short distance.
- The attractive force starts at 0 and increases to infinity as the distance grows.
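
To make the shape of the two fields concrete, here is a minimal sketch of one pair of functions with these properties. The exact formulas used by Agora are not given here, so the functional forms and the parameters `strength`, `cutoff` and `stiffness` are assumptions chosen only to reproduce the qualitative behavior described above.

```python
def repulsive_force(distance, strength=1.0, cutoff=2.0):
    """Assumed form: infinite at distance 0, exactly 0 from `cutoff` onward."""
    if distance <= 0:
        return float("inf")
    if distance >= cutoff:
        return 0.0
    # Decreases monotonically from +infinity (distance -> 0) to 0 (distance = cutoff).
    return strength * (1.0 / distance - 1.0 / cutoff)


def attractive_force(distance, stiffness=0.1):
    """Assumed form: a linear spring, 0 at distance 0 and unbounded as distance grows."""
    return stiffness * distance


for d in (0.1, 0.5, 1.0, 2.0, 10.0):
    print(d, repulsive_force(d), attractive_force(d))
```

At short range the repulsive term dominates and keeps nodes apart (their "private space"), while at long range only the attractive term remains and pulls them back together.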

The Agora view tool has been extended so that a connection between two genes is directed such that the 'from node' is the regulator and the 'to node' is the regulated gene.

TOUCHGRAPH STYLE
This view is useful for showing relationships between nodes up to a maximum level, where the level is the number of edges in the shortest path connecting the nodes in the graph.

The paper then examines the network visualization of the cDNA microarray data set analyzed in [1], after analyzing the temporal profiles of the 517 genes with the Reveal algorithm described in [2]. Data set available here.

Brief description of the Reveal algorithm
- For every gene x, find its set of regulators (the minimal set of input genes that can uniquely explain the behavior of output gene x).
- The search is based on Entropy and Mutual Information scores: if, for 2 genes x and y,
Mutual Information(x,y) = Entropy(x)
then y uniquely determines x.
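
The criterion above can be checked on toy data. The sketch below is not the Reveal implementation; it only computes Shannon entropy and mutual information, using the standard identity M(x,y) = H(x) + H(y) - H(x,y), on made-up discretized profiles. The function names and the tolerance `tol` are assumptions for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """M(x, y) = H(x) + H(y) - H(x, y) for two equally long sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def determines(y, x, tol=1e-9):
    """True if y fully explains x, i.e. M(x, y) equals H(x) up to `tol`."""
    return abs(mutual_information(x, y) - entropy(x)) < tol


# Made-up binary profiles: x is the negation of y, z is only loosely related to x.
y = [0, 0, 1, 1, 0, 1, 0, 1]
x = [1, 1, 0, 0, 1, 0, 1, 0]
z = [0, 1, 0, 1, 0, 1, 0, 1]

print(determines(y, x))  # y explains x completely
print(determines(z, x))  # z does not
```

Reveal applies the same test to sets of candidate regulators, starting from single genes and moving to larger sets until the output gene is explained.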

Use of Reveal in GENIUS
They extend the algorithm to use 3 discretization levels instead of 2:
-1 : under-expression
0 : equal expression
+1 : over-expression
of the genes in serum-stimulated cells with respect to the expression values of the same genes measured in non-stimulated cells.
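
A minimal sketch of such a three-level discretization is given below. The cut-off the authors used is not stated here, so the log2 ratio and the two-fold `threshold` are assumptions for illustration, as are the example values.

```python
from math import log2


def discretize(stimulated, unstimulated, threshold=1.0):
    """Map an expression ratio to -1, 0 or +1.

    Assumption: the levels are separated by a threshold on the log2 ratio
    (threshold=1.0 means a two-fold change in either direction).
    """
    ratio = log2(stimulated / unstimulated)
    if ratio > threshold:
        return 1
    if ratio < -threshold:
        return -1
    return 0


# Made-up time course for one gene: (stimulated, non-stimulated) values.
measurements = [(400, 100), (120, 100), (100, 400)]
profile = [discretize(s, u) for s, u in measurements]
print(profile)
```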

179 groups (pseudo-genes) were recognized, and the extended algorithm was applied to them.

You can check out the example given in the paper to see how the output looks.

REFERENCES
1. Iyer V. R. et al. (1999): The transcriptional program in the response of human fibroblasts to serum. Science: 283: 83-87
2. Liang S, Fuhrman S, Somogyi R. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pacific Symp. Biocomp. 1998: 98 (3):18-29.

Thursday, March 22, 2007

Paper summarized - Principles of microRNA regulation of a human cellular signaling network

Principles of microRNA regulation of a human cellular signaling network
Qinghua Cui, Zhenbao Yu, Enrico O Purisima and Edwin Wang

What are microRNAs?
  • ~22-nucleotide-long non-coding RNAs
  • responsible for RNA-based gene regulation
  • act as post-transcriptional and translational regulators
  • base-pair with target mRNAs
  • ~1% of predicted genes in human genome
  • BUT Regulate 10–30% of genes
  • Targets
    • signaling proteins
    • enzymes
    • transcription factors
It is unclear if and how miRNAs might orchestrate their regulation of cellular signaling networks and how regulation of these networks might contribute to the biological functions of miRNAs.

What are signaling networks?
Signaling networks are how a cell makes decisions about whether to grow, differentiate, move or die. Their components are proteins. They are represented as graphs where the nodes represent the proteins and the links between the nodes represent the interactions between the proteins.

Hypothesis the paper is based on
miRNAs play a role in the strength and specificity of signaling networks through direct control of proteins at the post-transcriptional and translational levels.

Signaling network used:
Signal transduction processes from multiple cell surface receptors to various cellular machines in a mammalian hippocampal CA1 neuron consisting of:
540 nodes
1258 links
- 689 activating (positive) links
- 306 inhibitory (negative) links
- 263 neutral links (protein interactions)

Results stated (see the glossary below for the terms used)
  • MiRNAs more frequently target network downstream signaling components than ligands and cell surface receptors


  • MiRNAs preferentially target the downstream components of the adaptors, which have potential to recruit more downstream components

  • MiRNAs more frequently target positively linked network motifs

  • MiRNAs avoid targeting common components of cellular machines in the network
Glossary
  1. adaptor proteins: The function of these proteins is to recruit downstream signaling components to the vicinity of receptors. This involves no enzyme activity – they physically interact with upstream and downstream signaling proteins.
  2. network motif: A complex signaling network can be broken down into distinct regulatory patterns, or network motifs, typically comprised of three to four interacting components capable of signal processing. The function of a motif also depends on whether the links are positive or negative.
  3. scaffold proteins: Unlike adaptors, scaffold proteins do not directly activate or inhibit other proteins but provide regional organization for activation or inhibition between other proteins.
  4. functional modules: represent a set of proteins that are always present in various cellular conditions.

Tuesday, March 20, 2007

Sequence Alignment - An introduction


Sequence alignment is one of the most important and basic concepts in computational genomics. Given 2 sequences, what is the best way to arrange the letters of the sequences one below the other so that there is the maximum number of matches between them? Putting it another way, given 2 sequences s and t, find s' and t' such that
1. by removing the gaps from s' and t', we can retrieve s and t.
2. there is no i such that s'[i] and t'[i] are both gaps.
3. length(s') = length(t') >= max(length(s),length(t)) and <= sum(length(s),length(t))

In the above definition,
s'[i] means the letter in sequence s' at the ith position
length(s') means the length of sequence s'
max(length(s),length(t)) means the maximum of the lengths of s and t
sum(length(s),length(t)) means the sum of the lengths of s and t

I think things will get absolutely clear with some examples here. Mind you, I am trying to explain keeping a layman in mind. Some of the terms used might sound too technical to some, while to others my explanations might seem too kiddish. I hope to attain the right balance.

Let us assume we have 2 DNA sequences s = ACCT and t = ACGT. For those who don't know about DNA, click here.

When we try to align these 2 sequences,

A C C T
| | : |
A C G T

The A,C,T match whereas there is a mismatch at position 3. Naturally, there are many alignments possible for any such pair. (Real data would consist of much longer sequences). The alignment chosen is the one with the highest alignment score.

The score is calculated by adding the scores for the matches and mismatches (usually negative) and subtracting a penalty for each gap. For example, let match (M) = +1, mismatch (m) = -1 and gap penalty (g) = 2, and consider a second possible alignment of the same 2 sequences:

A C C - T
| |     |
A C - G T

"-" are gaps. The alignment scores for the 2 alignments are

1. M+M+m+M = 1+1-1+1 = 2
2. M+M-g-g+M = 1+1-2-2+1 = -1

Clearly, the first option is preferable.
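
The same arithmetic can be written as a small function. This is just a sketch of the scoring rule described above; the function name and the use of "-" strings for gapped sequences are my own choices for the example.

```python
def alignment_score(s_aligned, t_aligned, match=1, mismatch=-1, gap=2):
    """Score two gapped sequences column by column.

    match and mismatch are added as given; `gap` is subtracted for every
    column that contains a gap in either sequence.
    """
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(s_aligned, t_aligned):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            score -= gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


print(alignment_score("ACCT", "ACGT"))    # first alignment
print(alignment_score("ACC-T", "AC-GT"))  # second alignment
```

The two calls give 2 and -1, matching the hand calculation.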

There are 2 types of alignments based on number of sequences to align:
1. Pairwise alignments
2. Multiple Sequence Alignments

There are 2 types of alignments based on parts of the sequences aligned:
1. Global alignments - align entire sequences
2. Local Alignments - align regions of the sequences
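
As a small preview of the global case, here is a sketch of the dynamic-programming idea from Needleman and Wunsch (reference 2 below), reduced to computing only the best score. It uses a simple linear gap penalty with the example values from above; the function name and parameters are assumptions for this illustration.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=2):
    """Best global alignment score of s and t with a linear gap penalty."""
    rows, cols = len(s) + 1, len(t) + 1
    # best[i][j] = best score for aligning s[:i] with t[:j]
    best = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        best[i][0] = -gap * i  # s[:i] aligned against gaps only
    for j in range(1, cols):
        best[0][j] = -gap * j  # t[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            best[i][j] = max(
                best[i - 1][j - 1] + pair,  # align s[i-1] with t[j-1]
                best[i - 1][j] - gap,       # gap in t
                best[i][j - 1] - gap,       # gap in s
            )
    return best[-1][-1]


print(global_alignment_score("ACCT", "ACGT"))
```

For s = ACCT and t = ACGT this returns 2, the score of the first alignment shown earlier, so no gapped alignment does better under these parameters.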

I will cover each of these types in later posts.


References and Further Reading:
1. Bioinformatics - Sequence Analysis by David Mount
2. Needleman, S.B. and Wunsch, C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48:443-453

Friday, March 16, 2007

today's video

This is an amazing short video and article on Diagnosing Alzheimer's Early.

Sunday, March 11, 2007

Saturday, March 10, 2007

DNA...

I know this post has come much later but I hope it helps anyways.
So what's DNA?
DNA is made of two strands of linked nucleotides, each carrying one of the four bases adenine (A), thymine (T), guanine (G) and cytosine (C).
Every cell in an organism has chromosomes in its nucleus, which are made up of DNA strands coiled together very densely. Think of this DNA as a string of letters from an alphabet. The English alphabet has 26 letters. Similarly, the DNA alphabet has 4 letters: A,C,T,G. Just like English has sentences made of words which are made of the 26 letters, DNA strands have strings made of "codons" (explained in a moment) which are made of the 4 letters.

What are these codons?
Codons are equivalent to the words in English. But English has words of different lengths. Here's where codons differ. They ALWAYS consist of exactly 3 nucleotides (each one of A,C,T,G). After transcription (conversion of DNA to RNA), each codon gets translated by a complex mechanism to an amino acid. Again using analogies from the English language, like every word has a specific meaning (stupid eg. you can't use "man" when you mean "car"), every codon can be translated only to a specific amino acid. Strings of amino acids form proteins. Since there are 4 nucleotides and codons are 3 letters long, there are 4^3 = 64 possible codons. But there are only 20 amino acids (more about them later), thus there is some redundancy. Like man, male, fellow refer to the same thing, many codons code for the same amino acid. Apart from the codons for the 20 amino acids, there are 3 STOP codons, and one codon doubles as the START codon while also coding for an amino acid. The codon table is given below (taken from [1])
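
The counting argument is easy to check with a few lines of code. This is only a sketch: the start and stop codons listed are those of the standard genetic code written in the DNA alphabet, and the helper name `split_into_codons` and the example string are made up.

```python
from itertools import product

# All 3-letter words over the 4-letter DNA alphabet.
codons = ["".join(letters) for letters in product("ACGT", repeat=3)]
print(len(codons))  # 4^3

# Standard genetic code, DNA alphabet (the RNA forms use U instead of T).
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Codons that code for an amino acid: everything except the stop codons.
coding_codons = [c for c in codons if c not in STOP_CODONS]
print(len(coding_codons))


def split_into_codons(dna):
    """Cut a DNA string into consecutive 3-letter codons, dropping leftovers."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


print(split_into_codons("ATGGCCTAA"))
```

With 61 coding codons for 20 amino acids, several codons necessarily share the same amino acid.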



What is transcription?
Transcription is the process of converting DNA to RNA. RNA has the 4 letters A,C,G,U in its alphabet. More about transcription later.
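
At the level of letters, the change of alphabet can be sketched in one line. This is a deliberate simplification: it assumes the input is the coding strand, so the RNA is the same string with T replaced by U, and it ignores how the cell actually reads the template strand. The function name and example string are made up.

```python
def transcribe(dna):
    """Rewrite a coding-strand DNA string in the RNA alphabet (T becomes U)."""
    return dna.upper().replace("T", "U")


print(transcribe("ATGGCCTAA"))
```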


Fig: from Bioinformatics: from data to biological knowledge by Dena Leshkowitz

References:
1. http://www.cs.cmu.edu/~blmt/Seminar/SeminarMaterials/?N=A

Further Reading:
1. Molecular Biology by Robert Weaver for details on transcription and translation.

Wednesday, March 07, 2007

microRNA target recognition - 2

Experimental identification of miRNA targets is not an easy task, especially using conventional tools. The principal challenge in miRNA target recognition is the small size of the targets (18-24 nucleotides (nts)). Also, unlike plant miRNAs, every human miRNA has hundreds of targets with limited complementarity. The affinity and specificity needed to recognize them call for highly precise tools, as the difference between a true target and a false positive might be a single base.
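
To see how little separates two candidate sites, the sketch below counts the positions at which a site fails to pair with a miRNA. It is a toy model, not a target prediction method: it only considers Watson-Crick pairs (no G:U wobble), requires equal lengths, and uses short made-up sequences.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Sequence (5'->3') that pairs perfectly with `rna` (5'->3')."""
    return "".join(PAIR[base] for base in reversed(rna))


def count_mismatches(mirna, site):
    """Number of positions where `site` differs from the perfect partner."""
    if len(mirna) != len(site):
        raise ValueError("this toy model needs sequences of equal length")
    perfect = reverse_complement(mirna)
    return sum(1 for a, b in zip(perfect, site) if a != b)


mirna = "UGAGGUAG"       # hypothetical 8-nt sequence
site_a = "CUACCUCA"      # perfectly complementary site
site_b = "CUACGUCA"      # same site with a single base changed

print(count_mismatches(mirna, site_a))
print(count_mismatches(mirna, site_b))
```

The two sites differ in one letter only, which is exactly the kind of difference a prediction or detection method has to resolve.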
There has been an explosion in computational biology algorithms for human miRNA target prediction. We propose to identify and verify the targets of human miRNAs using a combined approach involving computational and molecular biology. We intend to extend well-established tools and techniques to verify the miRNA targets predicted by the computational methods.
Prediction of miRNA targets provides an alternative approach to assigning biological functions. This is simpler in plants due to their high complementarity and limited number of targets per miRNA, but functional duplexes can be more variable in structure in humans [1]. Thus, we propose the use of more than one method to verify these targets.
To obtain an accurate and sensitive means of measuring the expression levels of miRNAs without the need for RNA size fractionation and/or RNA amplification, we intend to optimize RNA preparation protocols, as well as labeling and hybridization protocols.
A Harvard University researcher and pioneer of miRNA research, Gary Ruvkun, has called miRNAs "the biological equivalent of dark matter, all around us but almost escaping detection." It has been well established that miRNAs have a role in cancer development and tissue differentiation. They regulate almost one third of the genes in the human genome [2], although it is still not known why miRNAs regulate some genes and not others. They are also involved in cell proliferation, apoptosis, oncogenesis and anti-viral defense. These RNAs, previously considered “junk”, have implications for the treatment of cancer, diabetes and brain disorders.
The need to study these tiny pieces of RNA also stems from the fact that they comprise 1% of the genes in animals and are highly conserved across species. The understanding of miRNA function is very limited, which makes even target prediction an extremely challenging task.
Once miRNA targets are known, they might help us understand complicated gene regulation, especially in gene networks. It has been shown that genes with higher cis-regulation complexity are more coordinately regulated by trans-acting factors at the transcriptional level and by miRNAs at the post-transcriptional level [3]. Thus, understanding the miRNA regulation pattern might fill gaps in the studies of gene networks and regulation.

References:
1. Brennecke J, Stark A, Russell RB, Cohen SM (2005) Principles of microRNA–target recognition. PLoS Biol 3(3): e85.
2. Lewis BP, Burge CB, Bartel DP (2005) Conserved Seed Pairing, Often Flanked by Adenosines, Indicates that Thousands of Human Genes are MicroRNA Targets. Cell 120: 15–20.
3. Cui Q, Yu Z, Pan Y, Purisima EO, Wang E; MicroRNAs preferentially target the genes with high transcriptional regulation complexity; Biochem Biophys Res Commun. 2006 Nov 27

Saturday, March 03, 2007

microRNA target recognition - 1

This is part of an original proposal I came up with for one of my courses. I will be following up on it soon with another project and hopefully get some answers some day :D

This article will be posted in parts. Today, I will be giving an introduction to miRNAs.

MicroRNAs are small (~22-nucleotide-long) non-coding RNAs that form part of a highly conserved system of RNA-based gene regulation in eukaryotes. Mature miRNAs are found in the cytoplasm, where they act as post-transcriptional regulators of gene expression by base-pairing with target mRNAs.

miRNAs are transcribed as regions of longer RNA molecules that are processed in the nucleus into hairpin RNAs (70-100nt) by the dsRNA-specific ribonuclease Drosha. The hairpin RNAs are transported to the cytoplasm via an exportin-5 dependent mechanism, where they are digested by a second dsRNA-specific ribonuclease called Dicer. The resulting ~22mer is bound to a complex called the RNA-induced Silencing Complex (RISC). RISC is responsible for RNAi. In humans, these miRNAs bind to mRNA through limited complementarity and thus cause reduced or blocked gene expression, through mechanisms not yet completely understood. miRNAs are bound to proteins that belong to the Argonaute family and, in humans, may also assemble with other proteins, including the Gemin3 and Gemin4 proteins, to form micro-ribonucleoprotein complexes [1].

The first miRNA, lin-4, was discovered in Caenorhabditis elegans in 1993 (Lee et al. 1993) in a genetic screen focused on identifying genes involved in the heterochronic pathway. For almost a decade, miRNAs were considered relatively unimportant. But they have stirred much enthusiasm in the biological and medical communities since 2004, when their function of stifling the production of proteins, in contrast to their close relatives, the mRNAs, was highlighted through the work of many laboratories, and their roles in brain development, HIV resistance, blood cell development, obstruction of genes causing certain types of cancer etc. were discovered.

References:
1. Kiriakidou M, Nelson PT, Kouranov A, et al; A combined computational-experimental approach predicts human microRNA targets. Genes Dev. 2004 May 15;18(10):1165-78

Fig. Biogenesis of miRNAs
from http://www.gurdon.cam.ac.uk/~miskalab/research.php