
Showing posts with label VARNA. Show all posts

Wednesday, June 2, 2010

Parsing WUSS notation of RNA secondary structure annotation

A key part of this project is to parse the secondary structure line of Stockholm files so that it can be interpreted for coloring schemes. I have been adding mini-goals as appropriate. I will probably also need to add code to check that the sequence and the secondary structure line are the same length, and that the numbers of opening and closing brackets match.
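Both checks are easy to sketch. Below is a minimal Java version (Java because Jalview and VARNA are both written in it); SsConsChecker and check are hypothetical names, not Jalview API. It assumes, as WUSS does, that the four bracket types must nest properly and that pseudoknots are written with letters, which it does not check. Using a stack instead of just counting brackets also catches lines such as <(>), where the counts match but the bracket types are crossed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper (not Jalview API): sanity checks for a secondary structure line. */
class SsConsChecker {

    private static final String OPEN = "<([{";
    private static final String CLOSE = ">)]}";

    /** Returns a list of problems; an empty list means both checks passed. */
    static List<String> check(String sequence, String structure) {
        List<String> problems = new ArrayList<>();
        if (sequence.length() != structure.length()) {
            problems.add("length mismatch: sequence " + sequence.length()
                    + " vs structure " + structure.length());
        }
        // Column indices of openers still waiting for their partner.
        Deque<Integer> open = new ArrayDeque<>();
        for (int i = 0; i < structure.length(); i++) {
            char c = structure.charAt(i);
            if (OPEN.indexOf(c) >= 0) {
                open.push(i);
            } else if (CLOSE.indexOf(c) >= 0) {
                if (open.isEmpty()) {
                    problems.add("unmatched '" + c + "' at column " + i);
                    continue;
                }
                int j = open.pop();
                // The closer must be the same bracket type as the innermost opener.
                char expected = CLOSE.charAt(OPEN.indexOf(structure.charAt(j)));
                if (c != expected) {
                    problems.add("'" + structure.charAt(j) + "' at column " + j
                            + " closed by '" + c + "' at column " + i);
                }
            }
            // Any other character (loops, insertions, pseudoknot letters) is not checked.
        }
        for (int j : open) {
            problems.add("unclosed '" + structure.charAt(j) + "' at column " + j);
        }
        return problems;
    }
}
```

An empty result means the line can be handed on to the coloring code; otherwise each string describes one problem with its column.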

WUSS notation is used in RNA Stockholm files to indicate secondary structure. WUSS notation can support more characters than I thought, but Rfam uses the simplified version used by Infernal, the covariance modeling program. The description of Rfam on the Janelia Farm page is:


Rfam is a collection of multiple sequence alignments and covariance models covering many common non-coding RNA families. The main use of Rfam is as a source of RNA multiple alignments with consensus secondary structure annotation in a consistent format. In conjunction with the Infernal software package, Rfam covariance models (CMs) can be used to search genomes or other DNA sequence databases for homologs to known structural RNA families.


WUSS notation uses <>, (), [], and {} to indicate base pairs and ':', ',', '_', '-', '.', and '~' to indicate single-stranded columns. Each type of symbol has a subtle meaning, but for Infernal the structure annotation line only needs to indicate which columns are base paired to each other. Thus, full WUSS notation is not necessary, and a simple minimal annotation uses <> to indicate base pairs and '.' for single-stranded positions of the alignment.
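Reducing full WUSS to that minimal annotation is a one-pass character mapping. This is a minimal sketch, not Jalview code: WussMinimal and toMinimal are hypothetical names, and it assumes that brackets of different types always nest properly (so relabeling every opener as '<' and every closer as '>' keeps the same pairing) and that pseudoknot letters can simply be dropped to '.', which loses those pairs. Because only paired versus unpaired survives, the three equivalent Infernal user guide examples quoted later in this post (<<<<....>>>>, ((((____)))) and <(<(._._)>)>) all reduce to the same string.

```java
/**
 * Hypothetical helper (not Jalview API): reduce a full WUSS line to the minimal
 * <> / . annotation. Assumes bracket types nest properly, so relabeling every
 * opener as '<' and every closer as '>' keeps the pairing unchanged.
 */
class WussMinimal {

    static String toMinimal(String wuss) {
        StringBuilder sb = new StringBuilder(wuss.length());
        for (int i = 0; i < wuss.length(); i++) {
            char c = wuss.charAt(i);
            if ("<([{".indexOf(c) >= 0) {
                sb.append('<');
            } else if (">)]}".indexOf(c) >= 0) {
                sb.append('>');
            } else {
                // ':', ',', '_', '-', '.', '~' and (assumption) pseudoknot letters
                sb.append('.');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toMinimal("<(<(._._)>)>")); // <<<<....>>>>
    }
}
```

For VARNA, which according to the To Do list further down only likes (), the same loop would append '(' and ')' instead of '<' and '>'.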

In more detail, from the Infernal user guide:

Base pairs: the different symbols indicate different depths
* <> for simple terminal stems
* () for "internal" helices enclosing a multifurcation of all terminal stems
* [] for internal helices enclosing a multifurcation that includes at least one annotated () stem already
* {} for all internal helices enclosing deeper multifurcations

Hairpin loops
* indicated by underscores '_'
* simple stem loop example: <<<____>>>

Bulge and interior loops
* indicated by dashes '-'

Multifurcation loops
* indicated by commas ','
* example: <<<___>>>,,<<<__>>>

External residues completely outside the structure
* indicated by colons ':'

Insertions
* '.' marks insertions relative to a known structure
* '~' indicates that a local structural alignment left regions of target and query unaligned

Pseudoknots
* pairs of upper case/lower case letters
* example: <<<<_AAAA____>>>>aaaa
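The rules above translate into a pair table (one partner index per column, -1 for single-stranded) built with a stack per opening symbol. This is a sketch under stated assumptions, not Jalview or VARNA code: WussPairTable and parse are hypothetical names, every bracket type and every upper-case letter gets its own stack, and a lower-case letter is assumed to close the upper-case letter of the same name, as in the pseudoknot example. Because each bracket type has its own stack, crossed brackets such as <(>) are not rejected here. Applied to the RALEE readme example later in this post, .<<<<<...>>.<<...>>..>>>. gives the same column pairs listed there (1 with 23, 4 with 10, and so on).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Jalview or VARNA API): WUSS line to pair table.
 * pairs[i] == j means column i pairs with column j; -1 means single-stranded.
 */
class WussPairTable {

    private static final String OPEN = "<([{";
    private static final String CLOSE = ">)]}";

    static int[] parse(String wuss) {
        int[] pairs = new int[wuss.length()];
        Arrays.fill(pairs, -1);
        // One stack per opening symbol: each bracket type and each upper-case letter.
        Map<Character, Deque<Integer>> stacks = new HashMap<>();
        for (int i = 0; i < wuss.length(); i++) {
            char c = wuss.charAt(i);
            if (OPEN.indexOf(c) >= 0 || Character.isUpperCase(c)) {
                stacks.computeIfAbsent(c, k -> new ArrayDeque<>()).push(i);
                continue;
            }
            Character opener = openerFor(c);
            if (opener == null) {
                continue; // ':', ',', '_', '-', '.', '~' are single-stranded
            }
            Deque<Integer> stack = stacks.get(opener);
            if (stack == null || stack.isEmpty()) {
                throw new IllegalArgumentException("unmatched '" + c + "' at column " + i);
            }
            int j = stack.pop();
            pairs[i] = j;
            pairs[j] = i;
        }
        for (Map.Entry<Character, Deque<Integer>> e : stacks.entrySet()) {
            if (!e.getValue().isEmpty()) {
                throw new IllegalArgumentException("unclosed '" + e.getKey()
                        + "' at column " + e.getValue().peek());
            }
        }
        return pairs;
    }

    /** Opener matched by a closing symbol, or null if c closes nothing. */
    private static Character openerFor(char c) {
        int k = CLOSE.indexOf(c);
        if (k >= 0) {
            return OPEN.charAt(k);
        }
        // Assumption: a lower-case letter closes the upper-case letter of the same name.
        return Character.isLowerCase(c) ? Character.toUpperCase(c) : null;
    }
}
```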


Things that I am thinking about:

-I need to interpret WUSS notation in a general way. It shouldn't be too difficult, but it is necessary because the same structure can be written in multiple ways. For example, according to the Infernal user guide, <<<<....>>>>, ((((____)))) and <(<(._._)>)> all indicate a four base pair stem with a four base loop.

-How should I store the secondary structure line so that it will be easily interpreted to implement coloring schemes?

Potentially I can store pairs of positions the same way disulfide bond positions are stored as annotations (Jim pointed this one out). I also need to keep in mind that bulges might exist, so I can't just interpret a run of the same type of bracket as a single stem. VARNA interprets bulges just fine, so I don't have to worry about that. An example of a complicated structure with a bulge:

<<<<......<<<< <<<<.....>>>>..>>>>......<<<<...>>>>....>>>>

-How can I make sure that there are 4 stems instead of 3? I can't simply scan from left to right, or eat away at both ends at the same time. Judging from this example in its readme, the RALEE mode for Emacs handles bulges just fine:




0123456789012345678901234
.<<<<<...>>.<<...>>..>>>.


Column 1 pairs with 23
2 with 22
3 with 21
4 with 10
5 with 9
12 with 18
13 with 17



The image is from VARNA. Note the numbering in the image starts at 1 instead of 0.

RALEE is written in Emacs Lisp, so I need to look up some basics in Lisp before I can feel confident that I'm interpreting the code correctly! I think that this code will cut down on my thinking time, however.
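Once the column pairs are known, the 4-versus-3-stems question can be answered from the pairs instead of from runs of brackets. A minimal sketch, assuming pairs are given as {i, j} with i < j and that a stem is a maximal run of stacked pairs, (i, j) followed by (i + 1, j - 1), so even a single unpaired column on one strand (a bulge) starts a new stem. Whether a one-base bulge should really split a stem is a design choice for the coloring scheme. StemFinder is a hypothetical name. On the RALEE pairs listed above, the run of five opening brackets splits into two stems:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch: group base pairs into stems. Pair (i + 1, j - 1) stacks on
 * (i, j); any unpaired column on either strand (bulge, interior loop, branch)
 * ends the current stem.
 */
class StemFinder {

    /** pairs holds {i, j} with i < j; returns stems, each a list of stacked pairs. */
    static List<List<int[]>> stems(int[][] pairs) {
        int[][] sorted = pairs.clone();
        Arrays.sort(sorted, Comparator.comparingInt(p -> p[0]));
        List<List<int[]>> stems = new ArrayList<>();
        List<int[]> current = null;
        int[] prev = null;
        for (int[] p : sorted) {
            boolean stacked = prev != null && p[0] == prev[0] + 1 && p[1] == prev[1] - 1;
            if (!stacked) {
                current = new ArrayList<>(); // gap on either strand: start a new stem
                stems.add(current);
            }
            current.add(p);
            prev = p;
        }
        return stems;
    }

    public static void main(String[] args) {
        // Column pairs from the RALEE readme example.
        int[][] ralee = {{1, 23}, {2, 22}, {3, 21}, {4, 10}, {5, 9}, {12, 18}, {13, 17}};
        for (List<int[]> stem : stems(ralee)) {
            int[] outer = stem.get(0);
            System.out.println("stem of " + stem.size() + " pairs starting at "
                    + outer[0] + "-" + outer[1]);
        }
    }
}
```

Running main prints three stems: 3 pairs starting at 1-23, 2 pairs starting at 4-10, and 2 pairs starting at 12-18.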

To do
-check that the secondary structure line and the sequence are the same length. Does Jalview already do this?
-change all bracket types to () for VARNA (I just noticed that VARNA only likes (), not <>, for base pairs!)
-convert all WUSS symbols to something simple, as Jalview already does for protein secondary structure (simple helices and sheets)
-Need to figure out how to detect pseudoknots
-Add error checking for when a user adds a base pair annotation: make sure the same number of column groups is selected
-How will colors cycle for different numbers of stems?
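For the pseudoknot item, one standard definition that could be used (not necessarily what Jalview or VARNA do) is that two base pairs (i, j) and (k, l) form a pseudoknot when they cross: i < k < j < l. A minimal sketch with a hypothetical PseudoknotCheck class, taking pairs as {i, j} with i < j. In the user guide example <<<<_AAAA____>>>>aaaa, the four <> pairs are (0,16) to (3,13) and the four A/a pairs are (5,20) to (8,17), so every bracket pair crosses every letter pair.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: find pseudoknots as crossing base pairs. Pairs {i, j} and
 * {k, l} (i < j, k < l) cross when i < k < j < l; nested brackets of a single
 * type cannot express such crossings.
 */
class PseudoknotCheck {

    static boolean crosses(int[] a, int[] b) {
        return (a[0] < b[0] && b[0] < a[1] && a[1] < b[1])
                || (b[0] < a[0] && a[0] < b[1] && b[1] < a[1]);
    }

    /** Every crossing combination of pairs; O(n^2), fine for one structure line. */
    static List<int[][]> crossingPairs(int[][] pairs) {
        List<int[][]> result = new ArrayList<>();
        for (int x = 0; x < pairs.length; x++) {
            for (int y = x + 1; y < pairs.length; y++) {
                if (crosses(pairs[x], pairs[y])) {
                    result.add(new int[][] {pairs[x], pairs[y]});
                }
            }
        }
        return result;
    }
}
```

An empty result means the structure is properly nested and can be drawn with brackets alone.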

Monday, May 24, 2010

Beginning of Coding!

Today is the official start of coding! I have already started coding to make up for the time that I'll lose at the RNA Society Conference. I've added .sto to the file extensions that Jalview can read. There is already a Stockholm file parser in Jalview, but Stockholm wasn't an obvious option in the dialog box for opening alignment files.

I'm not quite as far along with coding as I want to be, partly due to the time I spent looking at RNA secondary structure viewers, but I think that the rest of the summer will be better for it. I am also still getting some requests from my thesis advisor for work, and I need to remind him that I have this other project going! He was very supportive of me doing GSoC, so I think he has just forgotten that coding has started.

I've talked to two other RNA biologists about the features they would like in an RNA secondary structure viewer. VARNA doesn't have all of the visualization that scientists might want, but perhaps I can add these if I have time. Thinking about how other scientists might use the secondary structure viewer has been a lesson in software development. How will the user launch VARNA? What kind of interaction will they have with it? What kind of interaction will Jalview and VARNA have?

I'm glad that most of the planning is over and I can really get into coding!

Thursday, May 20, 2010

Importing VARNA into Eclipse, Jar files

I was trying to look at the VARNA source code last night, but it was in a JAR file. I tried to find a JAR file plugin for Eclipse, but I couldn't get it to work. I tried JadEclipse.

I ended up using the jar command in Unix to unzip the file and copy it into Eclipse. I then got a funny Java library error.

Jim helped me out with importing the VARNA code into Eclipse and fixed the library error I was getting. Jim's instructions are below for interested parties. =)


There are a couple of 'archive viewer' plugins, but in the case of the VARNA source, you don't need any - particularly since the archive actually contains an eclipse project. Here's how to import it and get it running:
1. Open the 'File->Import...' dialog.
2. Choose the 'Existing projects into workspace' entry under the 'General' tab.
3. Select import from archive, and locate the VARNA source archive.
4. Check the VARNA3-1 project and hit the import button!

You'll notice that the project is set up a little differently to the Jalview one (there's no source directory, or lib directory, for instance). This isn't a problem, but it's worth remembering when you come to package a VARNA jar for Jalview.

The next steps are necessary to get the Java references set up correctly, since you'll have a different version of Java on your system to the one that the archive was exported from.

5a. Right click the new project (VARNA3-1 probably), and select the 'configure build path' option under the 'Build path' menu.
5b. Fix up the broken reference to the JDK in the 'Libraries' tab: select the entry with an 'x' and hit edit, then pick the JDK which you have on your system (probably java 1.6).
5c. Select the 'source folders' tab, select the 'src' folder and remove it from the list.
5d. Hit the 'Add folder' button and select the project's root directory as a source folder to add to the path.
5e. Check the 'allow output folders for source folders' option.
5f. Change the default output directory from VARNA3-1/bin to VARNA3-1/
6. Hit ok, and the project should rebuild itself with the new settings. With any luck you'll have no errors.

After this, you should be able to run VARNA by locating the fr.orsay.lri.varna.applications.VARNAGUI class and using the 'Run as ... application' menu entry.

RNA Secondary Structure Viewers

One of the goals of this project is to embed an RNA secondary structure viewer into Jalview. From today's meeting with Jim, it looks like there are some interactive editing features that we'd like to implement in VARNA, if we choose it. Adding interactive base pair making and breaking would be nice. I'm pretty sure that we will use VARNA. The advantages of VARNA are:

1. It is written in Java and under the GPL license. This means that it can be easily added to Jalview.

2. You can specify secondary structure with WUSS notation

3. It exports the image you make in a variety of formats

4. (My opinion) It has pretty output.

5. You can add annotations and change color of bases

6. Other features allow different visualization schemes, such as base pairing, and modes such as "Feynman's diagram"

7. Can handle a variety of file formats (.ct, .dbn, etc)

One interesting thing about VARNA is that it can create multiple panels in the visualization window. This might be useful for comparing a multiple sequence alignment. On the other hand, another type of visualization might be useful, such as overlaying the structures and using color to indicate conservation.

After looking at various RNA structure viewers, it looks like it might be useful to add more valid file formats in Jalview, such as ConnecT (.ct), Base Pair Sequence (.bpseq), and Dot Bracket Notation (.dbf, .dbn).
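As an example of what one of these extra formats involves, BPSEQ is a simple three-column text format: each line holds a 1-based position, the base, and the 1-based position of its partner, or 0 if the base is unpaired. Below is a minimal writer, assuming an ungapped sequence and a 0-based pair table (pairs[i] is the partner of position i, or -1 if unpaired). BpseqWriter is a hypothetical name, and the header or comment lines that some BPSEQ files carry are left out.

```java
/**
 * Hypothetical sketch: write an ungapped sequence and its 0-based pair table as
 * BPSEQ text. Each line is "position base partner", 1-based, partner 0 if unpaired.
 */
class BpseqWriter {

    static String toBpseq(String sequence, int[] pairs) {
        if (sequence.length() != pairs.length) {
            throw new IllegalArgumentException("sequence and pair table differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pairs.length; i++) {
            int partner = pairs[i] < 0 ? 0 : pairs[i] + 1; // convert to 1-based, 0 = unpaired
            sb.append(i + 1).append(' ')
              .append(sequence.charAt(i)).append(' ')
              .append(partner).append('\n');
        }
        return sb.toString();
    }
}
```

For a sequence taken from a Jalview alignment, gap columns would first have to be removed and the pair table renumbered to ungapped positions.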

Below is a list of some RNA secondary structure viewers that I've looked at.

VARNA: Visualization Applet for RNA
http://varna.lri.fr/index.html

RNA-DV
http://rna-dv.sourceforge.net/

jViz.RNA 2.0
http://jviz.cs.sfu.ca/index.html

XRNA
http://rna.ucsc.edu/rnacenter/xrna/xrna.html

RNA2D3D
http://www-lmmb.ncifcrf.gov/~bshapiro/structurelab/structureLab.html

SStructView
http://helix-web.stanford.edu/sstructview/home.html

RNAmovies
http://bibiserv.techfak.uni-bielefeld.de/rnamovies/

RnamlView
http://ndbserver.rutgers.edu/services/help/rnamlview-readme.html

List from: http://openwetware.org/wiki/Wikiomics:Alignment_visualization_and_plotting_RNA_structures

Thursday, May 13, 2010

Adding VARNA, source code up!

I've introduced myself to the jalview-discuss list! Hopefully I won't be flooded with requests.

I had a meeting with Jim today and he went over some of the details of how Jmol is embedded into Jalview. This will help me add VARNA to Jalview, if possible. The main places I will need to look at for inspiration (or modify) are:

StructureSelectionManager class in the jalview.structure package
AppJmol class in the jalview.gui package
PopupMenu class in the jalview.gui package

I emailed Yann Ponty, one of the VARNA authors, and he responded very warmly. He took a brief look at my goals and wrote that VARNAPanel does not support mouseover events. I need to look more closely at the code to see if this will make my job difficult.

I also added the Jalview 2.5 source code to Google's open source project hosting site, code.google.com. I used svn:

svn import jalview-2.5 https://jalview.googlecode.com/svn/trunk/ -m "First import" --username lauren.ucsc

Right now only Jim and I can add code. I added a project description but I need to figure out what needs to be on the wiki.