Wednesday, August 18, 2010

Demonstration of GSoC Project

The features that I developed for the GSoC project will be available in the next Jalview release. You can also contact me and I can send you a jar file with the new features, or you can download the code from the Google Code hosting page for Jalview and see my post on how to set up Eclipse with Jalview. The code submitted to Google is the same as rev 32 on the Google Code hosting site.

Addendum: Jim posted a comment with a link to the webstart version.

  1. You can fetch sequences from RFAM via the sequence fetcher. Go to "File > Fetch Sequence(s)..." and a dialog box will appear.

  2. From the database drop-down menu, select RFAM (Full) for the "Full" alignment from RFAM or RFAM (Seed) for the "Seed" alignment. Seed alignments are the original alignments constructed to create a covariance model for searching databases. Full alignments are the result of a search using the covariance model against the sequence database. For this demonstration I used RFAM (Seed), but the steps for RFAM (Full) are very similar.

  3. Click the "Example" button to load an alignment name. Click the "OK" button to fetch the alignment.

  4. A new dialog box will appear with the alignment. The RFAM sequence fetcher retrieves files in Stockholm format, so Jalview can interpret the secondary structure information in the file. The secondary structure information in the file is displayed in WUSS notation and helices that are determined from this information are displayed as blue arrows in the Annotation panel.

  5. You can color the helices with the "By RNA helices" color option. Go to "Colour > By RNA helices."

  6. The helices should now be colored.

  7. To view the consensus logo in the Annotation panel, go to "View > Autocalculated Annotation > Show Consensus logo". The coloring of the logo will change based on which color scheme is selected.

  8. Close-up of the consensus logo when "By RNA helices" or "Purine/Pyrimidine" coloring is selected.

  9. To change the color scheme to "Purine/Pyrimidine", go to "Colour > Purine/Pyrimidine".

Monday, August 9, 2010

Fetching sequences from Rfam

I added the ability to fetch sequences from Rfam. Jim recommended that I do some refactoring of the code to reuse methods to fetch sequences from Pfam, which is a database for protein families, rather than RNA families. Rfam was designed to be similar to Pfam, so it wasn't too hard to add the ability to fetch sequences from Rfam to Jalview.

Under the guidance of Jim, I created an Xfam class in the package, which the Rfam and Pfam classes extend. (Sidenote: there is an Xfam blog about new developments of the Rfam and Pfam databases.) Then I added RfamSeed and RfamFull classes, similar to the ones for Pfam. These contain the methods to fetch sequences from the "seed" and "full" alignments available on Rfam in Stockholm format, respectively. I had to modify the names of some methods to keep things consistent. In (in package, I called addDBRefSourceImpl() for RfamSeed and RfamFull to enable calling Rfam sequence retrieval from the Jalview menu.

To fetch sequences from Rfam (and Pfam), Jalview accesses stable URLs that return the alignments. In the RfamSeed and RfamFull classes, part of the URL is hardcoded, along with the correct variables for the query string. (I learned about query strings while I was creating the classes; see the Wikipedia page on CGI and QUERY_STRING.)

The variables for the Rfam website:

  • 'acc': followed by "=" and the accession number will give you the corresponding family.
  • 'id': followed by "=" and the ID name will give you the corresponding family.

  • 'alnType': alignment type, can be 'seed' or 'full'

  • 'nseLabels': toggle for species names, can be 0 or 1

  • 'format': file format, can be 'stockholm', 'pfam', 'fasta' or 'fastau'.

Rfam has two mirrors, one at the Sanger Institute and one at Janelia Farm. The Janelia Farm mirror has not yet been updated to Rfam 10.0, so I used the Sanger Institute URL.

There's one bug, and I think it might be in the Stockholm parser: if you go to "View > Alignment properties...", no dialog window pops up with the alignment properties.
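The query-string variables listed above could be assembled into a retrieval URL roughly as follows. This is only a minimal sketch and not the code in the RfamSeed and RfamFull classes: the class name, the method name, and the base URL (a placeholder on example.org) are all assumptions, and the accession is just a sample value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of Jalview.
class RfamQuerySketch {
    // Placeholder base URL (assumption): substitute the real Rfam download address.
    static final String BASE_URL = "http://rfam.example.org/family/alignment/download/format";

    /** Builds a URL that requests one family's alignment by accession number. */
    static String buildUrl(String accession, String alnType, String format, int nseLabels) {
        // LinkedHashMap keeps the variables in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("acc", accession);                    // accession number of the family
        params.put("alnType", alnType);                  // "seed" or "full"
        params.put("nseLabels", String.valueOf(nseLabels)); // 0 or 1
        params.put("format", format);                    // e.g. "stockholm"

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        return BASE_URL + "?" + query;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("RF00001", "seed", "stockholm", 0));
    }
}
```

The values used here contain no characters that need escaping; a general-purpose version would URL-encode each value before joining.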

Tuesday, August 3, 2010

Random colors for covariation color scheme

For the covariation color scheme, I want to generate random colors so I don't have to worry about storing an index of colors for an unknown number of helices. I found a nice algorithm at I set up everything for the covariation color scheme and tried out the random color selection for each nucleotide.

Examples are below. I like the first example that uses pastels, but I worry that the colors are not distinct enough from each other. The problem with the second example is that the dark colors make it impossible to read the nucleotide; I could figure out how to make the text white in those cases.
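Both ideas above (pastel colors, and white text on dark backgrounds) can be sketched in a few lines. This is not the algorithm mentioned earlier and not what is in Jalview; it is a common approach in which each random channel is averaged with white to get a pastel, and the text color is picked from a weighted brightness estimate. The class name, method names, and the brightness threshold of 128 are assumptions.

```java
import java.awt.Color;
import java.util.Random;

// Hypothetical helper for illustration; not part of Jalview.
class HelixColourSketch {
    /** Random pastel: average each random channel with white (255). */
    static Color randomPastel(Random rng) {
        int r = (rng.nextInt(256) + 255) / 2;
        int g = (rng.nextInt(256) + 255) / 2;
        int b = (rng.nextInt(256) + 255) / 2;
        return new Color(r, g, b);
    }

    /** Chooses white text for dark backgrounds and black text otherwise. */
    static Color textColourFor(Color background) {
        // Weighted brightness estimate (ITU-R BT.601 luma weights), range 0..255.
        double brightness = 0.299 * background.getRed()
                + 0.587 * background.getGreen()
                + 0.114 * background.getBlue();
        // Threshold of 128 is an assumption; tune it by eye.
        return brightness < 128 ? Color.WHITE : Color.BLACK;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int helix = 0; helix < 5; helix++) {
            Color c = randomPastel(rng);
            System.out.println("helix " + helix + ": " + c + ", text " + textColourFor(c));
        }
    }
}
```

Averaging with white keeps every channel at 127 or above, which is why pastels stay readable with black text but can end up looking similar to one another.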

Adding color schemes

Currently I'm adding color schemes, including the covariation color scheme. When you add a color scheme, files need to be edited in the jalview.schemes, jalview.jbgui, and jalview.gui packages. The jbgui package is named for the fact that JBuilder was used initially to automatically generate some of the code. The gui and jbgui packages have matching classes, but the names of the classes in jbgui are prefixed with a "G".

GAlignFrame, in the jbgui package, is the shadow class that holds the user interface, and the matching class in the gui package, AlignFrame, has most of the logic. GAlignFrame includes the objects that you see in Jalview, such as buttons. Its jbInit method adds the buttons to the GUI and sets their properties.

Step 1: Create a color scheme and put it in the jalview.schemes package
1a. Make it a subclass of ResidueColourScheme (extends)
1b. You will probably write findColour methods that override the ones found in ResidueColourScheme
1c. May need to call an index of colors (eg. based on characters in the alignment); store these in

Step 2: Edit in jalview.schemes package
2a. add in a new enumeration symbol for scheme
2b. List of functions to edit:
*getColourName(ColourSchemeI cs)
*getColourName(int index)
*getColour(java.util.Vector seqs, int width, int index)

Step 3: Add a new JRadioButtonMenuItem in GAlignFrame for scheme

Step 4: Edit setColourSelected function in GAlignFrame to check the menu item when the colourscheme is selected

Step 5: Add menu item text and action settings into the jbInit function so it calls a protected GAlignFrame action method (copy the nucleotideColour button example)
5a. Add protected void myButton_actionPerformed(ActionEvent e) function (at end of GAlignFrame)
5b. Add your menu item to the colourMenu window menu (at end of jbInit)
5c. Add your menu item to the colours ButtonGroup (above public void setColourSelected(String defaultColour))
*ButtonGroup colours -> syntax is colours.add(colorscheme)

Step 6: Create a new public method with the identical name (myButton_actionPerformed(ActionEvent e)) in AlignFrame, which calls changeColour(my new colourscheme)

Step 7: For more complex colourschemes, you'll need to make sure any calculated values (e.g. covariation) are completed before the color scheme is applied in the AlignFrame._actionPerformed function

Step 8: Repeat process in the PopupMenu
*There is no GPopupMenu! (this is because PopupMenu is a dynamic menu, so the JBuilder mechanism isn't used).
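The split between the two classes in Steps 3 to 6 can be sketched with two stand-in classes. This is only an illustration of the pattern: GFrameSketch and FrameSketch are hypothetical stand-ins for GAlignFrame and AlignFrame, the scheme is represented by a plain string rather than a real colour scheme object, and changeColour here is a simplified stand-in for the real method.

```java
import java.awt.event.ActionEvent;
import javax.swing.ButtonGroup;
import javax.swing.JMenu;
import javax.swing.JRadioButtonMenuItem;

// Stand-in for GAlignFrame (assumption): holds the widgets and an empty action hook.
class GFrameSketch {
    protected JMenu colourMenu = new JMenu("Colour");
    protected ButtonGroup colours = new ButtonGroup();
    // Step 3: a new radio button menu item for the scheme.
    protected JRadioButtonMenuItem myScheme = new JRadioButtonMenuItem();

    GFrameSketch() {
        jbInit();
    }

    private void jbInit() {
        // Step 5: text and action settings; the listener forwards to the hook below.
        myScheme.setText("My Scheme");
        myScheme.addActionListener(e -> myScheme_actionPerformed(e));
        colours.add(myScheme);     // Step 5c: only one scheme selected at a time
        colourMenu.add(myScheme);  // Step 5b: show the item in the Colour menu
    }

    // Step 5a: intentionally empty here; the logic lives in the subclass.
    protected void myScheme_actionPerformed(ActionEvent e) {
    }
}

// Stand-in for AlignFrame (assumption): supplies the logic.
class FrameSketch extends GFrameSketch {
    String currentScheme = "None";

    // Step 6: public method with the identical name.
    @Override
    public void myScheme_actionPerformed(ActionEvent e) {
        changeColour("My Scheme");
    }

    // Simplified stand-in: records the scheme and ticks the menu item (Step 4).
    void changeColour(String schemeName) {
        currentScheme = schemeName;
        myScheme.setSelected(true);
    }
}
```

Because the listener in the G class only calls the protected hook, the generated user-interface code never needs to know about the logic in the subclass.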

If you want to add the scheme to the Jmol applet:
Step 1: In jalview.gui package
*Create public void method, scheme_actionPerformed method that creates new scheme object (right above setJalviewColourScheme method)

Step 2: In
*add new JMenuItem for scheme
*setText for scheme and add action listener for scheme (setText and addActionListener)
*add empty public void scheme_actionPerformed method
*add scheme to colourMenu

If you want to add the scheme to the Jalview applet, it looks like you change these items:
*Add MenuItem in
*Add action listener scheme.addActionListener(this);
*add to coloursMenu
*In actionPerformed method, setJalviewColourScheme

Monday, August 2, 2010

MAFFT - multiple sequence alignment

One of my coworkers let me know about MAFFT, a multiple sequence alignment program. He thought that it might be useful for my GSoC project. I found out that it uses a Jalview applet to display the results! Jalview is everywhere! Right now I know that Pfam, Rfam, Clustal, and MAFFT use Jalview applets to display their alignment results...

Addendum: I noticed right after I posted this that one of the alignments in the example file that Jalview has was generated by MAFFT! Shows how observant I am...

Nucleotide codes

I've added a purine/pyrimidine color scheme to Jalview and added descriptions for lesser used nucleotide codes. The page at the KEGG database has a useful table. Here are the ones that I added:

W Weak (A or T)
S Strong (G or C)
M Amino (A or C)
K Keto (G or T)
B Not A (G or C or T)
H Not G (A or C or T)
D Not C (A or G or T)
V Not T (A or G or C)
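A lookup for these descriptions, together with a purine/pyrimidine classification, might look like the sketch below. It is not the Jalview code: the class, enum, and method names are hypothetical. The classification treats A, G, and the IUPAC code R as purines and C, T, U, and the IUPAC code Y as pyrimidines; every other code is left unclassified because it can stand for bases of both kinds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Jalview.
class NucleotideCodeSketch {
    enum BaseClass { PURINE, PYRIMIDINE, OTHER }

    // Descriptions for the lesser used codes listed above.
    static final Map<Character, String> DESCRIPTIONS = new LinkedHashMap<>();
    static {
        DESCRIPTIONS.put('W', "Weak (A or T)");
        DESCRIPTIONS.put('S', "Strong (G or C)");
        DESCRIPTIONS.put('M', "Amino (A or C)");
        DESCRIPTIONS.put('K', "Keto (G or T)");
        DESCRIPTIONS.put('B', "Not A (G or C or T)");
        DESCRIPTIONS.put('H', "Not G (A or C or T)");
        DESCRIPTIONS.put('D', "Not C (A or G or T)");
        DESCRIPTIONS.put('V', "Not T (A or G or C)");
    }

    /** Returns the description for a code, or "Unknown" if it is not in the table. */
    static String describe(char code) {
        return DESCRIPTIONS.getOrDefault(Character.toUpperCase(code), "Unknown");
    }

    /** Classifies a base; ambiguity codes that mix both kinds return OTHER. */
    static BaseClass classify(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': case 'G': case 'R':
                return BaseClass.PURINE;
            case 'C': case 'T': case 'U': case 'Y':
                return BaseClass.PYRIMIDINE;
            default:
                return BaseClass.OTHER;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe('K') + " -> " + classify('K'));
        System.out.println("G -> " + classify('G'));
    }
}
```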

Monday, July 26, 2010

Wrong about RALEE bug

After an email discussion with Sam Griffiths-Jones, the author of RALEE, I discovered that there was a difference in interpretation, not a bug, and I decided that Sam's interpretation was better.

I had interpreted a contiguous run of base pairs to be a helix, but after looking at a few structures in VARNA, I realized that some of the "helices" could be considered one helix with a few bulges. Sam gave some good examples:

A helix that is continuous on one side, but not the other:


Alignments where a helix is continuous in most sequences, but bulged occasionally:


In the above case, what is a helix could be subject to arbitrary change, just by adding one sequence with a bulge.
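The difference between the two interpretations can be made concrete with a small sketch that counts helices in a bracket-notation structure string. It is not the code in Jalview or RALEE. The class and method names are hypothetical, and the maxGap parameter (how many unpaired positions are tolerated between two stacked base pairs before a new helix starts) is my own assumption for illustration. With maxGap set to 0, only contiguous runs of base pairs count as one helix, which was my original interpretation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical helper for illustration; not part of Jalview or RALEE.
class HelixCountSketch {
    /** Returns partner[i] = index paired with i, or -1 if position i is unpaired. */
    static int[] partners(String structure) {
        int[] partner = new int[structure.length()];
        Arrays.fill(partner, -1);
        Deque<Integer> open = new ArrayDeque<>();
        for (int i = 0; i < structure.length(); i++) {
            char c = structure.charAt(i);
            if ("(<[{".indexOf(c) >= 0) {
                open.push(i);
            } else if (")>]}".indexOf(c) >= 0) {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced structure at " + i);
                }
                int j = open.pop();
                partner[i] = j;
                partner[j] = i;
            }
            // Every other character is treated as unpaired.
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unclosed base pair at " + open.peek());
        }
        return partner;
    }

    /** Counts helices, merging pairs separated by at most maxGap unpaired positions. */
    static int countHelices(String structure, int maxGap) {
        int[] p = partners(structure);
        int helices = 0;
        for (int i = 0; i < p.length; i++) {
            int j = p[i];
            if (j < i) continue; // unpaired (-1) or the closing half of a pair
            if (!continuesOuterPair(p, i, j, maxGap)) helices++;
        }
        return helices;
    }

    /** True if pair (i, j) extends the helix of the pair that directly encloses it. */
    private static boolean continuesOuterPair(int[] p, int i, int j, int maxGap) {
        int k = i - 1;
        while (k >= 0 && p[k] == -1) k--;     // skip unpaired positions on the left
        if (k < 0 || p[k] < j) return false;  // no pair directly encloses (i, j)
        for (int m = j + 1; m < p[k]; m++) {
            if (p[m] != -1) return false;     // another pair sits on the right side
        }
        int gap = (i - k - 1) + (p[k] - j - 1); // unpaired positions on both sides
        return gap <= maxGap;
    }
}
```

For the structure `(((.((...)))))`, which has a single bulged position on the left side, `countHelices(s, 0)` gives 2 helices while `countHelices(s, 1)` gives 1, so one bulge is enough to change the count under the strict interpretation.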