Aligning sequences and building trees showing relationships among sequences


This tutorial uses PCR products from the Lisa HIV case to demonstrate how to align sequences and build trees. It begins after the PCR products have been generated.

A separate tutorial describes how to generate these PCR products.

Note: It is not always necessary to run PCR before building a tree. PCR products are preferable when available because they are much shorter than the original DNA samples, and multiple alignment is much faster for short DNA sequences. In this example, the DNA samples represent the entire HIV genome (> 9000 bp) isolated from patients, whereas the PCR products from these samples represent a portion of the HIV env gene (about 300 bp).
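
As a rough illustration of why shorter sequences align faster, the sketch below counts the cells of the scoring table that a standard dynamic-programming pairwise alignment fills. This is an assumption made for illustration only: the aligner used by this software may work differently, and the function names are hypothetical. The lengths (9000 and 300) and the count of seven sequences come from this tutorial.

```python
def dp_cells(len_a, len_b):
    """Cells in the scoring table of a standard pairwise DP alignment."""
    return (len_a + 1) * (len_b + 1)


def pairwise_comparisons(n_sequences):
    """Number of distinct sequence pairs among n_sequences."""
    return n_sequences * (n_sequences - 1) // 2


genome_len = 9000   # approximate full HIV genome length (bp)
pcr_len = 300       # approximate PCR product length (bp)
n_sequences = 7     # number of sequences aligned in this tutorial

pairs = pairwise_comparisons(n_sequences)
full_cost = dp_cells(genome_len, genome_len) * pairs
short_cost = dp_cells(pcr_len, pcr_len) * pairs
ratio = full_cost / short_cost

print(f"{pairs} pairs; full genomes need about {ratio:.0f}x more table cells")
```

Under these assumptions the full genomes need roughly 900 times as many table cells as the PCR products, which is consistent with the difference between seconds and minutes described later in this tutorial.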

If PCR primers are not available, portions of sequences can be selected and exported (see Step #12 below). In addition, the search feature can be used to find similar sequences in different samples (there is a separate tutorial that shows how to search sequences).

The tutorial below shows how to build trees quickly using the 'Opened & processed' (O&P) window. It also shows how to build trees using both the O&P and Sequence Analysis windows. The latter method takes a bit longer but is more flexible, as you will see.

1. Highlight the PCR products by shift-clicking on lines in the O&P window as shown below.

Note: Do not highlight the original DNA files or the primers. Highlight only the PCR products (designated by arrows preceding the filename and the suffix "(PCR)" following the filename).

It is possible to align sequences and build the tree using the original DNA files, but it will take a relatively long time because of their size. This is the reason why it is preferable to use shortened sequences (such as PCR products) rather than the original files.

For any particular alignment you should use either all DNA or all protein sequences. This example uses DNA files. Although it is possible to construct a tree using a combination of DNA and protein files, there is no reason to deliberately do so.
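
If you keep your sequences as text outside the application, a quick check like the sketch below can confirm that a set is all DNA or all protein before you align it. This is a simple heuristic and not part of the software: the function names and the allowed-letter set are assumptions, and a protein made up only of A, C, G, T and N residues would be misclassified as DNA.

```python
DNA_LETTERS = set("ACGTN")  # assumed alphabet: four bases plus N for unknown


def looks_like_dna(seq):
    """Return True if every non-gap character is a DNA letter."""
    residues = seq.upper().replace("-", "")
    return len(residues) > 0 and set(residues) <= DNA_LETTERS


def all_same_type(seqs):
    """Return True if the sequences are all DNA or all non-DNA."""
    kinds = {looks_like_dna(s) for s in seqs}
    return len(kinds) == 1


print(all_same_type(["ACGTTGCA", "ACGTAGCA"]))  # two DNA sequences
print(all_same_type(["ACGTTGCA", "MKVLAA"]))    # DNA mixed with protein
```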

2. Now use the 'Analyze' menu as shown below and select 'Build tree directly from selected files in O&P window'...

**MEGA 4, a separate application, must be installed for this command to work. See the MEGA 4 installation information for details.

**Please be patient after selecting this command, as it takes a while for the sequences to be aligned and the tree to be built. Long sequences can take several minutes to align, but shorter sequences such as the PCR products used here are aligned in about 10 seconds. Do not click the mouse while you are waiting.

**You may get one or two windows asking if you really want to run the application. If this happens, click the 'Run' button on each window (these windows may be hidden behind other windows).

...several windows will pop up, including the one above that indicates that multiple sequence alignment has been started.

Message boxes asking if you really want to run the software may appear. If they do, click the 'Run' button on each message box (if separate boxes appear, each must be closed).

3. Eventually another series of windows will pop up as MEGA 4 automatically opens and shows the resulting tree, indicating the degree of similarity among the various PCR products. Note that in this case the strains infecting Lisa and her boyfriend are more similar to the highly pathogenic strains than to the low-pathogenicity strains.

Note: The MEGA 4 windows will appear in random overlapping order, and the tree below may be hidden beneath other windows. You may have to drag the windows around to find the tree.

WARNING: Do not close any of the MEGA 4 windows at this point, because closing the main window (the one with the MEGA 4.0.2 title bar) automatically closes all other windows. Instead of closing them, rearrange them to find the window containing the tree and/or the window displaying the multiple alignment.

4. To save an image of the tree, first use the Image menu of MEGA 4 to copy the image to your computer's clipboard, and then paste the image into an application such as Paint.

Note: You can also save as a TIFF file to avoid using Paint, but the font size on the resulting image may be too large.

5. After saving the image, close MEGA 4 by clicking the X on the main window as shown below.

Note the appearance of the main window (below) so that you can identify it. Closing this window instantly closes all other MEGA 4 windows, so that you do not have to close the other windows individually.


6. A more flexible way of conducting multiple alignments involves use of the 'Sequence analysis' (SA) window. This window enables you to add additional sequences before aligning or building the tree.

To demonstrate this, shift-click on the first four PCR products to highlight them in the O&P window, then click the 'Analyze' button at the bottom of the O&P window and make the menu selection shown below.

7. The above command causes the 'Sequence analysis' window to open automatically, and the selected PCR products are added to the Export field of this new window.

8. Now click the other Analyze button, shift-click to highlight the last three sequences, and make the same menu selection.

9. All seven PCR products will now be in the Export field.

Note: Clicking the blue 'Find FASTA' button in the upper right-hand corner of the Sequence analysis window lets you quickly find the FASTA definition line of each sequence added to the window, so you can make sure that all of the sequences were in fact added to the Export field.
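
The same check can be done on any FASTA text outside the application, because each sequence starts with a definition line beginning with '>'. The sketch below is a minimal illustration and not part of the software: the function name is hypothetical, and the sample names and bases are invented.

```python
def fasta_deflines(text):
    """Return the definition lines (without the leading '>') in FASTA text."""
    return [line[1:].strip() for line in text.splitlines()
            if line.startswith(">")]


# Hypothetical Export field contents; names and bases are invented.
export_text = """>sample_A (PCR)
ACGTACGT
>sample_B (PCR)
ACGTTCGT
>sample_C (PCR)
ACGAACGT
"""

names = fasta_deflines(export_text)
print(len(names), "sequences found:", names)
```

If the count does not match the number of files you selected, one or more sequences were not added.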


10. You can now build the tree, using the contents of the Export field, as shown by the menu selections below. We will not show the tree, as in this example it would be the same as before. The point is that the Export field gives greater flexibility: you can add any series of files, or single files, to the Export field at any time, and then align the sequences and build the tree when you are finished.
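
To give a feel for what "degree of similarity" means in a tree, the sketch below computes the proportion of differing sites (often called the p-distance) between sequences that have already been aligned. This is an illustration only: the distance measure and tree-building method that MEGA 4 applies in this workflow are not stated in this tutorial, and the function names, sample names and sequences here are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def distance_matrix(aligned):
    """Map each (name, name) pair to its p-distance."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m in names for n in names}


# Invented aligned sequences, for illustration only.
aligned = {"seq1": "ACGTACGT", "seq2": "ACGTACGA", "seq3": "AC-TTCGA"}
for pair, dist in distance_matrix(aligned).items():
    print(pair, round(dist, 3))
```

Sequences separated by smaller distances would be expected to sit closer together in a tree built from such a matrix.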

Other options involving the Export field

11. You can replace the contents of the Export field rather than add to them. For example, you could select the PCR products shown below and replace the contents of the Export field with these products. Additional options are shown in the screen shot below.

12. If PCR primers are not available, another method is available to shorten sequences to decrease the time of multiple alignment and tree-building. We will illustrate by selecting the original Lisa sequence (not the PCR product that had been generated from this sequence). Note that the original sequence is 9153 characters in length.

13. Click the 'Sequence' checkbox, and then click the Sequence analysis button (unless the window is open already).

14. Use the red drag handles in the Sequence Analysis window to make the field for 'Range and length of selection' larger.

15. Highlight a portion of the sequence by dragging with your mouse, and note that the range (from the beginning of the sequence) and length change dynamically as you drag the selection.

16. Use the yellow 'Analyze' button and make the selection shown below...

17. ...and the highlighted selection appears in the Export field. Selections from other sequences can be added to the field before alignment and tree-building (using either the Analyze button of the Opened & Processed window or the yellow Analyze button shown below). We will not build the tree here, as the procedure is the same as in Step 2 above.

18. Another option is to put the entire sequence shown below, rather than a highlighted portion, into the Export field.
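
The relationship between the range and the length of a selection (Steps 14 and 15) can be sketched as follows. The sketch assumes positions are counted from 1 and that the range includes both end positions; the tutorial does not state which convention the window uses, and the function name and sample sequence are hypothetical.

```python
def select_range(seq, start, end):
    """Return (subsequence, length) for a 1-based, inclusive range."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("range must lie within the sequence")
    length = end - start + 1
    return seq[start - 1:end], length


# Invented sequence, for illustration only.
sequence = "ACGTACGTTTGACA"
portion, length = select_range(sequence, 3, 6)
print(portion, length)
```

Selecting the whole sequence, as in Step 18, corresponds to a range from 1 to the sequence length.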
