Example 1: Yeast Genomes Alignment

This example walks through a complete alignment of yeast genomes, using this unicellular eukaryotic model organism to describe each step in detail.

We will compare nine yeast genomes: Saccharomyces cerevisiae, Lachancea kluyveri, Saccharomyces bayanus, Saccharomyces bayanus var. uvarum, Saccharomyces castellii, Saccharomyces kudriavzevii, Saccharomyces mikatae, Saccharomyces paradoxus, and Saccharomyces pastorianus. All of the genomes are available from the Saccharomyces Genome Database (http://www.yeastgenome.org/download-data/sequence).

Data download and file format change

Rename the downloaded files with brief names (fewer than 30 characters); these names are also used as the genome identifiers in the result files. We also provide a one-line command that formats the sequence headers, so that the processed files can be used for alignment.

perl -ne "if(/^\>/){s/\W+/_/g; s/^\_/>/; s/\_$/\n/;} print;" example.fasta >out.fa

Note that this command handles most sequences that were not downloaded from NCBI. Sequences downloaded from NCBI are processed automatically by our website.
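
For readers who prefer Python, the header clean-up performed by the perl command above can be sketched roughly as follows. This is an approximate equivalent, not the site's own tool: the function names `normalize_header` and `normalize_fasta` and the sample header are made up for illustration.

```python
import re


def normalize_header(line: str) -> str:
    """Apply the same three substitutions as the perl one-liner to one line."""
    if not line.startswith(">"):
        return line  # sequence lines pass through unchanged
    line = re.sub(r"\W+", "_", line)  # every run of non-word characters -> "_"
    line = re.sub(r"^_", ">", line)   # restore the leading ">"
    line = re.sub(r"_$", "\n", line)  # restore the trailing newline
    return line


def normalize_fasta(src: str, dst: str) -> None:
    """Copy src to dst, rewriting only the header lines."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(normalize_header(line))


if __name__ == "__main__":
    # Made-up header, used only to show the effect of the substitutions.
    sample = ">chrI [org=Saccharomyces cerevisiae] [strain=S288C]\n"
    print(normalize_header(sample), end="")
    # prints: >chrI_org_Saccharomyces_cerevisiae_strain_S288C
```

Calling `normalize_fasta("example.fasta", "out.fa")` would then mirror the command line above.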

For users' convenience, we provide a test account preloaded with the data used in the example tests. You can log in to the yeast account with the username "bacteria" and the password "password".

Upload

After logging in to the site, click "Upload" to open the upload page, which shows an upload dialog.

To upload a genome file, select it in the file dialog and click the “Upload” button. For web-based alignment (as opposed to processing on a virtual machine), each file is limited to 30 MB. Only three file types are accepted: .fasta, .fasta.gz, and .newick. Click the red button to remove an uploaded file. Do not refresh the page while a file is uploading!
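
The upload limits above can be checked locally before uploading. The sketch below is only an illustration and makes several assumptions that the site may not share: "30M" is read as 30 × 1024 × 1024 bytes, the 30-character limit is applied to the file name without its extension, extensions are matched case-sensitively, and the function names are hypothetical.

```python
import os

MAX_BYTES = 30 * 1024 * 1024  # assumption: "30M" read as 30 MiB
ALLOWED_SUFFIXES = (".fasta", ".fasta.gz", ".newick")
MAX_NAME_CHARS = 30  # assumption: counted on the name without its extension


def upload_problems(filename: str, size_bytes: int) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    suffix = next((s for s in ALLOWED_SUFFIXES if filename.endswith(s)), None)
    if suffix is None:
        problems.append("file type must be .fasta, .fasta.gz or .newick")
        stem = filename
    else:
        stem = filename[: -len(suffix)]
    if len(stem) >= MAX_NAME_CHARS:
        problems.append("name should be shorter than 30 characters")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than the 30M web upload limit")
    return problems


def check_file(path: str) -> list:
    """Convenience wrapper that reads the name and size from disk."""
    return upload_problems(os.path.basename(path), os.path.getsize(path))
```

For example, `upload_problems("S_cerevisiae.fasta", 12_000_000)` returns an empty list, while a file named `S_cerevisiae.fa` is reported as having an unsupported type.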

Alignment

Click the “Align” button in the menu to open the alignment page.

On this page, set the parameters for constructing the alignment: the name of the alignment job, a target genome, and at least one query genome. You may also provide a guide tree; if you do not, one is generated for you, which takes some additional time. We also provide a recommended realignment method.
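
If you supply your own guide tree, it may help to confirm beforehand that its leaf labels match the genome names you uploaded. The sketch below assumes that the tree is in Newick format with unquoted labels and that the leaf labels are expected to equal the genome names; neither requirement is stated by the site, and the function names and the toy trees are hypothetical (the topology is arbitrary, not a real yeast phylogeny).

```python
import re


def newick_leaves(newick: str) -> set:
    """Return leaf labels: unquoted tokens that directly follow "(" or ","."""
    return set(re.findall(r"[(,]\s*([^\s():,;]+)", newick))


def compare_tree_to_genomes(newick: str, genome_names) -> tuple:
    """Return (genomes missing from the tree, tree leaves with no genome)."""
    leaves = newick_leaves(newick)
    genomes = set(genome_names)
    return genomes - leaves, leaves - genomes


if __name__ == "__main__":
    toy_tree = "((target:0.1,query1:0.2):0.05,query2:0.3);"  # arbitrary topology
    missing, extra = compare_tree_to_genomes(toy_tree, ["target", "query1", "query3"])
    print("missing from tree:", sorted(missing))  # ['query3']
    print("not uploaded:", sorted(extra))         # ['query2']
```

Internal node labels and branch lengths are ignored because they follow ")" or ":" rather than "(" or ",".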

Once the parameters are set, click the “Begin aligning” button; you are taken to the processing page automatically.

This page displays three dialog boxes. The top dialog lists the parameters set in the previous step. You cannot start a new job until the current one has finished. When the job is complete, or if you want to terminate it, click the red button in the top-right corner; you can then open a new task. Note that a terminated job cannot be recovered.

The middle dialog contains a button for each processing stage. The first stage is data pre-processing: click “prepare.sh” to start it.

The information section shows general information about the running task, including its start and end times and its running status.

Below it is the progress output for the step that is currently running. When the last line shows “*DONE*”, the step should be complete; refresh the page and move on to the next step.

After pre-processing completes, refresh the page; buttons for the remaining steps then appear in the middle dialog box. Click the buttons in order to generate the alignment. For every step, the task is complete when the last line shows “*DONE*”, and you can then move on to the next step.
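
The completion rule above (the last line of the progress output shows “*DONE*”) can be expressed as a small check, for instance when pasting the progress text into a script. This is only a sketch: the function name `step_finished` is hypothetical, the sample progress lines are invented, and ignoring trailing blank lines is an assumption.

```python
def step_finished(progress_text: str) -> bool:
    """True when the last non-blank line of the progress text contains *DONE*."""
    lines = [ln.strip() for ln in progress_text.splitlines() if ln.strip()]
    return bool(lines) and "*DONE*" in lines[-1]


if __name__ == "__main__":
    # Invented progress text, used only to exercise the check.
    running = "step output line 1\nstep output line 2\n"
    finished = running + "*DONE*\n"
    print(step_finished(running))   # False
    print(step_finished(finished))  # True
```

A “*DONE*” that appears earlier in the output but not on the last line is deliberately not treated as completion.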

Downloading results

After all steps are done, click the red “finish” button to download the results generated in each step.