Display, evaluate and edit alignments
for comparative modeling

How to submit, evaluate and edit target-template alignment using MODalign


Overview

MODalign is an interactive web-based tool that helps protein structure modelers inspect and manually modify the alignment between the sequence of a target protein and those of its template(s). It interactively computes and displays the multiple sequence alignments of the two proteins, their conservation scores, secondary structure and solvent accessibility values, and local quality scores of the implied three-dimensional model(s), and updates all of them whenever the target-template alignment is modified.

Among others, MODalign provides:

Submitting the job

Input target-template alignment format

Input to MODalign is a target-template alignment in FASTA, PIR, CLUSTAL, PHYLIP, or MSF format.

The first sequence must be the target, all following sequences the templates. See Where do I get the alignment from? if you don't have a target-template alignment yet.

The target name is arbitrary, while template names should have the form pdbcode_chain, e.g. 1xxx_A.

Example input with raw names (FASTA format):

>PMS2-CTD
------------------------KTMFAEME-IIGQFNLGFIITKLNEDIFIVDQHATD
EKYNFEMLQQHT----VLQGQRLIAPQTLNLTAVNEAVLIENLEIFRKNGFDFVIDENAP
VTERAKLISLPTSKNWTFGPQDVDELIFMLSDSPGVMCRPSRVKQMFASRACRKSVMIGT
ALNTSEMKKLITHMGEMDHPWNCPHGRPTMRHIANLGVISQN-----------
>3kdg_A
---------------------GAM-DRVPIMY-PIGQMHGTYILAQNENGLYIIDQHAAQ
ERIKYEYFREKVGEV-EPEVQEMIVPLTFHYSTNEALIIEQHKQELESVGVFLESFGS--
--NSYIVRCHPAWFPKGEEAELIEEIIQQVLDSKN-IDIKKLREEAAIMMSCKGSIKANR
HLRNDEIKALLDDLRSTSDPFTCPHGRPIIIHHST-------YEMEKMFKRVM
>3ncv_A
TMGSSHHHHHHSSGLVPRGSHSQS--ELPPLGFAIAQLLGIYILAQAEDSLLLIDMHAAA
ERVNYEKMKRQRQENGNLQSQHLLIPVTFAASHEECAALADHAETLAGFGLELSDMGG--
--NTLAVRAAPVMLGKSDVVSLARDVLGELAQVGSSQTIASHENRILATMSCHGSIRAGR
RLTLPEMNALLRDMENTPRSNQCNHGRPTWVKLTL-------KELDTLFLRGQ                
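
Before submitting, it can help to check a raw-name alignment locally. The following is a minimal sketch, not part of MODalign: the helper names read_fasta and check_alignment are hypothetical, and the regular expression is only an assumed reading of the pdbcode_chain convention (a four-character PDB code starting with a digit, an underscore, and a chain identifier). It checks that the first record is followed by at least one template, that template names look like pdbcode_chain, and that all rows have the same number of columns.

```python
import re

# Assumption: "pdbcode_chain" means a 4-character PDB code (first character a
# digit), an underscore, and a chain identifier, as in 1xxx_A or 3kdg_A.
TEMPLATE_NAME = re.compile(r"^[0-9][A-Za-z0-9]{3}_[A-Za-z0-9]+$")


def read_fasta(text):
    """Return a list of (name, aligned_sequence) pairs from FASTA text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            # Sequence lines are concatenated; gap characters are kept.
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def check_alignment(records):
    """Return a list of problems found in a raw-name target-template alignment."""
    problems = []
    if len(records) < 2:
        problems.append("need one target and at least one template")
        return problems
    # The first record is treated as the target; its name is arbitrary.
    target_len = len(records[0][1])
    for name, seq in records[1:]:
        if not TEMPLATE_NAME.match(name):
            problems.append(f"template name {name!r} is not pdbcode_chain")
        if len(seq) != target_len:
            problems.append(
                f"{name!r} has {len(seq)} columns, target has {target_len}"
            )
    return problems
```

An empty list returned by check_alignment means that none of these basic problems was found; it does not say anything about the quality of the alignment itself.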
                
Rich names format:

The names must follow a specific convention. For the target:

type:target|name:some_name

For templates:

src:some_name|aln_id:some_number|type:template|name:pdbcode_chain

src can be used to indicate the server from which the alignment to this template originates. aln_id can be used to distinguish between alignments to the same template from the same src. Both src and aln_id are optional. In the future, we plan to add a feature that lets you submit several alignments to the same template in this format, as long as they have different src or aln_id attributes.

Example input with rich names (FASTA format):

>type:target|name:PMS2-CTD
------------------------KTMFAEME-IIGQFNLGFIITKLNEDIFIVDQHATD
EKYNFEMLQQHT----VLQGQRLIAPQTLNLTAVNEAVLIENLEIFRKNGFDFVIDENAP
VTERAKLISLPTSKNWTFGPQDVDELIFMLSDSPGVMCRPSRVKQMFASRACRKSVMIGT
ALNTSEMKKLITHMGEMDHPWNCPHGRPTMRHIANLGVISQN-----------
>src:unknown|aln_id:0|type:template|name:3kdg_A
---------------------GAM-DRVPIMY-PIGQMHGTYILAQNENGLYIIDQHAAQ
ERIKYEYFREKVGEV-EPEVQEMIVPLTFHYSTNEALIIEQHKQELESVGVFLESFGS--
--NSYIVRCHPAWFPKGEEAELIEEIIQQVLDSKN-IDIKKLREEAAIMMSCKGSIKANR
HLRNDEIKALLDDLRSTSDPFTCPHGRPIIIHHST-------YEMEKMFKRVM
>src:unknown|aln_id:1|type:template|name:3ncv_A
TMGSSHHHHHHSSGLVPRGSHSQS--ELPPLGFAIAQLLGIYILAQAEDSLLLIDMHAAA
ERVNYEKMKRQRQENGNLQSQHLLIPVTFAASHEECAALADHAETLAGFGLELSDMGG--
--NTLAVRAAPVMLGKSDVVSLARDVLGELAQVGSSQTIASHENRILATMSCHGSIRAGR
RLTLPEMNALLRDMENTPRSNQCNHGRPTWVKLTL-------KELDTLFLRGQ
                
When using the rich name format, you must select the "Names in rich format" checkbox in the input form.
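
As an illustration of the rich name convention, the sketch below splits a header into its key:value fields. It is not MODalign code: the function name parse_rich_name is hypothetical, and the field names (src, aln_id, type, name) are taken from the example headers above.

```python
def parse_rich_name(header):
    """Split a rich name such as 'src:x|aln_id:0|type:template|name:3kdg_A'
    into a dictionary of its fields."""
    fields = {}
    # A leading '>' from a FASTA header line is tolerated and removed.
    for part in header.lstrip(">").split("|"):
        key, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"field without ':' separator: {part!r}")
        fields[key.strip()] = value.strip()
    if fields.get("type") not in ("target", "template"):
        raise ValueError("type must be 'target' or 'template'")
    if "name" not in fields:
        raise ValueError("missing name field")
    # src and aln_id are optional, so their absence is not an error.
    return fields
```

For example, parse_rich_name(">type:target|name:PMS2-CTD") returns a dictionary with the keys type and name, and a template header additionally yields src and aln_id when they are present.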

Multiple sequence alignment (MSA) of the target family

You can optionally submit your own MSA of the target family in FASTA, CLUSTAL, PHYLIP, or MSF format. Click on "Advanced options - OPTIONAL" to display the MSA submission field.

After you upload a file, two additional required fields will appear: "TARGET sequence name in MSA" and "Alignment format". The TARGET sequence name in MSA does not have to match the name in the target-template alignment, but the sequence must be the same (gaps are allowed).

Submitting a custom MSA is highly recommended. By default, the target MSA is constructed using two iterations of HHblits against the UniProt database and filtered to remove very similar sequences. This automatic procedure does not always generate an MSA that is optimal for a particular case (see the note on MSA comparison).

Maximum number of templates

Currently, you cannot submit more than 4 templates.

Too many templates in the target-template alignment can make the editor interface too slow. If you have many templates, consider whether you really need all of them. Note that it usually does not make much sense to use more than one to five templates.

Read before submitting

There is some advice you should consider before submitting the target-template alignment

Using the workspace

After submitting the job, you will be redirected to the workspace.

In the workspace, you can:

First, you will see that the job is queued or running. While it is running, you can monitor the job in detail in the displayed log.

After some time (usually 3-10 minutes) you will see that the job is either ready or has crashed.

Accessing the workspace

There are several ways of accessing the workspace at any time.

Using the alignment editor

You can access the alignment viewer and editor by clicking on the alignment description in the workspace.

You will be redirected to the editor interface. After you minimize the splash log window, you will see this view:

First view of editor with explanations

Description of default interface sections:

Target homologs
Sequence alignments of target homologs. Only up to ten representative sequences are shown. Insertions in homologs relative to the target sequence are not shown. This section is not editable.
T-t alignment
Target-template alignment. Displays the target sequence, similarity row, and templates. In this part, you can edit the alignment (see how to edit the alignment).
SEQRES, original user sequence, and sequence alignments of template homologs
  • SEQRES - canonical SEQRES derived from the PDB file
  • ori_seq - original sequence submitted by the user
  • homologs - representative homologs of the template. Only up to ten representative sequences are shown. Insertions in homologs relative to the template sequence are not shown. This section is not editable.
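
The rule that insertions in homologs relative to the reference sequence are not shown can be pictured as dropping every alignment column in which the reference has a gap. The following is a minimal sketch of that idea under this assumption; the function name hide_insertions is hypothetical and the code is not taken from MODalign.

```python
def hide_insertions(reference, homologs, gap="-"):
    """Keep only the columns in which the reference sequence has a residue.

    reference and homologs are rows of the same alignment, so they must all
    have the same number of columns.
    """
    if any(len(h) != len(reference) for h in homologs):
        raise ValueError("all rows must have the same number of columns")
    # Columns where the reference has a gap correspond to insertions in homologs.
    keep = [i for i, ch in enumerate(reference) if ch != gap]

    def squeeze(row):
        return "".join(row[i] for i in keep)

    return squeeze(reference), [squeeze(h) for h in homologs]
```

With the reference row AC--DE and the homolog row ACGGDE, the two middle columns are dropped and both rows are displayed as ACDE.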

Analyzing sequence conservation

MODalign provides several tools to analyze sequence conservation between target and templates:

Displaying potential errors in the alignment

You can display potential errors mapped on the target sequence and optionally on the sequence of its representative homologs. Selecting one of the options below will add new sections above the target and below the templates.

Comparing secondary structure and solvent accessibility patterns

You can compare secondary structure and solvent accessibility PREDICTED for target and its homologs and CALCULATED for templates.

Accessing "flanking regions"

"Flanking regions" are columns of the alignment before and after the first and the last residue of the target. They contain regions of template sequences (derived from PDB SEQRES records) even if the original target-template alignment was truncated. Note that if the flanking regions were automatically added, they are not aligned in any way between the templates.

By default, the flanking regions are shaded. The shading disappears as soon as a previously shaded column or residue becomes aligned with a target residue.

The shading can be toggled on and off using the Tools menu (Shade flanking regions).
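
To make the definition of flanking regions concrete, the sketch below computes which alignment columns lie before the first and after the last residue of the target. It is an illustration only: the function name flanking_columns is hypothetical, columns are 0-based, and "-" is assumed to be the gap character.

```python
def flanking_columns(aligned_target, gap="-"):
    """Return (left, right) ranges of columns that lie before the first and
    after the last residue of the aligned target row."""
    residue_cols = [i for i, ch in enumerate(aligned_target) if ch != gap]
    if not residue_cols:
        raise ValueError("target row contains no residues")
    first, last = residue_cols[0], residue_cols[-1]
    # Gaps between the first and last target residue are not flanking columns.
    return range(0, first), range(last + 1, len(aligned_target))
```

For the target row ---KTM-FA-- this gives columns 0 to 2 on the left and columns 9 to 10 on the right; the internal gap in column 6 is not part of a flanking region.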

Dealing with residues missing in PDB structures

Any residues that are present in the PDB SEQRES sequence but cannot be derived from the PDB atom sequence (i.e. from ATOM records) are treated as missing and displayed in lower case in the target-template alignment section.
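
The lower-case convention can be sketched as follows, assuming the SEQRES and ATOM sequences of a template are already aligned to each other column by column. The function name mark_missing is hypothetical and this is not MODalign's own implementation.

```python
def mark_missing(seqres_row, atom_row, gap="-"):
    """Lower-case SEQRES residues whose aligned ATOM position is a gap."""
    if len(seqres_row) != len(atom_row):
        raise ValueError("rows must have the same number of columns")
    return "".join(
        # A residue in SEQRES with no counterpart in ATOM is "missing".
        s.lower() if s != gap and a == gap else s
        for s, a in zip(seqres_row, atom_row)
    )
```

For instance, a SEQRES row MGSSHHAC-DE aligned to an ATOM row ------AC-DE would be displayed as mgsshhAC-DE.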

Changing a reference template

You can change the reference template by moving a template in the target-template alignment section with the Up and Down keyboard keys. Simply select a template and move it up or down. Remember that to select the template you must click next to its name; the background should become highlighted.

Editing the alignment

Assessing the alignment quality with QMEAN

You can display both global and local QMEAN scores for target and templates as well as their representative homologs. For target and homologs, MODalign will take the current alignment, build quick full atom models (without modeling insertions, see details about modeling) in the background, and return the results. Usually this procedure takes around one minute.

Analyzing the alignments in 3D

Click on the "Go 3D!!" button and select a template. A Jmol window will open.

Analyzing positions of insertions and deletions in 3D

Open Jmol with a template (see Analyzing the alignments in 3D) and click on INDELS highlighting. The coloring should change to reflect the current positions of insertions and deletions. Note that if you now start editing the alignment by removing or adding gaps, the coloring will immediately reflect your changes.

Saving the alignment

At any time, you can save the current alignment on the server: click on Save -> To workspace. You can later access the saved alignment versions in the workspace for your job.

Exporting the alignment

To export the alignment from MODalign, click on Save -> To local file. There are two options:

Building the model

To build a 3D model from the current alignment, use the "Build model" menu. A window will open in which you can select templates for modeling. Explanation of the options:

After clicking "Build Model" in the window, the Log Console icon will show that a job is running. When a model is ready, the "Your models" icon will start blinking. Read how the model is built.

Accessing the models

You can view the models and superposition with templates in 3D using Jmol, see the global QMEAN scores, download the models as PDB files, and delete models that are not needed anymore.

Integrated bioinformatics software and databases

HHblits

HHblits is used to build multiple sequence alignment of target and its homologs.

hhfilter

hhfilter is used to filter alignments of target and template families.

PSIPRED

PSIPRED (version 3.2.1) is run using addpsipred.pl from the hhblits version 2.2.20 (Sep 2011) package.

addpsipred.pl makes it possible to run PSIPRED on a custom MSA by calculating the BLAST checkpoint and profile (mtx) file. It performs the following operations (for details, please refer to the addpsipred.pl source code):

reformat.pl -v 2 -M first fas a3m target.a3m target.in.a3m
reformat.pl -v 2 -r -noss a3m psi target.in.a3m target.psi
blastpgp -b 1 -j 1 -h 0.001 -d /db/HHsearch/dummydb -i target.sq -B target.psi \
    -C target.chk 1> target.blalog 2> target.blalog
makemat -P target
/psipred321/bin/psipred target.mtx /psipred321/data/weights.dat \
    /psipred321/data/weights.dat2 /psipred321/data/weights.dat3 \
    > target.ss
/psipred321/bin/psipass2 /psipred321/data/weights_p2.dat 1 0.98 1.09 \
    target.ss2 target.ss > target.horiz

The input is the MSA generated by hhblits and filtered by hhfilter with -diff 100 -cov 5 (see above).

ACCpro

ACCpro is used to predict solvent accessibility states of the target and its homologs.

DSSP

DSSP data is taken from a local copy of the DSSP database, which is updated weekly from the DSSP website.

If a DSSP file is not available in the database, it is calculated on the fly using the DSSP CMBI version by Elmar.Krieger@cmbi.kun.nl / November 18,2002, with default parameters.

For display in the editor, the DSSP alphabet is converted to a three-letter alphabet using the following conversion scheme:

G -> H
I -> H
H -> H
E -> E
B -> C
L -> C
T -> C
S -> C
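
The conversion scheme above can be written as a simple lookup table. The following is a minimal sketch; the names DSSP_TO_THREE_STATE and to_three_state are hypothetical, and mapping any character not listed above (for example a blank) to C is an assumption made here, not something stated by MODalign.

```python
# Mapping copied from the conversion scheme listed above.
DSSP_TO_THREE_STATE = {
    "G": "H", "I": "H", "H": "H",
    "E": "E",
    "B": "C", "L": "C", "T": "C", "S": "C",
}


def to_three_state(dssp_string, default="C"):
    """Convert a string of DSSP codes to the three-letter H/E/C alphabet.

    Assumption: characters that are not in the table are mapped to `default`.
    """
    return "".join(DSSP_TO_THREE_STATE.get(ch, default) for ch in dssp_string)
```

Note that in this scheme an isolated bridge (B) is treated as coil rather than strand, so only E is displayed as E.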

POPS

POPS is used to calculate solvent accessibility states of the templates.

Muscle

muscle program is used to align user-provided sequences to PDB sequences with the following parameters:

Modeller

QMEAN

Theseus

Theseus is used to superpose template and model structures. The superposition is performed based on the current target-template sequence alignment, using only the alignment column range aligned with the target sequence.

Databases

Frequently asked questions

Where do I get the alignment from?

Use external servers. Some of the recommended servers are listed below.

What does the server calculate after pressing the Submit button?

  1. Creates an MSA of target homolog sequences

    The target MSA is constructed using three iterations of HHblits against the UniProt database and filtered to remove very similar sequences.

    This step can be disabled in Advanced options of the submission form.

  2. Adds CANONICAL SEQRES sequences of templates to the alignment

    The sequences are aligned to corresponding original sequences already in the target-template alignment using muscle program with the following parameters:

    -maxiters 1
    -diags
    -sv
    -distance1 kbit20_3
    -gapopen -100

    According to our tests, these parameters ensure a fast and accurate alignment of two sequences that are nearly identical but can contain missing regions relative to each other.

  3. Prepares MSAs of representative homologs of target and templates

    Filtered MSAs of template homologs are retrieved from HHSearch database of alignments.

  4. Adds ATOM sequences

    ATOM sequences are derived from ATOM record of PDB files. Some non-standard amino acids are changed to canonical ones if possible. The sequences are aligned to corresponding SEQRES sequences already in the "big MSA" using muscle as above.

  5. Predicts secondary structure for representative homologs

    Secondary structure is predicted using PSIPRED

  6. Predicts solvent accessibility for representative homologs

    Solvent accessibility is predicted using ACCpro

  7. Calculates secondary structure for templates

    Secondary structure is calculated with DSSP.

  8. Calculates solvent accessibility for templates

    Solvent accessibility is calculated using POPS

  9. Merges all sequences into one single MSA

    MSA is created containing:

    • Original target and template sequences
    • ATOM sequences of target and template sequences
    • SEQRES sequences of target and template sequences
    • Homologs of target and template

    This alignment (including up to 10 representative homologs for target and each template) is displayed in the alignment editor.
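
The muscle parameters listed in step 2 above could be assembled into a command line as sketched below. This is an illustration, not the server's actual code: the function name muscle_command and the file names are hypothetical, and the -in and -out options are assumed from the MUSCLE version 3 command-line interface since they are not listed above.

```python
import shlex

# Options copied from the parameter list in step 2.
MUSCLE_OPTIONS = [
    "-maxiters", "1",
    "-diags",
    "-sv",
    "-distance1", "kbit20_3",
    "-gapopen", "-100",
]


def muscle_command(in_fasta, out_fasta, executable="muscle"):
    """Build the argument list for aligning the sequences in in_fasta.

    Assumption: -in and -out are the MUSCLE v3 input and output options.
    """
    return [executable, "-in", in_fasta, "-out", out_fasta, *MUSCLE_OPTIONS]


if __name__ == "__main__":
    # Print the command instead of running it, so no muscle binary is needed.
    print(shlex.join(muscle_command("pair.fasta", "pair.aligned.fasta")))
```

The returned list can be passed to subprocess.run if a MUSCLE version 3 executable is installed; building a list rather than a single string avoids shell quoting issues with the negative -gapopen value.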

How is sequence conservation calculated?

First, based on the Blosum62 matrix, two overlapping amino acid classifications were created

Then,

How do I get a Modeller key?

To obtain a Modeller key, submit the Modeller registration form online. MODELLER is available free of charge to academic non-profit institutions.

Which is the best browser for MODalign?

MODalign works well in all major browsers. Exact performance, however, depends on the particular operating system / web browser combination.

Browser/OS Linux MacOS Windows
Google Chrome v. 16.x
Safari v. 5.1.x
Firefox 8.x - 9.x
Firefox 3.x - 7.x
Opera
Internet Explorer 6.x - 9.x

Data for: January 2012

Legend
Best combination
It works and performs acceptably
It works, but SLOW performances
It works, but VERY SLOW performances
It doesn't work at all
Browser not available for that OS

How long are jobs kept on the server?

Jobs are kept for 90 days from the last access.

How is the privacy of my job protected?

Each job gets a unique ID composed of the submission date and a random string of numbers and letters (e.g. 2011-10-13/kP9aS_ru1z). A job can be accessed from the web browser in which it was run only as long as the browser data is not cleared. From any other browser, it can be accessed only with that difficult-to-guess job ID. Even if you name your job "p53", other users cannot access it by this common name; the job ID described above is always required.
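
To illustrate why such an ID is hard to guess, the sketch below builds an identifier of the same shape: an ISO date, a slash, and a random token. It is not MODalign's generator. The function name make_job_id, the token length of 10 and the alphabet (letters, digits and underscore, inferred from the example ID) are all assumptions.

```python
import secrets
import string
from datetime import date

# Assumption: letters, digits and underscore, as seen in the example ID.
ALPHABET = string.ascii_letters + string.digits + "_"


def make_job_id(day=None, length=10):
    """Return an ID of the form YYYY-MM-DD/<random token>."""
    day = day or date.today()
    # secrets draws from a cryptographically strong random source.
    token = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{day.isoformat()}/{token}"
```

With 63 possible characters in each of 10 positions there are 63^10 (about 10^18) possible tokens per day, which is what makes guessing another user's job ID impractical under these assumptions.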

Information about your e-mail address is not stored on our server.

If you run or access your job from a public web browser, we recommend that you:

  1. Provide an e-mail address in the submission form.
  2. After finishing work on your job, clear the cache, cookies, and history of your browser, and/or:
  3. Delete the job in the Your recent jobs section.

    The job will be deleted only from the web browser data; you can still access it using the link in the e-mail.

How to cite MODalign?

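Please cite: Barbato A., Benkert P., Schwede T., Tramontano A. and Kosinski J. Improving your target-template alignment with MODalign. (2012) Bioinformatics 28 (7):1038-1039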

Troubleshooting

Slow editor interface

MODalign editor can get slow when:

We recommend the following ways of solving performance problem:

3D visualization (Jmol) is not working

Platform specific notes:


Please send feedback to: support@modorama.org


If you use this web server, please cite the following references:

  • Kosinski J., Barbato A. and Tramontano A. MODexplorer: an integrated tool for exploring protein sequence, structure and function relationships. (2013) Bioinformatics 29 (7):953-954
  • Barbato A., Benkert P., Schwede T., Tramontano A. and Kosinski J. Improving your target-template alignment with MODalign. (2012) Bioinformatics 28 (7):1038-1039