msInspect/AMT Tutorial



This is a step-by-step tutorial for msInspect/AMT. In this tutorial, you will install the necessary dependencies for running msInspect/AMT on your machine, create an AMT database from example pepXML files, match MS1 features to the AMT database, and combine the AMT matches with MS/MS data and perform protein inference. All output files from the commands described here are already provided in the tutorial bundle, so that you can see the output with or without running the commands yourself.

If you are already familiar with running msInspect and have the latest version installed on your machine, feel free to use your regular msInspect command in place of the "msinspect.sh" or "msinspect.bat" script mentioned below.

Please note: The running time of the commands in this tutorial will depend on the speed of your processor and the amount of memory available. We recommend at least 1 GB of RAM.

Step 1: Download the Tutorial Bundle

First, download the msInspect/AMT Tutorial Bundle (~30MB). Choose a directory on your machine (either a Windows or a Linux/Unix machine) and unzip the bundle. Change your directory to the amt_tutorial (top) directory of the bundle, which contains viewerApp.jar, README, etc.

Step 2: msInspect/AMT Dependencies

Next, you will need to install the dependencies that msInspect needs to run, if they are not already installed on your machine. These dependencies are Java 1.5 (or later), and the R statistical language.

Java

To run msInspect, you must have a Java 1.5 (or later) JVM installed and on your PATH. Please see the Java website for download and installation instructions for "J2SE 5.0", which includes the required version.

R

msInspect/AMT also requires the R statistical language, version 2.2 or later (some optional functionality requires 2.5.1 or later). Make sure that R is installed and on your PATH; that is, typing "R" (on Linux; "RCMD" on Windows) at a command prompt should start the correct version of R.

msInspect/AMT requires the "quantreg" package, which is not installed by default with R and which may be installed from the R command prompt as follows:

source("http://bioconductor.org/biocLite.R");
biocLite(c("quantreg"));

You may be informed that you do not have write access to the R library directory. If this happens, set an environment variable called R_LIBS to a directory that you can write to, and restart R.
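
Before moving on, it can help to confirm that both dependencies are actually visible from your shell. The following is a minimal sketch for Linux; the function name check_tools is made up for illustration and is not part of msInspect. It only checks that a command can be found on the PATH, not which version it is.

```shell
#!/bin/sh
# Report whether each named tool can be found on the PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
# check_tools is a hypothetical helper, not something shipped with msInspect.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run manually): check_tools java R
```

The same helper can be reused in later steps for the Trans-Proteomic Pipeline tools, for example "check_tools RefreshParser ProteinProphet".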

Running msInspect

Now that you have set up these dependencies, ensure that you can run msInspect successfully. We have provided two scripts to run msInspect, one appropriate for Windows (msinspect.bat) and one for Linux (msinspect.sh). These scripts simply run Java on the JAR file viewerApp.jar and pass along whatever arguments you specify. If you run the appropriate script with no arguments, you should see the msInspect splash image and then the msInspect viewer, with a dialog asking you to choose a file to open. Cancel out of this and close msInspect.
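
For orientation, a wrapper of the kind described above might look like the sketch below. This is not the bundled msinspect.sh, which may set additional JVM options; the names JAVA, MSINSPECT_JAR and run_msinspect are assumptions made for this illustration only.

```shell
#!/bin/sh
# Sketch of a pass-through wrapper: run Java on viewerApp.jar and
# forward every argument unchanged.
# JAVA and MSINSPECT_JAR are hypothetical overrides, not msInspect settings.
JAVA="${JAVA:-java}"

run_msinspect() {
    # "$@" preserves each argument exactly as it was given,
    # including arguments that contain spaces.
    "$JAVA" -jar "${MSINSPECT_JAR:-viewerApp.jar}" "$@"
}

# Example (run manually from the amt_tutorial directory):
#   run_msinspect --filter --outdir=features/filtered features/*.tsv --minpeaks=2 --maxkl=3
```

Quoting "$@" is the important design choice here: without the quotes, file names containing spaces would be split into several arguments before reaching msInspect.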

Step 3: Find and Filter MS1 Peptide Features

At this point, you would normally run msInspect to find MS1 peptide features in the set of runs you wish to include in the AMT database. This step is optional, and it is not appropriate for databases containing runs for which high-resolution MS1 data are not available. If high-resolution data are available, however, it is preferable to use them in building the AMT database.

Unfortunately, the mzXML files containing MS1 spectra are far too large to include in this tutorial. If those files were available, you would run a command like this:

Windows:
.\msinspect.bat --findpeptides --outdir=features mzXML\*.mzXML
Linux:
./msinspect.sh --findpeptides --outdir=features mzXML/*.mzXML

In this tutorial, the resulting feature files have already been created for you, in the features directory. It is important to filter these feature files, so that they contain only high-quality features:

Windows:
.\msinspect.bat --filter --outdir=features\filtered features\*.tsv --minpeaks=2 --maxkl=3
Linux:
./msinspect.sh --filter --outdir=features/filtered features/*.tsv --minpeaks=2 --maxkl=3
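
To see how much the filter removed, you can count the features in each file before and after filtering. The sketch below assumes that a feature file is tab-separated text in which lines beginning with "#" are comments and the first remaining line is a column header; that layout is an assumption about the .tsv files, so check one of them in a text editor first. The function names are made up for illustration.

```shell
#!/bin/sh
# Count data rows in one feature file.
# Assumption: '#' lines are comments and the first non-comment,
# non-blank line is the column header.
count_features() {
    awk '!/^#/ && NF > 0 { n++ } END { print (n > 0 ? n - 1 : 0) }' "$1"
}

# Print "<file name><TAB><feature count>" for every .tsv file in a directory.
report_counts() {
    for f in "$1"/*.tsv; do
        [ -e "$f" ] || continue   # directory has no .tsv files
        printf '%s\t%s\n' "$(basename "$f")" "$(count_features "$f")"
    done
}

# Example (run manually):
#   report_counts features
#   report_counts features/filtered
```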


Step 4: Create AMT Database

In this step, you will create an AMT database. In the pepxml directory, there are several files with extension .pep.xml. These files, in the standard pepXML format, are the result of an MS/MS database search on acrylamide-labeled data from complex samples. In order to keep this tutorial bundle small, these files represent only a few fractions of a single sample -- normally you would build an AMT database from many more fractions from multiple samples. For demonstration purposes, you will create an AMT database from just these few files.

Filter pepXML files

It is important that the AMT database contain only high-quality peptide entries. To filter the pepXML files, writing the filtered files to the directory pepxml/filtered, use the following command:

Windows:
.\msinspect.bat --filter --outdir=pepxml\filtered --minpprophet=.95 pepxml\*pep.xml
Linux:
./msinspect.sh --filter --outdir=pepxml/filtered --minpprophet=.95 pepxml/*pep.xml

Create amtXML file

Next, we will create an amtXML file to store all the entries in the AMT database. Note: the AMT database built here will contain retention times derived from high-resolution MS1 peptide features. If you were not using MS1 retention times in your database, you would omit the "--ms1dir" argument.

Windows:
.\msinspect.bat --createamt --mode=directories --out=amt\tutorial.amt.xml --ms2dir=pepxml\filtered --ms1dir=features\filtered
Linux:
./msinspect.sh --createamt --mode=directories --out=amt/tutorial.amt.xml --ms2dir=pepxml/filtered --ms1dir=features/filtered

Align runs in database

Now that the AMT database contains all the runs you want, all of the different runs (in this case, fractions) in the AMT database will need to be aligned to each other to resolve any nonlinear differences in retention time:

Windows:
.\msinspect.bat --manageamt --mode=alignallruns amt\tutorial.amt.xml --out=amt\tutorial_aligned.amt.xml --showcharts
Linux:
./msinspect.sh --manageamt --mode=alignallruns amt/tutorial.amt.xml --out=amt/tutorial_aligned.amt.xml --showcharts

The "--showcharts" argument, which is available for many commands, will display charts relevant to the command. In this case, a chart will be displayed that shows the relationship between retention time and normalized retention time for each of the runs in the AMT database, as they are added.  A good alignment will appear to follow the data very closely.  The "--showcharts" argument becomes infeasible for very large databases.

The chromatography in the four sample runs is very similar, and so the alignment does not have much of an effect.  In a larger database, or a database made up of experiments from different labs or instruments, you would see much larger changes.


Step 5: Match MS1 Features with AMT Database

Next you will perform AMT matching between the MS1 features and the AMT database. msInspect/AMT will nonlinearly align each run to the AMT database using the MS2 peptides as a guide, match features to the database within wide mass and RT tolerances, and then construct a model for the probabilities of the matches based on match error.

First you will perform a match on just one MS1 feature file, to demonstrate the process. msInspect will produce several charts to help you assess the success of the matching process. These charts are described in detail here.

Windows:
.\msinspect.bat --matchamt amt\tutorial_aligned.amt.xml --outdir=matched --mode=singlems1 --ms1=features\filtered\frac1.filtered.tsv --ms2dir=pepxml --modifications=C71.0366,M15.995V,C3.0101V --minmatchprob=.9 --showcharts
Linux:
./msinspect.sh --matchamt amt/tutorial_aligned.amt.xml --outdir=matched --mode=singlems1 --ms1=features/filtered/frac1.filtered.tsv --ms2dir=pepxml --modifications=C71.0366,M15.995V,C3.0101V --minmatchprob=.9 --showcharts

There are a few important things to note about the match:

  • The probability model relies on adequate information in the AMT database. In this toy example, the AMT database is relatively sparse, so the model is not a perfect fit. It errs on the conservative side, assigning a lower probability to matches than is likely warranted.
    • Another reason for the imperfect fit is that many of the peptides are represented by exactly one observation, which is the same observation that is being matched!  These peptides will not follow the normal distribution of error assumed by the model, since their retention time error will be exactly zero.  In real, large datasets, this effect is almost completely absent, as peptides are observed multiple times.
  • The "modifications" argument is the most important non-required argument (see AMT user's guide). The value provided here is appropriate for acrylamide-labeled data. If no value is provided, the system defaults to values appropriate for unlabeled data.
  • The "minmatchprob" argument sets the minimum AMT match probability that will be kept in the output files. AMT matching probability is stored as PeptideProphet probability in the matching files. It is advisable only to keep high-probability matches, but you could also keep all matches initially (by leaving out this argument) and, later, filter the output on "minpprophet" as described above.

Now that you have seen a single match in detail, you can run a single command to match all files at once. Note: typically, you will be matching many files at once with this command, so the --showcharts argument is left out, but it is possible to see charts for all matched files by providing this argument, if you want.

Windows:
.\msinspect.bat --matchamt amt\tutorial_aligned.amt.xml --outdir=matched --mode=ms1dir --ms1dir=features\filtered --ms2dir=pepxml --minmatchprob=.9 --modifications=C71.0366,M15.995V,C3.0101V
Linux:
./msinspect.sh --matchamt amt/tutorial_aligned.amt.xml --outdir=matched --mode=ms1dir --ms1dir=features/filtered --ms2dir=pepxml --minmatchprob=.9 --modifications=C71.0366,M15.995V,C3.0101V
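
Both matching commands pass the same "modifications" value, and a typo in it is easy to miss. The sketch below splits such a value into its parts so you can read it back. It assumes the following reading of the syntax, which you should confirm against the AMT user's guide: each comma-separated entry is a one-letter residue followed by a mass, and a trailing "V" marks the modification as variable rather than static. The function name is made up for illustration.

```shell
#!/bin/sh
# Print one line per modification: residue, mass, static/variable.
# Assumed syntax: <residue letter><mass>[V], entries separated by commas.
describe_modifications() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        residue=$(printf '%s' "$entry" | cut -c1)   # first character
        rest=$(printf '%s' "$entry" | cut -c2-)     # everything after it
        case "$rest" in
            *V) kind=variable; mass=${rest%V} ;;    # strip the trailing V
            *)  kind=static;   mass=$rest ;;
        esac
        printf '%s\t%s\t%s\n' "$residue" "$mass" "$kind"
    done
}

# Example (run manually):
#   describe_modifications "C71.0366,M15.995V,C3.0101V"
```

Under that reading, the value used in this tutorial describes one static modification on C and variable modifications on M and C.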

AMT matching should succeed for all of these runs.  If matching failed (meaning the EM algorithm did not converge in the maximum number of iterations), details about the failure would be displayed as output.

You may compare the AMT matching results with the MS2 search results using msInspect's "peptidecompare" command. This command gives you many options for comparing multiple files, peptide by peptide. For a simple comparison of peptide overlap between the AMT and MS2 versions of the same run, try this:

Windows:
.\msinspect.bat --peptidecompare --mode=showoverlap pepxml\frac1.pep.xml matched\frac1.filtered.matched.tsv --minpprophet=.9
Linux:
./msinspect.sh --peptidecompare --mode=showoverlap pepxml/frac1.pep.xml matched/frac1.filtered.matched.tsv --minpprophet=.9
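
If you want to check an overlap figure independently, the idea behind it can be reproduced with standard Unix tools. The sketch below is not how peptidecompare is implemented, and it does not read pepXML or msInspect feature files: it assumes you have already exported each result to a plain text file with one peptide sequence per line. The function name is made up for illustration.

```shell
#!/bin/sh
# Count peptides shared by two lists and peptides unique to each.
# Input: two plain text files, one peptide sequence per line (an assumed
# export format, not a file type that msInspect itself produces).
overlap_counts() {
    a=$(mktemp)
    b=$(mktemp)
    # comm needs sorted, de-duplicated input; use the same locale throughout.
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    common=$(LC_ALL=C comm -12 "$a" "$b" | wc -l)
    only_first=$(LC_ALL=C comm -23 "$a" "$b" | wc -l)
    only_second=$(LC_ALL=C comm -13 "$a" "$b" | wc -l)
    rm -f "$a" "$b"
    # $(( )) strips the padding that some wc implementations add.
    echo "common=$((common)) only_first=$((only_first)) only_second=$((only_second))"
}

# Example (run manually): overlap_counts ms2_peptides.txt amt_peptides.txt
```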

Bear in mind that the AMT database used in this tutorial contained very few runs, and so the number of new peptides confidently matched by AMT is relatively small.

Step 6: Merge AMT Data with MS2 Data

The next step is to augment the MS2 data with your AMT matching data. Of course, you may not want to combine these two types of data -- you may wish to deal with MS2 data and AMT data completely separately. Instructions for that type of workflow are provided in the AMT User's Guide. This tutorial will concentrate on the combined data use case.

There are actually three different steps associated with the merge of AMT and MS2 data, and msInspect/AMT includes a command to perform them all at once. They are:

  1. Add AMT matches as additional results in the MS2 pepXML files
  2. For each AMT peptide, guess a single protein that it may have come from (using a FASTA database)
  3. Run RefreshParser (part of the Trans-Proteomic Pipeline) to guess the rest.

The last two steps are only necessary if you intend to continue with protein inference via ProteinProphet. The reasons for these last two steps, and directions on installing the Trans-Proteomic Pipeline, are provided in the AMT User's Guide. As described there, you will need to make sure that RefreshParser is on your path before running the next command.

For the last two steps, you will need access to the same FASTA database that was used in the MS/MS database search. For the search that created these pepXML files, that database is the August 23, 2006 version (version 3.20) of the human IPI database. The file is prohibitively large to include with this tutorial. At the time of writing, the database can be downloaded from EBI's FTP site, here: ftp://ftp.ebi.ac.uk/pub/databases/IPI/old/HUMAN/ipi.HUMAN.v3.20.fasta.gz.

Download this file into the same amt_tutorial directory of the tutorial bundle that you have been working on. Unzip the file -- on Linux you can use "gunzip ipi.HUMAN.v3.20.fasta.gz", on Windows you can use WinZip or a similar utility.
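
On Linux, the download and unzip can be scripted so that they are skipped when the file is already in place. This is a minimal sketch that assumes curl and gunzip are installed and that the FTP location above is still reachable, which may no longer be the case. The function name is made up for illustration.

```shell
#!/bin/sh
# Fetch and unpack the IPI FASTA file into the current directory,
# skipping any step whose result already exists.
FASTA_URL="ftp://ftp.ebi.ac.uk/pub/databases/IPI/old/HUMAN/ipi.HUMAN.v3.20.fasta.gz"

fetch_fasta() {
    gz=$(basename "$FASTA_URL")   # ipi.HUMAN.v3.20.fasta.gz
    fasta=${gz%.gz}               # ipi.HUMAN.v3.20.fasta
    if [ -s "$fasta" ]; then
        echo "already present: $fasta"
        return 0
    fi
    if [ ! -s "$gz" ]; then
        # -f: fail on server errors; -O: keep the remote file name
        curl -f -O "$FASTA_URL" || return 1
    fi
    gunzip "$gz" && echo "unpacked: $fasta"
}

# Example (run manually from the amt_tutorial directory): fetch_fasta
```

Whatever the unpacked file ends up being called, the name passed to the "--fasta" argument in the next command must match it.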

Now you are ready to merge the AMT and MS2 results:

Windows:
.\msinspect.bat --combineamtms2 --outdir=merged_ms2_amt\ --ms2dir=pepxml --amtdir=matched --fasta=ipi.HUMAN.v3.20.fasta --restrictcharge --refreshparser --guessproteins
Linux:
./msinspect.sh --combineamtms2 --outdir=merged_ms2_amt/ --ms2dir=pepxml --amtdir=matched --fasta=ipi.HUMAN.v3.20.fasta --restrictcharge --refreshparser --guessproteins

The resulting files in the merged_ms2_amt directory contain both the AMT and the MS2 results. If you run the peptidecompare command from the last step on one of the merged files and its corresponding MS2 file, you will see that the merged file contains all of the peptides in the MS2 file, and more.

Step 7: Run Q3 for Labeled Quantitation

Since these particular data are isotopically labeled, you will want to run a quantitation algorithm to determine abundance ratios for each labeled peptide. A good quantitation algorithm for acrylamide-labeled data like these is Q3, and it is included with msInspect. Unfortunately, quantitation algorithms such as Q3 require access to the original mzXML files, which are too large to include in this tutorial, so you will not be able to run this command. If the mzXML files were available, you would run Q3 on the merged files as follows:

Windows:
.\msinspect.bat --q3new --labeledResidue=C --massDiff=3.01006449 --outdir=merged_ms2_amt\q3 merged_ms2_amt\*pep.xml -dmzXML --stripoldq3 --maxFracDeltaMass=20ppm --minPeptideProphet=.75 --forceoutput
Linux:
./msinspect.sh --q3new --labeledResidue=C --massDiff=3.01006449 --outdir=merged_ms2_amt/q3 merged_ms2_amt/*pep.xml -dmzXML --stripoldq3 --maxFracDeltaMass=20ppm --minPeptideProphet=.75 --forceoutput

We have provided the output of this command in the directory merged_ms2_amt/q3 as a demonstration.

Step 8: Protein Inference

The final step in this workflow is protein inference. For this step, you will need ProteinProphet available and on your path. Note that this step uses the files in the "q3" directory, which were provided in the tutorial; you can run this on the pep.xml files that you created in the merged_ms2_amt directory if you prefer, but quantitation information will not be available.

Windows:
ProteinProphet merged_ms2_amt\q3\*pep.xml protxml\merged.prot.xml
Linux:
ProteinProphet merged_ms2_amt/q3/*pep.xml protxml/merged.prot.xml

This will create a protXML file containing all proteins inferred from the peptide evidence in your merged files. Since these are acrylamide-labeled data, we will need to run Q3ProteinRatioParser (part of the TPP) to process the Q3 quantitative information.

Windows:
Q3ProteinRatioParser protxml\merged.prot.xml
Linux:
Q3ProteinRatioParser protxml/merged.prot.xml

You may now work with this protXML file just as you would a protXML derived only from MS2 data. For comparison purposes, we have provided a protXML file that was created using only the MS2 data, and not the AMT data. msInspect contains tools to compare two protXML files.

Windows:
.\msinspect.bat --protxmlcompare protxml\ms2_only.prot.xml protxml\merged.prot.xml
Linux:
./msinspect.sh --protxmlcompare protxml/ms2_only.prot.xml protxml/merged.prot.xml

Again, bear in mind that the AMT database used in this tutorial contained very few runs, and so the gains from the AMT data are small. A real AMT match using a database built from dozens or hundreds of fractions would provide a significant boost to identified peptides and proteins.


This concludes the msInspect/AMT tutorial. Please provide feedback on the tutorial process, including any errors you may have encountered, in order to help us make the tutorial as useful as possible.