org.fhcrc.cpl.viewer.amt
Class AmtDatabaseBuilder

java.lang.Object
  extended by org.fhcrc.cpl.viewer.amt.AmtDatabaseBuilder

public class AmtDatabaseBuilder
extends java.lang.Object

Builds AMT databases in various ways. If you want to build an AMT database in some other way, add a method here that does it your way.


Field Summary
static boolean DEFAULT_IGNORE_UNKNOWN_MODIFICATIONS
           
static double DEFAULT_MAX_LEVERAGE_NUMERATOR
           
static double DEFAULT_MAX_STUDENTIZED_RESIDUAL_FOR_INCLUSION
           
static double DEFAULT_MAX_STUDENTIZED_RESIDUAL_FOR_REGRESSION
           
static int DEFAULT_MIN_OBSERVATIONS_FOR_ALIGNMENT_REGRESSION
           
static int DEFAULT_MIN_PEPTIDES_FOR_ALIGNMENT_REGRESSION
           
static float DEFAULT_MS1_MS2_MASS_TOLERANCE_PPM
           
static float DEFAULT_MS1_MS2_TIME_TOLERANCE_SECONDS
           
protected  boolean ignoreUnknownModifications
           
protected  float ms1Ms2MassTolerancePPM
           
protected  float ms1Ms2TimeToleranceSeconds
           
 
Constructor Summary
AmtDatabaseBuilder()
           
 
Method Summary
 void addRunToAmtDatabase(AmtDatabase amtDatabase, FeatureSet newRunFeatureSet, FeatureSet ms1FeatureSet, int scanOrTimeMode, boolean robustRegression, double maxStudentizedResidualForRegression, double maxStudentizedResidualForInclusion, double minDBObservationsForAlignmentMatch, double maxHDiffForAlignmentMatch, int minMatchedPeptidesForAlignment)
          Work in progress. Add a run to an AMT database.
static FeatureSet buildFeatureSet(AmtDatabase amtDB, MSRun run, FeatureSet regressionMS2FeatureSet, int chargeForAllFeatures, double minPeptideProphet, boolean robustRegression)
          Create a feature set based on this AMT database, mapped to a particular run.
static double[] calculateStudentizedResiduals(Feature[] allFeatures, int scanOrTimeMode)
          Calculate "studentized residuals" for each feature.
static Feature[] chooseFeaturesWithMaxStudentizedResidual(Feature[] allFeatures, double[] studentizedResiduals, double maxStudentizedResidual)
          Given an array of features, choose only those whose studentized residual is <= maxStudentizedResidual.
protected static java.util.Map<java.lang.String,java.lang.Integer> countSpectraForPeptides(Feature[] features)
          Count the spectra with each peptide identification.
 AmtDatabase createAmtDatabaseForRun(FeatureSet ms2FeatureSet, FeatureSet ms1FeatureSet, int scanOrTimeMode, boolean robustRegression, Pair<java.lang.Integer,java.lang.Integer> outNumFeaturesChosen, double maxStudentizedResidualForRegression, double maxStudentizedResidualForInclusion)
          Create an AMT database for a single run, filtering out unreliable features by leverage and studentized residual.
static AmtDatabase createAmtDatabaseForRun(FeatureSet featureSet, int scanOrTimeMode, double[] timeToHCoefficients, boolean calculateHydrophobicities, java.util.Map<java.lang.String,java.lang.Integer> spectralCountsMap, boolean ignoreUnknownModifications)
          Calculate observed hydrophobicity for all features and add them all as observations to a new database.
 AmtDatabase createAmtDBFromAllPepXml(java.io.File allPepXmlFile, java.io.File ms1Dir, int scanOrTimeMode, FeatureSet.FeatureSelector featureSelector, boolean robustRegression, double maxStudentizedResidualForRegression, double maxStudentizedResidualForInclusion)
          Create an AMT database from an all.pep.xml file. Passes null as the run to the workhorse method, which then locates each fraction's mzXML file from the reference stored in the pepXML file.
 AmtDatabase createAmtDBFromDirectories(java.io.File pepXmlDir, java.io.File ms1Dir, java.io.File mzXmlDir, int scanOrTimeMode, FeatureSet.FeatureSelector featureSelector, boolean robustRegression, double maxStudentizedResidualForRegression, double maxStudentizedResidualForInclusion, boolean align)
          Given a directory full of pepXml files and a directory full of the corresponding mzXML files, create an AMT database from all the pepXml files, using the mzXML files to grab retention times.
 AmtDatabase createAmtDBFromPepXml(java.io.File allPepXmlFile, java.io.File ms1Dir, MSRun run, int scanOrTimeMode, FeatureSet.FeatureSelector featureSelector, boolean robustRegression, double maxStudentizedResidualForRegression, double maxStudentizedResidualForInclusion)
          Create an AMT database from a pep.xml file.
static MS2Modification[] createDefaultNTerminalModifications()
          Create the default N-Terminal modifications that are used by X!Tandem.
 float getMs1Ms2MassTolerancePPM()
           
 float getMs1Ms2TimeToleranceSeconds()
           
 boolean isIgnoreUnknownModifications()
           
protected  void keepOnlySinglyMatchedMS1Features(FeatureSet ms2FeatureSet, FeatureSet ms1FeatureSet)
           
protected static java.util.Map<java.lang.String,java.lang.Double> performInitialRegression(Feature[] allFeatures, int scanOrTimeMode)
           
 void setIgnoreUnknownModifications(boolean ignoreUnknownModifications)
           
 void setMs1Ms2MassTolerancePPM(float ms1Ms2MassTolerancePPM)
           
 void setMs1Ms2TimeToleranceSeconds(float ms1Ms2TimeToleranceSeconds)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MAX_STUDENTIZED_RESIDUAL_FOR_REGRESSION

public static final double DEFAULT_MAX_STUDENTIZED_RESIDUAL_FOR_REGRESSION
See Also:
Constant Field Values

DEFAULT_MAX_STUDENTIZED_RESIDUAL_FOR_INCLUSION

public static final double DEFAULT_MAX_STUDENTIZED_RESIDUAL_FOR_INCLUSION
See Also:
Constant Field Values

DEFAULT_MIN_PEPTIDES_FOR_ALIGNMENT_REGRESSION

public static final int DEFAULT_MIN_PEPTIDES_FOR_ALIGNMENT_REGRESSION
See Also:
Constant Field Values

DEFAULT_MIN_OBSERVATIONS_FOR_ALIGNMENT_REGRESSION

public static final int DEFAULT_MIN_OBSERVATIONS_FOR_ALIGNMENT_REGRESSION
See Also:
Constant Field Values

DEFAULT_MAX_LEVERAGE_NUMERATOR

public static final double DEFAULT_MAX_LEVERAGE_NUMERATOR
See Also:
Constant Field Values

DEFAULT_MS1_MS2_MASS_TOLERANCE_PPM

public static final float DEFAULT_MS1_MS2_MASS_TOLERANCE_PPM
See Also:
Constant Field Values

DEFAULT_MS1_MS2_TIME_TOLERANCE_SECONDS

public static final float DEFAULT_MS1_MS2_TIME_TOLERANCE_SECONDS
See Also:
Constant Field Values

ms1Ms2MassTolerancePPM

protected float ms1Ms2MassTolerancePPM

ms1Ms2TimeToleranceSeconds

protected float ms1Ms2TimeToleranceSeconds

DEFAULT_IGNORE_UNKNOWN_MODIFICATIONS

public static final boolean DEFAULT_IGNORE_UNKNOWN_MODIFICATIONS
See Also:
Constant Field Values

ignoreUnknownModifications

protected boolean ignoreUnknownModifications
Constructor Detail

AmtDatabaseBuilder

public AmtDatabaseBuilder()
Method Detail

createDefaultNTerminalModifications

public static MS2Modification[] createDefaultNTerminalModifications()
Create the default N-Terminal modifications that are used by X!Tandem. Later X!Tandem versions declare these explicitly; if they are not declared, they need to be added.

Returns:

createAmtDatabaseForRun

public static AmtDatabase createAmtDatabaseForRun(FeatureSet featureSet,
                                                  int scanOrTimeMode,
                                                  double[] timeToHCoefficients,
                                                  boolean calculateHydrophobicities,
                                                  java.util.Map<java.lang.String,java.lang.Integer> spectralCountsMap,
                                                  boolean ignoreUnknownModifications)
Calculate observed hydrophobicity for all features and add them all as observations to a new database. This is the workhorse method: every other database-building method eventually calls it.

Parameters:
scanOrTimeMode -
Returns:
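The workhorse method turns each feature's retention time into an observed hydrophobicity using timeToHCoefficients. A minimal sketch of that step, assuming (hypothetically) that the coefficients are polynomial terms in increasing order of power, so that a two-element array is an intercept and a slope. TimeToHSketch and its methods are illustrative names, not part of this class.

```java
/**
 * Hypothetical sketch: map retention time to observed hydrophobicity (H).
 * Assumes timeToHCoefficients = {c0, c1, c2, ...} meaning H = c0 + c1*t + c2*t^2 + ...
 */
class TimeToHSketch {
    /** Evaluates the polynomial at the given time using Horner's rule. */
    static double timeToH(double time, double[] timeToHCoefficients) {
        double h = 0.0;
        for (int k = timeToHCoefficients.length - 1; k >= 0; k--) {
            h = h * time + timeToHCoefficients[k];
        }
        return h;
    }

    /** Converts every retention time in the array to H. */
    static double[] timesToH(double[] times, double[] timeToHCoefficients) {
        double[] hs = new double[times.length];
        for (int i = 0; i < times.length; i++) {
            hs[i] = timeToH(times[i], timeToHCoefficients);
        }
        return hs;
    }
}
```

With a linear fit, timeToH(t, new double[] {intercept, slope}) reduces to intercept + slope * t.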

performInitialRegression

protected static java.util.Map<java.lang.String,java.lang.Double> performInitialRegression(Feature[] allFeatures,
                                                                                           int scanOrTimeMode)

calculateStudentizedResiduals

public static double[] calculateStudentizedResiduals(Feature[] allFeatures,
                                                     int scanOrTimeMode)
Calculate "studentized residuals" for each feature. Each value is based on the feature's residual and its leverage, and measures how strongly the feature deviates from the fit to the rest of the features.

Parameters:
allFeatures - the features for which to calculate studentized residuals
Returns:

chooseFeaturesWithMaxStudentizedResidual

public static Feature[] chooseFeaturesWithMaxStudentizedResidual(Feature[] allFeatures,
                                                                 double[] studentizedResiduals,
                                                                 double maxStudentizedResidual)
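Given an array of features, choose only those whose studentized residual is <= maxStudentizedResidual. This filtering is done twice (once before the final regression and once before inserting peptides into the database), so it is factored into a method.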

Parameters:
allFeatures -
studentizedResiduals -
maxStudentizedResidual -
Returns:
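A minimal sketch of the filter on parallel arrays. It compares the absolute value of each residual, so that large negative deviations are also excluded; that choice, the generic signature, and the name ResidualFilterSketch are assumptions rather than this class's API.

```java
import java.util.ArrayList;
import java.util.List;

class ResidualFilterSketch {
    /** Keeps the items whose |studentized residual| is <= maxStudentizedResidual. */
    static <T> List<T> chooseWithMaxStudentizedResidual(List<T> items,
                                                        double[] studentizedResiduals,
                                                        double maxStudentizedResidual) {
        if (items.size() != studentizedResiduals.length) {
            throw new IllegalArgumentException("items and residuals must have the same length");
        }
        List<T> kept = new ArrayList<>();
        for (int i = 0; i < studentizedResiduals.length; i++) {
            if (Math.abs(studentizedResiduals[i]) <= maxStudentizedResidual) {
                kept.add(items.get(i));
            }
        }
        return kept;
    }
}
```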

countSpectraForPeptides

protected static java.util.Map<java.lang.String,java.lang.Integer> countSpectraForPeptides(Feature[] features)
Count the spectra with each peptide identification. Assumes that filtering has already been done, so that all of these features are considered good.

Parameters:
features -
Returns:

createAmtDatabaseForRun

public AmtDatabase createAmtDatabaseForRun(FeatureSet ms2FeatureSet,
                                           FeatureSet ms1FeatureSet,
                                           int scanOrTimeMode,
                                           boolean robustRegression,
                                           Pair<java.lang.Integer,java.lang.Integer> outNumFeaturesChosen,
                                           double maxStudentizedResidualForRegression,
                                           double maxStudentizedResidualForInclusion)
Create an AMT database for a single run, filtering out unreliable features by leverage and studentized residual. Here's the sequence:
1. Calculate the leverage of all features.
2. Toss out all features with leverage > 4/n, and perform a regression to map time to normalized H (first cut).
3. Calculate the studentized residual of all features from the original set.
4. Keep the features with studentized residual < 2.0, and perform a regression to map time to normalized H (final version).
5. Take the features with studentized residual < 2.0, and insert those peptides into the database.

Parameters:
ms2FeatureSet -
ms1FeatureSet -
scanOrTimeMode -
robustRegression -
outNumFeaturesChosen -
maxStudentizedResidualForRegression -
maxStudentizedResidualForInclusion -
Returns:
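Steps 1 and 2 can be sketched on plain arrays: compute each point's leverage in a simple linear regression, keep the points with leverage <= numerator / n (4/n above; DEFAULT_MAX_LEVERAGE_NUMERATOR presumably holds that numerator), and fit an ordinary least-squares line to the survivors. The class and method names are hypothetical, and plain OLS here ignores the robustRegression option.

```java
import java.util.ArrayList;
import java.util.List;

class LeverageFilterSketch {
    /** Leverage h_i = 1/n + (x_i - mean)^2 / Sxx for a simple linear regression on x. */
    static double[] leverages(double[] x) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        double sxx = 0.0;
        for (double v : x) sxx += (v - mean) * (v - mean);
        double[] h = new double[n];
        for (int i = 0; i < n; i++) {
            h[i] = 1.0 / n + (x[i] - mean) * (x[i] - mean) / sxx;
        }
        return h;
    }

    /** Indices of the points whose leverage does not exceed maxLeverageNumerator / n. */
    static List<Integer> lowLeverageIndices(double[] x, double maxLeverageNumerator) {
        double[] h = leverages(x);
        double cutoff = maxLeverageNumerator / x.length;
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < h.length; i++) {
            if (h[i] <= cutoff) kept.add(i);
        }
        return kept;
    }

    /** OLS fit of y = intercept + slope * x over the given indices; returns {intercept, slope}. */
    static double[] fitLine(double[] x, double[] y, List<Integer> indices) {
        double xMean = 0.0, yMean = 0.0;
        for (int i : indices) { xMean += x[i]; yMean += y[i]; }
        xMean /= indices.size();
        yMean /= indices.size();
        double sxx = 0.0, sxy = 0.0;
        for (int i : indices) {
            sxx += (x[i] - xMean) * (x[i] - xMean);
            sxy += (x[i] - xMean) * (y[i] - yMean);
        }
        double slope = sxy / sxx;
        return new double[] {yMean - slope * xMean, slope};
    }
}
```

Dropping the extreme-time point first keeps a single far-away feature from dictating the slope of the first-cut regression.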

createAmtDBFromDirectories

public AmtDatabase createAmtDBFromDirectories(java.io.File pepXmlDir,
                                              java.io.File ms1Dir,
                                              java.io.File mzXmlDir,
                                              int scanOrTimeMode,
                                              FeatureSet.FeatureSelector featureSelector,
                                              boolean robustRegression,
                                              double maxStudentizedResidualForRegression,
                                              double maxStudentizedResidualForInclusion,
                                              boolean align)
                                       throws java.lang.Exception
Given a directory full of pepXml files and a directory full of the corresponding mzXML files, create an AMT database from all the pepXml files, using the mzXML files to grab retention times. Highly dependent on naming conventions. If you're starting with an all.pep.xml file, use createAmtDBFromAllPepXml instead. An advantage of this method is that it can check that all the mzXML files exist before starting work.

Parameters:
pepXmlDir -
mzXmlDir -
scanOrTimeMode -
Returns:
Throws:
java.lang.Exception
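The up-front existence check can be sketched as follows, assuming (hypothetically) that each NAME.pep.xml file in pepXmlDir pairs with NAME.mzXML in mzXmlDir. The real naming conventions this method depends on are not documented here, and MzXmlPreflightSketch is an illustrative name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class MzXmlPreflightSketch {
    static final String PEPXML_SUFFIX = ".pep.xml";
    static final String MZXML_SUFFIX = ".mzXML";

    /** Hypothetical naming convention: "run01.pep.xml" pairs with "run01.mzXML". */
    static String expectedMzXmlName(String pepXmlName) {
        if (!pepXmlName.endsWith(PEPXML_SUFFIX)) {
            throw new IllegalArgumentException("Not a pepXML file name: " + pepXmlName);
        }
        return pepXmlName.substring(0, pepXmlName.length() - PEPXML_SUFFIX.length()) + MZXML_SUFFIX;
    }

    /** Returns every expected mzXML file that is missing, before any real work starts. */
    static List<File> findMissingMzXml(File pepXmlDir, File mzXmlDir) {
        File[] pepXmlFiles = pepXmlDir.listFiles((dir, name) -> name.endsWith(PEPXML_SUFFIX));
        if (pepXmlFiles == null) {
            throw new IllegalArgumentException("Cannot list directory: " + pepXmlDir);
        }
        List<File> missing = new ArrayList<>();
        for (File pepXml : pepXmlFiles) {
            File expected = new File(mzXmlDir, expectedMzXmlName(pepXml.getName()));
            if (!expected.isFile()) missing.add(expected);
        }
        return missing;
    }
}
```

Calling findMissingMzXml first and aborting if the list is non-empty gives the fail-fast behavior described above.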

keepOnlySinglyMatchedMS1Features

protected void keepOnlySinglyMatchedMS1Features(FeatureSet ms2FeatureSet,
                                                FeatureSet ms1FeatureSet)
Parameters:
ms2FeatureSet -
ms1FeatureSet -
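This method is undocumented. Its name, together with the ms1Ms2MassTolerancePPM and ms1Ms2TimeToleranceSeconds fields, suggests matching MS1 features to MS2 identifications within a mass tolerance in ppm and a time tolerance in seconds. The sketch below shows one possible reading (keep an MS1 feature only if exactly one MS2 feature falls within both tolerances); that interpretation, the Peak type, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SingleMatchSketch {
    /** Hypothetical minimal feature: mass in daltons and retention time in seconds. */
    static final class Peak {
        final double mass;
        final double timeSeconds;

        Peak(double mass, double timeSeconds) {
            this.mass = mass;
            this.timeSeconds = timeSeconds;
        }
    }

    /** True if the masses agree within ppm (relative to b) and the times within seconds. */
    static boolean withinTolerance(Peak a, Peak b, double ppm, double seconds) {
        double massDiffPpm = Math.abs(a.mass - b.mass) / b.mass * 1e6;
        return massDiffPpm <= ppm && Math.abs(a.timeSeconds - b.timeSeconds) <= seconds;
    }

    /** Keeps the MS1 peaks that match exactly one MS2 peak (assumed semantics). */
    static List<Peak> keepSinglyMatched(List<Peak> ms2, List<Peak> ms1, double ppm, double seconds) {
        List<Peak> kept = new ArrayList<>();
        for (Peak p1 : ms1) {
            int matches = 0;
            for (Peak p2 : ms2) {
                if (withinTolerance(p1, p2, ppm, seconds)) matches++;
            }
            if (matches == 1) kept.add(p1);
        }
        return kept;
    }
}
```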

addRunToAmtDatabase

public void addRunToAmtDatabase(AmtDatabase amtDatabase,
                                FeatureSet newRunFeatureSet,
                                FeatureSet ms1FeatureSet,
                                int scanOrTimeMode,
                                boolean robustRegression,
                                double maxStudentizedResidualForRegression,
                                double maxStudentizedResidualForInclusion,
                                double minDBObservationsForAlignmentMatch,
                                double maxHDiffForAlignmentMatch,
                                int minMatchedPeptidesForAlignment)
Work in progress. Add a run to an AMT database. After the linear H calculation, if there is significant agreeing peptide overlap (at least minMatchedPeptidesForAlignment peptides that match database entries having at least minDBObservationsForAlignmentMatch observations, within maxHDiffForAlignmentMatch H units), use those agreeing peptides as points for alignment and set all the remaining H values according to that alignment.

Parameters:
amtDatabase -
newRunFeatureSet -
scanOrTimeMode -
robustRegression -
maxStudentizedResidualForRegression -
maxStudentizedResidualForInclusion -
maxHDiffForAlignmentMatch -
minMatchedPeptidesForAlignment -
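The agreement test that decides whether alignment is used can be sketched on plain maps, assuming (hypothetically) that the database exposes one representative H value and an observation count per peptide sequence. The names are illustrative; fitting the alignment itself from the returned peptides is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class AlignmentAnchorSketch {
    /**
     * Returns the peptides whose H in the new run is within maxHDiffForAlignmentMatch of a
     * database entry with at least minDBObservationsForAlignmentMatch observations, or an
     * empty list if fewer than minMatchedPeptidesForAlignment such peptides exist.
     */
    static List<String> chooseAlignmentPeptides(Map<String, Double> runH,
                                                Map<String, Double> dbH,
                                                Map<String, Integer> dbObservationCounts,
                                                double minDBObservationsForAlignmentMatch,
                                                double maxHDiffForAlignmentMatch,
                                                int minMatchedPeptidesForAlignment) {
        List<String> agreeing = new ArrayList<>();
        for (Map.Entry<String, Double> entry : runH.entrySet()) {
            Double databaseH = dbH.get(entry.getKey());
            int observations = dbObservationCounts.getOrDefault(entry.getKey(), 0);
            if (databaseH != null
                    && observations >= minDBObservationsForAlignmentMatch
                    && Math.abs(entry.getValue() - databaseH) <= maxHDiffForAlignmentMatch) {
                agreeing.add(entry.getKey());
            }
        }
        return agreeing.size() >= minMatchedPeptidesForAlignment
                ? agreeing
                : Collections.<String>emptyList();
    }
}
```

An empty result means the overlap is not significant, so the run would keep its linear H values.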

createAmtDBFromAllPepXml

public AmtDatabase createAmtDBFromAllPepXml(java.io.File allPepXmlFile,
                                            java.io.File ms1Dir,
                                            int scanOrTimeMode,
                                            FeatureSet.FeatureSelector featureSelector,
                                            boolean robustRegression,
                                            double maxStudentizedResidualForRegression,
                                            double maxStudentizedResidualForInclusion)
                                     throws java.lang.Exception
Create an AMT database from an all.pep.xml file. Passes null as the run to the workhorse method, which then locates each fraction's mzXML file from the reference stored in the pepXML file.

Parameters:
allPepXmlFile -
scanOrTimeMode -
featureSelector -
robustRegression -
Returns:
Throws:
java.lang.Exception

createAmtDBFromPepXml

public AmtDatabase createAmtDBFromPepXml(java.io.File allPepXmlFile,
                                         java.io.File ms1Dir,
                                         MSRun run,
                                         int scanOrTimeMode,
                                         FeatureSet.FeatureSelector featureSelector,
                                         boolean robustRegression,
                                         double maxStudentizedResidualForRegression,
                                         double maxStudentizedResidualForInclusion)
                                  throws java.lang.Exception
Create an AMT database from a pep.xml file. If no mzXML file is specified, look in the pepXML fractions themselves to find the mzXML files. Warning: since PepXmlLoader runs through the pepXML file sequentially, we have to load the databases one at a time, so if an mzXML file is missing at the very end, we won't find it until we've done most of the work. TODO: currently this fails on an OS that uses a different file separator character from the one the all.pep.xml file was created on, because fraction.getSpectrumPath() will contain the wrong separator characters. The separators should be converted.

Parameters:
allPepXmlFile -
scanOrTimeMode -
Returns:
Throws:
java.lang.Exception
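The separator TODO could be addressed by normalizing the path returned by fraction.getSpectrumPath() before using it. A minimal sketch that treats both '/' and '\' as separators; the helper names are hypothetical, and it assumes file names never legitimately contain a backslash (true on Windows, not guaranteed on Unix).

```java
import java.io.File;

class SpectrumPathSketch {
    /** Rewrites both '/' and '\' to the separator character of the current OS. */
    static String toLocalSeparators(String path) {
        return path.replace('\\', File.separatorChar).replace('/', File.separatorChar);
    }

    /** Extracts the file name regardless of which OS wrote the path. */
    static String fileNameOf(String path) {
        int lastSeparator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return path.substring(lastSeparator + 1);
    }
}
```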

buildFeatureSet

public static FeatureSet buildFeatureSet(AmtDatabase amtDB,
                                         MSRun run,
                                         FeatureSet regressionMS2FeatureSet,
                                         int chargeForAllFeatures,
                                         double minPeptideProphet,
                                         boolean robustRegression)
Create a feature set based on this AMT database, mapped to a particular run. Makes many assumptions.

Parameters:
run -
regressionMS2FeatureSet -
Returns:
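Mapping database peptides onto a particular run requires predicting a retention time from each peptide's H. A minimal sketch, assuming (hypothetically) a linear relationship fitted by ordinary least squares on peptides shared by the AMT database and regressionMS2FeatureSet; the names are illustrative and the robustRegression option is not modeled.

```java
class HToTimeSketch {
    /** OLS fit of time = intercept + slope * H; returns {intercept, slope}. */
    static double[] fitTimeFromH(double[] h, double[] time) {
        if (h.length != time.length || h.length < 2) {
            throw new IllegalArgumentException("need at least two paired points");
        }
        double hMean = 0.0, tMean = 0.0;
        for (int i = 0; i < h.length; i++) { hMean += h[i]; tMean += time[i]; }
        hMean /= h.length;
        tMean /= h.length;
        double shh = 0.0, sht = 0.0;
        for (int i = 0; i < h.length; i++) {
            shh += (h[i] - hMean) * (h[i] - hMean);
            sht += (h[i] - hMean) * (time[i] - tMean);
        }
        double slope = sht / shh;
        return new double[] {tMean - slope * hMean, slope};
    }

    /** Predicted retention time for a database peptide with hydrophobicity h. */
    static double predictTime(double h, double[] coefficients) {
        return coefficients[0] + coefficients[1] * h;
    }
}
```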

getMs1Ms2MassTolerancePPM

public float getMs1Ms2MassTolerancePPM()

setMs1Ms2MassTolerancePPM

public void setMs1Ms2MassTolerancePPM(float ms1Ms2MassTolerancePPM)

getMs1Ms2TimeToleranceSeconds

public float getMs1Ms2TimeToleranceSeconds()

setMs1Ms2TimeToleranceSeconds

public void setMs1Ms2TimeToleranceSeconds(float ms1Ms2TimeToleranceSeconds)

isIgnoreUnknownModifications

public boolean isIgnoreUnknownModifications()

setIgnoreUnknownModifications

public void setIgnoreUnknownModifications(boolean ignoreUnknownModifications)


Fred Hutchinson Cancer Research Center