org.fhcrc.cpl.toolbox.proteomics
Class ProteinUtilities

java.lang.Object
  extended by org.fhcrc.cpl.toolbox.proteomics.ProteinUtilities

public class ProteinUtilities
extends java.lang.Object

This class does not parse ProtXML files itself; it relies on ProtXmlReader for that. It provides utilities for working with the output of ProtXmlReader.


Field Summary
protected static org.apache.log4j.Logger _log
           
 
Constructor Summary
ProteinUtilities()
           
 
Method Summary
static void assignContainingProteinsToFeatures(Feature[] ms1FeaturesWithPeptides, java.io.File fastaFile)
           
static void createPepXml(Feature[] featuresWithPeptides, java.io.File fastaFile, java.io.File outputFile)
          Create a pepXML file that can be used with ProteinProphet
static java.util.Map<java.lang.String,java.util.Set<java.lang.String>> findFastaProteinsForPeptides(java.util.Collection<java.lang.String> peptideList, java.io.File fastaFile)
           
static java.util.List<java.io.File> findSourcePepXMLFiles(java.io.File protXmlFile)
          Look inside a protXML file to find the source PepXML files
static PanelWithChart generateSensSpecChart(java.io.File protXmlFile)
          Generate a sensitivity-and-specificity-curve chart
static PanelWithChart generateSensSpecChart(java.io.File protXmlFile1, java.io.File protXmlFile2)
          Generate a sensitivity-and-specificity-curve chart for two files.
protected static Pair<java.lang.Character,java.lang.Character> getPrevNextAAs(java.lang.String peptide, Protein protein)
           
static void guessAllProteinsForFeaturePeptides(FeatureSet[] featureSets, java.io.File fastaFile, Protein[] fastaProteins)
          For every feature with a peptide in every featureset passed in, find ALL proteins in the fasta file that contain that peptide, and assign them all.
static void guessProteinsForFeaturePeptides(FeatureSet[] featureSets, java.io.File fastaFile)
          Cover (convenience) method for an array of featuresets, given only the fasta file
static void guessProteinsForFeaturePeptides(FeatureSet[] featureSets, java.io.File fastaFile, Protein[] fastaProteins)
          For every feature with a peptide in every featureset passed in, find some protein in the fasta file that contains that peptide, and assign it
static void guessProteinsForFeaturePeptides(FeatureSet featureSet, java.io.File fastaFile)
          helper method for one featureset
static void guessProteinsForFeaturePeptides(FeatureSet featureSet, Protein[] fastaProteins)
          helper method for one featureset
static java.util.Map<java.lang.String,ProtXmlReader.Protein> loadFirstProteinOccurrence(java.io.File protXmlFile, java.util.Collection<java.lang.String> proteinNames, float minProteinProphetGroupProbability)
          Returns a map from protein names to first occurrences of proteins in the protXML file, if they exist with minimum probability
static ProtXmlReader.Protein loadFirstProteinOccurrence(java.io.File protXmlFile, java.lang.String proteinName)
          Returns null if the protein is not found
static ProtXmlReader.Protein loadFirstProteinOccurrence(java.io.File protXmlFile, java.lang.String proteinName, float minProteinProphetGroupProbability)
          Returns null if the protein is not found with at least the minimum group probability
static java.util.Map<java.lang.String,java.util.Set<java.lang.Integer>> loadPeptideProteinGroupMapFromProtXML(java.io.File protXmlFile, double minProteinProphet)
          Create a mapping between all peptides noted in the protXml file, and all protein groups that they are associated with.
static java.util.Map<java.lang.String,java.util.Set<java.lang.String>> loadPeptideProteinMapFromProtXML(java.io.File protXmlFile, double minProteinProphet)
          Create a mapping between all peptides noted in the protXml file, and all proteins that they are associated with.
static java.util.List<ProteinGroup> loadProteinGroupsFromProtXML(java.io.File protXmlFile)
          WARNING: after the iterator has been run through, the proteins in each group disappear
static java.util.Map<java.lang.String,java.util.List<ProtXmlReader.Peptide>> loadProteinPeptideMapFromProtXML(java.io.File protXmlFile, double minProteinProphet)
          Create a mapping between all proteins in the protxml file, and all peptides associated with them.
static java.util.Map<java.lang.String,java.util.Set<java.lang.String>> loadProteinPeptideSequenceMapFromProtXML(java.io.File protXmlFile, double minProteinProphet)
          Create a mapping between all proteins in the protXML file and the sequences of all peptides associated with them.
static java.util.Map<java.lang.String,java.lang.Float> loadProteinProbabilityMapFromProtXML(java.io.File protXmlFile)
          Create a mapping between all proteins in the protXML file and their probabilities.
static java.util.ArrayList<Protein> loadProteinsFromFasta(java.io.File fastaFile)
          Load all proteins from a fasta file
static java.util.List<ProtXmlReader.Protein> loadProtXmlProteinsFromProtXML(java.io.File protXmlFile)
           
static java.util.Set<java.lang.String> loadTrypticPeptidesFromFasta(java.io.File fastaFile)
           
static java.util.Map<java.lang.String,java.util.List<Protein>> mapPeptidesToProteins(java.util.Set<java.lang.String> peptides, java.io.File[] protXmlFiles, Protein[] proteinsInFasta, double minProteinProphet)
          Map peptides to proteins using multiple protxml files
static java.util.Map<java.lang.String,java.util.List<Protein>> mapPeptidesToProteins(java.util.Set<java.lang.String> peptides, java.io.File protXmlFile, Protein[] proteinsInFasta, double minProteinProphet)
          Map peptides to proteins.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_log

protected static org.apache.log4j.Logger _log
Constructor Detail

ProteinUtilities

public ProteinUtilities()
Method Detail

loadPeptideProteinMapFromProtXML

public static java.util.Map<java.lang.String,java.util.Set<java.lang.String>> loadPeptideProteinMapFromProtXML(java.io.File protXmlFile,
                                                                                                               double minProteinProphet)
                                                                                                        throws java.io.FileNotFoundException,
                                                                                                               javax.xml.stream.XMLStreamException
Create a mapping between every peptide noted in the protXML file and all proteins it is associated with, including all indistinguishable proteins.

Parameters:
protXmlFile -
Returns:
Throws:
CommandLineModuleExecutionException
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

findSourcePepXMLFiles

public static java.util.List<java.io.File> findSourcePepXMLFiles(java.io.File protXmlFile)
                                                          throws java.io.FileNotFoundException,
                                                                 javax.xml.stream.XMLStreamException
Look inside a protXML file to find the source PepXML files

Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

generateSensSpecChart

public static PanelWithChart generateSensSpecChart(java.io.File protXmlFile)
                                            throws java.io.FileNotFoundException,
                                                   javax.xml.stream.XMLStreamException
Generate a sensitivity-and-specificity-curve chart

Parameters:
protXmlFile -
Returns:
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

generateSensSpecChart

public static PanelWithChart generateSensSpecChart(java.io.File protXmlFile1,
                                                   java.io.File protXmlFile2)
                                            throws java.io.FileNotFoundException,
                                                   javax.xml.stream.XMLStreamException
Generate a sensitivity-and-specificity-curve chart for two files. The curves for the second file are drawn with dashed lines.

Parameters:
protXmlFile1 -
protXmlFile2 -
Returns:
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

loadPeptideProteinGroupMapFromProtXML

public static java.util.Map<java.lang.String,java.util.Set<java.lang.Integer>> loadPeptideProteinGroupMapFromProtXML(java.io.File protXmlFile,
                                                                                                                     double minProteinProphet)
                                                                                                              throws java.io.FileNotFoundException,
                                                                                                                     javax.xml.stream.XMLStreamException
Create a mapping between all peptides noted in the protXml file, and all protein groups that they are associated with.

Parameters:
protXmlFile -
Returns:
Throws:
CommandLineModuleExecutionException
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

loadProteinPeptideMapFromProtXML

public static java.util.Map<java.lang.String,java.util.List<ProtXmlReader.Peptide>> loadProteinPeptideMapFromProtXML(java.io.File protXmlFile,
                                                                                                                     double minProteinProphet)
                                                                                                              throws java.io.FileNotFoundException,
                                                                                                                     javax.xml.stream.XMLStreamException
Create a mapping between all proteins in the protXML file, including all indistinguishable proteins, and all peptides associated with them.

Parameters:
protXmlFile -
Returns:
Throws:
CommandLineModuleExecutionException
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

loadProteinPeptideSequenceMapFromProtXML

public static java.util.Map<java.lang.String,java.util.Set<java.lang.String>> loadProteinPeptideSequenceMapFromProtXML(java.io.File protXmlFile,
                                                                                                                       double minProteinProphet)
                                                                                                                throws java.io.FileNotFoundException,
                                                                                                                       javax.xml.stream.XMLStreamException
Create a mapping between all proteins in the protXML file, including all indistinguishable proteins, and the sequences of all peptides associated with them.

Parameters:
protXmlFile -
Returns:
Throws:
CommandLineModuleExecutionException
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException
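
The peptide-to-protein and protein-to-peptide-sequence maps documented above share the type Map<String, Set<String>> and appear to describe the same association viewed from opposite directions. The following is a minimal sketch, assuming only plain java.util collections, of how one such map could be inverted into the other. The class MapInversionSketch and its method are hypothetical and are not part of ProteinUtilities.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of ProteinUtilities.
final class MapInversionSketch {

    // Turns a protein -> peptide sequences map into a peptide sequence -> proteins map.
    static Map<String, Set<String>> invert(Map<String, Set<String>> proteinToPeptides) {
        Map<String, Set<String>> peptideToProteins = new HashMap<String, Set<String>>();
        for (Map.Entry<String, Set<String>> entry : proteinToPeptides.entrySet()) {
            for (String peptide : entry.getValue()) {
                Set<String> proteins = peptideToProteins.get(peptide);
                if (proteins == null) {
                    // TreeSet keeps protein names sorted, which makes output stable.
                    proteins = new TreeSet<String>();
                    peptideToProteins.put(peptide, proteins);
                }
                proteins.add(entry.getKey());
            }
        }
        return peptideToProteins;
    }
}
```

A peptide shared by several proteins ends up with all of them in its set, which is the situation the "indistinguishable proteins" remark refers to.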

loadProteinProbabilityMapFromProtXML

public static java.util.Map<java.lang.String,java.lang.Float> loadProteinProbabilityMapFromProtXML(java.io.File protXmlFile)
                                                                                            throws java.io.FileNotFoundException,
                                                                                                   javax.xml.stream.XMLStreamException
Create a mapping between all proteins in the protXML file, including all indistinguishable proteins, and their probabilities.

Parameters:
protXmlFile -
Returns:
Throws:
CommandLineModuleExecutionException
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

loadProteinGroupsFromProtXML

public static java.util.List<ProteinGroup> loadProteinGroupsFromProtXML(java.io.File protXmlFile)
                                                                 throws java.io.FileNotFoundException,
                                                                        javax.xml.stream.XMLStreamException
WARNING: after the iterator has been run through, the proteins in each group disappear.

Parameters:
protXmlFile -
Returns:
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

loadProtXmlProteinsFromProtXML

public static java.util.List<ProtXmlReader.Protein> loadProtXmlProteinsFromProtXML(java.io.File protXmlFile)
                                                                            throws java.io.FileNotFoundException,
                                                                                   javax.xml.stream.XMLStreamException
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

loadFirstProteinOccurrence

public static ProtXmlReader.Protein loadFirstProteinOccurrence(java.io.File protXmlFile,
                                                               java.lang.String proteinName,
                                                               float minProteinProphetGroupProbability)
                                                        throws java.io.FileNotFoundException,
                                                               javax.xml.stream.XMLStreamException
Returns null if the protein is not found with at least the minimum group probability.

Parameters:
protXmlFile -
proteinName -
minProteinProphetGroupProbability -
Returns:
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

loadFirstProteinOccurrence

public static ProtXmlReader.Protein loadFirstProteinOccurrence(java.io.File protXmlFile,
                                                               java.lang.String proteinName)
                                                        throws java.io.FileNotFoundException,
                                                               javax.xml.stream.XMLStreamException
Returns null if the protein is not found.

Parameters:
protXmlFile -
proteinName -
Returns:
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

loadFirstProteinOccurrence

public static java.util.Map<java.lang.String,ProtXmlReader.Protein> loadFirstProteinOccurrence(java.io.File protXmlFile,
                                                                                               java.util.Collection<java.lang.String> proteinNames,
                                                                                               float minProteinProphetGroupProbability)
                                                                                        throws java.io.FileNotFoundException,
                                                                                               javax.xml.stream.XMLStreamException
Returns a map from protein names to the first occurrences of those proteins in the protXML file, for proteins that occur with at least the minimum probability.

Parameters:
protXmlFile -
proteinNames -
minProteinProphetGroupProbability -
Returns:
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException
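
The "first occurrence with a minimum probability" lookup described above can be illustrated with a short, self-contained sketch. It assumes that entries below the threshold are skipped and that the comparison is inclusive; the source does not state either point. The class FirstOccurrenceSketch and its nested Entry type are hypothetical stand-ins and do not reflect the ProtXmlReader.Protein API.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the ProteinUtilities implementation.
final class FirstOccurrenceSketch {

    // Stand-in for a protein entry read from a protXML file.
    static final class Entry {
        final String proteinName;
        final float groupProbability;

        Entry(String proteinName, float groupProbability) {
            this.proteinName = proteinName;
            this.groupProbability = groupProbability;
        }
    }

    // Keeps, for each wanted name, the first entry in file order whose
    // group probability is at least minGroupProbability (assumed inclusive).
    static Map<String, Entry> firstOccurrences(List<Entry> entriesInFileOrder,
                                               Collection<String> wantedNames,
                                               float minGroupProbability) {
        Set<String> wanted = new HashSet<String>(wantedNames);
        Map<String, Entry> result = new HashMap<String, Entry>();
        for (Entry entry : entriesInFileOrder) {
            if (entry.groupProbability < minGroupProbability) {
                continue; // below threshold: ignored
            }
            if (wanted.contains(entry.proteinName) && !result.containsKey(entry.proteinName)) {
                result.put(entry.proteinName, entry);
            }
        }
        return result;
    }
}
```

Names that never occur with sufficient probability are simply absent from the returned map, which parallels the single-protein overloads returning null.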

createPepXml

public static void createPepXml(Feature[] featuresWithPeptides,
                                java.io.File fastaFile,
                                java.io.File outputFile)
                         throws CommandLineModuleExecutionException
Create a pepXML file that can be used with ProteinProphet.

Throws:
CommandLineModuleExecutionException

loadProteinsFromFasta

public static java.util.ArrayList<Protein> loadProteinsFromFasta(java.io.File fastaFile)
Load all proteins from a fasta file

Parameters:
fastaFile -
Returns:
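
For readers unfamiliar with the fasta format, the following is a minimal sketch of reading protein records from fasta text: each record starts with a '>' header line, followed by one or more sequence lines. It works on an in-memory String rather than a java.io.File to stay self-contained. The class FastaSketch is hypothetical, and the sketch says nothing about how loadProteinsFromFasta itself parses headers or builds Protein objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of fasta parsing; not the library's loader.
final class FastaSketch {

    // Returns header text (without the leading '>') -> concatenated sequence,
    // preserving the order of records in the input.
    static Map<String, String> parse(String fastaText) {
        Map<String, String> proteins = new LinkedHashMap<String, String>();
        String header = null;
        StringBuilder sequence = new StringBuilder();
        for (String rawLine : fastaText.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.charAt(0) == '>') {
                if (header != null) {
                    proteins.put(header, sequence.toString());
                }
                header = line.substring(1);
                sequence.setLength(0);
            } else if (header != null) {
                // Sequence lines of one record are joined into a single string.
                sequence.append(line);
            }
        }
        if (header != null) {
            proteins.put(header, sequence.toString());
        }
        return proteins;
    }
}
```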

loadTrypticPeptidesFromFasta

public static java.util.Set<java.lang.String> loadTrypticPeptidesFromFasta(java.io.File fastaFile)
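
The source gives no description for this method. As background, the sketch below shows the textbook rule for in-silico tryptic digestion: cleave after K or R unless the next residue is P. Whether loadTrypticPeptidesFromFasta applies exactly this rule, allows missed cleavages, or filters by peptide length is not documented here, so all of that is an assumption. The class TrypticDigestSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of tryptic digestion; not the library's code.
final class TrypticDigestSketch {

    // Splits a protein sequence after every K or R that is not followed by P.
    // No missed cleavages and no length filter are applied.
    static List<String> digest(String proteinSequence) {
        List<String> peptides = new ArrayList<String>();
        int start = 0;
        for (int i = 0; i < proteinSequence.length(); i++) {
            char residue = proteinSequence.charAt(i);
            boolean isCleavageResidue = residue == 'K' || residue == 'R';
            boolean nextIsProline = i + 1 < proteinSequence.length()
                    && proteinSequence.charAt(i + 1) == 'P';
            if (isCleavageResidue && !nextIsProline) {
                peptides.add(proteinSequence.substring(start, i + 1));
                start = i + 1;
            }
        }
        if (start < proteinSequence.length()) {
            // Trailing residues after the last cleavage site form the final peptide.
            peptides.add(proteinSequence.substring(start));
        }
        return peptides;
    }
}
```

For the sequence "MKTAYRPLLKGG" this sketch yields "MK", "TAYRPLLK" and "GG"; the R followed by P is not a cleavage site.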

findFastaProteinsForPeptides

public static java.util.Map<java.lang.String,java.util.Set<java.lang.String>> findFastaProteinsForPeptides(java.util.Collection<java.lang.String> peptideList,
                                                                                                           java.io.File fastaFile)
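
The source gives no description for this method. Judging from its name and return type, it maps each peptide to the fasta proteins that contain it. A minimal sketch of that idea, using a plain substring search over protein sequences held in memory, is shown below. The class PeptideSearchSketch is hypothetical, and mapping unmatched peptides to an empty set (rather than omitting them) is an assumption of the sketch.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; not the ProteinUtilities implementation.
final class PeptideSearchSketch {

    // For each peptide, collects the names of all proteins whose sequence
    // contains the peptide as a substring.
    static Map<String, Set<String>> findProteins(Collection<String> peptides,
                                                 Map<String, String> sequencesByProteinName) {
        Map<String, Set<String>> result = new HashMap<String, Set<String>>();
        for (String peptide : peptides) {
            Set<String> containing = new TreeSet<String>();
            for (Map.Entry<String, String> protein : sequencesByProteinName.entrySet()) {
                if (protein.getValue().contains(peptide)) {
                    containing.add(protein.getKey());
                }
            }
            // Assumption: peptides with no match map to an empty set.
            result.put(peptide, containing);
        }
        return result;
    }
}
```

This naive search costs one substring scan per peptide per protein; for large fasta files an index over the sequences would likely be preferable.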

assignContainingProteinsToFeatures

public static void assignContainingProteinsToFeatures(Feature[] ms1FeaturesWithPeptides,
                                                      java.io.File fastaFile)
Parameters:
ms1FeaturesWithPeptides -
fastaFile -

mapPeptidesToProteins

public static java.util.Map<java.lang.String,java.util.List<Protein>> mapPeptidesToProteins(java.util.Set<java.lang.String> peptides,
                                                                                            java.io.File[] protXmlFiles,
                                                                                            Protein[] proteinsInFasta,
                                                                                            double minProteinProphet)
                                                                                     throws java.io.FileNotFoundException,
                                                                                            javax.xml.stream.XMLStreamException
Map peptides to proteins using multiple protxml files

Parameters:
peptides -
protXmlFiles -
proteinsInFasta -
Returns:
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

mapPeptidesToProteins

public static java.util.Map<java.lang.String,java.util.List<Protein>> mapPeptidesToProteins(java.util.Set<java.lang.String> peptides,
                                                                                            java.io.File protXmlFile,
                                                                                            Protein[] proteinsInFasta,
                                                                                            double minProteinProphet)
                                                                                     throws java.io.FileNotFoundException,
                                                                                            javax.xml.stream.XMLStreamException
Map peptides to proteins. If a fasta file has been supplied, the mapping is based on sequence; if a protXML file has been supplied, the mapping is based on that file. The protXML file takes precedence.

Parameters:
peptides -
Returns:
Throws:
java.io.FileNotFoundException
javax.xml.stream.XMLStreamException

guessProteinsForFeaturePeptides

public static void guessProteinsForFeaturePeptides(FeatureSet featureSet,
                                                   java.io.File fastaFile)
Helper method for a single featureset.

Parameters:
featureSet -
fastaFile -

guessProteinsForFeaturePeptides

public static void guessProteinsForFeaturePeptides(FeatureSet featureSet,
                                                   Protein[] fastaProteins)
Helper method for a single featureset.

Parameters:
featureSet -
fastaProteins -

guessProteinsForFeaturePeptides

public static void guessProteinsForFeaturePeptides(FeatureSet[] featureSets,
                                                   java.io.File fastaFile)
Cover (convenience) method for an array of featuresets, given only the fasta file.

Parameters:
featureSets -
fastaFile -

guessProteinsForFeaturePeptides

public static void guessProteinsForFeaturePeptides(FeatureSet[] featureSets,
                                                   java.io.File fastaFile,
                                                   Protein[] fastaProteins)
For every feature with a peptide in every featureset passed in, find one protein in the fasta file that contains that peptide, and assign it.

Parameters:
featureSets -
fastaProteins -

guessAllProteinsForFeaturePeptides

public static void guessAllProteinsForFeaturePeptides(FeatureSet[] featureSets,
                                                      java.io.File fastaFile,
                                                      Protein[] fastaProteins)
For every feature with a peptide in every featureset passed in, find ALL proteins in the fasta file that contain that peptide, and assign them all. This is much more computationally intensive than finding just one.

Parameters:
featureSets -
fastaProteins -
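
The cost difference between finding one containing protein and finding all of them can be seen in a small sketch: the first search can stop at the first hit, while the exhaustive search must scan every sequence. This is only an illustration of that trade-off under the assumption of a linear substring scan; the class GuessProteinsSketch is hypothetical and does not show how features or Protein objects are actually handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "find one" versus "find all".
final class GuessProteinsSketch {

    // Returns the index of the first sequence containing the peptide, or -1.
    // The loop exits as soon as a match is found.
    static int firstContaining(String peptide, String[] sequences) {
        for (int i = 0; i < sequences.length; i++) {
            if (sequences[i].contains(peptide)) {
                return i;
            }
        }
        return -1;
    }

    // Returns the indices of every sequence containing the peptide.
    // Every sequence is always examined, so there is no early exit.
    static List<Integer> allContaining(String peptide, String[] sequences) {
        List<Integer> matches = new ArrayList<Integer>();
        for (int i = 0; i < sequences.length; i++) {
            if (sequences[i].contains(peptide)) {
                matches.add(i);
            }
        }
        return matches;
    }
}
```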

getPrevNextAAs

protected static Pair<java.lang.Character,java.lang.Character> getPrevNextAAs(java.lang.String peptide,
                                                                              Protein protein)
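
The source gives no description for this method. Judging from its name and signature, it returns the amino acids immediately before and after a peptide within a protein. The sketch below illustrates that idea on plain strings. The class FlankingResiduesSketch is hypothetical, the use of '-' for a missing neighbour at a protein terminus is an assumption, and returning null for a peptide that does not occur in the protein is also an assumption.

```java
// Hypothetical illustration; not the ProteinUtilities implementation.
final class FlankingResiduesSketch {

    // Assumed placeholder for "no residue" at either end of the protein.
    static final char TERMINUS = '-';

    // Returns {previous, next} for the first occurrence of the peptide in the
    // protein sequence, or null if the peptide does not occur at all.
    static char[] flanking(String peptide, String proteinSequence) {
        int start = proteinSequence.indexOf(peptide);
        if (start < 0) {
            return null;
        }
        int end = start + peptide.length();
        char previous = start > 0 ? proteinSequence.charAt(start - 1) : TERMINUS;
        char next = end < proteinSequence.length() ? proteinSequence.charAt(end) : TERMINUS;
        return new char[] {previous, next};
    }
}
```

For the peptide "TAYR" in the sequence "MKTAYRGG", this sketch returns 'K' and 'G'.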


Fred Hutchinson Cancer Research Center