Available online at www.sciencedirect.com
ScienceDirect
journal homepage: http://www.elsevier.com/locate/euprot
Using synthetic peptides to benchmark peptide identification software and search parameters for MS/MS data analysis
Andreas Quandt a, Lucia Espona a,b, Akos Balasko c, Hendrik Weisser a, Mi-Youn Brusniak d, Peter Kunszt b,1, Ruedi Aebersold a,e, Lars Malmström a,∗,1
a Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
b SyBIT, SystemsX.ch, Switzerland
c MTA SZTAKI, Laboratory of Parallel and Distributed Systems, Budapest, Hungary
d Institute for Systems Biology, Seattle, USA
e Faculty of Science, University of Zurich, Switzerland
Article info
Article history:
Received 4 November 2013
Received in revised form 17 April 2014
Accepted 6 October 2014
Available online 28 October 2014
Keywords:
Mass spectrometry; Data analysis; Classical database search; Synthetic peptides; Search engine
Abstract
Tandem mass spectrometry and sequence database searching are widely used in proteomics to identify peptides in complex mixtures. Here we present a benchmark study in which a pool of 20,103 synthetic peptides was measured and the resulting data set was analyzed using around 1800 different software and parameter set combinations. The results indicate a strong relationship between the performance of an analysis workflow and the applied parameter settings. We present and discuss strategies to optimize parameter settings in order to significantly increase the number of correctly assigned fragment ion spectra and to make the analysis method robust.
© 2014 The Authors. Published by Elsevier B.V. on behalf of European Proteomics Association (EuPA). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
Abbreviations: CPM, classical parametric model; FDR, false discovery rate; FME, fragment mass error; I, MS2Deisotope; LTQ, linear trap quadrupole; FT, Fourier transform; M, Mascot; MC, missed cleavage; N, MS2Denoise; O, OMSSA; PM, parametric model; PMC, parametric model with correction of the negative distribution based on decoy hits; PME, parent mass error; PSMs, peptide spectrum matches; PTMs, post-translational modifications; R, precursorRefine; SPM, the semi-parametric model; TPP, Trans-Proteomic Pipeline; UIPs, uniquely identified peptides; X, X!Tandem.
∗ Corresponding author. Tel.: +41 44 633 21 95.
E-mail address: lars@imsb.biol.ethz.ch (L. Malmström).
1 Current address: S3IT, University of Zurich, Switzerland.
http://dx.doi.org/10.1016/j.euprot.2014.10.001
2212-9685/© 2014 The Authors.
1. Introduction
Tandem mass spectrometry (MS/MS) is the method of choice for identifying and quantifying proteins in complex mixtures because of its high throughput, sensitivity and relative ease of use. However, the optimal analysis of the resulting mass spectrometry data is complex and the subject of continuous research. In the most frequent data analysis workflow, fragment ion spectra generated from selected peptide ions
are assigned to their corresponding peptide sequences using software tools commonly referred to as database search engines. Numerous search engines have been developed, each one using a different algorithm to maximize the number of peptide-spectrum matches (PSMs) and to assess confidence in the correctness of their assignments [1,2]. Search engines compute a score for each PSM that reflects the quality of the assignment; the user defines a cutoff that optimally separates correct from incorrect assignments. In more recent studies, the score cutoff is selected by a target-decoy strategy [3] to achieve a specific false discovery rate (FDR) [4]. PSMs above the cutoff can be either true positives or false positives, and PSMs below the cutoff can be either false negatives or true negatives. Most search engines use a protein database to define which proteins are expected in the sample, thereby reducing the search space significantly. De novo sequencing algorithms [5] and spectral library search engines [6] use no database or a spectral library instead of a protein database, respectively. We did not use these types of search engines in this study since they are used less often than search engines that rely on protein databases. Although database search engines use variations on the same principle, matching a measured to a theoretical spectrum, their respective search results differ even if the same data set is searched against the same sequence database [7]. Search engines provide different results because they generate different fractions of correct and incorrect PSM assignments. Alternatively, this is probably the result of search engines differing in the number and type of correct assignments they make. Determining the correctly and wrongly identified peptides in each data set is therefore important for evaluating the performance of a data analysis workflow.
Most workflows rely on either manual, expert inspection of the search results or on software tools to estimate the proportion of false identifications. The manual assessment of the quality of PSMs is error-prone, dependent on the level of experience of the evaluator, inconsistent between evaluators and time-consuming [8]. For computer-based assessments of the quality of PSMs, two principal strategies are applied. The first uses statistical models as exemplified by PeptideProphet in the Trans-Proteomic Pipeline (TPP) [7,9] or Percolator [10], and the second uses a target-decoy strategy [3]. PeptideProphet relies on mixture models to integrate different types of information, such as the distribution of search engine scores, the likelihood that assigned peptides are present in the sample or the score difference between the best and second-best assignment of a spectrum. The mixture models are used to convert this information into search engine-independent scores, reflecting the probability that a particular PSM has been correctly assigned [11]. The principle behind a target-decoy strategy is based on the calculation of FDRs using the decoy part of the search database [4] to estimate how many false assignments are expected among the hits in the target part of the database at a given score cutoff. The consistent determination of the FDR for different data sets, provided either by statistical models or by target-decoy strategies, is critical in making meaningful comparisons of different search engines and parameter sets [6,12–18]. To increase the fraction of correctly assigned spectra and to increase confidence in the reported results, the output of multiple search engines has been combined [19–22].
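The target-decoy principle described above can be illustrated with a minimal sketch. It assumes that each PSM is reduced to a score and a decoy flag and that the FDR at a cutoff is estimated as the ratio of decoy hits to target hits; the function names and the input layout are hypothetical and do not reproduce the tools used in this study.

```python
from typing import List, Optional, Tuple

# A PSM is represented here as (score, is_decoy); this layout is an assumption.
PSM = Tuple[float, bool]


def fdr_at_cutoff(psms: List[PSM], cutoff: float) -> float:
    """Estimate the FDR as decoy hits / target hits among PSMs scoring >= cutoff."""
    targets = sum(1 for score, is_decoy in psms if score >= cutoff and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= cutoff and is_decoy)
    if targets == 0:
        # No target hits: report 0 if nothing passed at all, otherwise 1.
        return 0.0 if decoys == 0 else 1.0
    return decoys / targets


def lowest_cutoff_for_fdr(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best = None
    for cutoff in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_cutoff(psms, cutoff) <= max_fdr:
            best = cutoff  # keep lowering the cutoff while the FDR is acceptable
    return best
```

Applied to a list of scored PSMs, `lowest_cutoff_for_fdr(psms, 0.01)` would give the most permissive cutoff that still satisfies a 1% FDR under this simple estimator.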
Although both methods for assessing the quality of search results are widely used, it still remains challenging to objectively evaluate the different analysis workflows, search tools and parameter sets and to prove that one performs better than another. The difficulties arise for two major reasons. The first is the absence of a complex sample set of known composition, although this difficulty is increasingly mitigated by the falling cost of synthetic peptides. The second is the inability to systematically assess the influence of various perturbations to the analysis workflow. Most studies that present a new workflow use specific biological samples [23], so-called spike-in samples [24] or, increasingly, complex synthetic samples [25,26] to evaluate its performance.
Using a biological sample such as a digested cell lysate has the advantage that peptide-to-spectrum matching is carried out under realistic conditions, i.e. on a sample that contains thousands of peptides covering a wide range of signal intensities. However, these studies are limited because the true peptide composition of such samples is unknown, partly due to the presence of peptides that have post-translational modifications (PTMs) or that are the result of non-specific and missed cleavages [27]. It is often not possible to control these biological events, which prevents a reliable comparison of the search results. PSMs cannot be categorized as correct or incorrect with confidence because there is no evidence that the matching peptide truly exists in the measured sample, and it is difficult to estimate how closely the generated set of identified peptides matches the maximally achievable set. An alternative to biological samples is the use of spike-in samples.
Spike-in samples usually consist of a mix of a few dozen purified recombinant proteins of known sequence and quantity that are digested with trypsin to generate a peptide mixture of known composition. Such samples are then analyzed either by themselves or in a complex background sample of unknown composition. The presence of PTMs is no longer a concern [28]. However, non-specific and missed cleavages, artifactual modifications generated during the sample processing and the presence of proteins introduced as minor contaminants of the purified reference proteins are still possible and, therefore, the peptide composition of such samples is still unknown, even if they are analyzed without added background. In addition, if analyzed without added background, the complexity of such samples does not match the complexity and intensity distribution of most biological samples. The spectra produced often contain fewer signals from peptides co-fragmented with the target peptides and, if these do occur, they show a lower signal intensity compared to that of the usual biological samples.
These factors affect the complexity of the spectrum pattern and lower the threshold at which targeted peptides are correctly matched using a search engine. The second problem, that is, the influence of perturbations (such as variations in the parameter set) on the search results, is often not systematically addressed in the related studies because most data analysis workflows are not automated to a level where many different search parameter sets can be easily tested and compared.
Here we present a study on systematically varying parameters and search engines, in which we investigate the impact these have on the sensitivity of the analysis of a data set generated from a complex synthetic sample of more than 20,100 peptides, previously observed by MS in human samples. The complexity of the synthetic sample is sufficient for a realistic test and allowed us to accurately estimate both the sensitivity and specificity of the search results. The sample does not, however, mimic biological samples in terms of dynamic range. We propose a strategy to find optimal search parameters and present detailed information on how the various parameters influence the results. The peak list files and the identification results are publicly available in the PeptideAtlas repository (https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/PASS_View?identifier=PASS00090).
2. Methods
2.1. Preparation of the benchmark sample
20,103 unique peptides were synthesized by JPT Peptide Technologies GmbH using SPOT-synthesis [28] (suppl. synthetic peptide sequences.txt), and crude synthesis products were aliquoted at a concentration of approximately 60 nmol/l per peptide in 96-well microtiter plates. 5 µl from each well was used to create intermediate pools that were subsequently used to create a pool of all peptides, each at an estimated final concentration of about 3 pmol/l per peptide.
2.2. Mass spectrometry
The synthetic peptide pool was measured on two liquid chromatography–tandem mass spectrometers (LC–MS/MS), an LTQ-FT Ultra (Thermo Fisher Scientific) coupled to a Tempo NanoLC (Applied Biosystems) and an LTQ-Orbitrap XL (Thermo Fisher Scientific) with a 1D-NanoLC-Ultra system (Eksigent). Both systems were equipped with a standard nanoelectrospray source and the chromatographic separation was performed with the same buffer system: 97% water, 3% acetonitrile and 0.1% formic acid constituted mobile phase A, while mobile phase B comprised 3% water, 97% acetonitrile and 0.1% formic acid. For each LC–MS/MS run, 2 µl of the peptide pool were injected onto an 11 cm × 0.075 mm I.D. column packed in-house with Magic C18 material (3 µm particle size, 200 Å pore size, Michrom Bioresources). The peptides were loaded onto the LC column at a flow rate of 300 nl/min and eluted with either of the two following gradients. Gradient #1: 0–5 min = 5% phase B solution, 5–95 min = linear gradient from 5–35% phase B solution, 95–97 min = linear gradient from 35–95% phase B solution and 97–107 min = 95% phase B solution. Gradient #2: 0–5 min = 5% phase B solution, 5–125 min = linear gradient from 5–35% phase B solution, 125–127 min = linear gradient from 35–95% phase B solution and 127–137 min = 95% phase B solution. The ion source and transmission settings for both mass spectrometers were: spray voltage = 2 kV with capillary temperature = 200 °C, capillary voltage = 60 V and tube lens voltage = 135 V. All measurements of the synthetic peptide mix, on both the FT and Orbitrap instruments, were acquired in data-dependent mode, selecting up to five precursors from an MS1 scan (resolution for the FT: 100,000; resolution for the Orbitrap: 60,000) in a range of 350–1600 m/z for collision-induced dissociation (CID). The ion target values were 1,000,000 (or maximum 500 ms fill time) for full scans and 10,000 (or maximum 200 ms fill time (Orbitrap) and 300 ms (FT)) for fragment ion scans, respectively. Ions with a single or unknown charge state were automatically rejected. The synthetic peptide mix was measured in duplicate on each mass spectrometer, using each of the two specified gradients, resulting in a total of eight LC–MS/MS data sets. The mass spectrometers were equilibrated using a standard mixture before each sample injection.
2.3. Data analysis
The data analysis workflows rely on searching the acquired fragment ion spectra against a protein sequence database. We therefore derived a human subset from UniProtKB/Swiss-Prot, version 57.1 [29], performing the following steps. Firstly, the complete UniProtKB/Swiss-Prot database was converted from its original DAT format into a FASTA file including all known splice variants and isoforms. For this process, we used uniprotdat2fasta.pl, which is part of InSilicoSpectro, a bioinformatics tools collection used for mass spectrometry [30]. Secondly, the subset of human protein sequences was extracted with subsetdb, which is part of the TPP [7]. In the final step, a target-decoy database with reverse decoys was generated using fasta-decoy.pl, another tool included in InSilicoSpectro. After generating the database, the data acquired from the mass spectrometers were converted into peak lists in the mzXML format [31]. This process was accomplished using ReAdW.exe, which is part of the TPP. Next, we created seven workflows in order to search these peak lists against the previously generated database. For three workflows this was done using a single database search engine, specifically Mascot [16] (version 2.3) (M), OMSSA [18] (version 2.1.7) (O) and X!Tandem [17] (version 2009.04.01, k-score plugin) (X); for a further three workflows two database search engines were combined, specifically Mascot-OMSSA (MO), Mascot-X!Tandem (MX) and X!Tandem-OMSSA (XO); and for the final workflow the three database search engines tested were combined (MXO). In each workflow, the search engine output was converted into the pepXML format and scored using PeptideProphet [7]. The final score distribution model was calculated with iProphet [20] and the result converted from its pepXML format into a simple text format (CSV) using pepxml2csv [32]. In addition to the iProphet score, we applied the target-decoy strategy and calculated the corresponding FDR values with fdr2probability [32]. The resulting data matrix was then imported into a MySQL database [33] to evaluate the performance of each data analysis based on the number of uniquely identified peptides (UIPs) that matched a synthetic peptide sequence at 1% FDR.
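The idea of a target-decoy database with reverse decoys can be sketched as follows. This is an illustration only and not the fasta-decoy.pl implementation: the `DECOY_` accession prefix and the function name are assumptions introduced for the example, and protein entries are assumed to be available as a mapping from accession to sequence.

```python
from typing import Dict, List, Tuple


def add_reverse_decoys(proteins: Dict[str, str],
                       prefix: str = "DECOY_") -> List[Tuple[str, str]]:
    """Return the target entries followed by one reversed decoy per target.

    The prefix used to mark decoy accessions is hypothetical.
    """
    targets = list(proteins.items())
    # Each decoy is the full protein sequence read backwards.
    decoys = [(prefix + accession, sequence[::-1]) for accession, sequence in targets]
    return targets + decoys


def to_fasta(entries: List[Tuple[str, str]]) -> str:
    """Serialize (accession, sequence) pairs as minimal FASTA text."""
    return "".join(f">{accession}\n{sequence}\n" for accession, sequence in entries)
```

Searching against the concatenated entries lets decoy hits serve as an estimate of the number of false target hits at a given score cutoff.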
Each of the seven workflows was tested with 54 different parameter settings to investigate the influence of the following search parameters: the precursor mass error (PME), the fragment mass error (FME), the number of allowed missed cleavages (MC) and the different scoring models of PeptideProphet. Values for the PME were set to 25 ppm, 15 ppm and 5 ppm. We chose values of 0.8 Da, 0.6 Da and 0.4 Da for the FME (Fig. 1). We also allowed a maximum of one or two missed cleavage events and defined carbamidomethylation of cysteine as a fixed modification. For PeptideProphet, the following statistical models were tested: the classical parametric model (CPM), the parametric model (PM) with correction of the negative distribution based on decoy hits (PMC) and the semi-parametric model (SPM). In total, 756 data analyses were produced in order to study the performance variation of single and multi-search engine workflows induced by changing parameter settings (suppl. Fig. S1) (Supplementary figures are available free of charge via the Internet at http://pubs.acs.org).
Fig. 1 – Overview of the experimental design. The diagram shows the steps that were systematically varied. The spectral pre-processing was only carried out on the optimal combination of search engines.
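The size of the parameter grid can be checked with a short enumeration. The sketch below assumes that the 54 settings are the full cross product of three PME values, three FME values, two missed-cleavage limits and three PeptideProphet models, and that the 756 analyses arise from running the seven workflows on data from both instruments; this factorization is inferred from the numbers given in the text, and the variable names are hypothetical.

```python
from itertools import product

PME_PPM = (25, 15, 5)              # precursor mass error in ppm
FME_DA = (0.8, 0.6, 0.4)           # fragment mass error in Da
MISSED_CLEAVAGES = (1, 2)          # maximum number of missed cleavages
PROPHET_MODELS = ("CPM", "PMC", "SPM")
WORKFLOWS = ("M", "O", "X", "MO", "MX", "XO", "MXO")
INSTRUMENTS = ("LTQ-FT Ultra", "LTQ-Orbitrap XL")

# Every combination of the four search parameters.
parameter_sets = list(product(PME_PPM, FME_DA, MISSED_CLEAVAGES, PROPHET_MODELS))

# Every workflow is run with every parameter set on data from each instrument.
analyses = list(product(WORKFLOWS, INSTRUMENTS, parameter_sets))

print(len(parameter_sets), len(analyses))
```

Under these assumptions the enumeration yields 54 parameter sets and 756 analyses, matching the totals reported above.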
In order to investigate peak list pre-processing, we studied the effects of the following filters: MS2Deisotope (I), which deisotopes; MS2Denoise (N), which denoises; and precursorRefine (R), which refines the precursor ion mass; all operating on MS2 spectra. All three were implemented in msconvert (revision 2238), which forms part of ProteoWizard [34]. Since these filters have to be applied in the process of converting the original RAW format to a peak list file, we used msconvert instead of ReAdW.exe for this investigation. Based on the original instrument files, we generated peak list files in eight different filter combinations: one using no filter, three using a single filter, three where two filters were combined and one where all three filters were combined. The peak list files of each filter combination were tested with the MXO-workflow using the same sequence database and the same 54 parameter settings as were used for the previous data analyses, plus an additional set of parameter combinations (suppl. Figs. S2 and S10). In total, 1074 searches were performed to investigate the effect of pre-processing the peak lists on workflow performance.
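The eight filter combinations correspond to all subsets of the three filters. A minimal sketch of this enumeration is given below; it only lists the combinations by their one-letter codes and does not invoke msconvert, and the function name is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

FILTERS = ("N", "I", "R")  # MS2Denoise, MS2Deisotope, precursorRefine


def filter_combinations(filters: Tuple[str, ...] = FILTERS) -> List[Tuple[str, ...]]:
    """List every subset of the filters, from no filter up to all filters."""
    combos: List[Tuple[str, ...]] = []
    for size in range(len(filters) + 1):
        combos.extend(combinations(filters, size))
    return combos
```

For three filters this gives one empty combination, three single filters, three pairs and one triple.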
Detection of peptide synthesis by-products was carried out by generating a database with all permutations of single amino acids missing. This database was then used as described above.
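A by-product database of this kind can be sketched by deleting residues from each target sequence. The example below is an assumption about how such variants might be enumerated, not the procedure used in the study; the function name and the `gaps` parameter are hypothetical.

```python
from itertools import combinations
from typing import Set


def gap_variants(peptide: str, gaps: int = 1) -> Set[str]:
    """Return all distinct sequences obtained by deleting `gaps` residues."""
    variants = set()
    for positions in combinations(range(len(peptide)), gaps):
        removed = set(positions)
        # Rebuild the sequence without the residues at the chosen positions.
        variants.add("".join(aa for i, aa in enumerate(peptide) if i not in removed))
    return variants
```

Calling `gap_variants(sequence, 1)` for every synthetic peptide would give the one-gap entries; `gaps=2` gives the two-gap by-products discussed in the results. Deleting either residue of a repeated pair yields the same sequence, which is why a set is used.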
OpenMS MS1 feature detection and peptide quantification were carried out as described by Weisser et al. [35].
All the data analyses presented in this study were automated and executed using the workflow system P-GRADE [36].
3. Results
The benchmark study we present was performed in three steps. Firstly, we explored the influence of critical search parameters on the performance of individual search engines. Secondly, we tested how the combinations of the results from different search engines affected the identification performance. Thirdly, we measured the influence of pre-processing the peak lists on search performance. The results of the analyses we performed were compared based on the number of true positive UIPs at 1% FDR. A schematic of the workflow used to carry out all three steps is presented in Fig. 1 and a summary of all the results is given in Table 1.
3.1. Single search engine performance
We generated three workflows, each incorporating a single search engine: one based on Mascot (M), one on OMSSA (O) and one on X!Tandem (X). The X!Tandem k-score plug-in was used in combination with X!Tandem throughout. We assessed the performance of each workflow by changing parameters such as the instrument type and the mass errors for the fragment ion (FME) and for the parent ion (PME; see Fig. 1).
Across all the parameter settings tested, the number of identified peptides was consistently higher for data acquired on the Orbitrap instrument compared to data from the FT (suppl. Figs. S3–5). This observation should not be interpreted as one type of instrument being superior to another; rather, it simply means that one instrument was better optimized than the other at the time of acquisition. The other parameters that had a major influence on search performance were the mass errors FME and PME. The results indicated that lowering the FME increased the search performance, but that reducing the PME below 15 ppm caused a loss of performance (suppl. Figs. S6 and S7). The data for the single database search engine workflows are shown in Fig. 2. Each box plot in Fig. 2 consists of 54 independent search results, one per parameter set. The figure shows the number of correct UIPs. Two out of the 756 data analyses failed because the qualitative requirements for modeling the negative peptide distribution in PeptideProphet were not fulfilled.
The best performance among the single search engine workflows was achieved with OMSSA (Table 1). A total of 6489 correct UIPs were found when a small FME of 0.4 Da and a moderate PME of 15 ppm were used. Non-optimal parameter settings, such as larger mass errors (FME: 0.8 Da, PME: 25 ppm), caused a drop in the number of correct UIPs to 614, the lowest identification rate in this study. A smaller parent mass error was also sub-optimal, as using a PME of 5 ppm resulted in 5900 correct UIPs, compared to the 6489 identifications at 15 ppm PME. Another parameter with a noticeable impact on performance was the type of mass spectrometer used. Applying the same mass error settings resulted in a drop from 6489 correct UIPs (Orbitrap) to 6097 correct UIPs (FT).
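Because the PME is given in ppm while the FME is given in Da, it can help to see what a ppm tolerance means in absolute terms. The sketch below assumes that the ppm tolerance is applied relative to the theoretical m/z value, which is a common convention but is not stated in the text; the function names are hypothetical.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Convert a relative tolerance in ppm into an absolute window at a given m/z."""
    return mz * ppm * 1e-6


def within_ppm(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """Check whether an observed m/z lies within tol_ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) <= ppm_to_da(theoretical_mz, tol_ppm)
```

At m/z 1000, a PME of 15 ppm corresponds to a window of about ±0.015, so a precursor measured 0.01 away from its theoretical value would pass at 15 ppm but fail at 5 ppm.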
The Mascot (M) workflow identified a maximum of 6401 correct UIPs (Table 1) over all parameters tested. In contrast to OMSSA, Mascot performed better with a less strict PME of 25 ppm in conjunction with a small FME of 0.4 Da, although the performance difference between a PME of 25 ppm and 15 ppm was insignificant, with 6401 compared to 6391 correct UIPs, respectively. Other parameters had a smaller effect on the performance.
The third search engine we tested was X!Tandem. Compared to the O- and M-workflows, X!Tandem was more robust to changes in the mass errors FME and PME. In particular, varying the FME did not have a significant impact. For example, the maximal performance of 6219 correct UIPs was achieved when applying a PME of 15 ppm, but did not change when the FME was varied between 0.4, 0.6 and 0.8 Da. The same finding applied to data analyses with a PME of 25 ppm and 5 ppm, which resulted in 6170 and 5856 correct UIPs, respectively, and these were unaffected by variation of the FME within the range tested (Table 1).
3.2. Performance of multiple search engine searches
We combined the output of multiple search engines to investigate if the combined output could improve search performance. All two-way combinations (MO, MX, XO) and the combination of all three (MXO) were tested using the same parameter settings used for the single search engines. The search engine results were combined using iProphet [20]. The results are presented in Fig. 2 and Table 1. The data showed an improved performance for multi-search engine workflows compared to single-engine workflows.

Table 1 – Performance of the various tool combinations. Each tool combination was tested using several parameter sets.

| Search engines | Abbreviation | Max/Min of correct UIPs | Max of correct UIPs (%) | Most influential parameter | Best parameters: PME (ppm)/FME (Da)/MC/PM | Note |
| OMSSA | O | 6489/614 | 32.28 | FME | 15/0.4/1/SPM | Best single engine |
| MASCOT | M | 6401/5330 | 31.84 | PME/FME | 25/0.4/1/SP | Most robust engine w.r.t. MS |
| X!Tandem | X | 6219/5244 | 30.94 | MS | 15/0.4–0.8/1/CPM | Most robust engine w.r.t. PME/FME |
| Mascot/OMSSA | MO | 6674/5240 | 33.20 | PME/FME | 15/0.4/1/CPM | |
| Mascot/X!Tandem | MX | 6595/5330 | 32.81 | PME/FME | 25/0.4/1/PMC | |
| X!Tandem/OMSSA | XO | 6769/5510 | 33.67 | PME/FME | 15/0.4/1/CPM | |
| Mascot/X!Tandem/OMSSA | MXO | 6814/5846 | 33.90 | PME/FME | 25/0.4/1/CPM | Best search combination |
| Denoise | N | 6807/5890 | 33.86 | PME/FME | 15/0.4/2/CPM | |
| Deisotope | I | 6802/5844 | 33.84 | PME/FME | 15/0.4/1/CPM | |
| Refine | R | 6910/5821 | 34.37 | PME/FME | 5/0.4/2/PMC | Effective for LTQ-Orbitrap |
| Denoise/Deisotope | NI | 6828/5919 | 33.97 | PME/FME | 15/0.4–0.6/1/CPM | |
| Refine/Denoise | NR | 6910/5916 | 34.37 | PME/FME | 5/0.4/2/PMC | Best preprocessing for FT |
| Refine/Deisotope | IR | 6909/5869 | 34.37 | PME/FME | 5/0.4/1/PMC | |
| Refine/Denoise/Deisotope | NIR | 6938/5945 | 34.51 | MS | 5/0.4–0.6/1/CPM | Best preprocessing for LTQ-Orbitrap |

Fig. 2 – The number of correctly identified peptides per workflow is shown in a box plot representation. A total of 54 parameter sets and two mass spectrometer types were used for each workflow. Different single database search engines were used, Mascot (M), OMSSA (O) or X!Tandem (X), in addition to combinations of two or three search engines: Mascot-OMSSA (MO), Mascot-X!Tandem (MX), X!Tandem-OMSSA (XO) and Mascot-X!Tandem-OMSSA (MXO). The upper whisker indicates the number of peptides identified using an optimal parameter set and the red line marks the mean number of peptides identified for the parameter sets tested within a workflow. The box itself circumscribes the search results between the first and the third quartile. The larger the spread, the more sensitive the search was to the parameters. The green daggers mark measurements which are outside of the range between the first and third quartile. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
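The percentage column of Table 1 appears to express the maximal number of correct UIPs relative to the 20,103 peptides in the synthetic pool. The short sketch below reproduces that calculation under this assumption; the constant and function names are hypothetical.

```python
TOTAL_SYNTHETIC_PEPTIDES = 20103  # size of the synthetic peptide pool


def percent_identified(correct_uips: int,
                       total: int = TOTAL_SYNTHETIC_PEPTIDES) -> float:
    """Express the number of correct UIPs as a percentage of the pool size."""
    return round(100.0 * correct_uips / total, 2)
```

With this definition, the OMSSA maximum of 6489 correct UIPs corresponds to 32.28% and the NIR maximum of 6938 to 34.51%, in agreement with the table.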
The effects of combining search results are apparent from the XO-workflow, which combined the fairly robust X!Tandem engine with the more sensitive OMSSA tool. The combined workflow outperformed the single engine results of OMSSA (maximal 6489 correct UIPs) and X!Tandem (maximal 6219 correct UIPs), with a maximum of 6769 correct UIPs under optimal search parameters (PME = 15 ppm and FME = 0.4 Da). The XO-workflow achieved 5510 correct UIPs using the least optimal parameters tested, which was better than both the X-workflow with 5254 correct UIPs and the O-workflow with 614 correct UIPs (Table 1). In the XO-workflow, X!Tandem largely compensated for the poor performance of OMSSA in cases in which sub-optimal parameters were being used, and the performance increased above the level of a single search engine when optimal parameter settings were applied. Similar trends were observed for the two other two-engine combinations.
We also tested the combination of the three database search engines in a single workflow (MXO), which resulted in 6814 correct UIPs, the highest identification rate of the seven workflows, if a moderate PME of 15 ppm and a small FME of 0.4 Da were used (Table 1). Additionally, with sub-optimal search parameter settings, such as a PME of 5 ppm and a FME of 0.8 Da, a minimum of 5846 correct UIPs was obtained. This is significantly higher than for the other workflows. The resulting spread of 14.2% was the lowest of all workflows and therefore indicated that the MXO-workflow was the least dependent on the search parameter settings. Fig. S12 displays pseudo-receiver operating characteristic curves for the optimal parameter settings for each of the seven search engine combinations. Fig. S11 shows a Venn diagram comparing the results of the three individual search engines with the MXO-workflow.
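The spread quoted for the MXO-workflow is consistent with the difference between the maximal and minimal number of correct UIPs, expressed relative to the maximum. The sketch below assumes this definition, which is inferred from the reported numbers rather than stated in the text; the function name is hypothetical.

```python
def relative_spread(max_uips: int, min_uips: int) -> float:
    """Spread between best and worst result as a percentage of the best result."""
    return 100.0 * (max_uips - min_uips) / max_uips
```

For MXO, `relative_spread(6814, 5846)` gives approximately 14.2%, whereas the same calculation for OMSSA alone (6489 and 614 correct UIPs) gives a spread above 90%, which illustrates its sensitivity to the parameter settings.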
3.3. Effect of peak list pre-processing
We investigated three types of peak list pre-processing: deisotoping (filter = MS2Deisotope), denoising (filter = MS2Denoise) and refining the mass of the precursor ion (filter = precursorRefine). We tested the performance of these filters using a single filter, the combination of two filters and the combination of all three filters, and compared these results to results generated without a filter. The data were processed with the MXO-workflow and the same 54 parameter settings we used previously (suppl. Figs. S1 and S2) plus additional parameter sets investigating narrower parent and fragment mass errors (suppl. Fig. S10). The results of the pre-processing benchmark are presented in Fig. 3.
Pre-processing with MS2Denoise and MS2Deisotope showed no performance improvement compared to the ‘no filter’ setting. This is supported by the 6807 and 6802 correct UIPs achieved at PME = 15 ppm and FME = 0.4 Da, respectively, in comparison with the 6805 correct UIPs obtained with no filter. However, the data indicated that both filters partly compensated for sub-optimal settings. The minimal number of correct UIPs increased if a PME of 5 ppm with a FME of 0.4 Da was used (for no filter: n = 5812, for MS2Denoise: n = 5890 and for MS2Deisotope: n = 5844) (Fig. 3, Table 1). The same was observed if both filters were combined: the maximal identification rate of 6828 correct UIPs (PME = 15 ppm, FME = 0.6 Da) was no improvement compared to the ‘no filter’ settings. In contrast, the lowest identification rate, for which a PME of 5 ppm and a FME of 0.4 Da was applied, increased from 5812 correct UIPs (no filter) to 5919 correct UIPs (MS2Deisotope + MS2Denoise).
The use of precursorRefine had a noticeable impact on the performance (Fig. 3, suppl. Fig. S8), as the maximal number of correct UIPs increased from 6805 (no filter) to 6910 (precursorRefine) (Table 1). Moreover, the top identification rate was achieved when small mass errors were used (PME = 5 ppm, FME = 0.4 Da, instrument = Orbitrap). The lowest identification rate, 5821 correct UIPs, was also obtained with small mass errors (PME = 5 ppm, FME = 0.6 Da, instrument = FT). For data from the Orbitrap, identification rates increased and the impact of parameter settings was reduced. Specifically, the range was reduced by over 50%, to 3.2%, compared with the wider range of 8.13% for the ‘no filter’ settings. We did not observe the same positive effect for data obtained from the FT, where the results were more comparable to those obtained when no filters were used (suppl. Fig. S9).
We tested the combination of all three pre-processing filters and this combination performed best, with 6938 correct UIPs, when using PME = 5 ppm and FME = 0.4 Da (Table 1). Using smaller mass errors did not lead to further improvements.
3.4. Benchmark sample
We pooled 20,103 synthetic peptides to create the test sample (see Section 2). Analyzing the sample on the FT and on the Orbitrap generated about 11,500 spectra per file for a 90-min gradient and around 13,000 spectra per file for a 120-min gradient, respectively (Table 2).
The use of a synthetic pool allowed us to identify assignments above the chosen cutoff as correct if they matched the synthetic peptide sequences. However, the type of synthetic peptides we used limited us in drawing further conclusions about potentially incorrect PSMs. Although we could certify a PSM as being correct, it was not possible to reject PSMs as truly false when they did not match a synthetic peptide sequence. Each of the 20,103 crude synthetic peptides that we pooled in order to generate our sample data set was produced using SPOT synthesis, which means that traces of by-products could also be found in the pool. By-products are dominated by peptide sequences which have one or more gaps (missing amino acids) at certain positions in the targeted peptide sequence [37]. The quantity of these by-products is usually significantly lower than the amount of targeted peptide sequences [37]. This does not mean, however, that these peptide variants cannot be detected by mass spectrometry. Depending on the workflow, up to 9% of all uniquely identified peptides (UIPs) at a 1% FDR level matched one of these by-products (up to 6.15% matched a one-gap by-product and up to 2.7% matched a two-gap by-product). We performed two label-free quantification experiments using OpenMS [38], one with the default parameters and one with relaxed quality criteria for feature detection. In both cases, we detected features for the original synthetic peptides (default parameters: n = 5843; less restrictive parameters: n = 6000) and by-products with up to two gaps (default parameters: n = 6027; less restrictive parameters: n = 6188).

Fig. 3 – Box plots of the search results (UIPs) for different pre-filtering options. Each box plot contains the results from 108 database searches (54 different parameter settings, each for data files acquired with two different mass spectrometers). Pre-filtering tools were MS2Denoising (N), MS2Deisotoping (I) and precursorRefine (R), which were applied either individually or in various combinations (NI, NR, IR, NIR). The green daggers represent data points outside the range between the first and third quartile. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
4. Discussion
Here, we present a large-scale study investigating strategies to improve MS/MS peptide identification and to make database searches more robust. A complex pool of synthetic peptides was created in order to comprehensively benchmark the different analysis workflows based on the number of correctly identified peptide sequences from this well characterized, but complex, sample. The use of synthetic peptides as a reference sample addressed issues of other sample types, such as complexity and certainty about the true positive matches. This allowed an improved interpretation of the benchmark results that was independent of estimations based on decoy databases. We investigated the impact of 108 different search parameter settings on the identification performance of three search engines, Mascot, X!Tandem and OMSSA, the effect of combining multiple search engines in a single approach and the processing of ion mass spectra prior to database searching, totaling around 1800 distinct combinations.
Firstly, we tested and optimized the parameter settings for each search engine individually. Overall, the data showed that Mascot, X!Tandem and OMSSA performed at a comparable level if optimal search parameters were used. However, there were distinct performance differences using other parameter sets. OMSSA achieved the best performance of the search engines tested but was most sensitive to the parameter settings. Mascot was more robust in the face of parameter changes but was negatively affected when a small PME was selected and did not benefit to the same extent from the increased sensitivity of an Orbitrap compared to an FT. X!Tandem benefited most from the increased instrument performance and was least affected by parameter changes. It did, however, identify fewer correct UIPs under optimal conditions than the other search engines. This is likely a result of X!Tandem's algorithm, which benefits from finding more than one peptide from each protein. Generally, the choice of optimal parameter settings was more important and had a larger influence on the number of correctly identified PSMs than did the choice of the search engine itself.

Table 2 – Sample properties for measurements of the synthetic peptide data set.

| Measurement | Mass spectrometer | LC gradient (min) | MS1 scans | MS2 scans |
| 1 | LTQ-FT Ultra | 90 | 4899 | 11,491 |
| 2 | LTQ-FT Ultra | 90 | 4890 | 11,752 |
| 3 | LTQ-FT Ultra | 120 | 6570 | 12,963 |
| 4 | LTQ-FT Ultra | 120 | 6509 | 12,760 |
| 5 | LTQ-Orbitrap XL | 90 | 4036 | 11,445 |
| 6 | LTQ-Orbitrap XL | 90 | 4040 | 11,244 |
| 7 | LTQ-Orbitrap XL | 120 | 5228 | 13,819 |
| 8 | LTQ-Orbitrap XL | 120 | 5206 | 13,795 |
We continued our benchmark study by testing combinations of the different search engines in a single workflow. Here, the data supported two main conclusions. Firstly, integrating the results from multiple search engines using iProphet reduced the impact of sub-optimal parameter settings and, secondly, it helped to increase the number of UIPs identified. The MXO-combination performed better than the other workflows tested, independently of the applied parameter sets. However, the data also indicated that performance does not increase linearly with the number of search engines added to a workflow. Under optimal conditions, the XO-workflow performed almost as well as the MXO-workflow, suggesting that these combinations, used optimally, identify close to the maximal number of detectable UIPs in a dataset using the current technology. In general, we observed that each workflow reached maximum identification rates when a PME of 15–25 ppm was used in conjunction with a small FME of 0.4 Da. Lowering the PME, i.e. to 5 ppm, led to a decrease in the number of correct identifications for all the workflows tested.
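To make the two tolerance types concrete, the following minimal sketch shows how a relative parent mass error (in ppm) and an absolute fragment mass error (in Da) translate into accept/reject decisions. The function names and the example masses are illustrative assumptions and are not taken from any of the search engines discussed here.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_pme(observed_mass: float, theoretical_mass: float,
               tolerance_ppm: float) -> bool:
    """True if the precursor mass lies within a relative tolerance (ppm)."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm


def within_fme(observed_mz: float, theoretical_mz: float,
               tolerance_da: float) -> bool:
    """True if a fragment peak lies within an absolute tolerance (Da)."""
    return abs(observed_mz - theoretical_mz) <= tolerance_da


# Hypothetical precursor measured 10 ppm away from its theoretical mass:
# it is accepted with a 15 ppm window but rejected with a 5 ppm window.
accepted_wide = within_pme(1000.010, 1000.000, tolerance_ppm=15)
accepted_narrow = within_pme(1000.010, 1000.000, tolerance_ppm=5)
```

A precursor whose measured mass deviates by 10 ppm is therefore lost as soon as the window is narrowed to 5 ppm, which is one simple way in which a smaller PME can reduce the number of correct identifications.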
After identifying the optimal combination of search engines, we investigated the potential benefit of pre-processing peak lists prior to their processing with search engines. We postulated that pre-processing of ion mass spectra leads to further improvements in the identification performance while significantly decreasing the impact of any sub-optimal parameter settings. Data from the LTQ-FT Ultra were best analyzed with a combination of MS2Denoise and MS2Deisotope in conjunction with a moderate PME of 15 ppm and a FME of 0.6 Da. The optimal pre-processing strategy for data obtained from the Orbitrap was to apply all three filters, namely, MS2Denoise, MS2Deisotope and precursorRefine, together with a small PME of 5 ppm and a FME of 0.4 or 0.6 Da. Lowering the errors for the parent mass or the fragment mass did not further increase the performance.
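As an illustration of what a denoising step on an MS2 peak list can look like, the sketch below keeps only the most intense peaks within fixed m/z windows. It is a simplified stand-in written for this explanation and is not the msconvert implementation of MS2Denoise; the window width and the number of peaks kept per window are hypothetical defaults.

```python
from collections import defaultdict


def keep_top_peaks_per_window(peaks, window_da=30.0, peaks_per_window=6):
    """Keep the most intense peaks in each fixed m/z window.

    peaks: list of (mz, intensity) tuples.
    Returns the retained peaks sorted by m/z.
    Note: fixed, non-overlapping windows are an assumption of this sketch.
    """
    windows = defaultdict(list)
    for mz, intensity in peaks:
        # Assign each peak to a window by integer division of its m/z.
        windows[int(mz // window_da)].append((mz, intensity))

    kept = []
    for window_peaks in windows.values():
        # Rank peaks within the window by intensity, highest first.
        window_peaks.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(window_peaks[:peaks_per_window])
    return sorted(kept)


# Hypothetical spectrum: three peaks fall into one window, one into the next.
spectrum = [(100.0, 5.0), (101.0, 50.0), (102.0, 1.0), (140.0, 3.0)]
filtered = keep_top_peaks_per_window(spectrum, window_da=30.0, peaks_per_window=2)
```

Removing low-intensity peaks in this way reduces the number of random fragment matches a search engine has to score, which is the general motivation for filtering peak lists before the database search.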
The peptides used in this study are the so-called crude peptides, in which some peptide-based by-products are present at a high enough concentration to be detectable by MS. X!Tandem detected a higher percentage of by-products than did Mascot or OMSSA because its scoring algorithm does not rely on information about the peak intensity of the precursor ions. Therefore, spectra with weaker signals (as expected for by-products) are not discarded and are therefore reported as PSMs. That these PSMs are not false positive assignments is supported by two facts: Firstly, the other two identification tools, Mascot and OMSSA, also detected some of the by-products at 1% FDR on the PSM-level and, secondly, we were able to quantify some of these by-products by label-free quantification with OpenMS. Although we demonstrated that a small proportion of the synthesis by-products is detectable and is probably correctly assigned, we could not determine whether these matches were truly correct. This is why we only considered PSMs to be correct when they matched one of the 20,103 synthetic peptide sequences.
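The correctness criterion described above can be sketched as a simple set lookup. The snippet is a minimal illustration under stated assumptions: the input format (pairs of spectrum identifier and peptide sequence), the function name and the example sequences are hypothetical and do not reproduce the post-processing tools used in the study.

```python
def summarize_identifications(psms, synthetic_peptides):
    """Count correct PSMs and uniquely identified peptides (UIPs).

    psms: iterable of (spectrum_id, peptide_sequence) pairs.
    synthetic_peptides: set of sequences known to be in the sample.
    A PSM counts as correct only if its sequence is in that set.
    """
    correct = [(spec, pep) for spec, pep in psms if pep in synthetic_peptides]
    unmatched = [(spec, pep) for spec, pep in psms if pep not in synthetic_peptides]
    return {
        "correct_psms": len(correct),
        "uips": len({pep for _, pep in correct}),
        "unmatched_psms": len(unmatched),
    }


# Hypothetical example: two spectra match the same synthetic peptide,
# one matches a second peptide, one matches a sequence not in the pool.
known = {"PEPTIDEK", "SAMPLER"}
reported = [(1, "PEPTIDEK"), (2, "PEPTIDEK"), (3, "SAMPLER"), (4, "BYPRODUCTK")]
summary = summarize_identifications(reported, known)
```

Under this conservative rule, a PSM to a synthesis by-product is counted as unmatched even if the assignment itself is plausible.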
Several recent publications have demonstrated that the results from pure decoy databases and the use of entrapment databases differ [39,40]. As it is still unclear how best to select the entrapment database, both in terms of the number of sequences and of which organisms are sufficiently diverged from the organism under study, we have refrained from using entrapment databases and opted to use the more traditional reverse decoy database [3]. It is also possible to use much wider parent mass windows and instead rely on filtering the results [41]. This option was not explored. Some parameters had no impact on the results, as exemplified by the PeptideProphet models, despite previous reports on the topic [42]. It is still unclear why our results are not in line with the literature, and this should be explored further. Both mass spectrometers used in this study have relatively similar specifications and it is unclear how the findings here would hold for other types of instruments [43].
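For readers unfamiliar with the reverse decoy approach, the following minimal sketch shows how a decoy sequence can be generated and how a score cutoff can be chosen for a given FDR. The estimate used here (decoy hits divided by target hits among the accepted PSMs) is one common simplification and is an assumption of this sketch; the function names and example scores are hypothetical, and tied scores are not treated specially.

```python
def reverse_decoy(sequence: str) -> str:
    """Create a decoy entry by reversing a target protein sequence."""
    return sequence[::-1]


def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose accepted set meets max_fdr.

    psms: list of (score, is_decoy) pairs; a higher score is better.
    FDR is estimated as decoys / targets among PSMs at or above the cutoff.
    Returns None if no cutoff satisfies the requested FDR.
    """
    threshold = None
    targets = 0
    decoys = 0
    for score, is_decoy in sorted(psms, key=lambda psm: psm[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            threshold = score
    return threshold


# Hypothetical scores: three target hits and one decoy hit.
example_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
strict_cutoff = score_threshold_at_fdr(example_psms, max_fdr=0.0)
relaxed_cutoff = score_threshold_at_fdr(example_psms, max_fdr=0.5)
```

Because the synthetic peptide sequences are known in this benchmark, such decoy-based estimates can be compared against the number of PSMs that actually match the sample content.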
In summary, the results of our benchmark study showed that the correct choice of parameter settings has a large influence on the identification performance of the search engines we evaluated. The search parameter combinations tested led to identification rates of between 36% and 93% on the PSM-level and between 14% and 90% on the UIP-level. We assessed the influence exerted by each of the parameters and found that the mass spectrometer type and the allowed mass errors (PME, FME) have the largest impact on variations in the results. There are other untested parameters, such as the size of the protein database, which can influence the result. In general, the search engines performed better when the mass spectrometer had a higher sensitivity, such as the Orbitrap, and when mass errors of 15 ppm (PME) and 0.4 Da (FME) were used. Reducing both mass errors further, especially the PME, led to a decreased identification rate. Other search parameters, such as the number of allowed missed cleavages and the statistical models of PeptideProphet, only had a minimal impact on improving the results from the database search engines. These results demonstrated that using advanced identification workflows was the key to successfully improving the understanding of the measured data while keeping the false-positive hits to a minimum.
Conflict of interest
The authors declare no competing financial interests.
Transparency document
The Transparency document associated with this article can be found in the online version.
Authors’ contribution
AQ and LM conceived the project and wrote the initial manuscript. AQ undertook the experiment planning, sample preparation, workflow definition and data analysis. LE wrote prototypes of the nodes used in the tested workflows. LE also wrote the post-processing tools applied in the pipeline. AB implemented the concept of super-workflows into P-GRADE, which was used to perform the data analyses used in this study. HW prepared the OpenMS processing workflow used for the label-free quantification of the benchmark sample. MB provided early stage access to the pre-processing algorithms in msconvert and wrote parts of the Methods section explaining these filters. PK supported the project with resources and wrote parts of the manuscript. RA supported and financed this project, gave valuable scientific input and wrote major parts of this manuscript.
Acknowledgments
The authors would like to thank Christopher Paulse for implementing the pre-processing filters in msconvert. We would also like to thank Oliver Rinner for helping prepare the crude peptides mixture used in this paper, as well as Alexander Leitner for fruitful discussions regarding the instrument settings and handling and Paola Picotti for discussing matters regarding the synthetic peptides. We would also like to extend thanks to the SyBIT project of the SystemsX.ch initiative and the Brutus system administrators for support with computing infrastructure and other IT-related resources.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.euprot.2014.10.001.
References
[1] Nesvizhskii AI, Vitek O, Aebersold R. Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat Methods 2007;4(10):787–97.
[2] Matthiesen R. Methods, algorithms and tools in computational proteomics: a practical point of view. Proteomics 2007;7(16):2815–32.
[3] Elias JE, Gygi SP. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 2007;4(3):207–14.
[4] Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B Met 1995;57(1):289–300.
[5] Frank A, Pevzner P. PepNovo: de novo peptide sequencing via probabilistic network modeling. Anal Chem 2005;77(4):964–73.
[6] Lam H, Deutsch EW, Eddes JS, Eng JK, King N, Stein SE, et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 2007;7(5):655–67.
[7] Keller A, Eng J, Zhang N, Li XJ, Aebersold R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 2005;1:0017.
[8] Cottingham K. Manual validation is a hot proteomics topic. Anal Chem 2005;77(5):92.
[9] Deutsch EW, Shteynberg D, Lam H, Sun Z, Eng JK, Carapito C, et al. Trans-proteomic pipeline supports and improves analysis of electron transfer dissociation data sets. Proteomics 2010;10(6):1190–5.
[10] Kall L, Canterbury JD, Weston J, Noble WS, MacCoss MJ. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods 2007;4(11):923–5.
[11] Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 2002;74(20):5383–92.
[12] Tanner S, Shu H, Frank A, Wang LC, Zandi E, Mumby M, et al. InsPecT: identification of posttranslationally modified peptides from tandem mass spectra. Anal Chem 2005;77(14):4626–39.
[13] Eng JK, McCormack AL, Yates JR. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 1994;5(11):976–89.
[14] Colinge J, Masselot A, Giron M, Dessingy T, Magnin J. OLAV: towards high-throughput tandem mass spectrometry data identification. Proteomics 2003;3(8):1454–63.
[15] Tabb DL, Fernando CG, Chambers MC. MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J Proteome Res 2007;6(2):654–61.
[16] Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999;20(18):3551–67.
[17] Craig R, Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 2004;20(9):1466–7.
[18] Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, et al. Open mass spectrometry search algorithm. J Proteome Res 2004;3(5):958–64.
[19] Quandt A, Masselot A, Hernandez P, Hernandez C, Maffioletti S, Appel RD, et al. SwissPIT: an workflow-based platform for analyzing tandem-MS spectra using the Grid. Proteomics 2009;9(10):2648–55.
[20] Shteynberg D, Deutsch EW, Lam H, Eng JK, Sun Z, Tasman N, et al. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol Cell Proteomics 2011;10(12):M111.007690.
[21] Park CY, Klammer AA, Kall L, MacCoss MJ, Noble WS. Rapid and accurate peptide identification from tandem mass spectra. J Proteome Res 2008;7(7):3022–7.
[22] Nahnsen S, Bertsch A, Rahnenfuhrer J, Nordheim A, Kohlbacher O. Probabilistic consensus scoring improves tandem mass spectrometry peptide identification. J Proteome Res 2011;10(8):3332–43.
[23] Tabb DL, Ma ZQ, Martin DB, Ham AJ, Chambers MC. DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring. J Proteome Res 2008;7(9):3838–46.
[24] Klimek J, Eddes JS, Hohmann L, Jackson J, Peterson A, Letarte S, et al. The standard protein mix database: a diverse data set to assist in the production of improved peptide and protein identification software tools. J Proteome Res 2008;7(1):96–103.
[25] Ivanov AR, Colangelo CM, Dufresne CP, Friedman DB, Lilley KS, Mechtler K, et al. Interlaboratory studies and initiatives developing standards for proteomics. Proteomics 2013;13(6):904–9.
[26] Marx H, Lemeer S, Schliep JE, Matheron L, Mohammed S, Cox J, et al. A large synthetic peptide and phosphopeptide reference library for mass spectrometry-based proteomics. Nat Biotechnol 2013;31(6):557–64.
[27] Picotti P, Aebersold R, Domon B. The implications of proteolytic background for shotgun proteomics. Mol Cell Proteomics 2007;6(9):1589–98.
[28] Frank R. The SPOT-synthesis technique. Synthetic peptide arrays on membrane supports – principles and applications. J Immunol Methods 2002;267(1):13–26.
[29] The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res 2009;37(Database issue):D169–74.
[30] Colinge J, Masselot A, Carbonell P, Appel RD. InSilicoSpectro: an open-source proteomics library. J Proteome Res 2006;5(3):619–24.
[31] Pedrioli PG, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, et al. A common open representation of mass spectrometry data and its application to proteomics research. Nat Biotechnol 2004;22(11):1459–66.
[32] SyBIT http://www.sybit.net
[33] MySQL http://www.mysql.com
[34] Kessner D, Chambers M, Burke R, Agus D, Mallick P. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008;24(21):2534–6.
[35] Weisser H, Nahnsen S, Grossmann J, Nilse L, Quandt A, Brauer H, et al. An automated pipeline for high-throughput label-free quantitative proteomics. J Proteome Res 2013;12(4):1628–44.
[36] Farkas Z, Kacsuk P. P-GRADE portal: a generic workflow system to support user communities. Future Gener Comput Syst 2011;27(5):454–65.
[37] Picotti P, Rinner O, Stallmach R, Dautel F, Farrah T, Domon B, et al. High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat Methods 2010;7(1):43–6.
[38] Sturm M, Bertsch A, Gropl C, Hildebrandt A, Hussong R, Lange E, et al. OpenMS – an open-source software framework for mass spectrometry. BMC Bioinform 2008;9:163.
[39] Granholm V, Noble WS, Kall L. On using samples of known protein content to assess the statistical calibration of scores assigned to peptide-spectrum matches in shotgun proteomics. J Proteome Res 2011;10(5):2671–8.
[40] Vaudel M, Burkhart JM, Breiter D, Zahedi RP, Sickmann A, Martens L. A complex standard for protein identification, designed by evolution. J Proteome Res 2012;11(10):5065–71.
[41] Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat Biotechnol 2006;24(10):1285–92.
[42] Ma K, Vitek O, Nesvizhskii AI. A statistical model-building perspective to identification of MS/MS spectra with PeptideProphet. BMC Bioinform 2012;13(Suppl. 16):S1.
[43] Colaert N, Degroeve S, Helsens K, Martens L. Analysis of the resolution limitations of peptide identification algorithms. J Proteome Res 2011;10(12):5555–61.