

Available online at www.sciencedirect.com

ScienceDirect

journal homepage: http://www.elsevier.com/locate/euprot

Using synthetic peptides to benchmark peptide identiﬁcation software and search parameters for MS/MS data analysis

Andreas Quandt^a, Lucia Espona^a,b, Akos Balasko^c, Hendrik Weisser^a, Mi-Youn Brusniak^d, Peter Kunszt^b,1, Ruedi Aebersold^a,e, Lars Malmström^a,∗,1

^a Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland

^b SyBIT, SystemsX.ch, Switzerland

^c MTA SZTAKI, Laboratory of Parallel and Distributed Systems, Budapest, Hungary

^d Institute for Systems Biology, Seattle, USA

^e Faculty of Science, University of Zurich, Switzerland

Article info

Article history:

Received 4 November 2013
Received in revised form 17 April 2014
Accepted 6 October 2014
Available online 28 October 2014

Keywords:

Mass spectrometry
Data analysis
Classical database search
Synthetic peptides
Search engine

Abstract

Tandem mass spectrometry and sequence database searching are widely used in proteomics to identify peptides in complex mixtures. Here we present a benchmark study in which a pool of 20,103 synthetic peptides was measured and the resulting data set was analyzed using around 1800 different software and parameter set combinations. The results indicate a strong relationship between the performance of an analysis workflow and the applied parameter settings. We present and discuss strategies to optimize parameter settings in order to significantly increase the number of correctly assigned fragment ion spectra and to make the analysis method robust.

1. Introduction

Tandem mass spectrometry (MS/MS) is the method of choice for identifying and quantifying proteins in complex mixtures because of its high throughput, sensitivity and relative ease of use. However, the optimal analysis of the resulting mass spectrometry data is complex and the subject of continuous research. In the most frequent data analysis workflow, fragment ion spectra generated from selected peptide ions are assigned to their corresponding peptide sequences using software tools commonly referred to as database search engines. Numerous search engines have been developed, each one using a different algorithm to maximize the number of peptide-spectrum matches (PSMs) and to assess confidence in the correctness of their assignments [1,2]. Search engines compute a score for each PSM that reflects the quality of the assignment; the user defines a cutoff that optimally separates correct from incorrect assignments. In more recent studies, the score cutoff is selected by a target-decoy strategy [3] to achieve a specific false discovery rate (FDR) [4]. PSMs above the cutoff can be either true positives or false positives, and PSMs below the cutoff can be either false negatives or true negatives. Most search engines use a protein database to define which proteins are expected in the sample, thereby reducing the search space significantly. De novo sequencing algorithms [5] and spectral library search engines [6] instead use no database or a spectral library, respectively. We did not use these types of search engines in this study since they are less widely used than search engines that rely on protein databases. Although database search engines use variations on the same principle, matching a measured to a theoretical spectrum, their respective search results differ even if the same data set is searched against the same sequence database [7]. Search engines provide different results because they generate different fractions of correct and incorrect PSM assignments; put differently, they probably differ in the number and type of correct assignments they make. Determining which peptides in each data set are correctly identified and which are wrongly identified is therefore important to evaluate the performance of a data analysis workflow.

Abbreviations: CPM, classical parametric model; FDR, false discovery rate; FME, fragment mass error; I, MS2Deisotope; LTQ, linear trap quadrupole; FT, Fourier transform; M, Mascot; MC, missed cleavage; N, MS2Denoise; O, OMSSA; PM, parametric model; PMC, parametric model with correction of the negative distribution based on decoy hits; PME, parent mass error; PSMs, peptide spectrum matches; PTMs, post-translational modifications; R, precursorRefine; SPM, the semi-parametric model; TPP, Trans-Proteomic Pipeline; UIPs, uniquely identified peptides; X, X!Tandem.

∗ Corresponding author. Tel.: +41446332195.

E-mail address: lars@imsb.biol.ethz.ch (L. Malmström).

1 Current address: S3IT, University of Zurich, Switzerland.

http://dx.doi.org/10.1016/j.euprot.2014.10.001
Most workflows rely either on manual, expert inspection of the search results or on software tools to estimate the proportion of false identifications. The manual assessment of the quality of PSMs is error-prone, dependent on the level of experience of the evaluator, inconsistent between evaluators and time-consuming [8]. For computer-based assessments of the quality of PSMs, two principal strategies are applied. The first uses statistical models, as exemplified by PeptideProphet in the Trans-Proteomic Pipeline (TPP) [7,9] or Percolator [10], and the second uses a target-decoy strategy [3]. PeptideProphet relies on mixture models to integrate different types of information, such as the distribution of search engine scores, the likelihood that assigned peptides are present in the sample or the score difference between the best and second-best assignment of a spectrum. The mixture models are used to convert this information into search engine-independent scores reflecting the probability that a particular PSM has been correctly assigned [11]. A target-decoy strategy calculates FDRs using the decoy part of the search database [4], which estimates how many false assignments are expected among the hits in the target part of the database at a given score cutoff. The consistent determination of the FDR for different data sets, provided either by statistical models or by target-decoy strategies, is critical for making meaningful comparisons of different search engines and parameter sets [6,12–18]. To increase the fraction of correctly assigned spectra and the confidence in the reported results, the outputs of multiple search engines have been combined [19–22].
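The target-decoy estimate described above can be illustrated with a minimal Python sketch. It assumes a concatenated target-decoy search, one score per PSM where a higher score is better, and the simple estimator FDR = decoys / targets above a cutoff; it is not the algorithm implemented by fdr2probability, PeptideProphet or any other specific tool, and ties between scores are not treated specially. The q-value of a PSM is the smallest estimated FDR at which it would still be accepted.

```python
from typing import List, Tuple


def qvalues(psms: List[Tuple[float, bool]]) -> List[float]:
    """Return a q-value for each PSM, in input order.

    psms: (score, is_decoy) pairs; a higher score means a better match.
    The FDR at a cutoff is estimated as (#decoys >= cutoff) / (#targets >= cutoff).
    """
    # Walk from the best to the worst score, accumulating target/decoy counts.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)

    # q-value: the lowest FDR over all cutoffs at which this PSM is still
    # accepted, i.e. a running minimum taken from the worst score upwards.
    q = [0.0] * len(psms)
    best = float("inf")
    for i in reversed(order):
        best = min(best, fdr[i])
        q[i] = best
    return q


psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(qvalues(psms))
```

Accepting all target PSMs with a q-value of at most 0.01 then corresponds to a 1% FDR cutoff under this simple estimator.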

Although both methods for assessing the quality of search results are widely used, it still remains challenging to objectively evaluate the different analysis workflows, search tools and parameter sets and to prove that one performs better than another. The difficulties arise for two major reasons. The first is the absence of a complex sample set of known composition, although this difficulty is increasingly mitigated by the falling cost of synthetic peptides. The second is the inability to systematically assess the influence of various perturbations to the analysis workflow. Most studies that present a new workflow evaluate its performance using specific biological samples [23], so-called spike-in samples [24] or, increasingly, complex synthetic samples [25,26].

Using a biological sample such as a digested cell lysate has the advantage that peptide-to-spectrum matching is carried out under realistic conditions, i.e. on a sample that contains thousands of peptides covering a wide range of signal intensities. However, these studies are limited because the true peptide composition of such samples is unknown, partly due to the presence of peptides that carry post-translational modifications (PTMs) or that are the result of non-specific and missed cleavages [27]. It is often not possible to control these biological events, which prevents a reliable comparison of the search results. PSMs cannot be categorized as correct or incorrect with confidence because there is no evidence that the matching peptide truly exists in the measured sample, and it is difficult to estimate how closely the generated set of identified peptides matches the maximally achievable set. An alternative to biological samples is the use of spike-in samples.

Spike-in samples usually consist of a mix of a few dozen purified recombinant proteins of known sequence and quantity that are digested with trypsin to generate a peptide mixture of known composition. Such samples are then analyzed either by themselves or in a complex background sample of unknown composition. The presence of PTMs is no longer a concern [28]. However, non-specific and missed cleavages, artifactual modifications generated during sample processing and the presence of proteins introduced as minor contaminants of the purified reference proteins are still possible and, therefore, the peptide composition of such samples is still unknown, even if they are analyzed without added background. In addition, if analyzed without added background, the complexity of such samples does not match the complexity and intensity distribution of most biological samples. The spectra produced often contain fewer signals from peptides co-fragmented with the target peptides and, if these do occur, they show a lower signal intensity than in typical biological samples.

These factors reduce the complexity of the spectrum pattern and lower the threshold at which targeted peptides are correctly matched by a search engine. The second problem, that is, the influence of perturbations such as variations in the parameter set on the search results, is often not systematically addressed in related studies because most data analysis workflows are not automated to a level where many different search parameter sets can be easily tested and compared.

Here we present a study that systematically varies parameters and search engines, in which we investigate the impact these have on the sensitivity of the analysis of a data set generated from a complex synthetic sample of more than 20,100 peptides previously observed by MS in human samples. The complexity of the synthetic sample is sufficient for a realistic test and allowed us to accurately estimate both the sensitivity and the specificity of the search results. The sample does not, however, mimic biological samples in terms of dynamic range. We propose a strategy to find optimal search parameters and present detailed information on how the various parameters influence the results. The peak list files and the identification results are publicly available in the PeptideAtlas repository (https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/PASS View?identifier=PASS00090).

2. Methods

2.1. Preparation of the benchmark sample

20,103 unique peptides were synthesized by JPT Peptide Technologies GmbH using SPOT synthesis [28] (suppl. synthetic peptidesequences.txt), and the crude synthesis products were aliquoted at a concentration of approximately 60 nmol/µl per peptide in 96-well microtiter plates. A 5 µl aliquot from each well was used to create intermediate pools that were subsequently used to create a pool of all peptides, each at an estimated final concentration of about 3 pmol/µl per peptide.
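As a sanity check of the stated concentrations, the dilution can be worked through in a few lines of Python. This assumes that each well holds exactly one peptide at 60 nmol/µl and that the final pool consists only of the 5 µl aliquots, with no additional solvent added when the intermediate pools were combined; under those assumptions the two-step pooling does not change the final ratio.

```python
# Back-of-the-envelope pool concentration (assumption: one peptide per well,
# pool volume = sum of all 5 ul aliquots, no extra solvent added).
N_PEPTIDES = 20_103
STOCK_NMOL_PER_UL = 60.0   # approximate stock concentration per peptide
ALIQUOT_UL = 5.0           # volume taken from each well

amount_per_peptide_pmol = STOCK_NMOL_PER_UL * ALIQUOT_UL * 1000  # 300 nmol -> pmol
pool_volume_ul = N_PEPTIDES * ALIQUOT_UL                          # 100,515 ul
final_pmol_per_ul = amount_per_peptide_pmol / pool_volume_ul

print(f"{final_pmol_per_ul:.2f} pmol/ul per peptide")
```

The result, just under 3 pmol/µl, is consistent with the estimated final concentration of about 3 pmol/µl per peptide.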

2.2. Mass spectrometry

The synthetic peptide pool was measured on two liquid chromatography–tandem mass spectrometry (LC–MS/MS) systems: an LTQ-FT Ultra (Thermo Fisher Scientific) coupled to a Tempo NanoLC (Applied Biosystems) and an LTQ-Orbitrap XL (Thermo Fisher Scientific) with a 1D-NanoLC-Ultra system (Eksigent). Both systems were equipped with a standard nano-electrospray source, and the chromatographic separation was performed with the same buffer system: 97% water, 3% acetonitrile and 0.1% formic acid constituted mobile phase A, while mobile phase B comprised 3% water, 97% acetonitrile and 0.1% formic acid. For each LC–MS/MS run, 2 µl of the peptide pool were injected onto an 11 cm × 0.075 mm I.D. column packed in-house with Magic C18 material (3 µm particle size, 200 Å pore size, Michrom Bioresources). The peptides were loaded onto the LC column at a flow rate of 300 nl/min and eluted with either of the following two gradients. Gradient #1: 0–5 min = 5% phase B solution; 5–95 min = linear gradient from 5 to 35% phase B solution; 95–97 min = linear gradient from 35 to 95% phase B solution; 97–107 min = 95% phase B solution. Gradient #2: 0–5 min = 5% phase B solution; 5–125 min = linear gradient from 5 to 35% phase B solution; 125–127 min = linear gradient from 35 to 95% phase B solution; 127–137 min = 95% phase B solution. The ion source and transmission settings for both mass spectrometers were: spray voltage = 2 kV, capillary temperature = 200 °C, capillary voltage = 60 V and tube lens voltage = 135 V. All measurements of the synthetic peptide mix, on both the FT and Orbitrap instruments, were acquired in data-dependent mode, selecting up to five precursors from an MS1 scan (resolution for the FT: 100,000; resolution for the Orbitrap: 60,000) in a range of 350–1600 m/z for collision-induced dissociation (CID).
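The two gradients can be written as piecewise-linear %B profiles. The sketch below is a transcription of the segments listed above for illustration only (the names GRADIENT_1, GRADIENT_2 and percent_b are hypothetical), not an instrument method file. The linear 5 to 35% B segments last 90 and 120 min, which are the gradient lengths referred to in Section 3.4.

```python
from typing import List, Tuple

# (start_min, end_min, start_%B, end_%B), transcribed from the text above
Segment = Tuple[float, float, float, float]

GRADIENT_1: List[Segment] = [
    (0, 5, 5, 5), (5, 95, 5, 35), (95, 97, 35, 95), (97, 107, 95, 95)]
GRADIENT_2: List[Segment] = [
    (0, 5, 5, 5), (5, 125, 5, 35), (125, 127, 35, 95), (127, 137, 95, 95)]


def percent_b(gradient: List[Segment], t: float) -> float:
    """Linearly interpolate the %B of mobile phase B at time t (minutes)."""
    for start, end, b0, b1 in gradient:
        if start <= t <= end:
            return b0 + (b1 - b0) * (t - start) / (end - start)
    raise ValueError(f"t={t} min is outside the gradient")


for t in (0, 50, 96, 100):
    print(t, "min:", percent_b(GRADIENT_1, t), "% B")
```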

The ion target values were 1,000,000 (or a maximum fill time of 500 ms) for full scans and 10,000 (or a maximum fill time of 200 ms on the Orbitrap and 300 ms on the FT) for fragment ion scans.

Ions with a single or unknown charge state were automatically rejected. The synthetic peptide mix was measured in duplicate on each mass spectrometer using each of the two specified gradients, resulting in a total of eight LC–MS/MS data sets. The mass spectrometers were equilibrated using a standard mixture before each sample injection.

2.3. Data analysis

The data analysis workflows rely on searching the acquired fragment ion spectra against a protein sequence database.

We therefore derived a human subset from UniProtKB/Swiss-Prot, version 57.1 [29], by performing the following steps. Firstly, the complete UniProtKB/Swiss-Prot database was converted from its original DAT format into a FASTA file including all known splice variants and isoforms. For this process, we used uniprotdat2fasta.pl, which is part of InSilicoSpectro, a collection of bioinformatics tools for mass spectrometry [30]. Secondly, the subset of human protein sequences was extracted with subsetdb, which is part of the TPP [7]. In the final step, a target-decoy database with reversed decoys was generated using fasta-decoy.pl, another tool included in InSilicoSpectro. After generating the database, the data acquired from the mass spectrometers were converted into peak lists in the mzXML format [31]. This conversion was accomplished using ReAdW.exe, which is part of the TPP. Next, we created seven workflows to search these peak lists against the previously generated database. Three workflows each used a single database search engine, specifically Mascot [16] (version 2.3) (M), OMSSA [18] (version 2.1.7) (O) and X!Tandem [17] (version 2009.04.01, k-score plugin) (X); a further three workflows combined two database search engines, specifically Mascot-OMSSA (MO), Mascot-X!Tandem (MX) and X!Tandem-OMSSA (XO); and the final workflow combined all three database search engines tested (MXO). In each workflow, the search engine output was converted into the pepXML format and scored using PeptideProphet [7]. The final score distribution model was calculated with iProphet [20], and the result was converted from its pepXML format into a simple text format (CSV) using pepxml2csv [32]. In addition to the iProphet score, we applied the target-decoy strategy and calculated the corresponding FDR values with fdr2probability [32].
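A reversed-decoy database of the kind produced in the final database step can be sketched as follows. This is not fasta-decoy.pl: the decoy prefix DECOY_ and the choice to reverse the full protein sequence are assumptions for illustration, and real tools may use different tags or handle terminal residues differently.

```python
from typing import Iterable, Iterator, Tuple

DECOY_PREFIX = "DECOY_"  # hypothetical tag; actual tools use their own prefixes


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def target_decoy(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield every target entry followed by its reversed-sequence decoy."""
    for header, seq in read_fasta(lines):
        yield header, seq
        yield DECOY_PREFIX + header, seq[::-1]


for header, seq in target_decoy([">sp|P1|X", "PEPT", "IDEK"]):
    print(">" + header)
    print(seq)
```

Reversal keeps the amino acid composition and length distribution of the target proteins, which is the usual rationale for reversed decoys.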

The resulting data matrix was then imported into a MySQL database [33] to evaluate the performance of each data analysis based on the number of uniquely identified peptides (UIPs) that matched a synthetic peptide sequence at 1% FDR.
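The performance metric, correct UIPs at 1% FDR, can be sketched as a filter followed by a set intersection. This minimal version assumes PSM-level q-values are already available (for example from a target-decoy estimate), compares plain sequence strings and ignores modifications and I/L ambiguity; the function name correct_uips is hypothetical and does not describe the MySQL queries used in the study.

```python
from typing import Iterable, Set, Tuple


def correct_uips(psms: Iterable[Tuple[str, float, bool]],
                 synthetic: Set[str], max_fdr: float = 0.01) -> int:
    """Count unique target peptide sequences accepted at max_fdr that
    match a synthetic peptide sequence.

    psms: (peptide_sequence, q_value, is_decoy) triples.
    """
    # Unique sequences among accepted target PSMs (repeat PSMs collapse).
    accepted = {pep for pep, q, is_decoy in psms
                if not is_decoy and q <= max_fdr}
    # Only sequences present in the synthetic pool count as correct.
    return len(accepted & synthetic)


example = [("PEPTIDEK", 0.001, False), ("PEPTIDEK", 0.002, False),
           ("AAAK", 0.005, False), ("CCCK", 0.05, False),
           ("PEPTIDEK", 0.0, True)]
print(correct_uips(example, {"PEPTIDEK", "CCCK"}))
```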

Each of the seven workflows was tested with 54 different parameter settings to investigate the influence of the following search parameters: the precursor mass error (PME), the fragment mass error (FME), the number of allowed missed cleavages (MC) and the different scoring models of PeptideProphet. Values for the PME were set to 25 ppm, 15 ppm and 5 ppm. We chose values of 0.8 Da, 0.6 Da and 0.4 Da for the FME (Fig. 1). We also allowed a maximum of one or two missed cleavage events and defined carbamidomethylation of cysteine as a fixed modification. For PeptideProphet, the following statistical models were tested: the classical parametric model (CPM), the parametric model (PM) with correction of the negative distribution based on decoy hits (PMC) and the semi-parametric model (SPM). Three PME values, three FME values, two MC values and three models give the 54 parameter settings. In total, 756 data analyses were produced to study the performance variation of single and multi-search engine workflows induced by changing parameter settings (suppl. Fig. S1) (Supplementary figures are available free of charge via the Internet at http://pubs.acs.org).

Fig. 1 – Overview of the experimental design. The diagram shows the steps that were systematically varied. The spectral pre-processing was only carried out on the optimal combination of search engines.
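The parameter grid above can be enumerated with itertools.product. The sketch assumes that the 756 analyses correspond to 7 workflows × 2 instrument types × 54 parameter settings, which matches the counts given here and in the Fig. 2 legend; how the individual raw files were grouped within each instrument type is not spelled out in the text, so the instrument labels are only placeholders.

```python
from itertools import product

PME_PPM = [25, 15, 5]                 # precursor mass error
FME_DA = [0.8, 0.6, 0.4]              # fragment mass error
MISSED_CLEAVAGES = [1, 2]
PROPHET_MODELS = ["CPM", "PMC", "SPM"]
WORKFLOWS = ["M", "O", "X", "MO", "MX", "XO", "MXO"]
INSTRUMENTS = ["LTQ-FT Ultra", "LTQ-Orbitrap XL"]  # assumed grouping

# One tuple per search parameter set: (PME, FME, MC, model)
parameter_sets = list(product(PME_PPM, FME_DA, MISSED_CLEAVAGES, PROPHET_MODELS))
# One tuple per data analysis: (workflow, instrument, parameter set)
analyses = list(product(WORKFLOWS, INSTRUMENTS, parameter_sets))

print(len(parameter_sets), len(analyses))
```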

In order to investigate peak list pre-processing, we studied the effects of the following filters, all operating on MS2 spectra: MS2Deisotope (I), which deisotopes; MS2Denoise (N), which denoises; and precursorRefine (R), which refines the precursor ion mass. All three were implemented in msconvert (revision 2238), which forms part of ProteoWizard [34]. Since these filters have to be applied when converting the original RAW format to a peak list file, we used msconvert instead of ReAdW.exe for this investigation. Based on the original instrument files, we generated peak list files with eight different filter combinations: one using no filter, three using a single filter, three combining two filters and one combining all three filters. The peak list files of each filter combination were tested with the MXO workflow using the same sequence database and the same 54 parameter settings as were used for the previous data analyses, plus an additional set of parameter combinations (suppl. Figs. S2 and S10). In total, 1074 searches were performed to investigate the effect of pre-processing the peak lists on workflow performance.
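The eight filter combinations are simply all subsets of the three msconvert filters, which can be generated with itertools.combinations. The sketch only enumerates the combinations; it does not build msconvert command lines, since the exact filter syntax for revision 2238 is not given in the text.

```python
from itertools import combinations

FILTERS = ["MS2Deisotope", "MS2Denoise", "precursorRefine"]

# All subsets of size 0..3: 1 + 3 + 3 + 1 = 8 combinations
filter_combinations = [combo for k in range(len(FILTERS) + 1)
                       for combo in combinations(FILTERS, k)]

for combo in filter_combinations:
    print(" + ".join(combo) or "no filter")
```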

Detection of peptide synthesis by-products was carried out by generating a database containing all variants of the synthetic peptides with single amino acids missing. This database was then used as described above.
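A by-product database of gap variants can be sketched as all sequences obtained by deleting residues from each synthetic peptide. The function below, gap_variants (a hypothetical name), defaults to single-residue gaps as described above; its gaps argument also covers the two-gap by-products discussed in Section 3.4. Deleting different copies of a repeated residue yields the same sequence, so the variants are collected in a set.

```python
from itertools import combinations
from typing import Set


def gap_variants(peptide: str, gaps: int = 1) -> Set[str]:
    """All sequences obtained by deleting `gaps` residues from `peptide`."""
    variants = set()
    for positions in combinations(range(len(peptide)), gaps):
        drop = set(positions)
        variants.add("".join(aa for i, aa in enumerate(peptide) if i not in drop))
    return variants


print(sorted(gap_variants("ACDK")))
print(sorted(gap_variants("ACDK", gaps=2)))
```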

OpenMS MS1 feature detection and peptide quantification were carried out as described by Weisser et al. [35].

All the data analyses presented in this study were automated and executed using the workflow system P-GRADE [36].

3. Results

The benchmark study we present was performed in three steps. Firstly, we explored the influence of critical search parameters on the performance of individual search engines. Secondly, we tested how combining the results from different search engines affected the identification performance. Thirdly, we measured the influence of pre-processing the peak lists on search performance. The results of the analyses were compared based on the number of true positive UIPs at 1% FDR. A schematic of the workflow used to carry out all three steps is presented in Fig. 1, and a summary of all the results is given in Table 1.

3.1. Single search engine performance

We generated three workflows, each incorporating a single search engine: one based on Mascot (M), one on OMSSA (O) and one on X!Tandem (X). The X!Tandem k-score plug-in was used in combination with X!Tandem throughout. We assessed the performance of each by changing parameters such as the instrument type and the mass errors for the fragment ion (FME) and for the parent ion (PME; see Fig. 1).

Across all the parameter settings tested, the number of identified peptides was consistently higher for data acquired on the Orbitrap instrument than for data from the FT (suppl. Figs. S3–5). This observation should not be interpreted as one type of instrument being superior to the other; it simply means that one instrument was better optimized than the other at the time of acquisition. The other parameters that had a major influence on search performance were the mass errors FME and PME. The results indicated that lowering the FME increased the search performance, but that reducing the PME below 15 ppm caused a loss of performance (suppl. Figs. S6 and S7). The data for the single database search engine workflows are shown in Fig. 2. Each box plot in Fig. 2 consists of 54 independent search results, one per parameter set. The figure shows the number of correct UIPs. Two of the 756 data analyses failed because the qualitative requirements for modeling the negative peptide distribution in PeptideProphet were not fulfilled.
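To put the ppm values in perspective, a relative tolerance can be converted into an absolute window. The helpers below (ppm_window and within_tolerance) are a hypothetical illustration assuming a symmetric window around the theoretical value; search engines differ in details such as whether the tolerance is applied to m/z or to neutral mass and how isotope errors are handled.

```python
def ppm_window(value: float, ppm: float) -> float:
    """Half-width of a +/- ppm window, in the units of `value` (Da or m/z)."""
    return value * ppm * 1e-6


def within_tolerance(observed: float, theoretical: float, ppm: float) -> bool:
    """True if the relative deviation of observed from theoretical is <= ppm."""
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


for ppm in (25, 15, 5):
    print(f"{ppm:>2} ppm at 1000 Da -> +/- {ppm_window(1000.0, ppm):.3f} Da")
```

At 1000 Da, 25, 15 and 5 ppm correspond to windows of ±0.025, ±0.015 and ±0.005 Da, more than an order of magnitude narrower than the 0.4 to 0.8 Da FME values.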

The best performance among the single search engine workflows was achieved with OMSSA (Table 1). A total of 6489 correct UIPs were found when a small FME of 0.4 Da and a moderate PME of 15 ppm were used. Non-optimal parameter settings, such as larger mass errors (FME: 0.8 Da, PME: 25 ppm), caused the number of correct UIPs to drop to 614, the lowest identification rate in this study. A PME that is too small is also sub-optimal: a PME of 5 ppm resulted in 5900 correct UIPs, compared with 6489 identifications at 15 ppm. Another parameter with a noticeable impact on performance was the type of mass spectrometer used. Applying the same mass error settings resulted in a drop from 6489 correct UIPs (Orbitrap) to 6097 correct UIPs (FT).

The Mascot (M) workflow identified a maximum of 6401 correct UIPs (Table 1) over all parameters tested. In contrast to OMSSA, Mascot performed better with a less strict PME of 25 ppm in conjunction with a small FME of 0.4 Da, although the performance difference between a PME of 15 ppm and 25 ppm was negligible, with 6391 (15 ppm) compared to 6401 (25 ppm) correct UIPs. Other parameters had a smaller effect on the performance.

The third search engine we tested was X!Tandem. Compared to the O- and M-workflows, X!Tandem was more robust to changes in the mass errors FME and PME. In particular, varying the FME did not have a noticeable impact. For example, the maximal performance of 6219 correct UIPs was achieved when applying a PME of 15 ppm and did not change when the FME was varied between 0.4, 0.6 and 0.8 Da. The same finding applied to data analyses with a PME of 25 ppm and 5 ppm, which resulted in 6170 and 5856 correct UIPs, respectively, and were unaffected by variation of the FME within the range tested (Table 1).

3.2. Performance of multiple search engine searches

We combined the output of multiple search engines to investigate whether the combined output could improve search performance. All two-way combinations (MO, MX, XO) and the combination of all three (MXO) were tested using the same parameter settings used for the single search engines. The search engine results were combined using iProphet [20]. The results are presented in Fig. 2 and Table 1. The data showed an improved performance for multi-search engine workflows compared to single-engine workflows.

Table 1 – Performance of the various tool combinations. Each tool combination was tested using several parameter sets.

| Search engines / filters | Abbreviation | Max/Min of correct UIPs | Max of correct UIPs (%) | Most influential parameter | Best parameters PME/FME/MC/PM | Note |
|---|---|---|---|---|---|---|
| OMSSA | O | 6489/614 | 32.28 | FME | 15/0.4/1/SPM | Best single engine |
| Mascot | M | 6401/5330 | 31.84 | PME/FME | 25/0.4/1/SPM | Most robust engine w.r.t. MS |
| X!Tandem | X | 6219/5244 | 30.94 | MS | 15/0.4–0.8/1/CPM | Most robust engine w.r.t. PME/FME |
| Mascot/OMSSA | MO | 6674/5240 | 33.20 | PME/FME | 15/0.4/1/CPM | |
| Mascot/X!Tandem | MX | 6595/5330 | 32.81 | PME/FME | 25/0.4/1/PMC | |
| X!Tandem/OMSSA | XO | 6769/5510 | 33.67 | PME/FME | 15/0.4/1/CPM | |
| Mascot/X!Tandem/OMSSA | MXO | 6814/5846 | 33.90 | PME/FME | 25/0.4/1/CPM | Best search combination |
| Denoise | N | 6807/5890 | 33.86 | PME/FME | 15/0.4/2/CPM | |
| Deisotope | I | 6802/5844 | 33.84 | PME/FME | 15/0.4/1/CPM | |
| Refine | R | 6910/5821 | 34.37 | PME/FME | 5/0.4/2/PMC | Effective for LTQ-Orbitrap |
| Denoise/Deisotope | NI | 6828/5919 | 33.97 | PME/FME | 15/0.4–0.6/1/CPM | |
| Refine/Denoise | NR | 6910/5916 | 34.37 | PME/FME | 5/0.4/2/PMC | Best pre-processing for FT |
| Refine/Deisotope | IR | 6909/5869 | 34.37 | PME/FME | 5/0.4/1/PMC | |
| Refine/Denoise/Deisotope | NIR | 6938/5945 | 34.51 | MS | 5/0.4–0.6/1/CPM | Best pre-processing for LTQ-Orbitrap |

Fig. 2 – The number of correctly identified peptides per workflow, shown as box plots. A total of 54 parameter sets and two mass spectrometer types were used for each workflow. The workflows used single database search engines, Mascot (M), OMSSA (O) or X!Tandem (X), or combinations of two or three search engines: Mascot-OMSSA (MO), Mascot-X!Tandem (MX), X!Tandem-OMSSA (XO) and Mascot-X!Tandem-OMSSA (MXO). The upper whisker indicates the number of peptides identified using an optimal parameter set, and the red line marks the mean number of peptides identified for the parameter sets tested within a workflow. The box itself encloses the search results between the first and the third quartile. The larger the spread, the more sensitive the search was to the parameters. The green daggers mark measurements that are outside the range between the first and third quartile. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The effects of combining search results are apparent from the XO workflow, which combined the fairly robust X!Tandem engine with the more sensitive OMSSA tool. The combined workflow outperformed the single-engine results of OMSSA (maximum of 6489 correct UIPs) and X!Tandem (maximum of 6219 correct UIPs), with a maximum of 6769 correct UIPs under optimal search parameters (PME = 15 ppm and FME = 0.4 Da). The XO workflow achieved 5510 correct UIPs using the least optimal parameters tested, which was better than both the X workflow with 5254 correct UIPs and the O workflow with 614 correct UIPs (Table 1). In the XO workflow, X!Tandem largely compensated for the poor performance of OMSSA in cases in which sub-optimal parameters were used, and the performance increased above the level of a single search engine when optimal parameter settings were applied. Similar trends were observed for the other two two-engine combinations.

We also tested the combination of the three database search engines in a single workflow (MXO), which resulted in 6814 correct UIPs, the highest identification rate of the seven workflows, when a moderate PME of 15 ppm and a small FME of 0.4 Da were used (Table 1). Additionally, with sub-optimal search parameter settings, such as a PME of 5 ppm and an FME of 0.8 Da, a minimum of 5846 correct UIPs was scored. This is the highest minimum among the seven workflows (Table 1).

The resulting spread of 14.2% was the lowest of all workflows and therefore indicated that the MXO workflow was the least dependent on the search parameter settings. Fig. S12 displays pseudo-receiver operating characteristic curves for the optimal parameter settings of each of the seven search engine combinations. Fig. S11 shows a Venn diagram comparing the results of the three individual search engines with the MXO workflow.
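The two summary numbers used here can be recomputed from Table 1. The definitions are inferred from the reported values rather than stated in the text: the spread of 14.2% matches (max − min) / max for MXO, and the "Max of correct UIPs (%)" column matches the maximum divided by the 20,103 pooled peptides. The helper names are hypothetical.

```python
POOL_SIZE = 20_103  # synthetic peptides in the benchmark sample


def spread(max_uips: int, min_uips: int) -> float:
    """Relative spread in percent: (max - min) / max (inferred definition)."""
    return 100.0 * (max_uips - min_uips) / max_uips


def percent_of_pool(uips: int) -> float:
    """Correct UIPs as a percentage of all pooled synthetic peptides."""
    return 100.0 * uips / POOL_SIZE


# Max/min correct UIPs per search engine workflow, copied from Table 1
TABLE_1 = {"O": (6489, 614), "M": (6401, 5330), "X": (6219, 5244),
           "MO": (6674, 5240), "MX": (6595, 5330), "XO": (6769, 5510),
           "MXO": (6814, 5846)}

# Print workflows from least to most parameter-dependent
for name, (hi, lo) in sorted(TABLE_1.items(), key=lambda kv: spread(*kv[1])):
    print(f"{name:>3}: spread {spread(hi, lo):5.1f}%, "
          f"max = {percent_of_pool(hi):.2f}% of pool")
```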

3.3. Effect of peak list pre-processing

We investigated three types of peak list pre-processing: deisotoping (filter = MS2Deisotope), denoising (filter = MS2Denoise) and refining the mass of the precursor ion (filter = precursorRefine). We tested the performance of these filters individually, in combinations of two and with all three combined, and compared these results to results generated without a filter. The data were processed with the MXO workflow and the same 54 parameter settings we used previously (suppl. Figs. S1 and S2), plus additional parameter sets investigating narrower parent and fragment mass errors (suppl. Fig. S10). The results of the pre-processing benchmark are presented in Fig. 3.

Pre-processing with MS2Denoise or MS2Deisotope alone showed no performance improvement compared to the 'no filter' setting. This is supported by the 6807 and 6802 correct UIPs, respectively, achieved at PME = 15 ppm and FME = 0.4 Da, in comparison with the 6805 correct UIPs obtained with no filter. However, the data indicated that both filters partly compensated for sub-optimal settings. The minimal number of correct UIPs, obtained with a PME of 5 ppm and an FME of 0.4 Da, increased with either filter (no filter: n = 5812; MS2Denoise: n = 5890; MS2Deisotope: n = 5844) (Fig. 3, Table 1). The same was observed when both filters were combined: the maximal identification rate of 6828 correct UIPs (PME = 15 ppm, FME = 0.6 Da) was no improvement over the 'no filter' settings. In contrast, the lowest identification rate, obtained with a PME of 5 ppm and an FME of 0.4 Da, increased from 5812 correct UIPs (no filter) to 5919 correct UIPs (MS2Deisotope + MS2Denoise).

The use of precursorRefine had a noticeable impact on the performance (Fig. 3, suppl. Fig. S8), as the maximal number of correct UIPs increased from 6805 (no filter) to 6910 (precursorRefine) (Table 1). Moreover, the top identification rate was achieved when small mass errors were used (PME = 5 ppm, FME = 0.4 Da, instrument = Orbitrap). The lowest identification rate, 5821 correct UIPs, was also obtained with small mass errors (PME = 5 ppm, FME = 0.6 Da, instrument = FT). For data from the Orbitrap, identification rates increased and the impact of parameter settings was reduced. Specifically, the range narrowed from 8.13% with the 'no filter' settings to 3.2%, a reduction of more than 50%. We did not observe the same positive effect for data obtained from the FT, where the results were more comparable to those obtained when no filters were used (suppl. Fig. S9).

We tested the combination of all three pre-processing filters, and this combination performed best, with 6938 correct UIPs, when using PME = 5 ppm and FME = 0.4 Da (Table 1). Using smaller mass errors did not lead to further improvements.

3.4. Benchmark sample

We pooled 20,103 synthetic peptides to create the test sample (see Section 2). Analyzing the sample on the FT and on the Orbitrap generated about 11,500 spectra per file for a 90-min gradient and around 13,000 spectra per file for a 120-min gradient (Table 2).

The use of a synthetic pool allowed us to identify assignments above the chosen cutoff as correct if they matched the synthetic peptide sequences. However, the type of synthetic peptides we used limited us in drawing further conclusions about potentially incorrect PSMs. Although we could certify a PSM as being correct, it was not possible to reject PSMs as truly false when they did not match a synthetic peptide sequence.

Each of the 20,103 crude synthetic peptides that we pooled to generate our sample data set was produced using SPOT synthesis, which means that traces of by-products could also be found in the pool. By-products are dominated by peptide sequences that have one or more gaps (missing amino acids) at certain positions in the targeted peptide sequence [37]. The quantity of these by-products is usually significantly lower than the amount of the targeted peptide sequences [37]. This does not mean, however, that these peptide variants cannot be detected by mass spectrometry. Depending on the workflow, up to 9% of all uniquely identified peptides (UIPs) at a 1% FDR level matched one of these by-products (up to 6.15% matched a one-gap by-product and up to 2.7% matched a two-gap by-product). We performed two label-free quantification experiments using OpenMS [38], one with the default parameters and one with relaxed quality criteria for feature detection. In both cases, we detected features for the original synthetic peptides (default parameters: n = 5843; less restrictive parameters: n = 6000) and for by-products with up to two gaps (default parameters: n = 6027; less restrictive parameters: n = 6188).

Fig. 3 – Box plots of the search results (UIPs) for different pre-filtering options. Each box plot contains the results from 108 database searches (54 different parameter settings, each for data files acquired with two different mass spectrometers). Pre-filtering tools were MS2Denoise (N), MS2Deisotope (I) and precursorRefine (R), which were applied either individually or in various combinations (NI, NR, IR, NIR). The green daggers represent data points outside the range between the first and third quartile. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

4. Discussion

Here, we present a large-scale study investigating strategies to improve MS/MS peptide identification and to make database searches more robust. A complex pool of synthetic peptides was created in order to comprehensively benchmark the different analysis workflows based on the number of correctly identified peptide sequences from this well-characterized but complex sample. The use of synthetic peptides as a reference sample addressed issues of other sample types, such as complexity and certainty about the true positive matches.

This allowed an improved interpretation of the benchmark results that was independent of estimates based on decoy databases. We investigated the impact of 108 different search parameter settings on the identification performance of three search engines, Mascot, X!Tandem and OMSSA, the effect of combining multiple search engines in a single approach and the processing of ion mass spectra prior to database searching, totaling around 1800 distinct combinations.

Firstly, we tested and optimized the parameter settings for each search engine individually. Overall, the data showed that Mascot, X!Tandem and OMSSA performed at a comparable level if optimal search parameters were used. However, there were distinct performance differences with other parameter sets. OMSSA achieved the best performance of the search engines tested but was the most sensitive to the parameter settings. Mascot was more robust in the face of parameter changes but was negatively affected when a small PME was selected and did not benefit to the same extent from the increased sensitivity of an Orbitrap compared to an FT. X!Tandem benefited most from the increased instrument performance and was least affected by parameter changes. It did, however, identify fewer correct UIPs under optimal conditions than the other search engines. This is likely a result of X!Tandem's algorithm, which benefits from finding more than one peptide from each protein. Generally, the choice of optimal parameter settings was more important and had a larger influence on the number of correctly identified PSMs than did the choice of the search engine itself.

Table 2 – Sample properties for measurements of the synthetic peptide data set.

| Measurement | Mass spectrometer | LC gradient (min) | MS1 scans | MS2 scans |
|---|---|---|---|---|
| 1 | LTQ-FT Ultra | 90 | 4899 | 11,491 |
| 2 | LTQ-FT Ultra | 90 | 4890 | 11,752 |
| 3 | LTQ-FT Ultra | 120 | 6570 | 12,963 |
| 4 | LTQ-FT Ultra | 120 | 6509 | 12,760 |
| 5 | LTQ-Orbitrap XL | 90 | 4036 | 11,445 |

Wecontinuedourbenchmarkstudybytesting combina- tionsofthedifferentsearchenginesinasingleworkﬂow.Here, the datasupportedtwo mainconclusions. Firstly,integrat- ingtheresultsfrommultiplesearchenginesusingiProphet reducedtheimpactofsub-optimalparametersettings and, secondly,ithelpedtoincreasethenumberofUIPsidentiﬁed.

TheMXO-combinationperformedbetterthantheotherwork- ﬂows tested, independently ofthe applied parameter sets.

However, the data also indicated that performance does not increase linearly with the number of search engines added to a workflow. Under optimal conditions, the XO workflow performed almost as well as the MXO workflow, suggesting that these combinations, used optimally, identify close to the maximal number of detectable UIPs in a dataset with the current technology. In general, we observed that each workflow reached its maximum identification rate when a PME of 15–25 ppm was used in conjunction with a small FME of 0.4 Da. Lowering the PME, e.g. to 5 ppm, led to a decrease in the number of correct identifications for all the workflows tested.
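The PME and FME are expressed in different units: the precursor tolerance is relative (ppm), so its absolute width grows with m/z, whereas the fragment tolerance is a fixed window in Da. The Python sketch below, with hypothetical m/z values and helper names, converts between the two and shows a precursor that deviates by 12 ppm being accepted with a 15 ppm PME but rejected at 5 ppm. One plausible reading of the drop in identifications at 5 ppm is that such a window is narrower than the actual precursor mass error of part of the correct matches, which are then excluded from the candidate list.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million, relative to the theoretical m/z."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_window_da(theoretical_mz: float, tol_ppm: float) -> float:
    """Half-width in Da of a +/- tol_ppm window around theoretical_mz."""
    return theoretical_mz * tol_ppm / 1e6


def within_pme(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """Precursor check: relative tolerance in ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_fme(observed_mz: float, theoretical_mz: float, tol_da: float) -> bool:
    """Fragment check: absolute tolerance in Da."""
    return abs(observed_mz - theoretical_mz) <= tol_da


if __name__ == "__main__":
    theo, obs = 800.0, 800.0096  # hypothetical precursor m/z values, 12 ppm apart
    print(round(ppm_error(obs, theo), 3))  # 12.0
    print(within_pme(obs, theo, 15.0))     # True
    print(within_pme(obs, theo, 5.0))      # False
    print(ppm_window_da(1000.0, 15.0))     # half-width in Da at m/z 1000
```

At m/z 1000, a 15 ppm PME corresponds to ±0.015 Da, more than an order of magnitude narrower than the 0.4 Da FME.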

After identifying the optimal combination of search engines, we investigated the potential benefit of pre-processing peak lists before submitting them to the search engines. We hypothesized that pre-processing of ion mass spectra leads to further improvements in identification performance while significantly decreasing the impact of any sub-optimal parameter settings. Data from the LTQ-FT Ultra were best analyzed with a combination of MS2Denoise and MS2Deisotope in conjunction with a moderate PME of 15 ppm and an FME of 0.6 Da. The optimal pre-processing strategy for data obtained from the Orbitrap was to apply all three filters, namely MS2Denoise, MS2Deisotope and precursorRefine, together with a small PME of 5 ppm and an FME of 0.4 or 0.6 Da.

Lowering the parent or fragment mass error tolerances further did not increase performance.
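MS2Denoise, as we understand the msconvert filter, keeps only the most intense peaks within m/z windows of each MS/MS spectrum. The Python sketch below is a simplified, hypothetical stand-in, not the ProteoWizard implementation: it uses fixed, non-overlapping bins instead of a moving window, does no precursor handling, and its defaults of 6 peaks per 30 Da window are assumptions. It illustrates the general idea that low-intensity peaks, which are more likely to be noise, are dropped before the spectrum is scored.

```python
from collections import defaultdict
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def denoise_top_n(peaks: List[Peak], window_da: float = 30.0, top_n: int = 6) -> List[Peak]:
    """Keep the top_n most intense peaks in each fixed m/z bin of width window_da.

    Simplified, hypothetical stand-in for a denoising filter: bins are
    non-overlapping and start at m/z 0; no precursor handling is done.
    """
    bins = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // window_da)].append((mz, intensity))
    kept: List[Peak] = []
    for bin_peaks in bins.values():
        bin_peaks.sort(key=lambda p: p[1], reverse=True)  # most intense first
        kept.extend(bin_peaks[:top_n])
    return sorted(kept)  # back to ascending m/z order


if __name__ == "__main__":
    spectrum = [(float(i), float(i)) for i in range(1, 9)] + [(45.0, 100.0)]
    print(denoise_top_n(spectrum))
```

Applied to this toy spectrum, with eight peaks in the first 30 Da bin, the function keeps the six most intense of them and leaves the single peak in the next bin untouched.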

The peptides used in this study are so-called crude peptides, i.e. unpurified synthesis products, in which some peptide-based by-products are present at concentrations high enough to be detectable by MS. X!Tandem detected a higher percentage of by-products than did Mascot or OMSSA, because its scoring algorithm does not rely on information about the peak intensity of the precursor ions. Spectra with weaker precursor signals (as expected for by-products) are therefore not penalized and are reported as PSMs.

That these PSMs are not false-positive assignments is supported by two observations: firstly, the other two identification tools, Mascot and OMSSA, also detected some of the by-products at 1% FDR on the PSM level and, secondly, we were able to quantify some of these by-products by label-free quantification with OpenMS. Although we showed that a small proportion of the synthesis by-products is detectable and probably correctly assigned, we could not determine whether these matches were truly correct. We therefore only considered PSMs to be correct when they matched one of the 20,103 synthetic peptide sequences.
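The ground-truth rule above, under which only matches to the 20,103 synthetic sequences count as correct, reduces to a set lookup. The Python sketch below uses a hypothetical `PSM` record and toy sequences; it counts correct PSMs, distinct correct peptides (UIPs) and the remaining matches, such as possible by-products, which are set aside rather than counted as correct. It ignores peptide modifications and the isobaric leucine/isoleucine pair, which a real comparison may need to handle.

```python
from typing import Dict, Iterable, NamedTuple, Set


class PSM(NamedTuple):
    spectrum_id: str  # hypothetical field names
    sequence: str     # plain peptide sequence, modifications stripped


def score_against_ground_truth(psms: Iterable[PSM], synthetic: Set[str]) -> Dict[str, int]:
    """Count PSMs and UIPs that match the known synthetic sequences."""
    correct_psms = 0
    other_psms = 0
    correct_uips: Set[str] = set()
    for psm in psms:
        if psm.sequence in synthetic:
            correct_psms += 1
            correct_uips.add(psm.sequence)
        else:
            # Possibly a synthesis by-product: may be real, but not counted as correct.
            other_psms += 1
    return {
        "correct_psms": correct_psms,
        "correct_uips": len(correct_uips),
        "other_psms": other_psms,
    }


if __name__ == "__main__":
    synthetic = {"AAGLK", "VDNLR"}  # toy stand-ins for the synthetic sequences
    psms = [PSM("s1", "AAGLK"), PSM("s2", "AAGLK"), PSM("s3", "VDNLR"), PSM("s4", "AAGL")]
    print(score_against_ground_truth(psms, synthetic))
```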

Several recent publications have demonstrated that results obtained with pure decoy databases and with entrapment databases differ [39,40]. As it is still unclear how best to select the entrapment database, both in terms of the number of sequences and of which organisms are sufficiently diverged from the organism under study, we refrained from using entrapment databases and opted for the more traditional reverse decoy database [3]. It is also possible to use much wider parent mass windows and instead rely on filtering the results afterwards [41]; this option was not explored. Some parameters had no impact on the results, as exemplified by the PeptideProphet models, despite previous reports on the topic [42]. It is still unclear why our results are not in line with the literature, and this should be explored further. Both mass spectrometers used in this study have relatively similar specifications, and it is unclear how the findings presented here would hold for other types of instruments [43].
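For reference, the reverse decoy approach [3] combined with a 1% PSM-level FDR cut-off can be sketched as follows. This Python sketch is not the procedure used in this study, whose workflows relied on PeptideProphet and iProphet; it uses the simple decoy/target ratio as the FDR estimate, which is one of several estimators in use, and all names and scores are hypothetical. PSMs are ranked by score, and the function returns the lowest cutoff at which the estimated FDR is still within the limit, evaluating tied scores together.

```python
from typing import List, Optional, Tuple


def reverse_decoy(protein: str) -> str:
    """Decoy entry built by reversing a target protein sequence."""
    return protein[::-1]


def score_threshold_at_fdr(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff whose estimated FDR (decoys / targets) is <= max_fdr.

    psms: (score, is_decoy) pairs, higher score = better match.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff: Optional[float] = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate after the last PSM of a group of tied scores.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


if __name__ == "__main__":
    print(reverse_decoy("MPEPTIDEK"))
    psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
    print(score_threshold_at_fdr(psms, 0.01))
```

For the toy ranking above (targets at 10 and 9, a decoy at 8, a target at 7), a 1% limit returns a cutoff of 9, just above the first decoy.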

In summary, the results of our benchmark study showed that the correct choice of parameter settings has a large influence on the identification performance of the search engines we evaluated. The search parameter combinations tested led to identification rates of between 36% and 93% on the PSM level and between 14% and 90% on the UIP level. We assessed the influence exerted by each of the parameters and found that the mass spectrometer type and the allowed mass errors (PME, FME) had the largest impact on variations in the results. Other, untested parameters, such as the size of the protein database, can also influence the results. In general, the search engines performed better when the mass spectrometer had a higher sensitivity, as the Orbitrap did, and when mass errors of 15 ppm (PME) and 0.4 Da (FME) were used. Reducing both mass errors further, especially the PME, led to a decreased identification rate. Other search parameters, such as the number of allowed missed cleavages and the statistical models of PeptideProphet, had only a minimal impact on the results from the database search engines. These results indicate that advanced identification workflows, which combine multiple search engines and pre-process the spectra, were key to increasing the number of correct identifications from the measured data while keeping false-positive hits to a minimum.

Conflict of interest

The authors declare no competing financial interests.

Transparency document

The Transparency document associated with this article can be found in the online version.

Authors’ contribution

AQ and LM conceived the project and wrote the initial manuscript. AQ undertook the experiment planning, sample preparation, workflow definition and data analysis. LE wrote prototypes of the nodes used in the tested workflows. LE also wrote the post-processing tools applied in the pipeline. AB implemented the concept of super-workflows in P-GRADE, which was used to perform the data analyses in this study. HW prepared the OpenMS processing workflow used for the label-free quantification of the benchmark sample. MB provided early-stage access to the pre-processing algorithms in msconvert and wrote the parts of the Methods section explaining these filters. PK supported the project with resources and wrote parts of the manuscript. RA supported and financed this project, gave valuable scientific input and wrote major parts of this manuscript.

Acknowledgments

The authors would like to thank Christopher Paulse for implementing the pre-processing filters in msconvert. We would also like to thank Oliver Rinner for helping prepare the crude peptide mixture used in this paper, Alexander Leitner for fruitful discussions regarding the instrument settings and handling, and Paola Picotti for discussing matters regarding the synthetic peptides. We would also like to extend thanks to the SyBIT project of the SystemsX.ch initiative and the Brutus system administrators for support with computing infrastructure and other IT-related resources.

Appendix A. Supplementary data

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.euprot.2014.10.001.

References

[1] Nesvizhskii AI, Vitek O, Aebersold R. Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat Methods 2007;4(10):787–97.

[2] Matthiesen R. Methods, algorithms and tools in computational proteomics: a practical point of view. Proteomics 2007;7(16):2815–32.

[3] Elias JE, Gygi SP. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 2007;4(3):207–14.

[4] Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B Met 1995;57(1):289–300.

[5] Frank A, Pevzner P. PepNovo: de novo peptide sequencing via probabilistic network modeling. Anal Chem 2005;77(4):964–73.

[6] Lam H, Deutsch EW, Eddes JS, Eng JK, King N, Stein SE, et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 2007;7(5):655–67.

[7] Keller A, Eng J, Zhang N, Li XJ, Aebersold R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 2005;1:0017.

[8] Cottingham K. Manual validation is a hot proteomics topic. Anal Chem 2005;77(5):92.

[9] Deutsch EW, Shteynberg D, Lam H, Sun Z, Eng JK, Carapito C, et al. Trans-proteomic pipeline supports and improves analysis of electron transfer dissociation datasets. Proteomics 2010;10(6):1190–5.

[10] Käll L, Canterbury JD, Weston J, Noble WS, MacCoss MJ. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods 2007;4(11):923–5.

[11] Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 2002;74(20):5383–92.

[12] Tanner S, Shu H, Frank A, Wang LC, Zandi E, Mumby M, et al. InsPecT: identification of posttranslationally modified peptides from tandem mass spectra. Anal Chem 2005;77(14):4626–39.

[13] Eng JK, McCormack AL, Yates JR. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 1994;5(11):976–89.

[14] Colinge J, Masselot A, Giron M, Dessingy T, Magnin J. OLAV: towards high-throughput tandem mass spectrometry data identification. Proteomics 2003;3(8):1454–63.

[15] Tabb DL, Fernando CG, Chambers MC. MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J Proteome Res 2007;6(2):654–61.

[16] Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999;20(18):3551–67.

[17] Craig R, Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 2004;20(9):1466–7.

[18] Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, et al. Open mass spectrometry search algorithm. J Proteome Res 2004;3(5):958–64.

[19] Quandt A, Masselot A, Hernandez P, Hernandez C, Maffioletti S, Appel RD, et al. SwissPIT: a workflow-based platform for analyzing tandem-MS spectra using the Grid. Proteomics 2009;9(10):2648–55.

[20] Shteynberg D, Deutsch EW, Lam H, Eng JK, Sun Z, Tasman N, et al. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol Cell Proteomics 2011;10(12):M111.007690.

[21] Park CY, Klammer AA, Käll L, MacCoss MJ, Noble WS. Rapid and accurate peptide identification from tandem mass spectra. J Proteome Res 2008;7(7):3022–7.

[22] Nahnsen S, Bertsch A, Rahnenführer J, Nordheim A, Kohlbacher O. Probabilistic consensus scoring improves tandem mass spectrometry peptide identification. J Proteome Res 2011;10(8):3332–43.

[23] Tabb DL, Ma ZQ, Martin DB, Ham AJ, Chambers MC. DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring. J Proteome Res 2008;7(9):3838–46.

[24] Klimek J, Eddes JS, Hohmann L, Jackson J, Peterson A, Letarte S, et al. The standard protein mix database: a diverse data set to assist in the production of improved peptide and protein identification software tools. J Proteome Res 2008;7(1):96–103.

[25] Ivanov AR, Colangelo CM, Dufresne CP, Friedman DB, Lilley KS, Mechtler K, et al. Interlaboratory studies and initiatives developing standards for proteomics. Proteomics 2013;13(6):904–9.

[26] Marx H, Lemeer S, Schliep JE, Matheron L, Mohammed S, Cox J, et al. A large synthetic peptide and phosphopeptide reference library for mass spectrometry-based proteomics. Nat Biotechnol 2013;31(6):557–64.

[27] Picotti P, Aebersold R, Domon B. The implications of proteolytic background for shotgun proteomics. Mol Cell Proteomics 2007;6(9):1589–98.


[28] Frank R. The SPOT-synthesis technique. Synthetic peptide arrays on membrane supports – principles and applications. J Immunol Methods 2002;267(1):13–26.

[29] The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res 2009;37(Database issue):D169–74.

[30] Colinge J, Masselot A, Carbonell P, Appel RD. InSilicoSpectro: an open-source proteomics library. J Proteome Res 2006;5(3):619–24.

[31] Pedrioli PG, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, et al. A common open representation of mass spectrometry data and its application to proteomics research. Nat Biotechnol 2004;22(11):1459–66.

[32] SyBIT. http://www.sybit.net

[33] MySQL. http://www.mysql.com

[34] Kessner D, Chambers M, Burke R, Agus D, Mallick P. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008;24(21):2534–6.

[35] Weisser H, Nahnsen S, Grossmann J, Nilse L, Quandt A, Brauer H, et al. An automated pipeline for high-throughput label-free quantitative proteomics. J Proteome Res 2013;12(4):1628–44.

[36] Farkas Z, Kacsuk P. P-GRADE portal: a generic workflow system to support user communities. Future Gener Comput Syst 2011;27(5):454–65.

[37] Picotti P, Rinner O, Stallmach R, Dautel F, Farrah T, Domon B, et al. High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat Methods 2010;7(1):43–6.

[38] Sturm M, Bertsch A, Gröpl C, Hildebrandt A, Hussong R, Lange E, et al. OpenMS – an open-source software framework for mass spectrometry. BMC Bioinform 2008;9:163.

[39] Granholm V, Noble WS, Käll L. On using samples of known protein content to assess the statistical calibration of scores assigned to peptide-spectrum matches in shotgun proteomics. J Proteome Res 2011;10(5):2671–8.

[40] Vaudel M, Burkhart JM, Breiter D, Zahedi RP, Sickmann A, Martens L. A complex standard for protein identification, designed by evolution. J Proteome Res 2012;11(10):5065–71.

[41] Beausoleil SA, Villén J, Gerber SA, Rush J, Gygi SP. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat Biotechnol 2006;24(10):1285–92.

[42] Ma K, Vitek O, Nesvizhskii AI. A statistical model-building perspective to identification of MS/MS spectra with PeptideProphet. BMC Bioinform 2012;13(Suppl. 16):S1.

[43] Colaert N, Degroeve S, Helsens K, Martens L. Analysis of the resolution limitations of peptide identification algorithms. J Proteome Res 2011;10(12):5555–61.