
Available online at www.sciencedirect.com

ScienceDirect

journal homepage: http://www.elsevier.com/locate/euprot

Using synthetic peptides to benchmark peptide identification software and search parameters for MS/MS data analysis

Andreas Quandt (a), Lucia Espona (a, b), Akos Balasko (c), Hendrik Weisser (a), Mi-Youn Brusniak (d), Peter Kunszt (b, 1), Ruedi Aebersold (a, e), Lars Malmström (a, *, 1)

(a) Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland

(b) SyBIT, SystemsX.ch, Switzerland

(c) MTA SZTAKI, Laboratory of Parallel and Distributed Systems, Budapest, Hungary

(d) Institute for Systems Biology, Seattle, USA

(e) Faculty of Science, University of Zurich, Switzerland

Article info

Article history:

Received 4 November 2013; Received in revised form 17 April 2014

Accepted 6 October 2014; Available online 28 October 2014

Keywords:

Mass spectrometry; Data analysis; Classical database search; Synthetic peptides; Search engine

Abstract

Tandem mass spectrometry and sequence database searching are widely used in proteomics to identify peptides in complex mixtures. Here we present a benchmark study in which a pool of 20,103 synthetic peptides was measured and the resulting data set was analyzed using around 1800 different software and parameter set combinations. The results indicate a strong relationship between the performance of an analysis workflow and the applied parameter settings. We present and discuss strategies to optimize parameter settings in order to significantly increase the number of correctly assigned fragment ion spectra and to make the analysis method robust.

© 2014 The Authors. Published by Elsevier B.V. on behalf of European Proteomics Association (EuPA). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

http://dx.doi.org/10.1016/j.euprot.2014.10.001

2212-9685/© 2014 The Authors.

Abbreviations: CPM, classical parametric model; FDR, false discovery rate; FME, fragment mass error; I, MS2Deisotope; LTQ, linear trap quadrupole; FT, Fourier transform; M, Mascot; MC, missed cleavage; N, MS2Denoise; O, OMSSA; PM, parametric model; PMC, parametric model with correction of the negative distribution based on decoy hits; PME, parent mass error; PSMs, peptide spectrum matches; PTMs, post-translational modifications; R, precursorRefine; SPM, semi-parametric model; TPP, Trans-Proteomic Pipeline; UIPs, uniquely identified peptides; X, X!Tandem.

* Corresponding author. Tel.: +41 44 633 2195.

E-mail address: lars@imsb.biol.ethz.ch (L. Malmström).

1 Current address: S3IT, University of Zurich, Switzerland.

1. Introduction

Tandem mass spectrometry (MS/MS) is the method of choice for identifying and quantifying proteins in complex mixtures because of its high throughput, sensitivity and relative ease of use. However, the optimal analysis of the resulting mass spectrometry data is complex and the subject of continuous research. In the most frequent data analysis workflow, fragment ion spectra generated from selected peptide ions

are assigned to their corresponding peptide sequences using software tools commonly referred to as database search engines. Numerous search engines have been developed, each using a different algorithm to maximize the number of peptide-spectrum matches (PSMs) and to assess confidence in the correctness of their assignments [1,2]. Search engines compute a score for each PSM that reflects the quality of the assignment; the user defines a cutoff that optimally separates correct from incorrect assignments. In more recent studies, the score cutoff is selected by a target-decoy strategy [3] to achieve a specific false discovery rate (FDR) [4]. PSMs above the cutoff can be either true positives or false positives, and PSMs below the cutoff can be either false negatives or true negatives. Most search engines use a protein database to define which proteins are expected in the sample, thereby reducing the search space significantly. De novo sequencing algorithms [5] and spectral library search engines [6] use no database or a spectral library, respectively, instead of a protein database. We did not use these types of search engines in this study since they are less widely used than search engines that rely on protein databases. Although database search engines use variations on the same principle, matching a measured spectrum to a theoretical one, their search results differ even if the same data set is searched against the same sequence database [7]. The results differ because the engines generate different fractions of correct and incorrect PSMs, and probably also because they differ in the number and type of correct assignments they make. Distinguishing correctly from incorrectly identified peptides in each data set is therefore essential for evaluating the performance of a data analysis workflow.
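The selection of a score cutoff for a target FDR can be sketched as follows. This is a minimal illustration, assuming each PSM is reduced to a hypothetical (score, is_decoy) pair with higher scores being better, and estimating the FDR at a threshold as the number of decoy hits divided by the number of target hits at or above it; individual tools use variations of this estimate, and the function name is made up for illustration.

```python
from typing import List, Optional, Tuple

def cutoff_at_fdr(psms: List[Tuple[float, bool]],
                  max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest score threshold whose estimated FDR is <= max_fdr.

    psms: (score, is_decoy) pairs; higher scores are better.
    The FDR at threshold t is estimated as
    (#decoy PSMs with score >= t) / (#target PSMs with score >= t).
    Returns None if no threshold satisfies max_fdr.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best: Optional[float] = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # A threshold t accepts every PSM with score >= t, so only evaluate
        # the estimate once all PSMs tied at this score have been counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best

example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(cutoff_at_fdr(example))        # 9.0
print(cutoff_at_fdr(example, 0.5))   # 7.0
```

Taking the lowest threshold that satisfies the bound, rather than stopping at the first violation, corresponds to accepting all PSMs whose q-value is within the limit.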
Most workflows rely either on manual, expert inspection of the search results or on software tools to estimate the proportion of false identifications. The manual assessment of the quality of PSMs is error-prone, dependent on the level of experience of the evaluator, inconsistent between evaluators and time-consuming [8]. For computer-based assessments of the quality of PSMs, two principal strategies are applied. The first uses statistical models, as exemplified by PeptideProphet in the Trans-Proteomic Pipeline (TPP) [7,9] or Percolator [10], and the second uses a target-decoy strategy [3]. PeptideProphet relies on mixture models to integrate different types of information, such as the distribution of search engine scores, the likelihood that assigned peptides are present in the sample or the score difference between the best and second-best assignment of a spectrum. The mixture models are used to convert this information into search engine-independent scores reflecting the probability that a particular PSM has been correctly assigned [11]. The target-decoy strategy uses the decoy part of the search database [4] to estimate how many false assignments are expected among the hits in the target part of the database at a given score cutoff, from which the FDR is calculated. The consistent determination of the FDR for different data sets, provided either by statistical models or by target-decoy strategies, is critical for making meaningful comparisons of different search engines and parameter sets [6,12–18]. To increase the fraction of correctly assigned spectra and to increase confidence in the reported results, the outputs of multiple search engines have been combined [19–22].
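The mixture-model idea behind PeptideProphet can be illustrated with Bayes' rule on a two-component mixture: for a score s, P(correct | s) = π f₊(s) / (π f₊(s) + (1 - π) f₋(s)). The sketch below is not the TPP implementation: it assumes hypothetical normal distributions with fixed parameters for both components and a fixed prior π, whereas PeptideProphet learns its distributions from the data and may use other distribution families.

```python
import math

def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    """Natural log of the normal density N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))

def posterior_correct(score: float, prior_correct: float = 0.3,
                      pos: tuple = (3.0, 1.0), neg: tuple = (0.0, 1.0)) -> float:
    """P(correct | score) for a two-component mixture (Bayes' rule).

    prior_correct: assumed mixing proportion pi of correct PSMs.
    pos, neg: assumed (mean, sd) of the correct / incorrect score densities.
    """
    log_odds = (math.log(prior_correct) - math.log(1.0 - prior_correct)
                + log_normal_pdf(score, *pos) - log_normal_pdf(score, *neg))
    # Numerically stable logistic transform of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

for s in (0.0, 1.5, 3.0):
    print(s, round(posterior_correct(s), 3))
```

Working in log-odds avoids dividing two densities that both underflow to zero for extreme scores.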

Although both methods for assessing the quality of search results are widely used, it remains challenging to objectively evaluate different analysis workflows, search tools and parameter sets and to prove that one performs better than another. The difficulties arise for two major reasons. The first is the absence of a complex sample set of known composition, although this difficulty is increasingly mitigated by the falling cost of synthetic peptides. The second is the inability to systematically assess the influence of various perturbations to the analysis workflow. Most studies that present a new workflow use specific biological samples [23], so-called spike-in samples [24] or increasingly complex synthetic samples [25,26] to evaluate its performance.

Using a biological sample such as a digested cell lysate has the advantage that peptide-to-spectrum matching is carried out under realistic conditions, i.e. on a sample that contains thousands of peptides covering a wide range of signal intensities. However, these studies are limited because the true peptide composition of such samples is unknown, partly due to the presence of peptides that carry post-translational modifications (PTMs) or that result from non-specific and missed cleavages [27]. It is often not possible to control these biological events, which prevents a reliable comparison of the search results. PSMs cannot be categorized as correct or incorrect with confidence because there is no evidence that the matching peptide truly exists in the measured sample, and it is difficult to estimate how closely the generated set of identified peptides matches the maximally achievable set. An alternative to biological samples is the use of spike-in samples.

Spike-in samples usually consist of a mix of a few dozen purified recombinant proteins of known sequence and quantity that are digested with trypsin to generate a peptide mixture of known composition. Such samples are then analyzed either by themselves or in a complex background sample of unknown composition. The presence of PTMs is no longer a concern [28]. However, non-specific and missed cleavages, artifactual modifications generated during sample processing and the presence of proteins introduced as minor contaminants of the purified reference proteins are still possible and, therefore, the peptide composition of such samples is still unknown, even if they are analyzed without added background. In addition, if analyzed without added background, the complexity of such samples does not match the complexity and intensity distribution of most biological samples. The spectra produced often contain fewer signals from peptides co-fragmented with the target peptides and, if these do occur, they show a lower signal intensity than in typical biological samples.

These factors reduce the complexity of the spectrum pattern and lower the threshold at which targeted peptides are correctly matched by a search engine. The second problem, that is, the influence of perturbations (such as variations in the parameter set) on the search results, is often not systematically addressed in related studies because most data analysis workflows are not automated to a level where many different search parameter sets can be easily tested and compared.

Here we present a study that systematically varies parameters and search engines, in which we investigate the impact these have on the sensitivity of the analysis of a data set generated from a complex synthetic sample of more than 20,100 peptides previously observed by MS in human samples. The complexity of the synthetic sample is sufficient for a realistic test and allowed us to accurately estimate both the sensitivity and specificity of the search results. The sample does not, however, mimic biological samples in terms of dynamic range. We propose a strategy to find optimal search parameters and present detailed information on how the various parameters influence the results. The peak list files and the identification results are publicly available in the PeptideAtlas repository (https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/PASS_View?identifier=PASS00090).

2. Methods

2.1. Preparation of the benchmark sample

20,103 unique peptides were synthesized by JPT Peptide Technologies GmbH using SPOT-synthesis [28] (suppl. synthetic peptide sequences.txt), and the crude synthesis products were aliquoted at a concentration of approximately 60 nmol/µl per peptide in 96-well microtiter plates. 5 µl from each well were used to create intermediate pools that were subsequently used to create a pool of all peptides, each at an estimated final concentration of about 3 pmol/µl per peptide.
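The stated final concentration is consistent with a simple mass balance. The sketch below assumes that every well contributed exactly 5 µl at 60 nmol/µl and that no additional buffer was added when the intermediate pools were combined (intermediate pool volumes are not given in the text).

```python
N_PEPTIDES = 20_103          # wells pooled, one peptide per well
STOCK_NMOL_PER_UL = 60.0     # approximate per-peptide stock concentration
ALIQUOT_UL = 5.0             # volume taken from each well

# Amount of each peptide transferred into the pool.
amount_per_peptide_nmol = STOCK_NMOL_PER_UL * ALIQUOT_UL   # 300 nmol
# Total pool volume, assuming no extra dilution during intermediate pooling.
total_volume_ul = N_PEPTIDES * ALIQUOT_UL                   # about 100.5 ml
# Convert nmol/ul to pmol/ul (x 1000).
final_pmol_per_ul = amount_per_peptide_nmol / total_volume_ul * 1000.0
print(f"{final_pmol_per_ul:.2f} pmol/ul per peptide")
```

This gives roughly 2.98 pmol/µl per peptide, in line with the "about 3 pmol/µl" reported.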

2.2. Mass spectrometry

The synthetic peptide pool was measured on two liquid chromatography–tandem mass spectrometry (LC–MS/MS) systems: an LTQ-FT Ultra (Thermo Fisher Scientific) coupled to a Tempo NanoLC (Applied Biosystems) and an LTQ-Orbitrap XL (Thermo Fisher Scientific) with a 1D-NanoLC-Ultra system (Eksigent). Both systems were equipped with a standard nano-electrospray source, and the chromatographic separation was performed with the same buffer system: 97% water, 3% acetonitrile and 0.1% formic acid constituted mobile phase A, while mobile phase B comprised 3% water, 97% acetonitrile and 0.1% formic acid. For each LC–MS/MS run, 2 µl of the peptide pool were injected onto an 11 cm × 0.075 mm I.D. column packed in-house with Magic C18 material (3 µm particle size, 200 Å pore size, Michrom Bioresources). The peptides were loaded onto the LC column at a flow rate of 300 nl/min and eluted with either of the following two gradients. Gradient #1: 0–5 min, 5% phase B; 5–95 min, linear gradient from 5% to 35% phase B; 95–97 min, linear gradient from 35% to 95% phase B; 97–107 min, 95% phase B. Gradient #2: 0–5 min, 5% phase B; 5–125 min, linear gradient from 5% to 35% phase B; 125–127 min, linear gradient from 35% to 95% phase B; 127–137 min, 95% phase B. The ion source and transmission settings for both mass spectrometers were: spray voltage = 2 kV, capillary temperature = 200 °C, capillary voltage = 60 V and tube lens voltage = 135 V. All measurements of the synthetic peptide mix, on both the FT and Orbitrap instruments, were acquired in data-dependent mode, selecting up to five precursors from an MS1 scan (resolution for the FT: 100,000; resolution for the Orbitrap: 60,000) in a range of 350–1600 m/z for collision-induced dissociation (CID).

The ion target values were 1,000,000 (or a maximum fill time of 500 ms) for full scans and 10,000 (or a maximum fill time of 200 ms on the Orbitrap and 300 ms on the FT) for fragment ion scans.

Ions with a single or unknown charge state were automatically rejected. The synthetic peptide mix was measured in duplicate on each mass spectrometer, using each of the two specified gradients, resulting in a total of eight LC–MS/MS data sets. The mass spectrometers were equilibrated using a standard mixture before each sample injection.
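The two gradient programs can be written as piecewise-linear (time, %B) breakpoints. The sketch below transcribes them from the text and interpolates %B at any time point; the breakpoint representation and the function name are illustrative choices, not the format used by the LC control software.

```python
from bisect import bisect_right
from typing import List, Tuple

# (time in min, % phase B) breakpoints transcribed from the Methods text.
GRADIENT_1: List[Tuple[float, float]] = [
    (0.0, 5.0), (5.0, 5.0), (95.0, 35.0), (97.0, 95.0), (107.0, 95.0)]
GRADIENT_2: List[Tuple[float, float]] = [
    (0.0, 5.0), (5.0, 5.0), (125.0, 35.0), (127.0, 95.0), (137.0, 95.0)]

def percent_b(gradient: List[Tuple[float, float]], t: float) -> float:
    """Linearly interpolate %B at time t (min) between breakpoints."""
    times = [p[0] for p in gradient]
    if not times[0] <= t <= times[-1]:
        raise ValueError("time outside the gradient program")
    i = bisect_right(times, t)
    if i == len(times):          # t equals the final breakpoint
        return gradient[-1][1]
    (t0, b0), (t1, b1) = gradient[i - 1], gradient[i]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)

print(percent_b(GRADIENT_1, 50.0), percent_b(GRADIENT_2, 65.0))
```

The 90- and 120-min labels used for the gradients in Table 2 appear to correspond to the length of the main 5–35% B ramp (5–95 min and 5–125 min, respectively).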

2.3. Data analysis

The data analysis workflows rely on searching the acquired fragment ion spectra against a protein sequence database.

We therefore derived a human subset from UniProtKB/Swiss-Prot, version 57.1 [29], by performing the following steps. Firstly, the complete UniProtKB/Swiss-Prot database was converted from its original DAT format into a FASTA file including all known splice variants and isoforms. For this process, we used uniprotdat2fasta.pl, which is part of InSilicoSpectro, a collection of bioinformatics tools for mass spectrometry [30]. Secondly, the subset of human protein sequences was extracted with subsetdb, which is part of the TPP [7]. In the final step, a target-decoy database with reversed decoys was generated using fasta-decoy.pl, another tool included in InSilicoSpectro. After generating the database, the data acquired from the mass spectrometers were converted into peak lists in the mzXML format [31]. This was accomplished using ReAdW.exe, which is part of the TPP. Next, we created seven workflows to search these peak lists against the previously generated database. For three workflows this was done using a single database search engine, specifically Mascot [16] (version 2.3) (M), OMSSA [18] (version 2.1.7) (O) and X!Tandem [17] (version 2009.04.01, k-score plugin) (X); for a further three workflows two database search engines were combined, specifically Mascot-OMSSA (MO), Mascot-X!Tandem (MX) and X!Tandem-OMSSA (XO); and for the final workflow all three database search engines were combined (MXO). In each workflow, the search engine output was converted into the pepXML format and scored using PeptideProphet [7]. The final score distribution model was calculated with iProphet [20], and the result was converted from pepXML into a simple text format (CSV) using pepxml2csv [32]. In addition to the iProphet score, we applied the target-decoy strategy and calculated the corresponding FDR values with fdr2probability [32].
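The general idea of a target-decoy database with reversed decoys can be illustrated in a few lines: every target sequence is kept and a reversed copy is appended under a modified identifier. This is not fasta-decoy.pl; the `DECOY_` prefix, the simple FASTA parser and the function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)

def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def add_reverse_decoys(records: List[Record], prefix: str = "DECOY_") -> List[Record]:
    """Return all targets followed by one reversed-sequence decoy per target."""
    decoys = [(prefix + header, seq[::-1]) for header, seq in records]
    return list(records) + decoys

def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialize records as FASTA with fixed-width sequence lines."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

db = add_reverse_decoys(parse_fasta(">sp|P1|A\nPEPTIDEK\n>sp|P2|B\nACDE\nFGH\n"))
print(to_fasta(db), end="")
```

Reversing whole protein sequences preserves amino acid composition and length, which is one reason reversed decoys are a common choice.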

The resulting data matrix was then imported into a MySQL database [33] to evaluate the performance of each data analysis based on the number of uniquely identified peptides (UIPs) that matched a synthetic peptide sequence at 1% FDR.
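The evaluation criterion, the number of UIPs matching a synthetic sequence at 1% FDR, was computed in MySQL; the query itself is not given. An equivalent Python sketch, assuming a hypothetical list of (peptide sequence, FDR) rows exported from the CSV files and unmodified sequence strings:

```python
from typing import Iterable, Set, Tuple

def count_correct_uips(psms: Iterable[Tuple[str, float]],
                       synthetic: Set[str], max_fdr: float = 0.01) -> int:
    """Count unique peptide sequences that pass the FDR cutoff and are synthetic.

    psms: (peptide_sequence, fdr) rows, one per PSM.
    synthetic: the set of synthesized peptide sequences (ground truth).
    """
    # Collapse PSMs to unique sequences that pass the cutoff.
    passing = {seq for seq, fdr in psms if fdr <= max_fdr}
    # Keep only those that match a synthesized peptide.
    return len(passing & synthetic)

rows = [("PEPTIDEK", 0.001), ("PEPTIDEK", 0.002), ("LESSK", 0.005),
        ("WRONGR", 0.004), ("ACDEK", 0.2)]
print(count_correct_uips(rows, {"PEPTIDEK", "LESSK", "ACDEK"}))
```

Passing sequences that are absent from the synthetic set are simply not counted; this does not make them false identifications (see Section 3.4).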

Each of the seven workflows was tested with 54 different parameter settings to investigate the influence of the following search parameters: the precursor mass error (PME), the fragment mass error (FME), the number of allowed missed cleavages (MC) and the different scoring models of PeptideProphet. Values for the PME were set to 25 ppm, 15 ppm and 5 ppm. We chose values of 0.8 Da, 0.6 Da and 0.4 Da for the FME (Fig. 1). We also allowed a maximum of one or two missed cleavage events and defined carbamidomethylation of cysteine as a fixed modification. For PeptideProphet, the following statistical models were tested: the classical parametric model (CPM), the parametric model (PM) with correction of the negative distribution based on decoy hits (PMC) and the semi-parametric model (SPM). In total, 756 data analyses were produced in order to study the performance variation of single and multi-search engine workflows induced by changing parameter settings (suppl. Fig. S1) (Supplementary figures are available free of charge via the Internet at http://pubs.acs.org).

Fig. 1 – Overview of the experimental design. The diagram shows the steps that were systematically varied. The spectral pre-processing was only carried out on the optimal combination of search engines.
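The counts above follow from a full factorial design. Assuming the 54 settings are the complete grid of 3 PME × 3 FME × 2 MC × 3 PeptideProphet models, and that each workflow was run separately on the data of the two instruments (consistent with the 108 searches per box plot described for Fig. 3), the grid reproduces the 756 analyses. The variable names are illustrative.

```python
from itertools import product

PME_PPM = [25, 15, 5]
FME_DA = [0.8, 0.6, 0.4]
MISSED_CLEAVAGES = [1, 2]
PP_MODELS = ["CPM", "PMC", "SPM"]
WORKFLOWS = ["M", "O", "X", "MO", "MX", "XO", "MXO"]
INSTRUMENTS = ["LTQ-FT Ultra", "LTQ-Orbitrap XL"]   # assumed extra factor

# 3 x 3 x 2 x 3 = 54 parameter settings.
parameter_sets = list(product(PME_PPM, FME_DA, MISSED_CLEAVAGES, PP_MODELS))
# 7 workflows x 2 instruments x 54 settings = 756 analyses.
analyses = list(product(WORKFLOWS, INSTRUMENTS, parameter_sets))
print(len(parameter_sets), len(analyses))
```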

In order to investigate peak list pre-processing, we studied the effects of the following filters, all operating on MS2 spectra: MS2Deisotope (I), which deisotopes; MS2Denoise (N), which denoises; and precursorRefine (R), which refines the precursor ion mass. All three were implemented in msconvert (revision 2238), which forms part of ProteoWizard [34]. Since these filters have to be applied while converting the original RAW format to a peak list file, we used msconvert instead of ReAdW.exe for this investigation. Based on the original instrument files, we generated peak list files in eight different filter combinations: one using no filter, three using a single filter, three where two filters were combined and one where all three filters were combined. The peak list files of each filter combination were tested with the MXO-workflow using the same sequence database and the same 54 parameter settings as were used for the previous data analyses, plus an additional set of parameter combinations (suppl. Figs. S2 and S10). In total, 1074 searches were performed to investigate the effect of pre-processing the peak lists on workflow performance.
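The eight filter combinations are simply all subsets of the three msconvert filters. A short sketch that enumerates them (it only lists filter names and does not call msconvert):

```python
from itertools import combinations

FILTERS = ("MS2Deisotope", "MS2Denoise", "precursorRefine")

def filter_combinations(filters):
    """Every subset of `filters`: none, singles, pairs, ..., all of them."""
    return [combo
            for k in range(len(filters) + 1)
            for combo in combinations(filters, k)]

for combo in filter_combinations(FILTERS):
    print(" + ".join(combo) if combo else "no filter")
```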

Detection of peptide synthesis by-products was carried out by generating a database containing all variants of the synthetic peptide sequences with individual amino acids missing. This database was then used as described above.
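Generating such by-product sequences amounts to enumerating deletion variants of each synthetic peptide. Since one- and two-gap by-products are both reported in Section 3.4, the sketch takes the maximum number of gaps as a parameter; how the variants were written to FASTA and merged with the search database is not described, so this is an illustration only.

```python
from itertools import combinations
from typing import Set

def deletion_variants(peptide: str, max_gaps: int = 1) -> Set[str]:
    """All distinct sequences obtained by deleting 1..max_gaps residues."""
    variants: Set[str] = set()
    for k in range(1, max_gaps + 1):
        for positions in combinations(range(len(peptide)), k):
            drop = set(positions)
            variant = "".join(aa for i, aa in enumerate(peptide) if i not in drop)
            if variant:                      # skip the empty sequence
                variants.add(variant)
    return variants

print(sorted(deletion_variants("ACDK")))
```

Using a set collapses identical variants, for example deleting either A from "AAK" gives the same "AK".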

OpenMS MS1 feature detection and peptide quantification were carried out as described by Weisser et al. [35].

All the data analyses presented in this study were automated and executed using the workflow system P-GRADE [36].

3. Results

The benchmark study we present was performed in three steps. Firstly, we explored the influence of critical search parameters on the performance of individual search engines. Secondly, we tested how combining the results from different search engines affected the identification performance. Thirdly, we measured the influence of pre-processing the peak lists on search performance. The results of the analyses were compared based on the number of true positive UIPs at 1% FDR. A schematic of the workflow used to carry out all three steps is presented in Fig. 1, and a summary of all the results is given in Table 1.

3.1. Single search engine performance

We generated three workflows, each incorporating a single search engine: one based on Mascot (M), one on OMSSA (O) and one on X!Tandem (X). The X!Tandem k-score plug-in was used in combination with X!Tandem throughout. We assessed the performance of each by changing parameters such as the instrument type and the mass errors for the fragment ion (FME) and for the parent ion (PME; see Fig. 1).

Across all the parameter settings tested, the number of identified peptides was consistently higher for data acquired on the Orbitrap instrument than for data from the FT (suppl. Figs. S3–5). This observation should not be interpreted as one type of instrument being superior to the other; it simply means that one instrument was better optimized than the other at the time of acquisition. The other parameters that had a major influence on search performance were the mass errors FME and PME. The results indicated that lowering the FME increased the search performance, but that reducing the PME below 15 ppm caused a loss of performance (suppl. Figs. S6 and S7). The data for the single database search engine workflows are shown in Fig. 2. Each box plot in Fig. 2 consists of 54 independent search results, one per parameter set. The figure shows the number of correct UIPs. Two of the 756 data analyses failed because the qualitative requirements for modeling the negative peptide distribution in PeptideProphet were not fulfilled.

The best performance among the single search engine workflows was achieved with OMSSA (Table 1). A total of 6489 correct UIPs were found when a small FME of 0.4 Da and a moderate PME of 15 ppm were used. Non-optimal parameter settings, such as larger mass errors (FME: 0.8 Da, PME: 25 ppm), caused a drop in the number of correct UIPs to 614, the lowest identification rate in this study. Too small a mass error was also sub-optimal: a PME of 5 ppm resulted in 5900 correct UIPs, compared with 6489 at a PME of 15 ppm. Another parameter with a noticeable impact on performance was the type of mass spectrometer used. Applying the same mass error settings resulted in a drop from 6489 correct UIPs (Orbitrap) to 6097 correct UIPs (FT).
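A ppm tolerance such as the PME scales with the mass being matched. The sketch below, assuming monoisotopic masses in Da and a hypothetical 1500 Da peptide, converts the three PME values into absolute window half-widths; whether a given engine applies the window to m/z or to neutral mass is engine-specific and not described here.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million, relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6

def within_ppm(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm

def ppm_to_da(mass: float, tol_ppm: float) -> float:
    """Half-width in Da of a +/- tol_ppm window around `mass`."""
    return mass * tol_ppm * 1e-6

# Hypothetical 1500 Da peptide: window half-widths for the three PME values.
for tol in (25, 15, 5):
    print(f"{tol:>2} ppm -> +/-{ppm_to_da(1500.0, tol):.4f} Da")
```

By contrast, the FME values (0.4–0.8 Da) are absolute windows that do not scale with mass.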

The Mascot (M) workflow identified a maximum of 6401 correct UIPs (Table 1) over all parameters tested. In contrast to OMSSA, Mascot performed better with a less strict PME of 25 ppm in conjunction with a small FME of 0.4 Da, although the performance difference between a PME of 15 ppm and 25 ppm was negligible, with 6391 (15 ppm) compared with 6401 (25 ppm) correct UIPs. Other parameters had a smaller effect on the performance.

The third search engine we tested was X!Tandem. Compared with the O- and M-workflows, X!Tandem was more robust to changes of the mass errors FME and PME. In particular, varying the FME did not have a significant impact. For example, the maximal performance of 6219 correct UIPs was achieved when applying a PME of 15 ppm and did not change when the FME was varied between 0.4, 0.6 and 0.8 Da. The same finding applied to data analyses with a PME of 25 ppm and 5 ppm, which resulted in 6170 and 5856 correct UIPs, respectively, and these were unaffected by variation of the FME within the range tested (Table 1).

3.2. Performance of multiple search engine searches

We combined the output of multiple search engines to investigate whether the combined output could improve search performance. All two-way combinations (MO, MX, XO) and the combination of all three (MXO) were tested using the same parameter settings used for the single search engines. The search engine results were combined using iProphet [20]. The results are presented in Fig. 2 and Table 1. The data showed an improved performance for multi-search engine workflows compared with single-engine workflows.

Table 1 – Performance of the various tool combinations. Each tool combination was tested using several parameter sets.

Search engines | Abbreviation | Max/min of correct UIPs | Max of correct UIPs (% of 20,103) | Most influential parameter | Best parameters (PME/FME/MC/PM) | Note
--- | --- | --- | --- | --- | --- | ---
OMSSA | O | 6489/614 | 32.28 | FME | 15/0.4/1/SPM | Best single engine
Mascot | M | 6401/5330 | 31.84 | PME/FME | 25/0.4/1/SP | Most robust engine w.r.t. MS
X!Tandem | X | 6219/5244 | 30.94 | MS | 15/0.4–0.8/1/CPM | Most robust engine w.r.t. PME/FME
Mascot/OMSSA | MO | 6674/5240 | 33.20 | PME/FME | 15/0.4/1/CPM |
Mascot/X!Tandem | MX | 6595/5330 | 32.81 | PME/FME | 25/0.4/1/PMC |
X!Tandem/OMSSA | XO | 6769/5510 | 33.67 | PME/FME | 15/0.4/1/CPM |
Mascot/X!Tandem/OMSSA | MXO | 6814/5846 | 33.90 | PME/FME | 25/0.4/1/CPM | Best search combination
Denoise | N | 6807/5890 | 33.86 | PME/FME | 15/0.4/2/CPM |
Deisotope | I | 6802/5844 | 33.84 | PME/FME | 15/0.4/1/CPM |
Refine | R | 6910/5821 | 34.37 | PME/FME | 5/0.4/2/PMC | Effective for LTQ-Orbitrap
Denoise/Deisotope | NI | 6828/5919 | 33.97 | PME/FME | 15/0.4–0.6/1/CPM |
Refine/Denoise | NR | 6910/5916 | 34.37 | PME/FME | 5/0.4/2/PMC | Best pre-processing for FT
Refine/Deisotope | IR | 6909/5869 | 34.37 | PME/FME | 5/0.4/1/PMC |
Refine/Denoise/Deisotope | NIR | 6938/5945 | 34.51 | MS | 5/0.4–0.6/1/CPM | Best pre-processing for LTQ-Orbitrap

PME in ppm; FME in Da; MC, maximum number of missed cleavages; PM, PeptideProphet model; MS, mass spectrometer. The filter rows (N to NIR) refer to the MXO-workflow applied to pre-processed peak lists.

Fig. 2 – The number of correctly identified peptides per workflow is shown in a box plot representation. A total of 54 parameter sets and two mass spectrometer types were used for each workflow. Single database search engines, Mascot (M), OMSSA (O) or X!Tandem (X), were used in addition to combinations of two or three search engines: Mascot-OMSSA (MO), Mascot-X!Tandem (MX), X!Tandem-OMSSA (XO) and Mascot-X!Tandem-OMSSA (MXO). The upper whisker indicates the number of peptides identified using an optimal parameter set, and the red line marks the mean number of peptides identified for the parameter sets tested within a workflow. The box itself circumscribes the search results between the first and the third quartile. The larger the spread, the more sensitive the search was to the parameters. The green daggers mark measurements that are outside the range between the first and third quartile. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
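The percentage column of Table 1 appears to be the maximum number of correct UIPs divided by the 20,103 synthesized peptides; recomputing it from the transcribed maxima reproduces every listed value. A short sketch, with illustrative names:

```python
N_SYNTHETIC = 20_103  # peptides in the synthetic pool

# Maximum correct UIPs per workflow or filter combination, from Table 1.
MAX_CORRECT_UIPS = {
    "O": 6489, "M": 6401, "X": 6219, "MO": 6674,
    "MX": 6595, "XO": 6769, "MXO": 6814, "NIR": 6938,
}

def percent_of_pool(uips: int, pool_size: int = N_SYNTHETIC) -> float:
    """Correct UIPs as a percentage of all synthesized peptides."""
    return 100.0 * uips / pool_size

for name, uips in MAX_CORRECT_UIPS.items():
    print(f"{name:>3}: {percent_of_pool(uips):.2f}%")
```

Read this way, even the best workflow identifies roughly a third of the synthesized peptides.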

The effects of combining search results are apparent from the XO-workflow, which combined the fairly robust X!Tandem engine with the more sensitive OMSSA tool. The combined workflow outperformed the single-engine results of OMSSA (maximum of 6489 correct UIPs) and X!Tandem (maximum of 6219 correct UIPs), with a maximum of 6769 correct UIPs under optimal search parameters (PME = 15 ppm and FME = 0.4 Da). The XO-workflow achieved 5510 correct UIPs using the least optimal parameters tested, which was better than both the X-workflow with 5254 correct UIPs and the O-workflow with 614 correct UIPs (Table 1). In the XO-workflow, X!Tandem largely compensated for the poor performance of OMSSA in cases in which sub-optimal parameters were used, and the performance increased above the level of a single search engine when optimal parameter settings were applied. Similar trends were observed for the other two two-engine combinations.

We also tested the combination of the three database search engines in a single workflow (MXO), which resulted in 6814 correct UIPs, the highest identification rate of the seven workflows, when a moderate PME of 15 ppm and a small FME of 0.4 Da were used (Table 1). Additionally, with sub-optimal search parameter settings, such as a PME of 5 ppm and an FME of 0.8 Da, a minimum of 5846 correct UIPs was scored. This minimum is clearly higher than that of any other workflow.

The resulting spread of 14.2% was the lowest of all workflows and therefore indicated that the MXO-workflow was the least dependent on the search parameter settings. Fig. S12 displays pseudo-receiver operating characteristic curves for the optimal parameter settings of each of the seven search engine combinations. Fig. S11 shows a Venn diagram comparing the results of the three individual search engines with the MXO-workflow.
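The spread is not defined explicitly in the text. One definition that reproduces the reported 14.2% for MXO is (max − min)/max of the correct UIPs across parameter sets. Assuming that definition and the Table 1 values, a sketch confirms that MXO has the lowest spread of the seven workflows:

```python
# (max, min) correct UIPs per workflow, transcribed from Table 1.
TABLE1_MAX_MIN = {
    "O": (6489, 614), "M": (6401, 5330), "X": (6219, 5244),
    "MO": (6674, 5240), "MX": (6595, 5330), "XO": (6769, 5510),
    "MXO": (6814, 5846),
}

def relative_spread(max_uips: int, min_uips: int) -> float:
    """Assumed definition of spread: (max - min) / max, in percent."""
    return 100.0 * (max_uips - min_uips) / max_uips

spreads = {wf: relative_spread(*mm) for wf, mm in TABLE1_MAX_MIN.items()}
most_robust = min(spreads, key=lambda wf: spreads[wf])
print(most_robust, round(spreads[most_robust], 1))  # MXO 14.2
```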

3.3. Effect of peak list pre-processing

We investigated three types of peak list pre-processing: deisotoping (filter = MS2Deisotope), denoising (filter = MS2Denoise) and refining the mass of the precursor ion (filter = precursorRefine). We tested the performance of these filters individually, in combinations of two and all three together, and compared these results with results generated without a filter. The data were processed with the MXO-workflow and the same 54 parameter settings we used previously (suppl. Figs. S1 and S2), plus additional parameter sets investigating narrower parent and fragment mass errors (suppl. Fig. S10). The results of the pre-processing benchmark are presented in Fig. 3.

Pre-processing with MS2Denoise and MS2Deisotope showed no performance improvement compared with the 'no filter' setting. This is supported by the 6807 and 6802 correct UIPs achieved at PME = 15 ppm and FME = 0.4 Da, respectively, in comparison with the 6805 correct UIPs obtained with no filter. However, the data indicated that both filters partly compensated for sub-optimal settings. The minimal number of correct UIPs increased if a PME of 5 ppm with an FME of 0.4 Da was used (no filter: n = 5812; MS2Denoise: n = 5890; MS2Deisotope: n = 5844) (Fig. 3, Table 1). The same was observed when both filters were combined: the maximal identification rate of 6828 correct UIPs (PME = 15 ppm, FME = 0.6 Da) was no real improvement over the 'no filter' settings. In contrast, the lowest identification rate, for which a PME of 5 ppm and an FME of 0.4 Da were applied, increased from 5812 correct UIPs (no filter) to 5919 correct UIPs (MS2Deisotope + MS2Denoise).

The use of precursorRefine had a noticeable impact on performance (Fig. 3, suppl. Fig. S8), as the maximal number of correct UIPs increased from 6805 (no filter) to 6910 (precursorRefine) (Table 1). Moreover, the top identification rate was achieved when small mass errors were used (PME = 5 ppm, FME = 0.4 Da, instrument = Orbitrap). The lowest identification rate, 5821 correct UIPs, was also obtained with small mass errors (PME = 5 ppm, FME = 0.6 Da, instrument = FT). For data from the Orbitrap, identification rates increased and the impact of parameter settings was reduced. Specifically, the range narrowed to 3.2%, less than half the 8.13% range observed for the 'no filter' settings. We did not observe the same positive effect for data obtained from the FT, where the results were more comparable to those obtained when no filters were used (suppl. Fig. S9).

We tested the combination of all three pre-processing filters, and this combination performed best, with 6938 correct UIPs when using PME = 5 ppm and FME = 0.4 Da (Table 1). Using smaller mass errors did not lead to further improvements.

3.4. Benchmark sample

We pooled 20,103 synthetic peptides to create the test sample (see Section 2). Analyzing the sample on the FT and on the Orbitrap generated about 11,500 spectra per file with the 90-min gradient and around 13,000 spectra per file with the 120-min gradient (Table 2).

The use of a synthetic pool allowed us to classify assignments above the chosen cutoff as correct if they matched the synthetic peptide sequences. However, the type of synthetic peptides we used limited our ability to draw further conclusions about potentially incorrect PSMs. Although we could certify a PSM as being correct, it was not possible to reject PSMs as truly false when they did not match a synthetic peptide sequence.

Each of the 20,103 crude synthetic peptides that we pooled to generate our sample data set was produced using SPOT synthesis, which means that traces of by-products could also be found in the pool. By-products are dominated by peptide sequences that have one or more gaps (missing amino acids) at certain positions in the targeted peptide sequence [37]. The quantity of these by-products is usually significantly lower than that of the targeted peptide sequences [37].

This does not mean, however, that these peptide variants cannot be detected by mass spectrometry. Depending on the workflow, up to 9% of all uniquely identified peptides (UIPs) at the 1% FDR level matched one of these by-products (up to 6.15% matched a one-gap by-product and up to 2.7% matched a two-gap by-product). We performed two label-free quantification experiments using OpenMS [38], one with the default parameters and one with relaxed quality criteria for feature detection. In both cases, we detected features for the original synthetic peptides (default parameters: n = 5843; less restrictive parameters: n = 6000) and for by-products with up to two gaps (default parameters: n = 6027; less restrictive parameters: n = 6188).

Fig. 3 – Box plots of the search results (UIPs) for different pre-filtering options. Each box plot contains the results from 108 database searches (54 different parameter settings, each for data files acquired with two different mass spectrometers). The pre-filtering tools were MS2Denoise (N), MS2Deisotope (I) and precursorRefine (R), which were applied either individually or in various combinations (NI, NR, IR, NIR). The green daggers represent data points outside the range between the first and third quartile. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

4. Discussion

Here, we present a large-scale study investigating strategies to improve MS/MS peptide identification and to make database searches more robust. A complex pool of synthetic peptides was created in order to comprehensively benchmark the different analysis workflows based on the number of correctly identified peptide sequences from this well-characterized but complex sample. The use of synthetic peptides as a reference sample addressed issues of other sample types, such as complexity and certainty about the true positive matches. This allowed an improved interpretation of the benchmark results that was independent of estimations based on decoy databases. We investigated the impact of 108 different search parameter settings on the identification performance of three search engines, Mascot, X!Tandem and OMSSA, the effect of combining multiple search engines in a single approach and the processing of ion mass spectra prior to database searching, totaling around 1800 distinct combinations.

Firstly, we tested and optimized the parameter settings for each search engine individually. Overall, the data showed that Mascot, X!Tandem and OMSSA performed at a comparable level if optimal search parameters were used. However, there were distinct performance differences with other parameter sets. OMSSA achieved the best performance of the search engines tested but was the most sensitive to the parameter settings. Mascot was more robust in the face of parameter changes but was negatively affected when a small PME was selected and did not benefit to the same extent from the increased sensitivity of an Orbitrap compared with an FT. X!Tandem benefited most from the increased instrument performance and was least affected by parameter changes. It did, however, identify fewer correct UIPs under optimal conditions than the other search engines. This is likely a result of X!Tandem's algorithm, which benefits from finding more than one peptide from each protein. Generally, the choice of optimal parameter settings was more important and had a larger influence on the number of correctly identified PSMs than did the choice of the search engine itself.

Table 2 – Sample properties for measurements of the synthetic peptide data set.

Measurement | Mass spectrometer | LC gradient (min) | MS1 scans | MS2 scans
--- | --- | --- | --- | ---
1 | LTQ-FT Ultra | 90 | 4899 | 11,491
2 | LTQ-FT Ultra | 90 | 4890 | 11,752
3 | LTQ-FT Ultra | 120 | 6570 | 12,963
4 | LTQ-FT Ultra | 120 | 6509 | 12,760
5 | LTQ-Orbitrap XL | 90 | 4036 | 11,445
6 | LTQ-Orbitrap XL | 90 | 4040 | 11,244
7 | LTQ-Orbitrap XL | 120 | 5228 | 13,819
8 | LTQ-Orbitrap XL | 120 | 5206 | 13,795

We continued our benchmark study by testing combinations of the different search engines in a single workflow. Here, the data supported two main conclusions: first, integrating the results from multiple search engines using iProphet reduced the impact of sub-optimal parameter settings and, second, it increased the number of UIPs identified. The MXO combination (Mascot, X!Tandem and OMSSA) performed better than the other workflows tested, independent of the applied parameter set.
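To illustrate how the seven workflows that can be built from three engines (M, X, O, MX, MO, XO, MXO) relate to one another, the following Python sketch enumerates the combinations and pools per-engine identifications by a simple set union. The per-engine peptide sets are invented toy data, and the union is only a stand-in: iProphet [20] combines evidence through a statistical model rather than by merging lists, so this sketch does not reproduce its behavior.

```python
from __future__ import annotations

from itertools import combinations

# Toy per-engine sets of correctly identified peptides (invented data).
# M = Mascot, X = X!Tandem, O = OMSSA.
uips_by_engine: dict[str, set[str]] = {
    "M": {"P1", "P2", "P3", "P4"},
    "X": {"P1", "P2", "P5"},
    "O": {"P1", "P2", "P3", "P5", "P6"},
}


def pooled_uips(engines: tuple[str, ...]) -> set[str]:
    """Naive pooling by set union (a stand-in, NOT iProphet's statistical model)."""
    pooled: set[str] = set()
    for engine in engines:
        pooled |= uips_by_engine[engine]
    return pooled


# Enumerate all non-empty engine combinations: 3 singles, 3 pairs, 1 triple.
workflow_sizes = {
    "".join(combo): len(pooled_uips(combo))
    for k in range(1, len(uips_by_engine) + 1)
    for combo in combinations(uips_by_engine, k)
}

for name, count in workflow_sizes.items():
    print(f"{name:>3}: {count} UIPs")
```

With these toy sets, adding Mascot to the XO pair gains only one peptide, showing how an extra engine yields diminishing returns when the engines largely find the same peptides.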

However, the data also indicated that performance does not increase linearly with the number of search engines added to a workflow. Under optimal conditions, the XO workflow performed almost as well as the MXO workflow, suggesting that these combinations, used optimally, identify close to the maximal number of UIPs detectable in a dataset with current technology. In general, each workflow reached its maximum identification rate when a PME of 15–25 ppm was used in conjunction with a small FME of 0.4 Da. Lowering the PME, e.g. to 5 ppm, decreased the number of correct identifications for all workflows tested.
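Because the PME is specified in ppm while the FME is given in Da, it can help to convert the relative tolerance into an absolute window. The Python sketch below assumes the tolerance is applied symmetrically around the theoretical precursor value (m/z or neutral mass, depending on the engine); the helper names are hypothetical.

```python
def ppm_to_da(value: float, ppm: float) -> float:
    """Half-width of a relative (ppm) tolerance window, in the units of `value`."""
    return value * ppm * 1e-6


def within_ppm(observed: float, theoretical: float, ppm: float) -> bool:
    """True if `observed` lies within +/- `ppm` of `theoretical`."""
    return abs(observed - theoretical) <= ppm_to_da(theoretical, ppm)


# Absolute half-widths at a precursor value of 1000 for the PMEs discussed above.
for ppm in (5, 15, 25):
    print(f"{ppm:>2} ppm at 1000.0 -> +/- {ppm_to_da(1000.0, ppm):.3f}")
```

At a value of 1000, a 15 ppm PME corresponds to a window of +/- 0.015, more than 25 times narrower than a 0.4 Da fragment tolerance; a 5 ppm PME shrinks it to +/- 0.005.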

After identifying the optimal combination of search engines, we investigated the potential benefit of pre-processing peak lists before submitting them to the search engines. We postulated that pre-processing the ion mass spectra would further improve identification performance while significantly decreasing the impact of any sub-optimal parameter settings. Data from the LTQ-FT Ultra were best analyzed with a combination of MS2Denoise and MS2Deisotope in conjunction with a moderate PME of 15 ppm and an FME of 0.6 Da. The optimal pre-processing strategy for data obtained from the Orbitrap was to apply all three filters, namely MS2Denoise, MS2Deisotope and precursorRefine, together with a small PME of 5 ppm and an FME of 0.4 or 0.6 Da. Further lowering the parent or fragment mass error did not increase performance.
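The general idea behind a window-based denoising filter such as MS2Denoise can be sketched as keeping only the most intense peaks within each m/z region of a fragment spectrum. The Python sketch below is a simplification and not the msconvert implementation: it uses fixed, non-overlapping bins instead of a sliding window, the function name is hypothetical, and the default bin width and peak count are chosen only for illustration. Deisotoping and precursor refinement are not shown.

```python
from __future__ import annotations


def denoise_top_n(
    peaks: list[tuple[float, float]],
    window_da: float = 30.0,
    top_n: int = 6,
) -> list[tuple[float, float]]:
    """Keep the `top_n` most intense (m/z, intensity) peaks in each fixed-width bin.

    Simplified illustration: bins are non-overlapping, unlike a sliding window.
    """
    bins: dict[int, list[tuple[float, float]]] = {}
    for mz, intensity in peaks:
        # Assign each peak to the bin that contains its m/z value.
        bins.setdefault(int(mz // window_da), []).append((mz, intensity))
    kept: list[tuple[float, float]] = []
    for members in bins.values():
        # Sort each bin by intensity, highest first, and keep the top_n peaks.
        kept.extend(sorted(members, key=lambda p: p[1], reverse=True)[:top_n])
    return sorted(kept)  # restore m/z order


spectrum = [(100.0, 5.0), (101.0, 50.0), (102.0, 1.0), (103.0, 20.0), (140.0, 3.0)]
print(denoise_top_n(spectrum, window_da=30.0, top_n=2))
```

Here the four peaks between m/z 100 and 103 fall into one bin, so only the two most intense survive, while the isolated peak at m/z 140 is kept as the only member of its bin.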

The peptides used in this study are so-called crude peptides, in which some peptide-based by-products of the synthesis are present at concentrations high enough to be detectable by MS. X!Tandem detected a higher percentage of by-products than did Mascot or OMSSA because its scoring algorithm does not rely on information about the peak intensity of the precursor ions. Spectra with weaker signals (as expected for by-products) are thus not penalized and are reported as PSMs.

That these PSMs are not false-positive assignments is supported by two observations: first, the other two identification tools, Mascot and OMSSA, also detected some of the by-products at 1% FDR on the PSM level and, second, we were able to quantify some of these by-products by label-free quantification with OpenMS. Although we demonstrated that a small proportion of the synthesis by-products is detectable and probably correctly assigned, we could not determine whether these matches were truly correct. We therefore only considered PSMs to be correct when they matched one of the 20,103 synthetic peptide sequences.
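The reference-based correctness criterion can be expressed in a few lines. In this Python sketch, a PSM counts as correct only if its peptide sequence is in the set of synthetic sequences, and UIPs are the distinct correct sequences. The PSM record, the metric names and the toy sequences are hypothetical, and the two fractions shown are illustrative rather than the exact identification-rate definitions used in the study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """Hypothetical minimal peptide-spectrum match record."""
    spectrum_id: str
    peptide: str  # unmodified sequence assigned by the search engine


def score_against_reference(psms: list[PSM], reference: set[str]) -> dict[str, float]:
    """Count PSMs whose sequence is in the synthetic reference set."""
    correct = [psm for psm in psms if psm.peptide in reference]
    correct_uips = {psm.peptide for psm in correct}  # distinct correct sequences
    return {
        "correct_psms": len(correct),
        "correct_uips": len(correct_uips),
        "fraction_psms_correct": len(correct) / len(psms) if psms else 0.0,
        "fraction_reference_found": len(correct_uips) / len(reference) if reference else 0.0,
    }


reference = {"PEPTIDEK", "ELVISLIVESK", "SAMPLER"}  # stand-in for the 20,103 sequences
psms = [
    PSM("s1", "PEPTIDEK"),
    PSM("s2", "PEPTIDEK"),
    PSM("s3", "ELVISLIVESK"),
    PSM("s4", "AAAAAAK"),  # not in the reference: counted as incorrect
]
result = score_against_reference(psms, reference)
print(result)
```

In practice, modifications and isobaric residues (I/L) would need to be normalized before the sequence comparison.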

Several recent publications have demonstrated that results obtained with pure decoy databases differ from those obtained with entrapment databases [39,40]. Because it is still unclear how best to select an entrapment database, both in terms of the number of sequences and in terms of which organisms are sufficiently diverged from the organism under study, we refrained from using entrapment databases and opted for the more traditional reversed decoy database [3]. It is also possible to use much wider parent mass windows and instead rely on filtering the results [41]; this option was not explored. Some parameters had no impact on the results, as exemplified by the PeptideProphet models, despite previous reports on the topic [42]. Why our results are not in line with the literature remains unclear and should be explored further. Both mass spectrometers used in this study have relatively similar specifications, and it is unclear whether the findings presented here would hold for other types of instruments [43].
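For context, a minimal sketch of the target-decoy idea [3]: decoy protein sequences are obtained by reversal, and one common estimator of the FDR at a score cutoff is the number of decoy PSMs divided by the number of target PSMs at or above it. The function names, the higher-is-better score convention and the toy scores are assumptions for illustration; search engines and the Trans-Proteomic Pipeline apply their own, more refined statistics.

```python
from __future__ import annotations


def reversed_decoy(protein_sequence: str) -> str:
    """Decoy protein obtained by reversing the target sequence."""
    return protein_sequence[::-1]


def threshold_at_fdr(psms: list[tuple[float, bool]], max_fdr: float = 0.01) -> float | None:
    """Lowest score cutoff whose estimated FDR (decoys / targets) is <= max_fdr.

    `psms` holds (score, is_decoy) pairs; higher scores are assumed better.
    Returns None if no cutoff satisfies the bound.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best: float | None = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate a cutoff after all PSMs tied at this score are counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


toy = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True), (5.0, True)]
print(reversed_decoy("MPEPTIDEK"))
print(threshold_at_fdr(toy, max_fdr=0.01), threshold_at_fdr(toy, max_fdr=0.5))
```

Returning the lowest passing cutoff, rather than stopping at the first failure, is equivalent to keeping all PSMs whose q-value is at or below the FDR bound.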

In summary, the results of our benchmark study showed that the correct choice of parameter settings has a large influence on the identification performance of the search engines we evaluated. The search parameter combinations tested led to identification rates of between 36% and 93% on the PSM level and between 14% and 90% on the UIP level. We assessed the influence exerted by each parameter and found that the mass spectrometer type and the allowed mass errors (PME, FME) had the largest impact on the variation in the results. Other parameters that were not tested, such as the size of the protein database, can also influence the result. In general, the search engines performed better when the mass spectrometer had a higher sensitivity, as with the Orbitrap, and when mass errors of 15 ppm (PME) and 0.4 Da (FME) were used. Reducing both mass errors further, especially the PME, decreased the identification rate. Other search parameters, such as the number of allowed missed cleavages and the statistical models of PeptideProphet, had only a minimal impact on the results of the database search engines. These results demonstrate that advanced identification workflows, such as combining multiple search engines and pre-processing the spectra, were key to extracting more correct identifications from the measured data while keeping false-positive hits to a minimum.
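One simple way to summarize how strongly each parameter influences the outcome is to average the identification rate over all runs that share a given parameter value and report the spread between the best and worst value. The Python sketch below does this on invented toy numbers; it is a crude marginal analysis that ignores interactions between parameters, and it is not necessarily how the influence of each parameter was assessed in this study.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean

# Invented toy results: (parameter settings, identification rate). Not study data.
runs = [
    ({"pme_ppm": 5, "fme_da": 0.4, "missed_cleavages": 1}, 0.60),
    ({"pme_ppm": 5, "fme_da": 0.4, "missed_cleavages": 2}, 0.62),
    ({"pme_ppm": 15, "fme_da": 0.4, "missed_cleavages": 1}, 0.80),
    ({"pme_ppm": 15, "fme_da": 0.4, "missed_cleavages": 2}, 0.81),
    ({"pme_ppm": 15, "fme_da": 0.8, "missed_cleavages": 1}, 0.55),
    ({"pme_ppm": 15, "fme_da": 0.8, "missed_cleavages": 2}, 0.56),
]


def parameter_spread(results: list[tuple[dict[str, float], float]]) -> dict[str, float]:
    """Max minus min of the mean rate across the levels of each parameter."""
    rates: dict[str, dict[float, list[float]]] = defaultdict(lambda: defaultdict(list))
    for settings, rate in results:
        for name, value in settings.items():
            rates[name][value].append(rate)
    spread = {}
    for name, levels in rates.items():
        level_means = [mean(values) for values in levels.values()]
        spread[name] = max(level_means) - min(level_means)
    return spread


# Print parameters from most to least influential on the toy data.
for name, delta in sorted(parameter_spread(runs).items(), key=lambda kv: -kv[1]):
    print(f"{name:<17} {delta:.3f}")
```

With an unbalanced design like this toy grid, marginal means partly confound the effects of different parameters, so a factorial model would be the more careful choice for real data.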

Conflict of interest

The authors declare no competing financial interests.

Transparency document

The Transparency document associated with this article can be found in the online version.

Authors’ contribution

AQ and LM conceived the project and wrote the initial manuscript. AQ undertook the experiment planning, sample preparation, workflow definition and data analysis. LE wrote prototypes of the nodes used in the tested workflows. LE also wrote the post-processing tools applied in the pipeline. AB implemented the concept of super-workflows in P-GRADE, which was used to perform the data analyses in this study. HW prepared the OpenMS processing workflow used for the label-free quantification of the benchmark sample. MB provided early-stage access to the pre-processing algorithms in msconvert and wrote parts of the Methods section explaining these filters. PK supported the project with resources and wrote parts of the manuscript. RA supported and financed this project, gave valuable scientific input and wrote major parts of this manuscript.

Acknowledgments

The authors would like to thank Christopher Paulse for implementing the pre-processing filters in msconvert. We would also like to thank Oliver Rinner for helping prepare the crude peptide mixture used in this paper, Alexander Leitner for fruitful discussions regarding instrument settings and handling, and Paola Picotti for discussions regarding the synthetic peptides. We would also like to extend our thanks to the SyBIT project of the SystemsX.ch initiative and the Brutus system administrators for support with computing infrastructure and other IT-related resources.

Appendix A. Supplementary data

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.euprot.2014.10.001.

References

[1] Nesvizhskii AI, Vitek O, Aebersold R. Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat Methods 2007;4(10):787–97.

[2] Matthiesen R. Methods, algorithms and tools in computational proteomics: a practical point of view. Proteomics 2007;7(16):2815–32.

[3] Elias JE, Gygi SP. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 2007;4(3):207–14.

[4] Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B Met 1995;57(1):289–300.

[5] Frank A, Pevzner P. PepNovo: de novo peptide sequencing via probabilistic network modeling. Anal Chem 2005;77(4):964–73.

[6] Lam H, Deutsch EW, Eddes JS, Eng JK, King N, Stein SE, et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 2007;7(5):655–67.

[7] Keller A, Eng J, Zhang N, Li XJ, Aebersold R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 2005;1:0017.

[8] Cottingham K. Manual validation is a hot proteomics topic. Anal Chem 2005;77(5):92.

[9] Deutsch EW, Shteynberg D, Lam H, Sun Z, Eng JK, Carapito C, et al. Trans-proteomic pipeline supports and improves analysis of electron transfer dissociation data sets. Proteomics 2010;10(6):1190–5.

[10] Käll L, Canterbury JD, Weston J, Noble WS, MacCoss MJ. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods 2007;4(11):923–5.

[11] Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 2002;74(20):5383–92.

[12] Tanner S, Shu H, Frank A, Wang LC, Zandi E, Mumby M, et al. InsPecT: identification of posttranslationally modified peptides from tandem mass spectra. Anal Chem 2005;77(14):4626–39.

[13] Eng JK, McCormack AL, Yates JR. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 1994;5(11):976–89.

[14] Colinge J, Masselot A, Giron M, Dessingy T, Magnin J. OLAV: towards high-throughput tandem mass spectrometry data identification. Proteomics 2003;3(8):1454–63.

[15] Tabb DL, Fernando CG, Chambers MC. MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J Proteome Res 2007;6(2):654–61.

[16] Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999;20(18):3551–67.

[17] Craig R, Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 2004;20(9):1466–7.

[18] Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, et al. Open mass spectrometry search algorithm. J Proteome Res 2004;3(5):958–64.

[19] Quandt A, Masselot A, Hernandez P, Hernandez C, Maffioletti S, Appel RD, et al. SwissPIT: an workflow-based platform for analyzing tandem-MS spectra using the Grid. Proteomics 2009;9(10):2648–55.

[20] Shteynberg D, Deutsch EW, Lam H, Eng JK, Sun Z, Tasman N, et al. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol Cell Proteomics 2011;10(12):M111.007690.

[21] Park CY, Klammer AA, Käll L, MacCoss MJ, Noble WS. Rapid and accurate peptide identification from tandem mass spectra. J Proteome Res 2008;7(7):3022–7.

[22] Nahnsen S, Bertsch A, Rahnenführer J, Nordheim A, Kohlbacher O. Probabilistic consensus scoring improves tandem mass spectrometry peptide identification. J Proteome Res 2011;10(8):3332–43.

[23] Tabb DL, Ma ZQ, Martin DB, Ham AJ, Chambers MC. DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring. J Proteome Res 2008;7(9):3838–46.

[24] Klimek J, Eddes JS, Hohmann L, Jackson J, Peterson A, Letarte S, et al. The standard protein mix database: a diverse data set to assist in the production of improved peptide and protein identification software tools. J Proteome Res 2008;7(1):96–103.

[25] Ivanov AR, Colangelo CM, Dufresne CP, Friedman DB, Lilley KS, Mechtler K, et al. Interlaboratory studies and initiatives developing standards for proteomics. Proteomics 2013;13(6):904–9.

[26] Marx H, Lemeer S, Schliep JE, Matheron L, Mohammed S, Cox J, et al. A large synthetic peptide and phosphopeptide reference library for mass spectrometry-based proteomics. Nat Biotechnol 2013;31(6):557–64.

[27] Picotti P, Aebersold R, Domon B. The implications of proteolytic background for shotgun proteomics. Mol Cell Proteomics 2007;6(9):1589–98.


[28] Frank R. The SPOT-synthesis technique. Synthetic peptide arrays on membrane supports – principles and applications. J Immunol Methods 2002;267(1):13–26.

[29] The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res 2009;37(Database issue):D169–74.

[30] Colinge J, Masselot A, Carbonell P, Appel RD. InSilicoSpectro: an open-source proteomics library. J Proteome Res 2006;5(3):619–24.

[31] Pedrioli PG, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, et al. A common open representation of mass spectrometry data and its application to proteomics research. Nat Biotechnol 2004;22(11):1459–66.

[32] SyBIT. http://www.sybit.net

[33] MySQL. http://www.mysql.com

[34] Kessner D, Chambers M, Burke R, Agus D, Mallick P. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008;24(21):2534–6.

[35] Weisser H, Nahnsen S, Grossmann J, Nilse L, Quandt A, Brauer H, et al. An automated pipeline for high-throughput label-free quantitative proteomics. J Proteome Res 2013;12(4):1628–44.

[36] Farkas Z, Kacsuk P. P-GRADE portal: a generic workflow system to support user communities. Future Gener Comput Syst 2011;27(5):454–65.

[37] Picotti P, Rinner O, Stallmach R, Dautel F, Farrah T, Domon B, et al. High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat Methods 2010;7(1):43–6.

[38] Sturm M, Bertsch A, Gröpl C, Hildebrandt A, Hussong R, Lange E, et al. OpenMS – an open-source software framework for mass spectrometry. BMC Bioinformatics 2008;9:163.

[39] Granholm V, Noble WS, Käll L. On using samples of known protein content to assess the statistical calibration of scores assigned to peptide-spectrum matches in shotgun proteomics. J Proteome Res 2011;10(5):2671–8.

[40] Vaudel M, Burkhart JM, Breiter D, Zahedi RP, Sickmann A, Martens L. A complex standard for protein identification, designed by evolution. J Proteome Res 2012;11(10):5065–71.

[41] Beausoleil SA, Villén J, Gerber SA, Rush J, Gygi SP. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat Biotechnol 2006;24(10):1285–92.

[42] Ma K, Vitek O, Nesvizhskii AI. A statistical model-building perspective to identification of MS/MS spectra with PeptideProphet. BMC Bioinformatics 2012;13(Suppl. 16):S1.

[43] Colaert N, Degroeve S, Helsens K, Martens L. Analysis of the resolution limitations of peptide identification algorithms. J Proteome Res 2011;10(12):5555–61.
