IN SILICO ANALYSIS OF THIOTEMPLATE MULTIDOMAIN GENE CLUSTERS IN SACCHAROMONOSPORA AZUREA
Andrea Valasek 1 , Kitti Csepregi 1 , Zsuzsanna Tóth 1 , Ildikó Kerepesi 1 , Benjamin Frey 1 , Ágota Pénzes 1,2 , Ákos Juhász 2 , Balázs Horváth 3 , István Nagy 3 , Csaba Fekete 1
1 Department of General and Environmental Microbiology, University of Pécs, Hungary
2 PannonPharma Ltd, Pécsvárad, Hungary
3 Bay Zoltán Nonprofit Ltd., Szeged, Hungary
Introduction
A wide range of biologically active products are synthesized by thiotemplate modular systems (TMSs), including polyketide synthases (PKSs), non-ribosomal peptide synthetases (NRPSs) and hybrid PKS-NRPS enzymes. TMSs are multifunctional proteins that are structurally organized in modules, and each module consists of individual domains with distinct functions. Variation of the domains within the modules gives rise to the structural diversity observed in the resulting products. These metabolites are also functionally diverse and include antibiotics, immunosuppressive agents and antitumor drugs.
The increasing use of antibiotics has led to a growing number of antibiotic-resistant pathogens. As antimicrobial resistance becomes more widespread, the need for new anti-infective agents is more urgent than ever.
Methods
To provide broad insights into the molecular basis of secondary metabolite biosynthesis, S. azurea strain SZMC 14600 was investigated. Genome sequencing was performed by combining cycled ligation sequencing on the SOLiD 3Plus system with 454 FLX pyrosequencing. The annotated draft genome sequence has been deposited at DDBJ/EMBL/GenBank under accession number AHBX00000000. In the present study we provide a comprehensive overview of a partial NRPS/PKS hybrid gene cluster that is located on the AHBX00000215 contig (100 kbp) and has been named the sah gene cluster. Complementary to the structural genomics, and to gain insight into gene expression, digital transcriptome profiling (RNA-seq) was performed not only on S. azurea SZMC 14600 but also on S. azurea NA-128 (DSMZ 44631), a producer of biologically non-active metabolite(s).
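The poster does not state how the RNA-seq read counts were normalized before the two strains were compared. As a minimal sketch, assuming a length- and depth-based normalization such as RPKM (reads per kilobase per million mapped reads) followed by a log2 ratio, the calculation could look like the following. The function names, the pseudocount and all numbers in the demo are hypothetical and are not taken from this study.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    Assumption: RPKM is used here only as an example of a normalization;
    the source does not name the method that was actually applied.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """log2 ratio of two normalized expression values.

    The pseudocount (an assumption) avoids division by zero for silent genes.
    """
    return math.log2((expr_a + pseudocount) / (expr_b + pseudocount))


if __name__ == "__main__":
    # Hypothetical read counts and library sizes, for illustration only.
    gene_length_bp = 2000
    strain_a = rpkm(1000, gene_length_bp, 10_000_000)
    strain_b = rpkm(100, gene_length_bp, 8_000_000)
    print(strain_a, strain_b, log2_fold_change(strain_a, strain_b))
```

Normalizing by gene length matters here because the sah proteins differ in size by more than an order of magnitude, so raw read counts alone would favor the longest genes.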
Results
Summary and Conclusions
In our work, a chromosomal region spanning over 300 kbp that encodes different thiotemplate multidomain proteins was analyzed. Here, however, only a partial hybrid NRPS-PKS gene cluster of ~100 kbp, called sah, is presented. The organization of genes and domains, their proposed functions and the predicted product were identified by computational analysis.
We hope that our structural genomics efforts will form a foundation for subsequent research steps, such as intelligent drug design and target discovery.
Figure 1. A, Genetic organisation of the AHBX00000215 contig of S. azurea SZMC 14600. Hypothetical NRPS/PKS hybrid genes are marked in dark green. Potential regulatory, tailoring and resistance regions are labeled in light green. B, Domain organisation of the corresponding proteins SahA-SahH. Green shading indicates regions encoding PKS modules. Orange shading indicates regions encoding NRPS modules. KS: β-ketoacyl-ACP synthase; AT: acyltransferase; DH: dehydratase; PP: peptidyl carrier protein; KR: β-ketoacyl-ACP reductase; A: adenylation; C: condensation; MT: methyltransferase. Intermodular linkers are marked with black squares. C, Predicted core structure of the product, based on the composition of the sah gene cluster. Figures were generated with the antiSMASH tool.
[Figure 1, panel B, protein labels: SahA (type I trans-AT PKS); SahB (NRPS-like protein); SahC (PKS/NRPS-like protein); SahD (hybrid PKS-NRPS); SahE (type I trans-AT PKS); SahF (PKS/NRPS-like protein); SahG (NRPS). Module labels in this part of the panel: Module 13, Module 14.]
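The Figure 1 legend distinguishes PKS modules from NRPS modules by their domain content. A minimal sketch of that distinction is given below, assuming the common rule of thumb that a KS or AT domain marks a PKS module and a C or A domain marks an NRPS module. The function name and this rule are illustrative assumptions, not the antiSMASH algorithm, and the example domain lists are only loosely based on the strings visible in the figure.

```python
# Domain abbreviations follow the Figure 1 legend.
PKS_MARKERS = {"KS", "AT"}   # assumption: these indicate a PKS module
NRPS_MARKERS = {"C", "A"}    # assumption: these indicate an NRPS module


def classify_module(domains):
    """Classify a module from its list of domain abbreviations.

    Returns "PKS", "NRPS", "hybrid" or "unknown". Shared domains such as
    PP, DH, KR and MT do not influence the result.
    """
    present = set(domains)
    has_pks = bool(present & PKS_MARKERS)
    has_nrps = bool(present & NRPS_MARKERS)
    if has_pks and has_nrps:
        return "hybrid"
    if has_pks:
        return "PKS"
    if has_nrps:
        return "NRPS"
    return "unknown"


if __name__ == "__main__":
    for text in ("KS AT DH KR PP", "C A MT PP", "KS PP", "PP"):
        print(text, "->", classify_module(text.split()))
```

A set-based test is enough for this purpose because the classification depends only on which domains occur, not on their order within the module.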
No. | Gene product | Size (aa) | Protein homolog (accession) | ID (%) | Proposed function
1 | Sah1 | 518 | Saccharomonospora cyanea NA-134 (ZP_09747286.1) | 85 | apolipoprotein N-acyltransferase
2 | Sah2 | 257 | Saccharomonospora cyanea NA-134 (ZP_09747285.1) | 92 | glycosyl transferase
3 | Sah3 | 114 | Saccharomonospora marina XMU15 (ZP_09741799.1) | 94 | hypothetical protein
4 | Sah4 | 213 | Saccharomonospora cyanea NA-134 (ZP_09747283.1) | 92 | thymidine kinase
5 | Sah5 | 482 | Saccharomonospora cyanea NA-134 (ZP_09747282.1) | 95 | major facilitator superfamily permease
6 | Sah6 | 260 | Saccharomonospora cyanea NA-134 (ZP_09746722.1) | 78 | cobyric acid synthase
7 | Sah7 | 167 | Saccharomonospora viridis DSM 43017 (YP_003134009.1) | 71 | hypothetical protein
8 | Sah8 | 256 | Nocardiopsis sp. FU40 (AEP40926.1) | 63 | streptomycin biosynthesis operon regulator
9 | Sah9 | 261 | Cordyceps militaris CM01 (EGX87936.1) | 38 | FkbM family methyltransferase
10 | Sah10 | 332 | Mycobacterium sp. JDM60111 (YP_004525157.1) | 57 | hypothetical protein
11 | Sah11 | 313 | Saccharopolyspora erythraea NRRL2338 (YP_001103877.1) | 58 | alkanesulfonate monooxygenase
12 | Sah12 | 398 | Saccharopolyspora erythraea NRRL2338 (YP_001107923.1) | 54 | cytochrome P450-like enzyme
13 | SahA | 6029 | Streptomyces albus (ABS90475.1) | 46 | beta-ketoacyl synthase
14 | SahB | 1164 | Streptomyces coelicoflavus ZG0656 (EHN77489.1) | 48 | amino acid adenylation
15 | Sah15 | 377 | Streptomyces bingchenggensis BCW-1 (YP_004967910.1) | 71 | hypothetical protein
16 | SahC | 893 | Streptomyces bingchenggensis BCW-1 (YP_004967911.1) | 52 | beta-ketoacyl synthase
17 | SahD | 3019 | Streptomyces bingchenggensis BCW-1 (YP_004967901.1) | 52 | amino acid adenylation protein
18 | SahE | 5212 | Streptomyces bingchenggensis BCW-1 (YP_004958923.1) | 47 | mixed polyketide synthase/non-ribosomal peptide synthetase
19 | SahF | 890 | Bacillus amyloliquefaciens CAU-B946 (YP_005130401.1) | 43 | polyketide synthase
20 | Sah20 | 721 | Streptomyces bingchenggensis BCW-1 (YP_004967899.1) | 41 | polyketide synthase
21 | SahG | 1977 | Brevibacillus brevis NBRC 100599 (YP_002773465.1) | 48 | non-ribosomal synthetase
22 | Sah22 | 1133 | Streptomyces bingchenggensis BCW-1 (YP_004967897.1) | 55 | hypothetical protein
23 | Sah23 | 465 | Stackebrandtia nassauensis DSM 44728 (YP_003513810.1) | 37 | cytochrome P450
24 | Sah23 | 461 | Stackebrandtia nassauensis DSM 44728 (YP_003513810.1) | 37 | cytochrome P450
25 | Sah25 | 317 | Thermomonospora curvata DSM 43183 (YP_003299696.1) | 57 | alcohol dehydrogenase zinc-binding domain-containing protein
26 | Sah26 | 349 | Mycobacterium avium 104 (YP_881053.1) | 41 | dihydrodipicolinate reductase N-terminus domain-containing protein
27 | Sah27 | 490 | Saccharomonospora cyanea NA-134 (ZP_09747387.1) | 83 | subtilisin-like serine protease
28 | Sah28 | 413 | Mycobacterium sp. MCS (YP_638505.1) | 68 | amidohydrolase
29 | Sah29 | 230 | Saccharomonospora glauca K62 (ZP_09688970.1) | 84 | hypothetical protein
30 | Sah30 | 399 | Saccharomonospora cyanea NA-134 (ZP_09747320.1) | 81 | isochorismate synthase family protein
31 | SahH | 550 | Saccharomonospora cyanea NA-134 (ZP_09747319.1) | 89 | peptide arylation enzyme
32 | Sah32 | 223 | Saccharomonospora cyanea NA-134 (ZP_09747318.1) | 84 | isochorismate hydrolase
33 | Sah33 | 81 | Saccharomonospora glauca K62 (ZP_09690350.1) | 80 | aryl carrier domain-containing protein
34 | Sah34 | 298 | Saccharomonospora cyanea NA-134 (ZP_09747316.1) | 91 | siderophore-interacting protein
35 | Sah35 | 320 | Saccharomonospora cyanea NA-134 (ZP_09747315.1) | 85 | ABC-type Fe3+-hydroxamate transport system
36 | Sah36 | 337 | Saccharomonospora cyanea NA-134 (ZP_09747314.1) | 84 | ABC-type Fe3+-siderophore transport system
37 | Sah37 | 342 | Saccharomonospora cyanea NA-134 (ZP_09747313.1) | 87 | transporter permease
38 | Sah38 | 244 | Saccharomonospora cyanea NA-134 (ZP_09747312.1) | 87 | ABC-type cobalamin/Fe3+-siderophore transport system
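[Figure 1, panel B, continued: SahH (NRPS-like protein). Module labels in this part of the panel: 5, 6, 7, 8, 15, 16, 17. Domain order as extracted: A PP (Module 5); KS PP (Module 6); KS AT DH cMT PP C A PP C A PP (Modules 7 and 8); C A MT PP C (Module 16).]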
Fig. 2. Comparative transcriptome profiles of the AHBX00000215 contig of S. azurea SZMC 14600 (green) and the AGIU02000015 contig of S. azurea NA-128 (DSMZ 44631) (blue). Predicted ORFs of the contig are numbered (1-39). Proposed functions for individual ORFs are summarized in Table 1. Significant differences are marked with an asterisk (p < 0.005).
[Plot axes: x-axis, ORF number (1-39); y-axis, normalized gene expression level, drawn with axis breaks (tick labels 0, 25, 50, 75, 100, 800, 4200, 4500).]
[Figure 1, panel B, continued. Domain order as extracted for Modules 1-4: KS AT DH KR PP KS AT DH KR PP KS AT DH KR PP KS DH PP. Domain order as extracted for Modules 9-12: KS AT KR PP KS DH KR PP KS PP KS PP AT KR PP KS. Further module labels: Module 14, Module 17.]
[Figure 1, panel A: map of the Saccharomonospora azurea AHBX00000215 contig (1 bp to 99165 bp) showing the genes sahA, sahB, sahC, sahD, sahE, sahF, sahG and sahH together with ORFs 1-12, 15, 20, 22-30 and 32-38.]
Table 1. Deduced functions of ORFs in the sah gene cluster on the AHBX00000215 contig of S. azurea SZMC 14600. Listed protein homologues were identified in GenBank by BlastP. ID indicates percentage identity (%). The color-coding indicates the main types of genes: thiotemplate genes, regulation, tailoring enzymes, and resistance.
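To put the sizes in Table 1 into perspective, the following sketch sums the lengths of the eight lettered thiotemplate proteins (SahA-SahH) and converts a protein size into an approximate coding length. The protein sizes are copied from Table 1. The helper names and the assumption of one stop codon per gene are illustrative and do not come from the poster.

```python
# Sizes in amino acids, copied from Table 1 (lettered Sah proteins only).
THIOTEMPLATE_PROTEINS = {
    "SahA": 6029,
    "SahB": 1164,
    "SahC": 893,
    "SahD": 3019,
    "SahE": 5212,
    "SahF": 890,
    "SahG": 1977,
    "SahH": 550,
}


def coding_length_bp(size_aa):
    """Approximate coding length: three bases per residue plus one stop codon."""
    return 3 * size_aa + 3


def total_size_aa(proteins):
    """Sum of the protein sizes in amino acids."""
    return sum(proteins.values())


def largest_protein(proteins):
    """Name of the protein with the greatest size."""
    return max(proteins, key=proteins.get)


if __name__ == "__main__":
    print("total residues:", total_size_aa(THIOTEMPLATE_PROTEINS))
    print("largest protein:", largest_protein(THIOTEMPLATE_PROTEINS))
    total_bp = sum(coding_length_bp(n) for n in THIOTEMPLATE_PROTEINS.values())
    print("approximate coding length (bp):", total_bp)
```

Comparing the approximate coding length with the 99165 bp length of the contig gives a rough impression of how much of the region is occupied by the thiotemplate genes.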
This work was supported in part by the grant of SROP-4.2.1.B-10/2/KONV-2010-0002, SROP-4.2.2/B-10/1-2010-0029 Developing the South-Transdanubian Regional University Competitiveness, Baross grant DA07-DA TECH 07-2008-0045, and NKTH Teller program grant OMFB-00441/2007.