This study demonstrates that TTIs are an excellent source for whole-genome analysis using the UHD SNP array technique. TTIs can therefore be considered a valuable surrogate source for tumor tissue. In particular, producing TTIs during histopathological preparation adds no laborious procedure beyond the snap freezing of tumor tissue, which should be done in any case. The number of cells on TTIs varied greatly between samples, resulting in a high variation in the DNA quantity extracted per slide, as already reported in other studies [6] [7] [8] [10]. Compared to fresh or fresh frozen solid tumor pieces, the cell number on TTIs can be low, which can result in a limited DNA quantity. According to the Qubit data, the lowest dsDNA concentration that could be processed successfully was 0.92 ng μL⁻¹ (sample NB10; 4.59 ng dsDNA per slide). Even so, the data quality measure, i.e. the MAPD value, of sample NB10 was still below the 0.25 threshold, indicating good quality. Overall, the MAPD values show that the quality of TTI SNP profiles was comparable to that of SNP profiles from fresh or fresh frozen tumor samples. Thus, storage of the slides for up to 12 years did not significantly influence the SNP array data quality. This eliminates the need for technically challenging liquid nitrogen storage systems. The DNA of two samples (NB46 and NB47) could not be amplified by PCR. The reason for this failure remains unclear, as the quality and quantity of these DNA samples were comparable to those of samples for which the SNP array procedure worked satisfactorily.
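The quality check described above can be sketched in a few lines. This is a minimal illustration, assuming that MAPD is computed as the median of the absolute differences between log2 ratios of adjacent markers and that values below 0.25 pass, as stated in the text. The function names and the example values are hypothetical and are not taken from the study.

```python
from statistics import median


def mapd(log2_ratios):
    """Median of absolute differences between log2 ratios of adjacent markers."""
    if len(log2_ratios) < 2:
        raise ValueError("need at least two markers")
    diffs = [abs(b - a) for a, b in zip(log2_ratios, log2_ratios[1:])]
    return median(diffs)


def passes_qc(log2_ratios, threshold=0.25):
    """True if the MAPD value is below the quality threshold (0.25 in the text)."""
    return mapd(log2_ratios) < threshold


# Hypothetical log2 ratios for six adjacent markers (made-up values).
example = [0.02, -0.05, 0.10, 0.01, -0.08, 0.04]
example_mapd = mapd(example)
example_ok = passes_qc(example)
```

A noisy profile produces large jumps between neighbouring markers and therefore a high MAPD, which is why a single threshold can serve as a per-sample quality gate.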
In the current study we sequenced and annotated the whole genome of the gerbil-adapted H. pylori strain B8 (accession numbers: FN598874 for the genome, FN665651 for the plasmid). The genome analysis suggests that this type I strain may have acquired the virulence mechanism encoded in the cag-PAI, as well as other adjacent unknown genes, via horizontal gene transfer. This may have occurred during microevolution, optimizing the adaptation to its hostile niche, the gastric mucosa. The relatively large number of singletons, the existence of length-variable genes, and the large PZ may already reflect an adaptation process to the gerbil stomach. Altogether, this pathogen may use its dynamic pool of genetic variants, representing sufficient genetic diversity to allow H. pylori to occupy all of the potential niches in the stomach.
lication (oriC) was comparable in structure to that of Bacillus subtilis, with two regions containing DnaA boxes. Similar prophages were identified in the genomes of both C. chauvoei strains, which also harbored hemolysin and bacterial spore formation genes. A CRISPR type I-B system with limited variation in the repeat number was identified. Genes related to the sporulation and germination process were homologous to those of the Clostridia cluster I group, but novel variations in regulatory genes were identified, indicating strain-specific control of regulatory events. Phylogenomics showed a higher relatedness to C. septicum than to the other genomes sequenced so far for species of the genus Clostridium. Comparative genome analysis of three C. chauvoei circular genome sequences revealed few inversions and translocations in locally collinear blocks (LCBs). The species genome also shows a large number of genes involved in proteolysis, genes for glycosyl hydrolases and metal iron transportation genes, which are presumably involved in virulence and survival in the host. Three conserved flagellar genes (fliC) were identified in each of the circular genomes. In conclusion, this is the first comparative analysis of circular genomes for the species C. chauvoei, enabling insights into genome composition and virulence factor variation.
researchers to coordinate genomics activities and pool resources to achieve common goals in molecular breeding of Brassica crops. The primary aim of this initiative is the provision of freely available genetic resources for Brassica genome analysis, including mapping populations, integrated genetic maps, DNA marker sequences, genomic libraries, genomic sequences and gene expression data. Some of the genetic mapping data and one of the mapping populations described in this volume have already been deposited as public resources with the MBGP, via the internet portal http://www.brassica.info, and recently we have begun two new research projects that aim to contribute significant quantities of new genomic and transcriptomic data along with new genotyping tools and germplasm to this platform. Included in the resources we are developing are a large set of new SNP markers developed using next-generation sequencing techniques, along with a new high-density SNP map that will be sequence-annotated to the B. rapa genome sequence and to an ultradeep-expression map of B. napus seed development. Furthermore we are generating a substantial collection of over 450 genetically diverse B. napus inbred lines for association genetic studies. Included in this genotype diversity set are gene-bank core collections of winter and spring oilseed rape, fodder rape and swede varieties, a substantial set of genetically diverse, modern 00-quality winter oilseed rape varieties, and a collection of exotic lines containing rare alleles from Eastern European and Asian oilseed and vegetable accessions. The diversity set will be genotyped by colleagues at the Max Planck Institute for Breeding Research in Cologne with a large set of genome-wide microsatellite markers, and with a new, public, high-density SNP chip that is being developed by partners at the NRC-Plant Biotechnology Institute in Saskatoon, Canada.
Together, this material and data will represent the most comprehensive public resource for B. napus association genetics that has been developed to date. As phenotypic, metabolomic and transcriptomic data for the diversity population is accumulated over the coming years and decades, it is anticipated that this association genetics platform will play a key role in the elucidation of important complex traits in B. napus, for example oil content, seed yield and quality traits along with resistance to biotic and abiotic stress factors.
Using Southern blot analysis and DNA microarrays, we have established that rtxA is present only in weakly pathogenic Y. enterocolitica subsp. palearctica strains and is absent in highly pathogenic and nonpathogenic bioserotypes. Reverse transcription analysis was carried out to determine the transcription of the genes of the RTX cluster. The positive transcripts indicate in vivo transcription of all four ORFs (ymp1, rtxH, rtxC, and rtxA) as a single mRNA. RtxA was immunoprecipitated and subsequently detected using an antibody raised against a 79 kDa subfragment of RtxA that was purified from E. coli as a 6xHis-tagged protein. Two bands were detected by the RtxA-specific antibody that were not detected by the pre-immune serum. Together, these bands are consistent with the predicted full-length size of RtxA. These data demonstrate that the RtxA protein is synthesized and exported to culture supernatants, but perhaps in small quantities.
At the outset of sequencing the human genome, scientists indicated the need to sequence the genomes of other organisms, especially those that can be experimentally manipulated in the laboratory. These organisms, known as model organisms, included bacteria (most notably E. coli), the yeast S. cerevisiae, the fruit fly D. melanogaster, the roundworm C. elegans, and the laboratory mouse M. musculus. The objectives were several: (1) to represent through these model organisms a wide range of life forms, (2) to elucidate the function of human DNA segments, such as genes, by performing experiments in these organisms using the segments themselves (by inserting them into the model genome) or their counterparts already existing in the model genome, and (3) to put forward an explanation for unexpected genomic puzzles, such as gene structure and genome organization, that have taken scientists by surprise. Many of these genomes became available before the human genome project was finished. Accordingly, comparative sequence analysis of the human genome with those of these model organisms has been applied as an immensely valuable tool to identify regions of similarity and difference among genomes. These comparisons have already provided critical clues about the structure and function of many genomic segments, which, in many respects, have helped approach the objectives. However, these model organisms, especially because they include few advanced ones, have not been able to provide enough information to satisfactorily achieve the objectives. Thus, more genomes are being sequenced.
The genome of bat adenovirus 2 was sequenced and analyzed. It is similar in size (31,616 bp) to the genomes of bat adenovirus 3 and canine adenoviruses 1 and 2. These four viruses are monophyletic and share an identical genome organization, with one E3 gene and four E4 genes unique to this group among the mastadenoviruses. These findings suggest that canine adenoviruses may have originated by interspecies transfer of a vespertilionid bat adenovirus.
tissues. Various procedures have been developed to increase sensitivity and reduce the amount of RNA required. One strategy is target amplification by in vitro transcription (148). In addition, several rounds of in vitro transcription can be combined with cDNA synthesis to enhance the amplification even further (149). Using these protocols, it is even possible to profile the transcripts of a single cell (150). Another strategy is post-hybridisation amplification using labelled antibodies or molecules carrying large numbers of fluorophors (151). Several studies have used target-amplification techniques to compare the expression profiles of defined cell populations extracted from tissue sections by laser-capture microdissection. However, suitable controls are required to ensure that amplification has not introduced significant experimental bias into the target preparation. This problem has been particularly evident in the expression profiling of tumour samples. In the case of solid tumours, obtaining pure populations of tumour cells for microarray analysis would require microdissection. However, a recent study using grossly dissected breast cancer specimens has demonstrated a way to circumvent the problem of sample heterogeneity (152). Expression profiles from whole solid tumours can be compared to profiles from potential untransformed infiltrating cell types, such as lymphocytes or endothelial cells, to identify a subset of genes with expression patterns that are specific to the tumour cells. Subsequent data analysis and sample clustering can then be carried out only on this “intrinsic gene subset”, which in the case of the recent study was sufficient for tumour classification (152).
In medical microbiology research, the main interest lies in the comparison of pathogenic and non-pathogenic organisms. This enables the identification of virulence factors and antibiotic resistances, which, in turn, aids targeted drug design and reverse vaccinology. To this end, genomic comparisons of related bacterial strains aimed at identifying pathogenic and virulent traits were carried out as soon as enough sequence material was available (Bolotin et al., 2004; Eppinger et al., 2004; Brzuszkiewicz et al., 2006). Owing to rapidly decreasing sequencing costs, analyses at the single-nucleotide level for point mutations within many virtually identical outbreak strains became affordable (Niemann et al., 2009; Ford et al., 2011). Such analyses can reveal the mutations responsible for enhanced or decreased fitness and prevalence. Access to multiple genomes of a pathogen exposes all of its potential antigens and enables prioritization in reverse vaccinology. Potentially widespread and potent antigen targets are selected via comparative bioinformatic analysis, engineered in the laboratory, and subsequently tested in preclinical and clinical studies (Tettelin, 2009; Sette and Rappuoli, 2010). Several in silico tools are available for reverse vaccinology, e.g. the online databases VIOLIN (Vaccine Investigation and Online Information Network) (Xiang et al., 2008) and Vaxign (He et al., 2010). The benefit of this relatively new approach is evident in several studies (De Groot and Rappuoli, 2004; Giuliani et al., 2006; Liu et al., 2009).
Parapoxvirus whole-genome analyses date from 2004 and were carried out by Delhon and colleagues. In addition to ORFV SA00 (goat) and IA82 (sheep), they examined a representative of the species BPSV (AR02) and showed that a total of 132 (ORFV) and 133 genes (BPSV) are encoded. The genomes of the PCPV strains VR634 and F00.120R, which were sequenced later, carry 131 and 134 genes, respectively (Hautaniemi et al., 2010). In the center of the DNA genome of all members of the subfamily Chordopoxvirinae there is a set of 88 conserved genes that occur in the same order and orientation. These core genes, which show homology to corresponding genes in the vaccinia virus genome, mainly encode essential proteins of the transcription and replication machinery (Gubser, 2004; Mercer et al., 2006; Upton et al., 2003). By contrast, non-essential and often genus- or species-specific genes are located in the left and right terminal regions of the DNA. They are mainly important for the virulence of the virus or play an important role in the interaction between virus and host. For example, proteins involved in regulating the host immune response are also encoded here (Seet et al., 2003a). Individual genes or open reading frames (ORFs) are mostly non-overlapping and are separated by only very short non-coding stretches. The parapoxvirus genome shows remarkable plasticity. When the virus is adapted to cell lines or passaged in cell culture, deletions and translocations of genes or entire genome regions can occur early on, after only a few passages (Cottone et al., 1998; Hautaniemi et al., 2011).
de novo and template-guided assembly methods (44x and 77x), each set of contigs covered 100 % of the reference sequence and showed similar distribution patterns of contigs and repeat regions. Both sets of assembled contigs were visualized using the Integrative Genomics Viewer (IGV) (https://www.broadinstitute.org/igv/) and a consensus sequence with a length of 160,928 bp was exported (Norton-Consensus). Variant calling was performed using the filtered reads as input to the Genome Analysis Toolkit (GATK, https://www.broadinstitute.org/gatk/) and the tool "Unified Genotyper" with the –glm switch to find both SNPs and indels in the 'Norton' cp genome in comparison to the 'Maxxa' cp reference sequence. The unfiltered output contained 426 variants, 73 of which are indels. GATK uses a modified phred quality score included in the variant output file, and a cutoff value of 1,000 was used to avoid false positive variant calls. This reduced the number of SNPs to 147 and did not affect the number of indels. Ninety-nine of the 147 SNPs fall into intergenic regions; 48 are located in coding regions. About half of the 73 indels were 1 bp long; only 2 indels were longer than 10 bp. Overall, the predicted variants are located only in the SSC and LSC regions of the chloroplast genome, as shown in the Figure, which was produced with the program GenomeVx (http://wolfe.ucd.ie/GenomeVx/). A similar distribution of variants has been shown in other plant species, including switchgrass (Young et al. 2011) and poplar (Kersten et al. 2016). Using the tool "Fasta Alternate Reference Maker" (GATK, Broad Institute), the predicted variants were incorporated into the assembly sequence. The resulting 'Norton' chloroplast sequence is 160,903 bp long.
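The quality filtering step described above can be illustrated with a short sketch. It assumes a tab-separated, VCF-like variant file in which the sixth column holds the quality score, and it assumes that the cutoff of 1,000 is inclusive, which the text does not state. The function names and the toy records are hypothetical and do not reproduce the study's data. Multi-allelic records and missing quality values are not handled.

```python
def classify(ref, alt):
    """Label a biallelic record as SNP or indel from the allele lengths."""
    return "SNP" if len(ref) == 1 and len(alt) == 1 else "indel"


def filter_by_qual(vcf_lines, min_qual=1000.0):
    """Keep records whose QUAL column is at least min_qual (assumed inclusive)."""
    kept = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header lines
        chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
        if float(qual) >= min_qual:
            kept.append((chrom, int(pos), ref, alt, float(qual), classify(ref, alt)))
    return kept


# Hypothetical toy records (positions, alleles and scores are made up).
toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL",
    "cp\t1200\t.\tA\tG\t2350.5",
    "cp\t5310\t.\tC\tT\t412.0",
    "cp\t9075\t.\tGA\tG\t1800.0",
    "cp\t15002\t.\tT\tTAAT\t3100.2",
]

kept = filter_by_qual(toy_vcf)
snps = [r for r in kept if r[5] == "SNP"]
indels = [r for r in kept if r[5] == "indel"]

# Bookkeeping implied by the reported counts: 426 variants minus 73 indels
# leaves 353 unfiltered SNPs, and 99 intergenic + 48 coding SNPs give 147.
unfiltered_snps = 426 - 73
filtered_snps = 99 + 48
```

In the toy input, one low-scoring SNP is dropped while both indels survive, which mirrors the pattern reported in the text, where the cutoff reduced the SNP count but left the indel count unchanged.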
As described in Chapter II, efficient tools have been proposed to index and analyze pan-genomes. However, such methods and data structures do not cover all expected features for pan-genome analysis. Most of them operate only on draft or finished assemblies as input, while such assemblies are available only for a small fraction of species. Furthermore, hundreds or thousands of such assemblies might be required to characterize the pan-genome of a species, a number far larger than what is available in most cases. By the end of February 2017, the National Center for Biotechnology Information (NCBI) Genome database (NCBI, 2017) contained 23,004 assembled genomes, of which about 85 % have only one assembly available. However, unassembled reads abound in databases and represent the vast majority of data available. By the end of February 2017, the NCBI Sequencing Read Archive (SRA) database (NCBI, 2007) contained about 9.8 petabases of reads. Also, methods using an assembly as reference introduce a bias in the analysis towards the reference. Finally, it has been shown that assembly errors can lead to an over-estimation of the number of genes inferred from an assembled genome (Denton et al., 2014). This might in turn cause an over-estimation of the size and growth of the core, accessory and singleton genomes. Hence, an ideal data structure indexing a pan-genome should be reference-free and consider assemblies as well as reads as input to take advantage of all the data available in genomic databases.
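To make the three pan-genome compartments concrete, the following sketch partitions gene families by how many genomes contain them. It assumes one common convention: core families occur in all genomes, singletons in exactly one, and accessory families in more than one but not all. Other definitions exist, and the genome and gene identifiers below are hypothetical.

```python
def partition_pan_genome(presence):
    """Split gene families into core, accessory and singleton sets.

    presence maps a genome name to the set of gene family ids found in it.
    """
    n_genomes = len(presence)
    counts = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    core = {g for g, c in counts.items() if c == n_genomes}
    singleton = {g for g, c in counts.items() if c == 1 and n_genomes > 1}
    accessory = set(counts) - core - singleton
    return core, accessory, singleton


# Hypothetical presence/absence data for three genomes.
toy = {
    "genome_A": {"g1", "g2", "g3"},
    "genome_B": {"g1", "g2", "g4"},
    "genome_C": {"g1", "g5"},
}
core, accessory, singleton = partition_pan_genome(toy)

# A spurious gene call (for example from an assembly error) in one genome
# inflates the singleton compartment without any biological basis.
toy_with_error = dict(toy, genome_C=toy["genome_C"] | {"g6_spurious"})
_, _, singleton_with_error = partition_pan_genome(toy_with_error)
```

The last two lines illustrate the concern raised in the text: every erroneous gene prediction in a single assembly is counted as an additional singleton.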
Biosynthetic capabilities. Natronomonas contains several gene clusters involved in multistep pathways leading to the synthesis of arginine, lysine and branched-chain amino acids. Thus, in contrast to H. salinarum which lacks these gene clusters and whose growth is dependent on exogenous sources of these amino acids, N. pharaonis should exhibit a greater degree of nutritional self-sufficiency, as does Haloarcula hispanica (Hochuli et al. 1999). Genes for the complete synthesis of nicotinate, folate, thiamine, biotin, molybdopterin, cobalamine, hemes, and menaquinones are also present, sometimes clustered, and their presence confirms the observation that N. pharaonis is not dependent on exogenous vitamins for growth (Soliman and Truper 1982). Based on these findings, we have developed a very simple synthetic growth medium containing acetate and pyruvate as the sole carbon sources. Starting from a rich synthetic medium (M. Engelhardt, personal communication), we omitted those amino acids and vitamins for which we identified biosynthetic pathways in our genome analysis. Consistent with this, N. pharaonis is able to grow without external amino acids except leucine. Although all genes required for leucine synthesis are present, the 2-isopropylmalate synthase might not be fully functional, and leucine synthesis subsequently impaired. The 2-isopropylmalate synthase gene aligns to orthologous genes of several other species in its 5’-region but the upstream sequence does not contain a valid start codon (ATG or GTG).
In addition to the 6p21 group and normal cases, HMGA1 expression was also analysed in two UL with 12q14~15 aberrations (myoma 151B and myoma 154). Interestingly, the level of HMGA1 mRNA in these myomas (average = 29.8) was also higher than in the normal group (average = 7.2) and much closer to the range of the 6p21 group (average = 45) than to that of normal UL (Fig. 3.6). As Williams et al. (1997) suggested, these tumors may have acquired small mutations, undetectable by standard cytogenetic techniques, that lead to the ectopic expression of HMGA1 in the absence of cytogenetic abnormalities. To reduce the chance of missing such changes, FISH analysis was performed using HMGA1 probes. No split of the HMGA1 gene was revealed in these two myomas. It should be mentioned that, despite the apparent differences in the interacting partners of both genes (Arlotta et al., 1997), they show a great extent of sequence and structural similarity (Tallini and Dal Cin, 1999) and a high homology in their DNA-interacting domains. Therefore, it can be supposed that HMGA1 and HMGA2 are able to replace each other functionally, at least in part. The findings of this study also agree with Williams et al. (1997) concerning the lack of a significant correlation between HMGA1 levels and tumor size (Fig. 3.4A).
This study is, to our knowledge, the first to perform a genome-wide association study for 305-day milk production traits in Dutch dairy cows using the animal model (mixed model) for every SNP of the 50K SNP set. Using the animal model and accounting for all relationships between individuals in the pedigree is the most appropriate analysis for an association study, since it avoids inflated test statistics. Daetwyler et al., Kolbehdari et al., Jiang et al. and Meredith et al. all performed association studies using the mixed model for milk production traits, accounting for the pedigree information among individuals in Canadian, Chinese and Irish dairy populations [4, 11, 15, 16]. However, in Dutch Holstein-Friesian cows, Schopen et al. conducted a genome-wide association analysis for milk protein traits based on first-lactation test-day samples in two steps: the first step used a general linear model and the second step used an animal model. Significant SNPs from the first step were further analyzed in the animal model, and –log10 (P-value) ≥ 3 was the significance threshold in the second step. They did not detect significant SNPs associated with PY using this two-step analysis method.
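The significance threshold quoted above is expressed on the –log10 scale, where a value of 3 corresponds to a P-value of 0.001. The following sketch shows the conversion and the resulting filter. The SNP names and P-values are hypothetical, and the function names are chosen only for illustration.

```python
import math


def neg_log10(p_value):
    """Convert a P-value in (0, 1] to the -log10 scale."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P-value must lie in (0, 1]")
    return -math.log10(p_value)


def is_significant(p_value, threshold=3.0):
    """True if -log10(P) reaches the threshold (3 in the text, i.e. P <= 0.001)."""
    return neg_log10(p_value) >= threshold


# Hypothetical per-SNP P-values from a second-step model.
toy_p_values = {"snp_a": 0.0005, "snp_b": 0.01, "snp_c": 0.2}
significant_snps = sorted(s for s, p in toy_p_values.items() if is_significant(p))
```

Working on the –log10 scale keeps very small P-values readable and makes the threshold a simple horizontal line in the usual genome-wide plots.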
This work described two versions of an A. thaliana Nd-1 de novo genome assembly [53,213]. Although the assembly contiguity was substantially improved by long single molecule real-time (SMRT) sequencing reads, there are still genome regions missing in the second assembly. Almost 20 years after the release of the first A. thaliana genome sequence, the currently available genome sequence is still incomplete. Centromeres and nucleolus organising regions (NORs) pose a challenge and require the routine generation of even longer reads or, alternatively, reads with substantially lower error rates [54,213]. There are first reports of single molecules sequenced via Oxford Nanopore Technologies (ONT) substantially exceeding the 2 Mbp mark. If the read length could be further increased, this technology might have the potential to finally enable the closure of the last remaining gaps in the A. thaliana genome sequence. Improvements of nanopore sequencing, e.g. re-reading of the very same DNA strand or coupling of two nanopores with different error profiles, might lead to the required improvements of ONT read quality. However, the latest improvements of sequencing technologies require improved DNA extraction protocols to provide high-molecular-weight input material [213,473,474]. Therefore, the bottleneck in generating even longer reads is likely to be the DNA extraction process. Efficient separation of high-molecular-weight DNA molecules from smaller fragments would be required to harness the full potential of long read sequencing technologies.
All experiments took place at 21-23 °C and 60-70% relative humidity under bright fluorescent light or, in the case of cold-anesthesia resistance of punishment and relief learning, under red light. They were performed using a set-up comparable to that used in . The airflow that drew the odours, or in some cases (see the individual behaviour sections) just air, from outside through the arms of the set-up was produced by a vacuum pump (ME1, Vacuubrand, Aresing, Germany) and was adjusted to 4.5 l/min at the level of the pump for every experiment. For innate behaviours flies were tested in groups of ~50; for learning, 100-150 flies were used. The experimental setup had four positions for processing four groups in parallel. The testing of each genotype at each position was balanced. This was critical because the position in the setup seemed to affect most behavioural scores (Kruskal-Wallis test on the data in Fig. 3.3, 3.11, 3.10 and 3.14, respectively, comparing scores between the four positions in the setup, after pooling across inbred strains: shock: H = 16.30, d.f. = 3, P = 0.001, N = 212, 218, 202, 220; BA: H = 39.36, d.f. = 3, P < 0.0001, N = 79, 83, 76, 80; OCT: H = 12.04, d.f. = 3, P = 0.0072, N = 79, 83, 76, 80; punishment: H = 24.27, d.f. = 3, P < 0.0001, N = 93, 87, 90, 87; relief: H = 7.04, d.f. = 3, P = 0.0707, N = 235, 218, 239, 225). Note also that, although male, female and unisex scores are calculated from the same experiment, scores calculated with < 5 flies were excluded from further analysis. At 0:00 min of each assay, flies were gently introduced into a tube of 9 cm length and 1.5 cm inner diameter, coated inside with a copper wire coil and perforated at one end; this shock tube was then attached to the experimental setup. Further steps were individually designed for each behavioural assay.
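The position effect reported above rests on the Kruskal-Wallis test with four groups and therefore three degrees of freedom. The sketch below is a minimal pure-Python illustration under two assumptions: H is computed from average ranks with the usual tie correction, and the P-value is taken from the chi-square approximation, for which three degrees of freedom allow a closed form. The function names and toy scores are hypothetical, and a statistics library would normally be used instead.

```python
import math


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of score lists."""
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    return h / correction


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with 3 d.f."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Hypothetical scores at four setup positions (made-up values).
toy_positions = [[1, 2], [3, 4], [5, 6], [7, 8]]
toy_h = kruskal_wallis_h(toy_positions)
toy_p = chi2_sf_df3(toy_h)

# P-values recomputed from the H statistics reported in the text.
p_relief = chi2_sf_df3(7.04)
p_shock = chi2_sf_df3(16.30)
```

Recomputing the tail probability from a reported H value, as in the last two lines, is a quick consistency check on published test statistics.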
A commonly used strategy for the evaluation of RNA-Seq data sets is the comparative analysis of transcriptome profiles. This approach captures all transcribed genes and detects differentially expressed genes (DEG). Based on this approach, transcripts of about 13,000 genes were detected in GV and MII oocytes and in all investigated embryonic stages. The number of DEG increased during embryonic development from 120 between GV and MII oocytes to approximately 3,000 between 16-cell embryos and blastocysts. The number of detected genes showed that the transcriptome of oocytes contained a large number of RNA species, which was altered stepwise during embryonic development. The increasing number of DEG during early embryonic development correlated inversely with the percentage of uniquely mapped reads, together indicating ongoing alterations of the transcriptome. An RNA-Seq study compared pools of normal bovine IVF blastocysts with degenerated ones (Huang et al., 2010). In that study, approximately 16,000 genes were identified as transcribed. A gene was considered to be expressed if a single read mapped to its coding region. Without using any biological replicates, the authors reported about 20% more transcribed genes than the study presented in this thesis, but it is statistically questionable to take a single read in a single replicate as evidence for the presence of a transcribed gene.
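The effect of the expression criterion criticised above can be shown with a small sketch. It contrasts a permissive rule (one read in one sample) with a stricter rule that requires a minimum read count in several replicates. The stricter thresholds (5 reads in at least 2 replicates) are hypothetical values chosen for illustration and are not the criteria used in this thesis. Gene names and counts are made up.

```python
def is_expressed(counts, min_reads=1, min_replicates=1):
    """True if at least min_replicates samples have min_reads or more reads."""
    supporting = sum(1 for c in counts if c >= min_reads)
    return supporting >= min_replicates


# Hypothetical read counts per gene across three biological replicates.
toy_counts = {
    "gene_x": [1, 0, 0],
    "gene_y": [12, 9, 15],
    "gene_z": [0, 6, 7],
}

permissive = sorted(g for g, c in toy_counts.items() if is_expressed(c))
stricter = sorted(
    g for g, c in toy_counts.items() if is_expressed(c, min_reads=5, min_replicates=2)
)
```

A gene supported by a single read in a single replicate is counted under the permissive rule but not under the stricter one, which is one way a study can arrive at a larger number of apparently transcribed genes.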
In the previous chapter we mainly focused on a measure of gene family-free genome comparison for two genomes. Here, we go beyond pairwise comparisons and discuss a gene family-free model for the reconstruction of a possible candidate for the common ancestor of three genomes. In doing so, we extend the gene family-based problem of computing the mixed multichromosomal breakpoint median to a gene family-free setting. The present chapter is structured similarly to the previous one: after a short review of the gene family-based problem in the subsequent section, we propose a gene family-free generalization. We then discuss its computational complexity by proving that the presented problem is MAX SNP-hard. Further, we formulate a 0-1 linear program that allows us to compute exact solutions. Whereas our model for computing family-free adjacencies between two genomes tolerated events of gene duplication and loss, the model presented here is susceptible to gene losses and resolves gene duplications only to a limited extent. We discuss the effects of gene family evolution in our presented model and proceed to present a 0-1 linear program for computing gene family-free adjacencies between three genomes, thereby extending results of the previous chapter. Our algorithm gives rise to a heuristic approach to construct a median of three genomes in a family-free setting. We then compare both methods on simulated datasets. Lastly, we use our heuristic method to reconstruct the genome sequence of the Black Death again from genome sequences of three Yersinia pestis strains. We compare our results to those of Rajaraman et al.
11th Young Scientists Meeting 2018, Braunschweig, Germany, November 14-16
Lörincz-Besenyei et al.
Potato improvement by genome editing
Enikö Lörincz-Besenyei 1,2, Thorben Sprink 2, Janina Metje 2, Uwe Sonnewald 3 and Björn Krenz 1
1 Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig
2 Julius Kühn Institute, Institute for Biosafety in Plant Biotechnology, Quedlinburg