High inter-individual variability in lncRNA expression is another striking biological feature that distinguishes lncRNAs from mRNAs. The finding that expression variability is more prominent in new lncRNA loci and reduced in reference lncRNA annotations also indicates that it can influence identification. Thus, public annotations based on limited numbers of human donors, or derived from single inbred animal or plant strains, may under-represent variably expressed lncRNAs. We demonstrate this with the GEUVADIS LCL RNA-seq data, derived from one cell type, by showing that adding more donors to the analysis identifies more lncRNA genes in the human genome. The number of lncRNA loci increased continuously, with novel lncRNAs showing a more striking increase than known lncRNAs. The MiTranscriptome study, which used a donor number per tissue comparable to our LCL analysis, identified three-fold more novel lncRNAs than are present in the three commonly used public databases (see above references). Our results also indicate that a granulocyte lncRNA annotation based on 10 donors is most likely at the lower part of the donor saturation curve for this cell type. Moreover, our finding that the identification of novel lncRNA loci does not plateau even with 120 donors indicates that comprehensive annotation of lncRNAs in the human genome requires as many individuals as possible. The identification of high lncRNA intra- and inter-individual expression variability has implications for identifying lncRNAs and assessing their function and potential medical use. LncRNAs that lack consistent expression in some individuals are unlikely to be necessary for normal cell function, but may be functional in an age-, environment-, lifestyle-, or disease-related manner, as shown for some protein-coding genes [54, 70]. At the same time, it cannot be assumed that a robustly expressed lncRNA has an important function in the cell type in which it is expressed.
For example, the developmentally important Airn lncRNA retains robust expression after performing its silencing function. Our results support the view that functional studies require an understanding of basic lncRNA biology in different individuals before they can be interpreted [36, 72].
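The donor-saturation argument can be illustrated with a small sketch. It assumes a hypothetical mapping from each donor to the set of lncRNA loci called as expressed in that donor; the function name, the toy data and the random-ordering scheme are illustrative assumptions, not the analysis pipeline used in this study. The function reports the mean number of distinct loci detected as donors are added in random order.

```python
import random


def saturation_curve(expressed_by_donor, n_shuffles=100, seed=0):
    """Mean number of distinct loci detected after adding 1..N donors.

    expressed_by_donor: dict donor_id -> set of locus ids called expressed
    (hypothetical input; how "expressed" is thresholded is left open here).
    """
    rng = random.Random(seed)
    donors = list(expressed_by_donor)
    totals = [0.0] * len(donors)
    for _ in range(n_shuffles):
        rng.shuffle(donors)          # random donor order for this replicate
        seen = set()
        for i, donor in enumerate(donors):
            seen |= expressed_by_donor[donor]
            totals[i] += len(seen)   # cumulative loci after i + 1 donors
    return [t / n_shuffles for t in totals]


# Toy data (made up): "known" loci are shared, "novel" loci are donor-specific.
toy = {
    "d1": {"known1", "known2", "novelA"},
    "d2": {"known1", "known2"},
    "d3": {"known1", "known2", "novelB"},
    "d4": {"known1", "novelC"},
}
curve = saturation_curve(toy)
print(curve)
```

In this toy setting the curve keeps rising for as long as donor-specific loci remain to be found, which is the behaviour described above for novel lncRNA loci.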
For a general overview of SNP frequencies in the human genome, nine basic genomic regions were derived from the genomic information provided by the UCSC genome browser. The information needed to calculate the genomic coordinates of these regions for every gene was downloaded from the UCSC genome browser and includes chromosome, strand, transcription start site (TSS), transcription end site (TES), CDS start (coding start site, CSS), CDS end (coding end site, CES), exon starts and exon ends. The regions comprise: intergenic region, CpG islands, promoter region, 5’ UTR, coding exons, 3’ UTR, all exons, introns, and intragenic region (see Fig 1). The regions were defined in the following way. Every gene is located between two intergenic regions. The first is defined as the interval between the TSS of the considered gene and the midpoint between this TSS and the TES of the closest upstream gene. The second intergenic region is defined analogously according to the TSS of the closest downstream gene. The intragenic region of a gene is defined as the part between its TSS and its TES. The gene promoter was defined as the region from 2000 bp upstream to 1000 bp downstream of the TSS and thus overlaps with the intergenic region. 5’ UTRs are defined as the exonic segments between the TSS and the CSS, while 3’ UTRs are defined analogously as the exonic regions between the CES and the TES. Exons are defined as the intervals between the exon start positions and exon end positions as given in the file retrieved from the UCSC genome browser. Introns are defined as the regions between the exonic gene parts. Besides these nine general regions, we also considered narrow sequence windows of ±200 bp around transcription and translation start sites, as well as the direct vicinity (–15 to +13 bp) of the TSS and CSS.
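The per-gene definitions above can be sketched in a few lines of code. This is a minimal illustration, not the script used for the analysis: the function name, the half-open coordinate convention and the toy gene are assumptions, and the intergenic regions and CpG islands are omitted because they require neighbouring genes and a separate annotation track.

```python
def gene_regions(strand, tx_start, tx_end, cds_start, cds_end, exons):
    """Derive promoter, UTR, coding-exon and intron intervals for one gene.

    Assumed input convention: half-open [start, end) intervals on the forward
    strand, tx_start < tx_end, cds_start < cds_end, exons sorted by start.
    """
    def clip(lo, hi):
        # Exonic pieces that overlap the window [lo, hi).
        pieces = []
        for start, end in exons:
            s, e = max(start, lo), min(end, hi)
            if s < e:
                pieces.append((s, e))
        return pieces

    left_utr = clip(tx_start, cds_start)
    right_utr = clip(cds_end, tx_end)
    introns = [(prev_end, next_start)
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]

    if strand == "+":
        tss = tx_start
        # 2000 bp upstream to 1000 bp downstream of the TSS.
        promoter = (max(0, tss - 2000), tss + 1000)
        utr5, utr3 = left_utr, right_utr
    else:
        tss = tx_end
        # On the reverse strand, "upstream" means higher coordinates.
        promoter = (max(0, tss - 1000), tss + 2000)
        utr5, utr3 = right_utr, left_utr

    return {
        "intragenic": (tx_start, tx_end),
        "promoter": promoter,
        "utr5": utr5,
        "coding_exons": clip(cds_start, cds_end),
        "utr3": utr3,
        "exons": list(exons),
        "introns": introns,
    }


# Toy gene (made-up coordinates) evaluated on both strands.
toy_exons = [(10000, 10500), (12000, 12400), (14000, 15000)]
plus = gene_regions("+", 10000, 15000, 10200, 14500, toy_exons)
minus = gene_regions("-", 10000, 15000, 10200, 14500, toy_exons)
print(plus["promoter"], plus["utr5"], plus["introns"])
print(minus["promoter"], minus["utr5"])
```

Handling the strand explicitly matters because annotation tables typically store the lower coordinate first regardless of orientation, so the TSS of a reverse-strand gene is its higher coordinate.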
Other questions concern unintended genetic alterations to the germ line. The problem is particularly clear in the case of therapies which have to intervene in the early embryo, for instance when the embryo can only survive thanks to the therapeutic intervention (e.g. in the case of a genetic disorder affecting implantation in the uterus). Other examples involve complex malformations which manifest very early in embryonic development and cannot be corrected at a later stage. The earlier the genetic engineering intervention takes place, the more difficult it is to rule out an unintended alteration of the germ line. This raises the question of whether a germ line alteration of this kind can be tolerated as a side effect, in the way that the Embryo Protection Act admits it in the case of inoculation, chemotherapy and radiation treatment of a born human.
Metazoan genomes feature a higher-order organizational structure, which is not present in the well-characterized yeast model organisms 5–7. In contrast to yeast, the positions of replication origins in metazoan DNA do not appear to be determined by DNA sequence 8,9. Positions and activation times of individual origins can be related to various chromatin features 3,10–14, and molecular analyses have shown that positions of active origins, inter-origin distances and the speed of replication fork movement can vary even within individual cells 15,16. Biological analyses of replication progression throughout S-phase in mammalian cells led to a domino-like next-in-line model 17 where replication is triggered by replication of adjacent regions. Guilbaud et al. 18 described chromosomal regions in HeLa cells with sequentially activated origins that are neither clearly early nor clearly late replicating. The existence of a long-range control of otherwise stochastic or induced firing of origins in the presence of replication forks was subsequently suggested. Genome-scale mapping of DNA replication origins demonstrated general plasticity of active origin positions, which was interpreted as replicon size flexibility within a predetermined replicon cluster 19. Accordingly, the replication programme in metazoans demonstrates a high level of plasticity, thus ensuring complete genome duplication in the face of developmental and environmental changes 1. Models of genome duplication in metazoans, therefore, need to include stochastic mechanisms to account for origins initiated at non-predetermined sites 20 and a flexible spatio-temporal structure of S-phase 13,21. Recently, a quantitative model of human genome replication was presented by Shaw et al. 22.
By introducing clusters of origins which are fired together spontaneously or by activation from a neighbouring cluster, and by implementing the observed temporal variation of fork speed 23, the authors reproduce S-phase dynamics and replication progression on a cluster scale. However, the formation of clusters is likely to emerge from more elementary processes. The interplay of deterministic and stochastic influences in these processes, which is as yet unclear 24,25, needs to be motivated by more detailed experimental data. Besides, an adequate model of genome duplication in eukaryotes must reproduce not only the temporal dynamics, but also the spatial characteristics of DNA replication in vivo. Here, we use domino-like DNA replication progression and random loop folding of chromatin to present a minimal model of DNA replication in higher eukaryotes that is able to reproduce spatial dynamics of the replication foci (RFi) throughout S-phase without need for replicon clustering at common synthetic centres, as shown in Chagin et al. 26.
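To make the domino-like idea concrete, the following toy simulation may help. It is a deliberately simplified one-dimensional sketch and not the model presented in this work: there is no chromatin folding, fork speed is fixed at one site per step, and all names and parameter values (`p_spont`, `p_induced`, `reach`) are illustrative assumptions. Unreplicated sites fire spontaneously with a small probability, and with a higher probability when an active fork is nearby.

```python
import random


def simulate_s_phase(n_sites=2000, p_spont=1e-4, p_induced=0.05,
                     reach=20, max_steps=10000, seed=1):
    """Return the replicated fraction after each time step (toy 1D lattice)."""
    rng = random.Random(seed)
    replicated = [False] * n_sites
    forks = set()        # (position, direction), direction is -1 or +1
    n_done = 0
    timeline = []
    for _ in range(max_steps):
        # 1) Each fork advances one site; it terminates at the lattice edge
        #    or when it runs into DNA that is already replicated.
        moved = set()
        for pos, direction in forks:
            nxt = pos + direction
            if 0 <= nxt < n_sites and not replicated[nxt]:
                replicated[nxt] = True
                n_done += 1
                moved.add((nxt, direction))
        forks = moved
        # 2) Collect sites lying within `reach` of an active fork.
        near = set()
        for pos, _ in forks:
            near.update(range(max(0, pos - reach), min(n_sites, pos + reach + 1)))
        # 3) Stochastic firing: induced near forks, spontaneous elsewhere.
        for site in range(n_sites):
            if replicated[site]:
                continue
            p_fire = p_induced if site in near else p_spont
            if rng.random() < p_fire:
                replicated[site] = True
                n_done += 1
                forks.add((site, -1))
                forks.add((site, +1))
        timeline.append(n_done / n_sites)
        if n_done == n_sites:
            break
    return timeline


timeline = simulate_s_phase()
print(len(timeline), timeline[-1])
```

Raising `p_induced` relative to `p_spont` shifts the toy system from independent, scattered initiation towards progression from neighbour to neighbour, which is the qualitative behaviour the next-in-line picture describes.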
predominance in mouse (Waterston et al., 2002) and human (Lander et al., 2001) (40% vs. 1.4%, over 50% vs. 1.2%, respectively). A recent study using a more sensitive strategy has suggested an even higher percentage of repetitive or repeat-derived sequences, up to 66%–69%, in the human genome (de Koning et al., 2011). With repetitive sequences accounting for over two thirds of the human genome, it could be speculated that they overlap, at the very least, some nucleosome-retained DNA. On the other hand, studies of the function of this so-called “dark matter” of the genome have persisted for decades. Current evidence indicates that some repetitive sequences are involved in the regulation of gene expression. Tissue-specific transcription of a SINE B2 repeat in mouse was required for activation of the growth hormone gene, by generating short, overlapping pol II- and pol III-driven transcripts, both of which were necessary and sufficient to enable a restructuring of the regulated locus into nuclear compartments (Lunyak et al., 2007). SINE B1 elements could influence the activity of downstream gene promoters, causing a repressive effect (Estecio et al., 2012). LINE1 could be activated by satellite transcripts and lead to aberrant expression of neuroendocrine-associated genes proximal to LINE1 insertions (Ting et al., 2011). LINEs may also facilitate X chromosome inactivation by participating in heterochromatin formation (Chow et al., 2010). However, the results remain fragmented, and a clear panoramic functional view, such as has been established for functional genes, has yet to be assembled.
analysis methods cannot account for the very large and diverse variability among individuals of the same species. As an example, the 1000 Genomes Project (1000 Genomes Project Consortium, 2015) has sequenced 2,504 individuals from 26 populations, yielding a large set of 88 million variants. An ideal analysis of a newly sequenced human genome would benefit from a comparison with all 2,504 human genomes of such a project. In contrast to the human species, which has a low degree of polymorphism, Ciona savignyi is a sea squirt species with a very high degree of polymorphism, known to be one of the highest of any species (Leffler et al., 2012). Hence, a single genome of C. savignyi cannot be representative of its species. Also, Mosquera-Rendón et al. (2016) mention that most of the issues in the development of an effective vaccine against Pseudomonas aeruginosa, an important pathogen in multiple types of infections, arise because each strain of the species exhibits different mechanisms responsible for its pathogenesis. A study by Tettelin et al. (2005) indicates the same obstacles with the pathogen Streptococcus agalactiae. Therefore, developing an efficient vaccine against P. aeruginosa and S. agalactiae is unfeasible using reference-centric analysis methods. Consequently, linear references do not fulfill the requirements of these problems and must be augmented with information from multiple genomes.
Visual inspection of the genome browser and the obvious association with promoter regions prompted us to investigate the genomic features associated with regions of significant binding in an unbiased manner. We therefore compared binding peaks for H3K4me3 and H3K27me3 with the following genomic annotations: transcriptional start sites (TSS and TSS upstream region), transcriptional end sites (TES), exons, introns and intergenic regions. When comparing with the genomic background distribution of these features, it became evident that both H3K4me3 and H3K27me3 are strongly enriched at transcriptional start sites (Fig. 4A, dark blue bar), although this association is somewhat stronger for H3K4me3. In order to obtain a systematic overview of the binding across all annotated promoters within the human genome, we used the binding profiles for H3K4me3 and H3K27me3 in a 10 kb window around the transcriptional start sites of all RefSeq (Reference Sequence) promoters and performed cluster analysis by k-means. The heat-map representation clearly indicates 3 major classes of promoters (Fig. 4B): those completely devoid of any modification, those high in H3K4me3 and virtually free of H3K27me3, and those with high levels of H3K27me3 and low but detectable H3K4me3. We reasoned that these "promoter chromatin states" might be associated with distinct transcriptional output. Therefore we extracted expression values for all RefSeq transcripts from a recent RNA-seq data set from the ENCODE (Encyclopedia of DNA Elements) consortium obtained from CD14+ monocytes (Fig. 4C). As expected, we find the modification-free promoters to be expressed at very low levels, whereas the genes with strong H3K4me3 binding across the TSS have significantly higher expression.
In contrast, bivalently modified promoters are associated with gene expression levels comparable to those of unmodified promoters, very similar to what has been described for this specific promoter class.
Received: 8 July 2020; Accepted: 31 July 2020; Published: 12 August 2020
Abstract: While ionizing radiation (IR) is a powerful tool in medical diagnostics, nuclear medicine, and radiology, it is also a serious threat to the integrity of genetic material. Mutagenic effects of IR on the human genome have long been the subject of research, yet comparatively little is known about the genome-wide effects of IR exposure at the DNA-sequence level. In this study, we employed high-throughput sequencing technologies to investigate IR-induced DNA alterations in human gingiva fibroblasts (HGF) that were acutely exposed to 0.5, 2, and 10 Gy of 240 kV X-radiation followed by repair times of 16 h or 7 days before whole-genome sequencing (WGS). Our analysis of the obtained WGS datasets revealed patterns of IR-induced variant (SNV and InDel) accumulation across the genome, within chromosomes as well as around the borders of topologically associating domains (TADs). Chromosome 19 consistently accumulated the highest number of SNV and InDel events. Translocations showed variable patterns but with recurrent chromosomes of origin (e.g., Chr7 and Chr16). IR-induced InDels increased in number relative to SNVs and showed a characteristic signature with respect to the frequency of triplet deletions in areas without repetitive or microhomology features. Over all experimental conditions and datasets, the majority of SNVs per genome had little or no predicted functional impact, with a maximum of 62 showing damaging potential. Surprisingly, a dose-dependent effect of IR was not apparent. We also observed a significant reduction in transition/transversion (Ti/Tv) ratios for IR-dependent SNVs, which could point to a contribution of the mismatch repair (MMR) system, which strongly favors the repair of transitions over transversions, to the IR-induced DNA-damage response in human cells.
Taken together, our results show the presence of distinguishable characteristic patterns of IR-induced DNA alterations on a genome-wide level and implicate DNA-repair mechanisms in the formation of these signatures.
Keywords: radiation doses; repair mechanism; translocation; transition transversion ratio; IR-induced variants; SNVs; InDels; topological associating domains
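For readers unfamiliar with the Ti/Tv ratio mentioned in the abstract, a minimal sketch of its calculation follows. Transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) substitutions; all other substitutions are transversions. The function names and the toy variant list are illustrative assumptions and do not come from the study's pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions, False for transversions."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Ti/Tv ratio for an iterable of (ref, alt) single-nucleotide variants."""
    snvs = [(r.upper(), a.upper()) for r, a in snvs if r.upper() != a.upper()]
    ti = sum(1 for r, a in snvs if is_transition(r, a))
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


# Toy SNV list (made up): four transitions and two transversions.
toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T"), ("T", "C")]
ratio = ti_tv_ratio(toy_snvs)
print(ratio)
```

A drop in this ratio between control and irradiated samples means that transversions make up a larger share of the observed SNVs.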
forming one cluster and the other ST18 strains the remaining ones. SNP-based phylogeny therefore discriminates strains at a finer level than the MLST scheme, which makes it useful for typing YE serotype O:3. The Japanese strain D6 appears even more distantly related than the other samples; in fact, 1,116 SNPs out of 2,607 (42.8%) are carried by this strain alone. A sub-grouping according to new or downloaded sequences appears, probably due to the different methods applied for variant calling on already assembled and re-sequenced genomes (section 3.7.6). Interestingly, the English strain O3-gb contains only 2 SNPs and therefore clusters tightly with the German reference strain Y11, furthest from the root, indicating recent genetic diversification. In contrast, isolate 150 forms a discrete cluster on a branch closest to the root, compared to the previous whole-genome phylogeny. The two strains D1 and D2, which were isolated from the same patient, cluster together, as expected. Considering the 10 isolates from animals and the 10 isolates from humans, no detected SNPs distinguish between the two groups, and no SNPs are shared by all 8 isolates from pigs. Moreover, the human isolates are mostly located on distal branches, indicating recent mutations, while the animal samples, especially those from swine, mainly lie on branches closest to the root with older mutations, in both core-genome- and SNP-based trees. If pig carriers constitute the reservoir for YE serotype O:3, this may account for the different SNP-based phylogenetic ages observed in strains of human and animal origin.
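The count of SNPs carried by a single strain can be obtained from a simple carrier table. The sketch below assumes a hypothetical mapping from SNP identifier to the set of strains carrying the variant allele; the toy table is made up and only borrows strain names from the text, while the final line recomputes the 42.8% share quoted above.

```python
from collections import Counter


def strain_specific_counts(snp_carriers):
    """Count, per strain, the SNPs carried by that strain alone.

    snp_carriers: dict snp_id -> set of strain names carrying the variant
    allele (hypothetical input format).
    """
    counts = Counter()
    for carriers in snp_carriers.values():
        if len(carriers) == 1:
            counts[next(iter(carriers))] += 1
    return counts


# Toy carrier table (made up for illustration).
toy = {
    "snp1": {"D6"},
    "snp2": {"D6"},
    "snp3": {"D1", "D2"},
    "snp4": {"O3-gb"},
    "snp5": {"D6", "150"},
}
counts = strain_specific_counts(toy)
print(counts)

# Share of strain-specific SNPs reported for strain D6 in the text.
share_d6 = round(100 * 1116 / 2607, 1)
print(share_d6)
```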
Whole-genome sequences were generated by GATC Biotech (Konstanz, Germany) using a PacBio RS II sequencer (Pacific Biosciences, USA). De novo assembly utilizing HGAP3 (Pacific Biosciences) yielded one contig each, with 282- and 322-fold coverage. To confirm PacBio results, Illumina sequencing was performed on a MiSeq using the 2 × 300-cycle version 3 kit, as recommended by the manufacturer (Illumina, San Diego, CA, USA). Illumina reads were mapped to the PacBio contigs by the use of Geneious version 8.1.7 (Biomatters Ltd., Auckland, New Zealand). Ring closures were verified by PCR. The genome sequences of S. aureus 08-02119 and 08-02300 are 2,796,894 bp and 2,742,807 bp in length, respectively. According to the NCBI Prokaryotic Genome Annotation Pipeline (5), 2,829 genes and 2,747 coding sequences were predicted for S. aureus 08-02119, while
In germline genome editing the disease is eliminated at its cause; strictly speaking, however, this is not actually a therapy. It is really a matter of prevention, because there is not yet anyone to treat. Ethically, such germline interventions are of course highly controversial. Their necessity can also be questioned, since with the current state of the art they must always be combined with preimplantation genetic diagnosis (PGD). For most potentially treatable diseases this raises the question: if PGD is being performed anyway, why not select the healthy embryos instead, although in Germany this would of course be permitted only for the most severe diseases.
Complete genome sequences of two Clostridium chauvoei strains were obtained: the type strain and an isolate of German origin which had been isolated from a diseased animal (Table 2). The strategy of microbial genome sequencing with PacBio long reads and HGAP successfully finished both genomes with high coverage and quality values, without the need for manual completion or additional short-read data. Earlier studies have shown the advantages of using PacBio-generated long reads, based on single-library and non-hybrid assemblies, for completing bacterial genomes (Brown et al., 2014; Liao et al., 2015). The genome sizes of both the type strain and the field isolate corroborated the published genome sequence of the Swiss C. chauvoei field strain JF4335 (Table 3). No plasmid-representing contig was obtained for DSM 7528T, but a 3.9-kb plasmid was identified in strain 12S0467. As in other bacterial species, the rRNA gene cluster is used as an important genetic marker to differentiate and identify C. chauvoei and C. septicum by PCR (Sasaki et al., 2000). Commonly, most of the rRNA genes are positioned close to each other, a constellation which is considered the major hindrance for genome finishing (Koren et al., 2013). Characterization of the region seems to be problematic without long-read-based sequencing techniques such as the PacBio RS II system, which is capable of generating reads with an N50 close to 20 kb (Rhoads and Au, 2015). Using this technique we were able to resolve these difficulties; our genome analysis showed the presence of 87 tRNA genes and 9 rRNA gene clusters (Table 3).
Based on the genome sequence of M. tuberculosis H37Rv, filter-based DNA arrays were produced representing all known ORFs. Since the genes deleted in M. bovis and M. bovis BCG Copenhagen as compared to M. tuberculosis H37Rv have been reported recently, the quality of the DNA arrays in terms of sensitivity and reliability was assessed by comparing the hybridisation results obtained with gDNA from the three mycobacterial strains. In addition to the known deletions, some sequence variations which have not been reported previously could be detected using the DNA arrays and the detection system (including radioactive labelling of the targets) applied in this study. Sequence variations were observed in Rv0050, Rv0278c, Rv0279c, Rv2090, and Rv3281. The sequence variations in Rv0050, Rv2090, and Rv3281 were also investigated in other members of the M. tuberculosis complex: M. tuberculosis Beijing, M. tuberculosis Haarlem, M. canettii, and M. africanum. Different degrees of sequence variation could be observed in these strains for the investigated genes. Most deletions detected here comprised DNA regions of less than 200 bp. This suggests that the DNA arrays and the detection system used in this study were more sensitive than the array systems recently used for similar studies. The sequence polymorphisms observed here may serve as potential markers to differentiate the investigated strains mentioned above. The gene expression study revealed that these genes were expressed in vitro
composed of a small nuclear-encoded subunit and a large chloroplast-encoded subunit, thereby giving evidence of endosymbiotic gene transfer during the process of secondary endosymbiosis (Ciugulea & Triemer 2010, Farmer 2009, Schwartzbach & Shigeoka 2017). To provide further information about this process and to understand the biology of phototrophic euglenoids, in recent years the chloroplast genomes (cpGenomes) of several euglenoids have been sequenced, annotated and published (Bennett et al. 2012, 2014 & 2017, Bennett & Triemer 2015, Dabbagh et al. 2017, Dabbagh & Preisfeld 2017, Gockel & Hachtel 2000, Hallick et al. 1993, Hrdá et al. 2012, Kasiborski et al. 2016, Pombert et al. 2012). The first cpGenome was the circular chromosome of the model organism Euglena gracilis strain Z, published by Hallick et al. (1993), with a size of 143 kb. Phylogenetic investigations indicated that the plastids of phototrophic euglenoids, which are surrounded by three membranes, are related to members of the genus Pyramimonas (Turmel et al. 2009). Although the genome of E. gracilis is larger than that of the presumed closest relative Pyramimonas parkeae, during the process from a green algal chloroplast via an endosymbiont to the plastid organelle of another host, the chloroplast genome underwent a distinct loss of genes, which were transferred to the nuclear genome (Schwartzbach & Shigeoka 2017). This means that although fewer genes exist in the genome of E. gracilis, the genome is much larger than that of P. parkeae. It took some time to understand this contradiction, but it is now known that the chloroplast genomes of phototrophic euglenoids have an unusually high number of introns in comparison to green algae. The genome is littered with so-called group II and group III introns.
These introns can be located in intergenic regions, within the coding region of protein-coding genes, within the genes that comprise the ribosomal RNA (rRNA) operon or even within other introns, resulting in twintrons or complex twintrons (Copertino & Hallick 1991, Copertino et al. 1991 & 1992, Doetsch et al. 1998 & 2001, Michel et al. 1989, Thompson et al. 1995 & 1997).
ABSTRACT Human parechoviruses (HPeV) circulate worldwide, causing a broad variety of symptoms, preferentially in early childhood. We report here the nearly complete genome sequence of a novel HPeV type, consisting of 7,062 nucleotides and encoding 2,179 amino acids. M36/CI/2014 was taxonomically classified as HPeV-17 by the picornavirus study group.
Medical Systems Biology, UCC, Medical Faculty Carl Gustav Carus, TU Dresden, Fetscherstr. 74, 01307 Dresden, Germany
The development of genome editing tools capable of modifying specific genomic sequences with unprecedented accuracy has opened up a wide range of new possibilities in targeted gene manipulation. In particular, the CRISPR/Cas9 system, a repurposed prokaryotic adaptive immune system, has been widely adopted because of its unmatched simplicity and flexibility. In this review we discuss achievements and current limitations of CRISPR/Cas9 genome editing in hematopoietic cells, with special emphasis on its potential use in ex vivo gene therapy of monogenic blood disorders, HIV and cancer.
period from 2015 to 2016, the number of articles in high-circulation national magazines and newspapers rose across all fields of application. Medical and therapeutic applications form a central focus. A characteristic of the debate on genome editing is that the fields of medical and agricultural application (red and green genetic engineering) are in part discussed together, as in earlier
2 The results of the media analysis (FAZ, ZEIT, SZ, Spiegel) were presented by Julia Diekämper and Lilian Marx-Stölting on 22 November 2016 at a workshop of TAB and IAG Gentechnologiebericht; the conclusions will be published in the "Vierter Gentechnologiebericht" (to appear in 2018).
In the course of analysing secondary metabolite biosynthetic gene clusters there is a constant need to visualize pathways, to add and edit pathway-related meta-information, and to create an overview of the pathways present in one or multiple genomes. In addition, biosynthetic models need to be submitted to bioinformatics tools such as antiSMASH in order to create annotated models, and the results must be retrieved and fed back into in-house databases and downstream analysis workflows. These requirements were addressed in the course of this project by the development of a BiosynML plugin for the Geneious software. Geneious is a cross-platform bioinformatics software suite developed by Biomatters for searching, organizing and analysing genomic and protein information via a desktop program (116). Its main advantages are a strong focus on a user-friendly interface and ease of use, along with the seamless integration of a number of published bioinformatics methods. The newly devised BiosynML plugin originating from this work facilitates the creation of detailed biosynthetic pathway annotations, especially for modular gene clusters responsible for the production of microbial secondary metabolites. Typical tasks include refining the automatic predictions obtained by antiSMASH and adding domains and meta-data based on manual analysis and/or experimental results (curation), such as grouping domains into functional modules and assigning biosynthetic building blocks. This information is key to establishing the connection between genes and chemical compounds, and is also a crucial prerequisite for the design of experiments to investigate the molecular basis of secondary metabolite formation. The core functionality of the BiosynML plugin for Geneious, which has been implemented as part of this work, is described in the following.