
A Simple and Effective Technique for Assisted Genome Assembly

Krisztian Buza, Bartek Wilczyński, Norbert Dojer

Computational Biology and Bioinformatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw (MIMUW)

chrisbuza@yahoo.com, bartek@mimuw.edu.pl, dojer@mimuw.edu.pl

Background: assisted genome assembly

Input: short reads + genome of a related organism (reference genome)

Output: target genome

[Figure: short reads (e.g. TAGACTGGTC, GGTCAGATGT, CTGGTCAGAT, CAGATGTGCG) and the chromosomes of the reference genome (Chr1: ATCTGCGTGTAGATTGGTC..., Chr2: CGCGTACGCGATAGTTACA...) are passed to an assembler, which outputs contigs (contig1: AACTGCGTGTAGACTGGTCCTGGTCAGATGTGCGC...).]

Our approach: Simple Assistance

- Generate artificial reads from the reference genome and give them low quality scores, so that "real" reads have priority over the artificial ones.

- Add the artificial reads to the input of a de novo assembler.

[Figure: diagram labelled with the parameters d, m and n, with the example read GTAGATTGGT.]

Organism       d       m       n      Quality score*
S. pombe       3000    1000    500    5
A. thaliana    –       1000    500    5

* Range of quality scores: 0..93
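The first step can be illustrated with a minimal sketch, assuming the artificial reads are fixed-length substrings taken at a regular step along the reference and written as FASTQ records with Phred+33 quality encoding. The function name `artificial_reads` and the parameters `read_length` and `step` are illustrative assumptions; they are not meant to correspond to the poster's d, m and n. The default quality of 5 is the value used on the poster.

```python
def artificial_reads(reference, read_length=100, step=50, quality=5):
    """Yield FASTQ records for low-quality artificial reads cut from `reference`.

    Assumption: reads are taken every `step` bases and all bases of a read
    share the same quality score (Phred+33 encoding, valid range 0..93).
    """
    if not 0 <= quality <= 93:
        raise ValueError("quality must be in the range 0..93")
    qual_char = chr(quality + 33)  # Phred+33: score 5 is encoded as '&'
    last_start = len(reference) - read_length
    for index, start in enumerate(range(0, last_start + 1, step)):
        sequence = reference[start:start + read_length]
        yield f"@artificial_{index}\n{sequence}\n+\n{qual_char * read_length}\n"


# Usage: cut 10-base reads every 3 bases from a toy reference.
toy_reference = "ATCTGCGTGTAGATTGGTC"
for record in artificial_reads(toy_reference, read_length=10, step=3):
    print(record, end="")
```

The resulting records would then be appended to the file of real reads before running the de novo assembler; how strongly the assembler takes the quality scores into account depends on the assembler.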

Genome assembly

[Figure: plots titled "Genome assembly" and "Assembly for mapping"; x-axis: number of reads (millions), 1M to 4M; y-axis: accuracy (%).]

- Benchmark: assembly of the genome of S. pombe-HP.

- Gold standard: the assembly produced by Amos using all the reads.

- Cov50* denotes the number of largest contigs that together cover 50% of the gold standard.

- Our simple assistance (using Velvet as the assembler) outperforms both (i) the de novo assembler Velvet and (ii) the contigs of the assisted assembler Amos when only a few reads are available.
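The Cov50* definition above can be made concrete with a small sketch. It assumes that, for each contig, the number of gold standard bases it covers is already known, and that overlaps between contigs are ignored; the function name `cov50` and its inputs are hypothetical.

```python
def cov50(covered_bases_per_contig, gold_standard_length):
    """Return the number of largest contigs that together cover at least
    50% of the gold standard, or None if all contigs together fall short.

    Assumption: each entry is the number of gold standard bases covered by
    one contig, and contigs do not overlap.
    """
    target = 0.5 * gold_standard_length
    covered = 0
    ranked = sorted(covered_bases_per_contig, reverse=True)
    for count, bases in enumerate(ranked, start=1):
        covered += bases
        if covered >= target:
            return count
    return None


# Usage: the two largest contigs (50 + 30 bases) reach half of 120 bases.
print(cov50([10, 50, 20, 30], gold_standard_length=120))
```

Under this definition a smaller value is better, since fewer contigs are needed to cover half of the gold standard.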

[Figure: plot against the number of reads (millions), 2M to 12M; y-axis from 10K to 30K.]

Assembly for mapping

Assemblies were produced from 0.5M and 10M reads, resp. The count for the reference is listed once per replicate or sample.

Dataset                           Assembly              Uniquely mappable   Uniquely mappable   "Extra mappable"
                                                        to the assembly     to the reference

S. pombe-HP (~12M)                Amos, repl. 1         465 581             426 465             22 295
                                  Simple A., repl. 1    1 062 969                               175 551
                                  Amos, repl. 2         421 997             365 802             25 400
                                  Simple A., repl. 2    1 409 327                               295 749

S. pombe-Mmi1 (~12M)              Amos, repl. 1         681 272             692 239             21 019
                                  Simple A., repl. 1    1 959 980                               593 627
                                  Amos, repl. 2         1 126 555           1 118 799           26 939
                                  Simple A., repl. 2    2 403 156                               450 995

A. thaliana cell line (~150M)     Amos, sample 1        54 316 189          57 855 074          732 040
                                  Simple A., sample 1   71 285 707                              13 145 143
                                  Amos, sample 2        72 239 517          76 318 470          762 523
                                  Simple A., sample 2   87 399 447                              10 818 192
                                  Amos, sample 3        64 660 055          68 723 765          816 757
                                  Simple A., sample 3   82 407 548                              13 063 222

- "Extra mappable": reads that could not be mapped uniquely to the reference directly, but could be mapped uniquely to the reference via mapping to the assembly.

- We produced the assemblies from the input reads.

- We mapped the IP-reads with Bowtie2.
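The "extra mappable" definition amounts to a set difference. The sketch below assumes that the read IDs mapped uniquely in each way have already been extracted from the aligner output; the function name `extra_mappable` and the toy read IDs are hypothetical.

```python
def extra_mappable(unique_to_reference, unique_via_assembly):
    """Return the IDs of reads that map uniquely to the reference only
    via the assembly, not when mapped to the reference directly."""
    return set(unique_via_assembly) - set(unique_to_reference)


# Usage with toy read IDs.
direct = {"read1", "read2"}
via_assembly = {"read2", "read3", "read4"}
print(sorted(extra_mappable(direct, via_assembly)))
```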


Summary

We propose a simple technique for assisted genome assembly. Our technique is based on generating artificial reads from the reference genome. In our experiments, our method outperforms Amos when very few reads are available and the target genome is relatively closely related to the reference genome.

Acknowledgement

This project was supported by the Foundation for Polish Science within the Skills programme co-financed by the European Union European Cohesion Fund.

References

Nathaniel Parrish, Benjamin Sudakov and Eleazar Eskin (2013): Genome reassembly with high-throughput sequencing data, The Eleventh Asia Pacific Bioinformatics Conference.

Sante Gnerre, Eric S. Lander, Kerstin Lindblad-Toh, David B. Jaffe (2009): Assisted assembly: how to improve a de novo genome assembly by using related species, Genome Biology, 10:R88.

Mihai Pop, Adam Phillippy, Arthur L. Delcher, Steven L. Salzberg (2004): Comparative genome assembly, Briefings in Bioinformatics, Vol. 5, No. 3.
