
The importance and illustration of the literature search and databases


Academic year: 2022


2011.10.07. TÁMOP – 4.1.2-08/2/A/KMR-2009-0006

Development of Complex Curricula for Molecular Bionics and Infobionics Programs within a consortial framework

Consortium leader

PETER PAZMANY CATHOLIC UNIVERSITY

Consortium members

SEMMELWEIS UNIVERSITY, DIALOG CAMPUS PUBLISHER

The Project has been realised with the support of the European Union and has been co-financed by the European Social Fund.

Explore the known information:

The importance and illustration of the literature search and databases

(Molekulák világa / World of Molecules)

Compiled by dr. Péter Mátyus

Table of Contents

1. Introduction

2. Useful tools and applications

3. Chemical databases

4. University databases

5. Free databases

6. Protein databases

What is the purpose of a literature search?

- information about a relevant project: novelty test

- reproduction

- ‘walk the known way as long as possible...’

- biological effect/property: new project


I. Introduction

Scientific Publications

Publications in a scientific journal

• letter / short communication
• full article
• review

Scientific lectures

• at a conference (abstract)
• posters (abstract)


Impact factor

The impact factor (IF) is a measure reflecting the average number of citations to articles published in science journals. It is frequently used as a proxy for the relative importance of a journal within its field.

In a given year, the impact factor of a journal is the average number of citations received in that year per paper published in that journal during the two preceding years.

The impact factor was devised by Eugene Garfield, the founder of the Institute for Scientific Information (ISI). Impact factors are calculated yearly for the journals indexed in Thomson Reuters' Journal Citation Reports.

http://thomsonreuters.com/products_services/science/science_products/a-z/journal_citation_reports
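The two-year definition above can be written as a small calculation. This is a minimal sketch, assuming we already know, for a hypothetical journal, how many citations it received in year Y to items it published in Y-1 and Y-2, and how many citable items it published in those two years. The numbers are invented, and what counts as a "citable item" is decided by Thomson Reuters, not by this code.

```python
from typing import Mapping


def impact_factor(year: int,
                  citations_in_year: Mapping[int, int],
                  items_published: Mapping[int, int]) -> float:
    """Two-year impact factor of a journal for `year`.

    citations_in_year[y]: citations received during `year` by items published in year y
    items_published[y]:   number of citable items the journal published in year y
    """
    preceding = (year - 1, year - 2)
    cites = sum(citations_in_year.get(y, 0) for y in preceding)
    items = sum(items_published.get(y, 0) for y in preceding)
    if items == 0:
        raise ValueError("no citable items in the two preceding years")
    return cites / items


# Hypothetical journal: every number here is invented for illustration.
cites_2010 = {2009: 600, 2008: 900}   # citations received in 2010
items = {2009: 250, 2008: 250}        # citable items published
print(impact_factor(2010, cites_2010, items))  # 3.0
```

Only the two preceding years enter the sum, which is why a journal's impact factor can change noticeably from one year to the next.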

Impact factor and citation

Basic citation data:

Citation is a reference to a published source. A citation is an abbreviated alphanumeric expression embedded in the body of an intellectual work; it points to an entry in the work's bibliographic reference list and acknowledges the relevance of the work of others to the topic discussed at the point where the citation appears.

A prime purpose of a citation is intellectual honesty: to attribute to other authors the ideas they have previously expressed, rather than give readers the impression that the work's authors are the original source of those ideas.

A scientific article in a database (Chemical Abstracts)

• the title of the publication
• authors (the first is the most important)
• abstract (summary)

A citation should always give:

• the title of the journal (abbreviated!)
• volume/issue number, page number(s)
• year of publication
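The elements a citation must contain can be collected in a small record type and rendered as one reference string. This is a minimal sketch: the field names and the output layout ("Authors: Title. Journal Year, Volume(Issue), Pages.") are assumptions, because every journal prescribes its own reference style, and the example values are placeholders rather than a real article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Citation:
    authors: List[str]        # in published order; the first author is the most important
    title: str                # title of the publication
    journal: str              # abbreviated journal title
    year: int                 # year of publication
    volume: str
    pages: str                # page number(s), e.g. "2331" or "100-105"
    issue: Optional[str] = None

    def format(self) -> str:
        """Render one simple reference style (an assumption; real styles differ)."""
        volume = self.volume if self.issue is None else f"{self.volume}({self.issue})"
        return f"{', '.join(self.authors)}: {self.title}. {self.journal} {self.year}, {volume}, {self.pages}."


# Placeholder values, not a real article.
ref = Citation(["A. Author", "B. Author"], "Title of the article", "J. Example Chem.",
               2010, "12", "100-105", issue="3")
print(ref.format())
# A. Author, B. Author: Title of the article. J. Example Chem. 2010, 12(3), 100-105.
```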

A scientific article in printed form

Available as a printed journal issue, or it can be downloaded and printed (as a pdf file). A scientific article usually has:

• an abstract
• an introduction chapter
• a materials and methods chapter
• a discussion chapter (results)
• a conclusions chapter
• a list of references (literature)

EXAMPLE: find a scientific article in a database (ScienceDirect)

http://www.sciencedirect.com

What we know:

Journal: Tetrahedron, volume: 66, page number: 2331

• get the abstract
• get the whole article (as a pdf)

EXAMPLE: find a scientific article in a database (ScienceDirect)

http://www.sciencedirect.com

What we know: keyword (term): tert-amino effect (in abstract/title); author name: Matyus

Patents

A patent is a set of exclusive rights granted by a state (national government) to an inventor, or to the inventor's assignee, for a limited period of time in exchange for the public disclosure of an invention.

The exclusive right granted to a patentee is the right to prevent others from making, using, selling, or distributing the patented invention without permission.

Typically, a patent application must include one or more claims defining the invention, and the invention must be new, non-obvious, and useful or industrially applicable. In many countries, certain subject areas, such as business methods and mental acts, are excluded from patent protection.

These rights vary widely between countries according to national laws and international agreements.

Source: http://www.msz.hu/

Patent offices

A patent office is a governmental (or intergovernmental) organization that controls the issue of patents: it grants a patent or rejects the patent application depending on whether the application fulfils the requirements for patentability.

Hungarian Patent Office: http://www.msz.hu/

European Patent Office: http://ep.espacenet.com/

United States Patent and Trademark Office: http://www.uspto.gov/

A patent office’s database (European Patent Office)

Some typical search options:

• keyword (in title/abstract)
• patent application number
• patent publication number
• applicant (institute/firm)
• inventor (person)

A patent in printed form

Available as a pdf file, which can be downloaded from a patent office’s website. A patent usually has:

• bibliographic data
• an abstract
• a description chapter
• a claims chapter
• a search report

EXAMPLE: find a patent on a patent office’s website

http://ep.espacenet.com/

What we know: patent publication number WO9929655
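A publication number such as WO9929655 starts with the two-letter code of the issuing authority (WO for international applications published by WIPO, EP for the European Patent Office, and so on), followed by the number and optionally by a kind code such as A1. The sketch below splits a number into these parts; it is a simplified assumption, since the exact number formats differ between offices and periods, and the class and function names are illustrative.

```python
import re
from typing import NamedTuple, Optional


class PublicationNumber(NamedTuple):
    authority: str          # two-letter office code, e.g. "WO", "EP", "US", "HU"
    serial: str             # the number assigned by that office
    kind: Optional[str]     # kind code such as "A1", if present

# Two capital letters, digits, then an optional kind code (letter plus optional digit).
_PUB_NO = re.compile(r"([A-Z]{2})(\d+)([A-Z]\d?)?")


def parse_publication_number(text: str) -> PublicationNumber:
    """Split a patent publication number; spaces, slashes and hyphens are ignored."""
    cleaned = re.sub(r"[\s/-]", "", text.upper())
    m = _PUB_NO.fullmatch(cleaned)
    if m is None:
        raise ValueError(f"not a recognised publication number: {text!r}")
    return PublicationNumber(m.group(1), m.group(2), m.group(3))


print(parse_publication_number("WO9929655"))
# PublicationNumber(authority='WO', serial='9929655', kind=None)
```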

EXAMPLE: find a patent on a patent office’s website

http://ep.espacenet.com/

What we know: keyword: SSAO; inventor: Matyus


II. Useful tools and applications

IUPAC

The International Union of Pure and Applied Chemistry (IUPAC) serves to advance the worldwide aspects of the chemical sciences and to contribute to the application of chemistry in the service of Mankind. As a scientific, international, non-governmental and objective body, IUPAC can address many global issues involving the chemical sciences.

IUPAC provides various types of electronic resources:

• Educational resources
• Databases
• Nomenclature and Terminology
• Other

IUPAC nomenclature books

General

Principles of Chemical Nomenclature: a Guide to IUPAC Recommendations
Leigh, G.J.; Favre, H.A. and Metanomski, W.V.
Blackwell Science, 1998 [ISBN 0-86542-6856]

The Gold Book

Compendium of Chemical Terminology
Gold, V.; Loening, K.L.; McNaught, A.D. and Sehmi, P.
Blackwell Science, 1987 [ISBN 0-63201-7651(8)]

The Blue Book

Nomenclature of Organic Chemistry
Rigaudy, J. and Klesney, S.P.
Pergamon, 1979 [ISBN 0-08022-3699]

A Guide to IUPAC Nomenclature of Organic Compounds (Recommendations 1993)
Panico, R.; Powell, W.H. and Richer, J-C.
Blackwell Science, 1993 [ISBN 0-63203-4882]
Corrections published in Pure Appl. Chem., Vol. 71, No. 7, pp. 1327-

ChemAxon application: MarvinSketch

A free tool to generate the IUPAC name of a compound

http://www.chemaxon.com/

"Free ongoing provision of all tools for teaching, including licenses to allow students of the department to use during tuition"

Encode chemical structure with ASCII: SMILES

The simplified molecular input line entry specification (SMILES) is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.

In July 2006, IUPAC introduced the InChI as a standard for formula representation. SMILES is generally considered slightly more human-readable than InChI; it also has a wide base of software support with extensive theoretical (e.g., graph theory) backing.
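To see how a SMILES string encodes a structure, it helps to split it into tokens: atoms, bonds, branches and ring-closure digits. The sketch below handles only a small, assumed subset of SMILES (organic-subset atoms, bracket atoms kept whole, bonds, branches, ring closures) and counts the atoms that are written explicitly; implicit hydrogens are not added. It is an illustration, not a full parser; real work should use a cheminformatics toolkit such as RDKit or Open Babel.

```python
import re
from collections import Counter

# A deliberately small SMILES subset (assumption: enough for simple organic molecules).
_TOKEN = re.compile(
    r"(\[[^\]]+\]"            # bracket atom, e.g. [NH4+] or [C@@H]
    r"|Br|Cl"                 # two-letter organic-subset atoms, tried before B and C
    r"|[BCNOPSFI]|[bcnops]"   # one-letter atoms; lower case means aromatic
    r"|[-=#$:/\\.]"           # bond symbols and '.' (disconnected components)
    r"|[()]"                  # branch open/close
    r"|%\d\d|\d)"             # ring-closure labels
)


def tokenize(smiles: str) -> list[str]:
    tokens = _TOKEN.findall(smiles)
    if "".join(tokens) != smiles:   # findall silently skips characters it cannot match
        raise ValueError(f"unsupported SMILES for this sketch: {smiles!r}")
    return tokens


def element_counts(smiles: str) -> Counter:
    """Count explicitly written atoms per element; implicit hydrogens are NOT added."""
    counts: Counter = Counter()
    for tok in tokenize(smiles):
        if tok.startswith("["):
            # optional isotope number, then the element symbol
            m = re.match(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})", tok)
            if m is None:
                raise ValueError(f"cannot read element symbol in {tok!r}")
            counts[m.group(1).capitalize()] += 1
        elif tok.isalpha():
            counts[tok.capitalize()] += 1
    return counts


print(tokenize("CCO"))                           # ethanol: ['C', 'C', 'O']
print(element_counts("CC(=O)Oc1ccccc1C(=O)O"))   # aspirin: 9 C and 4 O
```

Aromatic atoms written in lower case are counted under their element symbol, so the six ring "c" atoms of aspirin add to the carbon total.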

Encode chemical structure with ASCII: InChI, InChIKeys

The IUPAC International Chemical Identifier (InChI) is a textual identifier for chemical substances, designed to provide a standard and human-readable way to encode molecular information and to facilitate the search for such information in databases and on the web. Developed by IUPAC and NIST during 2000–2005, the format and algorithms are non-proprietary and the software is freely available under the open-source LGPL license (though the term "InChI" is a trademark of IUPAC).

Examples (structure drawings not reproduced):

Ethanol (CH3CH2OH)
InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3
InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYAB

Ascorbic acid (C6H8O6)
InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
InChIKey=CIWBSHSKHKDKBQ-JLAZNSOCBT
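An InChI string is built from layers separated by "/": after the "InChI=" prefix come the version and the molecular formula, and every further layer starts with a one-letter prefix (for example c for atom connections, h for hydrogen atoms, and t, m, s for stereochemical information). The sketch below splits an InChI into these parts and reads the element counts from the formula layer. The function names are illustrative, the formula parser assumes a single-component formula, and newer standard InChIs carry the version "1S" instead of "1".

```python
import re
from typing import Dict, List, Tuple


def split_inchi(inchi: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Return (version, formula, [(layer prefix, layer content), ...])."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("an InChI string starts with 'InChI='")
    version, formula, *layers = inchi[len(prefix):].split("/")
    # Each further layer begins with a one-letter prefix such as c, h, t, m or s.
    return version, formula, [(layer[0], layer[1:]) for layer in layers if layer]


def formula_counts(formula: str) -> Dict[str, int]:
    """Element counts from a single-component formula such as 'C2H6O'."""
    return {el: int(n) if n else 1 for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula)}


version, formula, layers = split_inchi("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3")
print(version, formula, layers)   # 1 C2H6O [('c', '1-2-3'), ('h', '3H,2H2,1H3')]
print(formula_counts(formula))    # {'C': 2, 'H': 6, 'O': 1}
```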

ACD Labs application: ChemSketch

A free tool to generate SMILES/InChI codes for a compound

"Advanced Chemistry Development (ACD/Labs) has donated free ChemSketch licenses to numerous academic institutions."


III. Chemical databases

Some commercially available chemical databases

Chemical Abstracts Database

• Company: Chemical Abstracts Service (CAS)

• Application: Chemical Abstracts Scholar

• http://www.cas.org/

Reaxys (Beilstein)

• Company: MDL ELSEVIER

• Application: Reaxys

• https://www.reaxys.com/

The Cambridge Structural Database (CSD)

• Cambridge Crystallographic Data Centre (CCDC)

What is Chemical Abstracts?

http://www.cas.org

CAS (Chemical Abstracts Service) is a division of the American Chemical Society. CAS describes itself as the most authoritative and comprehensive source of chemical information, one that

"…monitors, indexes, and abstracts the world's chemistry-related literature and patents, updates this information daily, and makes it accessible…"

CAS Registry Number (CAS RN)

CAS Registry Numbers (often referred to as CAS RNs or CAS Numbers) are unique identifiers for chemical substances. A CAS Registry Number itself has no inherent chemical significance, but it provides an unambiguous way to identify a chemical substance or molecular structure when there are many possible systematic, generic, proprietary, or trivial names.

At the time these slides were prepared, CAS RN 1219909-65-5 was the most recent CAS Registry Number.

CAS Registry Numbers are used in many other public and private databases as well as chemical inventory listings and, of course, are…
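Although a CAS Registry Number carries no chemical meaning, its last digit is a check digit that lets software catch typing errors. In the scheme published by CAS, each of the other digits is multiplied by its position counted from the right (1, 2, 3, ...), the products are summed, and the sum modulo 10 must equal the check digit. A minimal sketch; the helper names are illustrative.

```python
import re

# Up to 10 digits in three hyphen-separated parts: 2-7 digits, 2 digits, 1 check digit.
_CAS_RN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_check_digit(digits: str) -> int:
    """Weight each digit by its position from the right (starting at 1), sum, take mod 10."""
    return sum(pos * int(d) for pos, d in enumerate(reversed(digits), start=1)) % 10


def is_valid_cas_rn(cas_rn: str) -> bool:
    m = _CAS_RN.fullmatch(cas_rn)
    if m is None:
        return False
    return cas_check_digit(m.group(1) + m.group(2)) == int(m.group(3))


print(is_valid_cas_rn("1219909-65-5"))  # True
print(is_valid_cas_rn("1219909-65-4"))  # False: a mistyped check digit is caught
```

For 1219909-65-5 the weighted sum of 121990965 is 175, and 175 mod 10 = 5, which matches the final digit.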

Source: http://www.cas.org

CAS databases

Patent and journal references from all scientific disciplines

• CAplus: > 32 million documents. Journal articles and patent documents from chemistry and related sciences: proteomics, genomics, biochemistry, biochemical genetics; organic, macromolecular, applied, physical, inorganic and analytical chemistry
• MEDLINE: > 18 million references. Produced by the NLM; covers all areas of the broad field of biomedicine

Substance information

• > 53 million organic and inorganic substances
• > 61 million sequences
• Information about many different types of substances, including synonyms, molecular formulas, nucleic acid and protein sequences, ring analysis data, structure diagrams, and experimental and calculated property data

Chemical synthesis information

• > 23 million single- and multi-step reactions
• Reaction information consisting of: structure diagrams for reactants and products; CAS Registry Numbers for all reactants, products, reagents, solvents and catalysts; yields for many products; textual reaction information

CAS databases

Patent and journal references from all scientific disciplines

• CAplus: 1907 to present, plus many records from earlier years; more than 10,000 scientific journals; patents from 60 patent authorities; also conference proceedings, technical reports, books, dissertations, reviews, meeting abstracts, electronic-only journals and web preprints
• MEDLINE: 1947 to present

Substance information

• Complete coverage from 1957 to present; many substances back to the early 1900s
• New substances as identified by the CAS Registry System
• GenBank sequences
• Organic and inorganic substances, including alloys, coordination compounds, minerals, mixtures, polymers and salts

Chemical synthesis information

• 1840 to present
• Journals covered for Chemical Abstracts™ since 1985
• Patents covered for CA from 1991 to present


What is SciFinder?

SciFinder is a research discovery tool, suitable for both professional searchers and research scientists. You do not have to be an expert searcher…

SciFinder – search by Research topic

Too many hits…

Refine

Filter by:

• further keywords
• author name
• date of publication
• type of document
• etc.

Analyze

Organize the hit list by:

• author name
• company name
• date of publication
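The two steps above, refining (filtering) a hit list and analyzing it (counting hits per author, company or year), can be illustrated on a local list of records. This is a sketch of the idea only, not SciFinder's interface: the `Hit` record, its fields and the sample entries are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Container, List, Optional


@dataclass
class Hit:
    """One record of a hypothetical hit list."""
    title: str
    authors: List[str]
    organization: str
    year: int
    doc_type: str  # e.g. "journal", "patent", "review"


def refine(hits: List[Hit], keyword: Optional[str] = None, author: Optional[str] = None,
           years: Optional[Container[int]] = None, doc_type: Optional[str] = None) -> List[Hit]:
    """Refine: keep only the hits that satisfy every filter that was given."""
    kept = []
    for h in hits:
        if keyword is not None and keyword.lower() not in h.title.lower():
            continue
        if author is not None and author not in h.authors:
            continue
        if years is not None and h.year not in years:
            continue
        if doc_type is not None and h.doc_type != doc_type:
            continue
        kept.append(h)
    return kept


def count_by(hits: List[Hit], field: str) -> Counter:
    """Analyze: count hits per value of a field (authors, organization, year, ...)."""
    counts: Counter = Counter()
    for h in hits:
        value = getattr(h, field)
        if isinstance(value, list):  # one hit can have several authors
            counts.update(value)
        else:
            counts[value] += 1
    return counts


# Invented records standing in for a real hit list.
HITS = [
    Hit("Paper on topic A", ["Author X", "Author Y"], "University 1", 2009, "journal"),
    Hit("Patent on topic A", ["Author X"], "Company 2", 2010, "patent"),
    Hit("Review of topic B", ["Author Z"], "University 1", 2008, "review"),
]
narrowed = refine(HITS, keyword="topic a", years=range(2009, 2011))
print([h.title for h in narrowed])
print(count_by(HITS, "organization").most_common())
```

Each filter is optional and the filters are combined with AND, which mirrors adding one more restriction at a time when there are too many hits.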

SciFinder – search by Structure

SciFinder has its own built-in molecule drawing tool for carrying out structure-based searches.

SciFinder – search by Structure

• a graphical hit list gives detailed information about the compound
• results can be saved and organized

SciFinder – search by Structure

A reaction search returns a list of reaction schemes, which generally give information about the reaction conditions:

- order of application of reactants/reagents
- reaction time
- temperature
- catalysts, etc.


IV. University databases

Free databases at the university

Semmelweis University Central Library

• Semmelweis University: http://www.lib.sote.hu/
• Journal database
• Other databases

‘Elektronikus Információszolgáltatás’ (EISZ, Electronic Information Service)

• National program: http://www.eisz.hu/
• Web of Science (WoS)
• ScienceDirect

Databases available through EISZ

EISZ: Web of Science

http://thomsonreuters.com/products_services/scientific/Web_of_Science

Web of Science® provides researchers, administrators, faculty, and students with quick, powerful access to the world's leading citation databases. Authoritative, multidisciplinary…


EISZ: ScienceDirect

http://www.sciencedirect.com/


V. Free databases

Free databases available through the World Wide Web

PubMed (Public MEDLINE):

• U.S. National Library of Medicine
• includes over 19 million citations from MEDLINE and other life science journals
• http://www.pubmed.gov/

Protein databases: UniProt and PDB

• http://www.uniprot.com/
• http://www.pdb.org/
• protein sequences, structures and protein-related data

PubMed: biomedical literature

PubMed comprises approximately 20 million citations for biomedical literature from MEDLINE, life science journals, and online books. PubMed citations and abstracts include the fields of medicine, nursing, dentistry, veterinary medicine, the health care system, and preclinical sciences. PubMed also provides access to additional relevant Web sites and links to the other NCBI molecular biology resources.

PubMed is a free resource that is developed and maintained by the National Center for Biotechnology Information (NCBI), at the U.S. National Library of Medicine (NLM), located at the National Institutes of Health (NIH).
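Besides the web interface, NCBI offers the E-utilities, a URL-based interface in which the ESearch utility returns the PubMed IDs (PMIDs) that match a query. The sketch below only builds such a URL; sending it requires network access and should respect NCBI's usage guidelines. The example query (author and keyword taken from the earlier search examples) and the helper name are illustrative assumptions.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """ESearch URL returning up to `retmax` PubMed IDs (PMIDs) that match `term`."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# [Author] and [Title/Abstract] are PubMed search field tags.
url = pubmed_search_url("Matyus P[Author] AND SSAO[Title/Abstract]")
print(url)
```

`urlencode` percent-encodes the brackets and the slash in the field tags, so the query can be pasted into a browser or passed to `urllib.request.urlopen` unchanged.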


PubMed: Genome Project

PubMed: PubChem Project

http://pubchem.ncbi.nlm.nih.gov/

PubChem provides information on the biological activities of small molecules. It is a component of…
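PubChem can also be queried through its PUG REST service, where the URL path names the input (here a compound name), the requested properties and the output format. The sketch below builds such a URL without sending it; the chosen properties are examples, and the helper name is hypothetical.

```python
from typing import Sequence
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name: str,
                         properties: Sequence[str] = ("MolecularFormula", "MolecularWeight", "InChIKey")) -> str:
    """URL asking PubChem PUG REST for selected properties of a compound given by name."""
    # quote(..., safe="") percent-encodes spaces and slashes inside the name
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/{','.join(properties)}/JSON"


print(pubchem_property_url("ascorbic acid"))
```

Requesting the InChIKey links a PubChem record to the identifiers discussed in the section on encoding chemical structures.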


VI. Protein databases

PDB identification code

Structures deposited in the Protein Data Bank (PDB) are assigned a unique four-character code, often called the PDB accession code or PDB code. Because the PDB is the central repository for biological macromolecular structures, the PDB code is often used in the scientific literature to refer to a particular structure used in a study.

By convention, the PDB code consists of a single numeric digit followed by three alphanumeric characters. The PDB code is not case-sensitive, i.e. 1abc and 1ABC refer to the same structure. For classification purposes, e.g. for the directory structure of the PDB archive, the two middle characters (the second and third characters of the PDB code) are sometimes used as an index to group PDB codes into bins of manageable, roughly equal size. These two characters are preferred over the first and second characters because the first character can take only ten values (the digits) and the majority of PDB codes in use start with '1'.
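The conventions above (one digit plus three alphanumeric characters, case-insensitive, middle two characters as a grouping index) translate directly into a validation and normalization helper. A minimal sketch; the function names are illustrative, and returning the bin in lower case is an assumption about how the bins are named.

```python
import re

# One digit followed by three alphanumeric characters, in either case.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def normalize_pdb_id(code: str) -> str:
    """Validate a four-character PDB code and return it in upper case (1abc == 1ABC)."""
    if _PDB_ID.fullmatch(code) is None:
        raise ValueError(f"not a PDB identification code: {code!r}")
    return code.upper()


def archive_bin(code: str) -> str:
    """The middle two characters (2nd and 3rd) used to group codes into bins.

    Lower case is an assumption about the bin naming."""
    return normalize_pdb_id(code)[1:3].lower()


print(normalize_pdb_id("1abc"), archive_bin("1ABC"))  # 1ABC ab
```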

Accession number (AC)

Source: http://www.uniprot.org/

This subsection of the ‘Entry information’ section provides one or more accession numbers. These are stable identifiers and should be used to cite UniProtKB entries. Upon integration into UniProtKB, each entry is assigned a unique accession number, called the ‘Primary (citable) accession number’.

UniProtKB accession numbers consist of 6 alphanumeric characters in one of the following two formats (positions 1 to 6):

1 2 3 4 5 6

[A-N,R-Z] [0-9] [A-Z] [A-Z, 0-9] [A-Z, 0-9] [0-9]

[O,P,Q] [0-9] [A-Z, 0-9] [A-Z, 0-9] [A-Z, 0-9] [0-9]

Examples: A2BC19, P12345, P4A123, Q1AAA9
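The two position-by-position formats listed above map one-to-one onto a regular expression, which is a convenient way to check identifiers before looking them up. A minimal sketch with illustrative names; note that UniProt later introduced longer, 10-character accessions, which this 6-character check rejects.

```python
import re

# The two 6-character formats listed above, position by position.
_FORMAT_1 = r"[A-NR-Z][0-9][A-Z][A-Z0-9][A-Z0-9][0-9]"
_FORMAT_2 = r"[OPQ][0-9][A-Z0-9][A-Z0-9][A-Z0-9][0-9]"
_ACCESSION = re.compile(f"{_FORMAT_1}|{_FORMAT_2}")


def is_uniprot_accession(ac: str) -> bool:
    """True if `ac` matches one of the two 6-character accession formats."""
    return _ACCESSION.fullmatch(ac) is not None


for ac in ["A2BC19", "P12345", "P4A123", "Q1AAA9", "A12345"]:
    print(ac, is_uniprot_accession(ac))
# The four examples are valid; A12345 is not, because position 3 must be a letter in format 1.
```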

Entry name

The UniProtKB/Swiss-Prot entry name consists of up to 11 uppercase alphanumeric characters, following a naming convention that can be symbolized as X_Y, where:

• X is a mnemonic protein identification code of at most 5 alphanumeric characters;
• the ’_’ sign serves as a separator;
• Y is a mnemonic species identification code of at most 5 alphanumeric characters.

The mnemonic code ‘X’ is an abbreviation of the protein/gene name, which does not necessarily correspond to the recommended protein name or to the gene name. Example:

• Code (X): B2MG
• Recommended protein name: Beta-2-microglobulin
• Gene name: B2M
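The X_Y convention can be checked and split with one regular expression; two parts of at most 5 characters plus the separator give the 11-character maximum. A minimal sketch with illustrative names; B2MG_HUMAN is used as an example of the human beta-2-microglobulin entry.

```python
import re
from typing import NamedTuple


class EntryName(NamedTuple):
    protein: str   # X: mnemonic protein identification code (at most 5 characters)
    species: str   # Y: mnemonic species identification code (at most 5 characters)


_ENTRY_NAME = re.compile(r"([A-Z0-9]{1,5})_([A-Z0-9]{1,5})")


def parse_entry_name(name: str) -> EntryName:
    m = _ENTRY_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not a UniProtKB/Swiss-Prot entry name: {name!r}")
    return EntryName(m.group(1), m.group(2))


print(parse_entry_name("B2MG_HUMAN"))  # EntryName(protein='B2MG', species='HUMAN')
```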

The .pdb file format

www.uniprot.org

The Protein Data Bank (pdb) file format is a textual file format describing the three-dimensional structures of molecules held in the Protein Data Bank. Most of the information in that database pertains to proteins, and the pdb format accordingly provides for rich description and annotation of protein properties. However, proteins are often crystallized in association with other molecules, such as water, ions, nucleic acids and drug molecules.

The .pdb file format

HEADER, TITLE and AUTHOR records

identify the structure and the researchers who determined it; numerous other types of records are available to provide other types of information.

REMARK records

can contain free-form annotation, but they also accommodate standardized information; for example, how to compute the coordinates of the experimentally observed multimer from those of the explicitly specified ones of a single repeating unit.

The .pdb file format

SEQRES records

give the sequences of the peptide chains (named A, B, C, etc.), which usually span multiple lines.

ATOM records

describe the coordinates of the atoms that are part of the protein. The first three floating-point numbers are the atom's x, y and z coordinates, in ångströms. The next three columns are the occupancy, the temperature factor and the element name, respectively.

HETATM records

describe the coordinates of hetero-atoms, i.e. atoms that are not part of the protein molecule.
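ATOM and HETATM records have fixed column positions, so a record can be read by slicing the line rather than splitting on spaces (neighbouring fields may touch). The sketch below reads the main fields using the column layout of the PDB format specification; the sample record is built column by column with invented values, and the class and function names are illustrative.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    record: str        # "ATOM" or "HETATM"
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float           # coordinates in ångströms
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


def parse_atom_line(line: str) -> AtomRecord:
    """Read one ATOM/HETATM record by its fixed columns (1-based in the spec, 0-based slices here)."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    return AtomRecord(
        record=record,
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


# Hypothetical record, built column by column (values invented for illustration).
SAMPLE = (
    "ATOM".ljust(6)        # 1-6   record name
    + "1".rjust(5)         # 7-11  atom serial number
    + " "                  # 12    blank
    + " N  "               # 13-16 atom name
    + " "                  # 17    alternate location indicator
    + "MET"                # 18-20 residue name
    + " "                  # 21    blank
    + "A"                  # 22    chain identifier
    + "1".rjust(4)         # 23-26 residue sequence number
    + " "                  # 27    insertion code
    + " " * 3              # 28-30 blank
    + "38.198".rjust(8)    # 31-38 x
    + "19.582".rjust(8)    # 39-46 y
    + "28.998".rjust(8)    # 47-54 z
    + "1.00".rjust(6)      # 55-60 occupancy
    + "41.63".rjust(6)     # 61-66 temperature factor
    + " " * 10             # 67-76 blank
    + "N".rjust(2)         # 77-78 element symbol
)
atom = parse_atom_line(SAMPLE)
print(atom.res_name, atom.chain, atom.x, atom.y, atom.z, atom.element)
```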

UniProt Protein database

The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of…

A UniProt search example

RCSB Protein Data Bank

http://www.pdb.org/

The Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids. These are the molecules of life found in all organisms, including bacteria, yeast, plants, flies, other animals, and humans. Understanding the shape of a molecule helps to understand how it…

RCSB Protein Data Bank

Advanced Search: allows all types of searches (database fields, browsable ontologies, and text searches)

Search by organism: browse based on NCBI Taxonomy

RCSB Protein Data Bank

Advanced Search Options

• Author Name
• Chain Length
• Chemical ID
• Chemical Name
• Citation
• Crystal Properties
• Deposit Date
• Enzyme Classification
• Expression Organism
• Keywords
• Latest Released Structures
• Macromolecule Name
• Macromolecule Type
• Molecular Weight

RCSB Protein Data Bank

PDB structures can be viewed in 3D on the site with free plug-in viewers. This requires downloading additional free software (or a correctly configured web browser).

Several free interactive viewers can be downloaded from the web:

• KiNG
• Jmol
• WebMol
• QuickPDB
• Protein Workshop
