05/23/2022 National Széchényi Library – Hungary 1

BIBLIOTHECA NATIONALIS HUNGARIAE

Márton Németh – László Drótos

How to catalogue a web archive?

Some solutions for metadata management in the web harvesting pilot project of the National Széchényi Library, Hungary

INFINT 2018, Bratislava, October 23, 2018

questions:

• what is the subject of the description?

• what kind of metadata is needed?

• in what format?

• how can this data be produced?

• what can this data be used for?

Source: webverse.org

granularity

• the living web is one single document (a huge, ever-changing, unlimited hypermedia)

• the archived web is a versioned (time-stamped) file depository

• collection method: selective / event-based / domain-wide harvest, automatic submission, deposit

• levels of description: collection, sub-collection, website, website unit, document, file

• user needs, scale of archiving, available staff

metadata types (website level)

• bibliographic: e.g. title (lots of variations), creator/contributor/publisher (uncertain roles), rights (unclear legal status), dates (what kind of dates?), subject/type (very mixed content) ...

• administrative: e.g. curator, nominator, urgency, permission request, harvesting schedule, quality assurance, access ...

• technical: e.g. original CMS, harvester software, harvest parameters, size of the downloaded content, storage, long-term preservation ...
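
The three metadata groups above could be modelled as one website-level record. The following is only a minimal sketch: the class and field names are hypothetical and are not taken from the project's own schema, and only a few fields per group are shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bibliographic:
    titles: List[str]                      # a site often has several title variants
    creators: List[str] = field(default_factory=list)
    rights: Optional[str] = None           # None when the legal status is unclear
    subjects: List[str] = field(default_factory=list)


@dataclass
class Administrative:
    curator: str
    harvest_schedule: str                  # e.g. "monthly" (illustrative value)
    permission_requested: bool = False


@dataclass
class Technical:
    harvester: str                         # name of the harvester software
    size_bytes: int                        # size of the downloaded content
    original_cms: Optional[str] = None     # None when the CMS is unknown


@dataclass
class WebsiteRecord:
    url: str
    bibliographic: Bibliographic
    administrative: Administrative
    technical: Technical


# Illustrative values only; nothing here describes a real archived site.
record = WebsiteRecord(
    url="http://example.org/",
    bibliographic=Bibliographic(titles=["Example site", "Example.org"]),
    administrative=Administrative(curator="curator-1", harvest_schedule="monthly"),
    technical=Technical(harvester="example-harvester", size_bytes=1024),
)
```

Keeping the groups in separate classes reflects that they tend to come from different sources: bibliographic data mostly from curators, technical data mostly from the harvesting tools.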

recommendations

• ISO/TR 14873:2013 – Statistical and quality issues for web archiving (collection-level indicators)

• Descriptive Metadata for Web Archiving / OCLC Web Archiving Metadata Working Group (mostly site-level bibliographic data fields, based on the Dublin Core schema)

• Metadata Application Profile for Description of Websites with Archived Versions / New York Art Resources Consortium (site-level, MARC/RDA)

database plan for the Hungarian web archive (website level)

our metadata records

• a small publicly available demo collection

• XSD (XML Schema Definition) and XSLT (Extensible Stylesheet Language Transformations) files

• predefined lists (e.g. genre, type, topic, subtopic, change frequency, harvest frequency, quality level)

• namespace links (person and geographic names)

• related sites (on the living web and in the archive)

• site-level and subcollection-level XML records

• manual data entry with XML Notepad (temporarily)
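
As an illustration of how a site-level XML record can be checked against a predefined list before it is written, here is a minimal sketch using Python's standard library. The element names and the list of harvest frequencies are assumptions made for this example; the project's actual structure is defined in its XSD files and is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled list; the project's real predefined lists are not shown here.
HARVEST_FREQUENCIES = {"daily", "weekly", "monthly", "yearly"}


def build_site_record(url, title, harvest_frequency):
    """Return a site-level XML element, rejecting values outside the list."""
    if harvest_frequency not in HARVEST_FREQUENCIES:
        raise ValueError(f"unknown harvest frequency: {harvest_frequency!r}")
    site = ET.Element("website")  # hypothetical element names throughout
    ET.SubElement(site, "url").text = url
    ET.SubElement(site, "title").text = title
    ET.SubElement(site, "harvestFrequency").text = harvest_frequency
    return site


record = build_site_record("http://example.org/", "Example site", "monthly")
xml_text = ET.tostring(record, encoding="unicode")
```

In an XSD-based workflow the same constraint would normally be expressed as an enumeration in the schema itself, so that any validating editor can enforce it.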

the mia.xsd file (website level)

metadata of the Óbuda Museum’s blog (original XML and converted HTML format)
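
The project converts its XML records to HTML with XSLT. As a rough stand-in for that step (not the project's stylesheet), the sketch below renders a flat XML record as an HTML table using only Python's standard library. The sample record and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical record; the real element names are defined in mia.xsd.
SAMPLE = "<website><title>Example blog</title><url>http://example.org/</url></website>"


def record_to_html(xml_text):
    """Render each child element of a flat record as one table row."""
    root = ET.fromstring(xml_text)
    rows = [
        f"<tr><th>{escape(child.tag)}</th><td>{escape(child.text or '')}</td></tr>"
        for child in root
    ]
    return "<table>" + "".join(rows) + "</table>"


html_text = record_to_html(SAMPLE)
```

An XSLT stylesheet expresses the same mapping declaratively, which is one reason it suits records maintained by hand.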

future plans

• database and form-based data entry interface (as part of the new nation-wide library system)

• cooperation with other memory institutions (e.g. shared cataloging)

• automatic and semi-automatic metadata generation (mostly technical and administrative data)

• automatic entity identification and extraction from the full text (e.g. names, events, concepts)

• enriching metadata from external sources (e.g. DBpedia)

• incorporating metadata of important archived websites into the national bibliography

• faceting full-text hit lists by metadata
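
Faceting a full-text hit list by metadata, the last plan above, can be sketched in a few lines. This is a minimal illustration under the assumption that each hit already carries the metadata of its website; the field names and sample hits are hypothetical.

```python
from collections import Counter

# Hypothetical hits: each full-text hit carries the metadata of its website.
hits = [
    {"url": "http://example.org/a", "genre": "blog", "topic": "museums"},
    {"url": "http://example.org/b", "genre": "blog", "topic": "libraries"},
    {"url": "http://example.net/c", "genre": "news portal", "topic": "museums"},
]


def facet_counts(hits, field):
    """Count how many hits fall under each value of one metadata field."""
    return Counter(hit[field] for hit in hits if field in hit)


def filter_hits(hits, field, value):
    """Narrow the hit list to a single facet value."""
    return [hit for hit in hits if hit.get(field) == value]


genre_facet = facet_counts(hits, "genre")
museum_hits = filter_hits(hits, "topic", "museums")
```

A production search engine would compute such counts inside the index rather than in application code, but the principle is the same: site-level metadata supplies the categories by which full-text hits are grouped and narrowed.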

thank you for your attention!

project homepage: http://mekosztaly.oszk.hu/mia/

project description in English:

http://netpreserve.org/about-us/members/orszagos-szechenyi-konyvtar/

demo web archive:

http://mekosztaly.oszk.hu/mia/demo/

“404 not found” workshop (Budapest, November 15, 2018):

http://mekosztaly.oszk.hu/mia/404_workshop.html

contact e-mail address: mia@mek.oszk.hu
