Sangam: A Confluence of Knowledge Streams

Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020

dc.contributor EPSRC - Engineering and Physical Sciences Research Council
dc.contributor University of Edinburgh
dc.contributor Havens, Lucy
dc.creator Havens, L
dc.creator Alex, B
dc.creator Bach, B
dc.creator Terras, M
dc.creator Renton, S
dc.creator Hosker, R
dc.creator Centre for Research Collections, The
dc.date 2020-11-19T16:21:29Z
dc.date.accessioned 2023-02-17T20:51:47Z
dc.date.available 2023-02-17T20:51:47Z
dc.identifier Havens, L; Alex, B; Bach, B; Terras, M; Renton, S; Hosker, R; Centre for Research Collections, The. (2020). Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020, [dataset]. University of Edinburgh. School of Informatics. https://doi.org/10.7488/ds/2953.
dc.identifier https://hdl.handle.net/10283/3794
dc.identifier https://doi.org/10.7488/ds/2953
dc.identifier.uri http://localhost:8080/xmlui/handle/CUHPOERS/243923
dc.description The dataset includes metadata descriptions extracted from the Centre for Research Collections' online archival catalog using OAI-PMH EAD harvesting. Metadata descriptions were extracted from four metadata fields: an identifier (<unitid>), Biographical / Historical (<bioghist>), Scope and Contents (<scopecontent>), and Processing Information (<processinfo>). The descriptions were extracted in October 2020. The dataset includes five files that will be annotated for instances of gender bias, in an effort to create a gold standard dataset on which an algorithm can be trained to identify and classify gender bias in text. ## Acknowledgments ## This dataset has been created for a PhD project conducted in collaboration with Beatrice Alex, Benjamin Bach, and Melissa Terras (PhD supervisors); and with Rachel Hosker and the Centre for Research Collections (CRC). This group of collaborators will be involved in future uses of the data as this PhD project continues; specifically, for determining how to annotate the data for gender bias. Thanks are due to Scott Renton for his guidance in using the Open Archives Initiative - Protocol for Metadata Harvesting (OAI-PMH), which was necessary to extract selections of metadata in Encoded Archival Description (EAD) XML format from ArchivesSpace, the CRC's online archival catalog.
dc.description The dataset was created to be annotated and then used to train a classification algorithm, in an effort to automate the identification and classification of types of gender bias in text. From the start of the first file to the end of the last file, the descriptions are in the same order in which they appeared when harvested using OAI-PMH. The descriptions were transformed from EAD format to TXT format.
dc.format text/plain
dc.format text/plain
dc.format text/plain
dc.format text/plain
dc.format text/plain
dc.format text/plain
dc.language eng
dc.publisher University of Edinburgh. School of Informatics
dc.relation http://arxiv.org/abs/2011.05911
dc.relation Havens, L; Terras, M; Bach, B; Alex, B "Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research" (2020) arXiv:2011.05911v1 [cs.CL]
dc.rights Creative Commons Attribution 4.0 International Public License
dc.source https://archives.collections.ed.ac.uk/
dc.subject archives
dc.subject edinburgh
dc.subject metadata
dc.subject Mathematical and Computer Sciences
dc.title Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020
dc.type dataset
dc.coverage UK
dc.coverage UNITED KINGDOM
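
The dc.description field above names four EAD elements from which descriptions were extracted: <unitid>, <bioghist>, <scopecontent>, and <processinfo>. As a rough illustration only, the sketch below pulls the text of those four elements out of an EAD XML string with Python's standard library. It is not the dataset authors' extraction code. The sample record, the function names, and the whitespace handling are all assumptions made for this example, and the OAI-PMH request that would fetch real records is omitted.

```python
import xml.etree.ElementTree as ET

# The four EAD elements named in the dataset description.
FIELDS = ("unitid", "bioghist", "scopecontent", "processinfo")

# Hypothetical EAD fragment, invented for illustration; real harvested
# records are larger and may be structured differently.
SAMPLE_EAD = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <did><unitid>Coll-0000</unitid></did>
    <bioghist><p>A made-up  biography.</p></bioghist>
    <scopecontent><p>Letters and
      notebooks.</p></scopecontent>
    <processinfo><p>Catalogued in 2020.</p></processinfo>
  </archdesc>
</ead>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so matching works with or without one."""
    return tag.rsplit("}", 1)[-1]


def extract_descriptions(ead_xml):
    """Return a dict mapping each field name to a list of its text values.

    Values are kept in document order, mirroring the note in the record
    that descriptions stay in the order in which they were harvested.
    """
    root = ET.fromstring(ead_xml)
    found = {name: [] for name in FIELDS}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in found:
            # Join text from nested elements, then collapse whitespace.
            text = " ".join(" ".join(elem.itertext()).split())
            if text:
                found[name].append(text)
    return found


if __name__ == "__main__":
    for field, values in extract_descriptions(SAMPLE_EAD).items():
        for value in values:
            print(f"{field}: {value}")
```

One caveat of this sketch: an element nested inside another element of the same name is reported twice, once as part of its parent's text and once on its own, so a real pipeline would need to decide how to handle that case.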


Files in this item

File                                                  Size      Format
inventory.txt                                         1.807Kb   text/plain
UoEArchivesMetadata_ID-SC-BH-PI_blindtestset.txt      386.7Kb   text/plain
UoEArchivesMetadata_ID-SC-BH-PI_devset.txt            812.3Kb   text/plain
UoEArchivesMetadata_ID-SC-BH-PI_trainingset1.txt      482.4Kb   text/plain
UoEArchivesMetadata_ID-SC-BH-PI_trainingset2.txt      4.748Mb   text/plain
UoEArchivesMetadata_ID-SC-BH-PI_trainingset3.txt      2.977Mb   text/plain
