dc.contributor |
EPSRC - Engineering and Physical Sciences Research Council |
|
dc.contributor |
University of Edinburgh |
|
dc.contributor |
Havens, Lucy |
|
dc.creator |
Havens, L |
|
dc.creator |
Alex, B |
|
dc.creator |
Bach, B |
|
dc.creator |
Terras, M |
|
dc.creator |
Renton, S |
|
dc.creator |
Hosker, R |
|
dc.creator |
Centre for Research Collections, The |
|
dc.date |
2020-11-19T16:21:29Z |
|
dc.date |
2020-11-19T16:21:29Z |
|
dc.date.accessioned |
2023-02-17T20:51:47Z |
|
dc.date.available |
2023-02-17T20:51:47Z |
|
dc.identifier |
Havens, L; Alex, B; Bach, B; Terras, M; Renton, S; Hosker, R; Centre for Research Collections, The. (2020). Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020, [dataset]. University of Edinburgh. School of Informatics. https://doi.org/10.7488/ds/2953. |
|
dc.identifier |
https://hdl.handle.net/10283/3794 |
|
dc.identifier |
https://doi.org/10.7488/ds/2953 |
|
dc.identifier.uri |
http://localhost:8080/xmlui/handle/CUHPOERS/243923 |
|
dc.description |
The dataset includes metadata descriptions extracted from the Centre for Research Collections' online archival catalog using OAI-PMH EAD harvesting. Metadata descriptions were extracted from four metadata fields: an identifier (<unitid>), Biographical / Historical (<bioghist>), Scope and Contents (<scopecontent>), and Processing Information (<processinfo>). The descriptions were extracted in October 2020. The dataset includes five files that will be annotated for instances of gender bias, in an effort to create a gold standard dataset on which an algorithm can be trained to identify and classify gender bias in text.
## Acknowledgments ##
This dataset has been created for a PhD project conducted in collaboration with Beatrice Alex, Benjamin Bach, and Melissa Terras (PhD supervisors), and with Rachel Hosker and the Centre for Research Collections (CRC). This group of collaborators will be involved in future uses of the data as the PhD project continues, specifically in determining how to annotate the data for gender bias. Thanks are due to Scott Renton for his guidance in using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which was necessary to extract selections of metadata in Encoded Archival Description (EAD) XML format from the CRC's online archival catalog, ArchivesSpace. |
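The harvesting and extraction steps described above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the code used to build the dataset. It assumes the ArchivesSpace OAI-PMH endpoint accepts `metadataPrefix=oai_ead`. The base URL and the helper names `list_records_url`, `local_name` and `extract_fields` are illustrative only. The `ListRecords` verb and the `resumptionToken` argument come from the OAI-PMH 2.0 specification. Fields are matched by local element name, so both namespaced (EAD 2002) and un-namespaced EAD documents are handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# The four description fields named in this record.
FIELDS = ("unitid", "bioghist", "scopecontent", "processinfo")


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is hypothetical; "oai_ead" is an assumed metadataPrefix.
    """
    if resumption_token:
        # OAI-PMH treats resumptionToken as an exclusive argument, so the
        # follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_ead"}
    return base_url + "?" + urlencode(params)


def local_name(tag):
    """Strip an XML namespace, e.g. '{urn:isbn:1-931666-22-9}unitid' -> 'unitid'."""
    return tag.rsplit("}", 1)[-1]


def extract_fields(ead_xml):
    """Collect the text of every <unitid>, <bioghist>, <scopecontent> and
    <processinfo> element in one EAD document, in document order."""
    root = ET.fromstring(ead_xml)
    found = {field: [] for field in FIELDS}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in found:
            # Join the text of nested elements (<head>, <p>, ...) and
            # collapse runs of whitespace into single spaces.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                found[name].append(text)
    return found
```

Because `root.iter()` walks the whole tree, this sketch also picks up fields nested inside component (`<c>`) elements, not only those at collection level.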
|
dc.description |
This dataset was created to be annotated and used to train a classification algorithm, in an effort to automate the identification and classification of types of gender bias in text. From the start of the first file to the end of the last file, the descriptions appear in the same order in which they were harvested using OAI-PMH. The descriptions were transformed from EAD format to TXT format. |
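The order-preserving EAD-to-TXT step can be sketched as follows. This is a minimal sketch under stated assumptions: each extracted description is written as one line of UTF-8 text, and the harvest is split into five contiguous, near-equal files so that reading them in sequence reproduces the harvest order. The record does not say how the files were divided. `split_in_order`, `write_txt_files` and the `descriptions_N.txt` naming are hypothetical.

```python
from pathlib import Path


def split_in_order(descriptions, n_files):
    """Split a list into n_files contiguous chunks without reordering.

    Earlier chunks get one extra item when the length does not divide evenly.
    """
    size, extra = divmod(len(descriptions), n_files)
    chunks, start = [], 0
    for i in range(n_files):
        end = start + size + (1 if i < extra else 0)
        chunks.append(descriptions[start:end])
        start = end
    return chunks


def write_txt_files(descriptions, out_dir, n_files=5, prefix="descriptions"):
    """Write each chunk to a numbered UTF-8 text file, one description per line.

    Assumes descriptions contain no newlines (e.g. whitespace already collapsed).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(split_in_order(descriptions, n_files), start=1):
        path = out / f"{prefix}_{i}.txt"
        path.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Because each chunk is a contiguous slice, concatenating the files in numeric order recovers the original harvest sequence.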
|
dc.format |
text/plain |
|
dc.format |
text/plain |
|
dc.format |
text/plain |
|
dc.format |
text/plain |
|
dc.format |
text/plain |
|
dc.format |
text/plain |
|
dc.language |
eng |
|
dc.publisher |
University of Edinburgh. School of Informatics |
|
dc.relation |
http://arxiv.org/abs/2011.05911 |
|
dc.relation |
Havens, L; Terras, M; Bach, B; Alex, B "Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research" (2020) arXiv:2011.05911v1 [cs.CL] |
|
dc.rights |
Creative Commons Attribution 4.0 International Public License |
|
dc.source |
https://archives.collections.ed.ac.uk/ |
|
dc.subject |
archives |
|
dc.subject |
edinburgh |
|
dc.subject |
metadata |
|
dc.subject |
Mathematical and Computer Sciences |
|
dc.title |
Archival Metadata Descriptions from the University of Edinburgh Centre for Research Collections - Extracted October 2020 |
|
dc.type |
dataset |
|
dc.coverage |
UK |
|
dc.coverage |
UNITED KINGDOM |
|