The Human Know-How Dataset describes 211,696 human activities from many different domains. These activities are decomposed into 2,609,236 entities (each with an English textual label). These entities represent over two million actions and half a million pre-requisites. Actions are interconnected both according to their dependencies (temporal/logical orders between actions) and decompositions (decomposition of complex actions into simpler ones). This dataset has been integrated with DBpedia (259,568 links). For more information see:
- The project website: http://homepages.inf.ed.ac.uk/s1054760/prohow/index.htm
- The data is also available on datahub: https://datahub.io/dataset/human-activities-and-instructions
----------------------------------------------------------------
* Quickstart: if you want to experiment with the highest-quality data before downloading all the datasets, download the file "9of11_knowhow_wikihow", and optionally the files "Process - Inputs", "Process - Outputs", "Process - Step Links" and "wikiHow categories hierarchy".
* Data representation based on the PROHOW vocabulary: http://w3id.org/prohow#
Data extracted from existing web resources is linked to the original resources using the Open Annotation specification.
* Data Model: an example of how the data is represented within the datasets is available in the attached Data Model PDF file. The attached example represents a simple set of instructions, but instructions in the dataset can have more complex structures. For example, instructions could have multiple methods, steps could have further sub-steps, and complex requirements could be decomposed into sub-requirements.
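The nested structure described above (multiple methods, steps with sub-steps, requirements with sub-requirements) can be pictured with a minimal Python sketch. The class name `Task`, its field names and the pancake example are hypothetical illustrations; they are not terms from the PROHOW vocabulary and not data taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical stand-in for a dataset entity with an English label."""
    label: str
    steps: List["Task"] = field(default_factory=list)         # ordered sub-steps
    methods: List["Task"] = field(default_factory=list)       # alternative ways to do it
    requirements: List["Task"] = field(default_factory=list)  # prerequisites, may nest


def count_entities(task: Task) -> int:
    """Count this node plus every node reachable via steps, methods and requirements."""
    children = task.steps + task.methods + task.requirements
    return 1 + sum(count_entities(child) for child in children)


# Made-up example: one set of instructions with a decomposed requirement
# and two alternative methods, each with its own steps.
pancakes = Task(
    label="Make pancakes",
    requirements=[Task("Batter", requirements=[Task("Flour"), Task("Eggs")])],
    methods=[
        Task("On the stove", steps=[Task("Heat the pan"), Task("Pour the batter")]),
        Task("In the oven", steps=[Task("Preheat the oven")]),
    ],
)
total_entities = count_entities(pancakes)
```

In the actual datasets these relations are expressed as RDF statements rather than Python objects; the sketch only mirrors the shape of the hierarchy.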
----------------------------------------------------------------
Statistics:
* 211,696: number of instructions. From wikiHow: 167,232 (datasets 1of11_knowhow_wikihow to 9of11_knowhow_wikihow). From Snapguide: 44,464 (datasets 10of11_knowhow_snapguide to 11of11_knowhow_snapguide).
* 2,609,236: number of RDF nodes within the instructions. From wikiHow: 1,871,468 (datasets 1of11_knowhow_wikihow to 9of11_knowhow_wikihow). From Snapguide: 737,768 (datasets 10of11_knowhow_snapguide to 11of11_knowhow_snapguide).
* 255,101: number of process inputs linked to 8,453 distinct DBpedia concepts (dataset Process - Inputs)
* 4,467: number of process outputs linked to 3,439 distinct DBpedia concepts (dataset Process - Outputs)
* 376,795: number of step links between 114,166 different sets of instructions (dataset Process - Step Links)
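As a quick sanity check, the per-source figures above can be summed and compared with the stated totals. The short Python sketch below does this; it assumes that the 259,568 DBpedia links mentioned in the introduction are the sum of the process input and process output links, which the figures are consistent with but the text does not state explicitly. The variable names are illustrative.

```python
# Figures copied from the statistics list above.
instructions = {"wikiHow": 167_232, "Snapguide": 44_464}
rdf_nodes = {"wikiHow": 1_871_468, "Snapguide": 737_768}
dbpedia_links = {"process inputs": 255_101, "process outputs": 4_467}

total_instructions = sum(instructions.values())
total_nodes = sum(rdf_nodes.values())
# Assumption: the DBpedia link count is inputs plus outputs.
total_dbpedia_links = sum(dbpedia_links.values())

# Average number of RDF nodes per set of instructions.
nodes_per_instruction = total_nodes / total_instructions

print(total_instructions, total_nodes, total_dbpedia_links)
print(round(nodes_per_instruction, 1))
```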
Instruction datasets:
* Datasets 1of11_knowhow_wikihow to 9of11_knowhow_wikihow contain instructions from wikiHow. Instructions are distributed across these datasets in order of popularity. This means that the most popular and high-quality instructions are found in 9of11_knowhow_wikihow, while the least popular ones are in 1of11_knowhow_wikihow. These instructions are also classified according to the hierarchy found in the wikiHow categories hierarchy dataset.
* Datasets 10of11_knowhow_snapguide to 11of11_knowhow_snapguide contain instructions from Snapguide. Instructions coming from Snapguide are not sorted by their popularity.
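The naming scheme of the instruction datasets can be generated programmatically, for example to decide which files to download first. The following Python sketch builds the dataset names described above and orders the wikiHow ones from most to least popular. The function names are hypothetical, and no file extension is appended because the text does not specify one.

```python
from typing import List

TOTAL_PARTS = 11
# Part numbers per source, as described in the list above.
PART_RANGES = {"wikihow": range(1, 10), "snapguide": range(10, 12)}


def dataset_names(source: str) -> List[str]:
    """Return the dataset names for one source, in ascending part order."""
    return [f"{i}of{TOTAL_PARTS}_knowhow_{source}" for i in PART_RANGES[source]]


def wikihow_by_popularity() -> List[str]:
    """Return wikiHow dataset names, most popular first (9of11 down to 1of11)."""
    return list(reversed(dataset_names("wikihow")))


for name in wikihow_by_popularity() + dataset_names("snapguide"):
    print(name)
```

Only the wikiHow datasets carry a popularity ordering, so the Snapguide names are simply listed in part order.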
Links datasets:
* The Process - Inputs dataset contains detailed information about the inputs of the sets of instructions, including links to DBpedia resources.
* The Process - Outputs dataset contains detailed information about the outputs of the sets of instructions, including links to DBpedia resources.
* The Process - Step Links dataset contains links between different sets of instructions.
Other datasets:
* The wikiHow categories hierarchy dataset contains information on how the various wikiHow categories are hierarchically structured.