What is Opening Data?

Opening data means taking an existing dataset that is already in a digital format (a table, a database or even a PDF) and transforming it into semantic, linked data in order to make it more usable. The usability enhancements come in two forms, one for each type of user:

  • humans - the ability to query linked data and display the results in multiple formats (tables, charts)
  • machines - the ability to use a standard API
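
As a rough illustration of machine access, the sketch below builds a query URL for a SPARQL endpoint. It assumes the "standard API" is the SPARQL protocol, where the query text is sent in a `query` parameter; the page does not say this explicitly. The endpoint address and the helper name `build_sparql_request` are hypothetical, and the code only builds the URL without sending anything.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real address depends on how the triple store is deployed.
ENDPOINT = "http://example.org/sparql"

# A generic query: list up to ten subjects together with their rdfs:label.
QUERY = """
SELECT ?subject ?label
WHERE { ?subject <http://www.w3.org/2000/01/rdf-schema#label> ?label }
LIMIT 10
""".strip()


def build_sparql_request(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_request(ENDPOINT, QUERY)
print(url)
```

Any HTTP client can fetch such a URL, so a machine consumer needs no wiki-specific code.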

What is the process of Opening Data?

  1. Obtaining a dataset and a license to use/publish it
    1. This also depends on the goals of the person who wants to work with it. It's like adopting a kitten: it may be nice and cute, but you need the time and means to take care of it.
  2. Determining whether it is:
    1. an orphaned dataset. The data is imported and semantified only once. Further editing and revisions of the dataset are done via the wiki.
    2. an authored dataset. The data is imported and then synchronized at regular intervals. The wiki is used only to display the dataset, which is read-only there.
      1. "actively curated" might be a better description
  3. Creating wiki classes for the dataset. Each class consists of the set of forms and templates needed to edit and display the dataset. The class documentation is published in the dataset metadata.
  4. Creating and aligning the dataset properties and types.
    1. This is a mix of automation and crowdsourcing. It could be a way to give people concrete tasks ("we need you to help with these exact things"), such as "here's a list of things that could be the same".
  5. Creating wrapper mappings (xlWrap, d2R) for semantifying the dataset, and publishing them in the dataset metadata.
  6. Importing the dataset into the triple store via a wrapper, using the previously defined mappings.
  7. Creating and running bots that create a page for each subject in the dataset.
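
Steps 5 to 7 can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions, not how xlWrap or d2R work: those tools use their own mapping languages, while here the mapping is a plain dictionary from column names to property URIs. The `http://example.org/` namespace, the sample rows, the `City` template name and all function names are invented for illustration.

```python
from collections import defaultdict

BASE = "http://example.org/"  # hypothetical namespace

# Hypothetical mapping: column name -> property URI (the role a wrapper mapping plays).
MAPPING = {
    "name": BASE + "prop/name",
    "population": BASE + "prop/population",
}

# Invented sample rows standing in for the source table.
ROWS = [
    {"id": "city1", "name": "Springfield", "population": "30000"},
    {"id": "city2", "name": "Shelbyville", "population": "25000"},
]


def rows_to_triples(rows, mapping, base=BASE):
    """Turn each non-empty mapped cell into a (subject, property, value) triple."""
    triples = []
    for row in rows:
        subject = base + "resource/" + row["id"]
        for column, prop in mapping.items():
            if row.get(column):
                triples.append((subject, prop, row[column]))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines with plain string literals."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


def pages_from_triples(triples, template="City"):
    """Group triples by subject and build one template call per wiki page."""
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p.rsplit("/", 1)[-1]] = o
    pages = {}
    for s, fields in by_subject.items():
        title = s.rsplit("/", 1)[-1]
        params = "".join(f"|{key}={value}\n" for key, value in fields.items())
        pages[title] = "{{" + template + "\n" + params + "}}"
    return pages


triples = rows_to_triples(ROWS, MAPPING)
print(to_ntriples(triples))   # what would be loaded into the triple store
pages = pages_from_triples(triples)
print(pages["city1"])         # wikitext a bot could save as the page body
```

A real bot would then save each entry of `pages` through the wiki's editing interface. That part is left out because it depends on the wiki setup.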

What is the technology behind the Opening Data process?

  1. Semantic MediaWiki
  2. SparqlExtension to the Semantic MediaWiki
  3. Semantic triple store (Jena, Joseki, TDB)
  4. xlWrap
  5. d2R
  6. Custom wiki bots (Java, Python, Perl)
