Data in Enipedia


Introduction

Enipedia is an attempt to bring together the most complete data set on energy infrastructure and make it available for updating, expansion and use. It can thus be considered a meta-database. Two features set Enipedia apart: all the data originates from public sources, and the data is stored in a Semantic Wiki, which allows it to be queried, visualized and corrected.

Sources

Most of the data on power plants comes from the Carma project, which is no longer being updated. We have begun to integrate its data with other data sets, such as eGRID for the US and E-PRTR for Europe. Some of the data for the Netherlands has been entered by hand. See the To Do section below for some of the other sources that we are aware of. Feel free to add links to sources not included.

Additional Data

The Carma data set does not contain information about fuel types, so we have tried to fill this in with simple heuristics. For example, if a power plant has "Sewage" in its name, we assume that the fuel type is Biogas, and if its name contains "Hydroelectric", we assume that the fuel type is Hydro. Assumed fuel types such as Hydro and Wind can be checked further by verifying that the reported CO2 emissions are zero. In addition to these techniques, fuel types for the US power plants are sourced mostly from eGRID data.
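The name-based heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the code Enipedia actually runs: the function names `guess_fuel_type` and `is_plausible` are hypothetical, and only the two keyword rules mentioned in the text are included.

```python
from typing import Optional

# Keyword rules taken from the examples in the text; a real rule set
# would presumably be longer.
NAME_RULES = [
    ("sewage", "Biogas"),
    ("hydroelectric", "Hydro"),
]

# Fuel types for which the text suggests checking that CO2 emissions are zero.
ZERO_EMISSION_FUELS = {"Hydro", "Wind"}


def guess_fuel_type(plant_name: str) -> Optional[str]:
    """Return an assumed fuel type based on keywords in the plant name."""
    lowered = plant_name.lower()
    for keyword, fuel in NAME_RULES:
        if keyword in lowered:
            return fuel
    return None  # no rule matched; leave the fuel type unspecified


def is_plausible(fuel: str, co2_emissions: float) -> bool:
    """Sanity check: Hydro and Wind plants should report zero CO2 emissions."""
    if fuel in ZERO_EMISSION_FUELS:
        return co2_emissions == 0
    return True  # no check defined for other fuel types
```

A guess that fails the plausibility check (for example a "Hydroelectric" plant with non-zero emissions) would be a candidate for manual review rather than an automatic correction.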

Wikipedia is also a very rich source of information, and we have worked on deducing fuel types by using pages such as List of power stations in India and List of hydroelectric power in Norway (in Norwegian). This typically involves automated matching of the power plant names, followed by a manual check to locate any false positives or names that should have been matched but were not. Eventually, we would like to create a more automated system that could check for updates and conflicts in the data. A first quality check could compare, for entries that have already been matched, the data from Enipedia with the data from DBpedia, so that someone can review them for inconsistencies. A recent example is Darling Wind Farm Powerplant: a match had been established, but its data (start year, position) was still wrong on Enipedia while accurate on Wikipedia.
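As a rough illustration of the automated name matching step, the sketch below uses fuzzy string matching from Python's standard `difflib` module. The actual matching method used for Enipedia is not described here, so the normalization, the `match_names` helper and the `cutoff` value are all assumptions.

```python
import difflib


def normalize(name):
    """Lower-case a name and collapse hyphens and repeated whitespace."""
    return " ".join(name.lower().replace("-", " ").split())


def match_names(enipedia_names, wikipedia_names, cutoff=0.85):
    """Pair each Enipedia name with its closest Wikipedia name, if any.

    Returns (matched, unmatched). Both still need a manual check:
    matched may contain false positives, and unmatched may contain
    names that should have been matched.
    """
    lookup = {normalize(n): n for n in wikipedia_names}
    matched, unmatched = {}, []
    for name in enipedia_names:
        close = difflib.get_close_matches(
            normalize(name), list(lookup), n=1, cutoff=cutoff
        )
        if close:
            matched[name] = lookup[close[0]]
        else:
            unmatched.append(name)
    return matched, unmatched
```

A high cutoff keeps false positives down at the cost of more unmatched names, which fits a workflow where a person reviews the leftovers by hand.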

A first iteration of this consistency check is at Comparing Enipedia and Wikipedia Power Plant Coordinates. It would probably make more sense to implement it with a JavaScript function that calls sfautoedit, so that users can click a button to update the data with the value from Wikipedia.
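One way such a coordinate comparison could work is to compute the great-circle distance between the two recorded positions and flag pairs that are too far apart. The sketch below uses the haversine formula; the 1 km tolerance and the function names are assumptions for illustration, not a description of the existing comparison page.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def coordinates_conflict(enipedia, wikipedia, tolerance_km=1.0):
    """Return True if two (lat, lon) pairs differ by more than the tolerance."""
    return haversine_km(*enipedia, *wikipedia) > tolerance_km
```

Entries flagged this way would then be shown to a reviewer, who decides which value to keep.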

Data Set Current Statistics

The numbers below are automatically generated to reflect the current state of the data.

  • 74,673 total power plants are on Enipedia.
  • 19,001 of the 74,673 power plants have a fuel type specified for them and 9,528 have a capacity specified.
  • 3,102 of the 13,814 power plants in the US listed in the Carma data set have been matched with their corresponding entries in the eGRID data set.
  • 3,403 power plants have been linked to their corresponding Wikipedia entry. Via projects such as DBpedia, this enables us to automate some aspects of checking for information that has been updated about particular power plants.
  • 7,117 power plants have additional references specified for them. These references are usually a link to the owner's website, or to a list page on Wikipedia, such as List of power stations in India, which may mention the power plant.
  • 1,894 power plants have links to corresponding entries on Wikimapia.

Access to Raw Data

Behind the scenes, Enipedia uses Semantic Web standards such as RDF and SPARQL to help manage, query, and visualize data. This is achieved through the use of Semantic MediaWiki in combination with software we have developed (the SparqlExtension), which allows us to easily navigate information spread across the wiki.

The majority of the data you see on Enipedia can be queried directly from our SPARQL Endpoint and downloaded in a variety of formats. See Using SPARQL with Enipedia for our documentation on this. We realize this is a bit technical, so feel free to contact us for assistance. We are eager to improve our documentation and make the site more usable for visitors.
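For readers unfamiliar with SPARQL endpoints, the sketch below shows how a query request could be prepared following the standard SPARQL protocol, where the query is passed in a `query` URL parameter and the result format is requested through the `Accept` header. The endpoint URL is a placeholder, not Enipedia's real address, and the query is deliberately generic because the property names used on Enipedia are not listed here.

```python
import urllib.parse
import urllib.request

# Placeholder only; substitute the real endpoint address from the
# Using SPARQL with Enipedia documentation.
ENDPOINT = "http://example.org/sparql"

# Generic query that lists ten arbitrary triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint, query,
                         accept="application/sparql-results+json"):
    """Build (but do not send) an HTTP GET request for a SPARQL query."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": accept})


request = build_sparql_request(ENDPOINT, QUERY)
# To run the query against a real endpoint:
#     with urllib.request.urlopen(request) as response:
#         body = response.read()
```

Changing the `accept` argument (for example to `text/csv`) is the usual way to ask for another result format, provided the endpoint supports it.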

There are several data sets we use that we expect to be updated, and we keep these on a separate SPARQL endpoint. For more information on these, see:
