This is a list of energy and industry data sets available on the World Wide Web. Feel free to edit this list. We are always looking for new sources of information that can help.

Power plants

Feel free to edit this page and add sources that may be useful

Global resources

The Verified Carbon Standard project database has many of the same entries and contains a map.

Power plants by region

Europe

Country Types of Data Links Other Remarks
Austria Energy Market Data
Belgium Generation units listed by TSO
Belgium Electrabel - subsidiary of GDF-Suez
Belgium EDF Luminus - subsidiary of EDF
Belgium various Belgium sources compiled for an energy hackathon sponsored by Essent Belgium.
Bulgaria Renewable energy in Bulgaria
Czech Republic Numerous renewable and fossil fuel plants Map of plants owned by the CEZ Group
Denmark All Danish wind turbines Master data register for wind turbines at end of December 2012 Data visualization by Alfredas
Denmark live view of power grid
France Units connected to RTE's network
France EDF power plants (continental)
France EDF power plants (overseas)
France CNR overview
Germany charts of energy production and spot market prices
Germany Electricity Sector Data for Policy-Relevant Modeling (Report) - extensive list of sources for Germany and also Europe
Germany All medium and large scale German power plants Bundesnetzagentur Kraftwerksliste
Germany Energy Market Data
Germany E.On
Germany RWE Alle Kraftwerksblöcke > 10 MW
Germany Planned power plants
Germany Very detailed German PV data, capacity per postcode Used to create visualization of installation of 860,000 PV systems across Germany. The source code used to generate the visualizations is available. This downloads the original Excel spreadsheets and processes them into a format more useful for large-scale analysis.
Germany Huge amount of data on German power plants (bulk download)
Germany extensive data from the distribution grid operator in Berlin.
Italy Enel power plant data
Italy Edison power plant data
Netherlands PV Data Contains the PV-regeling which documents installed PV systems per 6 digit postcode in 2012. The main page also has neighborhood level demographic information which can be useful for finding which factors are correlated with areas having PV installations.
Netherlands Time series data on production by fuel mix
Netherlands Installed capacity This initiative of EnergieNed also has a list of power plants sourced from TenneT (the Dutch TSO).
Netherlands Bio-installaties (by RVO) Bio-energy in the Netherlands.
Netherlands Diverse renewable energy sources in the Netherlands
Netherlands Bio-electricity installations.
Netherlands Wind farms
Netherlands Map of Essent facilities
Netherlands Table of Essent facilities
Netherlands Energy use per 6 digit postcode (city block level) More data sets being opened up over time
Norway Data on power plants of different fuel types Norwegian Water Resources and Energy Directorate
Portugal Endesa
Portugal Interactive database of renewable energy projects
Portugal PDF report with tabular data on wind farms
Russia Russian power plant wiki
Russia Map of 800 small gas turbine units across Russia
Russia Most of the Russian power stations along with their owner company and capacity, as of 2006
Russia Comprehensive set of links for Russian energy
Russia Details on many of the power plants and heating stations around Moscow.
Russia links to a spreadsheet detailing gas consumption and emissions from several plants.
Scandinavia Electricity market data for Norway, Sweden, Finland, Denmark, Estonia
Scandinavia Fortum
Slovakia Slovenské Elektrárne
Spain gasNatural Fenosa
Spain Iberdrola
Spain Endesa
Ukraine Good overview with listings & map Ukrainian Wikipedia
Ukraine Map of the Crimean grid
United Kingdom Map of anaerobic digestion facilities in the UK
United Kingdom Scottish & Southern Energy see screenscraper here
United Kingdom E.On
United Kingdom Spreadsheet on UK power plants Department of Energy and Climate Change
United Kingdom Renewables in the UK, Database:
United Kingdom good links for UK renewables
United Kingdom PV Installations animation of growth of installed capacity

North America

The Commission for Environmental Cooperation has a map showing 3000 plants in Canada, the US, and Mexico. Sources of the data are mentioned as well.

Country Types of Data Links Other Remarks
Canada Emissions
Canada Facilities Map
Canada Wind farms
United States Detailed yearly data - power plant locations, fuel types, emissions, efficiencies We convert this dataset to RDF to make it easy to query. See eGRID Linked Data for more details.
United States EIA API See rows below for examples
United States EIA Form EIA-923 contains extensive detailed data for the past decade. Further historical data is available from 1970 onwards.
United States Extensive maps for energy infrastructure in the US
United States INL Hydropower Resource Economics Database Data is ten years old, but has extensive details on over 2000 dams
United States Interactive power grid maps. Also contains power plant data.
United States New England Power Pool Generation Information System (NEPOOL GIS)
United States Hourly generation and emissions data for power plants, 1995-present. U.S. EPA Air Markets Program Data (FTP download site)

South America

Country Types of Data Links Other Remarks
Argentina List of generators with websites & addresses
Brazil pdf map with names, locations & capacity of all Eletrobras hydro Plants, including planned ones. Follow the link to "Mapa SIPOT fev_2012"
Brazil Generation Map (kml map)
See for screen scraper
Peru Plant capacities and generation
Peru Historical power plant installed capacity & generation data, 1998-2010

Asia

China
India
Indonesia
Japan
Pakistan
Malaysia

Africa

Egypt

Australia

Power plants by operator

Transnational power groups:

Also developers:

Power plant schemas / ontologies

The Common Information Model is used as a standard for describing aspects of the electric power industry. It's not currently in use on Enipedia, although it may be of value for future data alignment work. There's at least one project (in progress in the Netherlands) that aims to bring together energy data as Linked Data using the OWL version of the CIM.

To do

  • Link to companies on See here for an example entry.
  • Also look at companies such as ABB that build power systems. See here for info on hydro plants they've worked on (15 pages).
  • Look into the World Bank database on infrastructure projects.
  • The EIA provides information on the total production by country. This should be included as an indicator of completeness.
  • Integrate data from the Large Combustion Plant Directive and the EU-ETS.
  • Connect to the Toxics Release Inventory.
  • European plants need to be linked to their E-PRTR entries
  • The Renewable Energy Foundation (REF) has a database about renewable energy generators in the United Kingdom, and has the data available via a SPARQL endpoint. We should align our pages with the identifiers they use, link to their pages, and use this to update our own data.
  • Work on eGRID vs. CARMA data for the US. We've tried to match entries from the CARMA dataset with their corresponding entries in the eGRID dataset using a matching technique that compares plant names, owner names, coordinates, emissions, and power outputs. This has led to 3,102 matches for the 13,815 US power plant entries in CARMA, although there are still several thousand power plants from the CARMA dataset without a corresponding eGRID entry, and vice versa. Where there are matches, data from eGRID should be given precedence, since some of the CARMA data is calculated using an estimation technique (see Calculating CARMA: Global Estimation of CO2 Emissions from the Power Sector).
  • Efforts so far have been focused on power plants, and not their owners. These should be fixed up as well and linked to appropriate sources. and DBpedia would allow us to link to unique identifiers.
    • See RWE Group for an initial attempt at organizing the relationship between an owner company and its subsidiaries.
  • We need to refactor the data to more clearly make the distinction between power plants (i.e. site/location) and the multiple power generating units that exist on that site. Currently these two concepts are mixed together. Searching for terms like "Ii" and "-2" will show this in the data. This is needed since we deal with a mix of data at both the site and power generation unit level.
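The site versus unit distinction in the last item could be approached by first separating a trailing unit designator from the plant name. The following is a minimal sketch, not the approach used on Enipedia: the pattern, the function name, and the example names are all hypothetical, and it only handles trailing numbers such as "-2" and Roman numerals built from I, V and X such as "Ii".

```python
import re

# Hypothetical pattern: a separator (space or hyphen) followed by a trailing
# unit designator, either digits ("-2") or a Roman numeral ("Ii", "IV").
# Real data will contain more variants than this covers.
UNIT_SUFFIX = re.compile(
    r"^(?P<site>.*?)[\s\-]+(?P<unit>\d+|[IVX]+)$", re.IGNORECASE
)


def split_site_and_unit(name):
    """Return (site, unit); unit is None when no designator is found."""
    cleaned = name.strip()
    match = UNIT_SUFFIX.match(cleaned)
    if match is None:
        return cleaned, None
    # Upper-case the designator so "Ii" and "II" group together.
    return match.group("site").strip(), match.group("unit").upper()


if __name__ == "__main__":
    # Made-up names, for illustration only.
    for raw in ["Example Station-2", "Example Station Ii", "Example Station"]:
        print(raw, "->", split_site_and_unit(raw))
```

Names that legitimately end in a short word made of the letters I, V or X would be split incorrectly, so the output is better treated as a list of candidates for manual review than as a final answer.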

Bringing this all together

  • See Elasticsearch on Enipedia for some of our latest efforts.
  • Enipedia Data Quality Checks lists various checks that should eventually be automated to improve data quality.
  • The OKFN Open Data Index and OKFN City Census show an interesting interface that could be used to visualize the completeness of data for different countries.
    • For electricity, some of the main categories of data we could be interested in are generation capacity, fuel types, emissions, electricity generation, location, & power flows in the transmission/distribution system.
    • It would be interesting to be able to complement this with "data recipes" - in other words, types of analysis that can be done given the availability of certain types of information. For example, to look at increased emissions resulting from more electric vehicles, it's useful to know the merit order, which can be constructed using information on plant capacities, fuel costs, etc. By mapping out data dependencies, this can show which types of analysis may be difficult to do in certain countries given the datasets that we're aware of.
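As an illustration of such a "data recipe", a merit order can be sketched by sorting plants on marginal cost and dispatching them until demand is met. This is a simplified sketch under stated assumptions: the Plant fields, the function names, and all the numbers are hypothetical, marginal cost is assumed to already reflect fuel costs, and network constraints, ramping and imports are ignored.

```python
from dataclasses import dataclass


@dataclass
class Plant:
    name: str
    capacity_mw: float
    marginal_cost: float    # currency per MWh, assumed to include fuel cost
    emission_factor: float  # tCO2 per MWh


def dispatch(plants, demand_mw):
    """Dispatch the cheapest plants first; return [(name, output_mw), ...].

    If demand exceeds total capacity, every plant simply runs at full output.
    """
    remaining = demand_mw
    schedule = []
    for plant in sorted(plants, key=lambda p: p.marginal_cost):
        if remaining <= 0:
            break
        output = min(plant.capacity_mw, remaining)
        schedule.append((plant.name, output))
        remaining -= output
    return schedule


def hourly_emissions(plants, demand_mw):
    """Total tCO2 emitted in one hour at the given demand level."""
    factors = {p.name: p.emission_factor for p in plants}
    return sum(out * factors[name] for name, out in dispatch(plants, demand_mw))


if __name__ == "__main__":
    # Made-up fleet, for illustration only.
    fleet = [
        Plant("wind", 100, 0, 0.0),
        Plant("coal", 200, 30, 0.9),
        Plant("gas", 150, 50, 0.4),
    ]
    base, extra_ev_load = 250, 100
    added = hourly_emissions(fleet, base + extra_ev_load) - hourly_emissions(fleet, base)
    print("Additional tCO2 per hour from the extra load:", added)
```

The sketch also makes the data dependencies explicit: without capacities, costs, and emission factors per plant, this type of analysis cannot be run for a given country.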

This page lists many of the useful data sets that are out there, and we'd like to bring together information from these into a single data set that gives a comprehensive view of the power industry.

One of the main issues we face is figuring out which entries in a data set link to which article on Enipedia. It's possible to automate this using the reconciliation feature of Google Refine, but this doesn't have a high rate of success with matching. This could be addressed by creating a Reconciliation Service API. The key thing is that we need to match on more than name alone (e.g. there are often multiple power plants in large cities such as Berlin). We need to be able to match on a mix of the country, company name, capacity, etc. in order to avoid false positives. In other words, we need a sort of "data fingerprint" of a power plant: enough information that there is a high probability that no other power plant has the same combination of characteristics.
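The "data fingerprint" idea can be sketched as a score that combines several attributes rather than the name alone. The following is only an illustration using Python's standard difflib module: the record fields, the weights, the capacity tolerance, and the acceptance threshold are all assumptions, not values used by Enipedia or Google Refine.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fingerprint_score(record, candidate, capacity_tolerance=0.1):
    """Score in [0, 1]; the weights below are illustrative assumptions."""
    # A different country rules the candidate out entirely.
    if record["country"] != candidate["country"]:
        return 0.0
    score = 0.5 * similarity(record["name"], candidate["name"])
    score += 0.2 * similarity(record["owner"], candidate["owner"])
    # Capacities count as matching when they differ by at most the tolerance.
    larger = max(record["capacity_mw"], candidate["capacity_mw"])
    if larger > 0:
        gap = abs(record["capacity_mw"] - candidate["capacity_mw"]) / larger
        if gap <= capacity_tolerance:
            score += 0.3
    return score


def best_match(record, candidates, threshold=0.75):
    """Return the highest-scoring candidate, or None if none is convincing."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: fingerprint_score(record, c))
    return best if fingerprint_score(record, best) >= threshold else None
```

With these weights, a candidate whose name and owner match perfectly but whose capacity is far off scores at most 0.7 and is rejected, which is the behaviour wanted for cities with several similarly named plants.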

This Scraperwiki view created by Nono shows a great way to compare and align different data sets. The code behind the views that you can find here has some features that are quite powerful and have interesting implications:

  • It sources map layers from multiple mapping services (Google Maps and ItoWorld)
  • The "Edit in Enipedia" link uses a feature of Semantic Forms that allows for values to be set for parameters in a template of a new page.
    • We can improve this further by upgrading the Enipedia software to the latest version of Semantic Forms, as that would allow us to take advantage of the sfautoedit feature. This allows scripts to edit values of template parameters on existing pages without people even having to use the wiki. We can then still use the wiki for revision control of user edits.
      • Using the sfautoedit feature with the marker option draggable set to true allows us to create a user-friendly interface for fixing coordinates.
  • Queries are run against remote SPARQL endpoints and then used to dynamically create content via the Google Maps API.
    • We need to expose the eGRID, EU-ETS and E-PRTR datasets via these. The original data publishers don't have anything close to this type of visual interface. We should set up an endpoint using 4store or Virtuoso to better handle a large number of queries. Aligning data sets is a non-trivial long-term issue, and we should facilitate ways of allowing people to view the "original" data in addition to the combined data.
  • The SPARQL queries are run via AJAX calls, which means that the code can be adapted to create calls to our Reconciliation API. This would allow someone to click on an icon from an "outside" data set and then highlight the icons representing possible matches in Enipedia. This can really go a long way towards providing a better interface for aligning data sets. By using sfautoedit we can also automatically add unique identifiers to help keep track of the connections between these data sets.
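The query-then-map pattern described in this list can be sketched without the browser side. The views themselves use AJAX; the example below uses Python only to show the shape of the request and the response. The endpoint URL, the ex: vocabulary, and the function names are hypothetical. The "query" URL parameter and the results/bindings JSON layout follow the SPARQL 1.1 Protocol and the SPARQL 1.1 Query Results JSON Format.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and vocabulary, for illustration only.
ENDPOINT = "http://example.org/sparql"

QUERY_TEMPLATE = """
PREFIX ex: <http://example.org/schema#>
SELECT ?plant ?lat ?lon WHERE {{
  ?plant ex:country "{country}" ;
         ex:latitude ?lat ;
         ex:longitude ?lon .
}} LIMIT {limit}
"""


def build_query_url(country, limit=100, endpoint=ENDPOINT):
    """Build a GET URL that carries the query in the 'query' parameter."""
    # Escape backslashes and quotes so the value stays inside the literal.
    safe = country.replace("\\", "\\\\").replace('"', '\\"')
    query = QUERY_TEMPLATE.format(country=safe, limit=int(limit))
    return endpoint + "?" + urlencode({"query": query})


def parse_markers(response_text):
    """Turn a SPARQL JSON result document into map marker dictionaries."""
    data = json.loads(response_text)
    return [
        {
            "plant": row["plant"]["value"],
            "lat": float(row["lat"]["value"]),
            "lon": float(row["lon"]["value"]),
        }
        for row in data["results"]["bindings"]
    ]
```

A client would request the URL with an Accept header of application/sparql-results+json and pass the response body to parse_markers. The same two steps could be pointed at a reconciliation service instead of a SPARQL endpoint.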

The long-term view is that automation needs to be employed to keep track of data from different sources and the efforts to integrate it. For inspiration, there are a few interesting projects such as ITO Map's comparison of the Vector Map District with OpenStreetMap, OpenStreetBugs, and MapDust.

General Energy

  • The openmod initiative aims to bring together links to multiple open source energy models and data sources.
  • Look at converting the BP Statistical Review into RDF, which would allow querying trends across different years. See Talk:LNGTrade for some interesting ideas about trends to show.
  • The Global Energy Assessment by the IIASA is out, covering data on current and future energy systems.
    The Global Energy Assessment (GEA), launched in 2012, defines a new global energy policy agenda – one that transforms the way society thinks about, uses, and delivers energy. Involving specialists from a range of disciplines, industry groups, and policy areas, GEA research aims to facilitate equitable and sustainable energy services for all, in particular the two billion people who currently lack access to clean, modern energy.
    There is also a Public GEA Scenario Database with all data and scenarios from the report. This could be used for future work on scenarios, or current work on the Energy Mix and market.
  • Harvard's ChinaMap - Numerous energy maps from different sources.
  • "ODYSSEE on the one hand contains detailed data on the energy consumption drivers by end-use and, on the other hand, energy efficiency and CO2 related indicators"

Natural Gas

Power grids

See Electricity Transmission Network for a list of sources.

There is a large amount of data in OpenStreetMap. Some raw data can be seen here, but a more visual way to inspect it is here. This data must, however, be organized. This can be done by looking at overview maps of operators' grids, such as:

We've started initial efforts at Portal:OpenGridData where we're using Enipedia as a repository of data (currently for the Netherlands) that we use to run load flow calculations. Please contact us if you're interested in helping us to spread this to other countries as well.

