eGRID Linked Data

From Enipedia


Overview

eGRID is a dataset published by the U.S. Environmental Protection Agency (EPA) that contains extensive data on power plants in the US. The data is published as Excel files, with a separate file for each reporting year. Because each year sits in its own file, it is hard to spot long-term trends and difficult to run sophisticated queries across the data.

We have converted this data to RDF, which means that anyone who knows how to write SPARQL queries can easily extract different slices of it. For examples, see eGRID Example Queries and Navajo Powerplant.
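
As a rough illustration of what converting a spreadsheet row to RDF involves, the Python sketch below turns one record into N-Triples, one triple per non-empty cell, with the record's ID as the subject. It is not our actual conversion (that is done in OpenRefine, as described under Code); the base URI `http://example.org/egrid/`, the column names, and the helper functions are hypothetical placeholders, not the vocabulary used in the eGRID linked data.

```python
# Minimal sketch: convert one tabular record into RDF N-Triples.
# BASE, the column names, and the helper names are hypothetical
# placeholders, not the vocabulary used in the eGRID linked data.
BASE = "http://example.org/egrid/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def iri(local: str) -> str:
    """Wrap a local name in the (hypothetical) base namespace."""
    return f"<{BASE}{local}>"


def literal(value) -> str:
    """Render a cell value as an N-Triples literal with an XSD datatype."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    # Plain string literal: escape backslashes first, then quotes and newlines.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def record_to_ntriples(record: dict, id_column: str = "plant_id") -> list:
    """Emit one triple per non-empty cell, using the ID column as the subject."""
    subject = iri(f"plant/{record[id_column]}")
    triples = []
    for column, value in record.items():
        if column == id_column or value is None or value == "":
            continue  # the ID is already the subject; empty cells carry no data
        triples.append(f"{subject} {iri(column)} {literal(value)} .")
    return triples


# Usage with made-up values:
row = {"plant_id": 12345, "state": "TX", "nameplate_mw": 500.0, "year": 2009}
print("\n".join(record_to_ntriples(row)))
```

Once every row is expressed as triples like these, a SPARQL query can match on any property across all years at once, which is what makes the slicing described above possible.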

In our experience working with power plant data, eGRID and the related Form EIA-92 data are among the richest and most comprehensive datasets available on power generation. Compared with data for other countries, they are well structured, which greatly simplified our work.

Example

Changes in capacity factor over time for generators

This example looks at the capacity factors (actual generation divided by the maximum possible generation at nameplate capacity) of generators in Texas, broken down by fuel type, capacity, and the year they were brought online. Over time, the older plants split into two groups: those that run most of the time and those that are largely inactive. This split is largely based on fuel type. A natural gas plant can ramp up production very quickly, so it can produce power when electricity prices are high due to large demand. Coal and nuclear plants cannot ramp up production as quickly, so they tend to run most of the time and supply base load electricity. A further trend is the installation of new wind facilities, whose capacity factor is limited by when the wind blows.
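
For reference, the annual capacity factor is CF = E / (P × H), where E is net generation in MWh, P is nameplate capacity in MW, and H is the number of hours in the year (8,760, or 8,784 in a leap year). The Python sketch below computes it and averages it per fuel type. The dictionary keys (`fuel`, `nameplate_mw`, `net_mwh`, `year`) are hypothetical and do not correspond to eGRID column names, and the sample generators are made up.

```python
import calendar
from collections import defaultdict
from statistics import mean


def hours_in_year(year: int) -> int:
    """8,760 hours in a normal year, 8,784 in a leap year."""
    return 8784 if calendar.isleap(year) else 8760


def capacity_factor(net_generation_mwh: float, nameplate_mw: float, year: int) -> float:
    """Annual net generation divided by the maximum possible at nameplate capacity."""
    if nameplate_mw <= 0:
        raise ValueError("nameplate capacity must be positive")
    return net_generation_mwh / (nameplate_mw * hours_in_year(year))


def mean_capacity_factor_by_fuel(generators: list) -> dict:
    """Average capacity factor per fuel type (dict keys are hypothetical)."""
    by_fuel = defaultdict(list)
    for g in generators:
        by_fuel[g["fuel"]].append(
            capacity_factor(g["net_mwh"], g["nameplate_mw"], g["year"])
        )
    return {fuel: mean(values) for fuel, values in by_fuel.items()}


# Usage with made-up generators: a baseload unit and a peaking gas unit.
gens = [
    {"fuel": "coal", "nameplate_mw": 100.0, "net_mwh": 788_400.0, "year": 2009},
    {"fuel": "gas", "nameplate_mw": 100.0, "net_mwh": 87_600.0, "year": 2009},
]
print(mean_capacity_factor_by_fuel(gens))
```

A baseload unit in this sketch lands near the top of the 0 to 1 range while a unit that runs only during price peaks lands near the bottom, which is the kind of split described above.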

This visualization is generated by a single query. With the original eGRID data, you would have to search through about 10 Excel spreadsheets (one per year) and join the data yourself. To examine a different state with the query below, replace the part of the query that specifies "TX" with that state's two-letter abbreviation.
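
One way to make the state swap less error-prone is to build the query from a template. The Python sketch below does this; the `ex:` prefix and every property name in it (`ex:state`, `ex:reportingYear`, `ex:fuelType`, `ex:nameplateCapacity`, `ex:yearOnline`, `ex:capacityFactor`) are hypothetical placeholders, not the predicates used in the eGRID linked data, so the query would need to be adapted to the real vocabulary before it returns anything.

```python
import re
from string import Template

# Hypothetical vocabulary: the ex: prefix and property names are placeholders,
# not the predicates actually used in the eGRID linked data.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/egrid/>
SELECT ?generator ?year ?fuel ?capacity ?yearOnline ?capacityFactor WHERE {
  ?generator ex:state "$state" ;
             ex:reportingYear ?year ;
             ex:fuelType ?fuel ;
             ex:nameplateCapacity ?capacity ;
             ex:yearOnline ?yearOnline ;
             ex:capacityFactor ?capacityFactor .
}
ORDER BY ?year""")


def capacity_factor_query(state: str = "TX") -> str:
    """Fill in the state; reject anything that is not a two-letter code."""
    code = state.strip().upper()
    if not re.fullmatch(r"[A-Z]{2}", code):
        raise ValueError(f"expected a two-letter state abbreviation, got {state!r}")
    return QUERY_TEMPLATE.substitute(state=code)


print(capacity_factor_query("CA"))
```

`string.Template` is used instead of `str.format` so that SPARQL's curly braces need no escaping, and the two-letter check keeps arbitrary text from being spliced into the query.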

Code

The code that we used to convert the eGRID data to RDF is in a GitHub repository. Even if you are not interested in the RDF version of the data, the repository contains R code that downloads the spreadsheets from the US EPA website and merges all of the plant, generator, and boiler worksheets into three large CSV files, one per worksheet type. We then run these CSV files through OpenRefine and remap them to RDF using its RDF extension. The JSON files in the repository record all of the modifications and re-mappings that we apply to the data; these are updated occasionally as we find issues in the data.
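
The repository does the merge step in R. Purely as a language-swapped sketch of that step, the Python below stacks per-year tables for one worksheet type into a single CSV with a leading `year` column. The in-memory input format (a dict mapping year to CSV text), the function name, and the column names are assumptions for illustration; it does not read Excel files and is not the repository's code.

```python
import csv
import io


def merge_yearly_tables(tables: dict) -> str:
    """Stack per-year CSV text into one CSV with a leading "year" column.

    Columns can differ between reporting years, so the header is the union of
    all columns in first-seen order; cells missing in a given year stay empty.
    """
    columns, rows = [], []
    for year in sorted(tables):
        reader = csv.DictReader(io.StringIO(tables[year]))
        for name in reader.fieldnames or []:
            if name != "year" and name not in columns:
                columns.append(name)
        for row in reader:
            rows.append({**row, "year": year})  # tag each row with its source year
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["year", *columns], restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Usage with made-up data: a column present in one year but not the other.
merged = merge_yearly_tables({
    2005: "plant_id,state\n1,TX\n",
    2004: "plant_id,state,extra_col\n2,CA,0\n",
})
print(merged)
```

Taking the union of columns, rather than only the columns shared by every year, keeps fields that were added or dropped in some reporting years, at the cost of empty cells for the years that lack them.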
