Using SPARQL with Enipedia
If you want to extract slices and subsets of the data on Enipedia, you can write your own queries to do so. This page gives the background you need. Feel free to post questions on the discussion page and to share your finished queries, for example by creating a new page linked to this page. Please list it under Using_SPARQL_with_Enipedia#User contributed examples.
This page is also available via this shortened URL: http://is.gd/sparql
What/Why?
Much of the information on Enipedia is structured so that it can be queried using SPARQL, a query language for the Semantic Web. In practical terms, this means that information spread over multiple pages can be easily gathered and analyzed once you know how to write these queries. These queries can be run through a SPARQL Endpoint which connects to a database behind the scenes.
Resources
- One of the best tutorials is the SPARQL by example presentation by Cambridge Semantics. It is quite comprehensive, starting with simple examples and slowly moving on to more complex ones.
- The most authoritative reference is the SPARQL 1.1 documentation. This is not exactly beginner-friendly, but since it defines the standard, it is the most complete resource on the topic. Our SPARQL endpoints on Enipedia support SPARQL 1.1. Other SPARQL endpoints on the web may only support SPARQL 1.0, which does not have features such as aggregation, subqueries, and federated queries.
- SPARQL Queries For Statistics - a list of SPARQL queries that allow you to quickly get a sense of what kind of data is available. These can tell you important things, such as which properties are in use and how often they are used.
- The Enipedia Blog covers several more advanced examples of how SPARQL can be used with the data on Enipedia to perform various types of analysis about energy topics.
- Data Mining OpenEI.org, The US Department of Energy's Semantic Wiki gives a step-by-step example of how this can be done with other sources such as http://OpenEI.org.
- SparqlExtension - This is what we use on Enipedia to embed SPARQL queries on pages, and have their results shown in tables or various visualizations.
Data Sets
Open data from several sources are hosted on Enipedia as Linked Data.
We've made available the scripts that we use for converting these data sets from their original formats to RDF. For IAEA, eGRID, and E-PRTR, we use the rdf-extension for Google Refine. The .json files included in the links can be pasted into Google Refine to apply all the changes that we have made to the original data. See here for instructions. Because the conversion scripts are available, people are also free to set up their own custom conversion if that is more useful for them.
See also the Energy and Industry Data Sets page for all Data Sets in use and considered.
Enipedia SPARQL examples
User contributed examples
- example page goes here
Find all facts about one specific power plant
For every page on the wiki, there are several ways in which you can find out what facts are associated with it. The key idea is that everything is structured as a network of URLs that you use queries to navigate. For example, the basic structure of facts about a single power plant is like this:
To see this for yourself, go to our SPARQL Endpoint and enter the text below (see results here). This can be done for any page to find the facts attached to it.
select * where {
<http://enipedia.tudelft.nl/wiki/Navajo_Powerplant> ?y ?z .
}
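Since the facts form a network, it can also be useful to look at the links that point to a page rather than away from it. The following is a minimal sketch, not one of the original examples, and it assumes that other resources refer to the plant by the same URL used above:

```sparql
# Find facts in which the Navajo power plant page is the object (incoming links)
select * where {
  ?x ?y <http://enipedia.tudelft.nl/wiki/Navajo_Powerplant> .
}
```

If no other page links to the plant, this query simply returns no rows.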
Another way to find out which information is available is to look at the factboxes that are included on many pages, an example of which is shown in the image on the right.
Find power plants by country
This query outputs the name, location and generation capacity of power plants in the Netherlands (see results here):
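PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX a: <http://enipedia.tudelft.nl/wiki/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?Name ?Point ?Generation_capacity where {
?powerPlant prop:Country a:Netherlands .
?powerPlant rdfs:label ?Name .
?powerPlant prop:Point ?Point .
?powerPlant prop:Generation_capacity_electrical_MW ?Generation_capacity .
}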
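The query above only returns plants for which every listed property is present. As a hedged variant, the sketch below wraps the coordinates and capacity in OPTIONAL so that plants missing those values are still listed. It assumes the same prefixes that are declared in the Advanced examples on this page:

```sparql
PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX a: <http://enipedia.tudelft.nl/wiki/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?Name ?Point ?Generation_capacity where {
  ?powerPlant prop:Country a:Netherlands .
  ?powerPlant rdfs:label ?Name .
  # Keep plants even when coordinates or capacity are not recorded
  OPTIONAL { ?powerPlant prop:Point ?Point . }
  OPTIONAL { ?powerPlant prop:Generation_capacity_electrical_MW ?Generation_capacity . }
} order by DESC(?Generation_capacity)
```

Replacing a:Netherlands with another country page should give the same listing for that country, provided the country is named the same way on the wiki.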
Exporting Data in Different File Formats
On our SPARQL Endpoint, query results can be retrieved in a variety of file formats, including XML, JSON, plain text, CSV, and TSV.
How to explore SPARQL Endpoints
SPARQL Endpoints usually look like this or this, which can be intimidating and uninformative for first-time users of SPARQL. The good thing about SPARQL is that it can be used to quickly explore a new dataset and get an idea of what it contains. A few examples are provided below, and an excellent, extensive source is the list of SPARQL Queries For Statistics.
Find all distinct properties
select distinct ?p where {
?s ?p ?o .
}
Find number of values associated with each property
This query will take a long time to run since it looks at the whole dataset.
SELECT ?p (count(?o) as ?objectCount) WHERE {
?s ?p ?o .
} group by ?p
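When scanning the whole dataset is too slow, the count can be restricted to one kind of thing. The sketch below is an illustration only. It assumes that power plant pages are typed with cat:Powerplant, as in the download query at the end of this page, and counts property usage for those pages alone:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cat: <http://enipedia.tudelft.nl/wiki/Category:>
# Count how often each property is used, but only on pages typed as power plants
select ?p (count(?o) as ?objectCount) where {
  ?s rdf:type cat:Powerplant .
  ?s ?p ?o .
} group by ?p order by DESC(?objectCount)
```

Sorting by the count puts the most widely used properties first, which is usually where exploration starts.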
Find the types of things described in the dataset
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT distinct ?o WHERE {
?s rdf:type ?o .
}
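A list of types says little about how much data sits behind each one. As a sketch that relies on SPARQL 1.1 aggregation (so it may not work on endpoints limited to SPARQL 1.0), the following counts the instances of each type:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Count how many subjects are declared to be of each type
select ?type (count(?s) as ?instances) where {
  ?s rdf:type ?type .
} group by ?type order by DESC(?instances)
```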
Properties for Power Plants
This describes the common properties in use for the power plants on Enipedia. See Category:Powerplant for a full list or Portal:Power Plants for an overview of the project behind this data.
| Property Page on Wiki | Property usage in SPARQL | Notes |
|---|---|---|
| | owl:sameAs | Links to the same entity in other databases |
| Property:Availability | prop:Availability | |
| Property:Annual_Carbonemissions_kg | prop:Annual_Carbonemissions_kg | CO2 emissions in kg for 2007 (from Carma.org) |
| Property:Annual_Carbonemissions2000_kg | prop:Annual_Carbonemissions2000_kg | CO2 emissions in kg for 2000 (from Carma.org) |
| Property:Annual_Carbonemissionsnextdecade_kg | prop:Annual_Carbonemissionsnextdecade_kg | CO2 emissions in kg for 2020 (from Carma.org) |
| Property:CarmaId | prop:CarmaId | Identifier linking to data entry in Carma.org |
| Property:City | prop:City | |
| Property:Cooling_method | prop:Cooling_method | |
| Property:Country | prop:Country | |
| Property:DBpedia_Page | prop:DBpedia_Page | Link to corresponding entry in DBpedia |
| Property:EGRID_ID | prop:EGRID_ID | Link to corresponding entry in eGRID database |
| Property:Annual_Energyoutput_MWh | prop:Annual_Energyoutput_MWh | Energy output for 2007, in MWh as indicated by the property name; based on data from Carma.org. |
| Property:Annual_Energyoutput2000_MWh | prop:Annual_Energyoutput2000_MWh | Energy output for 2000, in MWh as indicated by the property name; based on data from Carma.org. |
| Property:Annual_Energyoutputnextdecade_MWh | prop:Annual_Energyoutputnextdecade_MWh | Energy output for 2020, in MWh as indicated by the property name; based on data from Carma.org. |
| Property:Fuel_type | prop:Fuel_type | Fuel in use, based on Category:Fuel |
| Property:Generation_capacity_thermal_MW | prop:Generation_capacity_thermal_MW | Heat production capacity, in MW as indicated by the property name. |
| Property:Generation_capacity_electrical_MW | prop:Generation_capacity_electrical_MW | Electrical production capacity, in MW as indicated by the property name. |
| Property:IAEA_Name | prop:IAEA_Name | Unique identifier for entries in the IAEA database on nuclear power plants |
| Property:Intensity_kg_CO2_per_MWh_elec | prop:Intensity_kg_CO2_per_MWh_elec | Kilograms of CO2 per MWh in 2007, based on data from Carma.org. |
| Property:Intensity2000_kg_CO2_per_MWh_elec | prop:Intensity2000_kg_CO2_per_MWh_elec | Kilograms of CO2 per MWh in 2000, based on data from Carma.org. |
| Property:Intensitynextdecade_kg_CO2_per_MWh_elec | prop:Intensitynextdecade_kg_CO2_per_MWh_elec | Kilograms of CO2 per MWh in 2020, based on data from Carma.org. |
| Property:Latitude | prop:Latitude | Derived automatically from the value of Property:Point. |
| Property:Longitude | prop:Longitude | Derived automatically from the value of Property:Point. |
| Property:Operating_efficiency | prop:Operating_efficiency | |
| Property:Operator | prop:Operator | The company operating the power plant |
| Property:Ownercompany | prop:Ownercompany | The company owning the power plant |
| Property:Point | prop:Point | Geographic coordinates |
| Property:Power_plant_type | prop:Power_plant_type | |
| Property:State | prop:State | The state or region within a country where the plant is located |
| Property:Wikipedia_page | prop:Wikipedia_page | Link to the plant's page on Wikipedia |
| Property:Year_built | prop:Year_built | Year in which the power plant was built – several values may be present |
| Property:Zipcode | prop:Zipcode | Zip code or postcode for the plant |
| | rdf:type | Currently only Category:Powerplant. Retrieves categories that a page has been tagged with. |
| | rdfs:label | Name of the plant |
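The properties in this table can be combined with SPARQL 1.1 aggregation. The sketch below adds up electrical capacity per fuel type for plants in the Netherlands. It is an illustration under two assumptions: capacity values are stored as numbers, and each plant has a single capacity value. Plants with several fuel types are counted once under each fuel.

```sparql
PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX a: <http://enipedia.tudelft.nl/wiki/>
# Number of plants and total electrical capacity per fuel type
select ?fuel (count(distinct ?plant) as ?plants) (sum(?capacity) as ?totalCapacity_MW) where {
  ?plant prop:Country a:Netherlands .
  ?plant prop:Fuel_type ?fuel .
  ?plant prop:Generation_capacity_electrical_MW ?capacity .
} group by ?fuel order by DESC(?totalCapacity_MW)
```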
Advanced
Combining data from different graphs
Enipedia uses several named graphs to manage data sets from different sources. The key idea behind named graphs is that they let us keep data sets interconnected while still being able to determine exactly which data set each piece of data comes from.
| Graph | Description |
|---|---|
| http://enipedia.tudelft.nl/wiki/ | This contains data that is synchronized with the contents of the wiki pages. |
| http://enipedia.tudelft.nl/data/eGRID | eGRID data set published by the US EPA |
| http://enipedia.tudelft.nl/data/EPRTR | E-PRTR published by the European Environment Agency |
| http://enipedia.tudelft.nl/data/EU-ETS | European Union Emissions Trading System |
| http://enipedia.tudelft.nl/data/IAEA | International Atomic Energy Agency |
This example finds all power plants in Pennsylvania with links to the IAEA data, and then retrieves the IAEA data on electricity production per year.
PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX a: <http://enipedia.tudelft.nl/wiki/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX iaea: <http://enipedia.tudelft.nl/data/IAEA/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name ?plant ?iaeaName ?elecProduction ?year where {
GRAPH <http://enipedia.tudelft.nl/wiki/> {
?plant prop:State a:Pennsylvania .
?plant prop:IAEA_Name ?iaeaName .
}
GRAPH <http://enipedia.tudelft.nl/data/IAEA> {
?installation rdfs:label ?name .
filter(?name = ?iaeaName) .
?installation iaea:energyGWeh ?energyProdData .
?energyProdData iaea:amount ?elecProduction .
?energyProdData iaea:year ?year .
}
} order by ?name ?year
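To check which named graphs an endpoint actually exposes, a query along the following lines can help. This is a generic SPARQL 1.1 sketch rather than an Enipedia-specific example, and it may be slow on a large store because it touches every triple:

```sparql
# List each named graph together with the number of triples it contains
select ?g (count(*) as ?triples) where {
  GRAPH ?g { ?s ?p ?o . }
} group by ?g order by DESC(?triples)
```

The graph URLs returned can then be used in GRAPH clauses like those in the query above.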
Power plant CO2 intensity versus average year its generators were built
This is a rough investigation into whether power plants using older generators are less efficient in terms of CO2 emissions per unit of power output.
The eGRID data reports the amount of CO2 emissions per MWh for each power plant. It also has information about the generators in use at a power plant. This query links these two sets of data to look for a connection. No data is available on the direct CO2 emissions from each generator.
PREFIX cat: <http://enipedia.tudelft.nl/wiki/Category:>
PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX a: <http://enipedia.tudelft.nl/wiki/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX egridprop: <http://enipedia.tudelft.nl/data/eGRID/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?plant ?emissionsRate ?avgOnlineYear where {
GRAPH <http://enipedia.tudelft.nl/data/eGRID> {
?plant egridprop:Annual_Combustion_Output_Emission_Rate ?emissions .
?emissions rdfs:label "CO2" .
?emissions egridprop:Year ?year .
?emissions egridprop:Amount ?emissionsRate .
filter(?year = 2007) .
{
select ?plant (AVG(?generatorYearOnline) as ?avgOnlineYear) where {
?plant egridprop:Generator ?generator .
?generator egridprop:Year_Online ?generatorYearOnline .
} group by ?plant
}
}
} limit 100
Locate identifiers pointing to other databases
This will give you the CARMA identifier, and possible links to Wikipedia, the EU-ETS, eGRID, and the IAEA.
PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
select * where {
?x prop:CarmaId ?carmaID .
OPTIONAL{?x owl:sameAs ?sameAs } .
OPTIONAL{?x prop:EU_ETS_ID ?EU_ETS_ID } .
OPTIONAL{?x prop:EGRID_ID ?eGRID_ID } .
OPTIONAL{?x prop:IAEA_Name ?IAEA_Name } .
} limit 10
What fuels are in eGRID?
Run this at http://enipedia.tudelft.nl/sparql
PREFIX egridprop: <http://enipedia.tudelft.nl/data/eGRID/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?fuel where {
GRAPH <http://enipedia.tudelft.nl/data/eGRID> {
?x egridprop:Annual_Net_Generation_By_Fuel ?annualNetGeneration .
?annualNetGeneration rdfs:label ?fuel .
}
}
Overview of renewable energy production in California in 2007
Run this at http://enipedia.tudelft.nl/sparql
PREFIX egridprop: <http://enipedia.tudelft.nl/data/eGRID/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?x ?fuel ?amount where {
GRAPH <http://enipedia.tudelft.nl/data/eGRID> {
#get me everything in California
?x egridprop:State_abbreviation ?stateAbbrev .
FILTER(?stateAbbrev = "CA") .
#which has a type of renewable fuel
?x egridprop:Annual_Net_Generation_By_Fuel ?annualNetGeneration .
?annualNetGeneration rdfs:label ?fuel .
FILTER(?fuel = "Hydro" || ?fuel = "Geothermal" || ?fuel = "Solar" || ?fuel = "Wind" || ?fuel = "Biomass") .
#only get data for 2007
?annualNetGeneration egridprop:Year ?year .
FILTER(?year = 2007) .
#get the amount of power generated for that year
?annualNetGeneration egridprop:Amount ?amount .
}
#sort the results from largest to smallest in terms of power output
} order by DESC(?amount)
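The query above returns one row per plant and fuel. For a statewide summary, the same pattern can be aggregated per fuel. The following sketch reuses the properties and fuel labels from that query and assumes the amounts are stored as numbers:

```sparql
PREFIX egridprop: <http://enipedia.tudelft.nl/data/eGRID/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?fuel (sum(?amount) as ?totalAmount) (count(distinct ?x) as ?plants) where {
  GRAPH <http://enipedia.tudelft.nl/data/eGRID> {
    # plants in California
    ?x egridprop:State_abbreviation ?stateAbbrev .
    FILTER(?stateAbbrev = "CA") .
    # generation records for renewable fuels
    ?x egridprop:Annual_Net_Generation_By_Fuel ?annualNetGeneration .
    ?annualNetGeneration rdfs:label ?fuel .
    FILTER(?fuel IN ("Hydro", "Geothermal", "Solar", "Wind", "Biomass")) .
    # only 2007
    ?annualNetGeneration egridprop:Year ?year .
    FILTER(?year = 2007) .
    ?annualNetGeneration egridprop:Amount ?amount .
  }
# one row per fuel, largest total first
} group by ?fuel order by DESC(?totalAmount)
```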
Get distinct properties for eGRID power plants
Run this at http://enipedia.tudelft.nl/sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX egrid: <http://enipedia.tudelft.nl/data/eGRID/>
select distinct ?y {
GRAPH <http://enipedia.tudelft.nl/data/eGRID> {
?x rdf:type egrid:Plant .
?x ?y ?z .
}
}
Get CO2 emissions, power output, and emissions intensity for all power plants in Chile
Query can be run here, or see results here
PREFIX a: <http://enipedia.tudelft.nl/wiki/>
PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?plant ?name
#For 2000 get co2 in kg, elec output in MWh, kg co2/MWh intensity
?CO2_kg_2000 ?energy_2000_MWh ?intensity_2000
#For 2007 get co2 in kg, elec output in MWh, kg co2/MWh intensity
?CO2_kg_2007 ?energy_2007_MWh ?intensity_2007
#For 2020 ("next decade") get co2 in kg, elec output in MWh, kg co2/MWh intensity
?CO2_kg_2020 ?energy_2020_MWh ?intensity_2020
where {
#get me power plants in Chile
?plant prop:Country a:Chile .
#get the name
?plant rdfs:label ?name .
#CO2 emissions in kg
?plant prop:Annual_Carbonemissions_kg ?CO2_kg_2007 .
?plant prop:Annual_Carbonemissions2000_kg ?CO2_kg_2000 .
?plant prop:Annual_Carbonemissionsnextdecade_kg ?CO2_kg_2020 .
#energy output in MWh (as indicated by the property names)
?plant prop:Annual_Energyoutput_MWh ?energy_2007_MWh .
?plant prop:Annual_Energyoutput2000_MWh ?energy_2000_MWh .
?plant prop:Annual_Energyoutputnextdecade_MWh ?energy_2020_MWh .
#kg CO2 per MWh
?plant prop:Intensity_kg_CO2_per_MWh_elec ?intensity_2007 .
?plant prop:Intensity2000_kg_CO2_per_MWh_elec ?intensity_2000 .
?plant prop:Intensitynextdecade_kg_CO2_per_MWh_elec ?intensity_2020 .
}
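As a consistency check, the stored intensity can be compared with one computed from the other two values. The sketch below does this for 2007 only. It assumes that energy output is expressed in MWh, as the property name suggests, and that all values are numeric. The variable ?computed_intensity_2007 is introduced here for illustration.

```sparql
PREFIX a: <http://enipedia.tudelft.nl/wiki/>
PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?plant ?name ?intensity_2007
  ((?CO2_kg_2007 / ?energy_2007_MWh) as ?computed_intensity_2007)
where {
  ?plant prop:Country a:Chile .
  ?plant rdfs:label ?name .
  ?plant prop:Annual_Carbonemissions_kg ?CO2_kg_2007 .
  ?plant prop:Annual_Energyoutput_MWh ?energy_2007_MWh .
  ?plant prop:Intensity_kg_CO2_per_MWh_elec ?intensity_2007 .
  # skip plants with no reported output to avoid dividing by zero
  FILTER(?energy_2007_MWh > 0) .
}
```

If the two intensity columns differ by a large constant factor, that would suggest the energy values are stored in a different unit than assumed here.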
Download all power plant data
This query can be run via our SPARQL endpoint at http://enipedia.tudelft.nl/sparql. This will get the name of the power plant, its coordinates, the fuel type used, electrical output in MWh, and the installed capacity in MW. Some plants may have multiple fuel types, in which case there will be a row for each of the fuel types.
BASE <http://enipedia.tudelft.nl/wiki/>
PREFIX a: <http://enipedia.tudelft.nl/wiki/>
PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX cat: <http://enipedia.tudelft.nl/wiki/Category:>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?plant_name ?latitude ?longitude ?fuel_used ?OutputMWh ?elec_capacity_MW where {
?plant rdf:type cat:Powerplant .
?plant rdfs:label ?plant_name .
?plant prop:Latitude ?latitude .
?plant prop:Longitude ?longitude .
OPTIONAL{?plant prop:Fuel_type ?fuel_type .
?fuel_type rdfs:label ?fuel_used } .
?plant prop:Annual_Energyoutput_MWh ?OutputMWh .
OPTIONAL{?plant prop:Generation_capacity_electrical_MW ?elec_capacity_MW }.
} order by ?plant ?fuel_type