Using SPARQL with Enipedia

If you want to extract slices and subsets of the data on Enipedia, you can write your own queries to do so. This page gives the background needed. Feel free to post questions on the discussion page and to share your finished queries, for example by creating a new page linked to this one. Please list them in the User Contributed Examples section.

This page is also available via this shortened URL:


What/Why?

Much of the information on Enipedia is structured so that it can be queried using SPARQL, a query language for the Semantic Web. In practical terms, this means that information spread over multiple pages can be easily gathered and analyzed once you know how to write these queries. These queries can be run through a SPARQL Endpoint which connects to a database behind the scenes.
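
As a rough illustration of what running a query through an endpoint means in practice, the Python sketch below builds the HTTP GET URL described by the SPARQL protocol, where the query text travels in a `query` parameter. The address `http://example.org/sparql` is a placeholder and not Enipedia's real endpoint, and `build_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint address (an assumption); substitute the real one.
ENDPOINT = "http://example.org/sparql"

def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

query = "select distinct ?p where { ?s ?p ?o . } limit 10"
url = build_query_url(ENDPOINT, query)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
print(decoded == query)
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, is equivalent to pasting the query into the endpoint's web form.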

Resources

  • One of the best tutorials is the SPARQL by example presentation by Cambridge Semantics. It is quite comprehensive: it starts with simple examples and gradually moves on to more complex ones.
  • The most authoritative reference is the SPARQL 1.1 documentation. It is not exactly beginner-friendly, but since it defines the standard, it is the most complete resource on the topic. Our SPARQL endpoints on Enipedia support SPARQL 1.1. Other SPARQL endpoints on the web may only support SPARQL 1.0, which lacks features such as aggregation, subqueries, and federated queries.
  • SPARQL Queries For Statistics - a list of SPARQL queries that let you quickly get a sense of what kind of data is available. They can tell you important things such as which properties are in use and how often each is used.
  • The Enipedia Blog covers several more advanced examples of how SPARQL can be used with the data on Enipedia to perform various types of analysis on energy topics.
  • Data Mining, The US Department of Energy's Semantic Wiki gives a step by step example of how this can be done with other sources such as
  • SparqlExtension - This is what we use on Enipedia to embed SPARQL queries on pages and to show their results in tables or various visualizations.

Data Sets

Open data from several sources are hosted on Enipedia as Linked Data.

We've made available the scripts that we use for converting these data sets from their original formats to RDF. For IAEA, eGRID, and E-PRTR, we use the rdf-extension for Google Refine. The .json files included in the links can be pasted into Google Refine to apply all the changes that we have made to the original data. See here for instructions. Because the conversion scripts are available, people are also free to set up their own custom conversion if that is more useful for them.

See also the Energy and Industry Data Sets page for all Data Sets in use and considered.

How to explore SPARQL Endpoints

SPARQL Endpoints usually look like this or this, which is a bit intimidating and uninformative for first-time users of SPARQL. The good news is that SPARQL itself can be used to quickly explore a new dataset and get an idea of what is in it. A few examples are provided below; a more extensive source is the list of SPARQL Queries For Statistics.

Find all distinct properties

select distinct ?p where {
?s ?p ?o .
}

Find number of values associated with each property

This query will take a long time to run, since it looks at the whole dataset.

SELECT ?p (count(?o) as ?objectCount) WHERE {
?s ?p ?o .
} group by ?p
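
To make clear what these exploration queries compute, here is a small Python sketch that performs the same two operations (distinct properties, and a count of values per property) over a handful of in-memory triples. The triples are made up for illustration and do not come from the Enipedia dataset.

```python
from collections import Counter

# Toy (subject, property, object) triples; all values are made up.
triples = [
    ("plantA", "rdf:type", "cat:Powerplant"),
    ("plantA", "prop:Country", "a:Netherlands"),
    ("plantB", "rdf:type", "cat:Powerplant"),
    ("plantB", "prop:Country", "a:Chile"),
    ("plantB", "prop:Fuel_type", "a:Hydro"),
]

# "select distinct ?p": the set of properties in use.
distinct_properties = sorted({p for _, p, _ in triples})

# "count(?o) ... group by ?p": number of values per property.
object_counts = Counter(p for _, p, _ in triples)

print(distinct_properties)
print(object_counts.most_common())
```

On a real endpoint the same grouping runs inside the database, which is why the count query has to scan every triple.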

Find the types of things described in the dataset

SELECT distinct ?o WHERE {
?s rdf:type ?o .
}

Exporting Data in Different File Formats

On our SPARQL Endpoint, query results can be retrieved in a variety of file formats, such as XML, JSON, plain text, CSV, and TSV.
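
One common way to ask for a particular format is the HTTP `Accept` header. The sketch below prepares such a request in Python without sending it. The media types are the standard ones for SPARQL SELECT results. Whether a given endpoint honours the header or expects a URL parameter instead is an assumption to check against that endpoint, and `http://example.org/sparql` is a placeholder address.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Standard media types for SPARQL SELECT results.
MEDIA_TYPES = {
    "xml": "application/sparql-results+xml",
    "json": "application/sparql-results+json",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}

def build_request(endpoint, query, fmt):
    """Prepare (but do not send) a request asking for the given format."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})

# http://example.org/sparql is a placeholder endpoint (an assumption).
req = build_request("http://example.org/sparql",
                    "select * where { ?s ?p ?o . } limit 5", "csv")
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would actually send the request.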

Enipedia SPARQL examples

Find all facts about one specific power plant

For every page on the wiki, there are several ways in which you can find out what facts are associated with it. The key idea is that everything is structured as a network of URLs that you use queries to navigate. For example, the basic structure of facts about a single power plant is like this:

To see this for yourself, go to our SPARQL Endpoint and enter in the text below (see results here). This can be done for any page to find the facts attached to it.

select * where {
<> ?y ?z .
}

Example box listing all the facts that can be queried for the Navajo Generating Station

Another way to find out which information is available is to look at the factboxes that are included on many pages, an example of which is shown in the image on the right.

Find power plants by country

This query outputs the name, location and generation capacity of power plants in the Netherlands (see results here):

select ?Name ?Point ?Generation_capacity where {
?powerPlant prop:Country a:Netherlands .
?powerPlant rdfs:label ?Name .
?powerPlant prop:Point ?Point .
?powerPlant prop:Generation_capacity_electrical_MW ?Generation_capacity .
}

Find power plants by specific fuel type

This query gets all of the hydro plants, along with their coordinates and, if available, their generation capacity (results).

select * where {
?powerplant rdf:type cat:Powerplant .
?powerplant rdfs:label ?Name .
?powerplant prop:Fuel_type a:Hydro .
?powerplant prop:Point ?Point .
OPTIONAL{?powerplant prop:Generation_capacity_electrical_MW ?Generation_capacity }.
}
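
When results are fetched as JSON, a variable inside OPTIONAL that found no match is simply absent from that row's binding. The Python sketch below flattens a hand-written sample in the standard SPARQL JSON results format and fills such gaps with `None`. The plant names and values are made up, and `rows_from_json` is a hypothetical helper.

```python
import json

# Hand-written sample in the SPARQL JSON results format; values are made up.
sample = """
{
  "head": {"vars": ["Name", "Point", "Generation_capacity"]},
  "results": {"bindings": [
    {"Name": {"type": "literal", "value": "Plant A"},
     "Point": {"type": "literal", "value": "10.0,20.0"},
     "Generation_capacity": {"type": "literal", "value": "120"}},
    {"Name": {"type": "literal", "value": "Plant B"},
     "Point": {"type": "literal", "value": "11.5,21.5"}}
  ]}
}
"""

def rows_from_json(text):
    """Flatten bindings into dicts; unbound variables become None."""
    data = json.loads(text)
    names = data["head"]["vars"]
    return [
        {name: binding.get(name, {}).get("value") for name in names}
        for binding in data["results"]["bindings"]
    ]

rows = rows_from_json(sample)
for row in rows:
    print(row["Name"], row["Generation_capacity"])
```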

Properties for Power Plants

This describes the common properties in use for the power plants on Enipedia. See Category:Powerplant for a full list or Portal:Power Plants for an overview of the project behind this data.

| Property Page on Wiki | Property usage in SPARQL | Notes |
| --- | --- | --- |
| | owl:sameAs | Links to the same entity in other databases |
| Property:Availability | prop:Availability | |
| Property:Annual_Carbonemissions_kg | prop:Annual_Carbonemissions_kg | CO2 emissions in kg for 2007 (from |
| Property:Annual_Carbonemissions2000_kg | prop:Annual_Carbonemissions2000_kg | CO2 emissions in kg for 2000 (from |
| Property:Annual_Carbonemissionsnextdecade_kg | prop:Annual_Carbonemissionsnextdecade_kg | CO2 emissions in kg for 2020 (from |
| Property:CarmaId | prop:CarmaId | Identifier linking to data entry in |
| Property:City | prop:City | |
| Property:Cooling_method | prop:Cooling_method | |
| Property:Continent | prop:Continent | |
| Property:Country | prop:Country | |
| Property:DBpedia_Page | prop:DBpedia_Page | Link to corresponding entry in DBpedia |
| Property:EGRID_ID | prop:EGRID_ID | Link to corresponding entry in eGRID database |
| Property:Annual_Energyoutput_MWh | prop:Annual_Energyoutput_MWh | Energy output in Joules for 2007, based on data from |
| Property:Annual_Energyoutput2000_MWh | prop:Annual_Energyoutput2000_MWh | Energy output in Joules for 2000, based on data from |
| Property:Annual_Energyoutputnextdecade_MWh | prop:Annual_Energyoutputnextdecade_MWh | Energy output in Joules for 2020, based on data from |
| Property:Fuel_type | prop:Fuel_type | Fuel in use, based on Category:Fuel |
| Property:Generation_capacity_thermal_MW | prop:Generation_capacity_thermal_MW | Heat production in watts |
| Property:Generation_capacity_electrical_MW | prop:Generation_capacity_electrical_MW | Electrical production in watts |
| Property:IAEA_Name | prop:IAEA_Name | Unique identifier for entries in the IAEA database on nuclear power plants |
| Property:Intensity_kg_CO2_per_MWh_elec | prop:Intensity_kg_CO2_per_MWh_elec | Kilograms of CO2 per MWh in 2007, based on data from |
| Property:Intensity2000_kg_CO2_per_MWh_elec | prop:Intensity2000_kg_CO2_per_MWh_elec | Kilograms of CO2 per MWh in 2000, based on data from |
| Property:Intensitynextdecade_kg_CO2_per_MWh_elec | prop:Intensitynextdecade_kg_CO2_per_MWh_elec | Kilograms of CO2 per MWh in 2020, based on data from |
| Property:Latitude | prop:Latitude | Derived automatically from the value of Property:Point |
| Property:Longitude | prop:Longitude | Derived automatically from the value of Property:Point |
| Property:Operating_efficiency | prop:Operating_efficiency | |
| Property:Operator | prop:Operator | The company operating the power plant |
| Property:Ownercompany | prop:Ownercompany | The company owning the power plant |
| Property:Point | prop:Point | Geographic coordinates |
| Property:Power_plant_type | prop:Power_plant_type | |
| Property:State | prop:State | The state or region within a country where the plant is located |
| Property:Wikipedia_page | prop:Wikipedia_page | Link to the plant's page on Wikipedia |
| Property:Year_built | prop:Year_built | Year in which the power plant was built; several values may be present |
| Property:Zipcode | prop:Zipcode | Zip code or postcode for the plant |
| | rdf:type | Currently only Category:Powerplant. Retrieves categories that a page has been tagged with. |
| | rdfs:label | Name of the plant |
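
Since every property follows the same `prop:Name` pattern, queries over power plants can be assembled mechanically. The Python sketch below shows one possible way to do so. `build_plant_query` is a hypothetical helper, and it assumes the `rdf:`, `cat:` and `prop:` prefixes are already known to the endpoint, as in the short examples on this page.

```python
def build_plant_query(required, optional=(), limit=None):
    """Build a SELECT query over power plants from property names.

    Names are given without the 'prop:' prefix, e.g. "Country".
    """
    lines = ["select * where {",
             "?plant rdf:type cat:Powerplant ."]
    for name in required:
        lines.append(f"?plant prop:{name} ?{name} .")
    for name in optional:
        # OPTIONAL keeps plants that lack this property.
        lines.append(f"OPTIONAL{{?plant prop:{name} ?{name} }} .")
    lines.append("}" if limit is None else f"}} limit {limit}")
    return "\n".join(lines)

print(build_plant_query(["Country", "Fuel_type"],
                        optional=["Year_built"], limit=10))
```

Properties that are not filled in for every plant are better passed as `optional`, otherwise plants without a value drop out of the results.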

Advanced

These queries can be run via our SPARQL endpoint at

Find date of last update to power plant data

The value for ?modificationDate is updated whenever a wiki page for a power plant is saved.

PREFIX swivt: <>
select ?modificationDate where {
?powerplant rdf:type cat:Powerplant . 
?powerplant swivt:wikiPageModificationDate ?modificationDate . 
} order by DESC(?modificationDate) limit 1

Download all power plant data

This will get the name of the power plant, its coordinates, the fuel type used, electrical output in MWh, and the installed capacity in MW. Only the primary fuel type (if it is specified) is returned. Some plants may actually use multiple fuel types. Using the query below you can download all the data as CSV.

PREFIX a: <>
PREFIX prop: <>
PREFIX cat: <>
PREFIX rdfs: <>
PREFIX rdf: <>
select ?plant_name ?latitude ?longitude ?fuel_used ?OutputMWh ?elec_capacity_MW where {
     ?plant rdf:type cat:Powerplant . 
     ?plant rdfs:label ?plant_name . 
     ?plant prop:Latitude ?latitude . 
     ?plant prop:Longitude ?longitude . 
     OPTIONAL{?plant prop:Primary_fuel_type ?fuel_type .
              ?fuel_type rdfs:label ?fuel_used } . 
     ?plant prop:Annual_Energyoutput_MWh ?OutputMWh . 
     OPTIONAL{?plant prop:Generation_capacity_electrical_MW ?elec_capacity_MW }. 
} order by ?plant ?fuel_type
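
Once the CSV has been downloaded, it can be loaded with a few lines of Python. The sketch below parses a made-up sample with the same column names as the query above. In the SPARQL CSV results format, unbound OPTIONAL variables appear as empty cells, so these are mapped to `None`. The sample rows and the `parse_plants` helper are illustrative assumptions.

```python
import csv
import io

# Made-up sample of CSV output for the query above; real values will differ.
sample_csv = (
    "plant_name,latitude,longitude,fuel_used,OutputMWh,elec_capacity_MW\n"
    "Plant A,10.0,20.0,Coal,1500000,300\n"
    "Plant B,11.5,21.5,,42000,\n"
)

def to_float(cell):
    """Convert a CSV cell to float, mapping empty cells to None."""
    return float(cell) if cell else None

def parse_plants(text):
    """Turn CSV text into a list of dicts with numeric fields converted."""
    plants = []
    for row in csv.DictReader(io.StringIO(text)):
        plants.append({
            "name": row["plant_name"],
            "lat": to_float(row["latitude"]),
            "lon": to_float(row["longitude"]),
            "fuel": row["fuel_used"] or None,
            "output_mwh": to_float(row["OutputMWh"]),
            "capacity_mw": to_float(row["elec_capacity_MW"]),
        })
    return plants

plants = parse_plants(sample_csv)
print(len(plants), plants[1]["fuel"], plants[1]["capacity_mw"])
```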

Combining data from different graphs

Enipedia uses several named graphs to manage data sets from different sources. The key idea behind named graphs is that they let us keep the data sets interconnected while still being able to determine exactly which data set each piece of data comes from.

The graphs hold the following data:

  • Data that is synchronized with the contents of the wiki pages
  • eGRID data set published by the US EPA
  • E-PRTR published by the European Environment Agency
  • European Union Emissions Trading System
  • International Atomic Energy Agency

This example finds all power plants in Pennsylvania with links to the IAEA data, and then retrieves the IAEA data on electricity production per year.

PREFIX prop: <>
PREFIX a: <>
PREFIX rdf: <>
PREFIX iaea: <>
PREFIX rdfs: <>
select ?name ?plant ?iaeaName ?elecProduction ?year where {
    GRAPH <> {
        ?plant prop:State a:Pennsylvania .
        ?plant prop:IAEA_Name ?iaeaName .
    }
    GRAPH <> {
        ?installation rdfs:label ?name .
        filter(?name = ?iaeaName) .
        ?installation iaea:energyGWeh ?energyProdData .
        ?energyProdData iaea:amount ?elecProduction .
        ?energyProdData iaea:year ?year .
    }
} order by ?name ?year
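
Conceptually, this query selects rows from one graph, selects rows from another, and joins them where the IAEA name matches. The Python sketch below mimics that join on two small in-memory lists standing in for the graphs. Every name and number in it is made up.

```python
# Toy stand-ins for the two graphs; every name and number is made up.
wiki_graph = [
    {"plant": "wiki:Plant_A", "state": "Pennsylvania", "iaeaName": "ALPHA-1"},
    {"plant": "wiki:Plant_B", "state": "Ohio", "iaeaName": "BETA-1"},
]
iaea_graph = [
    {"name": "ALPHA-1", "year": 2006, "elecProduction": 7000.0},
    {"name": "ALPHA-1", "year": 2007, "elecProduction": 7200.0},
    {"name": "BETA-1", "year": 2007, "elecProduction": 6500.0},
]

def join_on_name(wiki_rows, iaea_rows, state):
    """Mimic the two GRAPH blocks joined by filter(?name = ?iaeaName)."""
    wanted = {r["iaeaName"]: r["plant"]
              for r in wiki_rows if r["state"] == state}
    joined = [
        (r["name"], wanted[r["name"]], r["elecProduction"], r["year"])
        for r in iaea_rows if r["name"] in wanted
    ]
    # order by ?name ?year
    return sorted(joined, key=lambda t: (t[0], t[3]))

result = join_on_name(wiki_graph, iaea_graph, "Pennsylvania")
print(result)
```

Because the link relies on an exact string match between the two names, any difference in spelling or capitalisation between the data sets would silently drop a plant from the results.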

Power plant CO2 intensity versus average year its generators were built

This is a rough investigation into whether power plants using older generators emit more CO2 per unit of power output.

The eGRID data gives the amount of CO2 emissions per MWh for each power plant. It also has information about the generators in use at each plant. No data is available on the direct CO2 emissions from each generator, so this query links the plant-level emission rate to the average year in which the plant's generators came online.

PREFIX cat: <>
PREFIX prop: <>
PREFIX a: <>
PREFIX rdf: <>
PREFIX egridprop: <>
PREFIX rdfs: <>
select ?plant ?emissionsRate ?avgOnlineYear where {
   GRAPH <> {
      ?plant egridprop:Annual_Combustion_Output_Emission_Rate ?emissions .
      ?emissions rdfs:label "CO2" .
      ?emissions egridprop:Year ?year .
      ?emissions egridprop:Amount ?emissionsRate .
      filter(?year = 2007) .
      {
         select ?plant (AVG(?generatorYearOnline) as ?avgOnlineYear) where {
            ?plant egridprop:Generator ?generator .
            ?generator egridprop:Year_Online ?generatorYearOnline .
         } group by ?plant
      }
   }
} limit 100
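
The nested select acts as a per-plant aggregation whose result is then joined with the emission rates. The same two steps are sketched in Python below, using made-up sample rows rather than real eGRID values.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows: (plant, generator online year).
generators = [
    ("plant1", 1970), ("plant1", 1980),
    ("plant2", 2001),
]
# Made-up CO2 emission rates per plant for a single year.
emission_rates = {"plant1": 2100.0, "plant2": 900.0}

# Inner query: average online year per plant (group by ?plant).
years_by_plant = defaultdict(list)
for plant, year in generators:
    years_by_plant[plant].append(year)
avg_online_year = {p: mean(ys) for p, ys in years_by_plant.items()}

# Outer query: join the averages with the emission rates on ?plant.
rows = [
    (plant, emission_rates[plant], avg_online_year[plant])
    for plant in sorted(emission_rates)
    if plant in avg_online_year
]
print(rows)
```

An unweighted average treats a small old generator the same as a large new one. Weighting by generator capacity would be a natural refinement if that data is available.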

Locate identifiers pointing to other databases

This will give you the CARMA identifier, and possible links to Wikipedia, the EU-ETS, eGRID, and the IAEA.

PREFIX prop: <>
PREFIX owl: <>
select * where {
?x prop:CarmaId ?carmaID .
OPTIONAL{?x owl:sameAs ?sameAs } . 
OPTIONAL{?x prop:IAEA_Name ?IAEA_Name } . 
} limit 10

What fuels are in eGRID?

Run this at

PREFIX egridprop: <>
PREFIX rdfs: <>
select distinct ?fuel where {
   GRAPH <> {
      ?x egridprop:Annual_Net_Generation_By_Fuel ?annualNetGeneration .
      ?annualNetGeneration rdfs:label ?fuel .
   }
}

Overview of renewable energy production in California in 2007

Run this at

PREFIX egridprop: <>
PREFIX rdfs: <>
select ?x ?fuel ?amount where {
   GRAPH <> {
	#get me everything in California
	?x egridprop:State_abbreviation ?stateAbbrev . 
	FILTER(?stateAbbrev = "CA") . 

	#which has a type of renewable fuel
	?x egridprop:Annual_Net_Generation_By_Fuel ?annualNetGeneration . 
	?annualNetGeneration rdfs:label ?fuel . 
	FILTER(?fuel = "Hydro" || ?fuel = "Geothermal" || ?fuel = "Solar" || ?fuel = "Wind" || ?fuel = "Biomass") . 

	#only get data for 2007
	?annualNetGeneration egridprop:Year ?year . 
	FILTER(?year = 2007) . 

	#get the amount of power generated for that year
	?annualNetGeneration egridprop:Amount ?amount .
   }
#sort the results from largest to smallest in terms of power output
} order by DESC(?amount)
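
The list of fuels in the FILTER is easy to vary when the query is generated from code. The Python sketch below builds the same kind of `||` chain from a list of labels. `fuel_filter` is a hypothetical helper, and it assumes the labels contain no double quotes, since no escaping is done.

```python
def fuel_filter(variable, fuels):
    """Build FILTER(?v = "A" || ?v = "B" ...) for a list of fuel labels."""
    tests = [f'?{variable} = "{fuel}"' for fuel in fuels]
    return "FILTER(" + " || ".join(tests) + ") ."

renewables = ["Hydro", "Geothermal", "Solar", "Wind", "Biomass"]
print(fuel_filter("fuel", renewables))
```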

Get distinct properties for eGRID power plants

Run this at

PREFIX rdf: <>
PREFIX egrid: <>
select distinct ?y {
   GRAPH <> {
      ?x rdf:type egrid:Plant .
      ?x ?y ?z .
   }
}

Get CO2 emissions, power output, and emissions intensity for all power plants in Chile

Query can be run here, or see results here

PREFIX a: <>
PREFIX prop: <>
PREFIX rdfs: <>
PREFIX rdf: <>
select ?plant ?name 

#For 2000 get co2 in kg, elec output in MWh, kg co2/MWh intensity
?CO2_kg_2000 ?energy_2000_MWh ?intensity_2000

#For 2007 get co2 in kg, elec output in MWh, kg co2/MWh intensity
?CO2_kg_2007 ?energy_2007_MWh ?intensity_2007

#For 2020 ("next decade") get co2 in kg, elec output in MWh, kg co2/MWh intensity
?CO2_kg_2020 ?energy_2020_MWh ?intensity_2020 

where {

#get me power plants in Chile
?plant prop:Country a:Chile . 
#get the name
?plant rdfs:label ?name . 

#CO2 emissions in kg
?plant prop:Annual_Carbonemissions_kg ?CO2_kg_2007 . 
?plant prop:Annual_Carbonemissions2000_kg ?CO2_kg_2000 . 
?plant prop:Annual_Carbonemissionsnextdecade_kg ?CO2_kg_2020 . 

#energy output in Joules (converted to MWh above)
?plant prop:Annual_Energyoutput_MWh ?energy_2007_MWh . 
?plant prop:Annual_Energyoutput2000_MWh ?energy_2000_MWh . 
?plant prop:Annual_Energyoutputnextdecade_MWh ?energy_2020_MWh . 

#kg CO2 per MWh
?plant prop:Intensity_kg_CO2_per_MWh_elec ?intensity_2007 .
?plant prop:Intensity2000_kg_CO2_per_MWh_elec ?intensity_2000 .
?plant prop:Intensitynextdecade_kg_CO2_per_MWh_elec ?intensity_2020 .
}
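
The three groups of values are related by simple arithmetic if, as the property names suggest, intensity is emissions in kg divided by output in MWh. That is an assumption about how the values were derived, not something stated on this page. The Python sketch below uses made-up numbers to recompute an intensity and compare two years.

```python
def intensity_kg_per_mwh(co2_kg, energy_mwh):
    """Return kg CO2 per MWh, or None when no electricity was produced."""
    if energy_mwh == 0:
        return None
    return co2_kg / energy_mwh

def relative_change(old, new):
    """Fractional change from old to new, e.g. -0.25 means 25% lower."""
    return (new - old) / old

# Made-up numbers for one plant, not taken from the dataset.
i_2000 = intensity_kg_per_mwh(800_000_000, 1_000_000)   # 800 kg/MWh
i_2007 = intensity_kg_per_mwh(900_000_000, 1_500_000)   # 600 kg/MWh
print(i_2000, i_2007, relative_change(i_2000, i_2007))
```

In this made-up case total emissions rise while the intensity falls, because output grew faster than emissions.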

User contributed examples

Mapping the world’s nuclear power plants

Feel free to add your own examples that show various ways in which the data can be queried.

Spanish Power Plants Visualized using R

Nuclear Power Plants

Using Python

Using sgvizler

I haven't seen a page with a working map here, I'd appreciate if someone points me to one.

So instead I made maps with sgvizler:

It would be good to add a CORS header to this site, else one must specify type "jsonp" in sgvizler, which took me a couple of hours to figure out.

The queries are simple, eg

select (xsd:float(?lat) as ?lat) (xsd:float(?long) as ?long) ?name ?descr ?url {
?url a cat:Powerplant;
  rdfs:label ?label;
  prop:Fuel_type wiki:Nuclear;
  prop:Generation_capacity_electrical_MW ?MW;
  prop:Latitude ?lat;
  prop:Longitude ?long
  bind(str(replace(?label,' powerplant','','i')) as ?name)
  bind(str(concat(str(?MW),' MW')) as ?descr)
} order by desc(?MW) limit 50

--VladimirAlexiev (talk) 16:07, 16 December 2014 (CET)

Carma Data

  • This query returns the name, Enipedia URL, location, fuel type, capacity, status and CARMA energy output & CO2 estimates for all the power plants in Africa
select ?powerPlant+>rdfs:label as ?Name ?powerPlant as ?Enipedia_URL ?powerPlant*>prop:Latitude as ?Latitude ?powerPlant*>prop:Longitude as ?Longitude ?City*>rdfs:label as ?City
?State*>rdfs:label as ?State ?Country+>rdfs:label as ?Country ?Primary_fuel_type*>rdfs:label as ?Fuel_Type
?powerPlant*>prop:Generation_capacity_electrical_MW as ?Capcty_MW ?powerPlant*>prop:Status as ?Status
?powerPlant*>prop:Annual_Energyoutput2000_MWh as ?Pwr_out_2000
?powerPlant*>prop:Annual_Carbonemissions2000_kg as ?CO2_2000
?powerPlant*>prop:Annual_Energyoutput_MWh as ?Pwr_out_2007
?powerPlant*>prop:Annual_Carbonemissions_kg as ?CO2_2007
?powerPlant*>prop:Annual_Energyoutputnextdecade_MWh as ?Pwr_out_futr
?powerPlant*>prop:Annual_Carbonemissionsnextdecade_kg as ?CO2_futr{
?powerPlant prop:Continent a:Africa .
OPTIONAL{?powerPlant prop:City ?City  } .
OPTIONAL{?powerPlant prop:State ?State  } .
?powerPlant prop:Country ?Country .
OPTIONAL{?powerPlant prop:Primary_fuel_type ?Primary_fuel_type } .
} order by ?Country ?Pwr_out_futr
  • This query returns the name, Enipedia URL, latitude, longitude, country and CARMA data for all power plants worldwide. I don't understand why the "CO2_int_2000" column comes back blank; perhaps someone can help me with that.
select ?powerPlant+>rdfs:label as ?Name ?powerPlant as ?Enipedia_URL ?powerPlant*>prop:Latitude as ?Latitude ?powerPlant*>prop:Longitude as ?Longitude ?Country+>rdfs:label as ?Country 
?powerPlant*>prop:Annual_Energyoutput2000_MWh as ?Pwr_out_2000_MWh
?powerPlant*>prop:Annual_Carbonemissions2000_kg as ?CO2_2000_kg
?powerPlant*>prop:Intensity2000_kg_CO2_per_MWh_elec‎ as ?CO2_int_2000
?powerPlant*>prop:Annual_Energyoutput_MWh as ?Pwr_out_2007_MWh
?powerPlant*>prop:Annual_Carbonemissions_kg as ?CO2_2007_kg
?powerPlant*>prop:Intensity_kg_CO2_per_MWh_elec as ?CO2_int_2007
?powerPlant*>prop:Annual_Energyoutputnextdecade_MWh as ?Pwr_out_futr_MWh
?powerPlant*>prop:Annual_Carbonemissionsnextdecade_kg as ?CO2_futr_kg
?powerPlant*>prop:Intensitynextdecade_kg_CO2_per_MWh_elec as ?CO2_int_futr{
?powerPlant prop:Country ?Country .
} order by ?Country ?Pwr_out_futr_MWh
  • The ?CO2_int_2000 issue may be a bug with virtuoso. The query above seems to be correct. The query below is equivalent and returns values for ?CO2_int_2000
select * where { 
?powerPlant rdfs:label ?Name . 
?powerPlant prop:Latitude ?Latitude . 
?powerPlant prop:Longitude ?Longitude . 
?Country rdfs:label ?CountryName  . 
?powerPlant prop:Annual_Energyoutput2000_MWh ?Pwr_out_2000_MWh . 
?powerPlant prop:Annual_Carbonemissions2000_kg ?CO2_2000_kg . 
?powerPlant prop:Annual_Energyoutput_MWh ?Pwr_out_2007_MWh . 
?powerPlant prop:Annual_Carbonemissions_kg ?CO2_2007_kg . 
?powerPlant prop:Intensity_kg_CO2_per_MWh_elec ?CO2_int_2007 . 
?powerPlant prop:Annual_Energyoutputnextdecade_MWh ?Pwr_out_futr_MWh . 
?powerPlant prop:Annual_Carbonemissionsnextdecade_kg ?CO2_futr_kg . 
?powerPlant prop:Intensitynextdecade_kg_CO2_per_MWh_elec ?CO2_int_futr .   
?powerPlant prop:Country ?Country . 
?powerPlant prop:Intensity2000_kg_CO2_per_MWh_elec ?CO2_int_2000 .
}

The original Carma data (separate from modifications made on Enipedia), can be retrieved via:

PREFIX carmaprop: <>
select * where {
graph <> {
?plant carmaprop:plant_id ?plant_id .
?plant carmaprop:CARMA_URL ?CARMA_URL .
?plant carmaprop:carbon_2004 ?carbon_2004 .
?plant carmaprop:carbon_2009 ?carbon_2009 .
?plant carmaprop:carbon_Future ?carbon_Future .
?plant carmaprop:energy_2004 ?energy_2004 .
?plant carmaprop:energy_2009 ?energy_2009 .
?plant carmaprop:energy_Future ?energy_Future .
?plant carmaprop:intensity_2004 ?intensity_2004 .
?plant carmaprop:intensity_2009 ?intensity_2009 .
?plant carmaprop:intensity_Future ?intensity_Future .
?plant carmaprop:continent ?continent .
?plant carmaprop:country ?country .
?plant carmaprop:state ?state .
?plant carmaprop:latitude ?latitude .
?plant carmaprop:longitude ?longitude .
?plant carmaprop:name ?name .
OPTIONAL{ ?plant carmaprop:city_name ?city_name } .
OPTIONAL{ ?plant carmaprop:zip ?zip } .
OPTIONAL{ ?plant carmaprop:metroarea ?metroarea } .
OPTIONAL{ ?plant carmaprop:county ?county } .
OPTIONAL{ ?plant carmaprop:company ?company } .
}
} limit 100