Using SPARQL with Enipedia

From Enipedia
Revision as of 23:53, 16 July 2012 by ChrisDavis (Talk | contribs)

If you want to extract various slices and subsets of the data on Enipedia, it's possible for you to write your own queries to do this. This page gives the background for how to do that. Feel free to post questions on the discussion page.

This page is also available via this shortened URL: http://is.gd/sparql

What/Why?

Much of the information on Enipedia is structured so that it can be queried using SPARQL, a query language for the Semantic Web. In practical terms, this means that information spread over multiple pages can be easily gathered and analyzed once you know how to write these queries. These queries can be run through a SPARQL Endpoint which connects to a database behind the scenes.

Resources

  • One of the best tutorials is the SPARQL by example presentation by Cambridge Semantics. It is quite comprehensive, starting from simple examples and slowly moving on to more complex ones.
  • The most authoritative reference is the SPARQL 1.1 documentation. This is not exactly beginner-friendly, but since it defines the standard, it is the most complete resource on the topic. Our SPARQL endpoints on Enipedia support SPARQL 1.1. Other SPARQL endpoints on the web may only support SPARQL 1.0, which does not have features such as aggregation, subqueries, and federated queries.
  • SPARQL Queries For Statistics - a list of SPARQL queries that allow you to quickly get a sense of what kind of data is available. These can tell you important things, such as which properties are in use and how often they are used.
  • The Enipedia Blog covers several more advanced examples of how SPARQL can be used with the data on Enipedia to perform various types of analysis about energy topics.
  • Data Mining OpenEI.org, The US Department of Energy's Semantic Wiki gives a step by step example of how this can be done with other sources such as http://OpenEI.org.
  • SparqlExtension - This is what we use on Enipedia to embed SPARQL queries on pages, and have their results shown in tables or various visualizations.

Data Sets

Open data from several sources are hosted on Enipedia as Linked Data.

Enipedia SPARQL examples

Find all facts about one specific power plant

For every page on the wiki, there are several ways in which you can find out what facts are associated with it. The key idea is that everything is structured as a network of URLs that you navigate with queries. For example, each fact about a single power plant is a triple: the URL of the plant (the subject), a property (the predicate), and a value (the object).


To see this for yourself, go to our SPARQL Endpoint and enter the text below (see results here). This can be done for any page to find the facts attached to it.

select * where {
<http://enipedia.tudelft.nl/wiki/Navajo_Powerplant> ?y ?z .
}
Example box listing all the facts that can be queried for the Navajo Generating Station

Another way to find out which information is available is to look at the factboxes that are included on many pages, an example of which is shown in the image on the right.

Find power plants by country

This query outputs the name, location and generation capacity of power plants in the Netherlands (see results here):

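PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX a: <http://enipedia.tudelft.nl/wiki/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?Name ?Point ?Generation_capacity where {
?powerPlant prop:Country a:Netherlands .
?powerPlant rdfs:label ?Name .
?powerPlant prop:Point ?Point .
?powerPlant prop:Generation_capacity-23W ?Generation_capacity .
}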

Note the "-23W" after "prop:Generation_capacity". The "W" indicates that the value is given in watts, and "-23" is the encoding of the hash symbol (#, ASCII code 0x23), so the property name corresponds to "Generation_capacity#W".
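
The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function name sparql_property is made up for this example, and it handles only the "#" character described here. Other special characters may be escaped differently on the wiki.

```python
def sparql_property(name: str, unit: str = "") -> str:
    """Return the prefixed SPARQL name for an Enipedia property.

    A property "Generation_capacity" with unit "W" corresponds to
    "Generation_capacity#W"; in the SPARQL name the "#" is written as "-23".
    """
    local = f"{name}#{unit}" if unit else name
    # "#" is ASCII 0x23, hence the "-23" escape described in the text.
    return "prop:" + local.replace("#", "-23")


print(sparql_property("Generation_capacity", "W"))
print(sparql_property("Country"))
```

Properties without a unit, such as prop:Country, are returned unchanged apart from the prefix.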

Exporting Data in Different File Formats

On our SPARQL Endpoint, query results can be retrieved in a variety of file formats, such as XML, JSON, plain text, CSV, TSV, etc.
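
Results can also be requested from a script. The sketch below uses only the Python standard library and follows the standard SPARQL protocol: the query goes in a "query" URL parameter and the desired format in an Accept header. The endpoint URL is the one given on this page. Whether the endpoint honours every MIME type listed is an assumption, and the helper names build_request and rows_from_json are made up for this example.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

ENDPOINT = "http://enipedia.tudelft.nl/sparql/sparql"

# Standard MIME types for SPARQL results; support on this endpoint is assumed.
FORMATS = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}


def build_request(query: str, fmt: str = "json",
                  endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a GET request for a SPARQL query."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def rows_from_json(text: str) -> List[Dict[str, str]]:
    """Flatten a SPARQL JSON result document into {variable: value} rows."""
    doc = json.loads(text)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


request = build_request("select distinct ?p where { ?s ?p ?o . } limit 10", "csv")
print(request.full_url)
# To actually run the query: urllib.request.urlopen(request).read()
```

The request is only built here, not sent, so the sketch can be tried without network access.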

How to explore SPARQL Endpoints

SPARQL Endpoints usually look like this or this, which is a bit intimidating and uninformative for first-time users of SPARQL. The good thing about SPARQL is that it can be used to quickly explore new datasets to get an idea of what is in there. A few examples are provided below, and an excellent extensive source is the list of SPARQL Queries For Statistics.

Find all distinct properties

select distinct ?p where {
?s ?p ?o .
}

Find number of values associated with each property

This query will take a long time to run, since it looks at the whole dataset.

SELECT ?p (count(?o) as ?objectCount) WHERE {
?s ?p ?o .
} group by ?p

Find the types of things described in the dataset

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT distinct ?o WHERE {
?s rdf:type ?o .
}

Properties for Power Plants

This describes the common properties in use for the power plants on Enipedia. See Category:Powerplant for a full list or Portal:Power Plants for an overview of the project behind this data.

Property page on wiki | Property usage in SPARQL | Notes
- | owl:sameAs | Links to the same entity in other databases
Property:Availability | prop:Availability | -
Property:Carbonemissions | prop:Carbonemissions-23kg | CO2 emissions in kg for 2007 (from Carma.org)
Property:Carbonemissions2000 | prop:Carbonemissions2000-23kg | CO2 emissions in kg for 2000 (from Carma.org)
Property:Carbonemissionsnextdecade | prop:Carbonemissionsnextdecade-23kg | CO2 emissions in kg for 2020 (from Carma.org)
Property:CarmaId | prop:CarmaId | Identifier linking to the data entry in Carma.org
Property:City | prop:City | -
Property:Cooling_method | prop:Cooling_method | -
Property:Country | prop:Country | -
Property:DBpedia_Page | prop:DBpedia_Page | Link to the corresponding entry in DBpedia
Property:EGRID_ID | prop:EGRID_ID | Link to the corresponding entry in the eGRID database
Property:Energyoutput | prop:Energyoutput-23J | Energy output in Joules for 2007, based on data from Carma.org
Property:Energyoutput2000 | prop:Energyoutput2000-23J | Energy output in Joules for 2000, based on data from Carma.org
Property:Energyoutputnextdecade | prop:Energyoutputnextdecade-23J | Energy output in Joules for 2020, based on data from Carma.org
Property:Fuel_type | prop:Fuel_type | Fuel in use, based on Category:Fuel
Property:Generation_capacity_thermal | prop:Generation_capacity_thermal-23W | Heat production in watts
Property:Generation_capacity | prop:Generation_capacity-23W | Electrical production in watts
Property:IAEA_Name | prop:IAEA_Name | Unique identifier for entries in the IAEA database on nuclear power plants
Property:Intensity | prop:Intensity-23kg | Kilograms of CO2 per MWh in 2007, based on data from Carma.org
Property:Intensity2000 | prop:Intensity2000-23kg | Kilograms of CO2 per MWh in 2000, based on data from Carma.org
Property:Intensitynextdecade | prop:Intensitynextdecade-23kg | Kilograms of CO2 per MWh in 2020, based on data from Carma.org
Property:Operating_efficiency | prop:Operating_efficiency | -
Property:Operator | prop:Operator | The company operating the power plant
Property:Ownercompany | prop:Ownercompany | The company owning the power plant
Property:Point | prop:Point | Geographic coordinates
Property:Power_plant_type | prop:Power_plant_type | -
Property:State | prop:State | The state or region within a country where the plant is located
Property:Wikipedia_page | prop:Wikipedia_page | Link to the plant's page on Wikipedia
Property:Year_built | prop:Year_built | Year in which the power plant was built; several values may be present
Property:Zipcode | prop:Zipcode | Zip code or postcode for the plant
- | rdf:type | Categories that a page has been tagged with; currently only Category:Powerplant
- | rdfs:label | Name of the plant

Advanced

Combining data from different endpoints

Enipedia uses two SPARQL endpoints. The main one is http://enipedia.tudelft.nl/sparql/sparql and is synchronized with data contained directly on the Semantic Wiki. The other endpoint is http://enipedia.tudelft.nl/extdata/sparql and contains various external data sets which we have converted to RDF. This endpoint contains data from eGRID, the E-PRTR, and the IAEA.

This example finds all power plants in Pennsylvania with links to the IAEA data, and then retrieves the IAEA data on electricity production per year.

PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX a: <http://enipedia.tudelft.nl/wiki/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX iaea: <http://enipedia.tudelft.nl/data/IAEA/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name ?plant ?iaeaName ?elecProduction ?year where {
    service <http://enipedia.tudelft.nl/sparql/sparql> { 
        ?plant prop:State a:Pennsylvania . 
        ?plant prop:IAEA_Name ?iaeaName . 
    }
    service <http://enipedia.tudelft.nl/extdata/sparql> { 
        ?installation rdfs:label ?name .
        filter(?name = ?iaeaName) . 
        ?installation iaea:energyGWeh ?energyProdData . 
        ?energyProdData iaea:amount ?elecProduction . 
        ?energyProdData iaea:year ?year . 
    } 
} order by ?name ?year

Power plant CO2 intensity versus average year its generators were built

This is a rough investigation into whether power plants with older generators emit more CO2 per unit of power output. The query must be run at http://enipedia.tudelft.nl/extdata/; otherwise, it must be adapted so that its patterns are wrapped in a SERVICE clause that names the endpoint.

The eGRID data reports the amount of CO2 emissions per MWh for each power plant. It also has information about the generators in use at each plant. This query links these two sets of data to look for a connection. No data is available on the direct CO2 emissions from each generator.

PREFIX cat: <http://enipedia.tudelft.nl/wiki/Category:>
PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX a: <http://enipedia.tudelft.nl/wiki/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX egridprop: <http://enipedia.tudelft.nl/data/eGRID/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?plant ?emissionsRate ?avgOnlineYear where {
   ?plant egridprop:Annual_Combustion_Output_Emission_Rate ?emissions . 
   ?emissions rdfs:label "CO2" . 
   ?emissions egridprop:Year ?year . 
   ?emissions egridprop:Amount ?emissionsRate . 
   filter(?year = 2007) . 
   {
      select ?plant (AVG(?generatorYearOnline) as ?avgOnlineYear) where {
         ?plant egridprop:Generator ?generator . 
         ?generator egridprop:Year_Online ?generatorYearOnline . 
      } group by ?plant
   }
} limit 100
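
If the query is run from a different endpoint, its patterns need to be wrapped in a SERVICE clause, as in the federated example earlier on this page. The Python sketch below shows one way to generate such a wrapper. The helper name in_service is made up for this example, and the default URL is the external-data endpoint listed on this page.

```python
import textwrap

# External-data endpoint as listed on this page.
EXTDATA = "http://enipedia.tudelft.nl/extdata/sparql"


def in_service(pattern: str, endpoint: str = EXTDATA) -> str:
    """Wrap a graph pattern in a SERVICE clause for the given endpoint."""
    body = textwrap.indent(textwrap.dedent(pattern).strip(), "    ")
    return f"service <{endpoint}> {{\n{body}\n}}"


pattern = """
    ?plant egridprop:Generator ?generator .
    ?generator egridprop:Year_Online ?generatorYearOnline .
"""
print(in_service(pattern))
```

The resulting text goes inside the where { ... } block of the outer query, and the PREFIX declarations stay at the top of the query.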

Locate identifiers pointing to other databases

This will give you the CARMA identifier, and possible links to Wikipedia, the EU-ETS, eGRID, and the IAEA.

PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
select * where {
?x prop:CarmaId ?carmaID .
OPTIONAL{?x owl:sameAs ?sameAs } . 
OPTIONAL{?x prop:EU_ETS_ID ?EU_ETS_ID } . 
OPTIONAL{?x prop:EGRID_ID ?eGRID_ID } . 
OPTIONAL{?x prop:IAEA_Name ?IAEA_Name } . 
} limit 10

What fuels are in eGRID?

Run this at http://enipedia.tudelft.nl/extdata

PREFIX egridprop: <http://enipedia.tudelft.nl/data/eGRID/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?fuel where {
?x egridprop:Annual_Net_Generation_By_Fuel ?annualNetGeneration . 
?annualNetGeneration rdfs:label ?fuel . 
}

Overview of renewable energy production in California in 2007

Run this at http://enipedia.tudelft.nl/extdata

PREFIX egridprop: <http://enipedia.tudelft.nl/data/eGRID/prop/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?x ?fuel ?amount where {

	#get me everything in California
	?x egridprop:State_abbreviation ?stateAbbrev . 
	FILTER(?stateAbbrev = "CA") . 

	#which has a type of renewable fuel
	?x egridprop:Annual_Net_Generation_By_Fuel ?annualNetGeneration . 
	?annualNetGeneration rdfs:label ?fuel . 
	FILTER(?fuel = "Hydro" || ?fuel = "Geothermal" || ?fuel = "Solar" || ?fuel = "Wind" || ?fuel = "Biomass") . 

	#only get data for 2007
	?annualNetGeneration egridprop:Year ?year . 
	FILTER(?year = 2007) . 

	#get the amount of power generated for that year
	?annualNetGeneration egridprop:Amount ?amount . 

#sort the results from largest to smallest in terms of power output
} order by DESC(?amount)

Get distinct properties for eGRID power plants

Run this at http://enipedia.tudelft.nl/extdata

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX egrid: <http://enipedia.tudelft.nl/data/eGRID/>
select distinct ?y {
?x rdf:type egrid:Plant . 
?x ?y ?z . 
}


Get CO2 emissions, power output, and emissions intensity for all power plants in Chile

The query can be run here, or the results can be downloaded as CSV.

PREFIX a: <http://enipedia.tudelft.nl/wiki/>
PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?plant ?name 

#For 2000 get co2 in kg, elec output in MWh, kg co2/MWh intensity
?CO2_kg_2000 ((?energy_2000_J / 3.6e9) as ?energy_2000_MWh) ?intensity_2000

#For 2007 get co2 in kg, elec output in MWh, kg co2/MWh intensity
?CO2_kg_2007 ((?energy_2007_J / 3.6e9) as ?energy_2007_MWh) ?intensity_2007

#For 2020 ("next decade") get co2 in kg, elec output in MWh, kg co2/MWh intensity
?CO2_kg_2020 ((?energy_2020_J / 3.6e9) as ?energy_2020_MWh) ?intensity_2020 

where {

#get me power plants in Chile
?plant prop:Country a:Chile . 
#get the name
?plant rdfs:label ?name . 

#CO2 emissions in kg
?plant prop:Carbonemissions-23kg ?CO2_kg_2007 . 
?plant prop:Carbonemissions2000-23kg ?CO2_kg_2000 . 
?plant prop:Carbonemissionsnextdecade-23kg  ?CO2_kg_2020 . 

#energy output in Joules (converted to MWh above)
?plant prop:Energyoutput-23J ?energy_2007_J . 
?plant prop:Energyoutput2000-23J ?energy_2000_J . 
?plant prop:Energyoutputnextdecade-23J ?energy_2020_J . 

#kg CO2 per MWh
?plant prop:Intensity-23kg ?intensity_2007 . 
?plant prop:Intensity2000-23kg ?intensity_2000 . 
?plant prop:Intensitynextdecade-23kg ?intensity_2020 . 
}

The same can be found via Special:Ask by filling in the values below:

Query: [[Country::Chile]]

Additional data to display:
?Carbonemissions2000
?Energyoutput2000
?Intensity2000
?Carbonemissions
?Energyoutput
?Intensity
?Carbonemissionsnextdecade
?Energyoutputnextdecade
?Intensitynextdecade
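
The SPARQL query for Chile divides the energy output in Joules by 3.6e9 to get MWh, since 1 MWh = 1e6 W × 3600 s = 3.6e9 J. The Python sketch below repeats that conversion and recomputes an intensity in kg CO2 per MWh, which could serve as a rough sanity check on downloaded results. The function names and the numbers are made up for illustration, and it is an assumption that the stored intensity values equal emissions divided by output.

```python
JOULES_PER_MWH = 3.6e9  # 1 MWh = 1e6 W * 3600 s


def joules_to_mwh(joules: float) -> float:
    """Convert an energy amount from Joules to megawatt-hours."""
    return joules / JOULES_PER_MWH


def intensity_kg_per_mwh(co2_kg: float, energy_joules: float) -> float:
    """Recompute emissions intensity as kg CO2 per MWh of output."""
    return co2_kg / joules_to_mwh(energy_joules)


# Made-up numbers for illustration, not Enipedia data.
energy_j = 7.2e15
co2_kg = 1.0e9
print(joules_to_mwh(energy_j))
print(intensity_kg_per_mwh(co2_kg, energy_j))
```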
