Talk:Using SPARQL with Enipedia

From Enipedia
Jump to: navigation, search

We've tried to cover several examples of how to use SPARQL on Enipedia. Feel free to post here any further questions. --ChrisDavis 13:50, 13 February 2012 (CET)

  • Hi Chris, could you show me how to add the powerplant's enipedia URL to this query? Thanks.--Martin (talk) 23:23, 26 March 2014 (CET)
select ?Name ?Latitude ?Longitude ?city ?state ?country ?Fuel_Type ?Capacity_MW ?Status ?CARMA_future_estimated_annual_Power_output_MWh ?CARMA_future_estimated_annual_CO2_emmissions_kg {
?powerPlant prop:Continent a:Africa .
?powerPlant rdfs:label ?Name .
OPTIONAL{?powerPlant prop:Latitude ?Latitude } .
OPTIONAL{?powerPlant prop:Longitude ?Longitude } .
OPTIONAL{?powerPlant prop:City ?City .
?City rdfs:label ?city } .
OPTIONAL{?powerPlant prop:State ?State .
?State rdfs:label ?state } .
{?powerPlant prop:Country ?Country .
?Country rdfs:label ?country } .
OPTIONAL{?powerPlant prop:Generation_capacity_electrical_MW ?Capacity_MW } .
OPTIONAL{?powerPlant prop:Primary_fuel_type ?Primary_fuel_type . 
?Primary_fuel_type rdfs:label ?Fuel_Type } .
OPTIONAL{?powerPlant prop:Status ?Status } . 
OPTIONAL{?powerPlant prop:Annual_Energyoutputnextdecade_MWh ?CARMA_future_estimated_annual_Power_output_MWh } .
OPTIONAL{?powerPlant prop:Annual_Carbonemissionsnextdecade_kg ?CARMA_future_estimated_annual_CO2_emmissions_kg } .
} order by ?Country ?Capacity_MW ?Name
  • If you change
select ?Name ...

to:

select ?powerPlant ?Name ...

You'll see it. Also, if you replace the entire first line with:

select * where { ...

This will show you the values of all the variables specified in the various query statements. --ChrisDavis (talk) 23:37, 26 March 2014 (CET)

  • Thats good to know, but when I use
select * where { ...

things like country and fuel type get returned as web URLs. Is there an easy way round that?

"select *" gets you everything, so to only get the text and not the URLs, you need to specify the variables you want to have returned (i.e. "select ?powerPlant ?country ?city"), similar to what's in the first line of the query at the top of the page. Also, the variable names are case-sensitive, so ?Country is not the same as ?country. In the statement:

?Country rdfs:label ?country

this indicates that ?Country is the URL, while ?country is the text value, or label for that URL. --ChrisDavis (talk) 10:59, 28 March 2014 (CET)

    • Actually there seems to be a bug in the way Virtuoso does the globbing of * since both Country and country display as URLs while the label displays OK if it differs by more than just casing from its object.Nono (talk) 22:24, 28 March 2014 (CET)
  • For Martin, Virtuoso allows a slightly lighter syntax that can come handy in some occasions. For example your query can be written as
select ?powerPlant+>rdfs:label as ?Name ?powerPlant*>prop:Latitude as ?Latitude ?powerPlant*>prop:Longitude as ?Longitude ?City*>rdfs:label as ?city
?State*>rdfs:label as ?state ?Country+>rdfs:label as ?country ?Primary_fuel_type*>rdfs:label as ?Fuel_Type
?powerPlant*>prop:Generation_capacity_electrical_MW as ?Capacity_MW ?powerPlant*>prop:Status as ?Status
?powerPlant*>prop:Annual_Energyoutputnextdecade_MWh as ?CARMA_future_estimated_annual_Power_output_MWh
?powerPlant*>prop:Annual_Carbonemissionsnextdecade_kg as ?CARMA_future_estimated_annual_CO2_emmissions_kg {
?powerPlant prop:Continent a:Africa .
OPTIONAL{?powerPlant prop:City ?City  } .
OPTIONAL{?powerPlant prop:State ?State  } .
?powerPlant prop:Country ?Country .
OPTIONAL{?powerPlant prop:Primary_fuel_type ?Primary_fuel_type } .
} order by ?Country ?Capacity_MW ?Name

This way, your query stays the same and you can just choose in your select statement if you prefer to display objects as URLs or labels. Nono (talk) 22:24, 28 March 2014 (CET)

  • Thanks Nono. Another wierd bug I came across can be seen in this query:
select ?Name ?country ?Longitude {
?powerPlant prop:Continent a:Africa .
?powerPlant rdfs:label ?Name .
{?powerPlant prop:Country ?Country .
?Country rdfs:label ?country } .
?powerPlant prop:Longitude ?Longitude .
} order by ?country ?Name

where some powerplant names get repeated several times. I included the Longitude so you can see that separate plants are being given the same name, rather than all the plant's data being repeated on several rows. If the longitude is removed, the bug remains.

Ive also just noticed that this bug is present in all the examples on this page. It seems to happen when the location of a plant is quite similar or identical to the plant in the row below. --Martin (talk) 19:09, 29 March 2014 (CET)

    • Yes this is a bug linked I think to the implementation of sorting by Virtuoso (it will disappear in your example if you don't sort by name). This was spotted in many queries across the site after the last Virtuoso upgrade by Chris. I don't know whether it would be fixed by a later version. For the record, it's been reported here. Nono (talk) 21:33, 29 March 2014 (CET)
  • I am trying to make a query that returns the:

Enipedia URL Name

Latitude

Longitude

Country

Capacity

Fuel type

Wikipedia url

Wikimapia url

Open Street Map url

IndustryAbout url

References

Notes

For all plants that have capacity, fuel type and a reference entered.

I got most of the way through it and it was working ok:

select ?powerPlant as ?Enipedia_URL ?Name ?Lat ?Long ?Country*>rdfs:label as ?Country ?Primary_fuel_type*>rdfs:label as ?Primary_Fuel ?Generation_capacity ?Wikipedia_URL ?Wikimapia_URL ?OSM_URL ?IndustryAbout_URL where {
?powerPlant rdfs:label ?Name .
?powerPlant prop:Latitude ?Lat .
?powerPlant prop:Longitude ?Long .
?powerPlant prop:Country ?Country .
?powerPlant prop:Primary_fuel_type ?Primary_fuel_type .
?powerPlant prop:Generation_capacity_electrical_MW ?Generation_capacity . 
OPTIONAL{?powerPlant prop:Wikipedia_page ?Wikipedia_URL } .
OPTIONAL{?powerPlant prop:Wikimapia_link ?Wikimapia_URL } .
OPTIONAL{?powerPlant prop:OpenStreetMap_link ?OSM_URL } .
OPTIONAL{?powerPlant prop:Industry_About_link ?IndustryAbout_URL } .
} order by ?Country

but when it came to adding the References and notes at the end it stopped working.

select ?powerPlant as ?Enipedia_URL ?Name ?Lat ?Long ?Country*>rdfs:label as ?Country ?Primary_fuel_type*>rdfs:label as ?Primary_Fuel ?Generation_capacity ?Wikipedia_URL ?Wikimapia_URL ?OSM_URL ?IndustryAbout_URL ?Reference ?Notes where {
?powerPlant rdfs:label ?Name .
?powerPlant prop:Latitude ?Lat .
?powerPlant prop:Longitude ?Long .
?powerPlant prop:Country ?Country .
?powerPlant prop:Primary_fuel_type ?Primary_fuel_type .
?powerPlant prop:Generation_capacity_electrical_MW ?Generation_capacity . 
OPTIONAL{?powerPlant prop:Wikipedia_page ?Wikipedia_URL } .
OPTIONAL{?powerPlant prop:Wikimapia_link ?Wikimapia_URL } .
OPTIONAL{?powerPlant prop:OpenStreetMap_link ?OSM_URL } .
OPTIONAL{?powerPlant prop:Industry_About_link ?IndustryAbout_URL } .
?refNotesObj prop:Reference ?Reference . 
OPTIONAL{?refNotesObj prop:Notes ?Notes} . 
} order by ?Country

Could someone point out whats going wrong?. Thanks --Martin (talk) 17:27, 8 April 2014 (CEST)

The issue is that there wasn't a connection between ?refNotesObj and ?powerPlant. The references and notes are stored in what's called a Semantic Internal Object (also referred to as a blank node in RDF) in order to ensure that the reference and accompanying note are coupled together into one logical unit. Note that if you have multiple references for a power plant, then you will get (mostly) duplicated rows for that plant due to the way that sparql returns results. --ChrisDavis (talk) 20:08, 8 April 2014 (CEST)

select ?powerPlant as ?Enipedia_URL ?Name ?Lat ?Long ?Country*>rdfs:label as ?Country ?Primary_fuel_type*>rdfs:label as ?Primary_Fuel ?Generation_capacity ?Wikipedia_URL ?Wikimapia_URL ?OSM_URL ?IndustryAbout_URL ?Reference ?Notes where {
?powerPlant rdfs:label ?Name .
?powerPlant prop:Latitude ?Lat .
?powerPlant prop:Longitude ?Long .
?powerPlant prop:Country ?Country .
?powerPlant prop:Primary_fuel_type ?Primary_fuel_type .
?powerPlant prop:Generation_capacity_electrical_MW ?Generation_capacity . 
OPTIONAL{?powerPlant prop:Wikipedia_page ?Wikipedia_URL } .
OPTIONAL{?powerPlant prop:Wikimapia_link ?Wikimapia_URL } .
OPTIONAL{?powerPlant prop:OpenStreetMap_link ?OSM_URL } .
OPTIONAL{?powerPlant prop:Industry_About_link ?IndustryAbout_URL } .
OPTIONAL{?refNotesObj prop:Is_reference_and_notes_of ?powerPlant .
?refNotesObj prop:Reference ?Reference . 
?refNotesObj prop:Notes ?Notes} . 
} order by ?Country

[edit] How to get data from Enipedia for further processing in R?

I have tried the following code:

library(SPARQL)

endpoint <- "http://enipedia.tudelft.nl/wiki/"

prefixes <- c( 'BASE',     '<http://enipedia.tudelft.nl/wiki/>',
              'article',  '<http://enipedia.tudelft.nl/wiki/>',
              'a',        '<http://enipedia.tudelft.nl/wiki/>',
              'property', '<http://enipedia.tudelft.nl/wiki/Property:>',
              'prop',     '<http://enipedia.tudelft.nl/wiki/Property:>',
              'category', '<http://enipedia.tudelft.nl/wiki/Category:>',
              'rdfs',     '<http://www.w3.org/2000/01/rdf-schema#>',
              'rdf',      '<http://www.w3.org/1999/02/22-rdf-syntax-ns#>',
              'fn',       '<http://www.w3.org/2005/xpath-functions#>',
              'afn',      '<http://jena.hpl.hp.com/ARQ/function#>')
# colnames(prefixes) <- c( 'prefix', 'URI' )

test_query <- "
select distinct ?p where {
?s ?p ?o .
}
"

data <- SPARQL(url = endpoint, query = test_query, ns = prefixes)
print(data$results)

and all I get is:

data frame with 0 columns and 0 rows

What am I doing wrong here? How you you accessing Enipedia in R? I am using Mac OS X 10.9 Mavericks and R 3.1.1.
Ovvldc (talk) 12:26, 6 August 2014 (CEST)

Addendum: I found Comparing Enipedia with Other Datasets. Using the script as a template, I modified as:

endpoint <- "http://enipedia.tudelft.nl/sparql"

data <- SPARQL(url = endpoint, query = test_query, ns = prefixes, format = 'csv', extra = list( format = 'text/csv' ))
print(data$results)

and this results in an error that seems to be on your side:

  Virtuoso.37000.Error.SP030..SPARQL.compiler line.0..Bad.character..Â...0xc2..in.SPARQL.expression.at..Â.
1                               SPARQL query:                                                           NA
2                 define sql:big-data-const 0                                                           NA
3                     #output-format:text/csv                                                           NA
4                 select distinct ?p where {                                                           NA
5                                ?s ?p ?o .                                                           NA
6                                           }                                                           NA

Ovvldc (talk) 12:26, 6 August 2014 (CEST)

Never mind, it turned out to be the  characters in the query, which are spaces that the text in the SPARQL example contains. Now I get my output.
Ovvldc (talk) 12:29, 6 August 2014 (CEST)

I'm glad to see that it's working now. Let me know if you run into further issues. Also, aside from the data you see on the wiki, there are a few other data sets on the sparql endpoint which can be queried as named graphs. The current documentation on these is at Category:External_Data, and if you have any feedback, I'll do what I can to improve it. --ChrisDavis (talk) 16:36, 6 August 2014 (CEST)

I understood that much of this external data is included in Enipedia, though I am not sure how you preserve people's edits in between imports of the external data.
I did notice that reconciliation is not yet perfect, as there seems to be a double entry for the Korce Powerplant in Albania: there is one that uses Brown Coal and one that uses Lignite, and all the other properties are the same, including CarmaId. How do I remove the Lignite plant? Or is that coming from Carma and I report it there?
Ovvldc (talk) 10:23, 7 August 2014 (CEST)
Update: actually, there are quite a few more that share the same CarmaId and everything else, except that one uses Brown Coal and the other Lignite. Maybe add another rule to the automated cleanup? Here is my list:

                   plant_name                country              city                         owner         fuel_used CarmaId
21            Korce Powerplant                Albania             Korce  Korp Elektroenerg Shqipetare        Brown Coal   22827
22            Korce Powerplant                Albania             Korce  Korp Elektroenerg Shqipetare           Lignite   22827
35           Tirane Powerplant                Albania            Tirane  Korp Elektroenerg Shqipetare        Brown Coal   45524
36           Tirane Powerplant                Albania            Tirane  Korp Elektroenerg Shqipetare           Lignite   45524
348      Klagenfurt Powerplant                Austria        Klagenfurt         Stadtwerke Klagenfurt        Brown Coal   22402
349      Klagenfurt Powerplant                Austria        Klagenfurt         Stadtwerke Klagenfurt         Hard Coal   22402
350      Klagenfurt Powerplant                Austria        Klagenfurt         Stadtwerke Klagenfurt    Heavy Fuel Oil   22402
351      Klagenfurt Powerplant                Austria        Klagenfurt         Stadtwerke Klagenfurt           Lignite   22402
352      Klagenfurt Powerplant                Austria        Klagenfurt         Stadtwerke Klagenfurt Residual Fuel Oil   22402
1336          Gacko Powerplant Bosnia and Herzegovina             Gacko    Holding Elektroprivreda Rs        Brown Coal   14432
1337          Gacko Powerplant Bosnia and Herzegovina             Gacko    Holding Elektroprivreda Rs           Lignite   14432
1397          Tuzla Powerplant Bosnia and Herzegovina        Tuzla City        Jp Elektroprivreda Bih        Brown Coal   46702
1398          Tuzla Powerplant Bosnia and Herzegovina        Tuzla City        Jp Elektroprivreda Bih           Lignite   46702
1427      Bobov Dol Powerplant               Bulgaria          Dupnitza Bobov Dol Thermal Power Plant        Brown Coal    5022
1428      Bobov Dol Powerplant               Bulgaria          Dupnitza Bobov Dol Thermal Power Plant           Lignite    5022
1726       Komorany Powerplant         Czech Republic              Most  United Energy Pravi Nastupce        Brown Coal   22725
1727       Komorany Powerplant         Czech Republic              Most  United Energy Pravi Nastupce           Lignite   22725
1750       Melnik-1 Powerplant         Czech Republic     Horni Pocaply                Energotrans As        Brown Coal   27996
1751       Melnik-1 Powerplant         Czech Republic     Horni Pocaply                Energotrans As           Lignite   27996
10220          Ajka Powerplant                Hungary              Ajka              Bakonyi Eromu Rt        Brown Coal     570
10221          Ajka Powerplant                Hungary              Ajka              Bakonyi Eromu Rt           Lignite     570
10228        Borsod Powerplant                Hungary     Kazincbarcika   Aes Borsodi Energetikai Kft        Brown Coal    5344
10229        Borsod Powerplant                Hungary     Kazincbarcika   Aes Borsodi Energetikai Kft           Lignite    5344
10230        Borsod Powerplant                Hungary     Kazincbarcika   Aes Borsodi Energetikai Kft       Natural Gas    5344
10242         Dorog Powerplant                Hungary             Dorog  Dorog-esztergom Region Eromu        Brown Coal   11186
10243         Dorog Powerplant                Hungary             Dorog  Dorog-esztergom Region Eromu           Lignite   11186
10280         Komlo Powerplant                Hungary             Komlo      Pannon Power Holding Zrt        Brown Coal   22723
10281         Komlo Powerplant                Hungary             Komlo      Pannon Power Holding Zrt           Lignite   22723
10302     Oroszlany Powerplant                Hungary         Oroszlany              Vertesi Eromu Rt        Brown Coal   32805
10303     Oroszlany Powerplant                Hungary         Oroszlany              Vertesi Eromu Rt           Lignite   32805
10328     Tatabanya Powerplant                Hungary        Tatabanyai           Tatabanya Eromu Kft        Brown Coal   44703
10329     Tatabanya Powerplant                Hungary        Tatabanyai           Tatabanya Eromu Kft           Lignite   44703
10335 Tiszapalkonya Powerplant                Hungary      Tiszaujvaros   Aes Borsodi Energetikai Kft        Brown Coal   45552
10336 Tiszapalkonya Powerplant                Hungary      Tiszaujvaros   Aes Borsodi Energetikai Kft           Lignite   45552
14631        Adamow Powerplant                 Poland             Turek  Zes Elek Patnow-adamow-konin        Brown Coal     242
14632        Adamow Powerplant                 Poland             Turek  Zes Elek Patnow-adamow-konin           Lignite     242
14918        Patnow Powerplant                 Poland             Konin  Zes Elek Patnow-adamow-konin        Brown Coal   33864
14919        Patnow Powerplant                 Poland             Konin  Zes Elek Patnow-adamow-konin           Lignite   33864
15651        Brasov Powerplant                Romania            Brasov          Sc Termoelectrica Sa        Brown Coal    5624
15652        Brasov Powerplant                Romania            Brasov          Sc Termoelectrica Sa           Lignite    5624
15781       Giurgiu Powerplant                Romania           Giurgiu          Sc Termoelectrica Sa        Brown Coal   15186
15782       Giurgiu Powerplant                Romania           Giurgiu          Sc Termoelectrica Sa           Lignite   15186
15786        Govora Powerplant                Romania           Ramnicu          Sc Termoelectrica Sa        Brown Coal   15610
15787        Govora Powerplant                Romania           Ramnicu          Sc Termoelectrica Sa           Lignite   15610
15807        Iasi-2 Powerplant                Romania           Holboca                Sc Cet Iasi Sa        Brown Coal   19000
15808        Iasi-2 Powerplant                Romania           Holboca                Sc Cet Iasi Sa           Lignite   19000
15996       Suceava Powerplant                Romania           Suceava          Sc Termoelectrica Sa        Brown Coal   43537
15997       Suceava Powerplant                Romania           Suceava          Sc Termoelectrica Sa           Lignite   43537
16350       Sostanj Powerplant               Slovenia           Sostanj   Termoelektrarna Sostanj Doo        Brown Coal   42445
16351       Sostanj Powerplant               Slovenia           Sostanj   Termoelektrarna Sostanj Doo           Lignite   42445
20211      Cayirhan Powerplant                 Turkey Cayirhan-nallihan          Park Termik Elektrik        Brown Coal    7484
20212      Cayirhan Powerplant                 Turkey Cayirhan-nallihan          Park Termik Elektrik           Lignite    7484
20652       Yenikoy Powerplant                 Turkey             Milas    Yenikoy Elektrik Uretim As        Brown Coal   50660
20653       Yenikoy Powerplant                 Turkey             Milas    Yenikoy Elektrik Uretim As           Lignite   50660

Ovvldc (talk) 10:44, 7 August 2014 (CEST)
Updated again: Ah, I get it now, I get a SPARQL result for every power plant for every fuel. But Enipedia doesn't have a fuel category for Lignite, as it says that is the same as Brown Coal. So those should be consolidated for internal consistency purposes...
Ovvldc (talk) 11:14, 7 August 2014 (CEST)

If you run a query on the named graph of http://enipedia.tudelft.nl/wiki/, you'll get the data that can be edited directly on the wiki. The other named graphs (EU-ETS, EPRTR, etc) are all read-only. The main reason for this is that we expect for these external data sets to be updated, and it's a bit of work to keep straight edits that people may have made and how they may contrast with updates that are made in new releases of the data sets. Some of these data sets have complex data structures that are difficult to represent as wiki pages.
Reconciliation is still a big issue and opportunity for improvement. One thing you'll notice with the named graphs is that we have versions of Carma v2 & v3, which allows us to compare exactly what they've published with what is currently on Enipedia. My experience is that many data sets are somewhat wrong in their own ways (see example documented here), and we need to have a way to preserve the original information, and link together these sources of sometimes conflicting information.
The Brown Coal/Lignite issue is something that I need to run a bot to fix. You might want to check out some of the work that we've done at https://github.com/cbdavis/EnipediaDataQualityBot. This uses R code to check for various issues, which it then either automatically fixes, or it writes information to wiki pages so that the issues can be reviewed. I'm open to ideas on how this can be improved. --ChrisDavis (talk) 23:11, 7 August 2014 (CEST)

[edit] SPARQL and Python

Hi Chris,

May I know, how to use SPARQL query in python to obtain data from enipedia. If you have any tutorials or notes for learning the same will also suffice.

Best, Vinayak Narwade

The example below should get you started. Just make sure to install pandas and SPARQLWrapper. There's also an example at https://www.pfenninger.org/posts/mapping-the-worlds-nuclear-power-plants, but I don't know if "import sparql" in their code will work - that particular library doesn't seem to be maintained anymore.
from SPARQLWrapper import SPARQLWrapper, JSON
from pandas.io.json import json_normalize

sparql = SPARQLWrapper("http://enipedia.tudelft.nl/sparql")
sparql.setQuery("""
    BASE <http://enipedia.tudelft.nl/wiki/>
    PREFIX a: <http://enipedia.tudelft.nl/wiki/>
    PREFIX prop: <http://enipedia.tudelft.nl/wiki/Property:>
    PREFIX cat: <http://enipedia.tudelft.nl/wiki/Category:>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    select * where {
        ?x rdf:type cat:Powerplant .
    } limit 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

df = json_normalize(results['results']['bindings'])
--ChrisDavis (talk) 09:47, 24 January 2017 (CET)
Personal tools
Namespaces

Variants
Actions
Navigation
Portals
Advanced
Toolbox