Comparing Enipedia with Other Datasets

From Enipedia

Tools such as the Enipedia Power Plant Dataset Reconciliation API allow us to match entities on Enipedia with data in other data sets, but it is also valuable to compare values across data sets in order to spot errors. Projects such as Wikipedia and OpenStreetMap contain a lot of data about the power industry, and we would like to be alerted to gaps and inconsistencies as they emerge. The work below describes some of our initial efforts.

Enipedia and Wikipedia

Comparing Enipedia and Wikipedia Power Plant Coordinates is an initial proof of concept generated by the R code below, which uses the RSemanticMediaWikiBot. The code doesn't run on a schedule (yet). It would be nice to set this up as a JavaScript interface that uses the sfautoedit API, so that people could just click on the correct value without having to edit the wiki page themselves.

Ideally, edits would be attributed to the user already logged into the wiki, or, if nobody is logged in, the IP address would be recorded. Tests with just a link on the wiki to the API call achieve this. Doing the same through a JavaScript interface might be a bit more complicated. A possible solution might be PHP code that accesses the relevant session data, and the session cookies could probably be accessed as well. It's also possible to call API:Login via JavaScript, although one concern is that someone could modify the code to send user credentials to another site instead of the Enipedia API.

options(stringsAsFactors = FALSE)

library(SPARQL) #talk to Enipedia
library(geosphere) #distance between sets of coordinates
library(sqldf) #easy sorting, filtering by distance
#sqldf should only work with sqlite, not postgres, so unload RPostgreSQL if it is attached
if ("package:RPostgreSQL" %in% search()) detach("package:RPostgreSQL", unload=TRUE)

#read.csv (via download.file) doesn't handle https, so the http URL is used; see
#http://r.789695.n4.nabble.com/changing-https-to-http-when-using-download-file-any-side-effects-or-just-use-RCurl-td868675.html
wikipediaData = read.csv("http://api.scraperwiki.com/api/1.0/datastore/sqlite?format=csv&name=wikipedia_power_plants&query=select+*+from+`swdata`&apikey=")

wikipediaData$wikipediaPage = gsub("http://dbpedia.org/resource/", "http://en.wikipedia.org/wiki/", wikipediaData$plant)

endpoint = "http://enipedia.tudelft.nl/sparql"
#no PREFIX declarations are given, so this assumes the endpoint predefines prop:, cat: and rdf:
queryString = "select * where {
                        ?eni_plant prop:Wikipedia_page ?wikipediaPage . 
                        ?eni_plant rdf:type cat:Powerplant . 
                        ?eni_plant prop:Latitude ?eni_lat . 
                        ?eni_plant prop:Longitude ?eni_lon . 
                        }"
d <- SPARQL(url=endpoint, query=queryString, format='csv', extra=list(format='text/csv'))
enipediaLinksToWikipedia = d$results

#TODO: check that URLs with percent-encoded characters, such as http://en.wikipedia.org/wiki/J%C3%A4nschwalde_Power_Station, are matched by the merge below

mergedData = merge(enipediaLinksToWikipedia, wikipediaData, by='wikipediaPage')

#find out how close the coordinates are between the same entries in different data sets
mergedData$distanceBetweenPoints = distCosine(cbind(mergedData$eni_lon, mergedData$eni_lat), 
                                              cbind(mergedData$longitude, mergedData$latitude))

#It would be nice to be able to check off which of these are correct - allow for some sort of easy user feedback
#provide a link to Google Maps where the coords could be verified
#number of plants whose two coordinate sets are more than 1000 m apart:
#length(which(mergedData$distanceBetweenPoints > 1000))

dataToCompare = sqldf("SELECT eni_plant, wikipediaPage, eni_lat, eni_lon, latitude, longitude, distanceBetweenPoints FROM mergedData WHERE distanceBetweenPoints > 1000 ORDER BY distanceBetweenPoints DESC")

#this should really be something with javascript that works with the sfautoedit api call
#for now, write out the data in a wikitable format

#convert from meters to kilometers
dataToCompare$distanceBetweenPoints = dataToCompare$distanceBetweenPoints/1000

dataToCompare$eni_plant = gsub("_", " ", gsub("http://enipedia.tudelft.nl/wiki/", "", dataToCompare$eni_plant))

header = "{| class=\"wikitable sortable\"\n! Distance (km) !! Enipedia Page !! Enipedia Coords !! Wikipedia Page !! Wikipedia Coords\n|-\n"
body = paste("| ", round(dataToCompare$distanceBetweenPoints, digits=1), " || ",
             "[[", dataToCompare$eni_plant, "]] || ",
             "[https://maps.google.com/maps?q=", dataToCompare$eni_lat, ",", dataToCompare$eni_lon, " ", dataToCompare$eni_lat, ", ", dataToCompare$eni_lon, "] || ", 
             "[", dataToCompare$wikipediaPage, " ", gsub("_", " ", gsub("http://en.wikipedia.org/wiki/", "", dataToCompare$wikipediaPage)), "] || ", 
             "[https://maps.google.com/maps?q=", dataToCompare$latitude, ",", dataToCompare$longitude, " ", dataToCompare$latitude, ", ", dataToCompare$longitude, "]",
             "\n|-\n", 
             collapse="", sep="")
footer = "|}"

wikiTable = paste(header, body, footer, sep="") #named wikiTable to avoid masking base::table

setwd("/home/cbdavis/RSemanticMediaWikiBot")
source("bot.R")
apiURL = "http://enipedia.tudelft.nl/enipedia/api.php"
bot = initializeBot(apiURL)
#username and password must be defined before this point; they are not set in this script
login(username, password, bot)
edit(title="Comparing Enipedia and Wikipedia Power Plant Coordinates", text=wikiTable, bot, summary="table comparing enipedia data to wikipedia")
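
If the comparison were moved into a JavaScript interface, the distance check from the R script would need a client-side equivalent. The sketch below uses the spherical law of cosines, which is the method behind distCosine. The function names, the object shape with lat and lon fields, and the Earth radius constant are assumptions made for illustration (the radius is chosen to match what is believed to be the geosphere default), so results may differ slightly from the R output.

```javascript
// Earth radius in meters; an assumption, chosen to match what is
// believed to be the default used by geosphere's distCosine.
const EARTH_RADIUS_M = 6378137;

function toRadians(deg) {
  return deg * Math.PI / 180;
}

// Great-circle distance in meters between two points given in decimal degrees,
// computed with the spherical law of cosines.
function cosineDistance(lat1, lon1, lat2, lon2) {
  const phi1 = toRadians(lat1);
  const phi2 = toRadians(lat2);
  const dLon = toRadians(lon2 - lon1);
  const c = Math.sin(phi1) * Math.sin(phi2) +
            Math.cos(phi1) * Math.cos(phi2) * Math.cos(dLon);
  // Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
  return Math.acos(Math.min(1, Math.max(-1, c))) * EARTH_RADIUS_M;
}

// Hypothetical helper mirroring the 1000 m filter used in the R script.
function needsReview(eni, wiki, thresholdMeters = 1000) {
  return cosineDistance(eni.lat, eni.lon, wiki.lat, wiki.lon) > thresholdMeters;
}
```

A call such as needsReview({lat: eniLat, lon: eniLon}, {lat: wikiLat, lon: wikiLon}) would flag the same kind of rows that the SQL filter in the R script selects.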

JavaScript test code for calling sfautoedit:

<html>
<head>
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.0/jquery.min.js"></script>
<script src="http://code.jquery.com/ui/1.8.23/jquery-ui.min.js" type="text/javascript"></script>
<script type="text/javascript">
$(function(){
    $.ajax({
        url: "http://enipedia.tudelft.nl/enipedia/api.php",
        dataType: "jsonp",
        data: {
            action: "sfautoedit",
            form: "Powerplant",
            target: "Pinjar_Powerplant",
            query: "PowerplantTest[point]=-31.558, 115.819"
        },
        success: function( data ) {
            alert("it worked");
        }
    });
});
</script>
</head>
<body>
test
</body>
</html>
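
Since a plain link to the API call already attributes the edit to the logged-in user (or records the IP address), one low-risk option is to generate such links instead of making an AJAX request. The following is a minimal sketch of that idea. The function name and argument order are hypothetical, and the parameter values are simply those from the test page above. It has not been tested against the live wiki.

```javascript
// Hypothetical helper: builds a link to the sfautoedit API call that sets
// one template field on one page. The parameter names (action, form, target,
// query) are the ones used in the test page; everything else is illustrative.
function buildAutoeditUrl(apiUrl, form, target, template, field, value) {
  const params = new URLSearchParams({
    action: "sfautoedit",
    form: form,
    target: target,
    // Same "Template[field]=value" shape as in the test page.
    query: template + "[" + field + "]=" + value
  });
  return apiUrl + "?" + params.toString();
}

// Example with the values from the test page.
const exampleUrl = buildAutoeditUrl(
  "http://enipedia.tudelft.nl/enipedia/api.php",
  "Powerplant",
  "Pinjar_Powerplant",
  "PowerplantTest",
  "point",
  "-31.558, 115.819"
);
```

A link built this way could be written into each row of the comparison table, one for the Enipedia value and one for the Wikipedia value, so that clicking it applies the chosen coordinates.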