Aligning powerplants on Enipedia with their Wikipedia articles

Below is a first attempt at using the Silk Link Discovery Framework to link Enipedia articles to their corresponding Wikipedia articles. This approach has been superseded by the Enipedia Power Plant Dataset Reconciliation API, used in combination with custom R code that traverses the category hierarchy starting at http://en.wikipedia.org/wiki/Category:Power_stations_by_country.

What does this do?

Silk helps to link data on your SPARQL endpoint to data on someone else's SPARQL endpoint, which greatly helps with the "Linking" aspect of the Linking Open Data movement. It does this by querying two SPARQL endpoints and then scoring how similar the returned objects are, here by comparing their labels. In practical terms, this is a good way for us to fill in links from Enipedia pages to Wikipedia pages.

Steps:

  • Find all power plants on Enipedia that do not have a Wikipedia page specified.
  • Try to match these against everything of type dbpedia-owl:PowerStation on DBpedia.
  • Matches with a score >= 85% are written to file.
  • Results are written to the Enipedia pages using a bot (some manual supervision is needed even for high scores).
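
The matching and filtering steps above can be sketched in Python. This is a minimal illustration, not Silk's own implementation: the function names (`jaro`, `jaro_winkler`, `score_labels`, `route`) are hypothetical, the Jaro-Winkler variant assumes the common prefix scale of 0.1 over at most four characters, and the handling of a score of exactly 0.99 is an assumption, since the configuration uses 0.99 as both a maximum and a minimum.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def score_labels(label_a: str, label_b: str) -> float:
    """Lower-case both labels before comparing, as the configuration does."""
    return jaro_winkler(label_a.lower(), label_b.lower())


def route(score: float, threshold: float = 0.85, accept: float = 0.99) -> str:
    """Decide which output a candidate link belongs to."""
    if score < threshold:
        return "discard"
    if score >= accept:  # assumption: exactly 0.99 counts as accepted
        return "accept"
    return "verify"
```

For example, `route(score_labels("Drax Power Station", "drax power station"))` lands in the "accept" bucket because the lower-cased labels are identical, while labels that differ by a few characters would typically fall into "verify" and need a human check.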

What else can this do?

We should use this to align the other datasets that we have, such as the EPRTR and EU-ETS. This will never give us a complete 1:1 matching, but we can autogenerate tables of suggested links for manual review.
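
A table of suggested links could be generated along the following lines. This is only a sketch under stated assumptions: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure rather than the Jaro-Winkler metric in the Silk configuration, the function names and column headings are hypothetical, and the output is plain MediaWiki table markup.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_links(sources, targets, min_score=0.85, top_n=3):
    """Return (source, candidate, score) rows for the best candidates."""
    rows = []
    for source in sources:
        scored = sorted(
            ((similarity(source, target), target) for target in targets),
            reverse=True,
        )
        for score, target in scored[:top_n]:
            if score >= min_score:
                rows.append((source, target, round(score, 3)))
    return rows


def to_wikitable(rows):
    """Render suggestion rows as MediaWiki table markup."""
    lines = ['{| class="wikitable"', "! Enipedia entry !! Candidate !! Score"]
    for source, target, score in rows:
        lines.append("|-")
        lines.append(f"| {source} || {target} || {score}")
    lines.append("|}")
    return "\n".join(lines)
```

Keeping several candidates per entry (`top_n`) instead of only the best one reflects the point that the matching will never be a complete 1:1, so a reviewer can pick among the alternatives.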

Code

<?xml version="1.0" encoding="utf-8" ?>
<Silk>
	<!--These are the prefixes used for the queries below-->
	<Prefixes>
		<Prefix id="rdf" namespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#" />
		<Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
		<Prefix id="owl" namespace="http://www.w3.org/2002/07/owl#" />
		<Prefix id="cat" namespace="http://enipedia.tudelft.nl/wiki/Category:" />
		<Prefix id="prop" namespace="http://enipedia.tudelft.nl/wiki/Property:" />
		<Prefix id="dbpedia-owl" namespace="http://dbpedia.org/ontology/" />
	</Prefixes>

	<!--These are the SPARQL endpoints that will be queried-->
	<DataSources>

		<!--Talk to Enipedia-->
		<DataSource id="enipedia" type="sparqlEndpoint">
			<Param name="endpointURI" value="http://enipedia.tudelft.nl/sparql/sparql" />
		</DataSource>

		<!--Talk to DBpedia-->
		<DataSource id="dbpedia" type="sparqlEndpoint">
			<Param name="endpointURI" value="http://dbpedia.org/sparql" />
			<Param name="graph" value="http://dbpedia.org" />
		</DataSource>
	</DataSources>

	<Interlinks>
		<Interlink id="eni">
			<LinkType>owl:sameAs</LinkType>

			<!--This is where the query to Enipedia is built-->
			<!--get us everything that is a powerplant, which does not have a Wikipedia page specified-->			
			<SourceDataset dataSource="enipedia" var="a">
			<RestrictTo>
				?a rdf:type cat:Powerplant .
				OPTIONAL{?a prop:Wikipedia_page ?wikipediaPage} . 
				FILTER(!bound(?wikipediaPage)) . 
			</RestrictTo>
			</SourceDataset>

			<!--This is where the query to DBpedia is built-->
			<!--get us everything that is a type of dbpedia-owl:PowerStation-->
			<TargetDataset dataSource="dbpedia" var="b">
			<RestrictTo>
				?b rdf:type dbpedia-owl:PowerStation .
			</RestrictTo>
			</TargetDataset>

			<!-- compare the rdfs:label found for variables ?a and ?b -->
			<LinkCondition>
				<Aggregate type="max">
					<Compare metric="jaroWinkler">
						<TransformInput function="lowerCase">
							<Input path="?a/rdfs:label" />
						</TransformInput>
						<TransformInput function="lowerCase">
							<Input path="?b/rdfs:label" />
						</TransformInput>
					</Compare>
				</Aggregate>
			</LinkCondition>

			<!--don't show values below 85%-->
			<Filter threshold="0.85" />

			<Outputs>
				<!--if confidence between 85% and 99%, put in eni_wiki_verify_links.xml file-->
				<Output maxConfidence="0.99" type="file" >
					<Param name="file" value="eni_wiki_verify_links.xml"/>
					<Param name="format" value="alignment"/>
				</Output>

				<!--if confidence is 99% or above, put in eni_wiki_accepted_links.xml file-->
				<Output minConfidence="0.99" type="file">
					<Param name="file" value="eni_wiki_accepted_links.xml"/>
					<Param name="format" value="alignment"/>
				</Output>
			</Outputs>
		</Interlink>
	</Interlinks>
</Silk>