Power plant dataset reconciliation example with Google Refine

From Enipedia

Background

See Enipedia Power Plant Dataset Reconciliation API for an initial introduction. This page gives an example of how reconciliation with Enipedia data can be done on an Excel spreadsheet describing Peruvian power plants (found at http://www.inei.gob.pe/biblioineipub/bancopub/Est/Lib1008/cap16/Cap16009.XLS).

Instructions

Step 1 - Install Google Refine

Download and install Google Refine on your computer. See the Google Refine website for instructions and for videos that introduce the features of the program.

Step 2 - Get & Import the Data

GoogleRefineChooseFiles.png

Step 3 - Check the Data Import

  • You will now see that the data from the spreadsheet has been loaded in.
  • Depending on the format of the spreadsheet, Google Refine may not have located the column headers, and there may be blank rows.
  • The menu provides several options for taking care of this.
  • In this case, we specify that it should ignore the first two lines at the beginning of the file, and then parse the next 1 line as the column headers.
  • From there we click on "Create Project" in the top right.

GoogleRefineIgnoreFirstTwoLines.png
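
For readers who want to see what these import options amount to, here is a minimal Python sketch of the same idea: skip a fixed number of leading lines, treat the next line as the column headers, and drop blank rows. It is not how Google Refine works internally. The function name read_rows, the sample text, and the use of comma-separated input instead of the Excel file are all assumptions made for illustration.

```python
import csv

def read_rows(text, skip_lines=2, delimiter=","):
    """Skip leading lines, use the next line as headers, drop blank rows."""
    lines = text.splitlines()[skip_lines:]
    reader = csv.DictReader(lines, delimiter=delimiter)
    # Keep only rows in which at least one cell contains something.
    return [
        row for row in reader
        if any(str(value or "").strip() for value in row.values())
    ]

# Hypothetical sample: two title lines, a header line, and one blank row.
sample = (
    "Title line\n"
    "Subtitle line\n"
    "Central eléctrica,Potencia\n"
    "Planta A,10\n"
    ",\n"
    "Planta B,20\n"
)
rows = read_rows(sample)
```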

Step 4 - Add a Column for the Country

The country needs to be specified, as this speeds up the matching process and reduces the chance of errors. If the data does not contain a column indicating the country, then we need to add one (as is the case with this example). To add a column, we can click on any of the columns, then click on "Edit column", and then "Add column based on this column...".

GoogleRefineAddColumnForCountry.png


This will bring up another box, where we enter Country (without quotation marks) as the column name and "Peru" (with quotation marks) as the Expression. Clicking on "OK" brings us back to the main view.

GoogleRefineAddValuesForCountryColumn.png
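
The quotation marks matter because the Expression is evaluated once per row, and a quoted value is a string constant, so every row receives the same text. The effect can be sketched in Python as follows. The helper name add_constant_column and the sample rows are hypothetical.

```python
def add_constant_column(rows, name, value):
    """Return new rows with an extra column holding the same value in every row."""
    return [{**row, name: value} for row in rows]

# Hypothetical sample rows standing in for the imported spreadsheet.
rows = [{"Central eléctrica": "Planta A"}, {"Central eléctrica": "Planta B"}]
with_country = add_constant_column(rows, "Country", "Peru")
```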

Step 5 - Make a Copy of the Column to Perform Matching on

We are interested in matching on the "Central eléctrica" column, but first we need to make a copy of the column. The reason for this is that when we later perform matching over the values in a column, those values will be overwritten with the names of the matched entities, and it is useful for us to keep a record of the original values.

GoogleRefineAddColumnToMatchOn.png


We'll call this new column "MatchOnThis". No modifications are needed for the Expression, and the values shown should be the same as those in the original column.

GoogleRefineAddMatchOnThisColumn.png
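
To illustrate why the copy is worth making, the sketch below duplicates a column and then overwrites the copy, as reconciliation later does, while the original value survives. This is only an illustration in Python. The helper copy_column, the plant name, and the matched name are made up.

```python
def copy_column(rows, source, target):
    """Return new rows in which `target` duplicates the values of `source`."""
    return [{**row, target: row[source]} for row in rows]

# Hypothetical row; the plant name is invented for the example.
rows = [{"Central eléctrica": "C.H. Example"}]
rows = copy_column(rows, "Central eléctrica", "MatchOnThis")

# Stand-in for what reconciliation does later: the matched entity name
# replaces the value in the column that was reconciled.
rows[0]["MatchOnThis"] = "Example Powerplant"
```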

Step 6 - Start Reconciling

For the column that we want to match on, select "Reconcile", and then "Start reconciling".

GoogleRefineStartReconilingOnMatchingColumn.png

This will bring up a box where we need to specify the service that is located on Enipedia. Click on "Add Standard Service...", fill in http://enipedia.tudelft.nl/matching/reconcile.php, and click "Add Service".

Check the box next to the column name corresponding to the country, and type "Country" in the "As Property" field.

If you have information about the owner and/or the coordinates, it can be specified as well. The owner column corresponds to the "owner" property. If latitude and longitude are specified in separate columns, then you need to tag these as "latitude" and "longitude". If the latitude and longitude are in a single column (with latitude first, and the values separated by a comma), then tag this column as "point".

Finally, click "Start Reconciling".

GoogleRefineReconciliationServices.png
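
Behind this dialog, Refine sends batches of queries to the service, with the tagged columns attached as properties. The Python sketch below builds such a batch in the general shape used by Refine's reconciliation protocol (a "queries" parameter holding JSON, each query with "query", "limit", and "properties" entries). Whether the Enipedia service expects exactly this shape is an assumption, the helper names and the "Propietario" column are hypothetical, and no request is actually sent.

```python
import json

def build_queries(rows, name_column, property_columns, limit=3):
    """Build one query per row; property_columns maps column name -> property name."""
    queries = {}
    for i, row in enumerate(rows):
        properties = [
            {"pid": pid, "v": row[column]}
            for column, pid in property_columns.items()
            if row.get(column) not in (None, "")  # skip empty cells
        ]
        queries[f"q{i}"] = {
            "query": row[name_column],
            "limit": limit,
            "properties": properties,
        }
    return queries

def make_point(latitude, longitude):
    """Combine coordinates into the 'latitude,longitude' form of a point column."""
    return f"{latitude},{longitude}"

# Hypothetical row; "Propietario" is an invented owner column left empty here.
rows = [{"MatchOnThis": "Planta A", "Country": "Peru", "Propietario": ""}]
queries = build_queries(
    rows, "MatchOnThis", {"Country": "Country", "Propietario": "owner"}
)
payload = {"queries": json.dumps(queries)}  # form data a client would submit
```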

Step 7 - Results are Returned

Within a minute, the results should be returned, with the top three possible matches shown. The number in parentheses is the score, and the name, owner, and city of each possible match are shown to help with disambiguation. In practice we have found that the match with the highest score isn't always the best one, and that using information about the city and owner can help to make an accurate match.

GoogleRefineExampleReconciliationResults.png
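
The point that the highest score is not always the best match can be made concrete with a small Python sketch that orders candidates first by how much they agree with what we already know (city, owner) and only then by score. This is not what the Enipedia service or Google Refine does. The function rank_candidates, the field names "city" and "owner", and the sample candidates are assumptions for illustration.

```python
def rank_candidates(candidates, known_city=None, known_owner=None, top=3):
    """Order candidates by agreement with known city/owner, then by score."""
    def key(candidate):
        agreements = sum([
            bool(known_city)
            and candidate.get("city", "").lower() == known_city.lower(),
            bool(known_owner)
            and candidate.get("owner", "").lower() == known_owner.lower(),
        ])
        return (agreements, candidate["score"])
    return sorted(candidates, key=key, reverse=True)[:top]

# Hypothetical candidates; names, scores, cities, and owners are invented.
candidates = [
    {"name": "Plant X", "score": 0.9, "city": "Cusco", "owner": "A"},
    {"name": "Plant Y", "score": 0.7, "city": "Lima", "owner": "B"},
]
by_score = rank_candidates(candidates)
by_city = rank_candidates(candidates, known_city="Lima")
```

With no extra information the ordering falls back to the score alone; once the city is known, the lower-scoring candidate in that city moves to the top. The final decision still belongs to the person reviewing the list.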

Step 8 - Select Actual Matching Instances

Check the match candidate that corresponds to the entry in the original data.

GoogleRefinePickMatchingEntity.png

Step 9 - Identify Instances not in Enipedia

If none of the match candidates fits, then the power plant is likely not in Enipedia. Click on "Create new topic" to indicate this.

GoogleRefineCreateNewTopic.png

Step 10 - Export the Data

Once you have gone through and reconciled all the entries, you can export the data.

GoogleRefineExportToTabSeparated.png
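
For reference, a tab-separated export is simply one header line followed by one line per row, with tabs between the cells. The Python sketch below writes rows in that layout. The helper name to_tsv and the sample row are hypothetical, and the exact output of Google Refine's exporter may differ in details such as quoting.

```python
import csv
import io

def to_tsv(rows):
    """Serialize a list of dict rows as tab-separated text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical reconciled row.
tsv_text = to_tsv([{"MatchOnThis": "Example Powerplant", "Country": "Peru"}])
```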

Step 11 - Integrate into Enipedia

If you get this far, contact us, and feel free to upload the file and leave comments on the Discussion page.
