Jun 11

Semantic Gas

Rising oil prices, recognition of climate change and policies aimed at “decarbonizing” the energy sector have promoted natural gas as the preferred fuel for power generation and heating. Natural gas – the cleanest of the fossil fuels (ex nuclear) and cheaper than renewable sources (depending on location) – is likely to replace coal and nuclear power in the immediate future and serve as a “transition fuel” as we move towards cleaner energy sources.

Many Western European countries (Netherlands, Belgium, Germany, Italy) have relied on natural gas as a primary source for heating and power generation for a few decades now. The indigenous European resources have been gradually depleting, giving way to imports from Russia and North Africa. With the growing importance of natural gas in the energy landscape of the EU, and also of emerging energy markets such as the BRIC economies, the sources of natural gas supply and the emerging dynamics of the (global) gas market are being questioned.

One of the goals of Enipedia is to enable a collaborative analysis (or more humbly: collaborative curiosity) of such issues by aggregating and publishing relevant data. We also believe that data is not enough. It is just as important to provide context to the data, give examples and write stories about data.

In the next few paragraphs I will try to do exactly that: write a short story of how data describing major natural gas pipelines came to be and give a couple of examples of how it can be used.

Natural gas pipelines are the primary means of delivering gas from its sources to the consumers. Conventional natural gas fields are usually stranded far away from the markets, and the pipelines require significant investments and long-term country-level commitments. Knowing the natural gas pipeline network is therefore essential to understanding the current gas market and its future development options.

Easier said than done. One would expect such an essential piece of information to be publicly available from numerous energy agencies, consultants and government data streams. That is not the case. The closest to providing this data in the EU is ENTSOG. They have a beautiful picture with all the pipes, cross-border connections and trading points in the EU. They also provide an interface for querying transport options on the EU gas grid. Even so, it does not allow the data to be exported or explored with a different query. Still, the ENTSOG picture and data provided a good starting point for scraping together the necessary data.

The description below assumes one is familiar with a whole range of open source tools. If not, the links provided could be used as interesting starting points.


The following steps detail the process of creating, cross-referencing and importing gas pipeline data into Enipedia:

  1. Collected bitmap pictures found on the ENTSOG and other websites (such as Theodora)
  2. Chopped up the images into smaller tiles using imagemagick
  3. Using the tiles, created Google Earth overlays, transforming the images to match the country borders and major cities. The previous step was necessary to be able to stretch the images onto the map.
  4. Redrew the pipelines by hand (around 300 of them) using the GE path tool
  5. Drew interconnector points (where lines meet or border points) and LNG terminals using the GE polygon tool
  6. Exported the KMZ
  7. Downloaded the html files accompanying the images (see step 1)
  8. Used tidy to convert the HTML into valid XHTML and XPath to extract the pieces into a table (CSV)
  9. Imported the CSV into Google Refine to fix up the messy pipeline start and end location names; gave names to pipelines.
  10. Used Google Geocode API within Google Refine to retrieve coordinates for pipeline start and end locations
  11. Used sed magic to cross-reference KML map data with the refined CSV
  12. Defined a number of semantic properties, such as name, length, diameter, start, path and end points of the pipeline; defined a category “Pipeline” for all pipeline pages
  13. Created a template to contain the properties and to record and standardise the pipeline data; this is a good way to consistently map the CSV data structure to the one in your wiki. The template also defined the mechanics of correcting for missing data. For example, missing pipeline lengths were approximated by taking the start and end coordinates and using the {{#geodistance}} function.
  14. Used the Java Wiki Bot Framework to import the CSV file into a wiki, creating a page per row and writing each row into the template
  15. Uploaded the KML onto the wiki and used the Google Maps extension to display the uploaded file in the wiki; the Google Maps extension is also used in all pipeline pages to display maps of individual pipelines
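
Steps 11 and 13 above can also be sketched in a few lines of Python instead of sed and wiki templates. This is only an illustration, not the procedure actually used: the CSV column names (name, length_km, start_lat, start_lon, end_lat, end_lon), the KML namespace and the function names are all assumptions, and the great-circle (haversine) formula stands in for whatever {{#geodistance}} computes internally.

```python
import csv
import io
import math
import xml.etree.ElementTree as ET

# Assumed namespace; older Google Earth exports may use a different one.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def kml_paths(kml_text):
    """Map each Placemark name to its LineString path as (lat, lon) pairs."""
    root = ET.fromstring(kml_text)
    paths = {}
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS).strip()
        coords = placemark.findtext(
            ".//kml:LineString/kml:coordinates", default="", namespaces=KML_NS
        )
        # KML stores whitespace-separated "lon,lat[,alt]" tuples.
        points = [tuple(map(float, c.split(",")[:2])) for c in coords.split()]
        if name and points:
            paths[name] = [(lat, lon) for lon, lat in points]
    return paths


def merge(csv_text, kml_text):
    """Attach the hand-drawn path to each CSV row and fill in missing lengths."""
    paths = kml_paths(kml_text)
    merged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["path"] = paths.get(row["name"], [])
        if row["length_km"].strip():
            row["length_km"] = float(row["length_km"])
        else:
            # Fallback: straight-line estimate between start and end points.
            row["length_km"] = haversine_km(
                float(row["start_lat"]), float(row["start_lon"]),
                float(row["end_lat"]), float(row["end_lon"]),
            )
        merged.append(row)
    return merged
```

A straight-line estimate will understate the length of any pipeline that does not run directly between its end points, so it is best treated as a lower bound until a measured value is entered.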

Final result:

When the data was written into the wiki, each pipeline property was automatically imported into a SPARQL endpoint using the SparqlExtension functionality. The public SPARQL endpoint now allows arbitrary queries to be executed on the imported data.
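
For readers who prefer to query from a script, here is a minimal sketch of how a query might be sent to such an endpoint using only the Python standard library. The endpoint URL is a placeholder (the real address is not given here), the function names are made up for illustration, and the sketch assumes the endpoint accepts GET requests and can return results in the standard SPARQL JSON format.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL; substitute the real address of the SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that passes the query in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings(result_json):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    doc = json.loads(result_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query over the network and return the flattened result rows."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return bindings(response.read().decode("utf-8"))
```

Calling run_query with any of the query strings in this post should return one dict per result row, provided the endpoint predefines the cat: and prop: prefixes; otherwise the matching PREFIX declarations have to be added to the query text.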

  • Count the number of pipelines imported:
select (count(?pipe) as ?pipelines) where {
   ?pipe a cat:Pipeline
}


  • List the pipelines in (from and to) Algeria:
select distinct ?pipe where {
   ?pipe prop:ToCountry | prop:FromCountry <Algeria> .
}


  • List the pipelines importing natural gas into the EU:
select * where {
   ?pipeline prop:ToCountry ?to;
                prop:FromCountry ?from .
   ?to a cat:EUMember .
   Filter (Not Exists {?from a cat:EUMember}) .
} order by ?from


  • List the top 5 locations with the most incoming gas pipelines, used to find where the consumers and big distribution centers are:
select ?consumer (count(?pipe) as ?incoming) where {
   ?pipe prop:PipelineTo ?consumer .
   Filter( Not Exists {?consumer a cat:Country} ) .
} group by (?consumer) order by desc(?incoming) limit 5


  • List the top 5 locations with the most outgoing gas pipelines, used to find where the producers and big distribution centers are:
select ?producer (count(?pipe) as ?outgoing) where {
   ?pipe prop:PipelineFrom ?producer .
   Filter( Not Exists {?producer a cat:Country} ) .
} group by (?producer) order by desc(?outgoing) limit 5

Result (Groningen in the Netherlands tops the list)

These are some basic queries that come to mind immediately. More interesting (and more complicated) queries could reveal which points connect the most countries, or connect the most gas fields with the most diverse customers. That is where gas trading hubs will eventually emerge following the liberalisation of the EU's gas market.
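
As a rough illustration of the "which points connect the most countries" idea, the sketch below ranks locations by the number of distinct countries their pipelines touch. The records are invented placeholders, not Enipedia data, and the tuple layout and function names are assumptions made for this example; in practice the rows would come from a query against the endpoint.

```python
from collections import defaultdict

# Hypothetical records: (pipeline, from_location, from_country, to_location, to_country).
PIPELINES = [
    ("P1", "Hub X", "A", "City 1", "B"),
    ("P2", "Hub X", "A", "City 2", "C"),
    ("P3", "Field Y", "D", "Hub X", "A"),
    ("P4", "Field Y", "D", "City 1", "B"),
]


def countries_per_location(pipelines):
    """Collect, for every location, the set of countries its pipelines touch."""
    reach = defaultdict(set)
    for _, src, src_country, dst, dst_country in pipelines:
        # Both end points of a pipe are linked to both countries involved.
        reach[src].update((src_country, dst_country))
        reach[dst].update((src_country, dst_country))
    return reach


def top_hubs(pipelines, n=5):
    """Rank locations by number of distinct countries, ties broken by name."""
    reach = countries_per_location(pipelines)
    ranked = sorted(reach.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(location, len(countries)) for location, countries in ranked[:n]]
```

Counting distinct countries rather than pipelines keeps a location with many parallel pipes to a single neighbour from outranking a genuine crossroads.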

For more information on natural gas economics and infrastructure see:
