Enipedia Use


Questions / discussion / documentation (?) on Enipedia use

Fuel Types

  • Netherlands/Powerplants lists a number of Fuel Types, some in red for which no page exists in Enipedia
  • Netherlands/Coal lists only one power plant -- all the other plants are listed as using Hard Coal
    • GerardDijkema 14:30, 20 October 2011 (CEST)
    • I'm currently making pages for the rest. One thing we should consider is how to deal with fuel types that may be more specific or general terms for other fuel types. This applies to the different types of coal and the various forms of fuel oil. It would be useful to define which fuels are sub-types of other fuels, and on the summary pages point people towards this information. The general philosophy I think we should support on Enipedia is that we want to be able to accept the most generic information ("coal of unknown type"), but then be able to iteratively refine it as we get more information. ChrisDavis 08:56, 24 October 2011 (CEST)
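Here is a minimal sketch of the sub-type idea. It assumes a simple child-to-parent mapping between fuel names. The mapping and the plant names are hypothetical, not Enipedia's actual categories. With this structure, a query for a generic fuel also returns plants recorded under a more specific sub-type. That would stop Netherlands/Coal from missing the Hard Coal plants, and it still accepts "coal of unknown type" until better data arrives:

```python
# Hypothetical fuel hierarchy: each fuel maps to its more generic parent.
FUEL_PARENT = {
    "Hard Coal": "Coal",
    "Lignite": "Coal",
    "Heavy Fuel Oil": "Fuel Oil",
    "Light Fuel Oil": "Fuel Oil",
}


def ancestors(fuel):
    """Return the fuel itself followed by all of its more generic parents."""
    chain = [fuel]
    while chain[-1] in FUEL_PARENT:
        chain.append(FUEL_PARENT[chain[-1]])
    return chain


def plants_using(fuel, plants):
    """Names of plants whose recorded fuel is `fuel` or any sub-type of it."""
    return [name for name, recorded in plants.items() if fuel in ancestors(recorded)]


# Hypothetical plants: A has generic data, B was refined later.
plants = {"Plant A": "Coal", "Plant B": "Hard Coal", "Plant C": "Natural Gas"}
print(plants_using("Coal", plants))  # ['Plant A', 'Plant B']
```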

Search and Go

  • using the search box is not intuitive -- it appears only to search for page titles as you type
    • also, it is case-sensitive
    • so typing EDF yields nothing, also Sloecentrale yields nothing
    • while Edf and Sloecentrale_Powerplant do exist
    • Question: is it possible to change this?
    • and what is our convention - Sloecentrale_underscore_Powerplant or Sloecentrale_space_Powerplant?
      • underscore and space are actually the same thing - if you put a space in the title, MediaWiki will automatically convert it to an underscore in the URL. The same goes for including links on wiki pages. ChrisDavis 18:44, 23 September 2011 (CEST)
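The sketch below shows why EDF and Sloecentrale find nothing, and what a case-insensitive lookup would do instead. It approximates MediaWiki's default title handling: spaces become underscores and the first letter is upper-cased, as with `$wgCapitalLinks` enabled. The rest of the title stays case-sensitive. This is an illustration under those assumptions, not the wiki's actual search code:

```python
def to_db_key(title):
    """Approximate MediaWiki title normalisation (assumed defaults):
    whitespace runs become one underscore and the first character is
    upper-cased; the rest of the title stays case-sensitive."""
    key = "_".join(title.split())
    return key[:1].upper() + key[1:]


def find_pages(query, titles):
    """Case-insensitive substring match over existing page titles."""
    needle = to_db_key(query).casefold()
    return [t for t in titles if needle in to_db_key(t).casefold()]


pages = ["Edf", "Delta_Nv", "Sloecentrale_Powerplant"]
print(to_db_key("sloecentrale Powerplant"))  # Sloecentrale_Powerplant
print(find_pages("EDF", pages))              # ['Edf']
print(find_pages("Sloecentrale", pages))     # ['Sloecentrale_Powerplant']
```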

Form:Powerplant

  • TODO - Form:Powerplant doesn't autocomplete on a complete list of powerplants. The solution is that requests need to be sent to the server to grab the complete list. It seems that this is currently not possible, and I've posted a question on the Semantic Forms discussion page to see what can be done about this. ChrisDavis 16:41, 28 September 2011 (CEST)
  • ERROR: something wrong with page Ledvice_Powerplant GerardDijkema 23:07, 28 September 2011 (CEST)
  • On the Powerplant form, users can input data on the electrical output (nameplate capacity), where it says "Unit accepted: MW".
    • If you now put in a figure without MW, e.g. 1000 instead of 1000 MW, this figure will not show up in queries
    • Should Enipedia accept default units, and only do something when a user inputs another unit (e.g. the rating is in MW, but a user may put in kW or GW, which is then recalculated)?
    • GerardDijkema 23:14, 28 September 2011 (CEST)
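Here is one possible way to handle default units, sketched under assumptions. A bare number is read in the form's default unit (MW), and kW or GW inputs are converted. The conversion table and function name are hypothetical. They are not what Semantic MediaWiki does internally:

```python
import re

# Assumed conversion factors to the form's default unit (MW).
TO_MW = {"kw": 1e-3, "mw": 1.0, "gw": 1e3}
_CAPACITY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def parse_capacity(text, default_unit="MW"):
    """Return a capacity in MW; a bare number is taken to be in default_unit."""
    match = _CAPACITY.match(text)
    if match is None:
        raise ValueError(f"cannot parse capacity: {text!r}")
    value = float(match.group(1))
    unit = (match.group(2) or default_unit).lower()
    if unit not in TO_MW:
        raise ValueError(f"unsupported unit: {match.group(2)!r}")
    return value * TO_MW[unit]


for entry in ["1000", "1000 MW", "1000MW", "1.2 GW", "500 kW"]:
    print(entry, "->", parse_capacity(entry))
```

Rejecting unknown units, rather than silently dropping them, keeps values like "1000 MWh" from quietly disappearing from queries.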

Google Earth Visualization

  • Power Plant Icons
    • On the Europe page, a map has been created of the 300 largest power plants. The icon of Ruien_Powerplant shows as "biomass". However, the Ruien_Powerplant page indicates the primary fuel of this plant is Coal. Instead of displaying one or more icons per location, is it possible to create a "biomass/coal" icon for plants co-firing coal and biomass, and display it when the data indicate that both are fired on a regular basis?
    • Furthermore, many (coal-fired) power plants have the option to fire natural gas or fuel oil for start-up. Maybe we should list these as start-up fuel?

11:17, 25 November 2011 (CET) GerardDijkema

    • I think this is good to discuss as we're running into various issues with fuel types on the site, and we should figure out how to better manage this. I see this as two issues. The first is that we should think about refactoring how we describe the types of fuels that the power plants use. It might be useful to separate things into primary, secondary, and "other" fuel, similarly to how the Wikipedia infoboxes for power stations are set up. For primary and secondary fuel, we may allow only one choice, while allowing multiple selections for "other" fuel. Regarding the icons, the challenge is that we need to figure out an intelligent way to deal with all the different permutations of fuels. For some of these power plants, I've seen that they have a primary fuel, and then seem to throw in whatever other fuels they have lying around. We can certainly automate the creation of icons for these different combinations; it's more that we should think about aspects such as whether we want to allow icons with five different fuel types. Also regarding the permutations, we should think about whether the order of the icons has some particular meaning. ChrisDavis 08:31, 28 November 2011 (CET)
  • TODO - Have links in balloons open a page in an external browser by default instead of the Google Earth browser. Some people say it's possible, although I've found it causes problems with older versions of Google Earth (5.1). See here for ongoing discussion on the Google Earth help forum. ChrisDavis 16:44, 28 September 2011 (CEST)
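One way to limit the explosion of fuel permutations is a canonical icon name. The primary fuel comes first, the remaining fuels follow in alphabetical order, and the list is capped at a maximum number of fuels. This is a sketch, and the naming scheme and cap are hypothetical choices. Under this scheme, coal co-fired with biomass gets a "coal-biomass" icon rather than a plain biomass one:

```python
def icon_name(primary, others=(), max_fuels=2):
    """Deterministic icon file name: primary fuel first, then the other fuels
    in alphabetical order, truncated to max_fuels fuels in total."""
    rest = sorted(set(others) - {primary})
    fuels = [primary] + rest
    slug = [f.lower().replace(" ", "_") for f in fuels[:max_fuels]]
    return "-".join(slug) + ".png"


print(icon_name("Coal", ["Biomass"]))
print(icon_name("Coal", ["Natural Gas", "Fuel Oil", "Biomass"], max_fuels=3))
```

Sorting the secondary fuels means their order carries no meaning. Only the primary fuel's position does, which settles the ordering question for everything except the primary fuel.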

Power Conversion Units

GerardDijkema 18:59, 23 September 2011 (CEST)

    • Furthermore, this plant (as is Borssele Powerplant) is owned by Delta Nv
      • Error: However, if you start typing "Delta" into the field for owner, autocomplete does not suggest Delta Nv, even though the page exists
      • FIXED: This is indeed a deeper issue. It seems that the code for the form is not loading the complete list. ChrisDavis 18:33, 23 September 2011 (CEST)
        • I changed the form to fix this, although this makes the page a bit slower. What happens by default is that a limited list of options is hidden in the source of the document. With Enipedia, the amount of options that we are dealing with is so large that we need calls to the server to grab all the data. This is done using the "remote autocompletion" option for Semantic Forms. ChrisDavis 10:31, 26 September 2011 (CEST)
    • So I created DELTA. Obviously, this page can be taken out again
      • But -- Enipedia has company pages set up to display combinations of.
      • If a new power company page is created, ideally, it should inherit this setup (but.. how to know that you are creating a new company page)
      • Hmm, I remember there are special instructions for completing a new power plant page. Maybe there is also something like that for new companies.
    • Also, the Sloecentrale_Powerplant is owned and operated by Sloecentrale Bv, which is a joint venture of Delta Nv and Edf
      • Question: How should we deal with this? Are multiple owners allowed? I would think we don't want to store the whole web of companies
        • The eGRID data allows for multiple owners, and has percent ownership specified for each of them (up to 16). An alternative would be to simply have a list of owners, although I think having the percent ownership would be valuable for some calculations. ChrisDavis 18:50, 23 September 2011 (CEST)
        • Question: how does that show up on the power plant pages when you are inputting or updating data (edit with form)?
    • Thought: Ideally, PGU information should be added up to fill in the main page information -- or used to cross-check? What is our strategy here?
      • I'm more in favor of cross-checking with some form of automatic checking for inconsistencies. To automatically fill things in, we would first have to check that some value was not filled in already, otherwise your data would be overwritten every time you add a PGU. This creates a rather complex syntax on the templates used on the queries. What I've been doing to fill in some of the data (from Wikipedia) is to have a bot script that first checks for the absence of data before filling it in. We may want to split this off into a separate thread about automation. I have quite a few ideas about what the opportunities are, and how we can pull this off. ChrisDavis 18:56, 23 September 2011 (CEST)
  • Should include ICE (internal combustion engine) or reciprocating engine (gas or diesel), and perhaps CHP engines, as power unit selections in the drop-down options. In particular, many of the remote and island power generators are smaller but should still be included in this database. Also, finding this Enipedia Use page wasn't very apparent to me, so when I had a suggestion to make, it took me a few minutes to find the right place to make it (if it had taken any longer, I would have given up). I would suggest a clearer directory for how to make suggestions/recommendations. Great initiative though; I hope more people add to it and make it more complete. SMcC (talk) 06:32, 13 June 2017 (CEST)
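For the multiple-owners question, here is a minimal sketch of an eGRID-style owner list. Each owner has an optional percent share, and shares may be unknown. The checks and names are assumptions, not an existing Enipedia template, and the companies in the example are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ownership:
    owner: str                       # wiki page of the owning company
    percent: Optional[float] = None  # share in %, None if unknown


def check_owners(owners: List[Ownership], max_owners: int = 16) -> List[str]:
    """Return a list of problems; an empty list means the data is consistent."""
    problems = []
    if len(owners) > max_owners:
        problems.append(f"more than {max_owners} owners")
    known = [o.percent for o in owners if o.percent is not None]
    total = sum(known)
    if owners and len(known) == len(owners):
        # All shares known: they should add up to 100%.
        if abs(total - 100.0) > 0.01:
            problems.append(f"shares sum to {total}%, not 100%")
    elif total > 100.01:
        # Some shares unknown: the known ones must not exceed 100%.
        problems.append(f"known shares already exceed 100% ({total}%)")
    return problems


print(check_owners([Ownership("Company A", 50), Ownership("Company B", 50)]))  # []
print(check_owners([Ownership("Company A"), Ownership("Company B")]))          # []
```

Making the percentage optional lets a plain list of owners be entered first and refined later.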

Correct Naming

  • Error: special characters
    • Ohi Powerplant
    • I renamed this one to its correct name, Ōi Powerplant
    • apparently, enipedia cannot deal with this special character
    • in Japan/Powerplants the -C5-8Ci Powerplant shows up!!
    • GerardDijkema 15:56, 26 September 2011 (CEST)
      • TODO: This is likely a SparqlExtension issue where we need to have characters translated back into human-readable format. ChrisDavis 16:48, 28 September 2011 (CEST)
  • Question: naming convention
    • Electricite de France is known as EDF; however, enipedia has it as Edf.
    • Dutch companies are either an NV or BV, or vof. Enipedia knows them as Nv and Bv
    • Should we do something about this?

GerardDijkema 18:09, 23 September 2011 (CEST)
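The garbled "-C5-8Ci" looks like the UTF-8 bytes of "Ō" (C5 8C) in percent-encoding, with "%" written as "-". The sketch below assumes that is the SMW RDF export encoding, with a literal "-" written as "-2D". Under that assumption, a name can be turned back into readable text like this:

```python
from urllib.parse import unquote


def decode_smw_name(encoded):
    """Decode an SMW-style encoded page name, assuming '%' was written as '-'
    (and a literal '-' as '-2D')."""
    return unquote(encoded.replace("-", "%"))


print(decode_smw_name("-C5-8Ci_Powerplant"))
print(decode_smw_name("Sint-2DMaarten"))
```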

    • TODO: I can set up a bot script to move these pages (Nv, Bv to NV, BV). If you have other examples, add them to this list, and I'll include them. As part of this, I would also refactor "Edf" to "Electricite de France" to make it a bit more self-documenting. ChrisDavis 18:33, 23 September 2011 (CEST)
    • TODO: If I remember correctly, SMW needs work to deal with a bug where it incorrectly exports RDF for a page created after a redirect. They may have fixed this in SMW 1.6, although we would have to make some modifications to the SparqlExtension to work with that version. ChrisDavis 18:33, 23 September 2011 (CEST)
    • Thought: From reading through some of the articles that CARMA wrote about their creation of the dataset, I know they specify both companies and their subsidiaries. We could do some sort of string matching check and locate some of the parent companies and subsidiaries. ChrisDavis 20:17, 23 September 2011 (CEST)
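A rough version of the string-matching check, as a sketch under assumptions. Company names are compared after dropping legal-form suffixes, and a candidate counts as a likely parent when its remaining words form the start of the other company's name. The suffix list is illustrative, not exhaustive:

```python
import re

# Legal-form suffixes ignored when comparing names (illustrative, not exhaustive).
LEGAL_FORMS = {"nv", "bv", "vof", "sa", "ag", "plc", "ltd", "inc"}


def core_name(name):
    """Lower-cased words of a company name without legal-form suffixes."""
    return [w for w in re.findall(r"[a-z0-9]+", name.lower()) if w not in LEGAL_FORMS]


def likely_parent(company, candidates):
    """Return the candidate whose core name is the longest leading word
    sequence of the company's core name, or None if there is none."""
    words = core_name(company)
    best, best_len = None, 0
    for candidate in candidates:
        prefix = core_name(candidate)
        if prefix and words[:len(prefix)] == prefix and words != prefix and len(prefix) > best_len:
            best, best_len = candidate, len(prefix)
    return best


parents = ["Ge", "Delta Nv", "Edf"]
print(likely_parent("GE Capital", parents))       # Ge
print(likely_parent("Sloecentrale Bv", parents))  # None
```

The second example shows the limitation. A joint venture like Sloecentrale Bv shares no name with Delta Nv or Edf, so ownership data still has to come from elsewhere.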

List of required changes

  • Electricite de France is known as EDF; however, enipedia has it as Edf --> its official name is Électricité de France S.A.
  • Dutch companies are either an NV or BV, or vof. Enipedia knows them as Nv and Bv
  • General Electric is known as GE. Its subsidiaries are known as GE Capital, GE Plastics etc. Enipedia knows them as Ge

--GerardDijkema 15:56, 26 September 2011 (CEST)
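The bot run proposed above could start from a rename plan like the one below. This sketch only computes the proposed moves and does not change the wiki. It assumes titles are written with spaces, and the mapping tables contain just the examples from this page. The moves themselves could then be made with the MediaWiki API (`action=move`), keeping in mind the redirect issue mentioned earlier:

```python
# Legal-form suffixes to capitalise; extend with further examples as they come up.
SUFFIX_FIXES = {"Nv": "NV", "Bv": "BV"}
# Whole-title renames proposed on this page.
TITLE_FIXES = {"Edf": "Electricite de France"}


def proposed_title(title):
    """New title for a page, or the same title if nothing needs to change."""
    if title in TITLE_FIXES:
        return TITLE_FIXES[title]
    words = title.split(" ")
    if words[-1] in SUFFIX_FIXES:
        words[-1] = SUFFIX_FIXES[words[-1]]
    return " ".join(words)


def rename_plan(titles):
    """(old, new) pairs for the pages that would have to be moved."""
    return [(t, proposed_title(t)) for t in titles if proposed_title(t) != t]


print(rename_plan(["Delta Nv", "Sloecentrale Bv", "Edf", "Borssele Powerplant"]))
```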

Greenhouse power plants

  • I have updated the information of some Dutch Power Plants with missing coordinates
    • the first one was the Sint Maarten power plant. This is a diesel engine only power plant
    • the others are greenhouse cogen installations
      • these are gas engines with boilers and heat storage
  • New power plant types?
    • Large-scale cogeneration power plants used in the process industry typically consist of a gas turbine - heat recovery boiler combination
    • Medium-scale (< 20 MWe) and small-scale installations typically use gas engines. Also, these are somewhat more robust w.r.t. gas composition
    • It is typically these Medium-scale installations that are found with greenhouse growers
    • Important criterion in classifying is to be distinguishable, for future meaningful use in queries
    • Therefore, I suggest splitting modern cogen into gt-cogen and engine-cogen
    • special cogen is then reserved for installations that provide more than only electricity and heat
    • cogen via steam let-down is reserved for thermal installations that employ a rankine cycle, and produce heat via steam let-down from the cycle
  • Suggestion: include standard industry classification code (SIC) for company owning/operating the power plants
    • This would enable queries like "give me all the power plants in horticulture in the Netherlands", all power plants in chemical industry in China etc.
    • GerardDijkema 10:51, 22 September 2011 (CEST)
  • I have modified the erroneous name of La Brie Greenhouse Powerplant into Kwekerij Labrie Powerplant
    • this begs the question how we are dealing with language - Kwekerij Labrie is the company name. Translated, this would be Greenhouse Farmer Labrie. They have a WKK, a cogen power plant
    • complicated, or not?
    • GerardDijkema 22:23, 21 September 2011 (CEST)
  • ChrisDavis created Kwekerij_Labrie.
    • This creates a mistake, due to the redirect created by altering the page name --> the company really only has one powerplant!!
    • "Greenhouse Powerplant" may be better in Enipedia than "Kwekerij"
    • Maybe introduce a type "Greenhouse Cogen" for Greenhouse power plants? Not sure yet.
    • GerardDijkema 23:00, 21 September 2011 (CEST)
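The proposed type labels can be written as an explicit decision procedure. That makes it easier to check that they stay distinguishable in queries. In this sketch, the prime-mover codes and the order of the checks are assumptions. The labels themselves are the ones suggested above:

```python
# Hypothetical prime-mover codes mapped to the proposed cogen labels.
COGEN_BY_PRIME_MOVER = {"gas_turbine": "gt-cogen", "gas_engine": "engine-cogen"}


def classify(prime_mover, produces_heat, heat_via_steam_letdown=False, extra_products=()):
    """Assign one of the plant types proposed above. 'special cogen' wins
    whenever something besides electricity and heat is produced
    (e.g. CO2 supplied to greenhouses)."""
    if not produces_heat:
        return "electricity only"
    if extra_products:
        return "special cogen"
    if heat_via_steam_letdown:
        return "cogen via steam let-down"
    if prime_mover not in COGEN_BY_PRIME_MOVER:
        raise ValueError(f"unknown prime mover for cogen: {prime_mover!r}")
    return COGEN_BY_PRIME_MOVER[prime_mover]


print(classify("gas_engine", produces_heat=True))  # engine-cogen
```

A typical greenhouse installation (gas engine with boiler and heat storage) ends up as engine-cogen. A separate "Greenhouse Cogen" type would then only be needed if the SIC code of the owner cannot carry that information.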

Waste to energy plants information

  • I have updated some information on the Sidor Powerplant
  • This actually is a MSW incineration facility that co-produces electricity and heat for district heating
  • Inputting the data I found a couple of issues
    • for PGU's, it is now only possible to input electric rating. Suggest we provide electrical output (MWe) and thermal output (MWt)
    • the type "Waste to energy" facility cannot be input on the main power plant schema (types are only electricity, cogen and special cogen). Suggest we add this category
    • this particular plant is an example of RENEWAL: the CARMA database contains information on the 1985 built facilities.
      • these have been decommissioned in 2010 and replaced by a single new unit
      • this history can already be reflected in the PGU's
      • however, on the main page it says "Year Built".
      • can we change this, and for example replace this with "Year first built"
      • and then in the listing of PGU's on the main page also list year commissioned - decommissioned?
    • this plant has been designed and built by E.On Energy from Waste AG
    • it is operated by Sidor
      • this begs the question if we should somehow incorporate this information in the database GerardDijkema 17:35, 8 September 2011 (CEST)
        • We could borrow the structure used on the Wikipedia Template:Infobox power station where they have fields for owner, operator, developer, and constructor ChrisDavis 13:35, 9 September 2011 (CEST)
        • And then automatically load this from Wikipedia if it is there? GerardDijkema 14:57, 20 September 2011 (CEST)
          • There's a possibility to do this, although there are a few things to think about:
            • how would the number of queries influence how long it takes to load pages?
            • do we just display this data, or also load it into our database? Automatic loading might break things, such as if slightly different names are used for the owner of a plant.
            • What I would advocate for is the further development of code that could grab all this data and highlight any inconsistencies between various data sets. My experience in working with this data is that a manual process is still needed for some type of intelligent verification, since there's all sorts of things that can go wrong with the data entry process. However, there is a lot more that we can do as far as automating this process. For example, we can now query the live version of DBpedia (http://live.dbpedia.org) which (mostly) reflects the current state of Wikipedia. I've also done some initial work on a screenscraper for Wikipedia lists of powerplants (like this). There's an enormous amount of information in these, and they're always changing over time. ChrisDavis 17:01, 21 September 2011 (CEST)
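The PGU suggestions above could be modelled roughly as follows. They cover separate MWe and MWt ratings, commissioning and decommissioning years, and a derived "Year first built". This is a sketch with hypothetical field names and made-up figures, not the current Enipedia templates:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PGU:
    name: str
    mwe: float                            # electrical output
    mwt: float = 0.0                      # thermal output, 0 if no heat is delivered
    commissioned: Optional[int] = None
    decommissioned: Optional[int] = None  # None while still operating


def year_first_built(units: List[PGU]) -> Optional[int]:
    years = [u.commissioned for u in units if u.commissioned is not None]
    return min(years) if years else None


def capacity_in(year: int, units: List[PGU]) -> Tuple[float, float]:
    """(MWe, MWt) of the units operating in `year`; units without a
    commissioning year are left out."""
    active = [u for u in units
              if u.commissioned is not None and u.commissioned <= year
              and (u.decommissioned is None or u.decommissioned > year)]
    return sum(u.mwe for u in active), sum(u.mwt for u in active)


# Made-up figures: an old unit replaced by a new one in 2010.
units = [
    PGU("Unit 1", mwe=10.0, mwt=20.0, commissioned=1985, decommissioned=2010),
    PGU("Unit 2", mwe=30.0, mwt=60.0, commissioned=2010),
]
print(year_first_built(units))   # 1985
print(capacity_in(2000, units))  # (10.0, 20.0)
print(capacity_in(2011, units))  # (30.0, 60.0)
```

Keeping the history on the PGUs means the main page can show "Year first built" without losing the renewal information.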

How to input a new power plant or update information

The (Carma) database contains information on the Roca powerplant by E.ON. However, there are some mistakes here

- location shown on map is wrong -- see http://www.eon-benelux.com/eonwww/publishing.nsf/AttachmentsByTitle/RouteRoCa/$FILE/roca_ned.pdf

- capacity listed is correct, but can be detailed a bit more. The RoCa consists of three units, Roca1, 2 and 3. See http://www.eon-benelux.com/eonwww/publishing.nsf/Content/Overzicht+Centrales. The Roca3 plant has a net electric capacity of 220 MWe and an additional 200 MW of thermal capacity, i.e. to supply heat to the connected district heating system. Enriched flue gases are sent to greenhouses in Oostland. Together, Roca1, 2 and 3 have the capacity listed in Carma.org -- i.e. 269 MWe.

  • Date of commissioning Roca1 (&2??) is 1980
  • Date of commissioning Roca3 is 1986
  • this raises the question of where Enipedia should go from here

I've fixed the location and specified the power output in terms of electrical and thermal output. In going forward, I think we should look at how the template on Wikipedia defines information about power plants. Another source would be eGRID - "The Emissions & Generation Resource Integrated Database (eGRID) is a comprehensive inventory of environmental attributes of electric power systems. The preeminent source of air emissions data for the electric power sector, eGRID is based on available plant-specific data for all U.S. electricity generating plants that provide power to the electric grid and report data to the U.S. government."

Both of these sources should give an idea of the wide variety of things that we can describe, which will influence how we decide to expand the ontology used for describing power plants. ChrisDavis 16:57, 9 March 2011 (CET)

OK; are you suggesting we should use the Wikipedia template for description of PowerPlants in enipedia? GerardDijkema

At a minimum, I think we should look at it to see what they are describing to help inform how we expand our own schema for describing power plants. I don't think they have everything that we may deem necessary, but they've at least done quite a bit of work (within a community of people) in figuring out common properties of power plants.

In addition to enumerating all the different properties of things we want to describe, I think we should also give some attention to the different (multiple) levels of detail in which this data is likely to exist (i.e. aggregate data at the level of a power plant vs. data for individual generators). In other words, we should set up a system as much as possible so that people can enter either very generic or very specific information, depending on what information is available. The information they find should as much as possible "just fit" within the blanks we specify. To do this, we should be careful about how much we try to automate, and should allow for some "useful" duplication. For example, we may want to allow properties such as power output to be specified both at the level of the plant and at the level of the individual generators. While we could automatically calculate the power output of the total plant by summing the outputs of the individual generators, we may not want to do this by default on the pages. For example, we may know the total output of the power plant, but only the output of two of the four generators. We can always later devise queries to run over the data that highlight entries where the more detailed data does not match the aggregate data. ChrisDavis 16:57, 11 March 2011 (CET)
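The consistency check described above can be sketched without overwriting anything. The plant-level figure is kept, and the known generator figures are compared against it. Unknown generators are allowed, so partial data is never flagged as wrong. The 5% tolerance is an arbitrary assumption:

```python
def check_aggregate(plant_mwe, unit_mwe, tolerance=0.05):
    """Compare a plant-level figure with unit-level figures.

    unit_mwe holds one entry per unit, None where the unit's output is unknown.
    Returns a description of the inconsistency, or None if nothing is wrong."""
    known = [u for u in unit_mwe if u is not None]
    total = sum(known)
    if len(known) == len(unit_mwe):
        # All units known: the sum should match the plant total.
        if abs(total - plant_mwe) > tolerance * plant_mwe:
            return f"units sum to {total} MW, plant lists {plant_mwe} MW"
    elif total > plant_mwe * (1 + tolerance):
        # Partial data: the known units alone must not exceed the total.
        return f"known units already exceed the plant total ({total} > {plant_mwe} MW)"
    return None


print(check_aggregate(1000, [400, 400, None, None]))  # None: two of four generators known
print(check_aggregate(1000, [400, 400, 400, 400]))
```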

Data on power plants -- synchronize with Wikipedia?

From that page I copy the template for "other power plants" (i.e. thermal ones).

All other types of power stations

{{Infobox power station 
| name               = the name of the PowerPlant in enipedia / wikipedia
| official_name      = E.ON Roca3 (example)
| image              = some.jpg
| image_size         = 
| image_caption      = 
| image_alt          = 
| location_map       = 
| location_map_width = 
| location_map_text  = 
| lat_d     = 
| lat_m     = 
| lat_s     = 
| lat_NS    = 
| long_d    = 
| long_m    = 
| long_s    = 
| long_EW   = 
| coordinates_type   = type:landmark
| coordinates_display= inline,title
| coordinates_ref    = 
| country            = Netherlands
| locale             = Rotterdam (?)
| status             = Operating (?)
| construction_began = 1980
| commissioned       = 1986
| licence_expires    = n.a.
| decommissioned     = in the future
| cost               = 100 Million Euro (??)
| owner              = E.ON Benelux
| operator           = E.ON Benelux
| developer          = EZH
| constructor        = Alsthom, Siemens, ABB, dunno
| primary_fuel       = NaturalGas
| secondary_fuel     = 
| tertiary_fuel      = 
| generation_units   = 3 Gas Turbines, One Steam Turbine
| turbine_manu_other = Siemens
| thermal_power_all  = 285 ? must check what is meant
| cogeneration_all   = what is this
| combined_cycle     = yes or no? No.
| ghg_emission       = ton per year or what?
| installed_capacity = of what, MWe
| max_planned_cap    = does this relate to expansion?
| capacity_factor    = or utilization factor
| average_annual_gen = in MWe
| net_generation     = MWe
| website            = www.eon-benelux.com
| as_of              = what?
| extra              = what?
}}
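The infobox above uses a simple `| key = value` layout, so a first step toward extracting this data from Wikipedia is to read those pairs into a dictionary. The sketch assumes one field per line and no nested templates inside the infobox. The sample values are placeholders:

```python
import re


def parse_infobox(wikitext, name="Infobox power station"):
    """Extract `| key = value` pairs from one simple infobox.

    Assumes one field per line and no nested templates inside the infobox
    (a nested {{...}} would end the match early)."""
    start = wikitext.find("{{" + name)
    if start == -1:
        return {}
    end = wikitext.find("}}", start)
    body = wikitext[start:] if end == -1 else wikitext[start:end]
    fields = {}
    for line in body.splitlines():
        match = re.match(r"\s*\|\s*([^=|]+?)\s*=(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


sample = """{{Infobox power station
| name           = Example Powerplant
| country        = Netherlands
| primary_fuel   = NaturalGas
| secondary_fuel =
}}"""
print(parse_infobox(sample))
```

Real articles often contain nested templates, for example for coordinates. For those, a dedicated wikitext parser such as mwparserfromhell is a safer choice than line-based matching.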

Comments on the template above

1) The first question that comes to mind is how to extract data from Wikipedia (query all power plants with info in Wikipedia?).
2) Second: I have indicated what I would fill in in the template above.

Without the template manual, it does NOT appear to be intuitive, i.e. it is prone to errors and ambiguity.

Also, the template does not appear to be really useful (there appears not to be a database behind it).

What would we want to know of PowerPlants in the Netherlands?

a) the trivial stuff

  • name, location, power rating (MWe = MW electric), net (this is also what e.g. E.ON lists on their website)
  • fuel type(s) used
  • steam turbine no. and capacity, year of commissioning
  • gas turbine no. and capacity, year of commissioning
  • system heat rate // thermal capacity (rate of fuel input, MWth)
  • emission performance (on SO2, NOx, particulates, other)
  • economic // financial data: investment (per MWe); operating cost (excluding fuel)
  • facts about technology used and system design (requires us to briefly dig into the main types of PowerPlant design):
  • Only electricity
    • simple Rankine cycle (furnace, boiler, steam turbine); most coal fired plants (check for standard name)
    • combined cycle plant (same as before, but now with gas turbine)

The above, with the possibility of heat supply from steam turbine let-down (combined power and heat)

  • Modern cogen (combined heat and power)
    • gas turbine + boiler
    • variations thereof (fired boiler; inclusion of a (small) turbine, etc.)
  • special cogen
    • e.g. the Roca, with special technology for enriching flue gas with CO2 (fired boiler)

For any system with a steam cycle it is of interest to include

    • HP boiler steam pressure
    • condensing or non-condensing steam turbine; back-pressure or not
    • cooling method

(can be air fan, cooling tower, cooling water (fresh water, sea water))

I've converted some of the notes above into the modified templates below. What's needed still is to explicitly indicate/check which properties are attached to which types of objects (i.e. power plants vs. turbines vs. boilers). Also, as mentioned further above in this page, I think we should give people the option to enter in either aggregated (i.e. power plant level) data or specific (i.e. individual turbine level) data. Allowing data to be specified at multiple levels will lead to some duplication, and we should map out how these properties existing at different levels are related. ChrisDavis 09:19, 14 March 2011 (CET)

Modifying existing template

  • Suggested changes:
    • Add explicit dates to carbonemissions, energyoutput, and intensity
    • Add fields for turbine and boiler objects (these should be represented as distinct objects with their own properties).
    • Change power_output to power_output_electrical
{{PowerplantTest
| name=
| year_built=
| ownercompany=
| parentcompany=
| carmaId=
| fuel_type= (list of types - include amounts?)
| efficiency=
| availability=
| operating_cost=
| power_output=
| power_output_thermal=
| carbonemissions= (number + date) -> may want to generalize this into just "emissions" where we specify the pollutant, the amount, and the date (+ release medium?)
| energyoutput= (number + date)
| intensity= (number + date)
| city=
| metroarea=
| county=
| congdist=
| state=
| zipcode=
| country=
| isocountry=
| continent=
| latitude=
| longitude=
| point=
| cooling_method= (does this belong here?) - air fan, cooling tower, fresh water, sea water
| power_plant_type(?)=electricity only, modern cogen, special cogen
| number_of_turbines=
| turbines=(see turbine template below)
| boilers=(see boiler template below)
| thermodynamic_cycles(?)=Rankine cycle, combined cycle
}}

{{Turbine
| turbine_type(?)= steam vs. gas
| capacity= 
| year_built=
| (property name?)=condensing or non-condensing steam turbine
| (property name?)=back-pressure or not
}}

{{Boiler
| steam_pressure=
}}
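The draft templates above treat turbines and boilers as distinct objects attached to a power plant. The sketch below models that in Python. Field names follow the draft. The steam pressure unit (bar) and the cycle heuristic are assumptions. The plant-level outputs are stored next to the unit list rather than derived from it, in line with the "useful duplication" argument:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turbine:
    turbine_type: str                      # "steam" or "gas"
    capacity_mwe: Optional[float] = None
    year_built: Optional[int] = None
    condensing: Optional[bool] = None      # steam turbines only
    back_pressure: Optional[bool] = None   # steam turbines only


@dataclass
class Boiler:
    steam_pressure_bar: Optional[float] = None


@dataclass
class Powerplant:
    name: str
    power_output_electrical: Optional[float] = None  # plant-level aggregate, MWe
    power_output_thermal: Optional[float] = None     # plant-level aggregate, MWt
    cooling_method: Optional[str] = None
    turbines: List[Turbine] = field(default_factory=list)
    boilers: List[Boiler] = field(default_factory=list)

    def thermodynamic_cycle(self):
        """Heuristic guess from the known turbines; None if it cannot be inferred."""
        kinds = {t.turbine_type for t in self.turbines}
        if kinds == {"steam"}:
            return "Rankine cycle"
        if kinds == {"steam", "gas"}:
            return "combined cycle"
        return None


plant = Powerplant("Example Powerplant",
                   turbines=[Turbine("gas"), Turbine("gas"), Turbine("steam", condensing=True)])
print(plant.thermodynamic_cycle())
```

Such a guess should only fill a gap. The Roca example in the infobox earlier lists gas and steam turbines but answers "No" for combined cycle, so an explicitly entered value should take precedence.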

Fixing up power plant form

  • Units need to be dealt with in a consistent manner
    • need to list defaults for the properties, and other units that people may input
  • add in extra fields to accommodate sources for each piece of data.
    • Need process to guide people in entering in new references (based on Add Reference)
  • Look into things that we can use auto-completion with
    • owner, parent company, ???
  • The form needs more descriptions to guide people through what each of the fields means.

Summary of possibilities for retrieving/adding data to Enipedia

  • First (off-topic) question: is there a page (I didn't find any) where Enipedia users can ask questions to all other users? This way, when having a problem or a question, instead of asking one person, all users could answer if they know something. (This also makes me think that I don't often see indentation in discussions when users answer each other, which sometimes makes it quite difficult to go through what has been said/written... I know this is not a forum, but sometimes it adds readability.)
  • Otherwise, my question is about the various possibilities for retrieving data and adding it to Enipedia. I have gone through many pages, trying to understand what was done and how, and I've found many pages giving hints and instructions on how to do things. But I think what is missing is a page that would list and detail all the different possibilities. So far, I have understood the following:
    • some tools have been created to retrieve data from massive databases (Wiki pages, eGRID, Google Maps, OpenstreetMaps, etc)
    • SPARQL queries allow retrieving data from various endpoints (or only DBpedia for now? I'm not sure. By the way, this list of Currently Alive SPARQL Endpoints could be interesting)
    • you can convert data from Excel spreadsheets or MS Access to RDF in order to match it to Enipedia using Google Refine (or use Google Refine directly with Excel spreadsheets?), but I still haven't understood how this "matching" is then used to integrate data into Enipedia...

I've spent quite some time, reading things here and there to understand this, and maybe a page summarizing everything could be nice for new users.

  • Now a concrete question concerning all this: I've found an Excel document listing all UK oil fields (UK Oil fields approvals on gov.uk). The Excel sheets may need to be reworked a bit. As a start, I wanted to automatically create pages for each oil field. I guess a template would also be useful. But everything is still too fuzzy for me to see how to do all that.

Raph (talk) 10:15, 6 February 2013 (CET)
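Here is one way to get from a spreadsheet to RDF, sketched under assumptions. First export the sheet to CSV. Then turn each row into N-Triples, with one subject per row and one predicate per column. The namespaces and column names below are hypothetical placeholders, not Enipedia's vocabulary. Every value is kept as a plain string literal:

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces; a real import would use the wiki's own URIs.
BASE = "http://example.org/oilfield/"
PROP = "http://example.org/property/"


def literal(value):
    """N-Triples string literal with backslash, quote and line breaks escaped."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def rows_to_ntriples(csv_text, id_column):
    """One subject per row (named after id_column), one triple per non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row[id_column].strip().replace(' ', '_'))}>"
        for column, value in row.items():
            if column in (None, id_column) or not value:
                continue  # skip the id column, overflow cells and empty cells
            predicate = f"<{PROP}{quote(column.strip().replace(' ', '_'))}>"
            triples.append(f"{subject} {predicate} {literal(value.strip())} .")
    return "\n".join(triples)


sample = "Field,Operator,Approval Year\nExample Field,Example Co,1990\n"
print(rows_to_ntriples(sample, "Field"))
```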


  • We don't currently have a dedicated forum page although we're open to creating one. The discussion page on Help:Contents might be a good place to start this. Currently we just follow the RecentChanges to see if people are asking things.
  • We could certainly use a bit of help in clarifying and updating the documentation. Basically, the more you ask, the more it helps us to identify the gaps in the current documentation. There's a lot of information still in our heads and it's not always clear to us what's missing or confusing for people. Currently I've been tagging the relevant pages with Category:Documentation, although these aren't currently connected together in a clear fashion.
    • SPARQL queries should be able to retrieve data from any endpoint by using the SERVICE keyword. We've also done some work on the SparqlExtension to allow for a value for an "endpoint" parameter to be specified. With this you don't have to specify SERVICE as it runs the query directly on the remote endpoint. The limitation of this is that it may not work with some of the Google Chart visualizations. You can see a demo of these various methods in action at Mines.
    • The matching and integration of external data is something we're trying to address more systematically. With regard to matching, the discussion here describes some of the difficulties encountered and different approaches we're looking into to overcome these. In general, it's common to encounter data sets that use different names for the same thing and similar names for different things. The integration is a bit easier and generally involves using a wiki bot framework that can write to templates. There are quite a few bot frameworks out there, and we've made one for R as well. A simple way to write data to templates is by making API calls to sfautoedit (see documentation here).
      • What's probably confusing is that integration may mean different levels of things. For example, when we talk about converting data sets to RDF, this means that we then host them in the sparql endpoint as a separate named graph. We may then further integrate this data by linking unique identifiers to wiki pages that describe the same object. An example of this is the Amer Powerplant where the page contains the identifier that is in use within the EU-ETS. Based on this identifier, the graph of the emissions trend is drawn by first querying the named graph for the EU-ETS data.
  • For making pages for oil fields, see the discussion above about bots, sfautoedit, etc. Also, the links on Help:Contents to semantic forms and semantic templates will show what needs to be set up.

--ChrisDavis (talk) 12:50, 6 February 2013 (CET)
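Here is a sketch of a query using the SERVICE keyword mentioned above. The SERVICE block is evaluated at the remote DBpedia endpoint. It only works if the local endpoint supports SPARQL 1.1 federated query. The local endpoint URL is a hypothetical placeholder. The DBpedia class `dbo:PowerStation` is an assumption to verify against the DBpedia ontology. The function only builds a GET URL as described in the SPARQL 1.1 Protocol and does not send anything:

```python
from urllib.parse import urlencode

# Hypothetical local endpoint; replace with the endpoint actually in use.
LOCAL_ENDPOINT = "http://example.org/sparql"

# The SERVICE block is sent on to DBpedia by the local endpoint.
QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?plant ?label WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    ?plant a <http://dbpedia.org/ontology/PowerStation> ;
           rdfs:label ?label .
    FILTER (lang(?label) = "en")
  }
}
LIMIT 10"""


def query_url(endpoint, query):
    """GET request URL as in the SPARQL 1.1 Protocol (query in the ?query= parameter)."""
    return endpoint + "?" + urlencode({"query": query})


print(query_url(LOCAL_ENDPOINT, QUERY))
```

The URL can be fetched with `urllib.request`, setting an `Accept: application/sparql-results+json` header to get JSON results.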

Thanks for all this information. I tried things out a little, starting easy.

  • I created a test oil field page User:Raph/MyOilField
  • I created a Category:Oil_Field.
  • I also created a Template:OilFiedTest. Of course I messed up in the spelling which I took time to notice...I didn't check out the procedure to rename a page without messing up everything. I'll leave it like this for now.
  • And finally I created a Form:OilFieldTest which is based on the eponymous template.

What I did was add the Category:Oil_Field to the User:Raph/MyOilField page (but maybe I could add the category directly to the template? That way, all pages created with this template would be in this category, no?). Then I defined that the Category:Oil_Field has the default form Form:OilFieldTest. But the links between categories, templates and forms are not all clear to me yet... I did this to try out API calls to sfautoedit (since it needs to use a form and a template), and I got it to work. I now need to see how to make automatic calls to sfautoedit with the data I want to add to the pages. I feel that for now it's the easiest thing to do before looking at more complicated stuff like bots (which need more rights than simple user rights, no?). And I need to practice more with SMW features, like queries. This is new for me. Anyway, thanks for the guidelines. And please tell me if I should do things differently; I prefer learning the right way directly. Raph (talk) 13:43, 7 February 2013 (CET)
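Here is a sketch of automating sfautoedit calls, one per data row. This rests on an assumption that must be checked against the Semantic Forms documentation for the installed version: that field values are passed as `Template[field]=value` next to the `form` and `target` parameters. The API URL, field names and data are hypothetical. Login, and any edit token the wiki requires, are not shown:

```python
from urllib.parse import urlencode

# Hypothetical API location; use the wiki's own api.php.
API_URL = "http://example.org/w/api.php"


def sfautoedit_body(form, target, template, values):
    """Form-encoded POST body for one sfautoedit call.

    Assumption: fields are passed as Template[field]=value next to the
    form and target parameters; verify against the Semantic Forms docs."""
    params = {"action": "sfautoedit", "format": "json", "form": form, "target": target}
    for field_name, value in values.items():
        params[f"{template}[{field_name}]"] = value
    return urlencode(params).encode("utf-8")


rows = [{"name": "Example Field", "operator": "Example Co"}]  # hypothetical data
for row in rows:
    body = sfautoedit_body("OilFieldTest", row["name"], "OilFiedTest", row)
    print(body.decode("utf-8"))
    # Sending would be e.g. urllib.request.urlopen(API_URL, data=body),
    # after logging in with a (bot) account.
```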

What you probably want to do is to add the category to the template page, similar to what we do at the bottom of Template:PowerplantTest. We also often add to the template things like __SHOWFACTBOX__ in order to show all the properties and values contained on the page.

One thing to think about is that the approach you use in organizing/managing the data depends on things like how often the original data set is updated. For instance, in developing Enipedia, we've had to figure out which data is best stored on the wiki itself, and which data is easier to manage if it's stored as a named graph in the sparql endpoint. As a general rule of thumb, if you're piecing data together from multiple sources, then storing the data on the wiki makes a lot of sense. If you have a large data set that gets updated on a regular basis, then it makes sense to store it as a named graph in the sparql endpoint. This approach makes it much easier to just delete all the old data and import the new data. We originally tried having all data stored on the wiki, but it became a bit of an administrative challenge for these data sets that we expect to be updated.
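As a minimal sketch of the "delete all the old data and import the new data" step, the function below builds a standard SPARQL 1.1 Update request that empties a named graph and reloads it from an RDF dump. It assumes the sparql endpoint accepts SPARQL 1.1 Update and can fetch the dump URL itself; the graph name and dump URL in the usage lines are hypothetical placeholders, not Enipedia's actual ones.

```python
def named_graph_refresh_update(graph_uri: str, source_url: str) -> str:
    """Return a SPARQL 1.1 Update request that replaces a named graph's contents."""
    # DROP SILENT succeeds even if the graph does not exist yet (first import);
    # LOAD ... INTO GRAPH then fetches the RDF document and adds its triples.
    return (
        f"DROP SILENT GRAPH <{graph_uri}> ;\n"
        f"LOAD <{source_url}> INTO GRAPH <{graph_uri}>"
    )


if __name__ == "__main__":
    # Hypothetical graph name and RDF dump location, for illustration only.
    print(named_graph_refresh_update(
        "http://example.org/graph/decc_pprs",
        "http://example.org/dumps/decc_pprs.ttl",
    ))
```

Because the whole graph is dropped and reloaded, nothing from the previous import can linger, which is the administrative advantage over editing many wiki pages one by one.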

Bots are surprisingly not that difficult, especially if you're using an existing framework. Also, if you're doing automated calls to sfautoedit, you're basically running a bot already. If you look at the code for the one we developed here, it involves only a couple of POST requests. As for accounts, we usually set up separate accounts for bots, as this makes it easier to undo changes if something is wrong with the script. --ChrisDavis (talk) 18:33, 7 February 2013 (CET)
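To make the "couple of POST requests" concrete, here is a sketch that only builds the parameters for one sfautoedit call; it does not send anything. The parameter names action, form, target and query follow my reading of the Semantic Forms sfautoedit documentation, and the Template[field]=value layout of query is likewise an assumption to check against the installed version (an edit token or a logged-in bot account may also be required). The field names name and operator are hypothetical.

```python
from urllib.parse import urlencode


def sfautoedit_params(form, target, template, values):
    """Build the parameters for one sfautoedit API call (sketch, not sent here)."""
    # Field values go into a single "query" string of Template[field]=value pairs;
    # brackets are left unescaped for readability, other characters are URL-encoded.
    query = urlencode(
        {f"{template}[{name}]": value for name, value in values.items()},
        safe="[]",
    )
    return {
        "action": "sfautoedit",
        "form": form,
        "target": target,
        "query": query,
        "format": "json",
    }


if __name__ == "__main__":
    params = sfautoedit_params(
        form="OilFieldTest",
        target="AFFLECK",
        template="OilFiedTest",  # template name as currently spelled on the wiki
        values={"name": "AFFLECK", "operator": "MAERSK"},  # hypothetical field names
    )
    print(params["query"])
```

These parameters would then be POSTed to the wiki's api.php by whatever HTTP client or bot framework is used.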

I myself intended to set up entries for gas fields in the North Sea, so may I suggest that you use a more generic object that can fit different kinds of fields? Also, just importing the content of the Excel file you have won't provide much added benefit. Maybe you can enrich it with some additional facts such as those found here, as well as some production figures. Starting from there, we'll have some material to work with. There is also extensive data from Norway that could be used, as well as some summary data for the Netherlands; the main benefit of Enipedia is bringing together different data sets with a unified querying mechanism. --Nono (talk) 00:24, 8 February 2013 (CET)

I agree that it should be more generic. What I did here was more to get a better feel for how to do things (as I'm discovering a lot right now), so it was really a test. I started from the info in the Excel file because I thought it would be easier to begin with. To be honest, I had found this source, but I didn't see how to retrieve all the data. I agree there is a lot more information there than in the Excel document. I also have a question about Forms. Is it right to say that when you create a Form based on a Template, it is best for the template to be in a "final" version? Otherwise, if you modify the template, you have to modify the form manually, right? I'll change the template to a more generic one and work on it. Meanwhile, I'll also try to find out how to get all this data from the UK Gov website. --Raph (talk) 09:42, 8 February 2013 (CET)

  • No problem with that! It's just that you were moving so fast that I was afraid you would go beyond testing before I got in touch... I have set up a scraper to retrieve data from the DECC site --Nono (talk) 09:44, 11 February 2013 (CET)
Sorry about that, and thanks for the scraper. I took a look at it, and it doesn't seem too difficult, although there is some syntax that I don't understand. For the next step (converting to RDF and storing it as a named graph), I don't think I can help yet, as I don't have enough knowledge or experience. But I can try to scrape other sources that I find, and I can work on a general field template. I looked at the PowerplantTest template, which I know uses information from a named graph, but it is quite substantial and I don't clearly see where (in another template, maybe?) the sparql queries to the graphs are made. Maybe you know of a simpler template that I could take inspiration from. (Never mind, I wrote too fast. I figured out that a script runs and fills out the templates of the Powerplant pages using sparql queries, and the template is only for displaying information on the page. Now I understand the use of those arraymap and parser functions... Raph (talk) 13:31, 12 February 2013 (CET))
--Raph (talk) 10:52, 12 February 2013 (CET)
I can help you get started with mapping to RDF and set up an example. The approach I've used in the past for data such as eGRID and the E-PRTR is to use Google Refine and the RDF Refine extension. This is one of the more user-friendly approaches I've found for converting data to RDF. Google Refine is awesome as it has a lot of features for cleaning up messy data and transforming it into a more usable format. You can get an overview of some of the features by browsing Google Refine Tutorial.
In general, I would start by thinking about what types of things you would like to link to what other types of things. For example, instead of just plotting the output over time for a particular field, you may want to plot the sum of the output per month for all the fields owned by one company, with the output scaled by its ownership share. The RDF schema needs to be set up in a way that allows queries to be built that connect the dots between these different types of facts. This whole process usually takes a few iterations to get the schema right, or at least good enough. Also, from looking at the scraper data, it seems like you'll find a few issues that need cleanup. For example, in the field_info table, the AFFLECK field has operator MAERSK, while in the field_owner table the AFFLECK field has owner MAERSK OIL UK LIMITED. These types of issues don't have to be dealt with immediately, but for more sophisticated queries it may eventually be necessary to address them.
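The "sum of the output per month for all the fields owned by one company, scaled by ownership share" calculation is small enough to sketch directly. The version below works on plain Python tuples and dicts rather than on the RDF data, and the field names, company names and numbers are made up. It also shows why the MAERSK / MAERSK OIL UK LIMITED issue matters: unreconciled owner names would simply be summed as two different companies.

```python
from collections import defaultdict


def company_monthly_output(production, ownership):
    """Sum production per (company, month), scaled by each company's ownership share.

    production: iterable of (field, month, volume) tuples
    ownership:  dict mapping field -> {company: share}, shares as fractions of 1
    """
    totals = defaultdict(float)
    for field, month, volume in production:
        # Fields without ownership data contribute nothing to any company.
        for company, share in ownership.get(field, {}).items():
            totals[(company, month)] += volume * share
    return dict(totals)


if __name__ == "__main__":
    # Made-up numbers; real data would come from the field_owner and production tables.
    production = [("FIELD A", "2009-08", 100.0), ("FIELD A", "2009-09", 80.0),
                  ("FIELD B", "2009-08", 50.0)]
    ownership = {"FIELD A": {"Company X": 0.5, "Company Y": 0.5},
                 "FIELD B": {"Company X": 1.0}}
    for key, value in sorted(company_monthly_output(production, ownership).items()):
        print(key, value)
```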
--ChrisDavis (talk) 14:01, 12 February 2013 (CET)
I had already tried the Powerplant dataset reconcile example with Google Refine and failed (there was no matching at all...). But I just tried it again and it worked fine. I also installed the RDF extension. I'll take a look at the RDF export functionality; it looks like there is decent documentation and some examples. About the cleanup you mentioned, is that also done using Google Refine?
My other question: there will probably be other sources for oil fields (besides the UK Gov website Nono scraped), which means another scraper and therefore another database that will not be exactly the same... so I guess the RDF schema we set up should also be generic enough to easily add new data. Or can the schema be modified later?
--Raph (talk) 22:49, 12 February 2013 (CET)
In looking at the scraper data, I can see that Google Refine can be useful for cleaning it up by converting values such as Aug/2009 and 2009/8 into actual dates. The same can be done for locating all the measurements in feet and converting them to meters. This could also be done in the scraper, but in general, Google Refine is good for helping you to spot and get an overview of these issues. An added benefit is that it records the history of all the changes that you have made, meaning that you can repeat the process (for certain operations) whenever the data is updated.
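The same cleanup can also be scripted, which makes it easy to rerun whenever the scraper data is updated. A minimal sketch, assuming that Aug/2009 and 2009/8 are the only two month formats present (anything else raises an error so it gets noticed) and that a month-only value should become the first day of that month; the function names are hypothetical.

```python
import re
from datetime import date

MONTHS = {abbr: number for number, abbr in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}
FEET_TO_METRES = 0.3048  # exact, by definition of the international foot


def parse_month(value: str) -> date:
    """Parse 'Aug/2009' or '2009/8' into the first day of that month."""
    value = value.strip()
    m = re.fullmatch(r"([A-Za-z]{3})/(\d{4})", value)
    if m:
        month = MONTHS.get(m.group(1).title())
        if month is None:
            raise ValueError(f"unknown month name: {value!r}")
        return date(int(m.group(2)), month, 1)
    m = re.fullmatch(r"(\d{4})/(\d{1,2})", value)
    if m:
        return date(int(m.group(1)), int(m.group(2)), 1)
    raise ValueError(f"unrecognised month format: {value!r}")


def feet_to_metres(feet: float) -> float:
    """Convert a length in feet to metres."""
    return feet * FEET_TO_METRES
```

Failing loudly on unexpected formats is a deliberate choice: silently skipping them would hide exactly the kind of data issues you want to spot.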
It's good to have the RDF schema as generic as possible, although I fully expect it to evolve over time based on what you find in the different data sets. I would also suggest thinking in terms of repeatable, modular data processing steps. For example, imagine a process that collects the data from multiple ScraperWiki scrapers via API calls and then, through several steps, cleans it up and aligns it to a standard schema. As you work with more data sets, you're going to find all sorts of strange data issues and also notice things that you forgot to do or that would be nice to have. There's no one correct way to do this, but in general, it's good to make it easy to experiment, iterate and repeat the whole process of converting the raw data to cleaned/structured data.
--ChrisDavis (talk) 19:01, 13 February 2013 (CET)
I just did an initial mapping of the data to RDF, which is now available via the sparql endpoint and is browsable via the pubby interface. It's currently good enough for testing, and we'll probably figure out better ways to map the data as we try out different things with it.
  • Find all fields:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * where {
?x rdf:type <http://enipedia.tudelft.nl/data/DECC_UK_PPRS/Resource/Field> .
}
  • Find properties of all fields:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select distinct ?y where {
?x rdf:type <http://enipedia.tudelft.nl/data/DECC_UK_PPRS/Resource/Field> .
?x ?y ?z .
}
  • For all fields of a type (offshore or onshore), sum up the production per product per date:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX deccprop: <http://enipedia.tudelft.nl/data/DECC_UK_PPRS/Property/>
select ?offshoreOrOnshore ?product ?date (sum(?value) as ?totalValue) where {
?x rdf:type <http://enipedia.tudelft.nl/data/DECC_UK_PPRS/Resource/Field> . 
?x deccprop:production ?productionInfo . 
?x deccprop:offshore_onshore ?offshoreOrOnshore . 
?productionInfo deccprop:product ?product . 
?productionInfo deccprop:date ?date . 
?productionInfo deccprop:value ?value . 
} group by ?offshoreOrOnshore ?product ?date order by ?date
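These queries can also be run from a script via the standard SPARQL 1.1 Protocol (query sent as a GET parameter, results requested in the SPARQL JSON results format). The endpoint URL is not given here, so the one in the usage note is a placeholder; the sketch also assumes the endpoint supports JSON results. Note that all values come back as strings and need converting (e.g. float for ?totalValue).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results: dict) -> list:
    """Flatten the SPARQL JSON results format into one dict per result row."""
    variables = results["head"]["vars"]
    return [
        # Unbound variables are simply absent from a binding, so skip them.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list:
    """Send the query and return the rows (performs a network request)."""
    with urlopen(build_query_request(endpoint, query)) as response:
        return bindings_to_rows(json.load(response))
```

Usage would be along the lines of rows = run_query("http://example.org/sparql", query_text), with the placeholder replaced by the real endpoint address.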
The mapping is loosely based on the ideas presented here, namely that URLs used to identify objects should be human-readable so that people have an indication of what the object is describing. This whole process can be repeated by using the CSV files from the scraper and applying the OpenRefine change histories up on https://github.com/cbdavis/DECC_UK_PPRS-to-RDF.
--ChrisDavis (talk) 10:46, 14 February 2013 (CET)

The main thing with templates and forms is that you have to make sure that they are synchronized (i.e. you need to manually update them), otherwise the form may try to set values that are not mapped to a semantic property or the template may allow for properties that cannot be set by the form.
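Since the synchronization has to be done by hand, a small check can at least flag mismatches. The sketch below compares the parameter names used in a template's wikitext with the fields declared in a form definition. It assumes the simple syntax {{{name|default}}} for template parameters and {{{field|name|...}}} for form fields, a single template per form, and no nested braces inside defaults; it is a rough helper, not a Semantic Forms feature.

```python
import re

# Template parameters: {{{name}}} or {{{name|default}}} (simple defaults only).
TEMPLATE_PARAM = re.compile(r"\{\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}\}")
# Semantic Forms fields: {{{field|name}}} or {{{field|name|options...}}}.
FORM_FIELD = re.compile(r"\{\{\{\s*field\s*\|\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}\}")


def compare_template_and_form(template_text: str, form_text: str) -> dict:
    """Report fields that appear in only one of the template and the form."""
    template_params = set(TEMPLATE_PARAM.findall(template_text))
    form_fields = set(FORM_FIELD.findall(form_text))
    return {
        # The form sets these, but the template never maps them to a property.
        "in_form_only": sorted(form_fields - template_params),
        # The template expects these, but the form offers no way to set them.
        "in_template_only": sorted(template_params - form_fields),
    }
```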

For the DECC production data, I think the best approach would be to set up a scraper on ScraperWiki to run once a month and get the latest data. On enipedia, we can set up a script to convert this to RDF and store it as a named graph in the sparql endpoint. If you want to have wiki pages for each of the fields, then it's possible to pass a unique identifier for the field to a template that then constructs a sparql query. This is the approach that we use when linking power plants to their EU-ETS data. --ChrisDavis (talk) 10:16, 8 February 2013 (CET)

PowerPlant Ontology

== Company 
- name
- website

== Location
- lat
- lng
- city
- state
- country
- zip

== Powerplant # is something that has (a set of) owners and a location
- name
- website
- image
- location [[Location]]
- owner [[Company]]
- operator [[Company]]
- units [[PowerGeneratingUnit]] 

== PowerGeneratingTechnologyModel
- label
- producer

== PowerGeneratingUnit # generators; what happens when describing wind parks?
- technologies [[PowerGeneratingTechnologyModel]] # think Ford Fiesta: a model, not an individual car
- yearCommissioned
- yearDecommissioned 
- yearStartConstruction
- permitValidUntil
- totalInvestmentCost
- investmentPerCapacityMW
- totalInvestmentEur
- operatingAndMaintenanceCostsPerYearEur
- capacityMW
- fuelConsumption [[FuelConsumption]]
- emissions [[Emission]]
- efficiency
- electricityProduction [[ElectricityProduction]]
- heatProduction [[HeatProduction]]

== ElectricityProduction
- year
- amountMWh

== HeatProduction
- year
- amountGJ
- type (e.g. district heating, industrial heating, agro heating, high-temp, low-temp)

== Emission
- class (e.g. CO2, SO2, NOx, dust)
- amountTon
- year


== FuelConsumption
- fuelType [[FuelType]]
- amountGJ
- year

== FuelType
- class (e.g. coal, gas)
- energyDensity
- purity
- co2Intensity or carbonContent
- name (e.g. Polish dirty coal)
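To check that the outline hangs together, it can be read as a set of record types. The sketch below uses Python dataclasses as a stand-in for the wiki templates and properties: names are converted to snake_case, only a subset of the classes is shown, and the two methods are illustrative additions. They show how plant-level totals could be derived from unit-level data instead of being stored twice.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Company:
    name: str
    website: Optional[str] = None


@dataclass
class Location:
    lat: float
    lng: float
    city: Optional[str] = None
    country: Optional[str] = None
    zip_code: Optional[str] = None


@dataclass
class FuelType:
    name: str                  # e.g. "Polish dirty coal"
    fuel_class: str            # "class" in the outline: coal, gas, ...
    energy_density: Optional[float] = None
    co2_intensity: Optional[float] = None


@dataclass
class FuelConsumption:
    fuel_type: FuelType
    amount_gj: float
    year: int


@dataclass
class ElectricityProduction:
    year: int
    amount_mwh: float


@dataclass
class PowerGeneratingUnit:
    capacity_mw: float
    year_commissioned: Optional[int] = None
    year_decommissioned: Optional[int] = None
    efficiency: Optional[float] = None
    fuel_consumption: List[FuelConsumption] = field(default_factory=list)
    electricity_production: List[ElectricityProduction] = field(default_factory=list)


@dataclass
class Powerplant:
    name: str
    location: Location
    owners: List[Company] = field(default_factory=list)
    operator: Optional[Company] = None
    units: List[PowerGeneratingUnit] = field(default_factory=list)

    def capacity_mw(self) -> float:
        """Plant capacity derived by summing the unit capacities."""
        return sum(unit.capacity_mw for unit in self.units)

    def electricity_production_mwh(self, year: int) -> float:
        """Plant output for one year, summed over all units."""
        return sum(p.amount_mwh for unit in self.units
                   for p in unit.electricity_production if p.year == year)
```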


New Power plant ontology

This is still in progress. We still need to finalise the details discussed during the previous meeting and work out dependencies in data that may be distributed at different levels (e.g. total power output can be calculated by summing the output of individual generators).

  • Wikipedia has different templates for different types of power plants
  • to date, the PowerPlant Ontology is designed for thermal power plants
  • the question is whether this template is also suitable for power plants other than fossil-fuel-fired ones:
    • nuclear power plants
    • concentrated solar power plants
    • for these two types, the answer appears to be yes, as both these plants essentially employ a different heat source to drive a traditional Rankine cycle (steam turbine, condenser)
    • we may want to add an ontology for the type of nuclear reactors later
  • what about wind farms and photovoltaic solar plants?
  • upon closer inspection of the ontology
      • it does include the energy conversion devices
        • fuel to heat to power/heat (gas turbine)
        • heat to steam (boiler)
        • steam to power (steam turbine)
      • it does not include
        • the heat source
        • which is a nuclear reactor, solar power concentrator, or the traditional furnace where some fuel is burned
        • or part of the equipment, i.e. the combustion chamber of the gas turbine
      • nor does it include the method of cooling
        • this can be a cooling tower (requires limited freshwater input), direct cooling (lots of fresh water or salt water), or air cooling (no water required, massive cooling banks, lower efficiency)
  • GerardDijkema 23:04, 28 March 2011 (CEST)

Refactoring / what needs to happen

  • The current Powerplant template doesn't need to be changed much. Most of the refactoring will be done on these properties:
    • carbonemissions2000, carbonemissions, carbonemissionsnextdecade
    • energyoutput2000, energyoutput, energyoutputnextdecade
    • intensity, intensity2000, intensitynextdecade
  • Change power_output_thermal to power_rating_thermal
  • Split efficiency into electricalEfficiency and thermalEfficiency

The current values for these should be left in place while we refactor the ontology, in order to avoid breaking the current queries. Before deleting them, we need to set up tools that can find all the queries across the site so that we can update them to match the new data structure as well.
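One rough way to find the affected queries is to scan the wikitext of each page for {{#ask: ...}} calls that mention a legacy property, either as a [[property::...]] condition or as a ?property printout. The sketch below assumes the property names are spelled exactly as in the list above (the real property page names may differ). Its brace matching is approximate, and fetching the page texts is left out.

```python
import re

# Legacy property names from the refactoring list (exact wiki spelling assumed).
LEGACY_PROPERTIES = [
    "carbonemissions2000", "carbonemissions", "carbonemissionsnextdecade",
    "energyoutput2000", "energyoutput", "energyoutputnextdecade",
    "intensity", "intensity2000", "intensitynextdecade",
]

ASK_START = re.compile(r"\{\{\s*#ask\s*:", re.IGNORECASE)


def ask_queries(wikitext: str):
    """Yield each {{#ask: ...}} call, tracking nested {{ }} pairs (approximate)."""
    for match in ASK_START.finditer(wikitext):
        depth, i = 0, match.start()
        while i < len(wikitext):
            if wikitext.startswith("{{", i):
                depth += 1
                i += 2
            elif wikitext.startswith("}}", i):
                depth -= 1
                i += 2
                if depth == 0:
                    yield wikitext[match.start():i]
                    break
            else:
                i += 1


def legacy_properties_used(wikitext: str) -> list:
    """Return legacy properties referenced as [[prop::...]] or ?prop inside #ask queries."""
    found = set()
    for query in ask_queries(wikitext):
        for prop in LEGACY_PROPERTIES:
            # The name must be followed by a delimiter, so "intensity" does not
            # match inside "?intensity2000".
            pattern = r"(?:\[\[|\?)\s*" + re.escape(prop) + r"\s*(?:::|[|=\]}]|$)"
            if re.search(pattern, query, re.IGNORECASE | re.MULTILINE):
                found.add(prop)
    return sorted(found)
```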

We also need to add in references for each of the properties, and this will be done by running a bot over all of the pages, and filling in the appropriate reference to carma.

More templates and forms need to be set up to match the new structure as well.

Visualization of structure

This has been moved to Power Plant Data Structure Visualization by GerardDijkema 14:27, 31 March 2011 (CEST)

Notes

  • remove parentcompany of ownercompany
  • no need for metroarea, county, congdist, state, isocountry, continent
    • why is there no need? If people input this, it would be useful for searches
    • alternatively, it could be somehow populated based on lat and longitude
    • I would say this is a low-priority item to change or fix; GerardDijkema 22:30, 28 March 2011 (CEST)
    • check to see how carma uses this - duplicate info, or additional info? ChrisDavis 16:55, 25 March 2011 (CET)
  • for location we don't need a separate object, but we need: city, address, zip, country, lat & longitude
  • we need city with country
    • for now we should specify the country on each of the power plant pages. Otherwise, we have to create a page for every city and then explicitly mention which country it is in. We would also have to disambiguate cities with the same name in different countries. ChrisDavis 16:55, 25 March 2011 (CET)
  • we need country obj with: iso, continent
  • in aggregated put total capacity (figure out power_output vs capacity vs electricityProduction)
    • do you mean name_plate capacity here? GerardDijkema 22:32, 28 March 2011 (CEST)
  • totalInvestment, delete totalInvestmentCost
  • year ranges - do we need it?

Refactoring Powerplant Template & Form

  • Forms
    • Form:PowerplantTest
      • if you type the exact name of an existing page, Enipedia displays a warning - OK
      • what about nearly the same name - can some autocomplete function be implemented to avoid people inputting multiple instances of the same power plant?
        • Autocomplete has been installed on the template. OK GerardDijkema 10:16, 29 March 2011 (CEST)
      • or, in the final version, display a warning: you are inputting a new power plant; please confirm that you have searched the database and that a record does not exist yet
      • or check by address, zip code or coordinates (later on, with a bot or something else?)
      • the second one is the pragmatic solution; it may require a search for power plants by name, location (town) or company (automatic popup?)
      • GerardDijkema 23:12, 28 March 2011 (CEST)
      • with the Autocomplete added, the remaining items are low priority. I suggest putting a short text on the PowerPlant edit Form, though, such as: "If you are about to add a NEW powerplant to the database, please check that an entry does not already exist".
        • I've just updated this ChrisDavis 14:07, 29 March 2011 (CEST)
    • Form:Power Conversion Unit
  • Templates
  • Features - general details meant to help with the workflow (useful for documenting extelligence process)
    • if no wikipedia page specified for power plant, a link to the wikipedia search for that power plant name is automatically generated
    • references are specified at the page level, with a field that can document notes that people have on it.
    • coordinates generated by moving marker on the map
  • Before deploying:
    • search & replace PowerplantTest with Powerplant.
    • Need to go through and mark relevant emission pages with "Category:Pollutant"
    • Fixing up old Powerplant template:
      • migrate from the use of "lat" and "long" to just "point" - the new form supports moving the marker on a map, and the coordinates are returned as a single value not as distinct lat & long values (See here for an example).
      • some of the older fields in the template should be left in until everything has been safely migrated to the new schema
      • Should the "name" field just be PAGENAME? I don't know why we need this distinction.
  • impressive piece of work
    • - from reading this, current status is work in progress
    • UPDATE 13:48, 4 April 2011 (CEST) current status is beta-testing!! GerardDijkema
      • -- deadline (ready to rock'n roll - April 2nd, 2011) still feasible?? GerardDijkema 22:34, 28 March 2011 (CEST)
      • I think ready to go by morning of April 4th is feasible, with regard to editing information about Dutch power plants.
        • Current TODOs:
          • assign default values & units where possible
          • check mapping between forms and semantic properties (if you enter it, does it show up?)
          • check ontology discussions above - not all of the ontology has been implemented yet in the forms/templates
          • finish aligning the templates so the way they display data is consistent with the way that it's displayed in the forms.
          • run a bot to refactor the data on existing pages.
          • ChrisDavis 11:45, 31 March 2011 (CEST)
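For the migration from "lat" and "long" to "point" listed above, a bot edit could look roughly like the sketch below. It assumes the template call uses parameters literally named lat, long and point, and that the map input accepts a decimal "lat, long" string. Both assumptions should be checked against the actual templates. The legacy values are kept, in line with the note that older fields stay until the migration is safe.

```python
import re

LAT = re.compile(r"\|\s*lat\s*=\s*([-+]?\d+(?:\.\d+)?)\s*(?=\||\}\})")
LONG = re.compile(r"\|\s*long\s*=\s*([-+]?\d+(?:\.\d+)?)\s*(?=\||\}\})")
POINT = re.compile(r"\|\s*point\s*=")


def add_point_parameter(wikitext: str) -> str:
    """Add |point=lat, long before the legacy lat/long parameters (idempotent)."""
    lat, lng = LAT.search(wikitext), LONG.search(wikitext)
    if lat is None or lng is None or POINT.search(wikitext):
        return wikitext  # nothing to migrate, or already migrated
    point = f"|point={lat.group(1)}, {lng.group(1)}\n"
    # The legacy lat/long values stay in place until all queries use "point".
    return wikitext[:lat.start()] + point + wikitext[lat.start():]
```

Making the edit idempotent means the bot can safely be rerun over all pages after an interruption.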

Status as of April 3rd 2011

  • Power plants in the Netherlands are using the new templates
  • Known issues:
    • You may have to click on refresh in order to see the latest data on the page (click on the downward pointing triangle next to "View history", and then on "Refresh").
    • Some of the queries on other pages (like Netherlands) reference older "legacy" data that is still hidden on the pages. This is done to prevent breaking the site during the transition. The new form directly edits data set up using the new schema. To view the "legacy" data you have to look at the raw view of the page. Once it's clear that everything has transitioned ok, we can switch the queries, and clean up the old data.
    • Still need to set up templates so that by clicking on a red link for a type of technology (i.e. turbine, boiler, etc) you are redirected to a form where you create a page and fill in properties of that technology.
    • "Availability" needs to be refactored into the new schema. I believe that this is only specified for the Netherlands.
    • More work needs to be done on specifying unit conversions.
    • ChrisDavis 13:47, 3 April 2011 (CEST)
  • Conversion of units:
    • TODO - on the main page for a power plant, the efficiency is listed. The Eems Powerplant, for example, shows 0.5 %. Has the conversion gone wrong, or do we use another unit? The real efficiency is 55% (or 0.55) GerardDijkema
    • TODO - I think this needs to be corrected for all the powerplants. The figure needs to be multiplied by 100 to get the percentage. I will not manually edit. GerardDijkema 20:26, 4 April 2011 (CEST)
      • FIXED - I ran a bot script to fix all the values. Admin 23:57, 4 April 2011 (CEST)
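The bot script used for this fix is not shown here. The following is a hypothetical sketch of the same kind of correction applied to page wikitext. It assumes the value is stored in a template parameter named efficiency. It treats values of 1 or less as fractions, which makes the script safe to rerun but would mis-scale a genuine efficiency of 1 % or less, so such values would need a manual check.

```python
import re

EFFICIENCY = re.compile(r"(\|\s*efficiency\s*=\s*)(\d+(?:\.\d+)?)")


def efficiency_to_percent(value: float) -> float:
    """Treat values <= 1 as fractions (0.55 -> 55.0); larger values are left as-is."""
    # Rounding removes floating-point noise such as 55.00000000000001.
    return round(value * 100, 2) if value <= 1.0 else value


def fix_efficiency(wikitext: str) -> str:
    """Rewrite |efficiency=... template values from fractions to percentages."""
    return EFFICIENCY.sub(
        lambda m: m.group(1) + f"{efficiency_to_percent(float(m.group(2))):g}",
        wikitext,
    )
```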

User experience on/after April 3rd, 2011

  • OK, it all seems to work, I am starting to upgrade the information on Dutch Powerplants
  • This really is a serious activity -- need to document a lot of things and keep track of sources
  • Make sure to add suitable references to the system - this is conveniently located at bottom of page
  • The edit pages are rather long. Can the save page button be located somewhere at the top too?
    • save page buttons can be included many times throughout the page. We may also want to look into setting up tabbed forms to break up the sections. ChrisDavis 19:12, 3 April 2011 (CEST)
  • I am not quite sure what to make of the label entry on the PCU page, Gas Turbine page etcetera
    • I suggest that we take it off. The intention was that this would be something like the "official name", but I think we should be ok just by using the name of the wiki page. ChrisDavis 22:45, 3 April 2011 (CEST)
  • on Gas Turbine page it has a field turbine type
    • this still needs to be developed, for future use?
  • The autocomplete search box at the top is really awesome to navigate the site / database of powerplants!
  • also the option to shift the coordinates works pretty smooth!!
  • hmmm, I was working on E.ON's Maasvlakte plants, see below. It turns out it is not straightforward (to me) to create a new page / database entry for a new power plant.
    • and of course, it should assist the user and prevent duplicate entries
    • GerardDijkema
  • working on Electrabel's Bergum Powerplant, the Google Earth display seems to work incorrectly: when clicking the - button, it sometimes zooms out to the whole world.
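To help prevent duplicate entries when a new power plant page is created, a form or bot could first look for existing pages with a similar name. A minimal sketch using Python's difflib: names are compared case-insensitively, with underscores treated as spaces and a trailing "Powerplant"/"power plant" ignored. The normalisation rules and the 0.8 cutoff are assumptions to tune, not an existing Enipedia feature.

```python
import difflib

SUFFIXES = (" powerplant", " power plant")


def normalise(name: str) -> str:
    """Lower-case, treat underscores as spaces and drop a trailing 'powerplant'."""
    name = name.lower().replace("_", " ").strip()
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name.strip()


def possible_duplicates(new_name: str, existing_names, cutoff: float = 0.8) -> list:
    """Return existing page names whose normalised form is close to new_name."""
    by_key = {}
    for name in existing_names:
        by_key.setdefault(normalise(name), []).append(name)
    close = difflib.get_close_matches(normalise(new_name), list(by_key), n=5, cutoff=cutoff)
    return [name for key in close for name in by_key[key]]
```

For example, entering "Maasvlakte power plant" would point the user to the existing Maasvlakte Powerplant page before a second one is created.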

Comments, Bugs, Suggestions after April 3rd 2011

  • This section documents results from first using the new template forms. The idea is to get everything working as swiftly and intuitively as possible, identify annoying bugs etc., before going to engage a wider audience
  • Secondary objective, of course, is to get the representation of Dutch power plants as up to date as possible
  • Split Documenting into Top, Middle and Tail for now, as this already has grown into a long list. Good to keep track of what has happened though for future documentation and publication.

Documenting Top

  • Using: I found that when inputting Power Conversion Unit, I need to create entries for Boiler, Gas Turbine etc. I created Generic Gas Turbine and Boiler.
    • on the Power Conversion Unit page, the entry for NamePlate Capacity is not at the right place. It should be on the left, under the box year commissioned etc.
    • Oops, a power plant retrofitted with a gas turbine has two heat sources (the furnace and the gas turbine). Also, in principle, these can use different types of fuel.
      • solved this - by using Ctrl-button, you can select more than one heat source GerardDijkema 21:07, 3 April 2011 (CEST)
    • gas turbine combined cycle has one heat source (it only has a boiler to recover heat from exhaust gases)
    • what is good practice, when we know they are there, but we still do not know which ones?
    • Using: / Bug: when you have created a new Boiler page or Gas Turbine page, you cannot go back to the original page you were editing
      • This also happens when adding a reference
      • TODO - What I propose to fix this is that we (1) take off the "create new" link from the form, (2) let people still fill in the name of the technology, and (3) once they press save, they're presented with red links that they can click on to go straight to the appropriate form (need to use this) ChrisDavis 19:31, 3 April 2011 (CEST)
    • Using: / Bug: - clicking Show Preview on a PCU page makes Enipedia keep loading... forever?
      • Maybe a bug, although I'm running a bot right now that is updating the rest of the pages. I just tried stopping it, and the Show Preview feature was responsive. ChrisDavis 19:10, 3 April 2011 (CEST)
    • Bug: - when taking out the comment text going with a reference, the listing of the reference on the PCU page display disappears
  • Using: / Suggestion: We should discuss how to deal with references when we have underlying PCU pages for a powerplant. Do we migrate/collect these underlying references on the main page for a powerplant?
    • TODO - I think we should redisplay them on the main power plant page via a query. For example, we could show a list of all the references used, next to the various objects that use that reference. ChrisDavis 22:43, 3 April 2011 (CEST)
  • Using: Multiple units and multiple dates of building
    • TODO - The year built on a power plant page is somewhat ambiguous. Many sites are with multiple plants.
    • Suggestion: put an explanation that this date signifies date first plant was (ever) built on site
    • TODO - And list the PCU's with their name, rated capacity and date built on main page per power plant
    • above is really a feature request -- to be discussed and decided later
      • This isn't so difficult to do ChrisDavis 12:00, 4 April 2011 (CEST)
    • TODO - Need to display units for emissions, power output, etc. Initial prototype is at User:ChrisDavis/Test
    • TODO - If a new PCU is created with an empty title, a title needs to be auto-generated.
    • TODO - take out "label" field, just use page name.
    • TODO - for multiple select lists, mention that you can Ctrl + click to select multiple
    • TODO - work through the workflow where users may not know the type of technology used (generator, boiler, turbine, etc.) but wish to indicate its presence. Use checkboxes?
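For the TODO about auto-generating a title for a PCU created with an empty title, one simple option is to pick the lowest unused number under the power plant's page. The "<plant>/Unit N" naming pattern here is hypothetical. Existing subpages use names such as Maasvlakte Powerplant/MPP-2, which this sketch leaves alone.

```python
import re


def next_unit_title(plant: str, existing_titles, prefix: str = "Unit") -> str:
    """Return '<plant>/<prefix> N' using the lowest N not already taken."""
    pattern = re.compile(re.escape(f"{plant}/{prefix} ") + r"(\d+)$")
    used = set()
    for title in existing_titles:
        match = pattern.match(title)
        if match:
            used.add(int(match.group(1)))
    n = 1
    while n in used:
        n += 1
    return f"{plant}/{prefix} {n}"
```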

Documenting Middle

  • Suggestion: for the Amer units (A8 and A9), you can see that there has been progress in boiler/steam conditions, and thus in efficiency. It would be great to be able to list steam pressure and temperature on the PCU page, rather than have them listed as conditions of the boiler (otherwise, we are forced to create a new boiler page for each PCU)
  • look at the free text data on Maasvlakte MV-1 - we should think of how to capture the history of PCU's, as they are being revamped, equipped with flue gas cleaning facilities etc.;
    • extend the technology sheet with flue gas cleaning facilities/technologies
    • create the possibility of multiple instances of a single PCU, time stamped
    • otherwise? GerardDijkema 14:34, 4 April 2011 (CEST)
  • Using: working via Power Plants in the Netherlands, you get to Maasvlakte Powerplant.
    • However, this is not listed as an E.ON plant. If you type E.ON in the search box, it does not show up
  • Using: / Suggestion: / Bug:(correcting/updating entries in the database): the name E.on Bv is wrong. But correcting it appears to break the query on E.on's power plants
    • TODO What should the name be instead? I want to test this out. ChrisDavis 16:29, 4 April 2011 (CEST)
    • The correct name is E.ON Benelux N.v. see http://www.eon-benelux.com/ GerardDijkema
      • See issue below. I've renamed it, although I need to fix our current software setup to better deal with redirects. ChrisDavis 00:58, 5 April 2011 (CEST)
    • It would also be interesting to see what happens if we redirect Maasvlakte Powerplant to E.ON MPP1 & MPP2. GerardDijkema 20:21, 4 April 2011 (CEST)
      • TODO I need to fix the SparqlExtension/triplestore to robustly deal with redirects. Unlike TWiki, MediaWiki doesn't update the links on the pages when you rename the page. This leads to problems with queries because the old page and the new page look like two distinct objects. There are a few options to fix this, and I need to figure out what's best. ChrisDavis 00:55, 5 April 2011 (CEST)
        • See bug report. This is a Semantic MediaWiki issue, which I'm talking to the developers about. ChrisDavis 00:03, 8 April 2011 (CEST)
    • there are entries E.ON MPP1 & MPP2 and E.ON MPP3 -- we need to align these with Maasvlakte Powerplant
    • suggest we have a site E.ON Maasvlakte-I
    • then on the site three power plants
    • discussion needed I believe.
    • For now, I suggest we create a new powerplant, E.ON Maasvlakte MPP-3
  • Using: / Suggestion: can we redirect a PCU to a new power plant (e.g. if people make mistakes)? From http://www.eon-benelux.com/eonwww/publishing.nsf/Content/Type+centrale you can see that it really is a new plant, although it shares some facilities with the existing plant.
    • this question is related to the previous one
      • What do you mean by this exactly? For a PCU you can easily specify that it's part of a different powerplant via the form (although you should rename it as well to avoid any confusion - not sure if there's a way to automate this). If you mean specifying that a PCU is actually a powerplant, this is a bit more difficult, since you're trying to copy data between objects with somewhat different schemas. It can be done, but a very precise user manual would have to be written to explain how to fix this. ChrisDavis 16:37, 4 April 2011 (CEST)
      • for me this about avoiding duplicate entries existing in the database for facilities where there exists only one physical instance. But when inputting data, creating pages, people may make mistakes, or find better ways to organize the data. For example, an incompletely informed user may presume that MPP-3 is a new PCU added to E.ON's Maasvlakte Powerplant. A more informed user, however would immediately realize that MPP-3 is actually better input in the database as a new power plant, because that is what it effectively is. GerardDijkema 20:12, 4 April 2011 (CEST)
    • and I would prefer to walk through these edits, redirects, fixing the database etc. together to develop a curation strategy and manual GerardDijkema
  • Bug: I created Maasvlakte Powerplant/MV-1 and subsequently redirected it to Maasvlakte Powerplant/MPP-1
    • the result was a redirect (fine), but on the Maasvlakte Powerplant page, the PCU now has disappeared GerardDijkema
    • TODO - A bug report has been filed, and this is a problem with Semantic MediaWiki where if you move a page, the underlying semantic data is not transferred unless you save the page you have just moved. ChrisDavis 08:42, 13 April 2011 (CEST)
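To illustrate the redirect problem (the old and the new title look like two distinct objects), here is a hypothetical client-side workaround: given a map of redirects, facts stored under an old title are grouped under the page it redirects to. This is only a sketch of the idea, not the fix planned for the SparqlExtension or Semantic MediaWiki. The property names and values in the test are placeholders.

```python
def resolve_redirect(title: str, redirects: dict, max_hops: int = 10) -> str:
    """Follow redirect chains to the final page title, stopping on loops."""
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title


def group_by_canonical_page(facts, redirects):
    """Group (page, property, value) facts under the page each title redirects to."""
    grouped = {}
    for page, prop, value in facts:
        grouped.setdefault(resolve_redirect(page, redirects), []).append((prop, value))
    return grouped
```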

Documenting Tail (newest)

  • Bug: inputting electrical efficiency on PCU page does not return a result on display. See Maasvlakte Powerplant/MPP-2
  • Bug: on the Netherlands page, the biggest power plants are displayed. However, after the edits & additions of this afternoon on the Amer Powerplant, this one has disappeared from said listing. But it is still the second largest in MWh produced! Also, a refresh to force the query to execute does not bring back the plant in said listing. What has happened here? GerardDijkema 20:33, 4 April 2011 (CEST)
    • FIXED This is an issue that arose when I was initially testing the bot script, which was first run only on the Amer Powerplant. The diff showing the bot edit that broke things is here, and the code that caused the problem was fixed before running it over the rest of the pages. ChrisDavis 08:57, 5 April 2011 (CEST)
    • FIXED ChrisDavis 08:35, 13 April 2011 (CEST) This is actually a deeper issue with the Semantic Forms extension. I'm talking to the developer of the extension to figure out what's happening. ChrisDavis 23:24, 7 April 2011 (CEST)
    • hmm, I updated the page, but now Eems Powerplant has disappeared. Same story, or a more fundamental problem? GerardDijkema 10:13, 5 April 2011 (CEST)
      • FIXED - Same story & a more fundamental issue. Some data was removed by initial bot testing, and the queries on Netherlands/Powerplants also needed to be improved. ChrisDavis 11:04, 5 April 2011 (CEST)
  • Suggestion: / Feature request: or discussion -- natural gas
    • Given Alfredas' ambitions, I believe we should somehow decompose natural gas
    • Reason: the four power plants I have looked at so far each use a special variety of natural gas
      • Eems Powerplant uses Ekofisk natural gas
      • Velsen Powerplant uses Dowson gas, blast furnace gas and coke oven gas
      • most other plants in the Netherlands will use some form of high-caloric gas
      • in the future we will also have LNG-derived natural gas
    • GerardDijkema 21:24, 4 April 2011 (CEST)
  • Using: / Suggestion: - Enipedia confronts you with a rather busy home page. If we want to ship the power plant database out to the world, should we not provide a very simple page as an entry point, à la Google: just a search box, "Search Power Plant"? GerardDijkema 10:06, 5 April 2011 (CEST)
  • Bug: For the treemap on Netherlands/Powerplants, Amer Powerplant is listed twice because it is specified as using both hard coal and biomass as fuel types. We need to figure out how to change the query in an intelligent way. It currently assumes that a power plant has a single fuel type, to which its total output is attributed. If there are multiple fuel types, this gets more complicated, since we have to look at fuel consumption (and make calculations involving efficiency to work out the total output per fuel type). Alternatively, we could look at the PCUs (which may not always be specified). This situation occurs for 179 power plants, and I believe it comes directly from the Carma data, since these fuel types have been specified on the wiki since last year and cover some plants outside the Netherlands. ChrisDavis 11:32, 5 April 2011 (CEST)
    • Many PCUs have a primary fuel and a secondary fuel. In most cases, the secondary fuel is only for backup/emergency situations, but in some cases the use of the secondary fuel is structural. Some units even use a tertiary fuel.
      • Essent mentions on its website that Amer A8 is co-firing biomass, probably at a standard (no-risk) 10-15%, while for the A9 it states that it is already co-firing 35% biomass. However, it is not clear (and should be checked) whether this figure is indeed on an MW fuel-input basis rather than by weight.
      • After the overhaul, A9 will move to 50% biomass firing. The A8 and A9 are started on light oil (to heat up the equipment smoothly, avoiding the massive emissions that badly burnt coal and biomass would cause).
    • Suggestion: include the possibility of multi-fuel plants, indicating the amount of each fuel (as a fraction of thermal input or output, or as a power rating), in acceptable and consistent units. GerardDijkema 22:31, 5 April 2011 (CEST)
    • Interim suggestion - for the visualization, could we group multi-fuel power plants together? Not sure if this would cause any problems. ChrisDavis 13:21, 11 April 2011 (CEST)
  • Using: Using one of the new queries (plants with no location data), I looked up the location of Slufterdam West. However, it is not straightforward to enter its coordinates (latitude: 51.5525, longitude: 3.5926) using "edit with form". Also, on the Nuon / Vattenfall site, the map is marked with icons. Maybe we could do the same (e.g. using the icons from our ABMs). GerardDijkema 17:00, 12 April 2011 (CEST)
  • TODO - remove the "name" field and just use the page name (applies to the Powerplant and PCU templates and forms; also to others like boilers, etc.?). Also add the number of turbines to the Powerplant template and form. ChrisDavis 20:40, 12 April 2011 (CEST)
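One way to stop multi-fuel plants from being counted once per fuel in the treemap is to split each plant's output across its fuels before summing. The sketch below is a minimal illustration, not the query Enipedia uses: it assumes each plant record carries a total output in MWh and a hypothetical `fuel_shares` mapping (share of thermal input per fuel), and it assumes all fuels are converted at the same efficiency, so output is split in proportion to input.

```python
from collections import defaultdict


def output_by_fuel(total_output_mwh, input_shares):
    """Split a plant's total output across its fuels.

    input_shares maps fuel type -> share of thermal input (any positive
    weights; they are normalised here). ASSUMPTION: every fuel is converted
    at the same efficiency, so output is split in proportion to input.
    """
    total_share = sum(input_shares.values())
    if total_share <= 0:
        raise ValueError("fuel input shares must sum to a positive number")
    return {
        fuel: total_output_mwh * share / total_share
        for fuel, share in input_shares.items()
    }


def treemap_totals(plants):
    """Sum output per fuel type so that each plant's output is counted once."""
    totals = defaultdict(float)
    for plant in plants:
        split = output_by_fuel(plant["output_mwh"], plant["fuel_shares"])
        for fuel, mwh in split.items():
            totals[fuel] += mwh
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical plants; the numbers are illustrative only.
    plants = [
        {"name": "Plant A", "output_mwh": 1000.0,
         "fuel_shares": {"Hard Coal": 0.65, "Biomass": 0.35}},
        {"name": "Plant B", "output_mwh": 500.0,
         "fuel_shares": {"Natural Gas": 1.0}},
    ]
    print(treemap_totals(plants))
```

If per-fuel efficiencies become available, the proportional split could be replaced by "fuel input times efficiency" for each fuel; the interim grouping idea would instead keep one combined label such as "Hard Coal + Biomass" per plant.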

Documenting (new tail) - even newer

  • Using: While working on Electrabel's power plants, I found that after updating the geographical location of the Bergum plant, the query on Electrabel's power plants is now erroneous (Bergum appears in the list, but not on the map)
    • also, it is unclear whether we need to enter 800, 800MW or 800 MW for nameplate capacity
    • naming these facilities is confusing; we need to think about it
    • duplicate names Electrabel Nederland and Electrabel Nederland NV, which is part of the PoR industry database. We need to find out how to merge them (automatically?) or otherwise correct this.
    • On the Harculo centrale page, the coordinates are interpreted, but the page indicates that something is wrong with them
    • Something wrong with coordinates throughout the database?
    • GerardDijkema 16:40, 14 April 2011 (CEST)
  • The Avi Amsterdam Powerplant is listed on User:ChrisDavis/PowerplantQualityCheckQueries as one of the plants with no fuel type listed. However, its page does list a fuel type! GerardDijkema 16:40, 14 April 2011 (CEST)
    • This was an issue stemming from a time when there was an error in the triplestore database (found during a demo), and to get things working we swapped it out with a slightly older copy of the database. We now have a script that creates a fresh copy of the triplestore database every night, so this will be less of an issue. ChrisDavis 13:34, 19 April 2011 (CEST)
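The questions about capacity units and suspicious coordinates could be handled by normalising and checking form input before it is saved. The sketch below is a hypothetical helper, not part of Semantic Forms: it assumes a bare number means MW, and the Netherlands bounding box is an approximate, illustrative assumption used only to flag coordinates that fall outside the expected region or look swapped.

```python
import re

# Accepts "800", "800MW", "800 MW", "1.2 GW", "950 kW" (case-insensitive).
_CAPACITY_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(kw|mw|gw)?\s*$", re.IGNORECASE)
_FACTOR_TO_MW = {"kw": 0.001, "mw": 1.0, "gw": 1000.0}

# Hypothetical, approximate bounding box for the Netherlands:
# (min_lat, min_lon, max_lat, max_lon) in decimal degrees.
NL_BBOX = (50.7, 3.3, 53.6, 7.3)


def parse_capacity_mw(text):
    """Normalise a nameplate capacity string to a float in MW.

    ASSUMPTION: a bare number without a unit is taken to be MW.
    """
    match = _CAPACITY_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised capacity: {text!r}")
    value, unit = match.groups()
    return float(value) * _FACTOR_TO_MW[(unit or "mw").lower()]


def check_coordinates(lat, lon, bbox=None):
    """Return a list of problems found with a latitude/longitude pair."""
    problems = []
    if not -90.0 <= lat <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= lon <= 180.0:
        problems.append("longitude out of range")
    if bbox is not None and not problems:
        min_lat, min_lon, max_lat, max_lon = bbox
        inside = min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
        # If the point fits the box only with lat/lon exchanged, flag a swap.
        swapped = min_lat <= lon <= max_lat and min_lon <= lat <= max_lon
        if not inside:
            problems.append("latitude/longitude look swapped" if swapped
                            else "outside the expected region")
    return problems
```

For example, the Slufterdam West coordinates mentioned earlier (51.5525, 3.5926) pass the check, while the same pair entered the other way round is reported as swapped.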

Glitches in the (Carma) database

  • when you start typing Ijmuiden in the search box, you will find Ijmuiden mekogessor powerplant. I believe this plant does not exist (any more). Googling "mekogessor" yields no hits other than Enipedia and Carma
  • maybe Ijmuiden works and Ijmuiden UNA Power plant are the same plant?

Resources

Bug reports submitted, based on issues found while working on Enipedia

  • Semantic MediaWiki does not properly export RDF data from pages that have just been moved. This issue affects the SparqlExtension.
  • Semantic Forms - when multiple templates are on a page, fields from one template may be copied into other templates.
    • This is halfway fixed - The developer has updated the code in svn, which fixes the problem, but introduces an issue where, once you save the page using the form, the map fails to load due to a JavaScript error. Refreshing (purging) the page gets around this. I expect this issue to be fixed within a day or so as well. See here for the latest discussion around this bug. ChrisDavis 12:40, 9 April 2011 (CEST)
    • Can anyone reproduce this bug with the maps not loading? I'm not running into this issue any more. ChrisDavis 10:57, 18 April 2011 (CEST)
  • http://bugs.librdf.org/mantis/view.php?id=436 (referred to 4store & posted to http://groups.google.com/group/4store-support/browse_thread/thread/4fbc9b18cd8e8474?pli=1) - SPARQL queries performed using the 4store triplestore return incorrect results if large numbers are not specified in scientific notation. This bug resulted in calculated negative outputs for some power plants. It doesn't currently affect the site, but we have been testing 4store as an alternative back end.
  • Reported issues with 4store that prevented startup scripts from working due to bash syntax used in shell scripts (related to this).
  • Multiple-instance templates + header tabs + Semantic Forms cause deep problems where input fields are given the wrong names... In other words, your data gets saved to the wrong place. Bug report submitted here, awaiting reply. Haven't heard anything in a few days, so cross-posted here. ChrisDavis 10:14, 27 April 2011 (CEST)
  • With Semantic Forms, it's possible to cache the form definitions (the rendered HTML) in order to speed up the site. However, when you do this, it changes the displayed name of the form from something like "Form:Powerplant" to "Form definition title for caching purposes". https://bugzilla.wikimedia.org/show_bug.cgi?id=31264
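Given the 4store bug where large numbers not written in scientific notation give wrong results, one workaround is to always emit numeric literals in queries as doubles in scientific notation. This is a minimal sketch under that assumption; the helper names, the `prop:` prefix, its example.org namespace and the property name are hypothetical placeholders, not Enipedia's actual vocabulary.

```python
import math


def sparql_double(value):
    """Format a number as a SPARQL double literal in scientific notation.

    17 significant digits are used so that the literal round-trips to the
    same float; trailing zeros in the mantissa are then trimmed.
    """
    value = float(value)
    if not math.isfinite(value):
        raise ValueError("SPARQL double literals must be finite here")
    mantissa, exponent = f"{value:.16e}".split("e")
    mantissa = mantissa.rstrip("0")
    if mantissa.endswith("."):
        mantissa += "0"
    return f"{mantissa}e{exponent}"


def output_filter(variable, minimum):
    """Build a FILTER clause comparing a query variable against a threshold."""
    return f"FILTER(?{variable} > {sparql_double(minimum)})"


def build_query(minimum_mwh):
    """Hypothetical query; prefix and property names are placeholders."""
    return (
        "PREFIX prop: <http://example.org/property/>\n"
        "SELECT ?plant ?output WHERE {\n"
        "  ?plant prop:Annual_output_MWh ?output .\n"
        f"  {output_filter('output', minimum_mwh)}\n"
        "}"
    )


if __name__ == "__main__":
    print(build_query(2.5e9))
```

The exponent form such as `2.5e+09` matches the SPARQL grammar for double literals, so the same query text should also work on back ends that do not have this bug.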

Data for multiple years

  • OK, but is there some way to present a table to the user when they edit with the form?
    • The option I'm thinking about is seen here with the "Add Another" button. With this, they could fill in the values for the different properties, with the value for the year being optional. ChrisDavis 22:45, 23 March 2011 (CET)
  • the idea is to enter, for a single year or for multiple years, the characteristic operating data: year, fuel input, power output, heat output, CO2 emissions, etc., or the utilization rate
    • Multiple years may be possible, but we would need to think carefully about it. The immediate issue that comes to mind is that we would be describing slightly different things, and any queries that search by date would need to be able to handle this. For example, if you search for something in an exact year, you would have to check whether that year falls within the range of years specified for some particular property. To make things easy for people, we could say that if the data is only for a single year, they should just put it in the "start date". ChrisDavis 22:45, 23 March 2011 (CET)
  • so can it somehow be made configurable / expandable / collapsable to make it convenient for people to input datasets?
    • I think it's possible. As an example, see the show/hide section on the Kalundborg page here. We can probably add these to the forms as well. ChrisDavis 22:45, 23 March 2011 (CET)
  • or, in the future, send a SPARQL query to some external database to populate such data (and then store it in Enipedia, only running periodic update checks)? GerardDijkema
    • We should look into this. The main issue is that you will run into conflicting data (which is maybe the point of this whole exercise), so we need a way to present this that doesn't confuse people. Part of the value we can add is providing ways for people to at least compare these data sets. The workflow for people might include a list of "we think these are the same, but we're not sure". Alfredas has done some work on aligning datasets, and has run into cases where comparing names and locations still isn't always enough to tell whether the data is about the same facility. ChrisDavis 22:45, 23 March 2011 (CET)
      • At this point in time, I think we are the ones making the external databases available (for querying over the web), since there still hasn't been that much action in the energy domain on this.
      • These are the ones off the top of my head that we can do something with:
        • EPRTR - available on a sparql endpoint here (converted from MS Access database)
        • EU-ETS - available on a sparql endpoint here (converted from a set of Excel Spreadsheets)
        • US EPA Toxics release inventory - available in RDF, not sure if it's available via a sparql endpoint
        • eGrid - available as an Excel Spreadsheet
      • This particular work can be enabled by Alfredas' work on extending Google Refine.
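The year-range check described in the discussion (a value for a single year only sets the start date; a search for an exact year must test whether it falls inside a value's range) can be sketched as follows. The record type and field names are hypothetical and do not correspond to Enipedia's actual properties; values without any year are allowed, since the year was meant to be optional, but they never match a year-specific search.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class YearlyValue:
    """One operating-data value for a facility, optionally tied to years.

    A single-year value only sets start_year (end_year stays None), as
    suggested in the discussion. A value with no year at all is allowed
    too (start_year is None).
    """
    prop: str
    value: float
    start_year: Optional[int] = None
    end_year: Optional[int] = None

    def covers(self, year: int) -> bool:
        if self.start_year is None:
            return False  # undated values never match a year-specific search
        end = self.end_year if self.end_year is not None else self.start_year
        return self.start_year <= year <= end


def values_for_year(records: List[YearlyValue], year: int) -> List[YearlyValue]:
    """Return the records whose year range includes the requested year."""
    return [r for r in records if r.covers(year)]
```

A query for 2008 over a single-year value for 2008 and a range value for 2005-2009 returns both, while a query for 2007 returns only the range value.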

Where to find the power plants entered by Alfredas?

You can find the list here: http://enipedia.tudelft.nl/wiki/Category:Powerplant

Processes behind the scenes

This is a list of the other steps that are involved in improving the data on Enipedia.

  • Aligning powerplants on Enipedia with their Wikipedia articles - This uses the Silk Link Discovery Framework to discover which Wikipedia articles correspond to powerplants on Enipedia. This is really important since it helps to put "relevant information closer together" and will help with future maintenance.
    • Next steps are to use these linkages to help fill in coordinates, fuel types, owners, etc.
    • This is something that still needs human supervision, but it at least makes things a lot easier.
    • I've recently started using R for matching since it's a bit more transparent and customizable than Silk. For example, with Silk, I just get a list of objects that are likely the same. With R, I can generate spreadsheets that allow me to first find the likely candidates for matching, and then visually inspect if the fields in the two different data sets are similar enough. ChrisDavis 12:42, 29 September 2011 (CEST)
  • All powerplants with names containing "Hydro", "Solar", "Wte", and "Wind" have had fuel types specified for them.
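The alignment step of first finding likely candidates and then having a person inspect them can be illustrated with a small stand-alone sketch. This is a swapped-in technique (Python's difflib name similarity plus great-circle distance), not the Silk or R workflow described here. The generic-word list, the record layout, and both thresholds are hypothetical assumptions chosen for illustration.

```python
import math
import re
from difflib import SequenceMatcher

# Hypothetical list of generic words ignored when comparing names.
GENERIC_WORDS = {"powerplant", "power", "plant", "station", "centrale"}


def normalise(name):
    """Lower-case a name, keep alphanumeric words, drop generic words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in GENERIC_WORDS)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def candidate_matches(left, right, min_similarity=0.8, max_km=5.0):
    """List likely matching records from two datasets for manual review.

    Each record is a dict with "name" and optional "lat"/"lon". Pairs with
    similar names are kept; if both records have coordinates, pairs further
    apart than max_km are dropped. Thresholds are illustrative assumptions.
    """
    candidates = []
    for a in left:
        for b in right:
            name_a, name_b = normalise(a["name"]), normalise(b["name"])
            if not name_a or not name_b:
                continue  # nothing left to compare after dropping generic words
            similarity = SequenceMatcher(None, name_a, name_b).ratio()
            if similarity < min_similarity:
                continue
            distance = None
            coords = (a.get("lat"), a.get("lon"), b.get("lat"), b.get("lon"))
            if None not in coords:
                distance = haversine_km(*coords)
                if distance > max_km:
                    continue
            candidates.append((a["name"], b["name"], round(similarity, 3), distance))
    # Most similar names first, so the reviewer sees the strongest candidates first.
    return sorted(candidates, key=lambda c: c[2], reverse=True)
```

The output is deliberately a list for a person to check rather than an automatic merge, in line with the point that this step still needs human supervision.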