Miscoding of half-hourly generation data for the Ngawha geothermal power station(s) produces confusion.
Nicky McLean made this Official Information request to Electricity Authority
Response to this request is long overdue. By law Electricity Authority should have responded by now (details and exceptions). The requester can complain to the Ombudsman.
From: Nicky McLean
Dear Electricity Authority,
Via your website www.emi.ea.govt.nz, files containing half-hourly data are supplied containing multiple data series that are somewhat identified via sequences of code names. The most recent data were made available last Monday, and, file 202105_Embedded_generation.csv has a data series for the Ngawha geothermal power station, identified via the sequence KOE1101,TOPE,NGAG,GEN1,I.
This power station is long established, and in September 2008 its two generators of 4·65 MW were supplemented by a third generator of 15 MW for a maximum capacity of 24-25 MW: the supplied data evidently are for their nett output. But suddenly, on the first of January 2021, the output leapt to 45-50 MW.
It so happens that a second power station had just been completed, not far away from Ngawha but separated by clear unbroken scrub and so not a part of the first power station. Its capacity is 31·5 MW, and evidently, the data series identified by KOE1101,TOPE,NGAG,GEN1,I now represents the total generation from both power stations, the "Ngawha collective", perhaps.
But suddenly, starting with the first of May, the data for this series drops back to ~24MW, and, a new (never-before-seen) data series appears, identified via KOE1101,TOPE,NGAG,GEN2,I and its values are around 31 MW; if these values are added to those from KOE1101,TOPE,NGAG,GEN1,I the result is obviously a continuation of the earlier 45-50MW sequence. (Alas, the services of this web-page interface do not allow me to include a graph, no matter how pretty it is.)
But suddenly, on Thursday the 15th of May, data for this new series (KOE1101,TOPE,NGAG,GEN2,I) drop to zero. Perhaps there was some mishap at the new power station, even though it had been running continuously all year. Oh dear.
The first point is that data series should be consistently identified. If, on the commissioning of a related power station, a series is interpreted to be the summation of all generation at the site (as at Huntly, say), then, it should remain as such. If on the other hand there are to be two separate data series for two separate power stations, then, they should start and remain as such. Specifically, the ~GEN1 series should never have included the generation from the second power station.
The second point is that given the merit of separate data series for separate power stations, the new data series (~GEN2) should have started when the new power station began delivering power, back in January, not in May. And the first data series should have been left alone. With separate series, should one power station falter (but not the other) there would be a chance of identifying which it was.
The third point is that since evidently the second power station started operations months ago, where are the data series (named ~GEN2) for those earlier months? Actually, as it is surely unlikely that a new power station begins operation on New Year's Day, where are the data for its generation in late December? There may have been test runs, but megawatts of power don't just vanish, they go somewhere.
I'd suggest a name scheme whereby ~GEN1 stands for the first power station at Ngawha, and ~GEN2 stands for the second nearby. If a data series is to be provided for the summation, it should be called ~GEN, with neither a 1 nor a 2. But arranging something of the sort would require forethought and organisation. Oh dear.
To resolve this matter, two consistent data series are required:
KOE1101,TOPE,NGAG,GEN1,I containing only data for the original power station at Ngawha, as was the case up to the end of December 2020, and became again the case in May 2021. This would mean providing revised data files with different data.
KOE1101,TOPE,NGAG,GEN2,I containing only data for the new power station near to the existing power station at Ngawha, starting some time in December 2020 and not just in May 2021. This would mean providing revised data files with additional data.
And as well, the corresponding data series for the consumption of power from the network at these two separate sites, though most values are zero.
Yours faithfully,
Nicky McLean.
From: Ministerial information
Electricity Authority
Dear Mr McLean
Thank you for your request received 20 June 2021, under the Official Information Act 1982 regarding data available via the EMI website.
We will respond to your request as soon as possible and no later than 16 July 2021, being 20 working days after the day your request was received. If we are unable to respond to your request by then, we will notify you of an extension of that timeframe.
If you have any queries or additional factors come to light which are relevant to your request, please feel free to contact me.
Kind regards,
Caroline Sides
Senior Ministerial Advisor
Electricity Authority - Te Mana Hiko
Level 7, Harbour Tower, 2 Hunter Street
PO Box 10041, Wellington 6143, New Zealand
www.ea.govt.nz
From: OIA
Electricity Authority
Dear Nicky
Thank you for your Official Information Act 1982 request received on 20
June 2021 via the FYI website for clarification of data published on the
Authority’s EMI website.
Please find attached the Authority’s response to your request.
If you have any questions about our response, or additional queries,
please don’t hesitate to contact us.
Kind regards,
Caroline Sides
Senior Ministerial Advisor
M: +64 27 243 3623
DDI: +64 4 471 8552
Electricity Authority - Te Mana Hiko
Level 7, Harbour Tower, 2 Hunter Street
PO Box 10041, Wellington 6143, New Zealand
www.ea.govt.nz
"The information contained in this transmission is confidential. It is
intended for the named addressee only. If you are not the named addressee
you may not copy, distribute or take any action in reliance upon this
transmission."
From: Nicky McLean
Dear Ms. Gillies,
Thank you for your response of the 16th of July, and apologies for my delay – I was hoping that the FYI Administrator would explain via [email address] how I might include images in a response via their website, since a picture is worth a thousand words. But no reply has come, so words will have to do. I imagine that amongst your colleagues there would be someone who could produce such graphs, should there be an interest.
You observe that “participants may at times change how they classify data (in order to minimise transmission and network charges).” Is this simply a general remark, or a definite statement that this is what has happened for the specific data series, as your organisation has determined? In the four weeks that it took for your response to be formulated, there would have been plenty of time to have investigated the facts: that the Electricity Authority did receive combined data series (so that the combination was not due to some processing within the Electricity Authority's systems), that the data series were supplied as combined by the N.Z. Stock Exchange, that they in turn were supplied with combined data from whatever other intermediary may be participating in the flow of data, and so on, all the way back to the data originator, which I imagine to be Top Energy, the “participant” owning the Ngawha power stations, which also contain metering devices, whence cometh the data that we all cherish.
To be specific, there are now two separate power stations at Ngawha, which I shall call Ngawha A and Ngawha B, just as with Ohau A, B, and C. Up until December last year, data for what I now call Ngawha A were supplied via code KOE1101,TOPE,NGAG,GEN1,I, along with a corresponding series coded KOE1101,TOPE,NGAG,GEN1,X for supply to the power station, as when its own generation was insufficient.
Suddenly, starting with January, the half-hourly values supplied for these codes greatly increase, until May when they resume previous levels, and data for two new codes appear, using the term GEN2, and these correspond to the new power station at Ngawha, Ngawha B, which however began operating in January or maybe late December.
It is clear that data supplied with the codes KOE1101,TOPE,NGAG,GEN1,I and X corresponded to that for Ngawha A until the end of December, whereupon it corresponded to data for Ngawha A plus Ngawha B combined until the end of April, when it reverted to being data for Ngawha A alone.
I now invite you to view your organisation's web page https://www.emi.ea.govt.nz/ wherein you can read the first offering of text: “A fundamental requirement of competitive and efficient electricity markets is access to reliable data and performance metrics.”
Taking these words at face value, I believe that supplying data that are reliable means supplying data that are complete and accurate, not misleading or deceptive, nor likely to mislead or deceive. And that it is obvious that a data series that means one thing up to December, another for January to April, and yet another for May onwards is going to mislead anyone who looks at the code names and does not see them change, thus believing that they refer to the same thing – when in fact they do not. An initial interpretation when faced with the changed style of data for January was that code KOE1101,TOPE,NGAG,GEN1 was being passed off as a “Ngawha Total” (the Ngawha A station contains more than one actual generator), but, this interpretation is wrecked by the change in May when that code was still the same but the data were no longer for the Ngawha total, just Ngawha A.
Now it may be that you can find staff or other persons who will affirm that they find this variation satisfactory, and even that you can produce documentation from somewhere describing the interpretation of the data series that mentions that data for Ngawha are of varying types, so “reader beware”. If so, then the statement on your web page should be amended, somewhat along the lines of the “as is, where is” of used car sales and the like.
But actually, the Electricity Authority has the responsibility of publishing data that are reliable, and that remains so even if the data were supplied in some unsatisfactory form. Since the N.Z. Stock Exchange affirms that it supplies “High quality market information” and has strong rules against being supplied with misleading or inadequate information, one wonders where to point the finger: surely their data are not misleading?
A curious detail is that the data file for May, 202105_Embedded_generation.csv, contained no data for KOE1101,TOPE,NGAG,GEN1,X (though it does have data for KOE1101,TOPE,NGAG,GEN2,X) and this is so even though data for KOE1101,TOPE,NGAG,GEN1,I do appear. I doubt that this can be brushed aside as some putative manoeuvre by Top Energy in order to minimise their charges, nor is it likely that they did so in order to mislead lest they face sanctions – unless you have evidence that they actually did. Considering that Top Energy has been in discussions with the Electricity Authority over its allowable generation capacity, I would imagine that correct information all round was expected.
Instead I think it likely that somewhere along the way the juggling of data series was fumbled. Perhaps the initial rule had been something like “add everything called Ngawha” so that when a second station KOE1101,TOPE,NGAG,GEN2 appeared, the result became their summation. Later on, this was noticed and for May's data onwards the rule became to emit data for the two stations separately, coded GEN1 and GEN2. This sort of thing has happened before.
For the Electricity Authority to live up to its claim of supplying reliable data, what is needed in this matter is the provision of four data series corresponding to GEN1 and 2, I and X, being separately for Ngawha A and B not any combination, for January to April, and possibly for December as well.
Or else admit that “A fundamental requirement” is not being met. Surely not!
Yours sincerely,
Nicky McLean
From: OIA
Electricity Authority
Dear Mr McLean
Thank you for your request received on 28 July 2021, under the Official Information Act 1982, for clarification of data published on the Authority’s EMI website.
Please find attached the Authority's response.
If you have any questions regarding our response, please don't hesitate to contact us.
Kind regards,
Caroline Sides
Senior Ministerial Advisor
DDI: +64 4 471 8552
Electricity Authority - Te Mana Hiko
Level 7, Harbour Tower, 2 Hunter Street
PO Box 10041, Wellington 6143, New Zealand
www.ea.govt.nz
From: Nicky McLean
Dear Ms. Gillies,
Thank you for your response dated the 20th of August. Amusingly enough, the day after I sent my earlier and delayed message, a response from FYI.org arrived. But waiting another day wouldn't have helped, because the response was to say that the software they use (Alaveteli) does not support the likes of attachments such as images, as of graphs. And nothing can be done by them, because the software comes from an outside source and they have no software development capacity. So they can't restrain the mangling of the layout either, making this more difficult to read. Oh well.
In your quote from (presumably) some person at Top Energy, I see the words “After being notified by the Authority that data was being submitted incorrectly, ...”
I think it reasonable to deduce from this that someone in your organisation had noticed the problem before I made my original query, and so it follows that your organisation has knowingly published incorrect data – though that knowing might be split between different people at different times.
Well, knowing is good, but, knowing is not enough; we must apply.
Further into the quote appears “... and the data corrected through the standard washup process.”
What that might mean exactly is not entirely obvious, but presumably is unlike money laundering, an activity one would normally keep hidden. It appears that correct (clean?) data have been supplied.
So then, what is to be done? (Влади́мир Ильи́ч Ле́нин, 1902.)
Surely it is apparent that the incorrect data are to be replaced by the corrected data in the computer files published via your web site, not forgetting to supply the omitted data mentioned earlier. This was and remains the purpose of my query. While your organisation may well also be using software whose functioning you cannot alter, adjusting the data it presents should be within your reach. But if not (Daniel 3:18), how can you live up to your proclamation of “access to reliable data”?
Yet you persist in retaining the incorrect data. How is this “for the long-term benefit of consumers”?
Yours sincerely,
Nicky McLean
From: OIA
Electricity Authority
Dear Mr McLean
This email acknowledges and thanks you for your latest correspondence.
I have passed your email to the Authority's data analytics team for their reference and action if necessary.
Ngā mihi
Sarah
Sarah Gillies
GM Legal, Monitoring and Compliance
Mobile: +64 27 306 5250
Electricity Authority - Te Mana Hiko
Level 7, Harbour Tower, 2 Hunter Street
PO Box 10041
Wellington 6143
New Zealand
www.ea.govt.nz
From: OIA
Electricity Authority
Dear Mr McLean,
Please find attached a letter in response to your complaint to the Office
of the Ombudsman on 21 September 2021.
Kind regards,
Tessa Balinger
Ministerial Advisor
DDI: +64 [DDI redacted]
Electricity Authority - Te Mana Hiko
www.ea.govt.nz
From: Nicky McLean
Dear Ms. Balinger,
This time I see your name spelt with one "l", so now I'm in doubt. Anyway, thank you for the communication from Dr. Bishop, to whom I respond as follows:
Dear Dr. Bishop,
Thank you for your response of the tenth of November, in which you mentioned a large collection of data files. This brings forth the idea “You don’t like these data files? I have others.” (Apologies to Groucho Marx, though a dialect form is to be found in the 18/10/1873 issue of the NZ Tablet, it seems.) Anyway, rather than ignore them, some attention was directed to their content, and a worthwhile investigation has required a while to complete.
Take file ReconciledInjectionAndOfftake_202101_20210826_113517.csv.gz, for example. Its size is 34·7MB and, when decompressed, the resulting file ReconciledInjectionAndOfftake_202101_20210826_113517.csv is of size 303·299MB. If this file is recompressed using the (widely available and at no charge) 7-zip procedure, the compressed size is 30·84MB, which is about 12% smaller. The 7-zip protocol offers superior compression to that of WinZip as well, which is why Brian Kirtlan selected 7-zip for the publishing of compressed files.
The first few records of the file read
PointOfConnection,Network,Island,Participant,TradingDate,TradingPeriod,TradingPeriodStartTime,FlowDirection,KilowattHours
HAM0111,WAIK,NI,MERI,2021-01-27,21,10:00,Injection,3.0
HEP0331,UNET,NI,MERI,2021-01-27,31,15:00,Offtake,5294.0
and these appear in what looks like a random order. If the data file’s contents are put into sorted order, 7-zip does better, compressing the file to 17·6MB which is about half the size of the original. No doubt gzip would also produce a more compressed file given orderly content. With a little planning, this saving could be achieved for free, unless there is something magical about the disordered data that should be preserved.
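The effect of ordering on compression can be tried out with a small sketch such as the following. It is only an illustration under stated assumptions: the records are synthetic (the names `POC0000,NET0` and so on, and all the kilowatt-hour values, are made up), and Python's `zlib` (the DEFLATE method that gzip uses) stands in for both gzip and 7-zip, so the sizes it prints say nothing about the real files beyond the direction of the effect.

```python
import random
import zlib


def make_rows(names, days, periods=48):
    """Build synthetic one-value-per-line records (made-up values, not real data)."""
    rng = random.Random(0)  # fixed seed so the run is repeatable
    rows = []
    for name in names:
        for day in range(1, days + 1):
            for period in range(1, periods + 1):
                kwh = round(rng.uniform(0, 5000), 1)
                rows.append(
                    f"{name},NI,MERI,2021-01-{day:02d},{period:02d},Injection,{kwh}"
                )
    return rows


def compressed_size(rows):
    """Size in bytes of the rows after DEFLATE compression at the highest level."""
    return len(zlib.compress("\n".join(rows).encode("ascii"), 9))


# Hypothetical point-of-connection and network names, purely for illustration.
names = [f"POC{i:04d},NET{i % 5}" for i in range(20)]
ordered = make_rows(names, days=10)

# Same records, presented in a scrambled order as in the supplied file.
shuffled = ordered[:]
random.Random(1).shuffle(shuffled)

size_shuffled = compressed_size(shuffled)
size_ordered = compressed_size(sorted(shuffled))
print("scrambled:", size_shuffled, "bytes; sorted:", size_ordered, "bytes")
```

Sorting places records that share a name and date next to each other, so the compressor finds long repeats close at hand instead of short ones scattered about.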
There are also the rather verbose codes “Injection” and “Offtake”, which unless they mean something special could more compactly be presented as “I” and “X”, a long-established practice. Similarly, this style of one time slot per line need not mention the times as well as the half-hour number, as also is established practice. A simple filleting of the data file produces something like
%Fillets from file ReconciledInjectionAndOfftake_202101_20210826_113517.csv via column selection 1-4 8 5-6 9
PointOfConnection,Network,Island,Participant,FlowDirection,TradingDate,TradingPeriod,KilowattHours
HAM0111,WAIK,NI,MERI,I,2021-01-27,21,3.0
HEP0331,UNET,NI,MERI,X,2021-01-27,31,5294.0
And this file is just 240MB in size instead of 303MB.
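The filleting is simple enough to sketch in a few lines of Python. This is not the procedure actually used, merely one way of applying the column selection 1-4 8 5-6 9 and the I/X abbreviation to the sample records quoted above; the function name `fillet` is my own invention.

```python
import csv
import io

# Abbreviations for the verbose flow-direction words.
FLOW_CODES = {"Injection": "I", "Offtake": "X"}

# Column selection 1-4 8 5-6 9, written as zero-based indices.
# Column 7 (TradingPeriodStartTime) is dropped.
ORDER = [0, 1, 2, 3, 7, 4, 5, 8]


def fillet(source_text):
    """Return the filleted CSV text for the given source CSV text."""
    reader = csv.reader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    writer.writerow([header[i] for i in ORDER])
    for row in reader:
        # Unknown direction words are passed through unchanged.
        row[7] = FLOW_CODES.get(row[7], row[7])
        writer.writerow([row[i] for i in ORDER])
    return out.getvalue()


sample = (
    "PointOfConnection,Network,Island,Participant,TradingDate,TradingPeriod,"
    "TradingPeriodStartTime,FlowDirection,KilowattHours\n"
    "HAM0111,WAIK,NI,MERI,2021-01-27,21,10:00,Injection,3.0\n"
    "HEP0331,UNET,NI,MERI,2021-01-27,31,15:00,Offtake,5294.0\n"
)
print(fillet(sample), end="")
```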
But the major waste in this style is the excessive repetition of the same information. Only one value is delivered per line, a payload of six or so characters, and to assemble data for a day’s worth of half-hourly values there must be 48 (or 46 or 50) records, each of which specifies the same name (23 characters, if the X and I style is used, or about 30 if the “Offtake” and “Injection” style is used), plus 13 for the date and half-hour number. Or, 52 characters added to the payload of six or seven. On the other hand, the half-hour number is sometimes a one-digit number instead of two: there’s a saving to cherish. But, this variable length changes the alignment of columns, and so causes a straightforward sort process to order the half-hour numbers as 1, 10, 11, … 19, 2, 20, 21, … which is not so helpful.
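The ordering mishap with unpadded half-hour numbers is easily demonstrated. The following Python fragment is a small illustration only, using half-hours 1 to 21 as text fields; padding to two digits (or sorting on the numeric value) restores the expected order.

```python
# Half-hour numbers as they appear in the file: text, without leading zeros.
periods = [str(p) for p in range(1, 22)]

# A plain text sort compares character by character, so "10" sorts before "2".
as_text = sorted(periods)

# Sorting on the numeric value gives the intended order.
as_numbers = sorted(periods, key=int)

# Zero-padding to two digits makes the plain text sort agree with the numeric one.
padded = sorted(p.zfill(2) for p in periods)

print(as_text)
print(as_numbers)
print(padded)
```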
Anyway, the month’s data can be read by Gnash and the resulting workfile is 22MB in size, smaller than the 35MB size of the compressed source data file. Compressing the workfile via 7-zip reduces it to 4MB. Thus, a month’s data of this sort could be presented for distribution in just 4MB blobs, instead of 35MB blobs, still less the 303MB of the uncompressed source data file. A factor of seventy-five...
Well, if one file is good, many files are even better. Some messing about leads to the download of a mere 3·79GB of data files, which when uncompressed produce 32·1GB for a factor of about ten. When assimilated by Gnash, the resulting workfile is 1·46GB, again much smaller than the original compressed collection, and which in turn can be compressed by 7-zip to 0·5GB, a seventh of the size of the compressed collection and a factor of sixty-four smaller than the uncompressed data.
Gnawing through the mass of input data took some forty-five minutes, with many complaints over incoming data being out-of-order. That is, if the first encounter of data for some name is for a given date D, subsequent data for that name should not be for dates previous to D. But no matter, it is for computers to slog forward, muttering if so inclined, and eventually, the data for 1/1/2005 to 31/10/2021 were assimilated.
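The out-of-order rule just described can be stated as a short Python sketch. It is not the code of Gnash, only a restatement of the rule under the assumption that dates are held as ISO text (year-month-day), which compares correctly as plain text; the names "A" and "B" are placeholders.

```python
def count_out_of_order(records):
    """Count records dated before the first date encountered for their name.

    records: iterable of (name, iso_date) pairs in the order received.
    """
    first_seen = {}
    complaints = 0
    for name, iso_date in records:
        if name not in first_seen:
            # First encounter of this name: remember its date D.
            first_seen[name] = iso_date
        elif iso_date < first_seen[name]:
            # Later data for the same name should not precede D.
            complaints += 1
    return complaints


received = [
    ("A", "2021-01-27"),
    ("B", "2021-01-05"),
    ("A", "2021-01-28"),
    ("A", "2021-01-03"),  # earlier than the first date seen for "A"
    ("B", "2021-01-05"),
]
print(count_out_of_order(received))
```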
Those data could then be “heaved” forth in the same style as is used for the “metered data” collection, which is to say, each record is for one name and one date, with all the values for that day strung along the line. This took about ten minutes and produced a file of 2·993GB, already smaller than the size of the compressed source files so better than a factor of ten smaller than the uncompressed source files holding the same data. And, when compressed via 7-zip, it was reduced to 0·596GB.
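The rearrangement into one record per name and date might be sketched as follows. This is an assumption-laden illustration, not the Gnash procedure: the function name `heave`, the comma-separated output layout and the use of an empty field for a missing half-hour are all my own choices, and the values shown are made up.

```python
from collections import defaultdict


def heave(records):
    """Turn one-value-per-line records into one line per name and date.

    records: iterable of (name, iso_date, period, value) tuples in any order.
    Returns lines 'name,date,v1,v2,...' ordered by name and then by date.
    """
    days = defaultdict(dict)
    for name, iso_date, period, value in records:
        days[(name, iso_date)][period] = value

    lines = []
    for name, iso_date in sorted(days):
        by_period = days[(name, iso_date)]
        last = max(by_period)
        # A half-hour with no value is left as an empty field.
        values = [str(by_period.get(p, "")) for p in range(1, last + 1)]
        lines.append(",".join([name, iso_date] + values))
    return lines


scrambled = [
    ("HAM0111.WAIK.NI.MERI.I", "2021-01-27", 2, 3.0),
    ("HEP0331.UNET.NI.MERI.X", "2021-01-27", 3, 7.0),
    ("HAM0111.WAIK.NI.MERI.I", "2021-01-27", 1, 2.5),
    ("HAM0111.WAIK.NI.MERI.I", "2021-01-26", 1, 1.0),
]
for line in heave(scrambled):
    print(line)
```

The name and date appear once per day instead of once per value, which is where the saving in size comes from.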
Then, when read by Gnash, just three minutes were required to read the lot – a factor of fifteen. The original source data files appear to present their data in a random order – certainly not ordered either by date or by name, and so for every incoming datum searches must be done to assign it to its proper location. The “heaved” data produced by Gnash are ordered by name and then by date, and this single statement applies to every value for that day, strung out along the line. Thus, not only does the name and date information not have to be re-determined for every datum, the data arrive in a regular sequence. Much less searching is required.
A modern database system could store incoming data in the order as received without any such searching and collating, but this comes at the cost of delays in subsequent access. To obtain the successive values of a day for some analysis would mean that each successive value would have to be individually searched for in the database, and this would have to happen every time data are processed. Perhaps this is why the supplied data files present values in an apparently random order; it is accession order.
Considering the huge differences in file size, no sensible Data Manager would choose the verbose one-value-per-line format, unless there is some wonderful gain somewhere else to offset the bloat. In the case of the Final, Reserve (and recently, Forecast as well) Price data, one value per name, date, and time is suitable because of the associated timestamps. This means a mixed-type data record and so greater complexity as well as a greater payload per line. Having instead separate records of the different types (perhaps in different files) would be an open door to alignment errors between the timestamp and price data sequences. Which is to be avoided.
But these data series have no timestamps, and so the forty-eight values per line style offers a great saving. The only difficulty is presented by the accursed daylight saving changeover days when there must be either forty-six or fifty values to a line. This might be a problem for analysts, which could be assuaged by them only requesting half-hours one to forty-eight for analysis. They must already be prepared to handle missing data, so the two absent values of the forty-eight would cause no special trouble, while the values associated with the maddening half-hours forty-nine and fifty would never be called for and so it would be as if there was no difficulty. If one ignores omitted data.
Alas, this would mean that on the changeover days, the times associated with the values would be misaligned with the times associated with values from other days, and so the analysis would be incorrect in that too. Perhaps to no great effect, or not one that the analyst cares to make the effort to worry about, but this sort of slackness should not be enabled by a proper Data Administrator.
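The misalignment can be made concrete with a small Python sketch. It assumes the New Zealand daylight saving rule as applied from 2008 onwards (clocks go forward at 02:00 on the last Sunday of September and back at 03:00 on the first Sunday of April) and assumes that half-hours are numbered consecutively through the changeover; the function names are hypothetical, and earlier years followed different rules that are not handled here.

```python
from datetime import date, timedelta


def first_sunday(year, month):
    d = date(year, month, 1)
    return d + timedelta(days=(6 - d.weekday()) % 7)  # Monday is 0, Sunday is 6


def last_sunday(year, month):
    # Last day of the month (only used for September here, so month + 1 is safe).
    d = date(year, month + 1, 1) - timedelta(days=1)
    return d - timedelta(days=(d.weekday() - 6) % 7)


def periods_in_day(day):
    """Number of half-hours in the local day, assuming the rule from 2008 onwards."""
    if day == first_sunday(day.year, 4):
        return 50  # clocks go back: an hour is repeated
    if day == last_sunday(day.year, 9):
        return 46  # clocks go forward: an hour is skipped
    return 48


def period_start(day, period):
    """Local clock time at which the given half-hour number starts."""
    minutes = (period - 1) * 30
    count = periods_in_day(day)
    if count == 46 and period > 4:
        minutes += 60  # 02:00 to 03:00 does not exist on this day
    if count == 50 and period > 6:
        minutes -= 60  # 02:00 to 03:00 happens twice on this day
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


for day in (date(2021, 1, 27), date(2021, 9, 26), date(2021, 4, 4)):
    print(day, periods_in_day(day), "half-hours; number 7 starts at", period_start(day, 7))
```

On an ordinary day half-hour 7 starts at 03:00, but on the September changeover day it starts at 04:00 and on the April changeover day at 02:00, so the same half-hour number no longer means the same clock time.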
So, I’m puzzled. And if you must use this format, outputting the data in an orderly fashion will enable the compression process to do a better job. Presumably, this would be simple with a modern database system. Same data, better order.
Having read all the provided data into a workfile, some remarks can be made. Certain data series draw attention, as follows. I trust that you can cope with Injection and Offtake being changed to I and X.
ASB0331.EASH.SI.SUPE.X 24/ 4/2019
BOB1101.COUP.NI.AWFL.X 18/ 3/2020
CPK0111.UNET.NI.PSNZ.X 6/ 5/2008
EDG0331.HEDL.NI.CNIR.I 12/10/2021
EDG0331.HEDL.NI.EDGE.X 15/ 7/2021
HTI0331.LINE.NI.TUAR.X 26/ 6/2008
KAW0111.HEDL.NI.OPHL.X 31/ 1/2018
MOT0111.TASM.SI.CSNL.X 20/11/2014
MOT0111.TASM.SI.SELS.X 20/11/2014
PAP0661.ORON.SI.SIMP.X 23/ 3/2011
STU0111.ALPE.SI.PRME.X 8/ 2/2021
TNG0111.WNST.NI.MRPL.X 14/ 1/2007
TNG0111.WNST.NI.SWCH.X 18/12/2020
ATU1101.WPOW.SI.TRUS.I 22/ 8/2007
INV0331.ELIN.SI.SIMP.X 1/ 5/2008
MNG1101.VECT.NI.CLUB.X 5/12/2018
MNG1101.VECT.NI.KING.X 31/10/2016
MHO0331.ELEC.NI.GENH.I 25/ 9/2011
MHO0331.ELEC.NI.SIMP.I 25/ 9/2011
WTK0111.MERI.SI.MERI.X 19/10/2017
These all offer one value only, on the days specified. Singular! Then there are hundreds of series which only have one value, 1, also singular. A few also have only values of 2, 3, 4, 6, 38 or 44. How this can be a sensible situation is beyond me, but I’m not managing these data. Many many other values are missing, either for parts of a day or for many whole days.
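A scan for such singular series might be sketched in Python as follows. It is only an illustration of the two kinds of oddity described, a series with a single reading and a series whose many readings are all the same number; the series names and values are made up, and the function name `find_singular` is hypothetical.

```python
def find_singular(series):
    """Pick out suspicious series.

    series: dict mapping a series name to a list of (date, value) readings.
    Returns (names with exactly one reading,
             names with several readings that all share one value).
    """
    single_reading = sorted(
        name for name, readings in series.items() if len(readings) == 1
    )
    constant = sorted(
        name
        for name, readings in series.items()
        if len(readings) > 1 and len({value for _, value in readings}) == 1
    )
    return single_reading, constant


# Made-up series, purely to exercise the function.
example = {
    "AAA0111.NETA.NI.RETA.X": [("2019-04-24", 12.0)],
    "BBB0331.NETB.SI.RETB.X": [("2020-01-01", 1.0), ("2020-01-02", 1.0)],
    "CCC1101.NETC.NI.RETC.I": [("2020-01-01", 240.5), ("2020-01-02", 251.0)],
}
single_reading, constant = find_singular(example)
print(single_reading)
print(constant)
```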
Anyway, the main point is that there are 8,418 different data series, many involving unusual name parts and their combinations. They look like they might represent the power flows attributable to individual retailers at the various substation locations. This is odd, as I recall a statement that the Electricity Authority would not be publishing data down to the retailer level, or some such jargon. So that guess must be wrong, or, my recollection is in need of revision by new assertions.
However, Ngawha has a relatively unusual name, and its code NGAG can be spotted. There are just two entries amongst the 8,418:
KOE1101.TOPE.NI.NGAG.I 1/ 7/2020-31/10/2021
KOE1101.TOPE.NI.NGAG.X 20/11/2020-24/10/2021
The first is clearly for generation at the Ngawha geothermal power station, and is obviously for the generation of the Ngawha A and Ngawha B power stations combined. There are some thirty-three names of the form KOE~.I (four for 33kV) covering assorted date spans and it is possible that information on the generation at Ngawha could be collated for a wider date span using these data series, with some effort.
However, none of the series offer data for generation (or consumption) at Ngawha A and B separately. Supplying this information was and remains the objective of my original request for official information, and it seems to me that this is a request for official information to be supplied. The response from Ms. Gillies, “I have passed your email to the Authority's data analytics team for their reference and action if necessary.” is indeed a response, but it does not involve supplying official information, which in this case could be effected by the correcting of the data file your website already offers access to, rather than not correcting it – as it remains today.
Mentioning instead yet other data file collections reminds me of the short story “The Library of Babel” by Jorge Luis Borges, which describes a Library containing every possible book. The only difficulty lies in finding the actual book containing what you seek, correctly... Indeed, there is a fascinating website https://libraryofbabel.info/search.html dedicated to finding you example books.
Thus, nominating another file collection not containing the separate data series is not a positive action towards supplying a data file that does contain the separate data series.
Yours sincerely,
Nicky McLean
From: OIA
Electricity Authority
Kia ora Mr McLean,
Thank you for your email, I have passed it on to Dr Bishop.
My name does indeed have two L's. My apologies for the confusion - the singular 'L' was a typo in my signature which few people noticed until recently.
Kind regards,
Tessa Ballinger
Ministerial Advisor
DDI: +64 [DDI redacted]
Electricity Authority - Te Mana Hiko
www.ea.govt.nz