Additional variables to be added to the Crash Analysis
System (CAS) open dataset
To: Helen Aki, Senior Manager Research & Analytics
Cc: Holly Ludlow, Manager Statistical Analysis; Rebecca Schulz, Manager Specialist Analytics;
Suzanne Jones, Data Scientist; Colin Morrison, Senior Advisor Strategic Interventions; Christopher
Liu, Data Scientist; Mel Smalley, Manager CAS Processors; Louise Murrell, CrashHub Project Manager
From: Carolyn Fyfe, Manager Statistical Design
Date: xx March 2018
Subject: Additional variables to be added to the Crash Analysis System (CAS) open dataset
Purpose
The purpose of this memo is to seek your approval to release the following variables in the unit
record open data produced from the Crash Analysis System (CAS).
Background
We aim to make as much information as possible openly available from the Crash Analysis System
(CAS), while ensuring we are maintaining our privacy obligations. We consider that CAS variables can
be put into three groups:
1. Personal identifiers, e.g. Name, address, date of birth, licence plate, driver licence number,
officer ID. These variables absolutely must not be openly released from CAS so do not form
part of our open data approach.
2. Non personal information, e.g. Location, road characteristics, crash conditions (lighting,
weather), crash severity, etc. We have made these variables openly available for each crash in
the CAS since 1 January 2000.
3. ‘Grey area’, e.g. Crash factor, crash cause, exact time of crash, etc. These need careful
consideration before they are made openly available to ensure that we are not allowing an
individual to be identified or identifiable, and that the variables are suitable for open release
in terms of accuracy/reliability.
Since the release of the first variables, we have been responding to requests for additions to the
dataset by adding non-personal variables as appropriate. We have also continued assessing the
‘grey area’ variables, including going over our assessment in an in-depth manner with the Data
Futures Partnership Re-Identification work-stream lead.
The purpose of this memo is to seek approval for the release of the variables set out below in
appendix 1 in the open dataset. Like the current data, this information would be provided for each
individual crash from 1 January 2000. If approved, we will endeavour to make these variables openly
available in time for the Hackathon on 16th March. If this is not achievable, we will make these
available as soon as possible, and notify those attending the Hackathon that this is in the pipeline.
For information on the assessment of the variables currently available, please see this
memo.
Risks
Providing the data at unit record level allows people to observe trends in the data (which is one of
the aims/benefits of open data). However, in some instances these trends may be a result of
changes to coding practices by our CAS Processing team. In instances where changes to practices
are known, we will include this information in the metadata. However, there is a risk that errors in
coding, or coding issues (such as the current issue with alcohol suspected), could be observed and
questioned. We do not think this risk should prevent release, as any issues can also be seen by
existing CAS users. Moreover, being open to people questioning the data is a good feedback loop to
help us constantly improve data quality. Before the release of this data, the Manager of the CAS
Processors will be advised of what will be released and of this risk, and will be given the
opportunity to advise on any data that should not yet be released due to coding concerns.
Recommendations
It is recommended that the Manager Research and Analytics:
1. Approves the release of the variables listed in appendix 1 at the individual level
(ie per distinct crash), without suppression or aggregation. YES / NO
2. Agrees that the variables listed in appendix 2 are not fit for release at the
individual level (ie per distinct crash), and therefore should not be released as
open data. YES / NO
3. Notes the advice from the Privacy Officer attached in Appendix 3. YES / NO
4. Accepts the risk that openly releasing this data may highlight any changes to
coding practices/incorrect coding by the CAS Processing Team. YES / NO
Released under the Official Information Act 1982
Appendix 1 – Variables for release, March 2018
Format of Data
Currently the CAS open data is released in a .csv file with one row per crash, without suppression or
aggregation. These additional variables will be added to the existing dataset where appropriate. In
the event that the current format does not support these additional variables, we will reshape the
data into a form that best supports our customers’ usage of the data (for example, to provide
information about the characteristics of the people involved).
Privacy Assessment
Given the amount of information contained in this database, when assessing each individual
variable for open release it is also important to consider the picture that is being provided by the
data as a whole.
As crashes are often reported in detail in the media, information about the crash is likely to already
be in the public domain. As such, it is reasonable to assume that someone could find the crash a
particular individual has been involved in, in our dataset. We therefore need to consider whether we
are exposing additional personal information about that person.
Following that assumption, if these variables are added to the dataset the following types of
information could be deduced about an individual from our data (worst case scenarios, not
identified instances):
• The social cost resulting from the person crashing
• Factors contributing to the crash (discussed further below)
• The driving movement the person was undertaking when they crashed
• The very broad age band of the person (very likely to have already been known at this level
to identify them in the data)
• Whether or not they were wearing their seatbelt
• If a fatality would have been prevented for someone in their vehicle if they were wearing a
seatbelt
• The severity of their injury (but not what the injury was)
• The severity of the injury [they caused] (but not what the injury was)
• The parties involved in their crash – eg one could potentially deduce that they were over the
legal limit or on drugs and crashed into a tree; or that they were over the legal limit or on drugs
and hit a pedestrian aged under 16, resulting in a fatality (here you need to know they were the
driver and fault is assumed); etc
• Their drivers licence status
• Whether they owned or rented the vehicle
To attempt to protect against absolute certainty in identification we have grouped variables that are
most likely to be used as a way to link with knowledge obtained elsewhere, such as exact day and
time, and the age of the person.
Factor codes
We assess the variables with the highest risk in release to be the factors contributing to a crash.
However, the inclusion of these would also provide the greatest benefit, as they would enable our
open data customers to understand the causes behind the crash and potentially come up with
preventions for this happening again. To balance this risk and benefit, we are proposing to only
openly release information about factors at the highest level. At that level, these could potentially
allow someone to deduce things such as that a person was:
• over the alcohol limit or on drugs
• had an illness or disability
• doing something intentional or criminal
• driving too fast for the conditions
• failed to keep left
• conducted a forbidden movement
• in the wrong lane
• did not stop
• inexperienced, etc
For a full list please see the factor section below.
When considering factors it is vital to note that these are factors PROBABLY contributing to crashes,
as recorded by the Police Officer at the time of the crash. This information has not been
substantiated in a court of law. A worst case example of this could be (hypothetically) that it was
believed a person was on drugs, but they were having a medical event, thereby exposing this person
to gossip or rumour. This will be made as clear as possible in our documentation. In addition, we
will not explicitly link crash factors to an individual. However, in some circumstances, such as single
vehicle crashes, or based on movement codes, it will be reasonably easy to accurately deduce which
party the factor applies to. Note also, a crash may have multiple crash factors.
Summary
While it is possible that additional information about a person could be discovered from this data,
we consider the public good and potential benefit in terms of road safety initiatives that could be
gained from proactively releasing this data outweigh any privacy risk in doing so.
Data for release
NB. Only new variables have been noted below. For information on the variables already available
please see the open data. Some variables may be renamed when released for ease of use for our
customers (eg. coi_id is not immediately recognisable as ‘suburb’). Descriptions will be updated to
match the variable released.
The groupings below are by CAS table.
Crash
VARIABLE: casualty_hier
DESCRIPTION: A derived flag summarizing casualties. Values: 'E' (pedestrian), 'S' (cyclist), 'O' (other non-driver), 'M' (motor cyclist), 'H' (heavy vehicle occupants), 'D' (car/van drivers), 'P' (car/van pass), 'T' (other motorised). The first in this order that applies is attached to the crash.
ASSESSMENT: This is a hierarchy based on the most vulnerable person in the crash. For example, a crash in which a pedestrian was a fatality would be a ‘pedestrian crash’. Release of at fault should be considered with this variable. There is limited privacy risk in releasing this hierarchy alone. Releasing this variable with at-fault would make it easier to link an individual to a fatality they caused, in the event that individual was able to be identified in the data. Note, this is most likely to occur through media reports of fatal crashes.

VARIABLE: coi_id
DESCRIPTION: Suburb name
ASSESSMENT: Exact crash location is already openly available. This provides flexibility in the way users can understand location.

VARIABLE: crash_month
DESCRIPTION: The month (1-12) in which a crash occurred, if known. Derived from crash_date.
ASSESSMENT: While year, month, location and crash descriptions may allow a crash to be identified, exact day and time will not be provided as a confounding factor.

VARIABLE: crash_time_4hr
DESCRIPTION: A field derived from crash_time. It is the time of day in 4hr blocks. Eg the value 1 is times 0000-0359; the value 2 is times 0400-0759 etc. Value 0 is unknown time.
ASSESSMENT: While year, month, location and crash descriptions may allow a crash to be identified, exact day and time will not be provided as a confounding factor.

VARIABLE: multi_veh_simple
DESCRIPTION: Another flag (see also multi_veh) derived from the number of vehicles which are given roles in a crash. Values 'S' (Single vehicle; multi_veh = 'S'), 'M' (multi-vehicle; multi_veh = 'M', 'C', 'O' or 'E'), '_' (Non-vehicle; multi_veh = 'F', 'G', 'P' or '_'). 'Vehicle' means non-parked vehicle. All these may also involve parked vehicles.
ASSESSMENT: This is another way of presenting the data about the vehicles involved in the crash that is already available in the open data. While with the inclusion of crash cause it would be possible to identify single vehicle crashes in which alcohol was a cause, confounding factors (eg of day and time) limit the ability to identify the person involved with certainty.

VARIABLE: mvmt_id_grp
DESCRIPTION: A unique id for a movement code group. Values 'O' (overtaking), 'S' (Straight road), 'B' (bend), 'R' (rear end or obstruction), 'I' (intersection), 'P' (pedestrian), 'M' (miscellaneous), '' (unknown). These are the Road Safety Report movement code groups.
ASSESSMENT: The risk of consequential elements associated with this variable is considered to be negligible. Nothing indicating party at fault will be released, further mitigating any risk in releasing this variable.

VARIABLE: mvmt_ida
DESCRIPTION: The first movement code for the principal vehicle involved in the crash. Possible values range from 'A' to 'Q'. See Vehicle Movement Coding Sheet (version 1.0 April 1997) for details.
ASSESSMENT: As per above. Note however terminology around ‘principal vehicle’ would need to be considered to ensure attribution of fault is not being identified. Suggest using ‘key vehicle’ to align with the messaging in the guide to coded crash reports which clearly states that ‘the vehicle role does not in any way indicate driver fault.’

VARIABLE: mvmt_idb
DESCRIPTION: The second movement code for a crash. There is no direct lookup for this as its meaning depends on the value of mvmt_ida as well. The descriptions are therefore found in lu_mvmt_id_ab.
ASSESSMENT: As per above.

VARIABLE: pol_att
DESCRIPTION: A flag indicating whether police attended the crash. Possible values 'Y', 'N'
ASSESSMENT: No privacy concerns here as just notes if Police attended the crash or if this was reported to them.

VARIABLE: weekend
DESCRIPTION: A derived flag: whether the crash was on a weekend. Values 'W' (weekend - between Fri 1800 and Mon 0559 for ordinary weekends, or 1600 on the first day to 0559 on the final day for holiday weekends); ' ' (not weekend)
ASSESSMENT: Providing this derived flag rather than exact day as a confounding factor.

VARIABLE: z2
DESCRIPTION: Used to hold cal_rcs.road_class. A single character indication of the general class of road. Values 'Y' (motorway), 'S' (open road SH), 'O' (other open road), 'M' (major urban road), 'I' (minor urban road)
ASSESSMENT: Not personal information, no privacy concerns.

VARIABLE: z3
DESCRIPTION: Used to hold a code for the season of the year, derived from crash_month. Possible values: 'S' (Summer; months 12,1,2), 'A' (Autumn; months 3-5), 'W' (Winter; months 6-8), 'P' (Spring; months 9-11)
ASSESSMENT: Not personal information, no privacy concerns.

VARIABLE: z5_obj_struck
DESCRIPTION: A Y/N flag indicating whether any objects were struck during this crash
ASSESSMENT: Details of objects struck are already provided in the open data. This summary indicator would be of benefit to users. There are no privacy concerns.
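The derived crash flags described in the Crash table (crash_time_4hr, z3, multi_veh_simple and weekend) could be computed along the following lines. This is only a minimal sketch based on the value descriptions in this memo, not the CAS implementation: the function names, the integer HHMM input format and the treatment of malformed times as unknown are assumptions, and holiday weekends are not handled.

```python
from datetime import datetime
from typing import Optional


def crash_time_4hr(hhmm: Optional[int]) -> int:
    """Return the 4-hour block (1-6) for a 24-hour time such as 359 or 1745; 0 = unknown."""
    if hhmm is None:
        return 0
    hours, minutes = divmod(hhmm, 100)
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        return 0  # assumption: malformed times are treated as unknown
    return hours // 4 + 1


def season(crash_month: int) -> str:
    """Return the z3 season code: S (months 12, 1, 2), A (3-5), W (6-8), P (9-11)."""
    if not 1 <= crash_month <= 12:
        raise ValueError(f"crash_month out of range: {crash_month}")
    # One character per month, January first.
    return "SSAAAWWWPPPS"[crash_month - 1]


def multi_veh_simple(multi_veh: str) -> str:
    """Collapse multi_veh into 'S' (single), 'M' (multi) or '_' (non-vehicle)."""
    if multi_veh == "S":
        return "S"
    if multi_veh in {"M", "C", "O", "E"}:
        return "M"
    if multi_veh in {"F", "G", "P", "_"}:
        return "_"
    raise ValueError(f"unrecognised multi_veh value: {multi_veh!r}")


def weekend_flag(crash_dt: datetime) -> str:
    """Return 'W' between Fri 1800 and Mon 0559, else ' '. Holiday weekends are not handled."""
    day = crash_dt.weekday()  # Monday is 0, Sunday is 6
    if day == 4:
        is_weekend = crash_dt.hour >= 18
    elif day in (5, 6):
        is_weekend = True
    elif day == 0:
        is_weekend = crash_dt.hour < 6
    else:
        is_weekend = False
    return "W" if is_weekend else " "
```

Only the block number, season and flag would then be published, so the exact day and time never leave the source system.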
Person
VARIABLE: Age group
DESCRIPTION: [new variable being derived for open data] Age group the person belongs to. Age groups will be: 0-15 years; 16-24 years; 25-54 years; 55-64 years; 65+ years
ASSESSMENT: This variable is being derived for the purposes of the open data to provide the benefit of age information, while minimising the chance of re-identification.

VARIABLE: ct_party_type
DESCRIPTION: This is the 'party type' as used in the crosstab reports. Values 'C' (car/van/taxi), 'T' (truck), 'B' (bus), 'M' (motorcycle), 'P' (moped), 'S' (cycle/cyclist), 'E' (ped.), 'O' (other), 'U' (unknown), 'Q' (equestrian), 'K' (skateboarder), '4' (Van/Ute/4WD). Derived from pers_type, veh_type
ASSESSMENT: There is a lot of benefit in being able to tell the role of the person. Ensuring that any identifying features (eg age) are confounding minimises privacy concerns.

VARIABLE: fat_prev
DESCRIPTION: 'Fatality preventable by wearing seatbelt'. Values 'Y' (yes, preventable), 'N' (no, not preventable), 'U' (unknown), ' ' (not fatal crash, or not applicable - pedestrian etc.)
ASSESSMENT: No privacy concerns, but need to confirm if we should release this as it is Police opinion, not from coroner etc

VARIABLE: inj_sev
DESCRIPTION: If the person was injured, this field has a code for the severity of injury. Values 'F' (fatal), 'M' (minor), 'S' (serious), 'N' (not injured), '' (not known).
ASSESSMENT: There is a lot of benefit in being able to tell the role of the person and the level of injury they sustained. Ensuring that any identifying features (eg age) are confounding minimises privacy concerns.

VARIABLE: ltsa_role
DESCRIPTION: Each party to a crash (vehicle, cycle, ped) is given an 'LTSA role' to distinguish it from other parties. The main participant is assigned role one, secondary parties role 2, and so on. Ltsa_role is also used to connect people to the vehicles they were in. In the crash_cse_code table where the cause code is environmental, ltsa_role is set to 0 (zero).
ASSESSMENT: We do need a randomly generated identifier to indicate which person we are referring to. Descriptions of roles need to be written so as not to assign fault.

VARIABLE: pers_prot
DESCRIPTION: Whether seat belt etc was worn. This applies to data 1980-87. Values 'W' (worn), 'K' (unknown), 'A' (not available), 'N' (not worn). Now applies post-2001 also with values 'Y' (worn), 'N' (not worn), 'O' (not available), 'U' (Unknown) and ' '. Old pre-87 data may have its values changed to the new ones.
ASSESSMENT: Not personally identifying.

VARIABLE: pers_role
DESCRIPTION: The part a person played in the crash. Values: 'P' (passenger), 'D' (driver), 'W' (witness), 'O' (vehicle owner), ' ' (other incl. cyclist, ped, equest., skateboarder etc)
ASSESSMENT: Limit to passenger, driver, cyclist, pedestrian, other. Witness and vehicle owner to be removed as witness information will not be included in the open data and whether or not the driver was the owner of the vehicle will be indicated elsewhere

VARIABLE: pers_sex
DESCRIPTION: The gender of a person involved in a crash
ASSESSMENT: Combined with the other data in the dataset this will increase potential to identify an individual

VARIABLE: pers_type
DESCRIPTION: The type of person that we have data on. Values 'D' (driver, not including cyclist etc), 'P' (passenger), 'K' (skateboarder), 'E' (pedestrian), 'S' (cyclist), 'Q' (equestrian), 'O' (other), 'H' (wheeled ped.), 'W' (parked car owner)
ASSESSMENT: This is the road user type. Need to consider this variable along with at-fault, eg. if this and at fault are released, and a person is identified in the dataset you could establish they were the driver and at-fault. If the person was identifiable and movement codes are released, it could allow you to identify their actions (eg. distracted). Note, this is most likely to occur through media reports of fatal crashes. Within the data demographic variables will be grouped to prevent instant recognition. If releasing this, remove ‘parked car owner’ as we won’t be providing information related to ownership/information not directly related to understanding the crash.

VARIABLE: pers_cas_type
DESCRIPTION: A derived flag used for reporting. For a person this is the same as the pers_type unless they are a 'D', 'P' or 'W' in which case it is the veh_type they were in.
ASSESSMENT: Need to confirm, but understand this is the type of person that was casualty. If this is so, same considerations as casualty hierarchy apply.
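The age banding and the removal of witness and vehicle owner records proposed in the Person table could be sketched as follows. This is an illustrative sketch only: the function names, the "unknown" label for missing or negative ages, and the boolean filter are assumptions rather than anything specified in this memo.

```python
from bisect import bisect_right
from typing import Optional

# Lower bounds of every band after the first, and the band labels from the memo.
_LOWER_BOUNDS = [16, 25, 55, 65]
_LABELS = ["0-15 years", "16-24 years", "25-54 years", "55-64 years", "65+ years"]

# pers_role values the memo proposes to drop from the open data.
_EXCLUDED_ROLES = {"W", "O"}  # witness, vehicle owner


def age_group(age: Optional[int]) -> str:
    """Return the broad age band for an age in whole years."""
    if age is None or age < 0:
        return "unknown"  # assumption: the memo does not say how missing ages are shown
    return _LABELS[bisect_right(_LOWER_BOUNDS, age)]


def is_released_role(pers_role: str) -> bool:
    """Return False for witness and vehicle owner records, True otherwise."""
    return pers_role not in _EXCLUDED_ROLES
```

For example, `age_group(24)` gives "16-24 years" and `age_group(65)` gives "65+ years", so the exact age is never published.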
Vehicle
VARIABLE: ltsa_role
DESCRIPTION: Each party to a crash (vehicle, cycle, ped) is given an 'LTSA role' to distinguish it from other parties. The main participant is assigned role one, secondary parties role 2, and so on. Ltsa_role is also used to connect people to the vehicles they were in. In the crash_cse_code table where the cause code is environmental, ltsa_role is set to 0 (zero).
ASSESSMENT: We do need a randomly generated identifier to indicate which person we are referring to. Descriptions of roles need to be written so as not to assign fault.

VARIABLE: dvr_lic_stus
DESCRIPTION: The drivers licence status. L (Learner), R (Restricted), F (Full), N (Never licensed), D (Disqualified), O (Overseas), K (unKnown)
ASSESSMENT: Not considered to be identifying even when considered against other demographic variables being released. However, so as not to provide information about a person’s compliance, Never Licenced, Disqualified, and unknown should be grouped into an ‘other’ category.

VARIABLE: park_revr
DESCRIPTION: A flag indicating whether the vehicle was parked or reversing. Values 'P' (parked), 'R' (reversing), 'S' (stationary), ' ' (neither or unknown).
ASSESSMENT: Not personal information, and no information about fault is being included.

VARIABLE: towing
DESCRIPTION: Was the vehicle towing, and if so, what? Values 'B' (boat), 'C' (caravan), 'T' (trailer), 'S' (semitrailer), 'A' (A train), 'R' (B train), 'O' (other), ' ' (unknown)
ASSESSMENT: Not personal information. Unlikely to be identifying.

VARIABLE: veh_cc_rating
DESCRIPTION: The engine capacity in cc.
ASSESSMENT: Not personally identifying

VARIABLE: veh_dam_locn
DESCRIPTION: The location of the damage to a vehicle. Values: 'F' (front), 'R' (RHS), 'L' (LHS), 'B' (back), 'T' (top), 'G' (general), 'C' (concertina), 'U' (unknown), '1' (R front corner), '2' (R centre), '3' (R rear corner), '4' (L rear corner), '5' (L centre), '6' (L front corner)
ASSESSMENT: Not personally identifying

VARIABLE: veh_dam_sev
DESCRIPTION: A code indicating the severity of damage to a vehicle. Values: 'N' (nil), 'M' (minor), 'E' (extensive), 'F' (fire), 'O' (overturned), 'W' (writeoff), 'U' (unknown).
ASSESSMENT: Not personally identifying.

VARIABLE: veh_fault
DESCRIPTION: A Y/N flag derived from the cause codes for this crash which indicates that this vehicle was faulty. Used in the cross-tab area.
ASSESSMENT: Non-personal information – about the vehicle.

VARIABLE: veh_manuf_year
DESCRIPTION: The year of manufacture of the vehicle.
ASSESSMENT: Essential for understanding the impact of older vehicles (safe vehicles). As make/model is not being included, this should not be identifying.

VARIABLE: veh_own
DESCRIPTION: The relationship of the driver to the vehicle. Values: 'O' (driver is owner), 'N' (vehicle is borrowed etc), 'R' (rental), '' (unknown)
ASSESSMENT: Not personally identifying.

VARIABLE: veh_type
DESCRIPTION: The type of the vehicle - C (car, station wagon), X (taxi), T (truck), B (bus), M (motorcycle), R (rental), V (van or ute), A (artic. truck), L (school bus), P (power cycle), U (unknown), O (other), S (bike), H (wheeled pedestrian), 4 (4WD)
ASSESSMENT: Not personally identifying.

VARIABLE: pass_front
DESCRIPTION: The number of front seat passengers (note: this data is not reliable - although it should be front seat passengers the Police will sometimes record 1 meaning there was a driver, when they should have recorded 0)
ASSESSMENT: Non-personal information, but need to note that the driver may sometimes be counted.

VARIABLE: pass_other
DESCRIPTION: Passengers in the vehicle who were not in the front or rear seat.
ASSESSMENT: As per above.

VARIABLE: pass_rear
DESCRIPTION: The number of rear seat passengers.
ASSESSMENT: As per above.

VARIABLE: crash_spd
DESCRIPTION: The speed of a vehicle before the crash.
ASSESSMENT: This is the speed the person said they were going. This is not personal information, but it should be noted in the metadata that this is not necessarily accurately measured.
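The grouping proposed for dvr_lic_stus (Never licensed, Disqualified and unknown folded into an ‘other’ category) could be sketched as below. The function name and the output labels are assumptions for illustration, as is the choice to send blank or unrecognised codes to ‘Other’ as well.

```python
# Licence statuses that would be published as-is (labels taken from the memo).
RELEASED_LICENCE_LABELS = {
    "L": "Learner",
    "R": "Restricted",
    "F": "Full",
    "O": "Overseas",
}

# Statuses the memo proposes to group so that compliance is not exposed.
GROUPED_AS_OTHER = {"N", "D", "K"}  # never licensed, disqualified, unknown


def released_licence_status(dvr_lic_stus: str) -> str:
    """Return the licence status label for open release, grouping N, D and K as 'Other'."""
    code = dvr_lic_stus.strip().upper()
    if code in RELEASED_LICENCE_LABELS:
        return RELEASED_LICENCE_LABELS[code]
    # Assumption: blank or unrecognised codes are also published as 'Other'.
    return "Other"
```

With this mapping a disqualified driver and a driver of unknown status are indistinguishable in the released data.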
Objects
All information about objects struck is already openly available.
Factors
The following factors will be released at the highest level (ie. factor_grp_id):
All road user factors:
• Roll the following up to factor_grp_id 100 Alcohol or drugs, where these are proven (ie
codes 103, 109, 220 and 221 below only)
Alcohol
101 Alcohol suspected
102 Alcohol test below limit
103 Alcohol test above limit or test refused
105 Impaired non-driver (pedestrian / cyclist / passenger, etc)
100 Other alcohol
Drugs
108 Drugs suspected
109 Drugs present
220 Impaired non-driver (pedestrian / cyclist / passenger, etc)
221 Other drugs
• Roll the following up to factor_grp_id 380 Misjudged speed
Misjudged speed, distance, size or position of
381 Another vehicle
383 Pedestrian
385 Size or position of fixed object or obstacle
386 Own vehicle
387 Misjudged intentions of another party
380 Other - misjudged speed, distance, size or position
• Roll the following up to factor_grp_id 500 Illness and disability
Illness
501 Sudden illness
504 Medical illness
505 Mental illness
506 Attempted suicidal
500 Other illness
Disability
502 Physical impairment
503 Defective vision
507 Impaired ability due to old age
508 Other disability
• Roll the following up to factor_grp_id 520 Driver/Passenger
Driver or passenger boarding, leaving or in vehicle
521 Intentionally leaving / boarding moving vehicle
523 Riding in insecure position
524 Interfered with driver
525 Opened door inadvertently
527 Child playing in parked vehicle
520 Other driver or passenger boarding, leaving or in vehicle
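The roll-up of detailed factor codes to factor_grp_id for the all road user factors above could be sketched as follows. This is a hedged illustration, not the CAS implementation: the dictionary and function names are assumptions, only the four all road user groups are included, and unmapped codes are silently dropped. An explicit lookup is used because the listings in this memo place some codes in a group outside their own number range (for example 220 and 221 under group 100).

```python
from typing import Dict, Iterable, List

# Group id -> detailed codes that roll up to it, as listed in the memo.
# Group 100 holds only the proven alcohol/drug codes (103, 109, 220, 221);
# codes 100, 101, 102, 105 and 108 are deliberately left unmapped.
GROUP_MEMBERS: Dict[int, List[int]] = {
    100: [103, 109, 220, 221],
    380: [380, 381, 383, 385, 386, 387],
    500: [500, 501, 502, 503, 504, 505, 506, 507, 508],
    520: [520, 521, 523, 524, 525, 527],
}

# Invert to a code -> group lookup.
CODE_TO_GROUP: Dict[int, int] = {
    code: group for group, codes in GROUP_MEMBERS.items() for code in codes
}


def released_factor_groups(factor_codes: Iterable[int]) -> List[int]:
    """Return the sorted, de-duplicated factor_grp_id values for one crash.

    Codes with no mapping (including unproven alcohol or drug codes) are dropped.
    """
    return sorted({CODE_TO_GROUP[c] for c in factor_codes if c in CODE_TO_GROUP})
```

Because the result is a set of groups per crash, it carries no link between a factor and a particular person, in line with the approach described under Factor codes.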
Driver only factors:
• Roll the following up to factor_grp_id 110 Too fast for Conditions
Inappropriate speed
111 Entering/on curve
112 On straight
113 Approaching a traffic control
115 When passing school bus
116 At temporary speed limit
117 At crash or emergency
118 For road conditions
119 For weather conditions
182 Travelling unreasonably slowly
110 Other inappropriate speed conditions
• Roll the following up to factor_grp_id 120 Failed to keep left
Position on road
121 Swung wide on bend
122 Swung wide at intersection
123 Cutting corner on bend
124 Cutting corner at intersection
125 Too far right
126 Vehicle crossed flush median
129 Too far left
120 Other positions on road
• Roll the following up to factor_grp_id 130 Lost Control
Lost control
131 Lost control when turning
132 Lost control under braking
133 Lost control under acceleration
134 Lost control while returning to seal from unsealed shoulder
135 Lost control - road conditions
136 Lost control - vehicle fault
137 Lost control avoiding another party
130 Other lost control
• Roll the following up to factor_grp_id 140 Failed to Signal in Time
Appropriate signalling
141 Failed to signal in time
145 Incorrect signal
140 Other failed to signal
• Roll the following up to factor_grp_id 150 Overtaking
Overtaking
151 Overtaking line of traffic or queue
152 Overtaking in the face of oncoming traffic
156 With insufficient visibility
157 Overtaking at an intersection
158 On left without due care
159 Cut in after overtaking
160 Vehicle signalling turn
150 Other overtaking
• Roll the following up to factor_grp_id 170 Wrong Lane/Turned From Wrong Position
Wrong lane or turned from wrong position
171 Turned from incorrect lane
173 Travelled straight from turning lane or flush median
174 Turned from incorrect position on road
176 Turned into incorrect lane
177 Weaving or cut in on multi-lane roads
179 Long vehicle tracked outside lane
184 Incorrect merging / diverging
170 Other wrong lane or position
• Roll the following up to factor_grp_id: 180 Following too close
Following too close
181 Following too closely
183 Motorist crowded cyclist
180 Other following too close
• Roll the following up to factor_grp_id: 190 Sudden action
Sudden action
191 Suddenly braked
192 Suddenly turned left/right
194 Swerved to avoid pedestrian
195 Swerved to avoid animal
196 Swerved to avoid crash or broken down vehicle
197 Swerved to avoid vehicle
199 Swerved avoiding emergency Vehicle
190 Other sudden action
• Roll the following up to factor_grp_id 200 Forbidden movements
Forbidden movements
201 Wrong way on road / motorway
202 Non-compliance with regulatory device with sign or marking
204 Driving / riding in pedestrian space
208 Motor vehicle in special purpose lane
533 Equestrian not keeping to verge
200 Other forbidden movements
• Roll the following up to factor_grp_id 320 Did not Stop
Did not stop
321 At stop sign
322 At full red traffic signal
324 At amber traffic signal
326 At flashing red signals (railway crossing, fire stations, etc)
327 For traffic controller
328 For school patrol / kea crossing
320 Other did not stop
• Roll the following up to factor_grp_id 300 Failed to give way
Failed to give way
301 At a priority traffic control
303 When turning to non-turning traffic
304 When priority defined by road markings
306 To a pedestrian
308 When entering roadway from driveway
309 To traffic approaching or crossing from the right
312 Entering roadway not from driveway or intersection
313 Failed to give way to emergency vehicle
314 Driver waved through
315 When turning right to opposing left turning traffic
316 To traffic approaching or crossing from the left
300 Other failed to give way
• Roll the following up to factor_grp_id 350 Attention diverted
Attention diverted by
Inside vehicle
351 Passengers
354 Animal or insect in vehicle
357 Emotionally upset / road rage
358 Food, cigarettes, beverages
359 Cell phone
361 Navigation device
362 Non cell communication device
364 Vehicle console inbuilt features: radio / heater, etc
365 Objects under driver’s pedals
366 Food, cigarettes, beverages
Outside vehicle
352 Scenery or persons outside vehicle
353 Other traffic
355 Trying to find intersection, house number, destination, etc
356 Advertising or signs
363 Driver dazzled
350 Other attention diverted by
• Roll the following up to factor_grp_id 330 Inattentive: Failed to Notice
Failed to notice
331 Vehicle slowing, stopping or stationary in front
332 Bend in road
333 Indication of vehicle in front
334 Failed to notice control
336 Failed to notice signs
339 Failed to notice road works
340 Failed to notice markings
341 Obstructions on roadway
534 Another party wearing dark clothing
330 Other inattentive
• Roll the following up to factor_grp_id 370 Did Not See or Look for Another Party Until Too Late
Did not see or look for other parties until too late
371 Did not check / notice another party behind
375 Did not check / notice another party
377 When visibility obstructed by other traffic
370 Other did not see or look
• Roll the following up to factor_grp_id: 400 Inexperience
Lack of experience
401 In driving in fast complex or heavy traffic
402 New driver / under instruction
403 Driving unfamiliar with vehicle/towing
404 Overseas / migrant driver fails to adjust to NZ road rules and road conditions
407 Driver over-reacted
400 Other lack of experience
• Roll the following up to factor_grp_id: 410 Fatigue (Drowsy)
Fatigue (drowsy, tired or fell asleep)
411 Long trip
412 Lack of sleep
414 Long day (working / recreation)
415 Exceeded driving hours
410 Other fatigue
• Roll the following up to factor_grp_id: 420 Incorrect use of vehicle controls
Vehicle control mistakes
421 Started in gear / stalled
423 Wrong pedal / foot slipped
426 Lights not switched on
428 Parking brake not fully applied
429 Trailer coupling or safety chain not secured
420 Other vehicle controls
• Roll the following up to factor_grp_id: 440 Parked or Stopped
Parking
441 Parked vehicle is not visible
443 Incorrectly parked vehicle
447 Not clear of rail crossing
440 Other parking
• Roll the following up to factor_grp_id: 510 Intentional or Criminal
Intentional action
Showing off
431 Racing
432 Playing 'chicken'
433 Wheel spins / wheelies / doughnuts / drifting etc
434 Intimidating driving
Intentional
430 Other intentional actions
Intentional or criminal
511 Homicide/suicide (successful)
512 Intentional collision
514 Evading enforcement
515 Object thrown (at / by / from)
518 Over the speed limit
Pedestrian factors:
• Roll the following up to factor_grp_id: 700 Walking along Road
Walking along road
701 Not keeping to footpath
702 Not keeping to side of road
703 Not facing oncoming traffic
704 Not on outside of blind curve
705 Wheeled pedestrian behaviour
700 Other pedestrian walking along the road
• Roll the following up to factor_grp_id: 710 Crossing Road
Crossing road
710 Other pedestrian crossing the road
711 Walking heedless of traffic
712 Stepping out from behind vehicles
713 Running heedless of traffic
714 Failed to use pedestrian crossing when one within 20 metres
715 Waiting on carriageway / confused by traffic
717 Stepping suddenly onto crossing
718 Not complying with traffic signals or school patrols
719 Misjudged speed and/or distance of vehicle
740 Looking the wrong way
• Roll the following up to factor_grp_id: 720 Miscellaneous
Miscellaneous
721 Pushing, working on or unloading vehicle
722 Playing / unnecessarily on road
723 Working on road
725 Vision obscured by umbrella or hood
726 Child escaped from supervision
727 Unsupervised child
729 Pedestrian from or to school bus
730 Pedestrian behind reversing / manoeuvring vehicle
731 Overseas pedestrian
732 Pedestrian attention diverted by cigarette, cell phone, music player
733 Pedestrian from or to scheduled service
720 Other pedestrian
Vehicle factors:
Information
• Roll the following up to factor_grp_id: 600 Lights and Reflectors at Fault or Dirty
Running lights
601
Dazzling headlights
602
Headlights inadequate / no headlights or failed suddenly
604
Brake lights or indicators faulty or not fitted
605
Tail lights inadequate or no tail lights
606
Reflectors inadequate or no reflectors
607
Lights or reflectors obscured
608
Confusing / dazzling lights
609
Lights or reflectors at fault or dirty
600
Other lights or reflectors
• Roll the following up to factor_grp_id: 610 Brakes
Brakes
611
Parking brake failed / defective
613
Service brake failed
614
Service brake defective
615
Jack-knifed – uneven braking
610
Other brakes
• Roll the following up to factor_grp_id: 620 Steering
Steering
621
Defective
622
Failed suddenly
620
Other steering
• Roll the following up to factor_grp_id: 630 Tyres
Tyres
631
Puncture or blowout
632
Worn tread on tyre
633
Incorrect tyre type
634
Mixed types (tread) / space savers
630
Other tyres
• Roll the following up to factor_grp_id: 640 Windscreen or Mirror
Windscreens, mirrors, visors
641
Shattered windscreen
642
Vehicle windows / helmet visors / goggles / glasses / misted / dirty / windscreen wipers
643
Rear vision mirror
640
Other windscreen / mirror
• Roll the following up to factor_grp_id: 650 Mechanical
Mechanical
651
Engine failure
652
Transmission failure / broken axle
653
Accelerator or throttle jammed
650
Other mechanical
• Roll the following up to factor_grp_id: 660 Body or Chassis
Chassis / running gear
661
Body, chassis or frame (cycle / motorcycle) failure
662
Suspension failure
665
Inadequate tow coupling
666
Inadequate or no safety chain
668
Wheel off
660
Other chassis / gear
Body / doors
667
Door / bonnet catch failed, defective or not shut
670
Inconspicuous colour
671
Blind spot
664
Other body / doors
• Roll the following up to factor_grp_id: 680 Load
Load
681
Load interferes with driver
682
Not well secured
683
Over-hanging
684
Load obscured vision
686
Over-dimensional vehicle or load
687
Load too heavy
688
Towed vehicle or trailer too heavy or incompatible
680
Other load
• Roll the following up to factor_grp_id: 690 Miscellaneous Vehicle
Miscellaneous vehicle
691
Emergency vehicle
692
Vehicle caught fire
693
Being towed
690
Other vehicle
Road factors:
• Roll the following up to factor_grp_id: 800 Slippery
Slippery road surface
804
Loose material on seal
807
Painted markings
808
Recently graded
809
Surface bleeding / defective
813
Deep loose metal
828
Steel / iron covers and joints
800
Other slippery road condition
Wet surfaces
801
Rain
802
Frost or ice
803
Snow or hail
805
Mud / effluent
806
Oil / fuel
• Roll the following up to factor_grp_id: 810 Surface
Surface condition
811
Potholed
812
Uneven
814
High crown
815
Curve not well banked
816
Edge badly defined or gave way
817
Under construction or maintenance
818
Unusually narrow
810
Other surface condition
• Roll the following up to factor_grp_id: 820 Obstructed
Obstructions and objects
821
Fallen tree or branch
822
Slip or subsidence
823
Flood waters, large puddles and fords
824
Road works not adequately lit/sign posted
826
Roadside object fell on vehicle
827
Object flicked by other vehicle
820
Other road obstructed
• Roll the following up to factor_grp_id: 830 Visibility Limited
Visibility limited
Visibility limited by road feature
831
Curve
832
Crest
837
Bank
849
Traffic signs
Visibility limited by other feature
833
Building
834
Trees
835
Hedge or fence
836
Scrub, long grass or foliage
838
Temporary obstruction, dust or smoke
839
Parked vehicle
829
Signs / billboards / hoardings
830
Other road feature limit visibility
• Roll the following up to factor_grp_id: 840 Signs and Signals
Signs and signals
841
Damaged, removed or malfunction
842
Badly located
843
Ineffective / inadequate / obscured
844
Necessary
845
Signals off
840
Other signs or signals
• Roll the following up to factor_grp_id: 850 Markings
Markings / Islands / Barriers
851
Faded
852
Difficult to see due to weather or geometry
853
Markings necessary
872
Traffic island(s) ineffective, badly located or designed
856
Barriers necessary
857
Island necessary
850
Other markings / islands / barriers
• Roll the following up to factor_grp_id: 860 Street Lighting
Street lighting
861
Failed
862
Inadequate for road and pedestrian crossing
860
Other street lighting factors
Environmental factors:
• Roll the following up to factor_grp_id: 900 Weather
Weather
901
Heavy rain
902
Dazzling sun
903
Strong wind
904
Fog or mist
905
Snow, sleet or hail
900
Other weather
• Roll the following up to factor_grp_id: 910 Animals
Animals
912
Household pets rushed out or playing
913
Farm animals straying
914
Farm animals attended, but inadequate warning or unexpected
915
Farm animals attended, but out of control
910
Wild Animal
911
Other animal factors
• Roll the following up to factor_grp_id: 999 Unknown
NO IDENTIFIABLE FACTORS
999
Unknown
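
The roll-ups listed in this appendix amount to a lookup from each factor code to its factor_grp_id. The following is a minimal sketch in Python using only a handful of the codes listed above. The names FACTOR_GROUP, roll_up and crash_groups are illustrative assumptions, not CAS identifiers. An explicit lookup is used rather than rounding the code down, because some codes sit outside their group's numeric range (for example, 740 rolls up to 710 and 872 rolls up to 850).

```python
# Illustrative subset of the roll-up lists in this appendix (not the full mapping).
# FACTOR_GROUP, roll_up and crash_groups are hypothetical names, not CAS identifiers.
FACTOR_GROUP = {
    901: 900, 902: 900, 903: 900, 904: 900, 905: 900, 900: 900,  # Weather
    912: 910, 913: 910, 914: 910, 915: 910, 910: 910, 911: 910,  # Animals
    719: 710, 740: 710,  # Crossing Road (740 is outside the 71x range)
    853: 850, 872: 850,  # Markings (872 is outside the 85x range)
    999: 999,            # Unknown
}


def roll_up(factor_code: int) -> int:
    """Return the factor_grp_id for a factor code; raises KeyError if unmapped."""
    return FACTOR_GROUP[factor_code]


def crash_groups(factor_codes):
    """Distinct factor groups recorded for one crash, sorted for stable output."""
    return sorted({roll_up(code) for code in factor_codes})
```

For instance, a crash recorded with codes 901, 903 and 913 would be released with the two groups 900 and 910.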
Appendix 2 – Data assessed as NOT suitable for open release at the level of individual crash
Crash
VARIABLE | DESCRIPTION | ASSESSMENT
crash_id | The unique identifier for a crash | Unique identifiers are not to be openly released. NB. a random number may be assigned to each crash if we need to reshape the data into multiple tables. This will be randomly generated and will not enable any linkages with the source system.
crash_cause | A code for the overall cause of a crash. Values 'A' (alcohol), 'S' (speed), 'B' (alcohol & speed), 'N' (neither alcohol nor speed). Relates to cause codes 100, 101, 103-109 for alcohol, and 110-119 for speed. | Multiple factors may contribute to a crash. These will be provided as per the factors assessment above. As such, one ‘overall’ cause will not be provided.
crash_date | The date on which a crash occurred. The date or the least significant parts of the date may be unknown. Format is YYYYMMDD | Exact date will be excluded as a confounding factor to prevent re-identification by matching with external sources (eg media). An indicator of whether the crash was on a weekday/weekend will be included. Year, financial year, and whether the crash occurred over a holiday period are already available in the open data.
crash_dow | The day-of-week on which the crash occurred. Possible values range from '1' (Sunday) to '7' (Saturday). ' ' is invalid or unknown. | As per above. The weekend indicator will be included however. This will allow the identification of whether this crash was on a weekday or in the weekend.
crash_fin | Another crash severity flag, derived from crash_sev. Values 'F' (fatal, same as sev. 'F'), 'I' (injury, same as sev. 'S' and 'M'), 'N' (non-injury, same as sev. 'N' or '') | While there is no reason not to release this variable, it contains the same information as crash_sev, which is already openly available.
crash_in | Another crash severity flag, derived from crash_sev. Values 'I' (injury, same as sev. 'F', 'S', 'M'), 'N' (non-injury, same as sev 'N' or '') | As above.
crash_pol_sev | The only non-derived severity flag. Comes off the police form or INCIS and is entered. Used for cross-checking with the derived flags. Has the same values as crash_fin. | To be used internally for validating the derived flags. No value add in release.
crash_road_id | The road ID of the road segment on which the crash is located. This may not be the road ID of the road whose name is in crash_locn1 or even crash_locn2 - eg if the crash is in the middle of a roundabout segment it will be the ID of the highest priority road which includes that segment - perhaps a SH. | As we already provide a variety of location variables, including eastings and northings, there is no benefit in including this variable. In addition, this variable requires expert knowledge to interpret.
crash_time | The time-of-day of the crash. 24hr clock hhmm. The exact time may be unknown. | Not being included to prevent re-identification with external sources (eg media). A derived variable providing the crash time in a 4 hour block will be released however.
crash_time_6hr | A field derived from crash_time. It is the time of day in 6hr blocks. Eg value 1 is times 0000-0559; value 2 is times 0600-1159 etc. Value 0 is unknown time. | Releasing this variable alongside crash_time_4hr would reduce the time window to a 2hr period, which would greatly increase the potential for re-identification. The benefit in providing this slightly different time window is not considered to outweigh the re-identification/privacy risk.
locality_name | The name of the locality where the crash occurred. This is 'free' data not limited by lookup. | There is no privacy concern in releasing this variable, but there are quality considerations in terms of it being a free-text field. Given the other location information being released, there is no additional benefit in releasing this variable.
nzta_region_id | The unique identifier of an NZTA region. Values 'A' (Auckland), 'H' (Waikato/BoP), 'J' (Central), 'S' (Southern). TLAs do not map exactly onto a single NZTA region in every case, but TLAs 1-7 are in 'A', 11-27 are in 'H', 28-53 are in 'J', 54-75 are in 'S'. This attribute is blank (' ') where it is used in lu_tla and the boundaries do not match. | Not relevant for general users outside of the Agency. No additional benefit above the other geographical variables already being released.
off_att_date | The date the police attended the crash. | Exact date of crash not provided so no benefit in providing this. If we were to include it we would need to liaise with Police in terms of the information about them this is providing.
off_att_time | The time when the police attended the crash. | As per above.
pol_area_id | The unique id of a police area. | Not relevant for general users. No additional benefit above the other geographical variables already being released.
pol_dstr_id | The unique identifier of a police district. Current values CJ Auckland, DA Counties/Manukau, DC Bay of Plenty, DE Eastern, DH Wellington, DJ Tasman, DK Canterbury, DM Southern, DG Central, CX Nth Shore/Waitakere, DB Waikato, BG Northland. | Not relevant for general users. No additional benefit above the other geographical variables already being released.
z4_ak_mway | Used to indicate whether the crash falls in the area patrolled by the Police Auckland Motorway Patrol. It is used to split crashes out from the underlying Police Districts for detailed reporting to the Police. It can only be calculated after the crash is geocoded. Values: 'M' (Inside the Ak motorway patrol area), 'N' (Outside the Ak Mway patrol area), null (Yet to be calculated). | Not relevant for general users. No additional benefit above the other geographical variables already being released.
z1 | Used for a Y/N flag to indicate that for this crash, the crash_locn2 field contains the name of a geocoding feature, and not the usual sideroad name. | No privacy concerns, but Colin Morrison advised no benefit in releasing this, as it is an internal systems checking variable.
qual_stus | The overall quality of the data for this crash. Has the same value as the worst individual error for this crash. Values 'F' (fatal - unusable) 'S' (serious) 'M' (minor) | No privacy concerns, but Colin Morrison advised not to release this.
pol_rptd | A Y/N flag indicating whether a crash was police reported or not (if Y, also indicates that the person who entered the crash was an LTSA user) | This variable is used to filter the data for release (we will only publish police reported crashes). This variable therefore does not need to be included in the output as they will all be ‘yes’.
crash_cost | The social cost of this crash calculated by the Transit algorithm. It is dependent on TLA, urban (based on speed limit) and crash_sev. | This is calculated as an average, and is therefore not appropriate to be provided at the level of individual crash.
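
The 2hr figure in the crash_time_6hr assessment follows from overlaying the two sets of block boundaries. The following is a minimal sketch in Python, assuming the 4 hour blocks start at midnight in the same way as the 6hr blocks (the boundaries of crash_time_4hr are not stated in this table). The function names are illustrative.

```python
def blocks(width_hours):
    """Half-open [start, end) hour ranges covering one day, starting at midnight."""
    return [(start, start + width_hours) for start in range(0, 24, width_hours)]


def combined_windows(width_a, width_b):
    """Windows a crash could be narrowed to if both block variables were released."""
    windows = []
    for a_start, a_end in blocks(width_a):
        for b_start, b_end in blocks(width_b):
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # the two blocks overlap
                windows.append((start, end))
    return windows


# Overlay the assumed 4 hour blocks on the 6hr blocks described above.
windows = combined_windows(4, 6)
narrowest = min(end - start for start, end in windows)
```

Under these assumptions, a crash falling in both the 0400-0759 block and the 0000-0559 block must have occurred between 0400 and 0559, which is the 2hr window the assessment refers to.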
Person
VARIABLE | DESCRIPTION | ASSESSMENT
crash_id | The unique identifier for a crash | Unique identifiers are not to be openly released. NB. a random number may be assigned to each crash if we need to reshape the data into multiple tables. This will be randomly generated and will not enable any linkages with the source system.
pers_id | The unique id of a person in a crash. Note this is unique for the crash, not globally. In the crash_cse_code table where the cause code is either a vehicle or environmental code, the pers_id is set to 0 (zero). | Unique identifiers are not to be openly released. NB. a random number may be assigned to each person if we need to reshape the data into multiple tables. This will be randomly generated and will not enable any linkages with the source system. Only person level information cleared for release as per this document will be made available.
age_5yr | A field derived from 'pers_age'. It is the age in 5yr blocks: the value 20 includes ages 20-24, the value 25 is ages 25-29 etc | Information about age is considered highly valuable, but is also potentially sensitive (for example, if there was a crash where alcohol is considered to be a cause and a young child was killed; and also in terms of assisting in identifying people involved). As such, 5 year age groups are considered too sensitive for release. However, we will derive a variable with broader age groups as per above.
pers_adr1 - pers_adr3 | Address lines | Personal information.
pers_age | The age of a person involved in a crash. | Personal information – a derived variable ‘age group’ will be provided.
pers_au | The area unit in which a person involved in a crash lives. | Potentially identifying information as area units are relatively small, and when considered with the other information contained in the dataset it would be possible to further limit the potential population of interest. (Area units normally contain 3,000 – 5,000 population though this can vary due to such things as industrial areas, port areas, rural areas and so on).
pers_dob | The date of birth of a person involved in a crash. This will generally be known only for drivers. YYYYMMDD. | Personal information.
pers_hosp | A flag indicating whether a person was hospitalized. Applies to data 1980-87. Values 'Y' (yes, hospitalized), 'N' (you can guess the meaning of this one), 'U' (unknown). May also come out of INCIS. | Open data is from 1 Jan 2000. This variable is not available for that time period.
pers_hospital | The name of the hospital to which this person was sent. | Potentially personally identifying/sensitive.
pers_initials | The initials of a person involved in a crash | Personally identifying.
pers_mb | The meshblock in which a person involved in a crash lives. | Personally identifying – Meshblocks are the smallest geographic unit, and vary in size from part of a city block to large areas of urban land. Releasing this could be very close to releasing the person's address.
pers_seat_posn | The person's seating position in the vehicle. Values 'F' (front), 'R' (rear), 'O' (other), ' ' (not known). Applies to data 1980-87. | Does not apply to time period of the open data. Subject to quality, we will provide information on the number of front seat and rear seat passengers, and how many passengers were not in the front or rear seat.
pers_surname | The surname of a person involved in a crash | Personal information.
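
The random identifiers described for crash_id and pers_id could be assigned along the following lines. This is a minimal sketch in Python and not the Agency's method: the function name, the use of the standard-library secrets module and the identifier length are all assumptions made for illustration.

```python
import secrets


def assign_random_ids(source_ids, n_bytes=8):
    """Map each distinct source ID to a random token that is not derived from it.

    Hypothetical helper: tokens come from secrets.token_hex, so nothing about
    the source ID (value or ordering) can be recovered from the token itself.
    """
    mapping = {}
    used = set()
    for source_id in source_ids:
        if source_id in mapping:
            continue  # a repeated source ID keeps the same random ID across tables
        token = secrets.token_hex(n_bytes)
        while token in used:  # regenerate on the (very unlikely) collision
            token = secrets.token_hex(n_bytes)
        used.add(token)
        mapping[source_id] = token
    return mapping
```

With an approach like this, the mapping itself would need to stay internal (or be discarded) once the reshaped tables are produced, since publishing it would restore the link to the source system.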
Vehicle
VARIABLE | DESCRIPTION | ASSESSMENT
crash_id | The unique identifier for a crash | Unique identifiers are not to be openly released. NB. a random number may be assigned to each crash if we need to reshape the data into multiple tables. This will be randomly generated and will not enable any linkages with the source system.
alc_blood_ref | A flag indicating the status of the blood test if one was requested. Values '' (not requested), 'R' (refused - a positive result), 'T' (taken - if the actual result is known it will be in alc_blood_test). | Administrative and sensitive information.
alc_blood_test | If a blood alcohol test was taken, this is the result if we know it. | Sensitive information.
alc_evid_ref | The status of the evidential breath test if one was requested. Values '' (not requested), 'R' (refused - a positive result), 'T' (taken but result may be unknown) | Sensitive information.
alc_evid_test | Evidential breath test result if known. | Sensitive information.
alc_scrn_test | If a screening test for alcohol was used, was it positive or negative? Values: '+' (positive), '-' (negative), ' ' (not tested) | Sensitive information.
alc_susp | Whether alcohol involvement was suspected. Values: 'S' (suspected), 'N' (not susp.), 'U' (unknown), 'R' (breath level 250-400) | Sensitive and supposition.
dvr_culp | A code indicating driver culpability for this vehicle. Values '1' (1 veh, dvr at fault) '2' (1 veh, no dvr fault), '3' (>1 veh, dvr prime fault), '4' (>1 veh, dvr part fault), '5' (>1 veh, no dvr fault). | Sensitive when considered with other information (particularly demographics) being released, as it assigns fault to an individual.
dvr_frgn_cntry | The country of origin of a driver using an overseas licence. | Potentially personally identifying when considered with other information being released. Whether the driver was using an overseas licence is provided in the dvr_lic_stus variable.
dvr_occ | The occupation of the driver of this vehicle. | Personally identifying.
pers_id | The unique id of a person in a crash. Note this is unique for the crash, not globally. In the crash_cse_code table where the cause code is either a vehicle or environmental code, the pers_id is set to 0 (zero). | Unique identifiers are not to be openly released. NB. a random number may be assigned to each person if we need to reshape the data into multiple tables. This will be randomly generated and will not enable any linkages with the source system. Only person level information cleared for release as per this document will be made available.
veh_make_model | The make and model of the vehicle. | Combined with the other information, could be potentially identifying, especially in the case of rare vehicles (eg. make/model for a rare vehicle, combined with location of damage to the vehicle could identify the vehicle in the crash, and therefore imply the owner's involvement).
veh_spd_contrib | A flag to indicate whether this vehicle's speed contributed to the crash. Values are 'Y' for yes, 'N' (guess!) and ' ' for unknown. | Attributes potential fault to an individual. Should not openly release information that is a ‘guess’!
veh_wof_current | A flag indicating whether the vehicle WOF or COF was current. | If a person was able to be identified, this would provide information about their compliance behaviour.
veh_wof_expy_date | The expiry date of the WOF or COF of the vehicle. | As per above.
Causes
VARIABLE | DESCRIPTION | ASSESSMENT
crash_id | The unique identifier for a crash | Unique identifiers are not to be openly released. NB. a random number may be assigned to each crash if we need to reshape the data into multiple tables. This will be randomly generated and will not enable any linkages with the source system.
ltsa_role | Each party to a crash (vehicle, cycle, ped) is given an 'LTSA role' to distinguish it from other parties. The main participant is assigned role one, secondary parties role 2, and so on. Ltsa_role is also used to connect people to the vehicles they were in. In the crash_cse_code table where the cause code is environmental, ltsa_role is set to 0 (zero). | As CAS data relates to factors PROBABLY contributing to crashes, we do not want to provide any information that implies a particular individual was at fault, when there is even the smallest possibility that individual could be identified in the data. Any information on factors or causes will be assigned at the crash level.
pers_id | The unique id of a person in a crash. Note this is unique for the crash, not globally. In the crash_cse_code table where the cause code is either a vehicle or environmental code, the pers_id is set to 0 (zero). | As per above.
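
Assigning factors at the crash level, as described for ltsa_role and pers_id, means dropping the party-level columns before release. The following is a minimal sketch in Python, assuming hypothetical input rows of (crash key, ltsa_role, pers_id, factor code). The row layout, the crash keys and the function name are illustrative and do not reflect the actual crash_cse_code schema.

```python
from collections import defaultdict


def factors_at_crash_level(cause_rows):
    """Collapse party-level cause rows to one sorted list of factor codes per crash.

    Each row is (crash_key, ltsa_role, pers_id, factor_code). ltsa_role and
    pers_id are discarded, so no factor is attributed to a particular party.
    """
    by_crash = defaultdict(set)
    for crash_key, _ltsa_role, _pers_id, factor_code in cause_rows:
        by_crash[crash_key].add(factor_code)
    return {crash_key: sorted(codes) for crash_key, codes in by_crash.items()}


# Hypothetical rows; the crash keys are made up for illustration.
rows = [
    ("c1", 1, 1, 432),
    ("c1", 2, 2, 711),
    ("c1", 0, 0, 901),  # environmental code: ltsa_role and pers_id are 0
    ("c2", 1, 1, 432),
]
crash_level = factors_at_crash_level(rows)
```

In this sketch the output for crash "c1" lists codes 432, 711 and 901 without indicating which party each code was recorded against.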
Factors
• The following factors relating to reason for death/injury are not to be released
Reason for death/injury
531
Casualty drowned
532
Casualty thrown from vehicle
535
Electrocution
536
Unsecured child seat
537
Child restraint failure / inappropriate
672
Seatbelt failed / defective
673
Air bag failed / defective
530
Other reasons for death/injury