Measurements and Classification
ROCCA
Classification technique
ROCCA classifies whistles, clicks and encounters using Random Forest classifier models built with the open-source machine learning package Weka. For more information on Random Forests and Weka, refer to the book Data Mining: Practical Machine Learning Tools and Techniques.
Whistle Contour Measurement and Classification
Table 1 lists the 50 variables measured from each whistle contour.
Variable Code
Variable Name
Units
Explanation
FREQBEGSWEEP
Beginning Sweep
Categorical variable
Slope of the beginning sweep (1=positive, -1=negative, 0=zero)
FREQBEGUP
Positive beginning sweep
Binary variable
1=beginning slope is positive, 0=beginning slope is negative
FREQBEGDWN
Negative beginning sweep
Binary variable
1=beginning slope is negative, 0=beginning slope is positive
FREQENDSWEEP
Ending sweep
Categorical variable
Slope of the ending sweep (1=positive, -1=negative, 0=zero)
FREQENDUP
Positive ending sweep
Binary variable
1=ending slope is positive, 0=ending slope is negative
FREQENDDWN
Negative ending sweep
Binary variable
1=ending slope is negative, 0=ending slope is positive
FREQBEG
Beginning frequency
Hz
Beginning frequency
FREQEND
Ending frequency
Hz
Ending frequency
FREQMIN
Minimum frequency
Hz
Minimum frequency
DURATION
Duration
Seconds
Duration of the whistle
FREQRANGE
Frequency range
Hz
Maximum frequency - minimum frequency
FREQMAX
Maximum frequency
Hz
Maximum frequency
FREQMEAN
Mean frequency
Hz
Mean frequency
FREQMEDIAN
Median frequency
Hz
Median frequency
FREQSTDDEV
Standard deviation of the frequency
Hz
Standard deviation of the frequency
FREQSPREAD
Frequency spread
Hz
Difference between the 75th and the 25th percentile of the frequency
FREQQUARTER1
First quarter frequency
Hz
Frequency at one-quarter of the duration
FREQQUARTER2
Half frequency
Hz
Frequency at one-half of the duration
FREQQUARTER3
Third quarter frequency
Hz
Frequency at three-quarters of the duration
FREQCENTER
Center frequency
Hz
minimum frequency + (maximum frequency - minimum frequency) / 2
FREQRELBW
Relative bandwidth
None
(maximum frequency - minimum frequency)/center frequency
FREQMAXMINRATIO
Maximum-minimum ratio
None
Maximum frequency / minimum frequency
FREQBEGENDRATIO
Beginning-ending ratio
None
Beginning frequency / end frequency
FREQCOFM
Coefficient of frequency modulation
None
Take 20 frequency measurements equally spaced in time, then subtract each frequency value from the one before it. COFM is the sum of the absolute values of these differences, all divided by 10,000
FREQNUMSTEPS
Number of steps
None
Number of steps, where a step is a 10 percent or greater increase or decrease in frequency over two contour points
NUMINFLECTIONS
Number of inflection points
None
Number of changes from positive to negative or negative to positive slope
INFLMAXDELTA
Maximum delta
Seconds
Maximum time between inflection points
INFLMINDELTA
Minimum delta
Seconds
Minimum time between inflection points
INFLMAXMINDELTA
Maximum-minimum delta ratio
None
Maximum delta / minimum delta
INFLMEANDELTA
Mean delta
Seconds
Mean time between inflection points
INFLSTDDEVDELTA
Standard deviation delta
Seconds
Standard deviation of the time between inflection points
INFLMEDIANDELTA
Median delta
Seconds
Median of the time between inflection points
FREQSLOPEMEAN
Mean slope
Hz/second
Overall mean slope
FREQPOSSLOPEMEAN
Positive slope
Hz/second
Mean positive slope
FREQNEGSLOPEMEAN
Negative slope
Hz/second
Mean negative slope
FREQABSSLOPEMEAN
Absolute slope
Hz/second
Mean absolute value of the slope
FREQSLOPERATIO
Positive-negative slope ratio
None
Mean positive slope / mean negative slope
FREQSWEEPUPPERCENT
Percent positive
None
Percent of the whistle that has a positive slope
FREQSWEEPDWNPERCENT
Percent negative
None
Percent of the whistle that has a negative slope
FREQSWEEPFLATPERCENT
Percent flat
None
Percent of the whistle that has a zero slope
NUMSWEEPSUPDWN
Positive-negative slope
None
Number of inflection points that change from positive slope to negative slope
NUMSWEEPSDWNUP
Negative-positive slope
None
Number of inflection points that change from negative slope to positive slope
NUMSWEEPSUPFLAT
Positive-flat slope
None
Number of times the slope changes from positive to zero
NUMSWEEPSDWNFLAT
Negative-flat slope
None
Number of times the slope changes from negative to zero
NUMSWEEPSFLATDWN
Flat-negative slope
None
Number of times the slope changes from zero to negative
NUMSWEEPSFLATUP
Flat-positive slope
None
Number of times the slope changes from zero to positive
FREQSTEPUP
Steps up
None
Number of steps that have increasing frequency
FREQSTEPDOWN
Steps down
None
Number of steps that have decreasing frequency
INFLDUR
Inflection points / duration
None
Number of inflection points / duration
STEPDUR
Steps / duration
None
Number of steps / duration
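Some of the Table 1 definitions can be sketched in a few lines of code. The following is a minimal illustration, not the ROCCA implementation: the function names are hypothetical, and it assumes the 20 equally spaced frequency samples have already been taken from the contour (how ROCCA selects those samples is not described here).

```python
def cofm(freqs_hz):
    """Coefficient of frequency modulation from 20 equally spaced samples (Hz)."""
    if len(freqs_hz) != 20:
        raise ValueError("expected 20 frequency samples")
    # Sum of absolute differences between consecutive samples, divided by 10,000.
    total = sum(abs(b - a) for a, b in zip(freqs_hz, freqs_hz[1:]))
    return total / 10_000


def ratio_variables(contour_hz):
    """Range and ratio variables from contour frequencies given in time order."""
    f_min, f_max = min(contour_hz), max(contour_hz)
    return {
        "FREQRANGE": f_max - f_min,
        "FREQMAXMINRATIO": f_max / f_min,
        "FREQBEGENDRATIO": contour_hz[0] / contour_hz[-1],
    }


# Made-up example: a steady upsweep from 10 kHz to 19.5 kHz in 500 Hz increments.
samples = [10_000 + 500 * i for i in range(20)]
print(cofm(samples))  # 0.95
print(ratio_variables(samples))
```

In this example there are 19 differences of 500 Hz each, so the sum is 9,500 and FREQCOFM is 0.95.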
To classify a whistle, the vector of variables measured from that whistle is analysed with the random forest model, which contains hundreds of classification trees. Each tree in the forest classifies the whistle, and the final classification is the species that received the greatest percentage of tree votes. If that percentage falls below the whistle threshold (as specified in the ROCCA Parameters window), the whistle is classified as Ambiguous.
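The voting rule can be illustrated with a short sketch. This is not ROCCA or Weka code: the function name and species labels are hypothetical, and it assumes the threshold is expressed as a percentage of trees (0 to 100).

```python
def classify_by_votes(tree_votes, threshold_percent):
    """Return the species with the most tree votes, or "Ambiguous".

    tree_votes maps a species label to the number of trees that voted for it.
    threshold_percent is assumed to be a percentage of all trees (0-100).
    """
    total = sum(tree_votes.values())
    if total == 0:
        raise ValueError("no tree votes")
    species, votes = max(tree_votes.items(), key=lambda kv: kv[1])
    share = 100.0 * votes / total
    # A winning share below the threshold gives an Ambiguous classification.
    return species if share >= threshold_percent else "Ambiguous"


# Made-up forest of 500 trees; species_A wins with 46 percent of the votes.
votes = {"species_A": 230, "species_B": 170, "species_C": 100}
print(classify_by_votes(votes, 40))  # species_A
print(classify_by_votes(votes, 50))  # Ambiguous
```

The same rule applies to clicks, using the click threshold instead of the whistle threshold.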
Click Classification
Table 2 lists the 17 variables measured from each click.
Variable Code
Variable Name
Units
Explanation
DURATION
Duration
Seconds
Duration of the click
FREQPEAK
Peak frequency
Hz
Frequency with the highest amplitude
BW3DBLOW
-3dB bandwidth lower limit
Hz
First frequency lower than the peak frequency at which the amplitude has dropped by 3dB
BW3DBHIGH
-3dB bandwidth upper limit
Hz
First frequency higher than the peak frequency at which the amplitude has dropped by 3dB
BW3DB
-3dB bandwidth
Hz
BW3DBHIGH - BW3DBLOW
BW10DBLOW
-10dB bandwidth lower limit
Hz
First frequency lower than the peak frequency at which the amplitude has dropped by 10dB
BW10DBHIGH
-10dB bandwidth upper limit
Hz
First frequency higher than the peak frequency at which the amplitude has dropped by 10dB
BW10DB
-10dB bandwidth
Hz
BW10DBHIGH - BW10DBLOW
RMSSIGNAL
Signal RMS
dB
Root-mean-square of the click amplitude
RMSNOISE
Noise RMS
dB
Root-mean-square of the noise amplitude
SNR
Signal-to-noise ratio
dB
RMSSIGNAL - RMSNOISE
NCROSSINGS
Number of zero crossings
None
Number of times the waveform crosses zero
SWEEPRATE
Sweep rate
kHz/ms
Sweep rate of the zero crossings
MEANTIMEZC*
Zero crossing mean time
ms
Mean time between zero crossings
MEDIANTIMEZC*
Zero crossing median time
ms
Median time between zero crossings
VARIANCETIMEZC
Zero crossing variance
ms²
Variance of the time between zero crossings
ICI
Inter-click interval
Seconds
Time from the end of one click to the start of the next click
*Mean and median zero crossing times are not used in the current classifier, but are still calculated by the ROCCA algorithms. ROCCA ignores these variables during classification.
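As an illustration of the bandwidth and zero-crossing definitions in Table 2, here is a minimal sketch. It is not the ROCCA implementation: the function names are hypothetical, the spectrum is assumed to be supplied as matching lists of frequencies and levels in dB, a zero crossing is taken to be a sign change between consecutive samples, and the variance is assumed to be the population variance.

```python
import statistics


def bandwidth_limits(freqs_hz, levels_db, drop_db):
    """Return (peak, low, high) frequencies for a given drop in dB.

    low/high are the first frequencies below/above the peak at which the
    level has dropped by at least drop_db, or None if no such bin exists.
    """
    peak = max(range(len(levels_db)), key=levels_db.__getitem__)
    cutoff = levels_db[peak] - drop_db
    low = next((freqs_hz[i] for i in range(peak - 1, -1, -1)
                if levels_db[i] <= cutoff), None)
    high = next((freqs_hz[i] for i in range(peak + 1, len(levels_db))
                 if levels_db[i] <= cutoff), None)
    return freqs_hz[peak], low, high


def zero_crossing_stats(waveform, sample_rate_hz):
    """Count sign changes and summarise the time between them in ms."""
    crossings = [i for i in range(1, len(waveform))
                 if (waveform[i - 1] < 0) != (waveform[i] < 0)]
    gaps_ms = [1000.0 * (b - a) / sample_rate_hz
               for a, b in zip(crossings, crossings[1:])]
    stats = {"NCROSSINGS": len(crossings)}
    if gaps_ms:
        stats["MEANTIMEZC"] = statistics.mean(gaps_ms)
        stats["MEDIANTIMEZC"] = statistics.median(gaps_ms)
        stats["VARIANCETIMEZC"] = statistics.pvariance(gaps_ms)  # assumption
    return stats


# Made-up spectrum with its peak at 35 kHz.
freqs = [20000, 25000, 30000, 35000, 40000, 45000, 50000]
levels = [-12, -4, -1, 0, -2, -5, -11]
peak, low3, high3 = bandwidth_limits(freqs, levels, 3)
print(peak, low3, high3, high3 - low3)  # 35000 25000 45000 20000

# Made-up waveform sampled at 1 kHz, so one sample step equals 1 ms.
zc = zero_crossing_stats([1, -1, -1, 1, 1, 1, -1], 1000)
print(zc["NCROSSINGS"], zc["MEANTIMEZC"])  # 3 2.5
```

The last value printed on the spectrum line corresponds to BW3DB, that is, BW3DBHIGH - BW3DBLOW. Calling the same function with a drop of 10 gives the -10dB limits.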
To classify a click, the vector of variables measured from that click is analysed with the random forest model, which contains hundreds of classification trees. Each tree in the forest classifies the click, and the final classification is the species that received the greatest percentage of tree votes. If that percentage falls below the click threshold (as specified in the ROCCA Parameters window), the click is classified as Ambiguous.
School Classification
Table 3 lists the 17 variables calculated from the whistle and click detections for each encounter (if specified by the user in the ROCCA Parameters window):
Variable Code
Variable Name
Units
Explanation
Encounter_Duration_s
Encounter duration
Seconds
Duration from the start of the first whistle/click to the end of the last whistle/click
Number_of_whistles
Number of whistles
None
Number of whistles
Whistle_Duration_s
Whistle duration
Seconds
Duration from the start of the first whistle to the end of the last whistle
Min_Time_Between_Whistle_Detections_s
Minimum time between whistles
Seconds
Minimum time between whistles
Max_Time_Between_Whistle_Detections_s
Maximum time between whistles
Seconds
Maximum time between whistles
Ave_Time_Between_Whistle_Detections_s
Average time between whistles
Seconds
Average time between whistles
Whistle_Detections_per_Second
Whistles per second
Counts/s
The number of whistles / whistle duration
Whistle_Density
Whistle density
None
Sum of the whistle durations / the encounter duration
Ave_Whistle_Overlap
Average whistle overlap
None
Total duration during which whistles overlap / encounter duration
Number_of_Clicks
Number of clicks
None
Number of clicks
Click_Duration_s
Click duration
Seconds
Duration from start of first click to end of last click
Min_Time_Between_Click_Detections
Minimum time between clicks
Seconds
Minimum time between clicks
Max_Time_Between_Click_Detections
Maximum time between clicks
Seconds
Maximum time between clicks
Ave_Time_Between_Click_Detections
Average time between clicks
Seconds
Average time between clicks
Click_Detections_per_Second
Clicks per second
Counts/s
The number of clicks / click duration
Click_Density
Click density
None
Sum of the click durations / the encounter duration
Ave_Click_Overlap
Average click overlap
None
Total duration during which clicks overlap / encounter duration
Lat*
Latitude
Deg
Latitude
Long*
Longitude
Deg
Longitude
*Latitude and Longitude are not measured from the whistle and click data, but are taken from the GPS source specified in the Source tab of the ROCCA Parameters window.
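The whistle-based encounter variables in Table 3 can be sketched as follows. This is an illustration only, not ROCCA code: the function name is hypothetical, each whistle is assumed to be a (start, end) pair in seconds, the encounter duration is passed in because it also depends on clicks, and the time between whistles is assumed to be measured from one whistle start to the next.

```python
def whistle_encounter_variables(whistles, encounter_duration_s):
    """Compute whistle-based encounter variables.

    whistles is a list of (start_s, end_s) tuples, one per whistle.
    """
    starts = sorted(start for start, _ in whistles)
    last_end = max(end for _, end in whistles)
    whistle_duration = last_end - starts[0]
    # Assumption: gaps are measured from one whistle start to the next.
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return {
        "Number_of_whistles": len(whistles),
        "Whistle_Duration_s": whistle_duration,
        "Min_Time_Between_Whistle_Detections_s": min(gaps) if gaps else None,
        "Max_Time_Between_Whistle_Detections_s": max(gaps) if gaps else None,
        "Ave_Time_Between_Whistle_Detections_s":
            sum(gaps) / len(gaps) if gaps else None,
        "Whistle_Detections_per_Second": len(whistles) / whistle_duration,
        "Whistle_Density":
            sum(end - start for start, end in whistles) / encounter_duration_s,
    }


# Made-up encounter: three whistles within a 10 second encounter.
result = whistle_encounter_variables([(0.0, 0.5), (1.0, 1.5), (3.0, 4.0)], 10.0)
print(result["Whistle_Detections_per_Second"])  # 0.75
print(result["Whistle_Density"])                # 0.2
```

Here the whistles span 4 seconds, so three whistles give 0.75 whistles per second, and their summed duration of 2 seconds gives a density of 0.2 over the 10 second encounter.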
Each encounter number holds a list of possible species based on the whistle/click classifier models used. There are two values stored for each species: the number of times a whistle/click has been classified to that species (displayed), and a cumulative total of all the percentage tree votes for the species (not displayed). When a new whistle/click classification is saved to an encounter number, the count of the classified species is increased by one and the percentage tree votes for each species are added to the corresponding cumulative totals.
The encounter is classified in one of two ways:
1. If an encounter classifier has been loaded, the vector of encounter parameters and the random forest probabilities from the whistle and click classifiers are analysed with the encounter random forest and the encounter is classified as the species with the highest percentage of tree votes.
2. If no encounter classifier has been selected by the user, the encounter is classified as the species with the highest cumulative percentage of tree votes. Note that this may differ from the species most often classified, which is the value shown in the sidebar species list. If the highest cumulative percentage of tree votes falls below the school threshold (as specified in the ROCCA Parameters window), the encounter is classified as Ambiguous.
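The bookkeeping for the second case (no encounter classifier) can be sketched as below. This is an illustration, not ROCCA code: the class, method and species names are hypothetical. How the cumulative total is compared with the school threshold is not stated above, so the sketch assumes the total is divided by the number of detections before the comparison.

```python
from collections import Counter, defaultdict


class EncounterTally:
    """Per-encounter species counts and cumulative tree-vote percentages."""

    def __init__(self):
        self.counts = Counter()                     # displayed in the sidebar
        self.cumulative_votes = defaultdict(float)  # not displayed
        self.n_detections = 0

    def add(self, classified_species, vote_percentages):
        """Record one saved whistle/click classification."""
        self.counts[classified_species] += 1
        for species, pct in vote_percentages.items():
            self.cumulative_votes[species] += pct
        self.n_detections += 1

    def most_often_classified(self):
        return self.counts.most_common(1)[0][0]

    def classify(self, school_threshold_percent):
        species, total = max(self.cumulative_votes.items(),
                             key=lambda kv: kv[1])
        # Assumption: compare the mean vote percentage with the threshold.
        mean_pct = total / self.n_detections
        return species if mean_pct >= school_threshold_percent else "Ambiguous"


# Made-up encounter with three saved whistle classifications.
tally = EncounterTally()
tally.add("species_A", {"species_A": 40, "species_B": 35, "species_C": 25})
tally.add("species_A", {"species_A": 40, "species_B": 38, "species_C": 22})
tally.add("species_B", {"species_A": 5, "species_B": 90, "species_C": 5})
print(tally.most_often_classified())  # species_A
print(tally.classify(40))             # species_B
print(tally.classify(60))             # Ambiguous
```

The example shows how the two values can disagree: species_A is classified most often (twice), but species_B has the highest cumulative vote total (163 against 85) and therefore becomes the encounter classification.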