Measurements and Classification
Classification technique
ROCCA classifies whistles, clicks and encounters using Random Forest classifier models based on the open-source statistical software package WEKA. For more information on Random Forests and the WEKA package, the user is encouraged to refer to the book Data Mining: Practical Machine Learning Tools and Techniques.
Whistle Contour Measurement and Classification
Table 1 lists the 50 variables measured from each whistle contour.
Variable Code | Variable Name | Units | Explanation |
---|---|---|---|
FREQBEGSWEEP | Beginning sweep | Categorical variable | Slope of the beginning sweep (1=positive, -1=negative, 0=zero) |
FREQBEGUP | Positive beginning sweep | Binary variable | 1=beginning slope is positive, 0=beginning slope is negative |
FREQBEGDWN | Negative beginning sweep | Binary variable | 1=beginning slope is negative, 0=beginning slope is positive |
FREQENDSWEEP | Ending sweep | Categorical variable | Slope of the ending sweep (1=positive, -1=negative, 0=zero) |
FREQENDUP | Positive ending sweep | Binary variable | 1=ending slope is positive, 0=ending slope is negative |
FREQENDDWN | Negative ending sweep | Binary variable | 1=ending slope is negative, 0=ending slope is positive |
FREQBEG | Beginning frequency | Hz | Beginning frequency |
FREQEND | Ending frequency | Hz | Ending frequency |
FREQMIN | Minimum frequency | Hz | Minimum frequency |
DURATION | Duration | Seconds | Duration of the whistle |
FREQRANGE | Frequency range | Hz | Maximum frequency - minimum frequency |
FREQMAX | Maximum frequency | Hz | Maximum frequency |
FREQMEAN | Mean frequency | Hz | Mean frequency |
FREQMEDIAN | Median frequency | Hz | Median frequency |
FREQSTDDEV | Standard deviation of the frequency | Hz | Standard deviation of the frequency |
FREQSPREAD | Frequency spread | Hz | Difference between the 75th and the 25th percentile of the frequency |
FREQQUARTER1 | First quarter frequency | Hz | Frequency at one-quarter of the duration |
FREQQUARTER2 | Half frequency | Hz | Frequency at one-half of the duration |
FREQQUARTER3 | Third quarter frequency | Hz | Frequency at three-quarters of the duration |
FREQCENTER | Center frequency | Hz | Minimum frequency + (maximum frequency - minimum frequency) / 2 |
FREQRELBW | Relative bandwidth | None | (Maximum frequency - minimum frequency) / center frequency |
FREQMAXMINRATIO | Maximum-minimum ratio | None | Maximum frequency / minimum frequency |
FREQBEGENDRATIO | Beginning-ending ratio | None | Beginning frequency / end frequency |
FREQCOFM | Coefficient of frequency modulation | None | Take 20 frequency measurements equally spaced in time, then subtract each frequency value from the one before it. COFM is the sum of the absolute values of these differences, all divided by 10,000 |
FREQNUMSTEPS | Number of steps | None | Number of steps, where a step is a 10 percent or greater increase or decrease in frequency over two contour points |
NUMINFLECTIONS | Number of inflection points | None | Changes from positive to negative or negative to positive slope |
INFLMAXDELTA | Maximum delta | Seconds | Maximum time between inflection points |
INFLMINDELTA | Minimum delta | Seconds | Minimum time between inflection points |
INFLMAXMINDELTA | Maximum-minimum delta ratio | None | Maximum delta / minimum delta |
INFLMEANDELTA | Mean delta | Seconds | Mean time between inflection points |
INFLSTDDEVDELTA | Standard deviation delta | Seconds | Standard deviation of the time between inflection points |
INFLMEDIANDELTA | Median delta | Seconds | Median of the time between inflection points |
FREQSLOPEMEAN | Mean slope | Hz/second | Overall mean slope |
FREQPOSSLOPEMEAN | Positive slope | Hz/second | Mean positive slope |
FREQNEGSLOPEMEAN | Negative slope | Hz/second | Mean negative slope |
FREQABSSLOPEMEAN | Absolute slope | Hz/second | Mean absolute value of the slope |
FREQSLOPERATIO | Positive-negative slope ratio | None | Mean positive slope / mean negative slope |
FREQSWEEPUPPERCENT | Percent positive | None | Percent of the whistle that has a positive slope |
FREQSWEEPDWNPERCENT | Percent negative | None | Percent of the whistle that has a negative slope |
FREQSWEEPFLATPERCENT | Percent flat | None | Percent of the whistle that has a zero slope |
NUMSWEEPSUPDWN | Positive-negative slope | None | Number of inflection points that change from positive slope to negative slope |
NUMSWEEPSDWNUP | Negative-positive slope | None | Number of inflection points that change from negative slope to positive slope |
NUMSWEEPSUPFLAT | Positive-flat slope | None | Number of times the slope changes from positive to zero |
NUMSWEEPSDWNFLAT | Negative-flat slope | None | Number of times the slope changes from negative to zero |
NUMSWEEPSFLATDWN | Flat-negative slope | None | Number of times the slope changes from zero to negative |
NUMSWEEPSFLATUP | Flat-positive slope | None | Number of times the slope changes from zero to positive |
FREQSTEPUP | Steps up | None | Number of steps that have increasing frequency |
FREQSTEPDOWN | Steps down | None | Number of steps that have decreasing frequency |
INFLDUR | Inflection points / duration | None | Number of inflection points / duration |
STEPDUR | Steps/duration | None | Number of steps / duration |
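
Two of the less obvious entries in Table 1, the coefficient of frequency modulation and the number of inflection points, can be illustrated with a short Python sketch. This is not ROCCA's code: the function names are hypothetical, the contour is assumed to be a list of time/frequency points with strictly increasing times, the 20 samples are obtained by linear interpolation (an assumption), and flat segments are skipped when counting slope sign changes (also an assumption).

```python
def interpolate(times, freqs, t):
    """Linearly interpolate the contour frequency (Hz) at time t (s)."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            return freqs[i] + (freqs[i + 1] - freqs[i]) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the contour")


def cofm(times, freqs, n_samples=20):
    """Coefficient of frequency modulation as described for FREQCOFM.

    Samples the contour at n_samples equally spaced times, sums the
    absolute differences between consecutive samples, divides by 10,000.
    """
    start, end = times[0], times[-1]
    step = (end - start) / (n_samples - 1)
    # min() guards against floating-point overshoot past the last point.
    samples = [interpolate(times, freqs, min(start + i * step, end))
               for i in range(n_samples)]
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return sum(diffs) / 10_000


def count_inflections(freqs):
    """Count slope sign changes (positive to negative or the reverse).

    Assumption: flat (zero-slope) segments are ignored.
    """
    signs = [(b > a) - (b < a) for a, b in zip(freqs, freqs[1:])]
    signs = [s for s in signs if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Hypothetical contour: a steady upsweep from 5 kHz to 15 kHz over 1 s.
upsweep_times = [0.0, 0.5, 1.0]
upsweep_freqs = [5000.0, 10000.0, 15000.0]
upsweep_cofm = cofm(upsweep_times, upsweep_freqs)  # close to 1.0

# Hypothetical contour that rises, falls, then rises again.
n_inflections = count_inflections([1000, 2000, 3000, 2000, 1000, 2000])  # 2
```

For a monotonic contour the summed differences equal the total frequency change, so a 10 kHz upsweep gives a value close to 1. More strongly modulated whistles give larger values.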
To classify a whistle, the vector of variables measured from that whistle is analysed with the random forest model, which contains hundreds of classification trees. Each tree in the forest classifies the whistle, and the final classification is the species that received the greatest percentage of tree votes. If the greatest percentage of tree votes falls below the whistle threshold (as specified in the ROCCA Parameters window), the whistle is classified as Ambiguous.
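The voting rule described above can be sketched in a few lines of Python. This is an illustration only, not ROCCA's implementation: the function name and species labels are hypothetical, the threshold is assumed to be expressed as a percentage, and ties are resolved in favour of whichever species is encountered first (an assumption).

```python
from collections import Counter


def classify_by_vote(tree_votes, threshold_percent):
    """Return (label, winning_percent) from a list of per-tree species votes.

    tree_votes: one species label per tree in the forest.
    threshold_percent: minimum winning share; below it the result is
    "Ambiguous" (assumed to be a percentage between 0 and 100).
    """
    counts = Counter(tree_votes)
    species, n_votes = counts.most_common(1)[0]
    winning_percent = 100.0 * n_votes / len(tree_votes)
    if winning_percent < threshold_percent:
        return "Ambiguous", winning_percent
    return species, winning_percent


# Hypothetical forest of 100 trees voting for two placeholder species.
votes = ["species_A"] * 60 + ["species_B"] * 40
label_low, pct = classify_by_vote(votes, threshold_percent=50)   # species_A
label_high, _ = classify_by_vote(votes, threshold_percent=70)    # Ambiguous
```

With a threshold of 50 the whistle is assigned to the species holding 60 percent of the votes. Raising the threshold to 70 turns the same vote into an Ambiguous result.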
Click Classification
Table 2 lists the 17 variables measured from each click.
Variable Code | Variable Name | Units | Explanation |
---|---|---|---|
DURATION | Duration | Seconds | Duration of the click |
FREQPEAK | Peak frequency | Hz | Frequency with the highest amplitude |
BW3DBLOW | -3dB bandwidth lower limit | Hz | First frequency lower than the peak frequency at which the amplitude has dropped by 3dB |
BW3DBHIGH | -3dB bandwidth upper limit | Hz | First frequency higher than the peak frequency at which the amplitude has dropped by 3dB |
BW3DB | -3dB bandwidth | Hz | BW3DBHIGH - BW3DBLOW |
BW10DBLOW | -10dB bandwidth lower limit | Hz | First frequency lower than the peak frequency at which the amplitude has dropped by 10dB |
BW10DBHIGH | -10dB bandwidth upper limit | Hz | First frequency higher than the peak frequency at which the amplitude has dropped by 10dB |
BW10DB | -10dB bandwidth | Hz | BW10DBHIGH - BW10DBLOW |
RMSSIGNAL | Signal RMS | dB | Root-mean-square of the click amplitude |
RMSNOISE | Noise RMS | dB | Root-mean-square of the noise amplitude |
SNR | Signal-to-noise ratio | dB | RMSSIGNAL - RMSNOISE |
NCROSSINGS | Number of zero crossings | None | Number of times the waveform crosses zero |
SWEEPRATE | Sweep rate | kHz/ms | Sweep rate of the zero crossings |
MEANTIMEZC* | Zero crossing mean time | ms | Mean time between zero crossings |
MEDIANTIMEZC* | Zero crossing median time | ms | Median time between zero crossings |
VARIANCETIMEZC | Zero crossing variance | ms² | Variance of the time between zero crossings |
ICI | Inter-click Interval | seconds | Time from the end of one click to the start of the next click |
*Mean and median zero crossing times are not used in the current classifier, but are still calculated by the ROCCA algorithms. ROCCA ignores these variables during classification.
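
Several of the Table 2 measurements follow directly from their definitions. The Python sketch below shows one way to compute the bandwidth limits, the zero crossing count and the signal-to-noise ratio. It is not ROCCA's code: all function names are hypothetical, the spectrum is assumed to be given as parallel lists of frequencies and levels in dB, dB levels are assumed to be relative to an amplitude of 1.0, and samples that are exactly zero are skipped when counting crossings.

```python
import math


def bandwidth_limits(freqs_hz, levels_db, drop_db):
    """Return (low, high, width) in Hz around the spectral peak.

    low/high are the first frequencies below/above the peak at which the
    level has dropped by at least drop_db. If the spectrum never drops
    that far, the edge of the spectrum is returned (an assumption).
    """
    peak = max(range(len(levels_db)), key=levels_db.__getitem__)
    target = levels_db[peak] - drop_db
    low = peak
    while low > 0 and levels_db[low] > target:
        low -= 1
    high = peak
    while high < len(levels_db) - 1 and levels_db[high] > target:
        high += 1
    return freqs_hz[low], freqs_hz[high], freqs_hz[high] - freqs_hz[low]


def count_zero_crossings(waveform):
    """Number of sign changes in the waveform, ignoring exact zeros."""
    signs = [s for s in ((x > 0) - (x < 0) for x in waveform) if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def rms_db(samples):
    """Root-mean-square level in dB relative to an amplitude of 1.0."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms)


# Hypothetical click spectrum (frequencies in Hz, levels in dB).
spectrum_freqs = [10000, 20000, 30000, 40000, 50000]
spectrum_levels = [-12.0, -4.0, 0.0, -2.0, -11.0]
bw3 = bandwidth_limits(spectrum_freqs, spectrum_levels, 3)    # (20000, 50000, 30000)
bw10 = bandwidth_limits(spectrum_freqs, spectrum_levels, 10)  # (10000, 50000, 40000)

# SNR is the difference of the two RMS levels, as in the SNR row of Table 2.
snr_db = rms_db([1.0, -1.0, 1.0, -1.0]) - rms_db([0.1, -0.1, 0.1, -0.1])
```

On a coarse spectrum like this one the limits snap to the nearest bin. A finer frequency resolution, or interpolation between bins, would give tighter limits.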
To classify a click, the vector of variables measured from that click is analysed with the random forest model, which contains hundreds of classification trees. Each tree in the forest classifies the click, and the final classification is the species that received the greatest percentage of tree votes. If the greatest percentage of tree votes falls below the click threshold (as specified in the ROCCA Parameters window), the click is classified as Ambiguous.
School Classification
Table 3 lists the 17 variables calculated based on whistle and click detections for each encounter (if specified by the user in the ROCCA Parameters window):
Variable Code | Variable Name | Units | Explanation |
---|---|---|---|
Encounter_Duration_s | Encounter duration | Seconds | Duration from the start of the first whistle/click to the end of the last whistle/click |
Number_of_whistles | Number of whistles | None | Number of whistles |
Whistle_Duration_s | Whistle duration | Seconds | Duration from the start of the first whistle to the end of the last whistle |
Min_Time_Between_Whistle_Detections_s | Minimum time between whistles | Seconds | Minimum time between whistles |
Max_Time_Between_Whistle_Detections_s | Maximum time between whistles | Seconds | Maximum time between whistles |
Ave_Time_Between_Whistle_Detections_s | Average time between whistles | Seconds | Average time between whistles |
Whistle_Detections_per_Second | Whistles per second | Counts/s | The number of whistles / whistle duration |
Whistle_Density | Whistle density | None | Sum of the whistle durations / the encounter duration |
Ave_Whistle_Overlap | Average whistle overlap | None | Total duration during which whistles overlap / encounter duration |
Number_of_Clicks | Number of clicks | None | Number of clicks |
Click_Duration_s | Click duration | Seconds | Duration from start of first click to end of last click |
Min_Time_Between_Click_Detections | Minimum time between clicks | Seconds | Minimum time between clicks |
Max_Time_Between_Click_Detections | Maximum time between clicks | Seconds | Maximum time between clicks |
Ave_Time_Between_Click_Detections | Average time between clicks | Seconds | Average time between clicks |
Click_Detections_per_Second | Clicks per second | Counts/s | The number of clicks / click duration |
Click_Density | Click density | None | Sum of the click durations / the encounter duration |
Ave_Click_Overlap | Average click overlap | None | Total duration during which clicks overlap / encounter duration |
Lat* | Latitude | Deg | Latitude |
Long* | Longitude | Deg | Longitude |
*Latitude and Longitude are not measured from the whistle and click data, but are taken from the GPS source specified in the Source tab of the ROCCA Parameters window.
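
As an illustration of how the whistle entries in Table 3 relate to one another, the Python sketch below derives them from a list of whistle start and end times. It is not ROCCA's code: the function names are hypothetical, the time between whistles is assumed to be measured from start to start (the table does not say), and the encounter duration is passed in separately because it may also span click detections.

```python
def overlap_duration(intervals):
    """Total time (s) during which at least two intervals are active."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # at equal times, ends (-1) are processed before starts
    active, total, previous = 0, 0.0, None
    for time, delta in events:
        if active >= 2:
            total += time - previous
        active += delta
        previous = time
    return total


def whistle_encounter_stats(whistles, encounter_duration_s):
    """Compute the whistle rows of Table 3 from (start_s, end_s) pairs.

    Requires at least two whistles so that time gaps are defined.
    """
    whistles = sorted(whistles)
    first_start = whistles[0][0]
    last_end = max(end for _, end in whistles)
    whistle_duration = last_end - first_start
    # Assumption: gaps are measured between consecutive whistle starts.
    gaps = [b[0] - a[0] for a, b in zip(whistles, whistles[1:])]
    return {
        "Number_of_whistles": len(whistles),
        "Whistle_Duration_s": whistle_duration,
        "Min_Time_Between_Whistle_Detections_s": min(gaps),
        "Max_Time_Between_Whistle_Detections_s": max(gaps),
        "Ave_Time_Between_Whistle_Detections_s": sum(gaps) / len(gaps),
        "Whistle_Detections_per_Second": len(whistles) / whistle_duration,
        "Whistle_Density": sum(e - s for s, e in whistles) / encounter_duration_s,
        "Ave_Whistle_Overlap": overlap_duration(whistles) / encounter_duration_s,
    }


# Hypothetical encounter lasting 10 s with three whistles, two overlapping.
stats = whistle_encounter_stats([(0.0, 2.0), (1.0, 3.0), (5.0, 6.0)], 10.0)
```

The click entries could be derived the same way from click start and end times.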
Each encounter number holds a list of possible species based on the whistle/click classifier models used. There are two values stored for each species: the number of times a whistle/click has been classified to that species (displayed), and a cumulative total of all the percentage tree votes for the species (not displayed). When a new whistle/click classification is saved to an encounter number, the count of the classified species is increased by one and the percentage tree votes for each species are added to the corresponding cumulative totals.
The encounter is classified in one of two ways:
1. If an encounter classifier has been loaded, the vector of encounter parameters and the random forest probabilities from the whistle and click classifiers are analysed with the encounter random forest, and the encounter is classified as the species with the highest percentage of tree votes.
2. If no encounter classifier has been selected by the user, the encounter is classified as the species with the highest cumulative percentage of tree votes. Note that this may differ from the species most often classified, which is the value shown in the sidebar species list. If the highest cumulative percentage of tree votes falls below the school threshold (as specified in the ROCCA Parameters window), the encounter is classified as Ambiguous.
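The bookkeeping for an encounter and the second classification method (no encounter classifier loaded) can be sketched as follows. This is an illustrative Python sketch rather than ROCCA's implementation: the class, method and species names are hypothetical, and the comparison against the school threshold assumes that the cumulative vote total is first divided by the number of saved detections, which the text does not specify.

```python
from collections import defaultdict


class EncounterTally:
    """Per-encounter species counts and cumulative tree-vote percentages."""

    def __init__(self):
        self.counts = defaultdict(int)              # displayed in the sidebar
        self.cumulative_votes = defaultdict(float)  # not displayed
        self.n_detections = 0

    def add(self, classified_species, tree_votes):
        """Save one whistle/click classification to the encounter.

        tree_votes: dict mapping species to its percentage of tree votes.
        """
        self.counts[classified_species] += 1
        for species, percent in tree_votes.items():
            self.cumulative_votes[species] += percent
        self.n_detections += 1

    def most_often_classified(self):
        return max(self.counts, key=self.counts.get)

    def classify(self, threshold_percent):
        """Species with the highest cumulative votes, or "Ambiguous".

        Assumption: the threshold is compared with the mean percentage
        per detection (cumulative total / number of detections).
        """
        species = max(self.cumulative_votes, key=self.cumulative_votes.get)
        mean_percent = self.cumulative_votes[species] / self.n_detections
        return species if mean_percent >= threshold_percent else "Ambiguous"


# Hypothetical encounter: two weak votes for A, one strong vote for B.
tally = EncounterTally()
tally.add("species_A", {"species_A": 40, "species_B": 35, "species_C": 25})
tally.add("species_A", {"species_A": 40, "species_B": 38, "species_C": 22})
tally.add("species_B", {"species_A": 10, "species_B": 80, "species_C": 10})

sidebar_species = tally.most_often_classified()  # species_A (2 of 3 detections)
encounter_species = tally.classify(50)           # species_B (153 cumulative votes)
```

The example shows why the two values can disagree: species_A wins two narrow votes, but species_B accumulates more tree votes overall.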