Class LogRegWeka
- Author:
- mo55
-
Constructor Summary
Constructor | Description
LogRegWeka() | Main Constructor - create a Multinomial Logistic Regression Model based on the passed training data, using the WEKA library
-
Method Summary
Modifier and Type | Method | Description
Double | getAttFromProb(double p) | Important: this method ONLY works for binomial (2-class) datasets with a single attribute x, and will return null if that is not true.
double[][] | getCoefficients() | Return the coefficients of the classifier.
double[][] | getCoeffUncertainty() |
double[] | getDistribution(double[] x) | Passing the input array into the current regression model, a double array is passed back which contains the percentage values for each of the possible output classifications.
String | getModelError() | Return exception string from model if it failed to fit.
Double | getPrediction(double[] x) | Get a prediction (output variable) based on the passed input array.
static void | main(String[] args) | Test model using data from https://machinelearningmastery.com/logistic-regression-tutorial-for-machine-learning/
boolean | setTrainingData(double[][] x, double[] y) | Take the training data passed in, convert to something that WEKA understands and then create a logistic regression model.
boolean | setTrainingData(double[] xVar, double[] yVar) | Logistic regression where the y variable is passed in as a single array (e.g. array of ranges) and the y variable may not be 0's or 1's.
boolean | setTrainingData(double[] xVar, int[] yVar) | Logistic regression where the y variable is passed in as a single array (e.g. array of ranges) and the y variable may not be 0's or 1's.
-
Constructor Details
-
LogRegWeka
public LogRegWeka()
Main Constructor - create a Multinomial Logistic Regression Model based on the passed training data, using the WEKA library
-
-
Method Details
-
setTrainingData
public boolean setTrainingData(double[] xVar, int[] yVar)
Logistic regression where the y variable is passed in as a single array (e.g. array of ranges) and the y variable may not be 0's or 1's.
- Parameters:
xVar
-
yVar
-
- Returns:
-
setTrainingData
public boolean setTrainingData(double[] xVar, double[] yVar)
Logistic regression where the y variable is passed in as a single array (e.g. array of ranges) and the y variable may not be 0's or 1's.
- Parameters:
xVar
-
yVar
-
- Returns:
-
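The single-attribute overloads carry the same information that the two-dimensional overload describes, only in a flatter shape. The sketch below shows one way such data could be reshaped, assuming one row per data point and one column for the single attribute. The class and method names are hypothetical, and this page does not show how LogRegWeka itself performs the conversion.

```java
/** Sketch only: reshaping single-attribute data. All names here are hypothetical. */
final class SingleAttributeSketch {

    /** Wraps each value of xVar in its own one-element row: n values become an n x 1 array. */
    static double[][] toColumn(double[] xVar) {
        double[][] x = new double[xVar.length][1];
        for (int i = 0; i < xVar.length; i++) {
            x[i][0] = xVar[i];
        }
        return x;
    }

    /** Widens int class codes to double without changing their values. */
    static double[] toDoubles(int[] yVar) {
        double[] y = new double[yVar.length];
        for (int i = 0; i < yVar.length; i++) {
            y[i] = yVar[i];
        }
        return y;
    }

    public static void main(String[] args) {
        double[] ranges = {1.0, 2.5, 4.0};   // made-up attribute values
        int[] detected = {1, 1, 0};          // made-up class codes
        double[][] x = toColumn(ranges);
        double[] y = toDoubles(detected);
        System.out.println(x.length + " rows, " + x[0].length + " column, " + y.length + " outputs");
    }
}
```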
setTrainingData
public boolean setTrainingData(double[][] x, double[] y)
Take the training data passed in, convert it to something that WEKA understands, and then create a logistic regression model.
- Parameters:
x
- a 2D array of training data. Columns are any number of input variables (x1, x2, x3..., also known as attributes) and rows are data points.
y
- the output variable. The length of the array should match the number of rows in the x parameter. Since this is a logistic regression, the output is considered 'nominal' and not numeric: a distinct classification, not a continuous variable. It's odd that nominal values should be passed as doubles, but that's what WEKA wants. For best results, use consecutive integers starting at 0, e.g. 0, 1, 2, 3.
Also, there can't be any gaps in the output values of the training dataset: you can't have 0, 1, 2, 4. WEKA will throw an error.
Keep track in your own code of what each value represents (e.g. for a binomial problem, 0=yes and 1=no; for a weather problem, 0=cold, 1=warm, 2=hot, etc.).
- Returns:
- true=successful, false=unsuccessful
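Because the y values must be gap-free integers starting at 0, raw labels often need remapping before training. The sketch below is one way to do that in caller code: it sorts the distinct raw labels and uses each label's position as its class code. The class and method names are hypothetical and not part of LogRegWeka.

```java
import java.util.Arrays;

/** Sketch only: remap arbitrary integer labels to 0, 1, 2, ... with no gaps. */
final class LabelEncoderSketch {

    /** Distinct raw labels in ascending order; a label's index in this array is its class code. */
    static int[] distinctSorted(int[] raw) {
        return Arrays.stream(raw).distinct().sorted().toArray();
    }

    /** Maps each raw label to its 0-based code, returned as doubles to fit a double[] y parameter. */
    static double[] encode(int[] raw, int[] classes) {
        double[] y = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            int code = Arrays.binarySearch(classes, raw[i]);
            if (code < 0) {
                throw new IllegalArgumentException("unknown label " + raw[i]);
            }
            y[i] = code;
        }
        return y;
    }

    public static void main(String[] args) {
        int[] raw = {0, 1, 2, 4, 4, 1};        // has a gap: 3 never occurs
        int[] classes = distinctSorted(raw);   // {0, 1, 2, 4}
        double[] y = encode(raw, classes);     // {0, 1, 2, 3, 3, 1}
        System.out.println(Arrays.toString(classes));
        System.out.println(Arrays.toString(y));
    }
}
```

Keeping the `classes` array around lets the caller translate a predicted code back to the original label with `classes[code]`.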
-
getPrediction
Get a prediction (output variable) based on the passed input array. The order of the elements in the x array must match the order that was used in the training data. The Double output refers to the unique values that were used in the training data (0, 1, 2, etc.).
- Parameters:
x
- an array containing the input variables to use in the regression
- Returns:
-
getDistribution
public double[] getDistribution(double[] x)
Passes the input array into the current regression model and returns a double array containing the percentage values for each of the possible output classifications. Thus, if there are 3 potential classes (0, 1 and 2), the method returns a 3-element array with a percentage at each index, giving the probability of the input falling into the corresponding class.
- Parameters:
x
- an array containing the input variables to use in the regression
- Returns:
-
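A caller will often want the single most likely class from such an array. The sketch below picks the index of the largest entry, which works whether the values are fractions or 0-100 percentages (this page does not say which scale is used). The class name, the sample values, and the label lookup are all made up for illustration. Whether getPrediction returns the same index is not stated here.

```java
/** Sketch only: interpreting a per-class distribution array. Names are hypothetical. */
final class DistributionSketch {

    /** Returns the index of the largest entry; ties resolve to the lowest index. */
    static int mostLikelyClass(double[] distribution) {
        if (distribution.length == 0) {
            throw new IllegalArgumentException("empty distribution");
        }
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] dist = {0.15, 0.60, 0.25};        // made-up values for classes 0, 1, 2
        String[] names = {"cold", "warm", "hot"};  // caller-side meaning of each class code
        int k = mostLikelyClass(dist);
        System.out.println(names[k] + " (" + dist[k] + ")");
    }
}
```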
getCoefficients
public double[][] getCoefficients()
Return the coefficients of the classifier.
- Returns:
-
getCoeffUncertainty
public double[][] getCoeffUncertainty()
-
getAttFromProb
Important: this method ONLY works for binomial (2-class) datasets with a single attribute x, and will return null if that is not true.
Given the probability p of classification as the second class, this method returns the attribute x required. If interested in the probability of classification as the first class, pass the value 1-p instead.
The equation solved is P = 1 / (1 + e^{-(b0 + b1 x)}), where P is the probability desired for the second class, b0 and b1 are the coefficients, and x is the value that is solved for.
For example, suppose there are 2 possible classes, undetected (y=0) and detected (y=1), and a single attribute, 'range'. To find the range required for a 70% probability of classification as detected (y=1), call this method and pass it 0.7. To find the range required for a 70% probability of classification as undetected (y=0), call this method and pass it 0.3.
- Parameters:
p
- the probability desired
- Returns:
- a Double value for the attribute, or null if this method fails
- Throws:
ArithmeticException
- thrown if the attribute calculation returns infinity or NaN
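Rearranging the logistic equation for x gives x = (ln(P / (1 - P)) - b0) / b1. The sketch below works through that rearrangement with made-up coefficients. The class name, method names, and the explicit b0 and b1 parameters are assumptions for illustration. It is not the code of getAttFromProb, which reads its coefficients from the fitted model.

```java
/** Sketch only: solving P = 1 / (1 + e^{-(b0 + b1 x)}) for x. Names are hypothetical. */
final class LogitInverseSketch {

    /** Returns the x at which the second-class probability equals p, for intercept b0 and slope b1. */
    static double attFromProb(double p, double b0, double b1) {
        // From P = 1 / (1 + e^{-(b0 + b1 x)}): b0 + b1 x = ln(P / (1 - P)).
        double x = (Math.log(p / (1.0 - p)) - b0) / b1;
        if (Double.isNaN(x) || Double.isInfinite(x)) {
            // Occurs when p <= 0, p >= 1, or b1 == 0.
            throw new ArithmeticException("attribute is not finite for p=" + p);
        }
        return x;
    }

    /** Forward direction of the same equation, used to check the inverse. */
    static double probFromAtt(double x, double b0, double b1) {
        return 1.0 / (1.0 + Math.exp(-(b0 + b1 * x)));
    }

    public static void main(String[] args) {
        double b0 = 4.0;
        double b1 = -0.5;   // made-up coefficients: detection becomes less likely as range grows
        double x70 = attFromProb(0.7, b0, b1);         // range for 70% "detected"
        double x30 = attFromProb(1.0 - 0.7, b0, b1);   // range for 70% "undetected"
        System.out.println("range for P(detected)=0.7: " + x70);
        System.out.println("range for P(undetected)=0.7: " + x30);
        System.out.println("round trip: " + probFromAtt(x70, b0, b1));
    }
}
```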
-
main
Test model using data from https://machinelearningmastery.com/logistic-regression-tutorial-for-machine-learning/
- Parameters:
args
-
-
getModelError
Return exception string from model if it failed to fit.
- Returns:
- the modelError
-