Package Stats

Class LogRegWeka

java.lang.Object
Stats.LogRegWeka

public class LogRegWeka extends Object
Performs a multinomial logistic regression using the WEKA library.
Author:
mo55
  • Constructor Summary

    Constructors
    Constructor
    Description
    LogRegWeka()
    Main Constructor - create a Multinomial Logistic Regression Model based on the passed training data, using the WEKA library
  • Method Summary

    Modifier and Type
    Method
    Description
    Double
    getAttFromProb(double p)
    Important: this method ONLY works for binomial (2-class) datasets with a single attribute x, and will return null if that is not true.
    double[][]
    getCoefficients()
    Return the coefficients of the classifier.
    double[][]
    getCoeffUncertainty()
     
    double[]
    getDistribution(double[] x)
    Passes the input array to the current regression model and returns a double array containing the percentage value for each of the possible output classifications.
    String
    getModelError()
    Return exception string from model if it failed to fit.
    Double
    getPrediction(double[] x)
    Get a prediction (output variable) based on the passed input array.
    static void
    main(String[] args)
    Test model using data from https://machinelearningmastery.com/logistic-regression-tutorial-for-machine-learning/
    boolean
    setTrainingData(double[][] x, double[] y)
    Take the training data passed in, convert to something that WEKA understands and then create a logistic regression model.
    boolean
    setTrainingData(double[] xVar, double[] yVar)
    Logistic regression where y variable is passed in as a single array (e.g. array of ranges) and y variable may not be 0s or 1s.
    boolean
    setTrainingData(double[] xVar, int[] yVar)
    Logistic regression where y variable is passed in as a single array (e.g. array of ranges) and y variable may not be 0s or 1s.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • LogRegWeka

      public LogRegWeka()
      Main Constructor - create a Multinomial Logistic Regression Model based on the passed training data, using the WEKA library
  • Method Details

    • setTrainingData

      public boolean setTrainingData(double[] xVar, int[] yVar)
      Logistic regression where y variable is passed in as a single array (e.g. array of ranges) and y variable may not be 0s or 1s.
      Parameters:
      xVar -
      yVar -
      Returns:
    • setTrainingData

      public boolean setTrainingData(double[] xVar, double[] yVar)
      Logistic regression where y variable is passed in as a single array (e.g. array of ranges) and y variable may not be 0s or 1s.
      Parameters:
      xVar -
      yVar -
      Returns:
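The single-array overloads take one attribute as a flat array, whereas the setTrainingData(double[][] x, double[] y) overload expects one row per data point and one column per attribute. The sketch below shows how single-attribute data could be reshaped into that layout. SingleAttributeData and its methods are hypothetical helpers written for illustration; they are not part of LogRegWeka, and nothing here claims to mirror what the class does internally.

```java
/** Hypothetical helper for illustration only; not part of LogRegWeka. */
final class SingleAttributeData {
    private SingleAttributeData() {}

    /** Turns n values of a single attribute into an n-by-1 matrix (one row per data point). */
    static double[][] toColumnMatrix(double[] xVar) {
        double[][] x = new double[xVar.length][1];
        for (int i = 0; i < xVar.length; i++) {
            x[i][0] = xVar[i];
        }
        return x;
    }

    /** Widens int class labels to the double[] form taken by the other overloads. */
    static double[] toDoubleLabels(int[] yVar) {
        double[] y = new double[yVar.length];
        for (int i = 0; i < yVar.length; i++) {
            y[i] = yVar[i];
        }
        return y;
    }
}
```

With data in this shape, the two results could be handed to setTrainingData(double[][] x, double[] y) instead of the single-array overloads.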
    • setTrainingData

      public boolean setTrainingData(double[][] x, double[] y)
      Take the training data passed in, convert to something that WEKA understands and then create a logistic regression model
      Parameters:
      x - a 2d array of training data. Columns are any number of input variables (x1, x2, x3... aka attributes) and rows are data points
      y - the output variable. Length of array should match number of rows in x parameter. Since this is a logistic regression, the output is considered 'nominal' and not numeric - a distinct classification, and not a continuous variable. It's odd that nominal values should be passed as doubles, but that's what WEKA wants. For best results, use consecutive integers starting at 0 - e.g. 0, 1, 2, 3 etc.
      Also, there can't be any gaps in the output of the training dataset - you can't have 0, 1, 2, 4. WEKA will throw an error.
      Keep track in your own code of what each value represents (e.g. for a binomial problem, 0=yes and 1=no; for a weather problem, 0=cold, 1=warm, 2=hot, etc).
      Returns:
      true=successful, false=unsuccessful
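Because gaps in the class labels cause an error, it can help to validate y before calling setTrainingData. The following is a minimal sketch of such a check, assuming the labels are expected to be whole numbers 0, 1, ..., k-1. LabelCheck and isContiguousFromZero are hypothetical names introduced here and are not part of LogRegWeka or WEKA.

```java
import java.util.TreeSet;

/** Hypothetical pre-check for illustration only; not part of LogRegWeka. */
final class LabelCheck {
    private LabelCheck() {}

    /** Returns true when the distinct labels are exactly 0, 1, ..., k-1 with no gaps. */
    static boolean isContiguousFromZero(double[] y) {
        TreeSet<Double> distinct = new TreeSet<>();
        for (double v : y) {
            // Reject negative, fractional and NaN labels.
            if (v < 0 || v != Math.rint(v)) {
                return false;
            }
            distinct.add(v + 0.0); // adding 0.0 normalises -0.0 to 0.0
        }
        if (distinct.isEmpty()) {
            return false;
        }
        // With k distinct whole numbers, min == 0 and max == k-1 implies there are no gaps.
        return distinct.first() == 0.0 && distinct.last() == distinct.size() - 1;
    }
}
```

For example, the labels 0, 1, 2, 1, 0 pass this check, while 0, 1, 2, 4 do not because 3 is missing.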
    • getPrediction

      public Double getPrediction(double[] x)
      Get a prediction (output variable) based on the passed input array. The order of the elements in the x array must match the order that was used in the training data. The Double output references the unique values that were used in the training data (0, 1, 2, etc).
      Parameters:
      x - an array containing the input variables to use in the regression
      Returns:
    • getDistribution

      public double[] getDistribution(double[] x)
      Passes the input array to the current regression model and returns a double array containing the percentage value for each of the possible output classifications. Thus, if there are 3 potential classes (0, 1 and 2), the method returns a 3-element array in which each index holds the probability of the input variable falling into the corresponding category.
      Parameters:
      x - an array containing the input variables to use in the regression
      Returns:
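Since each index of the returned array corresponds to one class, the most likely class is the index holding the largest value. A minimal sketch of that lookup follows; DistributionUtil and mostLikelyClass are hypothetical names used for illustration and are not part of LogRegWeka.

```java
/** Hypothetical helper for illustration only; not part of LogRegWeka. */
final class DistributionUtil {
    private DistributionUtil() {}

    /** Returns the index of the largest entry; on ties the lowest index wins. */
    static int mostLikelyClass(double[] distribution) {
        if (distribution == null || distribution.length == 0) {
            throw new IllegalArgumentException("distribution must contain at least one entry");
        }
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

The input would typically be the array returned by getDistribution(x); the returned index can then be mapped back to whatever meaning was assigned to that class value in the training data.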
    • getCoefficients

      public double[][] getCoefficients()
      Return the coefficients of the classifier
      Returns:
    • getCoeffUncertainty

      public double[][] getCoeffUncertainty()
    • getAttFromProb

      public Double getAttFromProb(double p) throws ArithmeticException

      Important: this method ONLY works for binomial (2-class) datasets with a single attribute x, and will return null if that is not true

      Given the probability p of classification as the second class, this method returns the attribute x required. If interested in the probability of classification as the first class, pass the value 1-p instead.

      The equation solved is P = 1 / (1 + e^(-(b0 + b1*x))), where P is the probability desired for the second class, b0 and b1 are the coefficients, and x is the value that is solved for. Let's say there are 2 possible classes: undetected (y=0) and detected (y=1), and there is a single independent attribute 'range'. If we want to know the range required for a 70% probability of classification as detected (y=1), we would call this method and pass it 0.7. If we wanted to know the range required for a 70% probability of classification as undetected (y=0), we would call this method and pass it 0.3.

      Parameters:
      p - the probability desired
      Returns:
      a Double value for the attribute, or null if this method fails
      Throws:
      ArithmeticException - thrown if the attribute calculation returns infinity or NaN
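Rearranging the logistic equation for x gives x = (ln(p / (1 - p)) - b0) / b1. The sketch below works through that rearrangement with explicitly supplied coefficients. InverseLogit and its method names are hypothetical, the coefficient values in the usage note are made up for illustration, and this is not presented as the actual implementation of getAttFromProb.

```java
/** Hypothetical worked example for illustration only; not part of LogRegWeka. */
final class InverseLogit {
    private InverseLogit() {}

    /** Probability of the second class for a given x: 1 / (1 + e^(-(b0 + b1*x))). */
    static double probability(double b0, double b1, double x) {
        return 1.0 / (1.0 + Math.exp(-(b0 + b1 * x)));
    }

    /** Solves the equation above for x: x = (ln(p / (1 - p)) - b0) / b1. */
    static double attributeForProbability(double b0, double b1, double p) {
        double x = (Math.log(p / (1.0 - p)) - b0) / b1;
        if (Double.isNaN(x) || Double.isInfinite(x)) {
            throw new ArithmeticException("attribute is not finite for p=" + p);
        }
        return x;
    }
}
```

With the made-up coefficients b0 = -4 and b1 = 0.5, attributeForProbability(-4, 0.5, 0.5) is 8, since ln(1) = 0 and (0 + 4) / 0.5 = 8. A probability of exactly 0 or 1 makes the logarithm infinite, which is the kind of case the ArithmeticException covers.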
    • main

      public static void main(String[] args)
      Test model using data from https://machinelearningmastery.com/logistic-regression-tutorial-for-machine-learning/
      Parameters:
      args -
    • getModelError

      public String getModelError()
      Return exception string from model if it failed to fit.
      Returns:
      the modelError