Choosing your classifier for multiclass classification (brief introduction)

Abdalrahmenyousif
2 min read, Oct 13, 2022


Scikit-learn groups classification under Supervised Learning, and in that category you will find many ways to classify. The variety is quite bewildering at first sight. The following methods all include classification techniques:

  • Linear Models
  • Support Vector Machines
  • Stochastic Gradient Descent
  • Nearest Neighbors
  • Gaussian Processes
  • Decision Trees
  • Ensemble methods (Voting Classifier)
  • Multiclass and multioutput algorithms (multiclass and multilabel classification, multiclass-multioutput classification)

You can also use neural networks to classify data.

A better approach

A better way than guessing wildly, however, is to follow the ideas on this downloadable ML cheat sheet. It shows that, for our multiclass problem, we have several choices.

What classifier to go with?

So, which classifier should you choose? Often, the practical test is to run through several and look for a good result. Scikit-learn offers a side-by-side comparison on a generated dataset, comparing KNeighborsClassifier, SVC two ways (linear and RBF kernels), GaussianProcessClassifier, DecisionTreeClassifier, RandomForestClassifier, MLPClassifier, AdaBoostClassifier, GaussianNB and QuadraticDiscriminantAnalysis.
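To make "running through several" concrete, here is a minimal sketch that cross-validates a handful of the classifiers named above. The synthetic three-class dataset, the hyperparameters and the `candidates` dictionary are assumptions chosen for illustration, not the setup from the scikit-learn comparison itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class dataset (an assumption; substitute your own X and y).
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=5,
    n_redundant=0,
    n_classes=3,
    random_state=0,
)

# Hypothetical shortlist; the hyperparameters are illustrative, not tuned.
candidates = {
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=5),
    "SVC (linear kernel)": SVC(kernel="linear", C=1.0),
    "SVC (RBF kernel)": SVC(kernel="rbf", gamma="scale"),
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=5, random_state=0),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=0),
    "GaussianNB": GaussianNB(),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

# Print the candidates from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {score:.3f}")
```

The ranking this prints depends on the data, so treat it as a first filter rather than a verdict.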

Reasoning

Let’s see if we can reason our way through different approaches given the constraints we have:

  • Neural networks are too heavy. Given our clean but minimal dataset, and the fact that we are running training locally via notebooks, neural networks are too heavyweight for this task.
  • No two-class classifier. We do not use a two-class classifier, so that rules out one-vs-all.
  • Decision tree or logistic regression could work. A decision tree might work, or logistic regression for multiclass data.
  • Multiclass boosted decision trees solve a different problem. The multiclass boosted decision tree is most suitable for nonparametric tasks, e.g. tasks designed to build rankings, so it is not a good fit here.
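The two candidates that survive this reasoning, a decision tree and multiclass logistic regression, can be tried in a few lines. This is a sketch under stated assumptions: the article's own dataset is not shown, so the built-in iris data (three classes) stands in for it, and the split and hyperparameters are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in multiclass dataset (assumption): iris has three classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Both estimators accept more than two classes without any wrapper.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(max_depth=3, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name:<24} accuracy={results[name]:.3f} classes={list(model.classes_)}")
```

Neither model needs a one-vs-all wrapper here, which matches the reasoning in the list above.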

Scikit-learn's documentation provides a table that explains how solvers handle the challenges presented by different kinds of data structures.

Conclusion

Take some time to read through the many options Scikit-learn provides to classify data. Dig deeper into the concept of ‘solver’ to understand what goes on behind the scenes.
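One hedged way to start digging into solvers is to fit the same model with several of them and compare. The sketch below assumes the solvers in question are those of `LogisticRegression` and uses the built-in iris data as a stand-in. The solver list, the scaling step and `max_iter` are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in multiclass dataset (assumption).
X, y = load_iris(return_X_y=True)

solver_scores = {}
for solver in ("lbfgs", "newton-cg", "sag", "saga"):
    # Scaling the features helps the stochastic solvers (sag, saga) converge.
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(solver=solver, max_iter=5000, random_state=0),
    )
    # Mean 5-fold cross-validated accuracy for this solver.
    solver_scores[solver] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{solver:<10} {solver_scores[solver]:.3f}")
```

On a dataset this small the solvers should land close together. The differences the documentation describes matter more with many samples, many features or particular penalties.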
