
Introduction to machine learning

There are certain types of problems in everyday life that cannot be solved by a definitive algorithm, e.g., object recognition or predicting financial markets. In such cases, we collect a set of input and output samples (called a training set) and, based on the trends in those data, try to predict future outcomes.

We will see that there are three broad categories of machine learning problems: (1) supervised learning, (2) unsupervised learning, and (3) reinforcement learning. As the name suggests, in the first case we have labeled data (inputs paired with known outputs) available to train our model.

(1) Supervised machine learning problems fall into two categories: (a) classification and (b) regression. In classification, the targets/labels are discrete (e.g., whether an email is real or spam, whether an image shows a cat or a dog), while in regression we need to predict a quantitative value (e.g., predicting a second-hand car's price from a set of features such as brand, color, mileage, and engine size). Supervised regression is much like a standard curve-fitting program.
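To make the "fitting" analogy concrete, here is a minimal sketch of supervised regression with a single feature: ordinary least squares fitting price = slope * mileage + intercept. The function name fit_line and the mileage/price numbers are hypothetical, chosen only for illustration; a real price model would use several features and real data.

```python
# Supervised regression as a fitting problem: ordinary least squares
# for price = slope * mileage + intercept (one feature only).
# The training data below are invented illustrative numbers.

def fit_line(xs, ys):
    """Return (slope, intercept) minimising the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training set: mileage (thousands of km) -> price (thousands).
mileage = [20, 40, 60, 80, 100]
price = [18.0, 15.5, 13.0, 10.5, 8.0]

slope, intercept = fit_line(mileage, price)
print(f"price approx {slope:.3f} * mileage + {intercept:.2f}")
print(f"predicted price at 70k km: {slope * 70 + intercept:.2f}")
```

Because these toy points lie exactly on a line, the fit recovers slope -0.125 and intercept 20.5; with real, noisy data the fitted line only approximates the trend.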

(2) In unsupervised learning, on the other hand, the data have no associated labels, and the goal is to find structure in them, e.g., segmenting the customers of a supermarket or bank, or clustering similar music or movies.
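A minimal sketch of clustering, assuming the simple k-means (Lloyd's) algorithm on a single hypothetical feature such as monthly spending: no labels are supplied, yet the points separate into groups. The function kmeans_1d, the spending values, and the initial centroids are illustrative assumptions.

```python
# Unsupervised learning sketch: k-means clustering on one feature.
# No labels are given; points are grouped by similarity alone.

def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points around the given initial centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical monthly spending of eight customers.
spending = [20, 25, 30, 22, 200, 210, 190, 205]
centroids, clusters = kmeans_1d(spending, centroids=[0, 100])
print(centroids)
print(clusters)
```

Here the loop settles on two groups, low spenders around 24.25 and high spenders around 201.25. The outcome of k-means can depend on the starting centroids, so in practice it is often run several times from different initial points.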

(3) Finally, reinforcement learning is based on interaction: an agent takes actions in an environment, observes the resulting rewards, and learns a strategy (policy) that maximizes the cumulative reward over time.
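The interaction-and-reward loop can be sketched with tabular Q-learning, one of the reinforcement learning algorithms listed later in this article, on a toy environment. The five-state corridor, the reward of 1 at the goal, the helper names step, choose_action and train, and the values of ALPHA, GAMMA and EPSILON are all assumptions made for this illustration.

```python
import random

# Toy environment (illustrative assumption): states 0..4 along a corridor.
# The agent starts at 0; reaching state 4 ends the episode with reward 1,
# every other step gives reward 0.
N_STATES = 5
ACTIONS = (-1, +1)                  # step left, step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1


def step(state, action):
    """Environment: apply an action and return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)   # stay inside 0..N_STATES-1
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False


def choose_action(q, state, rng):
    """Epsilon-greedy: explore at random, else pick a best action (ties broken randomly)."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=300, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:             # interact until the goal is reached
            action = choose_action(q, state, rng)
            nxt, reward, done = step(state, action)
            # Q-learning update: nudge Q(s, a) toward reward + discounted best next value.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = nxt
    return q


q = train()
# Greedy policy: the highest-valued action in each non-goal state.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should step right (+1) in every non-goal state, the shortest route to the reward. The random tie-breaking matters here: with all Q-values initially zero, always taking the first action would bias early exploration to the left.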

The Machine Learning Process

  1. Data collection and preparation
  2. Feature selection
  3. Algorithm selection
  4. Parameter and model selection
  5. Training
  6. Evaluation
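The six steps can be walked through on a toy classification problem. This sketch assumes synthetic two-class data, a k-nearest-neighbours classifier with k = 3, and hypothetical helper names (make_data, select, predict); it illustrates the flow of the process rather than a recommended setup.

```python
import random

# 1. Data collection and preparation: hypothetical labelled samples
#    (feature_a, feature_b, noise_feature) -> label "A" or "B".
def make_data(rng, n=100):
    data = []
    for _ in range(n):
        label = rng.choice(["A", "B"])
        centre = 0.0 if label == "A" else 3.0
        sample = (centre + rng.gauss(0, 1), centre + rng.gauss(0, 1), rng.gauss(0, 1))
        data.append((sample, label))
    return data


rng = random.Random(42)
data = make_data(rng)
rng.shuffle(data)
train_set, test_set = data[:80], data[80:]   # hold out 20 samples for evaluation


# 2. Feature selection: keep the two informative features, drop the noise one.
def select(x):
    return x[:2]


# 3. and 4. Algorithm and parameter selection: k-nearest neighbours with k = 3.
K = 3


def predict(train_data, x, k=K):
    """Majority vote among the k training samples closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(select(x), select(tx))), label)
        for tx, label in train_data
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)


# 5. Training: k-NN simply stores the training set, so there is nothing to fit.
# 6. Evaluation: accuracy on held-out samples the model never saw.
correct = sum(predict(train_set, x) == y for x, y in test_set)
accuracy = correct / len(test_set)
print(f"test accuracy: {accuracy:.2f}")
```

Evaluating on the held-out test set, rather than on the training samples, gives an estimate of how well the model generalizes to new data.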

Machine learning algorithms

  • Supervised learning:
    • Decision trees
    • Naive Bayes
    • Least squares regression
    • Logistic regression
    • Neural Networks
    • Support Vector Machines
    • Ensemble methods
  • Unsupervised learning:
    • Clustering
    • Principal Component Analysis
    • Singular Value Decomposition
    • Independent Component Analysis
  • Reinforcement learning:
    • Q-learning
    • Temporal-difference (TD) learning
    • Adversarial Learning
    • Genetic Algorithms

Application Scope

Today, data science is heavily used by technology companies to power recommendation systems (advertisements, news feeds, songs, videos, shopping), supply and demand analysis, speech recognition/natural language processing, lifestyle improvements including medical applications, and pattern recognition to detect fraud. Most of these applications require extremely large and complex datasets. However, pretrained models are sometimes openly available for ready-made use. Beyond those, are there applications where small datasets from our day-to-day life can be useful in a practical way?