Week 13 - Machine Learning - Unsupervised Learning from M.L. course by Stanford Uni (Week 1 Part III)

Liaw Bei Le · June 16, 2021

Unsupervised Learning:

  • Clustering algorithms
  • Non-clustering algorithms

I am learning about Machine Learning via the course offered by Coursera named: Machine Learning by Stanford University. This post covers Week 1 materials.

What I learnt:

Unsupervised learning

  • Second major problem type in machine learning
  • In unsupervised learning, we get unlabeled data & we can use machines to structure this dataset
    Supervised learning Unsupervised learning
  • Allows us to approach problems with little or no idea what our results should look like
    • One way of doing this would be to cluster unlabeled data into to groups (derive structure by clustering data based on relationships among the variables in the data.)
    • We can derive structure from data where we don’t necessarily know the effect of the variables.
    • Automatically generate structure

With unsupervised learning, there is no feedback based on the prediction results.

Clustering algorithm:

Example of clustering algorithms:

  • Google news
    • Groups news stories into cohesive groups
      Google news Google news 2
  • Genomics
    • Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
      Genes grouped
      • Microarray data
      • Have a group of individuals
      • On each measure expression of a gene
      • Run algorithm to cluster individuals into types of people
  • Organize computer clusters
    • Large data centers, that is large computer clusters and trying to figure out which machines tend to work together and if you can put those machines together, you can make your data center work more efficiently.
  • Social network analysis
    • Given knowledge about which friends you email the most or given your Facebook friends or your Google+ circles, automatically identify which are cohesive groups of friends, which are groups of people that all know each other
  • Market semgentation
    • Look at customer data set and automatically discover market segments and automatically group your customers into different market segments so that you can automatically and more efficiently sell or market your different market segments together
  • Astronomical data analysis
    • Clustering algorithms gives surprisingly interesting useful theories of how galaxies are formed

Non-clustering algorithm:

Example of non clustering algorithm

E.g. 1: Cocktail party problem

The “Cocktail Party Algorithm”, allows you to find structure in a chaotic environment. (i.e. identifying individual voices and music from a mesh of sounds at a cocktail party).

Cocktail party problem

Overlapping voices, 2 people talking.
Microphones at different distances from speakers.
Record slightly different versions of the conversation depending on where microphones are.

Cocktail party problem recordings

Steps:

  • Obtain recordings of the conversation from each microphone.
  • Process them in a ‘cocktail party’ algorithm
  • Algorithm processes audio recordings
  • Determines there are two audio sources
  • Separates out the two sources

Not easy to identify but, programs can be short!
Algorithm can be done with one line of code:
[W,s,v] = svd((repmat(sum(x.x,1), size(x,1),1).x)*x’);

Using Octave (or MATLAB) for algorithms are very useful:

  • Often prototype algorithms in Octave/ MATLAB to test as it’s very fast
  • Only when it works on Octave, migrate it to C++ or Java
  • Gives a much faster agile development

Cocktail party algorithm

Understanding this algorithm:

  • svd - linear algebra routine which is built into octave
  • In C++ this would be very complicated!
  • Using Octave / MATLAB to prototype is a really good way to do this.

Twitter, Facebook