Week 13 - Machine Learning - Unsupervised Learning from M.L. course by Stanford Uni (Week 1 Part III)

Liaw Bei Le · June 16, 2021

Unsupervised Learning:

Clustering algorithms
Non-clustering algorithms

I am learning about Machine Learning via the course offered by Coursera named: Machine Learning by Stanford University. This post covers Week 1 materials.

What I learnt:

Unsupervised learning

Second major problem type in machine learning
In unsupervised learning, we get unlabeled data & we can use machines to structure this dataset
Allows us to approach problems with little or no idea what our results should look like
- One way of doing this would be to cluster unlabeled data into to groups (derive structure by clustering data based on relationships among the variables in the data.)
- We can derive structure from data where we don’t necessarily know the effect of the variables.
- Automatically generate structure

With unsupervised learning, there is no feedback based on the prediction results.

Clustering algorithm:

Example of clustering algorithms:

Google news
- Groups news stories into cohesive groups
Genomics
- Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
  - Microarray data
  - Have a group of individuals
  - On each measure expression of a gene
  - Run algorithm to cluster individuals into types of people
Organize computer clusters
- Large data centers, that is large computer clusters and trying to figure out which machines tend to work together and if you can put those machines together, you can make your data center work more efficiently.
Social network analysis
- Given knowledge about which friends you email the most or given your Facebook friends or your Google+ circles, automatically identify which are cohesive groups of friends, which are groups of people that all know each other
Market semgentation
- Look at customer data set and automatically discover market segments and automatically group your customers into different market segments so that you can automatically and more efficiently sell or market your different market segments together
Astronomical data analysis
- Clustering algorithms gives surprisingly interesting useful theories of how galaxies are formed

Non-clustering algorithm:

Example of non clustering algorithm

E.g. 1: Cocktail party problem

The “Cocktail Party Algorithm”, allows you to find structure in a chaotic environment. (i.e. identifying individual voices and music from a mesh of sounds at a cocktail party).

Cocktail party problem

Overlapping voices, 2 people talking.
Microphones at different distances from speakers.
Record slightly different versions of the conversation depending on where microphones are.

Cocktail party problem recordings

Steps:

Obtain recordings of the conversation from each microphone.
Process them in a ‘cocktail party’ algorithm
Algorithm processes audio recordings
Determines there are two audio sources
Separates out the two sources

Not easy to identify but, programs can be short!
Algorithm can be done with one line of code:
[W,s,v] = svd((repmat(sum(x.x,1), size(x,1),1).x)*x’);

Using Octave (or MATLAB) for algorithms are very useful:

Often prototype algorithms in Octave/ MATLAB to test as it’s very fast
Only when it works on Octave, migrate it to C++ or Java
Gives a much faster agile development

Cocktail party algorithm

Understanding this algorithm:

svd - linear algebra routine which is built into octave
In C++ this would be very complicated!
Using Octave / MATLAB to prototype is a really good way to do this.

Share: Twitter, Facebook