Mixture Models

  • Data is assumed to come from one of many distributions

  • \( D_1(x), D_2(x), \ldots, D_k(x) \) - the \(k\) different distributions

  • Each data point is picked from one of these

Motivation:

Formalizing the data generative model and coming up with principled soft/hard clustering algorithms

# General Mixture Model

  • Model can be "hard" (each point comes from a single cluster) or "soft" (each point belongs to multiple clusters)

  • Appropriateness depends on context

  • Captures the difference between hard and soft clustering

  • Prominent examples:

      • Gaussian mixture

      • Topic models for documents

Applications for mixture models

  • Formalization of clustering - allows us to view clustering as a genuine parameter-estimation problem, and thereby derive principled clustering algorithms

  • Often, even when the data is not cleanly clusterable, we can fit a mixture to estimate the parameters of different subgroups

Example 1 - house price modelling

  • Data - prices of N houses

  • House price depends on various factors

  • Generative model:

      • Price = function(neighborhood, area, design) + noise

Example 2 - topics in a document

  • Data = set of documents

  • Document = bag of words

  • Topic = a probability distribution over vocabulary

  • Generative model (sketched in code below):

      • Choose a topic at random

      • Choose words by sampling from that topic's distribution
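To make this generative story concrete, here is a minimal Python sketch; the vocabulary, the two topic distributions, and the topic weights are all illustrative toy values, not taken from any real corpus.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and two topics, each a probability distribution over the
# vocabulary; all names and numbers here are illustrative.
vocab = ["goal", "team", "score", "vote", "party", "election"]
topics = np.array([
    [0.35, 0.30, 0.28, 0.02, 0.03, 0.02],  # a "sports"-like topic
    [0.02, 0.03, 0.02, 0.30, 0.30, 0.33],  # a "politics"-like topic
])
topic_weights = np.array([0.5, 0.5])  # probability of choosing each topic

def generate_document(n_words):
    """Pick one topic at random, then draw a bag of words from it."""
    z = rng.choice(len(topic_weights), p=topic_weights)   # choose a topic
    words = rng.choice(vocab, size=n_words, p=topics[z])  # words for that topic
    return z, list(words)

print(generate_document(8))
```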

Example 3 - handwriting recognition

  • Data = 64 x 64 images, each of one handwritten digit

  • Generative model = pixel intensities from ideal digit + noise

  • Each mixture component = (an "ideal" way of writing a digit) + noise

  • \( D_1(x) = \mu_1 + \text{noise} \) - intensities of a handwritten digit generated from the "ideal 9" template \( \mu_1 \) (see the sketch below)
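A minimal sketch of this noise model, using a random stand-in for the template since no real digit data is at hand:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for mu_1, the pixel-intensity template of an "ideal" digit;
# a real template would come from actual handwriting data.
mu_1 = rng.random((64, 64))

# One sample from D_1(x) = mu_1 + noise: the template plus pixel-wise
# Gaussian noise, clipped so intensities stay in [0, 1].
sample = np.clip(mu_1 + rng.normal(0.0, 0.1, size=(64, 64)), 0.0, 1.0)
```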

../_images/12.png

Fig. 30

# Gaussian Mixture Model

Each data point is generated by first choosing a particular Gaussian and then drawing a sample from it.

Note

Given the points, can we learn all the parameters, including the correct label of every point?

  • \( D_i(x) = N(\mu_i, \sigma_i^2), \quad i = 1, 2, \ldots, k \)

../_images/Movie.gif

Fig. 31 An example of a Gaussian mixture in image segmentation with a grey-level histogram

Each data point is generated in the following manner (see the code sketch after this list):

  • Toss a \(k\)-faced die, where the probability of the \(i\)-th face is \( w_i \)

  • If face \(i\) appears, then generate a point by sampling from \( D_i(x) = N(\mu_i, \sigma_i^2) \)
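This die-toss procedure translates directly into code; the weights, means, and standard deviations below are illustrative values for a \(k = 2\) mixture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for a k = 2 component mixture.
w     = np.array([0.3, 0.7])   # w_i: probability of the i-th face of the die
mu    = np.array([-2.0, 3.0])  # component means mu_i
sigma = np.array([0.5, 1.0])   # component standard deviations sigma_i

def sample_gmm(n):
    """Toss the k-faced die n times, then sample each point from its Gaussian."""
    faces = rng.choice(len(w), size=n, p=w)            # hidden label of each point
    x = rng.normal(loc=mu[faces], scale=sigma[faces])  # draw from N(mu_i, sigma_i^2)
    return x, faces

x, labels = sample_gmm(1000)
```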

../_images/mixture.png

Fig. 32 An example of a Gaussian mixture model with 2 components

Given only the points, can we learn all the parameters, including the correct label of every point?

Combined pdf = \( \sum_{i \le k} w_i \, N(\mu_i, \sigma_i^2) \)
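Numerically, the combined pdf is just a weighted sum of component densities; a minimal sketch, reusing the illustrative parameters from the sampling example:

```python
import numpy as np
from scipy.stats import norm

# Same illustrative parameters as in the sampling sketch.
w     = np.array([0.3, 0.7])
mu    = np.array([-2.0, 3.0])
sigma = np.array([0.5, 1.0])

def mixture_pdf(x):
    """Evaluate sum_i w_i * N(x; mu_i, sigma_i^2) at the given points."""
    x = np.atleast_1d(x)[:, None]  # shape (n, 1) broadcast against (k,) parameters
    return (w * norm.pdf(x, loc=mu, scale=sigma)).sum(axis=1)

print(mixture_pdf([-2.0, 0.0, 3.0]))
```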

../_images/34.png

Fig. 33

Note

Bayesian Gaussian mixture model using plate notation.

../_images/56.png

Fig. 34

Observation

For the Gaussian mixture model, the pdf is \( \sum_{i \le k} w_i \, N(\mu_i, \sigma_i^2) \)

Is this the same as the pdf of \( w_1 Z_1 + \cdots + w_k Z_k \), where \( Z_i \sim N(\mu_i, \sigma_i^2) \)?
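The answer is no: a weighted sum of independent Gaussians is itself Gaussian (unimodal, centred at \( \sum_i w_i \mu_i \)), whereas the mixture pdf above can be multimodal. A quick simulation with illustrative parameters makes the difference visible:

```python
import numpy as np

rng = np.random.default_rng(0)
w, mu, sigma = np.array([0.5, 0.5]), np.array([-3.0, 3.0]), np.array([1.0, 1.0])
n = 100_000

# Mixture: each point comes from ONE component, chosen by the die toss.
faces = rng.choice(2, size=n, p=w)
mixture = rng.normal(mu[faces], sigma[faces])

# Weighted sum: every point combines draws from ALL components.
weighted_sum = rng.normal(mu, sigma, size=(n, 2)) @ w

# The mixture piles up near -3 and +3; the weighted sum is a single Gaussian
# centred at w_1*mu_1 + w_2*mu_2 = 0.
print(np.histogram(mixture, bins=6, range=(-6, 6))[0])
print(np.histogram(weighted_sum, bins=6, range=(-6, 6))[0])
```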

# Bit of history!

  • Originally proposed by Karl Pearson in 1894 to model a crab population

  • He proposed a "method of moments", which required finding the roots of a 9th-degree polynomial!

  • Subsequent research yielded more efficient algorithms, under certain assumptions

Author(s): Abhinav Kumar, Talari Venkata Sunny, Mekala Rishitha Ravi