
SIFT features

Scale-Invariant Feature Transform

SIFT features were used for feature extraction in both the Bag of SIFT and Spatial Pyramid methods.

[Figure: SIFT features matched between two images of the same scene]


Different color spaces

Color spaces with more than one channel were handled differently from grayscale:

  1. Concatenate the color channels into a 2D matrix

  2. Use vl_phow

    • Dense SIFT features

    • Extract SIFTs from all channels separately

      This method did not yield much improvement
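The per-channel approach can be sketched as follows. This is an illustrative Python sketch, not the project's MATLAB code: `dense_patches` is a trivial stand-in for a real dense SIFT extractor such as vl_phow, and the 32×32 random image is placeholder input.

```python
import numpy as np

def dense_patches(channel, size=8, step=4):
    # Stand-in for a dense SIFT extractor (e.g. vl_phow):
    # just flattens raw pixel patches sampled on a regular grid.
    h, w = channel.shape
    descs = []
    for y in range(0, h - size + 1, step):
        for x in range(0, w - size + 1, step):
            descs.append(channel[y:y + size, x:x + size].ravel())
    return np.array(descs)

def per_channel_descriptors(image):
    # Extract descriptors from each color channel separately and stack them,
    # mirroring the "extract SIFTs from all channels" idea above.
    return np.vstack([dense_patches(image[:, :, c]) for c in range(image.shape[2])])

img = np.random.rand(32, 32, 3)          # stand-in RGB image
descs = per_channel_descriptors(img)     # one descriptor matrix per image
```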

[Figure: SIFT extraction from separate color channels]

Spatial Pyramid

[Figure: example of spatial pyramid histograms]

Additional scene recognition method:

Fisher Encoding

Described in Image Classification with the Fisher Vector: Theory and Practice (Sánchez et al., 2013)

Builds upon the Bag of Visual Words method

Fisher Vocabulary:
  1. Extract SIFT descriptors from each image.
  2. Fit a Gaussian Mixture Model (GMM) to the obtained features.
    • Used instead of k-means clustering
    • Returns the means, covariances, and priors, which are used as the vocabulary
Fisher Vector:
  1. Extract SIFT descriptors.
  2. Compute the Fisher vector of each image from its SIFT features and the already computed vocabulary.
    • Each vector represents one image
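The two stages above can be sketched in Python with scikit-learn standing in for VLFeat's vl_gmm/vl_fisher. This is a simplified sketch (gradients with respect to the GMM means only, no power or L2 normalization), and the 16-D random descriptors are placeholders for 128-D SIFT descriptors.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vocabulary(descriptors, k=5, seed=0):
    # Fit a GMM to the descriptors; its means, covariances, and
    # priors (weights) act as the vocabulary.
    gmm = GaussianMixture(n_components=k, covariance_type='diag', random_state=seed)
    gmm.fit(descriptors)
    return gmm

def fisher_vector(descriptors, gmm):
    # Simplified Fisher vector: soft-assign descriptors to components,
    # then accumulate normalized gradients w.r.t. the component means.
    q = gmm.predict_proba(descriptors)            # (N, K) posteriors
    n = descriptors.shape[0]
    fv = []
    for k in range(gmm.n_components):
        diff = (descriptors - gmm.means_[k]) / np.sqrt(gmm.covariances_[k])
        g = (q[:, k:k + 1] * diff).sum(axis=0) / (n * np.sqrt(gmm.weights_[k]))
        fv.append(g)
    return np.concatenate(fv)                     # length K * descriptor_dim

rng = np.random.default_rng(0)
descs = rng.normal(size=(200, 16))   # stand-in for one image's SIFT descriptors
gmm = fisher_vocabulary(descs, k=5)
fv = fisher_vector(descs, gmm)       # one vector represents one image
```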

[Figure: Fisher encoding pipeline]

Comparison with BoS:

Advantages:

Disadvantages:

Steps for SIFT extraction

Step 1: Build the vocabulary of visual words

build_vocabulary.m

  1. Get the images

  2. Extract SIFT features from the images

  3. Get the descriptors from the extracted features

  4. Cluster the descriptors

    Clustering finds similar features across the images and creates a visual word for each cluster

  5. Obtain the dictionary of visual words.
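A minimal Python sketch of this step, with scikit-learn's KMeans standing in for VLFeat's clustering and random descriptors standing in for real SIFT output:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, vocab_size=10, seed=0):
    # Cluster the pooled descriptors from all training images;
    # each cluster center becomes one visual word in the dictionary.
    km = KMeans(n_clusters=vocab_size, n_init=10, random_state=seed)
    km.fit(all_descriptors)
    return km.cluster_centers_       # (vocab_size, descriptor_dim)

rng = np.random.default_rng(1)
descs = rng.normal(size=(500, 16))   # stand-in for SIFT descriptors from many images
vocab = build_vocabulary(descs, vocab_size=10)
```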

[Figure: the process of creating a vocabulary of visual words]

Step 2: Obtain the bag of features for an image

get_bag_of_sifts.m

  1. Extract the SIFT features of the image
  2. Get the descriptor for each point
  3. Match the feature descriptors against the vocabulary of visual words (vocab.mat)
  4. Build a histogram from the feature descriptors

    The histogram holds the frequency of each feature in the image; each feature corresponds to a visual word in the dictionary

  5. The histogram of visual-word frequencies is then passed to a classifier (kNN or SVM) to predict the class of the image

visual word -> a vector of numbers representing a feature
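The histogram-building step can be sketched in Python, with scipy's `cdist` playing the role of vl_alldist2 and random data standing in for a real vocabulary and real descriptors:

```python
import numpy as np
from scipy.spatial.distance import cdist

def bag_of_sifts(descriptors, vocab):
    # Assign each descriptor to its nearest visual word and
    # build a normalized frequency histogram over the vocabulary.
    d = cdist(vocab, descriptors)        # pairwise distances, like vl_alldist2
    nearest = np.argmin(d, axis=0)       # closest word index per descriptor
    hist = np.bincount(nearest, minlength=len(vocab)).astype(float)
    return hist / hist.sum()             # frequency of each visual word

rng = np.random.default_rng(2)
vocab = rng.normal(size=(10, 16))        # stand-in vocabulary (vocab.mat)
descs = rng.normal(size=(100, 16))       # stand-in descriptors for one image
h = bag_of_sifts(descs, vocab)
```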


Steps for Spatial Pyramid

spatial_pyramid.m

  1. get images

  2. extract sift features from images

  3. get descriptors from extracted features

  4. Find the minimum distance between the extracted features and the ones from the already computed vocabulary
    D = vl_alldist2(vocab', features);
    [~, ind] = min(D);

  5. Construct a histogram with those values.

    It will be the histogram of SIFT features for Level 0 of the pyramid.

  6. Create a matrix with the total levels of the pyramid
    6.1 Each level has a number of quadrants
    6.2 Each quadrant is represented by a histogram of its SIFT features.
    6.3 The histograms of each level are then concatenated into a row for the pyramid.

    This results in one bigger histogram

  7. Apply the appropriate weight to each level
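The quadrant-histogram construction and level weighting can be sketched in Python. This is an illustrative sketch, not the project's spatial_pyramid.m: feature positions are assumed to be pre-scaled to [0, 1), and the level weights follow the common Lazebnik-style scheme (coarse levels count less).

```python
import numpy as np

def pyramid_histogram(positions, words, vocab_size, levels=2):
    # Concatenate per-quadrant visual-word histograms over all pyramid levels.
    pos = np.asarray(positions, dtype=float)        # (N, 2) x,y coords in [0, 1)
    words = np.asarray(words)                       # (N,) visual-word index per feature
    out = []
    for level in range(levels + 1):
        cells = 2 ** level                          # quadrants per axis at this level
        # Level weights: 1/2^L for level 0, 1/2^(L-l+1) for level l > 0
        weight = 1.0 / 2 ** levels if level == 0 else 1.0 / 2 ** (levels - level + 1)
        idx = np.clip((pos * cells).astype(int), 0, cells - 1)
        for cy in range(cells):
            for cx in range(cells):
                mask = (idx[:, 0] == cx) & (idx[:, 1] == cy)
                out.append(weight * np.bincount(words[mask], minlength=vocab_size))
    return np.concatenate(out)                      # one bigger histogram

rng = np.random.default_rng(3)
pos = rng.random((50, 2))                 # stand-in feature positions
words = rng.integers(0, 10, size=50)      # stand-in visual-word assignments
h = pyramid_histogram(pos, words, vocab_size=10, levels=2)
```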

[Figure: spatial pyramid construction]


Classification

1. kNN

2. SVM
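Both classifiers take the image histograms as feature vectors. A minimal Python sketch with scikit-learn, using synthetic two-class histograms as placeholder data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
# Stand-in bag-of-words histograms for two scene classes:
# class 0 peaks at word 0, class 1 peaks at word 5.
X = np.vstack([rng.random((20, 10)) + np.eye(10)[0],
               rng.random((20, 10)) + np.eye(10)[5]])
y = np.array([0] * 20 + [1] * 20)

# 1. kNN: predict by majority vote among the nearest training histograms
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# 2. SVM: learn a linear decision boundary between the histogram classes
svm = LinearSVC().fit(X, y)

pred_knn = knn.predict(X)
pred_svm = svm.predict(X)
```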




Results

Bag of SIFT

kNN

[Figure: Bag of SIFT with kNN]

SVM

[Figure: Bag of SIFT with SVM]

RGB Confusion Matrix

GRAYSCALE Confusion Matrix

[Figure: Bag of SIFT (vl_phow) with kNN and SVM]

RGB Confusion Matrix

GRAYSCALE Confusion Matrix


Spatial Pyramid

kNN

[Figure: Spatial Pyramid with kNN]

RGB Confusion Matrix

GRAYSCALE Confusion Matrix

SVM

[Figure: Spatial Pyramid with SVM]

RGB Confusion Matrix

GRAYSCALE Confusion Matrix


Fisher Vector

kNN

[Figure: Fisher Vector with kNN]

RGB Confusion Matrix

GRAYSCALE Confusion Matrix

SVM

[Figure: Fisher Vector with SVM]

RGB Confusion Matrix

GRAYSCALE Confusion Matrix


Conclusion