SIFT features

Scale-Invariant Feature Transform

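Invariant to scale and rotation, and robust to moderate changes in illumination and viewpoint
Features based on the image's gradients (histograms of local gradient orientations)
Each SIFT descriptor is a 128-dimensional vector (4x4 spatial cells x 8 orientation bins)
Clustering the descriptors produces a dictionary of visual words
- with size 128xN
  
  N is the size of the dictionary.
- Each word is a cluster center, i.e. a representative SIFT descriptor
- Each image is described by a histogram of visual words, which is used to predict its scene category
  
  (e.g. kitchen, store, etc.)

SIFT features were used for the feature extraction of both Bag of SIFT and Spatial Pyramid.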
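To make the 128 numbers concrete, here is a simplified Python/NumPy sketch of a SIFT-like descriptor for one 16x16 grayscale patch (the project itself used MATLAB with VLFeat). It is an illustration under stated simplifications: it skips the Gaussian weighting, trilinear interpolation and rotation to a dominant orientation that real SIFT performs, and the function name simple_sift_descriptor is hypothetical.

```python
import numpy as np

def simple_sift_descriptor(patch):
    """Simplified SIFT-like descriptor of a 16x16 grayscale patch (hypothetical helper).

    Simplifications: no Gaussian weighting, no trilinear interpolation and no
    rotation to a dominant orientation, all of which real SIFT performs.
    """
    patch = np.asarray(patch, dtype=np.float64)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    gy, gx = np.gradient(patch)                        # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)     # angle in [0, 2*pi)
    bins = np.floor(orientation / (np.pi / 4)).astype(int) % 8   # 8 bins of 45 degrees
    descriptor = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            # Each pixel votes into its 4x4-pixel cell, weighted by gradient magnitude
            descriptor[r // 4, c // 4, bins[r, c]] += magnitude[r, c]
    descriptor = descriptor.ravel()                    # 4 * 4 * 8 = 128 values
    norm = np.linalg.norm(descriptor)
    if norm > 0:
        # Normalize, clip large entries at 0.2, renormalize (as in Lowe's SIFT)
        descriptor /= norm
        descriptor = np.minimum(descriptor, 0.2)
        descriptor /= np.linalg.norm(descriptor)
    return descriptor

ramp = np.tile(np.arange(16.0), (16, 1))   # intensity increases left to right
d = simple_sift_descriptor(ramp)
print(d.shape)   # (128,)
```

For the ramp, every gradient points to the right, so all the weight lands in orientation bin 0 of each of the 16 cells.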

Figure: SIFT features matched between 2 images of the same scene

Different color spaces

Color spaces with more than one channel were handled differently from grayscale. Two approaches were tried:

Concatenate the color channels into a single 2D matrix, then extract SIFT features from it
Use vl_phow
- Dense SIFT features
- SIFTs are extracted from each channel separately
  
  Not much improvement with this method
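A minimal NumPy sketch of the two ways of handling multiple channels, assuming that "concatenate into a 2D matrix" means placing the channels side by side so that a grayscale-only SIFT extractor can process them. Both helper names are hypothetical; the second helper only mirrors the idea of extracting descriptors per channel and stacking them, similar in spirit to vl_phow's color options.

```python
import numpy as np

def channels_side_by_side(image):
    """Turn an H x W x C color image into one H x (C*W) 2D matrix (hypothetical helper)."""
    image = np.asarray(image)
    # Place channel 0, channel 1, ... next to each other horizontally
    return np.concatenate([image[:, :, k] for k in range(image.shape[2])], axis=1)

def stack_channel_descriptors(per_channel):
    """Concatenate per-channel descriptors (each num_points x 128) into num_points x (128*C).

    Assumes the descriptors of every channel were computed on the same grid of points.
    """
    return np.concatenate(per_channel, axis=1)

rgb = np.random.default_rng(0).random((64, 48, 3))
print(channels_side_by_side(rgb).shape)   # (64, 144)
```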

Figure: color channels

Spatial Pyramid

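A collection of orderless feature histograms computed over increasingly fine grids
Each level divides the image into a grid, with one histogram per grid cell
Each histogram is built from the local SIFT descriptors that fall in its cell (quadrant)
Each level's histograms are multiplied by a level-specific weight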
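For reference, a worked formulation under the assumption that the pyramid follows the standard scheme of Lazebnik, Schmid and Ponce (2006): level l = 0, ..., L splits the image into a 2^l x 2^l grid, so a vocabulary of N words gives the pyramid length and the level weights below.

```latex
\[
\text{length} = N \sum_{l=0}^{L} 4^{l} = N \, \frac{4^{L+1} - 1}{3}
\]
\[
w_0 = \frac{1}{2^{L}}, \qquad w_l = \frac{1}{2^{L-l+1}} \quad (l = 1, \dots, L)
\]
```

As an example, with N = 200 and L = 2 the pyramid has 200 x 21 = 4200 bins and weights 1/4, 1/4, 1/2; going to L = 3 raises the length to 200 x 85 = 17000, which fits the extra cost beyond level 2 noted in the conclusion.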

Figure: example of spatial pyramid histograms

Additional scene recognition method:

Fisher Encoding

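Introduced by Perronnin and Dance (2007), "Fisher Kernels on Visual Vocabularies for Image Categorization", and described in detail in "Image Classification with the Fisher Vector: Theory and Practice" (Sánchez et al. 2013)

Builds upon the Bag of Visual Words method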

Fisher Vocabulary:

Extract SIFT descriptors from each image.
Fit a Gaussian Mixture Model (GMM) to the obtained features.
- Instead of k-means clustering
- Returns the means, covariances and priors, which are used as the vocabulary
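A self-contained NumPy sketch of fitting the Fisher vocabulary: expectation-maximization for a GMM with diagonal covariances, returning the same three outputs (means, covariances, priors). In MATLAB, VLFeat's vl_gmm returns these directly; the function names, the farthest-point initialization and the small regularizer reg are assumptions of this sketch, not the project's code.

```python
import numpy as np

def log_gaussians(X, means, covs):
    """log N(x_i | mu_k, diag(covs_k)) for every descriptor i and component k (n x k)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                # n x k x D
    mahal = np.sum(diff ** 2 / covs[None, :, :], axis=2)    # n x k
    log_det = np.sum(np.log(covs), axis=1)                  # k
    return -0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + mahal)

def fit_diag_gmm(X, k, iters=50, seed=0, reg=1e-6):
    """Fit a diagonal-covariance GMM to descriptors X (num_desc x D) with EM.

    Returns means (k x D), covariances (k x D, diagonal variances), priors (k,).
    """
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: random first mean, then the farthest descriptors
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        dist = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2), axis=1)
        idx.append(int(np.argmax(dist)))
    means = X[idx].copy()
    covs = np.tile(X.var(axis=0) + reg, (k, 1))
    priors = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posteriors gamma_ik, computed in log space for numerical stability
        log_p = log_gaussians(X, means, covs) + np.log(priors)[None, :]
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors, means and diagonal variances
        nk = resp.sum(axis=0) + 1e-12
        priors = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 0.0) + reg
    return means, covs, priors
```

The n x k x D intermediate array makes this memory-hungry for large descriptor sets; fitting the GMM on a random subset of the descriptors is a common workaround.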

Fisher Vector:

Extract SIFT descriptors
Compute the Fisher vector of each image from its SIFT features and the previously computed vocabulary
- Each vector represents an image
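The encoding step can be sketched as follows, assuming a diagonal GMM vocabulary given as means, covs (diagonal variances) and priors. It computes the normalized gradients with respect to the means and standard deviations, following the formulation of Sánchez et al., and optionally applies the signed square root and L2 normalization of the "improved" Fisher vector. Whether the project used these normalizations is not stated, so improved=True is an assumption, and the function name is hypothetical.

```python
import numpy as np

def fisher_vector(X, means, covs, priors, improved=True):
    """Fisher vector of one image's descriptors X (num_desc x D) under a diagonal GMM.

    Returns 2*K*D values: gradients w.r.t. the means, then w.r.t. the std deviations.
    """
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    sigmas = np.sqrt(covs)                                          # K x D
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]  # n x K x D
    # Log posterior (up to a constant) of each Gaussian for each descriptor
    log_p = (np.log(priors)[None, :]
             - 0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :])
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                       # n x K, rows sum to 1
    # First- and second-order statistics, normalized by the priors
    u = np.einsum('nk,nkd->kd', gamma, diff) / (n * np.sqrt(priors)[:, None])
    v = np.einsum('nk,nkd->kd', gamma, diff ** 2 - 1) / (n * np.sqrt(2 * priors)[:, None])
    fv = np.concatenate([u.ravel(), v.ravel()])
    if improved:
        # "Improved" Fisher vector: signed square root, then L2 normalization
        fv = np.sign(fv) * np.sqrt(np.abs(fv))
        norm = np.linalg.norm(fv)
        if norm > 0:
            fv /= norm
    return fv
```

This version omits the gradients with respect to the priors, so its length is 2DK rather than the full count that includes the prior weights.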

Figure: Fisher vector encoding

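Comparison with Bag of SIFT (BoS):

Advantages:

Good results can be obtained with much smaller vocabularies

Disadvantages:

Takes more storage
- Each Fisher vector has (2D+1)N - 1 dimensions, where D is the descriptor dimension (128 for SIFT) and N is the number of Gaussians, while a BoS histogram with N words has only N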
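A small worked calculation of the storage difference, assuming SIFT descriptors (D = 128) and, as an arbitrary example, N = 64 Gaussians. The N - 1 term counts the free prior weights (they must sum to 1); implementations that drop the prior gradients, such as VLFeat's vl_fisher, produce 2DN values instead. The function names are hypothetical.

```python
def bos_dim(num_words):
    """Bag of SIFT: one histogram bin per visual word."""
    return num_words

def fisher_dim(d, n_gauss, with_prior_gradients=True):
    """Fisher vector length for D-dimensional descriptors and N diagonal Gaussians."""
    if with_prior_gradients:
        return (2 * d + 1) * n_gauss - 1   # means + variances + (N - 1) free priors
    return 2 * d * n_gauss                 # means + variances only

print(bos_dim(64))                 # 64
print(fisher_dim(128, 64))         # 16447
print(fisher_dim(128, 64, False))  # 16384
```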

Steps:

Steps for Bag of SIFT

Step 1: Build the vocabulary of visual words (bag of features)

build_vocabulary.m

get images
extract SIFT features from images
get descriptors from extracted features
cluster the descriptors

Clustering groups similar descriptors from all images; each cluster center becomes a visual word.
The result is a dictionary of visual words.
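The clustering step can be sketched with plain k-means in NumPy, assuming the descriptors of all training images have already been stacked into one num_desc x 128 matrix. The name build_vocabulary mirrors build_vocabulary.m but this is a hypothetical Python stand-in, not the project's code; in MATLAB, VLFeat's vl_kmeans is the usual choice.

```python
import numpy as np

def build_vocabulary(descriptors, vocab_size, iters=20, seed=0):
    """Cluster descriptors (num_desc x D) into vocab_size visual words with k-means."""
    X = np.asarray(descriptors, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Initialize the centers with randomly chosen descriptors
    centers = X[rng.choice(len(X), size=vocab_size, replace=False)].copy()
    for _ in range(iters):
        # Assign every descriptor to its nearest center (squared Euclidean distance)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors
        for k in range(vocab_size):
            members = X[labels == k]
            if len(members) > 0:          # an empty cluster keeps its old center
                centers[k] = members.mean(axis=0)
    return centers                         # vocab_size x D: one visual word per row
```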

Figure: the process of creating a vocabulary of visual words

Step 2: Obtain the bag of features for an image

get_bag_of_sifts.m

extract SIFT features of the image
get the descriptor for each point
match each descriptor to its nearest visual word in the vocabulary (vocab.mat)
build a histogram of the matched visual words

The histogram counts how often each visual word occurs in the image, since every descriptor corresponds to one visual word in the dictionary.
This histogram is the image's feature vector; a classifier (kNN or SVM) uses it to predict the scene class of the image.

visual word -> a 128-dimensional vector (a cluster center) representing a group of similar SIFT features
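A minimal NumPy sketch of Step 2: each descriptor is assigned to its nearest visual word (squared Euclidean distance) and the word counts form the histogram. The function name is hypothetical, and the L1 normalization at the end is an assumption that keeps images with different numbers of descriptors comparable.

```python
import numpy as np

def bag_of_sifts(descriptors, vocab):
    """Normalized histogram of visual words for one image.

    descriptors: num_desc x D, vocab: vocab_size x D (one visual word per row).
    """
    X = np.asarray(descriptors, dtype=np.float64)
    V = np.asarray(vocab, dtype=np.float64)
    # Squared distance between every descriptor and every visual word
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)   # num_desc x vocab_size
    nearest = d2.argmin(axis=1)                                # word index per descriptor
    hist = np.bincount(nearest, minlength=len(V)).astype(np.float64)
    return hist / max(hist.sum(), 1.0)
```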

Steps for Spatial Pyramid

spatial_pyramid.m

get images
extract SIFT features from images
get descriptors from extracted features
for each extracted feature, find the nearest visual word in the already computed vocabulary:
D = vl_alldist2(vocab', single(features));  % distances, vocab_size x num_features (both inputs single)
[~, ind] = min(D, [], 1);                   % index of the nearest visual word for each feature
construct a histogram of those word indices.

This is the histogram of SIFT features for level 0 of the pyramid (the whole image).
Create a matrix with all the levels of the pyramid
- Each level has a number of quadrants (grid cells)
- Each quadrant is represented by a histogram of its SIFT features
- The histograms of each level are concatenated into a row of the pyramid

This results in a larger histogram
Apply the appropriate weight to each level
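The pyramid construction can be sketched in NumPy as follows, assuming each keypoint's (x, y) position and visual-word index are already known (for example from the nearest-word step with vl_alldist2). The level weights follow Lazebnik et al. (2006); the function name and the final normalization are assumptions of this sketch.

```python
import numpy as np

def spatial_pyramid(xs, ys, words, width, height, vocab_size, levels=2):
    """Weighted spatial pyramid histogram for one image.

    xs, ys: keypoint coordinates in pixels; words: visual-word index of each keypoint.
    Level l splits the image into a 2^l x 2^l grid of cells.
    """
    xs, ys, words = np.asarray(xs), np.asarray(ys), np.asarray(words)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Lazebnik et al. weights: 1/2^L for level 0, 1/2^(L-l+1) for level l >= 1
        weight = 1.0 / 2 ** levels if level == 0 else 1.0 / 2 ** (levels - level + 1)
        # Grid cell of each keypoint (clipped so border points stay inside the grid)
        cx = np.minimum((xs * cells // width).astype(int), cells - 1)
        cy = np.minimum((ys * cells // height).astype(int), cells - 1)
        for row in range(cells):
            for col in range(cells):
                in_cell = (cy == row) & (cx == col)
                hist = np.bincount(words[in_cell], minlength=vocab_size)
                parts.append(weight * hist)
    pyramid = np.concatenate(parts).astype(np.float64)   # vocab_size * (1 + 4 + ...) bins
    return pyramid / max(pyramid.sum(), 1e-12)
```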

Figure: spatial pyramid

Classification

1. kNN

2. SVM

useful lecture: https://youtu.be/iGZpJZhqEME
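Both classifiers can be sketched in NumPy, operating on one feature vector (BoS histogram, pyramid or Fisher vector) per image. The kNN part uses Euclidean distance and majority voting; the SVM part is a one-vs-all linear SVM trained with batch subgradient descent on the regularized hinge loss, swapped in for whatever solver the project used (VLFeat's vl_svmtrain is one MATLAB option). The function names and the hyperparameters lam, epochs and lr are illustrative assumptions.

```python
import numpy as np

def knn_predict(train_feats, train_labels, test_feats, k=1):
    """Label each test vector by majority vote among its k nearest training vectors."""
    train_feats = np.asarray(train_feats, dtype=np.float64)
    test_feats = np.asarray(test_feats, dtype=np.float64)
    train_labels = np.asarray(train_labels)
    # Squared Euclidean distances: num_test x num_train
    d2 = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    preds = []
    for row in nearest:
        labels, counts = np.unique(train_labels[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])     # ties: first label in sorted order
    return np.array(preds)

def train_linear_svms(feats, labels, lam=1e-3, epochs=200, lr=0.1):
    """One-vs-all linear SVMs, each trained by batch subgradient descent on the hinge loss."""
    feats = np.asarray(feats, dtype=np.float64)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, d = feats.shape
    W, b = np.zeros((len(classes), d)), np.zeros(len(classes))
    for i, c in enumerate(classes):
        y = np.where(labels == c, 1.0, -1.0)         # this class vs. all others
        w, bias = np.zeros(d), 0.0
        for _ in range(epochs):
            active = y * (feats @ w + bias) < 1       # samples inside the margin
            grad_w = lam * w - (y[active][:, None] * feats[active]).sum(axis=0) / n
            grad_b = -y[active].sum() / n
            w -= lr * grad_w
            bias -= lr * grad_b
        W[i], b[i] = w, bias
    return classes, W, b

def svm_predict(classes, W, b, feats):
    """Assign each vector to the class whose SVM gives the highest score."""
    scores = np.asarray(feats, dtype=np.float64) @ W.T + b
    return classes[np.argmax(scores, axis=1)]
```

The distance matrix in knn_predict has one entry per (test image, training image) pair, which illustrates why kNN becomes slow with many training images, while the SVM only needs one dot product per class at prediction time.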

Results

Bag of Sift

kNN

Figure: kNN results for Bag of SIFT

SVM

Figure: SVM results for Bag of SIFT

RGB Confusion Matrix

GRAYSCALE Confusion Matrix

Figure: kNN and SVM results for Bag of SIFT with vl_phow features

RGB Confusion Matrix

GRAYSCALE Confusion Matrix

Spatial Pyramid

kNN

Figure: kNN results for Spatial Pyramid

RGB Confusion Matrix

GRAYSCALE Confusion Matrix

SVM

Figure: SVM results for Spatial Pyramid

RGB Confusion Matrix

GRAYSCALE Confusion Matrix

Fisher Vector

kNN

Figure: kNN results for Fisher Vector

RGB Confusion Matrix

GRAYSCALE Confusion Matrix

SVM

Figure: SVM results for Fisher Vector

RGB Confusion Matrix

GRAYSCALE Confusion Matrix

Conclusion

The smaller the step size, the slower the extraction and the more memory MATLAB used
Spatial Pyramid gave good results up to level 2 with the RGB color space
- Beyond level 2, results did not improve noticeably, while the required computation grew considerably
A feature step size of 5 seemed to work well with all methods
The Fisher Vector method worked better with a smaller vocabulary
The kNN classifier was much slower than the SVM
- Too much data: kNN compares every test image with every training image at prediction time
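The step-size observation can be made concrete with a rough count of dense descriptors, assuming a 256 x 256 image and a 16-pixel patch that must fit inside the image. The exact bounds used by vl_dsift or vl_phow differ slightly, so this is an approximation, and dense_grid_count is a hypothetical helper.

```python
def dense_grid_count(width, height, step, patch=16):
    """Approximate number of dense SIFT descriptors on a regular grid.

    Assumes a square patch of `patch` pixels that must fit inside the image.
    """
    nx = (width - patch) // step + 1
    ny = (height - patch) // step + 1
    return max(nx, 0) * max(ny, 0)

for step in (10, 5, 3):
    n = dense_grid_count(256, 256, step)
    # 128 bytes per descriptor when stored as uint8
    print(step, n, n * 128)
# 10 625 80000
# 5 2401 307328
# 3 6561 839808
```

Because the grid shrinks in both directions, halving the step roughly quadruples the number of descriptors, and with it the extraction time and memory.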