SIFT features
Scale-Invariant Feature Transform
Invariant to scale and rotation, and partially invariant to illumination and viewpoint changes
Features are based on the image’s gradients
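Clustering the SIFT descriptors produces a dictionary of visual words
with size 128xN
N is the size of the dictionary (number of visual words).
Each word is a 128-dimensional cluster centre of SIFT descriptors; each image is then described by a histogram of visual words
The histograms are used to predict the scene class (e.g. kitchen, store, etc.)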
SIFT features were used for the feature extraction of Bag of SIFT and Spatial Pyramid.
SIFT features in 2 images of the same scene
Different color spaces
Color spaces with more than one channel were handled differently from grayscale:
Concatenate the color channels into a 2D matrix
Use vl_phow
Dense SIFT features
Extract SIFTs from all channels separately
Not much improvement with this method
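The two ways of handling multi-channel images can be sketched as follows. This is a minimal Python illustration, not the MATLAB/VLFeat code used in the project. The side-by-side layout of the concatenated matrix and the function names `split_channels` and `concat_channels` are assumptions.

```python
def split_channels(image):
    """Return one 2D matrix (list of rows) per channel of an H x W x C image."""
    n_channels = len(image[0][0])
    return [[[pixel[c] for pixel in row] for row in image]
            for c in range(n_channels)]


def concat_channels(image):
    """Place the channel matrices side by side: H x W x C -> H x (W*C).

    Assumption: the slides do not say how the channels were laid out;
    horizontal concatenation is used here only for illustration.
    """
    channels = split_channels(image)
    return [sum((ch[r] for ch in channels), []) for r in range(len(image))]


# Tiny 2 x 2 RGB image used only for illustration.
image = [[(1, 2, 3), (4, 5, 6)],
         [(7, 8, 9), (10, 11, 12)]]

channels = split_channels(image)   # option 2: one matrix per channel
flat = concat_channels(image)      # option 1: a single 2D matrix
```

With the first option a single dense SIFT pass runs over `flat`; with the second, SIFT is extracted from each entry of `channels` and the descriptors are pooled.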
Spatial Pyramid
- Collection of orderless feature histograms
- Each level consists of a grid with histograms
- Histograms are built from the local SIFT descriptors in each grid cell (quadrant)
- A weight is applied to each level
Additional scene recognition method:
Fisher Encoding
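Introduced in Fisher Kernels on Visual Vocabularies for Image Categorization (Perronnin and Dance, 2007); detailed in Image Classification with the Fisher Vector: Theory and Practice (Sánchez et al., 2013)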
Builds upon the Bag of Visual Words method
Fisher Vocabulary:
- Extract SIFT descriptors from each image.
- Fit a Gaussian Mixture Model (GMM) to the obtained features.
- The GMM replaces the clustering step used for Bag of SIFT
- It returns the means, covariances and priors, which are used as the vocabulary
Fisher Vector:
- Extract SIFT descriptors
- Compute the Fisher vector of each image from its SIFT features and the already computed vocabulary
- Each vector represents an image
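The encoding step can be sketched as below. This is a minimal Python illustration, not the VLFeat implementation used in the project. It assumes a GMM with diagonal covariances whose means, variances and priors are already known, and it keeps only the gradients with respect to the means and variances, so the output has 2*D*N values. The function name `fisher_vector` and the toy numbers are hypothetical.

```python
import math


def fisher_vector(descriptors, means, variances, priors):
    """Fisher vector of a set of descriptors under a diagonal-covariance GMM."""
    T, K, D = len(descriptors), len(means), len(means[0])
    grad_mu = [[0.0] * D for _ in range(K)]
    grad_sigma = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        # Log of (prior * Gaussian density) for each component.
        log_p = []
        for k in range(K):
            s = math.log(priors[k])
            for d in range(D):
                s -= 0.5 * (math.log(2 * math.pi * variances[k][d])
                            + (x[d] - means[k][d]) ** 2 / variances[k][d])
            log_p.append(s)
        # Soft assignment (posterior) of x to every component.
        m = max(log_p)
        exp_p = [math.exp(v - m) for v in log_p]
        total = sum(exp_p)
        gamma = [v / total for v in exp_p]
        # Accumulate the first- and second-order statistics.
        for k in range(K):
            for d in range(D):
                u = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                grad_mu[k][d] += gamma[k] * u
                grad_sigma[k][d] += gamma[k] * (u * u - 1.0)
    fv = []
    for k in range(K):
        fv += [g / (T * math.sqrt(priors[k])) for g in grad_mu[k]]
    for k in range(K):
        fv += [g / (T * math.sqrt(2 * priors[k])) for g in grad_sigma[k]]
    return fv


# Toy vocabulary: 2 Gaussians over 2-dimensional descriptors.
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
priors = [0.5, 0.5]
descriptors = [[0.1, -0.2], [4.9, 5.3], [0.3, 0.2]]
fv = fisher_vector(descriptors, means, variances, priors)
```

Unlike a Bag of SIFT histogram, every descriptor contributes to every Gaussian in proportion to its posterior, which is why a small vocabulary can still give a rich representation.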
Comparison with BoS:
- It can be computed with much smaller vocabularies
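- Takes more storage per image
- (2D+1)N – 1 values, where D is the descriptor dimension (128 for SIFT) and N is the number of Gaussians, versus N values for BoS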
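To make the storage comparison concrete, the formula can be evaluated directly. In this small Python sketch, D = 128 comes from the SIFT descriptor size; the vocabulary sizes 16 and 200 are hypothetical values chosen only for illustration, not the ones used in the project.

```python
def fisher_size(descriptor_dim, n_gaussians):
    """Length of a Fisher vector: (2D + 1) * N - 1."""
    return (2 * descriptor_dim + 1) * n_gaussians - 1


def bos_size(n_words):
    """Length of a Bag of SIFT histogram: one bin per visual word."""
    return n_words


# Hypothetical vocabulary sizes, for illustration only.
fisher_values = fisher_size(128, 16)   # small GMM vocabulary
bos_values = bos_size(200)             # larger clustered vocabulary
```

Even with a vocabulary more than ten times smaller, the Fisher vector stores far more values per image than the Bag of SIFT histogram.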
Steps for SIFT extraction
Step 1: Obtain the vocabulary of visual words
get images
extract SIFT features from images
get descriptors from extracted features
cluster the descriptors
clustering groups similar features across the images and creates a visual word for each group
obtain dictionary with visual words.
Image illustrating the process of creating a vocabulary of visual words
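The clustering in Step 1 can be sketched as below. This is a minimal Python illustration that assumes plain k-means; the slides only say "cluster", so the algorithm, the initialisation from the first descriptors and the name `build_vocabulary` are assumptions. The 2-dimensional toy descriptors stand in for 128-dimensional SIFT descriptors.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_vocabulary(descriptors, n_words, n_iterations=10):
    """Cluster descriptors with k-means; the centres are the visual words."""
    # Assumption: the first n_words descriptors serve as initial centres.
    centres = [list(d) for d in descriptors[:n_words]]
    for _ in range(n_iterations):
        clusters = [[] for _ in centres]
        # Assign every descriptor to its closest centre.
        for d in descriptors:
            nearest = min(range(len(centres)),
                          key=lambda k: squared_distance(d, centres[k]))
            clusters[nearest].append(d)
        # Move every centre to the mean of its members.
        for k, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster is empty
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return centres


# Toy descriptors pooled from all training images.
descriptors = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
vocab = build_vocabulary(descriptors, n_words=2)
```

Each entry of `vocab` plays the role of one column of the 128xN dictionary.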
Step 2: Obtain the bag of features for an image
- extract SIFT features of the image
- get the descriptor for each point
- match the feature descriptors with the vocabulary of visual words (vocab.mat)
- build the histogram of the feature descriptors
it counts how often each visual word occurs in the image; each feature corresponds to one visual word in the dictionary
- the histogram is the input to the classifier that predicts the class of that image
visual word -> a set of numbers representing a feature
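Step 2 can be sketched as follows. This minimal Python illustration assumes the vocabulary is already available as a list of word vectors. The normalisation of the histogram and the name `bag_of_words_histogram` are assumptions, and the toy 2-dimensional vectors stand in for SIFT descriptors.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_words_histogram(descriptors, vocab):
    """Count how often each visual word is the closest one to a descriptor."""
    counts = [0] * len(vocab)
    for d in descriptors:
        nearest = min(range(len(vocab)),
                      key=lambda k: squared_distance(d, vocab[k]))
        counts[nearest] += 1
    total = sum(counts)
    # Assumption: normalise so images with different numbers of
    # features remain comparable.
    return [c / total for c in counts] if total else counts


vocab = [[0.0, 0.0], [10.0, 10.0]]                  # 2 visual words
descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 2.0], [1.0, 1.0]]
hist = bag_of_words_histogram(descriptors, vocab)
```

The resulting `hist` is the feature vector that is handed to the classifier.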
Steps for Spatial Pyramid
1. get images
2. extract SIFT features from images
3. get descriptors from extracted features
4. find, for each extracted feature, the closest visual word (minimum distance) in the already computed vocabulary
D = vl_alldist2(vocab', features);  % pairwise distances between words and features
[~, ind] = min(D);                  % index of the closest word for each feature
5. construct a histogram with those values.
It will be the histogram with SIFT features for Level 0 of the pyramid.
6. Create a matrix with the total levels of the pyramid
6.1 Each level will have a number of quadrants
6.2 Each quadrant will be represented with a histogram of its SIFT features.
6.3 Then each level will have those histograms concatenated into a row for the pyramid. It will result in a bigger histogram
7. Apply the appropriate weight to each level
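The pyramid construction above can be sketched as below. This is a minimal Python illustration, not the MATLAB code used in the project. It assumes each feature is given as normalised coordinates plus the index of its closest visual word, that level l uses a 2^l x 2^l grid, and that the level weights follow Lazebnik et al. (2006); the slides do not state which weights were used, and the name `spatial_pyramid` is hypothetical.

```python
def spatial_pyramid(features, n_words, max_level):
    """Concatenate weighted per-cell histograms for levels 0..max_level.

    features: list of (x, y, word) with x, y in [0, 1) and word < n_words.
    """
    pyramid = []
    for level in range(max_level + 1):
        cells = 2 ** level  # grid is cells x cells at this level
        # Assumed weights: coarse levels count less than fine levels.
        if level == 0:
            weight = 1.0 / 2 ** max_level
        else:
            weight = 1.0 / 2 ** (max_level - level + 1)
        hists = [[0.0] * n_words for _ in range(cells * cells)]
        for x, y, word in features:
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            hists[row * cells + col][word] += weight
        # Append the histograms of this level to the long row vector.
        for h in hists:
            pyramid.extend(h)
    return pyramid


# Two features: word 0 near the top-left, word 1 near the bottom-right.
features = [(0.1, 0.1, 0), (0.9, 0.9, 1)]
pyramid = spatial_pyramid(features, n_words=2, max_level=1)
```

The length of the result is `n_words` times the total number of cells over all levels, which is why higher levels quickly become expensive.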
1. kNN
2. SVM
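As an illustration of the first classifier, a k-nearest-neighbour prediction over image histograms can be sketched in a few lines of Python. The function name `knn_predict`, the squared Euclidean distance and the toy histograms are assumptions; the class labels reuse the scene names mentioned earlier.

```python
from collections import Counter


def knn_predict(query, train_features, train_labels, k=1):
    """Predict the label of `query` by majority vote among its k nearest
    training histograms (squared Euclidean distance)."""
    order = sorted(
        range(len(train_features)),
        key=lambda i: sum((a - b) ** 2
                          for a, b in zip(query, train_features[i])))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy training histograms over a 2-word vocabulary.
train_features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
train_labels = ["kitchen", "kitchen", "store"]

prediction = knn_predict([0.7, 0.3], train_features, train_labels, k=3)
```

Every prediction compares the query against all training histograms, so the cost grows with the size of the training set.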
- useful lecture:
Bag of SIFT
Spatial Pyramid
Fisher Vector
- The smaller the step size, the slower MATLAB ran and the more memory it used
- Spatial Pyramid gave good results up to level 2 with the RGB color space
- Beyond level 2 the results were not really better, and much more computational power was needed
- A feature step size of 5 seemed to work fine with all methods
- Fisher Vector method worked better with a smaller vocabulary
- kNN classifier was really slow in comparison with SVM
- Too much data