Sift features
Scale - Invariant feature transform
-
Invariant to transformation
-
Features based on image’s gradients
-
It produces a dictionary of visual words
-
with size 128xN
N is the size of the dictionary.
-
Each word is a histogram of sift descriptors
(eg. kitchen, store, etc.)
-
They were used for the feature extraction of Bag of Sift and Spatial Pyramid.
SIFT features between 2 images with the same scene
Different color spaces
The color spaces with more than 1 channel had a different approach than the grayscale color:
-
Concatenate the color channels into a 2D matrix
-
Use vl_phow
-
Dense SIFT features
-
Extract SIFTs from all channels separately
Not much improvement with this method
-
Spatial Pyramid
- Collection of orderless feature histograms
- Each level consists of a grid with histograms
- Histograms are created by the local SIFT descriptors on each quadrant
- For each level a weight is applied
Additional scene recognition method:
Fisher Encoding
Introduced in Image Classification with the Fisher Vector: Theory and Practice (Perronnin et al. 2007)
Build upon the Bag Of Visual Words method
Fisher Vocabulary:
- Extract SIFT descriptors from each image.
- Apply Gaussian Mixture Model (GMM) to the obtained features.
- Instead of clustering
- returns the means, covariances, priors that are used as a vocabulary
Fisher Vector:
- Extract SIFT descriptors
- Compute the fisher vector of each image by using their SIFT features and the already computed vocabulary
- Each vector represents an image
Comparison with BoS:
Advantages:
- It can be computed with much smaller vocabularies
Disadvantages:
- Takes more storage
- (2D+1)N –1
Steps:
Steps for SIFT extraction
Step 1: Obtain the set of bag of features
build_vocabulary.m
-
get images
-
extract sift features from images
-
get descriptors from extracted features
-
cluster the descriptors
will find similar features in each image and create visual words for each of it
-
obtain dictionary with visual words.
Image illustrating the process of creating a vocabulary of visual words
Step 2: Obtain the bag of features for an image
get_bag_of_sifts.m
- extract sift features of the image
- get the descriptor for each point
- match the feature descriptors with the vocabulary of visual words (vocab.mat)
- build the histogram with the features descriptors
it will be created with the frequency of each feature in an image each feature will correspond to a visual word in the dictionary
- the visual words with the highest frequency will is the class of that image (prediction)
visual words -> a set of numbers representing a feature
Steps for Spatial Pyramid
spatial_pyramid.m
-
get images
-
extract sift features from images
-
get descriptors from extracted features
-
find the minimum distance of the the extracted features and the ons from the already computed vocabulary
D = vl_alldist2(vocab',features)
[~,ind] = min(D);
. -
construct a histogram with those values.
It will be the histogram with SIFT features for Level 0 of the pyramid.
-
Create a matrix with the total levels of the pyramid
6.1 Each level will have a number of quadrants
6.2 Each quadrant will be represented with a histogram of its SIFT features.
6.3 Then each level will have those histograms concantated into a row, for the pyramid.In will result into a bigger histogram
-
Apply the appropriate weight to each level
Classification
1. kNN
2. SVM
- useful lecture: https://youtu.be/iGZpJZhqEME
Results
Bag of Sift
kNN
SVM
Spatial Pyramid
kNN
SVM
Fisher Vector
kNN
SVM
Conclusion
- The less the step size the slower and more memory MATLAB was using
- Spatial Pyramid gave good results till level 2 with RGB color space
- After level 2, not really better results, much more computational power
- Feature Step Size of 5 seemed to worked fine with all methods
- Fisher Vector method worked better with smaller vocabulary
- kNN classifier was really slow in comparison with SVM
- Too much data