0

Individual choice of the number of features in the recognition of graphic images

There is a whole class of algorithms for recognizing graphic images based on the comparison of features (“special” points) of an unknown object with etalon images. One of the most well-known mechanisms for distinguishing features is presented in [2] and implemented in the popular OpenCV software library. Other known algorithms are described in [1, 3, 4, 5]. The advantage of the recognition algorithm on the basis of features is the ability to find objects on images, where they are rotated to arbitrary angles and can partially overlap. The probability of correct recognition in this algorithm, first of all, depends on the number and content of the reference images, as well as on the number of features being compared.

When using algorithms based on features, the speed problem is significantly affected by the increase in the number of etalon images, when the features of an unknown image must be compared with all the features of all etalon images. Excessive increase in the number of features leads to a decrease in the performance of the algorithm, so the idea arises to choose the number of features individually for each reference image, which will reduce the recognition time while maintaining quality.

The essence of the described algorithm on the basis of features is as follows. On the images that are compared with each other there are features (function cvExtractSURF in OpenCV), after which they are compared with each other in order to find matching pairs. The more features, the longer the comparison time. But if there are not enough features, you will not be able to recognize objects. It is shown in [2] that the most accurate and effective in performance are features based on Hessian. For each image point p = (x, y), the Hessian H (p, σ) is calculated, where σ is the scale, as follows:

where Lxx (p, σ) are Gaussian partial derivatives of the second order for the image at the point p, similarly for Lxy (p, σ) and Lyy (p, σ).

To limit the number of features, a threshold is selected, and each point of the image whose Hessian is less than the threshold level is discarded.
Let there are 10 objects (Figure 1), which can be recognized using features (there is a sufficient number and quality of features). Image sizes are 384 by 288 pixels in grayscale. It is necessary, one reference image for each image, to recognize the remaining images. For testing we used a personal computer with an Intel Core 2 Duo processor 2.2 GHz.

Having obtained 10 feature groups for 10 different images, these features were compared with features obtained from 100 input images. Comparisons of the recognition results were carried out for the setting of a different Hessian threshold value. In tables 1-2, the results of recognition of 10 images (10 for each object) are shown line by line, obtained for the previously defined features of the etalon images (columns). The test images were compared with the etalons (columns). Table 1 is constructed for the Hessian threshold value of 4000, the recognition time of all 100 images was 10.9 seconds with the recognition accuracy of 0.66.

Table 1. Recognition results for the threshold Hessian of 4000

1 2 3 4 5 6 7 8 9 10
1 10 0 0 0 0 0 0 0 0 0
2 0 3 3 1 0 0 0 0 3 0
3 0 0 10 0 0 0 0 0 0 0
4 0 0 0 10 0 0 0 0 0 0
5 3 0 0 0 1 0 0 0 6 0
6 0 0 0 0 0 0 0 0 9 1
7 0 0 0 0 0 0 10 0 0 0
8 0 0 0 0 0 0 0 2 8 0
9 0 0 0 0 0 0 0 0 10 0
10 0 0 0 0 0 0 0 0 0 10

With a Hessian threshold of 800 (Table 2), the recognition time of 100 images was 18.6, and the recognition accuracy was 0.96.

Table 2. Recognition results for the Hessian threshold of 800

1 2 3 4 5 6 7 8 9 10
1 10 0 0 0 0 0 0 0 0 0
2 0 10 0 0 0 0 0 0 0 0
3 0 0 10 0 0 0 0 0 0 0
4 0 0 0 10 0 0 0 0 0 0
5 0 0 0 0 10 0 0 0 0 0
6 0 0 0 0 0 6 0 0 4 0
7 0 0 0 0 0 0 10 0 0 0
8 0 0 0 0 0 0 0 10 0 0
9 0 0 0 0 0 0 0 0 10 0
10 0 0 0 0 0 0 0 0 0 10

The criterion for the effectiveness of the real-time image recognition system is not only the reliability of pattern recognition, but also performance. Therefore, it is desirable that the reliability should tend to 100%, and the recognition time to a minimum. As shown earlier, choosing for each image your operating mode of the system, you can achieve the optimal values ​​of these parameters. However, in order to be able to choose the operating mode of the algorithm, it is necessary to know approximately the complexity of the recognized image. If the flow of images arriving on the system is somehow related to each other, then the complexity value of the previous image can be used. But in this task the images are not connected with each other. In spite of this, in the considered algorithm, in order to reduce the recognition time and increase the reliability of recognition, it is possible to modify the approach to controlling the recognition process.

If a threshold level is known for each etalon image, in which images are recognized with the required level of confidence, then upon receipt of an unknown image, it can be compared with the features obtained for different Hessian threshold levels, which will reduce the total recognition time. Table 3 shows the number of errors for each image (in rows), confidence levels and recognition time for a different threshold level.

Table 3. The number of errors, the reliability (D), and the recognition time (T) for different threshold levels of the Hessian

200 500 700 800 900 1400 3000 4000
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 2 0 7
3 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 5 9
6 0 1 2 4 7 10 10 10
7 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 1 8
9 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0
D 1 0.99 0.98 0.96 0.93 0.88 0.84 0.66
T, сек. 34.7 23.1 19.8 18.6 17.5 14.3 11.3 10.9

When an unknown image enters the input of the algorithm, it is compared with each etalon, the characteristics of which are obtained for different threshold levels. Using the optimal threshold level noted in Table 3, the approach was tested on the same 100 images. The reliability was 100% with a recognition time of 15.9 seconds. This is 17% faster than at the threshold level of 800, 25% faster than at the level of 700, and 45% faster than at the level of 500. At the same time, the reliability of the recognition is higher.
The disadvantage of the algorithm optimization described above is the need to experimentally calculate the required mode for each etalon image. When the number of etalon images is increased by an order of magnitude, it becomes necessary to automatically select the Hessian threshold level. The criterion for choosing a threshold level is to use the total number of features.

Suppose that there are N reference images ω1, ω2, …, ωN, for each of which one can choose one of the M possible threshold levels of the Hessian Ht1, Ht2, …, HtM, and

Ht1 < Ht2 < … < HtM

You can choose the minimum number of features (C), at which to carry out pattern recognition. Then, when classifying a classifier, it is necessary to sequentially sort out the Hessian threshold levels, beginning with HtM, until the number of features becomes large or equal to C.
During testing, 100 reference images were used, and 1000 test images (10 for each etalon image). The following Hessian threshold levels were used in training: 200, 500, 800, 1400, 4000. In tests, the Hessian level was taken to be 200 (using other threshold values, for example 500, increases performance, but reduces the level of confidence in recognition). Variable C was chosen equal to 30.

Figure 2 shows the dependencies of the average authenticity of pattern recognition and the total recognition time of 100 images at a fixed threshold value, as well as using automatic threshold selection according to the method described above. When using the automatic approach: a) the significance of the reliability is much greater than in the other modes; b) the performance of the recognition system is increased in comparison with all modes except 4000.

Important when using the individual threshold level of Hessian is the question of choosing the optimal number of features. Thus, in the previous example, an approximate value of 30 was chosen, but it is desirable to select a number of features at which the maximum efficiency of the recognition system will be achieved, with maximum confidence with a minimum of time for recognition.
To determine the dependencies between the reliability, the time of recognition and the number of features in the formation of reference images, a computational experiment was carried out. In order to determine the same number of features for each image, its Hessian threshold level was selected in the range from 200 to 4000. The same images as in the previous case were used for the experiment. Figure 3 shows the obtained dependencies of the recognition of images on the number of features for each etalon image, the total time of recognition of thousands of test images from the number of features.

Figure 3 shows that the optimal recognition mode is achieved with the number of features about 35. It is better to choose a little more features, because with a smaller amount, a sharp degradation of the recognition reliability begins.

Usually the optimal number of features is in the range (30, 50), however, the reference image and possible background images (larger in size than the reference image) play an important role. Therefore, for the most effective recognition, it is necessary to carry out the training procedure, which consists of several stages.

Stage one – selection of background images and layouts of the etalon. Formation of test samples is carried out by combining the background image and the etalon image. Let there be a set of background images I = {I1, I2, …, IN}, where N is the number of background images, then M test images are formed from each image. Figure 4 shows an example of combining background and reference images. When creating test images, the location and the reference on the background are fixed.


Depending on the requirements for the recognition algorithm, you can change various parameters: position, slope, size, tilt not in the image plane.

Stage two – the recognition of the etalon image on background images. After the previous stage, the total number of test images is N * M. On each of these images, you need to search for a etalon image. The recognized image will be considered only if the returned location will cover at least 50% of the location of the sought object.

Step three is the determination of the optimal number of features. The criterion of optimality is the maximum value of reliability, since the recognition time increases linearly. Therefore, there are no difficulties in determining the maximum.

To test this approach, 2 etalon images

The dependencies of the recognition accuracy on the number of features used in the standard for the first etalon by SURF (1), the FREAK method (2)

The dependencies of the recognition accuracy on the number of features used in the standard for the first etalon by SURF (1), for the second standard by FREAK (2)

Referenses:

  1. Alahi A., Ortiz R., Vandergheynst P. FREAK: Fast Retina Keypoint. In IEEE Conference on Computer Vision and Pattern Recognition, 2012.
  2. Bay H., Tuytelaars T., Van Gool L. Surf: Speeded up robust features. In 9th European Conference on Computer Vision, Graz Austria, May 2006.
  3. Leutenegger S., Chli M., Siegwart R. BRISK: Binary Robust Invariant Scalable Keypoints // ICCV 2011: pp. 2548-2555.
  4. Lowe D. G. Distinctive Image Features from Scale-Invariant Keypoints // International Journal of Computer Vision, 60, 2, pp. 91-110, 2004.
  5. Rublee E., Rabaud V., Konolige K., Bradski G. R. ORB: An efficient alternative to SIFT or SURF. ICCV 2011, pp. 2564-2571.

Aleksandr Kruchinin

Leave a Reply

Your email address will not be published. Required fields are marked *