Face Recognition and Skin Extraction under a Dynamic Video
Juniper Publishers - Journal of Robotics
Abstract
As society develops, the demand for more complex information increases. The naked eye becomes insufficient, because the desired information may present itself in a manner that requires more advanced technology to capture it. We therefore need to understand how to capture such information and confirm what it is. In this paper, we first demonstrate how to capture the video, process and store each frame as an image, and then compare the current frame with its previous frame, returning a histogram-matching similarity value (the Bhattacharyya coefficient) that also indicates whether a fast-moving object is present. When an object appears in a dynamic video, a frame-differential dichotomy processes the first image and discards the background. The object appears again in a second image processed in the same way, while a background image is taken simultaneously. These steps allow the corresponding regions to be analysed so that the object is identified by combining the two separate images. Afterwards, face recognition and skin-color extraction are performed and described in detail. Using the Face++ interface for the face and skin recognition experiments, our tests show that with good picture resolution the recognition rate is about 70%; skin-color extraction is a form of color-based extraction, so the effect is better against a well-lit background.
Keywords: Similarity; Frame differential dichotomy; Face recognition; Skin color extraction
Abbreviations: FLD: Fisher Linear Discriminant; DLA: Dynamic Link Architecture; PCA: Principal Component Analysis
Introduction
Human vision is increasingly limited relative to the demands of a developing world. When an object flashes by too fast, or is too small, the human eye is sometimes unable to catch it because of its structure and characteristics [1]. Despite these limitations, people still want to acquire and understand such information, so we need to extend our eyes by observing things through a computer.
Scientists and researchers use this kind of study to detect, determine and track dynamic objects in video with a static background. Object recognition and tracking in video is not a new topic in computer vision, and in recent years real-time identification and tracking of objects has attracted more and more attention. In the context of smart cities, people hope for a better urban environment and lower crime rates, so they are no longer satisfied with merely recording data or extracting raw information; they want to mine the regularities hidden in video. For example, from a person's posture we may judge whether he is behaving normally or loitering while waiting for an opportunity to commit a crime, which calls for face recognition on live video. Face-based security control requires a capable video device, and it involves face acquisition, data extraction and representation, and so on. With these methods it becomes possible to recognize facial expressions more clearly [2-4]. This technology therefore has broad room for development and a large market, which motivated the research and ideas presented here.
Object recognition is a very easy task for human vision in real applications, but it is not easy for a computer system [5,6]. Since the scene in which an object must be recognized is complex and can change at any time, object identification and tracking by a computer faces some difficulties; most importantly, the recognition accuracy can drop considerably. For this reason, this study mainly focuses on understanding and exploring a dynamic video under a static background.
The main process of this study is first to analyse the video and its background information and compute the images it contains, then to take the image of each frame and compare the degree of similarity among the images to determine whether an identifiable object has appeared. After this information is returned, the detected moment is marked on the image frame for easy observation. If the object is identified and recognized as a face, different kinds of feedback are produced, such as a color mark or the number of individuals appearing in the video.
For the process described above, this paper analyses and uses a number of different algorithms and methodologies to deal with the various stages, in order to understand each algorithm, compare their speed and efficiency, and then improve them after drawing conclusions. The operating platform is a PC using OpenCV and MATLAB.
Methods and Experiments
The present trend of object recognition
Identifying an object in a video belongs to the field of image recognition, which has gone through three development stages: character recognition, digital image processing and recognition, and object recognition. Text recognition has been studied since the 1950s and grew stronger as digital image storage capacity increased; it offers great convenience and huge advantages, but is easily distorted, so people began to study digital image processing and recognition methodologies to address this. Object recognition proper refers to the perception and understanding of objects and environments in the three-dimensional world, an advanced topic in computer vision [7]. Within image recognition, three kinds of approaches have been identified: statistical pattern recognition, structural pattern recognition, and fuzzy pattern recognition [8].
In the past ten years, object recognition in video has been a core research subject for many researchers, and there have indeed been great achievements, such as the domestic Face++ and HW Cloud services for face recognition; some domestic companies also study image recognition of text, bank card numbers and other identification strings, and a lot of relevant information is available on the network.
This means that image recognition technology at home and abroad is growing fast. For object recognition in video, however, the research is not very mature, because most studies collect and analyse specific, not very complex scenes, and non-specific or random scenes still present a certain degree of difficulty; that is, adaptive performance is poor. If the target image contains strong noise or large defects, the desired results usually cannot be obtained [9-11].
Video processing & image similarity
When identifying objects in video, we first need to process the video: it must be converted into per-frame picture output that can be handled by image processing and recognition. Once the video is converted into individual frames, a problem arises because the object does not necessarily appear throughout the video; many frames without the object would still be included in the detection queue. For example, if a 20 s video contains the object for only 1 s, many unnecessary image frames would be detected, leading to long run times and inefficiency. We therefore need to filter out these unwanted frames. To solve this problem, in this research we compare the similarity between image frames to filter out the useless ones.
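As a minimal sketch of this filtering step (the file name test.avi and the 0.95 similarity threshold are illustrative assumptions, not values from the paper), consecutive frames can be read with MATLAB's VideoReader and near-identical frames skipped:
% Sketch: skip frames nearly identical to their predecessor.
v = VideoReader('test.avi');
prevHist = [];
kept = {};
while hasFrame(v)
    frame = readFrame(v);
    h = imhist(rgb2gray(frame));       % gray-scale histogram, 256 bins
    h = h / sum(h);                    % normalize to a distribution
    if isempty(prevHist) || sum(sqrt(h .* prevHist)) < 0.95
        kept{end+1} = frame;           % Bhattacharyya coefficient below the
    end                                % threshold: the content has changed
    prevHist = h;
end
Only the retained frames then enter the (slower) object detection stage.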
Image similarity algorithm introduction
Histogram matching
Histogram matching first requires a gray-scale image, which is then transformed toward the desired histogram. It has the following three steps: 1) equalize the histogram of the original image in gray levels; 2) form the desired histogram; 3) reverse the first-step conversion [12-14].
When histogram matching is used to compare the degree of similarity, one usually calculates the Bhattacharyya distance and the Bhattacharyya coefficient, where the Bhattacharyya coefficient measures the closeness of two discrete probability distributions; it is an approximate measure of the amount of overlap between two statistical samples [15,16]. It is used here because its effect on image similarity is the best. The formulas are given in Equations (1) and (2). Bhattacharyya distance:
D_B(p, q) = -\ln\big( BC(p, q) \big)    (1)
where p and q refer to the two discrete probability distributions over the same domain and BC refers to the Bhattacharyya coefficient, defined as follows:
BC(a, b) = \sum_{i=1}^{n} \sqrt{a_i b_i}    (2)
where a and b are the two samples, n is the number of sub-blocks (bins), and a_i and b_i are the fractions of a and b falling in bin i.
Following the above description, we compare image similarity with this histogram-based matching approach. The image must first be converted to gray-scale; here we use the MATLAB function I = rgb2gray(M), where M is the input image and I is the gray-scale image corresponding to M. The statement [Count, x] = imhist(I) then reads the histogram information of the gray-scale image, where Count is the vector of histogram data and x is the corresponding color (bin) vector; these are run through the distance and coefficient formulas above. An alternative to the MATLAB function for gray-scale conversion is as follows: the gray value is a weighted average of the R, G, B components, using the standard formula gray = 0.11B + 0.59G + 0.3R, where the weights come from the perspective of human physiology. Figure 1(a)-(b) shows the original images, and the results are shown in Figure 2.
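The following minimal MATLAB sketch puts these pieces together (the file names im1.jpg and im2.jpg are illustrative, not from the paper):
% Sketch: histogram similarity of two images via Equations (1)-(2).
M1 = imread('im1.jpg');  M2 = imread('im2.jpg');
I1 = rgb2gray(M1);       I2 = rgb2gray(M2);
[c1, ~] = imhist(I1);    [c2, ~] = imhist(I2);
p = c1 / sum(c1);        q = c2 / sum(c2);   % normalize to distributions
hit = sum(sqrt(p .* q));                     % Equation (2): BC(p, q)
dB  = -log(hit);                             % Equation (1): distance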
As these results show, 'hit' is the Bhattacharyya coefficient computed from the respective histogram data, and it serves as the similarity of the two images. The method has an obvious weak point, however: a picture that is white on top and black below, compared with a picture that is black on top and white below, yields a similarity of 100% (hit = 1.00), an apparent error, since the two histograms are identical even though the images differ.
SIFT transform algorithm
SIFT, the scale-invariant feature transform, was developed by [9] and further advanced in 2004; its applications include object recognition, robot map perception and navigation, image stitching, 3D modeling, gesture recognition, video tracking and motion matching. The algorithm has been patented and is owned by the University of British Columbia. The SIFT algorithm consists of four steps:
a) detect extreme values in scale space;
b) use the gradient-direction distribution of the neighborhood pixels around each key point to assign each key point a specified direction, so that the operator possesses rotational invariance;
c) generate the SIFT feature vector and rotate the coordinate axes to the feature point's direction to ensure rotational invariance;
d) perform the feature matching [17] (see the sketch after this list).
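As an illustration, recent MATLAB releases wrap these steps in the Computer Vision Toolbox; the sketch below assumes detectSIFTFeatures is available (R2021b or later) and uses illustrative file names:
% Sketch: SIFT keypoint matching between two gray-scale images.
I1 = rgb2gray(imread('im1.jpg'));
I2 = rgb2gray(imread('im2.jpg'));
pts1 = detectSIFTFeatures(I1);              % scale-space extrema
pts2 = detectSIFTFeatures(I2);
[f1, vpts1] = extractFeatures(I1, pts1);    % oriented SIFT descriptors
[f2, vpts2] = extractFeatures(I2, pts2);
idx = matchFeatures(f1, f2);                % step d): feature matching
showMatchedFeatures(I1, I2, vpts1(idx(:,1)), vpts2(idx(:,2)));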
Perceptual hash algorithm
This is a method that generates a 'fingerprint' string for each image after processing, and then compares the fingerprints: the closer they are, the more similar the images. Usually the first step of the algorithm shrinks the image to reduce differences caused by different image scales; it then simplifies the color image to gray, computes the average gray value, and compares each gray pixel against it: a pixel greater than or equal to the average is denoted by 1, otherwise by 0. Assembling these results gives the so-called image 'fingerprint'; the same rule is then applied to the other images. The advantage of this method is that however an image is processed, whether its size or color is changed, its 'fingerprint' changes little.
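A minimal sketch of such an average-hash fingerprint (the 8 × 8 reduction size and the file name are illustrative choices, not values from the paper):
% Sketch: average-hash "fingerprint" of one image.
I = rgb2gray(imread('im1.jpg'));
S = imresize(I, [8 8]);            % shrink to suppress scale differences
m = mean(S(:));                    % average gray value
fp = S(:)' >= m;                   % 64-bit fingerprint (logical vector)
% Similarity of two fingerprints = fraction of matching bits:
% sim = mean(fp1 == fp2);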
Of the above three algorithms, the latter two offer more powerful computation and higher accuracy on highly complex images than the first, but our study does not depend on very precise image similarity; we only want to know whether an object is present in the video. This study therefore implements the simple histogram matching algorithm.
Video background extraction
There are many methodologies for video background extraction [18], for example the time average, the multi-frame average, the codebook method [19], and the Gaussian mixture background model [20]. Here we use a multi-frame averaging method, which accumulates images to extract the video background. Assuming all the captured frames are designated picture(N), where N refers to the N-th frame of the video, the specific implementation is as follows:
Backg = 0;                              % initialize the accumulator
for N = 1:NumberofFrames
    Backg = Backg + picture(N);         % accumulate each frame
end
Backg = Backg / NumberofFrames;         % average gives the background
But when the background must be updated in real time, we proceed as follows:
for N = 1:mov.NumberofFrames
    if N <= NumberofFrames
        Backg = Backg + picture(N);     % window not yet full
    else
        % slide the window: add the new frame, drop the oldest
        Backg = Backg + picture(N) - picture(N - NumberofFrames);
    end
    % at each step the current background is Backg / NumberofFrames
end
Backg = Backg / NumberofFrames;
The video background extraction can thus be updated in real time in the manner described above.
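As a concrete sketch of how the frames picture(N) might be obtained and averaged (a minimal assumption using MATLAB's VideoReader; the file name scene.avi is illustrative):
% Sketch: average all frames of a video into a static background.
v = VideoReader('scene.avi');
frames = read(v);                            % H x W x 3 x N frame array
NumberofFrames = size(frames, 4);
Backg = zeros(v.Height, v.Width);
for N = 1:NumberofFrames
    Backg = Backg + double(rgb2gray(frames(:, :, :, N)));
end
Backg = uint8(Backg / NumberofFrames);       % averaged background
imshow(Backg);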
Mark motion objects
Object marking is carried out by taking the difference between two images to confirm the object; it can be performed as follows:
C = picture(N) - Backg;
This is illustrated by the two images shown in Figures 3 & 4, with the result shown in Figure 5. After image subtraction the image is binarized using an iterative threshold approximation: first find the maximum and minimum gray values, recorded as Rmax and Rmin, and set the initial threshold T = (Rmax + Rmin)/2; divide the pixels into two groups R1 and R2 by comparing against this threshold, compute the two mean gray values μ1 and μ2, and obtain the new threshold T = (μ1 + μ2)/2, repeating until T converges. Furthermore, let f(x, y) be the input gray-scale image and g(x, y) the binary output image, defined as follows:
g(x, y) = \begin{cases} 1, & f(x, y) \ge T \\ 0, & f(x, y) < T \end{cases}
Figure 6 shows the display after the manner described above; the information inside the binarized result is then extracted and fed back to picture(N).
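A minimal sketch of this iterative thresholding (our reading of the steps above; the convergence tolerance of 0.5 is an illustrative choice):
% Sketch: iterative threshold on the difference image, then binarize.
C = abs(double(picture(N)) - double(Backg));   % difference image
T = (max(C(:)) + min(C(:))) / 2;               % initial threshold
while true
    mu1 = mean(C(C >= T));                     % mean of group R1
    mu2 = mean(C(C <  T));                     % mean of group R2
    Tnew = (mu1 + mu2) / 2;
    if abs(Tnew - T) < 0.5, break; end         % illustrative tolerance
    T = Tnew;
end
g = C >= T;                                    % binary output image g(x, y)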
Mark the recognized objects
By the manner described above we have successfully found the objects in the video and labeled them; now we want to identify them. Because each object has its own characteristics, and even the same kind of object can take different forms, we mainly focus the recognized-object marking on face recognition.
Face Recognition
Face recognition has received a lot of attention from researchers in recent studies. Recognition from an image of a person's face is a very challenging task: the face size, orientation and posture in an image vary, and special cases such as over-bright lighting and object occlusion also affect recognition efficiency. Here we introduce the definition of face recognition: to determine the location, size and posture of all faces in the input image, if present. Face recognition has been a key technology of face information processing in recent years, and it has become a research subject within the fields of pattern recognition and computer vision [14].
Principal components analysis
Principal Component Analysis (PCA) provides the optimal orthogonal transformation in image compression; [15,16] proposed the first PCA application to face recognition based on the eigenface concept, using the principal component vectors to reconstruct the human face. The principle expands the image by the optimal orthogonal transformation in a recursive manner to achieve recognition. In this model, it is assumed there are X categories in the face gallery, and each category has Y training face images. Each image is treated as a sample P_ij (the j-th image of the i-th category), where 1 ≤ i ≤ X, 1 ≤ j ≤ Y, and each image is N × N; the image is first vectorized into an N² × 1 vector, and the average training vector over the XY face images is as follows:
\mu = \frac{1}{XY} \sum_{i=1}^{X} \sum_{j=1}^{Y} P_{ij}
Using e_ij = P_ij - μ to obtain the mean-difference vectors, we then form an N² × XY matrix as follows:
A = [e_{11}, e_{12}, \ldots, e_{XY}]
We extract the eigenvalues and orthonormal eigenvectors of the total scatter matrix B = AA^T through this matrix. Because B is very large (N² × N²), we introduce the smaller matrix R = A^T A and find its eigenvalues λ_i and eigenvectors v_i, from which the feature vectors are obtained:
u_i = \frac{1}{\sqrt{\lambda_i}} A v_i
Finally, sort the λ_i and select the corresponding eigenvector subspace according to the requirement, thus completing the dimensionality reduction and feature extraction [16]. PCA uses the mathematics described above for face recognition. However, this approach has some shortcomings: variations such as lighting and size will degrade its recognition rate.
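A minimal eigenfaces sketch following the equations above (assuming a data matrix P whose columns are the vectorized training faces; the subspace dimension k = 20 is an illustrative choice):
% P: N^2 x (X*Y) matrix, one vectorized training face per column.
mu = mean(P, 2);                    % average training vector
A  = P - mu;                        % mean-difference vectors e_ij
R  = A' * A;                        % small XY x XY surrogate of A*A'
[V, D] = eig(R);                    % eigenvectors v_i, eigenvalues
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);
k = 20;                             % illustrative subspace dimension
U = A * V(:, 1:k) ./ sqrt(lambda(1:k))';   % u_i = A v_i / sqrt(lambda_i)
W = U' * A;                         % projected training features
A probe face is recognized by projecting it the same way and finding the nearest column of W.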
Neural network face recognition
Artificial neural networks are a crossover study of brain science, neuropsychology and information science; the algorithms are mathematical models that simulate the human brain system and its applications, forming a non-linear, adaptive information processing system in which a large number of processing units support each other [12,13]. This method can work on low-resolution face images, partial autocorrelation functions, and partial second-order moment matrices. The main advantages of this method are that it can:
a) simulate a person's thinking in images;
b) offer massively parallel and collaborative processing power;
c) provide strong self-learning ability and adaptability;
d) tolerate faults well;
e) perform non-linear mapping.
Elastic graph matching face recognition
The elastic graph matching method is based on the dynamic link architecture (DLA), which represents the face as a sparse lattice graph: the nodes are labeled with feature vectors, called jets, obtained from Gabor wavelet responses at the corresponding image positions, and the graph edges are tagged with distance vectors. Its matching process flowchart is shown in Figure 7. Wavelet analysis is characterized by time-frequency analysis: the responses of a point in space at different frequencies in its surrounding area constitute the point's feature string, with the high-frequency portion corresponding to small details within the local scope and the low-frequency portion to a wide range around the point. The wavelet-transform elastic graph matching algorithm thus takes into account the local facial detail while retaining its spatial distribution; therefore, the Gabor function is often used as the wavelet basis function [9].
The Gabor transform is modeled on the human visual system. By simulating it, the retinal image can be decomposed into a set of filtered images, each reflecting intensity changes in a given frequency and direction within a local area. Texture features can be obtained with a set of multi-channel Gabor filters, which is in effect a designed Gabor transform; here the Fisher linear discriminant (FLD) refers to an improved eigenface approach. Although this method can tolerate a certain degree of change in posture, facial expression and lighting, its high time and space complexity makes it difficult to meet the requirements of large-scale real-time face recognition.
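A minimal sketch of such a multi-channel Gabor filter bank using MATLAB's Image Processing Toolbox (the wavelengths, orientations and file name below are illustrative choices, not the authors' parameters):
% Sketch: multi-channel Gabor texture features for one face image.
I = rgb2gray(imread('face.jpg'));
g = gabor([4 8 16], [0 45 90 135]);       % 3 wavelengths x 4 orientations
mag = imgaborfilt(I, g);                  % magnitude response per channel
features = reshape(mag, [], numel(g));    % one texture feature per column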
Face recognition method summary
At present there are many different algorithms for face recognition beyond those described above, for example the Hidden Markov Model, the support vector machine (SVM) [6], the line-segment Hausdorff distance, and face recognition methods based on skin color. Each method has its own advantages and disadvantages; in the present study we tend to mix a variety of methods.
Experiment Tests
Main process
The main study here implements the whole pipeline for some complicated tests, such as multi-face recognition and multi-skin-color extraction under a video flow; the main process flowchart is shown in Figure 8. After each experiment we explore the effects.
Video processing
Because this test video contains no fast-moving objects, we process the video at a rate of one frame set per second and intercept the background from a few frames within each second. The video is thus sampled by the second, and the background is updated for real-time image display. In this research the video has a mainly static background, so we can directly use the background of the scene, updated once per second from several video frames, since the quality of the background image governs the entire experimental result. Table 1 shows the image similarity rates for the filtered image collection after the moving object is identified.
The next recognition test uses the Face++ interface. We found that some faces still could not be identified from these results, so further processing is needed. If a frontal face was confirmed in the previous image, but no face appears after processing the current image, we confirm that the frames differ and identify it. Furthermore, when the number of human faces in the JSON data returned from the Face++ interface is not 0, the frame is taken as the current output. We therefore make a further pass based on the modified process shown in Figure 9, and the processed output is shown in Figure 10(a). Once faces are recognized in the manner described above, skin-color region extraction follows, with the results shown in Figure 10(b). Because the background color and the body color are somewhat similar, some parts of the background are sometimes also extracted.
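The paper does not give its exact skin-color thresholds; the sketch below uses a frequently cited rule-of-thumb range in the YCbCr space (the Cb/Cr bounds are a common heuristic, not the authors' values, and the file name is illustrative):
% Sketch: skin-color region extraction in YCbCr.
I = imread('frame.jpg');
Y = rgb2ycbcr(I);
Cb = Y(:, :, 2);  Cr = Y(:, :, 3);
% Rule-of-thumb skin ranges in Cb/Cr:
mask = Cb >= 77 & Cb <= 127 & Cr >= 133 & Cr <= 173;
skin = I .* uint8(repmat(mask, [1 1 3]));   % keep only skin-colored pixels
imshow(skin);
Such a color rule also explains the observed failure mode: background pixels whose Cb/Cr values fall in the skin range are extracted as well.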
The experimental tests show a fairly successful set of results. In general, the method described above runs on a PC, and the study shows it is effective to some extent, although the marked region for fast objects is sometimes not very obvious, faces cannot all be recognized when many people are present, and the extracted skin color is not complete. The main reason is that the software processing speed cannot keep up with the hardware's.
Conclusion
In this study we demonstrated some characteristics of image operations on video through simple algorithms and methods, and discussed possible improvements such as the SIFT transform algorithm and the perceptual hashing algorithm. The face recognition rate can be raised by increasing the image resolution; taking speed into account as well, the video resolution was set to 640 × 360. A better background can be extracted by summing and averaging more frames. For recognizing objects in video we attempted real-time marking, and face recognition and skin-color extraction were realized by extracting video images, observing their effects and implementing the algorithms; the experimental tests show a fairly successful set of results.
Acknowledgement
This research was supported by HuaQiao University, Fujian,
P.R. China under the HuaQiao Scientific Research Foundation
for Talents plan.
Authors’ Contribution
We first demonstrate how to capture the video information, process and store each frame as an image; then, by comparing the current frame with its previous frame, we analyse the object's corresponding region, mark it on the images, and determine what the object is, meanwhile performing face recognition and skin-color extraction.