Artificial Intelligence: Why do facial recognition systems fail?

Unlike passwords, our biometric information is widely available and relatively easy to obtain. As a result, several kinds of attack are easy to mount and can succeed if no countermeasures are in place. In particular, facial recognition systems can be compromised using one of the following:

  • A photograph
  • A video
  • A 3D face model

Various methods have been developed to deal with the problem of spoofing with face images. These can be divided into two approaches: dynamic characteristics and static characteristics.

Dynamic-feature approaches seek to detect motion in a video sequence by analyzing the trajectories of specific regions of the face, which reveal valuable information for discriminating between real faces and static copies. Typical methods include eye-blink detection; head and face gestures (nodding, smiling, or looking in different directions); and face and gaze tracking through optical-flow estimation. These techniques are highly effective at detecting attacks that use photos, but less effective against videos.

To improve performance against video attacks, specific liveness-detection methods for video have been developed. Examples include exploiting the 3D structure of the scene by analyzing many 2D images with different head poses, and context-based analysis that takes advantage of the non-facial information available in the samples, such as motion characteristics of the scene (background vs. foreground movement). Modified versions of Local Binary Patterns (LBP) are also used, mostly to exploit the temporal information present in the video or to analyze dynamic textures, which distinguish living faces from rigid objects such as photos and masks.

The search for solutions

One way to tackle the problem is to focus on liveness detection. This requires a spatio-temporal representation that combines facial appearance with its dynamics. The key lies in using a spatio-temporal representation based on LBP, given its proven performance in modeling face movement, recognizing facial expressions, and recognizing dynamic textures.

How is spoofing in facial recognition detected?

The LBP operator for texture analysis is defined as a grayscale-invariant texture measure, derived from a general definition of texture in a local neighborhood. It is a powerful texture descriptor whose properties for real-world applications include its discriminative power, computational simplicity, and tolerance to monotonic grayscale changes.
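The basic operator can be sketched in a few lines of NumPy: threshold each pixel's eight neighbors against the center and pack the comparison bits into an 8-bit code. This is an illustrative, pure-NumPy sketch (real systems typically use an optimized library implementation); the final assertion demonstrates the tolerance to monotonic grayscale changes mentioned above.

```python
# Minimal 3x3 LBP sketch (illustrative, not the article's implementation).
import numpy as np

def lbp_3x3(image: np.ndarray) -> np.ndarray:
    """Basic LBP: threshold the 8 neighbors of each pixel against the
    center and pack the resulting bits into a code in [0, 255]."""
    img = image.astype(np.int16)
    center = img[1:-1, 1:-1]
    # Neighbor offsets in clockwise order starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:img.shape[0] - 1 + dy,
                       1 + dx:img.shape[1] - 1 + dx]
        codes |= ((neighbor >= center).astype(np.uint8) << bit)
    return codes

# Tolerance to monotonic grayscale changes: adding a constant
# brightness offset leaves every LBP code unchanged.
rng = np.random.default_rng(0)
face = rng.integers(0, 200, size=(64, 64), dtype=np.uint8)
assert np.array_equal(lbp_3x3(face), lbp_3x3(face + 50))
```

The histogram of these codes over a face crop is the texture descriptor used downstream.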

The LBP operator was initially conceived to deal with spatial information. However, its use has been extended to space-time representations for dynamic texture analysis, giving way to the Volume Local Binary Pattern (VLBP) operator.

VLBP describes the dynamic texture of a video, which is represented as a volume (X, Y, T), where X and Y denote the spatial coordinates and T the frame index; the neighborhood of each pixel is thus defined in three dimensions. Sampling this volume along orthogonal planes gives way to what is known as LBP-TOP, or LBP from Three Orthogonal Planes. Here the XY, XT, and YT planes are defined; from them, the LBP maps for each plane are extracted, denoted XY-LBP, XT-LBP, and YT-LBP, and then concatenated to obtain the LBP representation centered on a pixel of the volume, as shown in the figure.
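The three-plane decomposition can be sketched as follows. For brevity, this assumed sketch applies the basic LBP operator only to the central slice of each plane; the full LBP-TOP operator accumulates codes over every pixel of the volume, but the structure (one histogram per plane, then concatenation) is the same.

```python
# Hedged LBP-TOP sketch: LBP maps on the XY, XT and YT planes of a
# (T, Y, X) grayscale video volume, one histogram per plane, concatenated.
import numpy as np

def lbp_plane(plane):
    """Basic 8-neighbor LBP codes for one 2D slice."""
    p = plane.astype(np.int16)
    c = p[1:-1, 1:-1]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate([(-1, -1), (-1, 0), (-1, 1), (0, 1),
                                    (1, 1), (1, 0), (1, -1), (0, -1)]):
        n = p[1 + dy:p.shape[0] - 1 + dy, 1 + dx:p.shape[1] - 1 + dx]
        codes |= ((n >= c).astype(np.uint8) << bit)
    return codes

def lbp_top(volume):
    """volume: (T, Y, X). Returns the concatenated XY/XT/YT histogram.
    Only the central slice per plane is used here, for brevity."""
    t, y, x = volume.shape
    planes = [volume[t // 2, :, :],   # XY: spatial appearance
              volume[:, y // 2, :],   # XT: horizontal motion texture
              volume[:, :, x // 2]]   # YT: vertical motion texture
    hists = []
    for plane in planes:
        h, _ = np.histogram(lbp_plane(plane), bins=256, range=(0, 256))
        hists.append(h / h.sum())     # normalize each plane's histogram
    return np.concatenate(hists)      # 3 * 256 = 768 features

rng = np.random.default_rng(1)
clip = rng.integers(0, 256, size=(75, 64, 64), dtype=np.uint8)
features = lbp_top(clip)
print(features.shape)  # (768,)
```

Normalizing each plane's histogram before concatenation keeps the spatial and temporal planes on a comparable scale regardless of their pixel counts.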

LBP in three orthogonal planes. (a) The planes intersect one pixel. (b) LBP histograms of each plane. (c) Concatenation of the histograms.

In the LBP-TOP operator, the radius of the LBP algorithm on the X axis is denoted Rx, on the Y axis it is denoted Ry and on the T axis it is denoted by Rt.

The number of neighboring points in the XY, XT, and YT planes is PXY, PXT, and PYT, respectively. The operator used in each plane can vary: uniform patterns (u2) or rotation-invariant uniform patterns (riu2).

Unlike photographs, real faces are non-rigid objects whose facial-muscle contractions, for example of the eyelids and lips, produce temporal deformations. Therefore, specific facial movement patterns should be detectable when a living human is observed with a frontal camera. Moving a photograph in front of a camera produces distinctive motion patterns that do not match those of a genuine face.

The figure presents the anti-spoofing methodology, which consists of the following stages:

LBPTOP-based anti-spoofing method block diagram.
  1. Each frame of the original sequence is converted to grayscale and ran through a face detector.
  2. The detected faces are geometrically normalized to 64 × 64 pixels. This, in order to reduce the noise of the face detector, the same bounding box is used for each set of frames used in the calculation with the LBP-TOP operator.
  3. The LBP operator is applied in each plane (XY, XT and YT) and the histograms are calculated and then concatenated.
  4. A binary classifier is used to determine what the actual data is.
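The four stages above can be sketched end to end. This is an assumed, self-contained toy: the face detector is mocked by a center crop, the LBP-TOP descriptor is replaced by a crude per-pixel temporal-variance feature to keep the example short, and the binary classifier is a nearest-centroid stand-in rather than the SVM-style classifier such a system would normally use.

```python
# Hedged sketch of the four-stage pipeline; all components are stand-ins.
import numpy as np

WINDOW = 75              # frames per sequence, as described in the method
FACE = 64                # normalized face size (64 x 64)

def extract_features(video_rgb):
    """Stages 1-3: grayscale; 'detect' + normalize the face (mocked by a
    center crop); describe spatio-temporal texture (here: per-pixel
    temporal variance, a stand-in for LBP-TOP histograms)."""
    gray = video_rgb.mean(axis=-1)                       # stage 1
    t, h, w = gray.shape
    y0, x0 = (h - FACE) // 2, (w - FACE) // 2
    faces = gray[:, y0:y0 + FACE, x0:x0 + FACE]          # stage 2
    return faces.var(axis=0).ravel()                     # stage 3 stand-in

def classify(train_X, train_y, x):
    """Stage 4: minimal binary classifier (nearest centroid)."""
    c_real = train_X[train_y == 1].mean(axis=0)
    c_fake = train_X[train_y == 0].mean(axis=0)
    return 1 if np.linalg.norm(x - c_real) < np.linalg.norm(x - c_fake) else 0

rng = np.random.default_rng(2)
# A "live" video varies over time; a replayed "photo" is nearly static.
live = rng.integers(0, 256, size=(WINDOW, 96, 96, 3)).astype(float)
photo = np.repeat(live[:1], WINDOW, axis=0) + rng.normal(0, 1, live.shape)

X = np.stack([extract_features(live), extract_features(photo)])
y = np.array([1, 0])
print(classify(X, y, extract_features(photo)))  # 0 (attack)
```

The point of the sketch is the flow, not the components: swapping the variance feature for the concatenated LBP-TOP histograms and the centroid rule for a trained SVM yields the method described above.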

Each video, whether a real access or an attack, is converted to grayscale and arranged as a 3D array representing the spatio-temporal distribution (X, Y, T). The videos are then divided into sequences of 75 frames, and a face-detection algorithm is applied to the center frame of each sequence.
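The windowing step is mechanical and can be sketched directly (an illustrative helper, not the project's code): split a grayscale (T, Y, X) video into non-overlapping 75-frame sequences and pick each window's center frame for face detection.

```python
# Sketch of the 75-frame windowing step (trailing frames are dropped).
import numpy as np

def split_windows(video, window=75):
    """Yield (sequence, center_frame) pairs from a (T, Y, X) video."""
    n = video.shape[0] // window
    for i in range(n):
        seq = video[i * window:(i + 1) * window]
        yield seq, seq[window // 2]   # center frame for face detection

video = np.zeros((160, 64, 64), dtype=np.uint8)  # 160 frames -> 2 windows
windows = list(split_windows(video))
print(len(windows))  # 2
```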

This method is useful for preventing simple attacks (such as photographs), but is not recommended against more complex ones. Since its objective is to identify temporal variations, it can be defeated with a mask. That is why it is always advisable to combine methods when building a robust biometric system.

For more information and the code of the developed project visit the project on GitHub.


How to apply AI tools for health innovation?

Since its inception, the development of artificial intelligence has been exposed to much scrutiny and even some mistrust from the scientific communities and especially the general public. However, the constant advances of AI tools have sought to overcome these obstacles to find solutions to the great problems of humanity.

In November 2018, the Duke University Health System Emergency Department launched "Sepsis Watch." The tool was designed through deep learning to help professionals in the area detect the first signs of one of the leading causes of hospital death worldwide: infections and their overwhelming ability to wreak havoc on the human body.

The dreaded sepsis occurs when an infection triggers inflammation throughout the body, which can cause immediate -and multiple- organ failure. Fever, shortness of breath, low blood pressure, fast heartbeat, and mental confusion are just some of its symptoms. Although its effects are extremely harmful, the truth is that it can be treated with an early diagnosis. However, this is easier said than done since its early signs are often confused with other ailments.

Sepsis Watch is the product of three and a half years of development, during which medical records were digitized and 32 million data points were analyzed. Subsequently, the Duke University team focused on designing a simple interface so that the tool could be used in the form of an iPad app. The app checks each patient's information and assigns them a rating based on their probability of developing the condition. Once a doctor confirms the diagnosis, an immediate treatment strategy is put in place.

The result is a drastic reduction in patient deaths from sepsis. Currently, the AI tool is part of a federally registered clinical trial, whose preliminary results will be available by 2021.

VOYAGER: AI Tools solution for the health area made in Chile

Similar to the cases of death due to sepsis, arterial hypertension, Alzheimer's, schizophrenia, retinitis pigmentosa, asthma and diabetes mellitus are pathologies with high mortality rates according to the WHO. Due to the complexity of their diagnosis, their treatment normally consists of rigid protocols, the results of which may vary from one patient to another.

VOYAGER, developed by UNIT, focuses on exponentially improving the management of these diseases, known as multifactorial. Through the use of artificial intelligence, the system is capable of processing data collected by voice interfaces to fully understand the status of each patient and perform predictive and automated monitoring of their treatment.

Similar to what happens with Sepsis Watch, this translates into more efficient diagnoses and identification of higher risk cases, directly impacting the fatality rate of these diseases. In concrete terms, VOYAGER's goal is to reduce serious hospitalizations by 50% for those suffering from diabetes, cerebrovascular diseases, hypertension and even obesity, both in public and private health.