The 'EarSpy' Attack Can Eavesdrop On Conversations Using Motion Sensors, Research Found

EarSpy

Sound comes from something that vibrates. Because of that, vibrations can carry a lot of information.

Eavesdropping on telephone conversations has a long history, and the practice can be traced back to the early days of telephony. Then, in the 20th century, when phones were common household items, it was common for eavesdroppers to listen to other people's conversations by "tapping" into their telephone line.

As technology advances, so too do the methods and tools used for eavesdropping.

This time, a team of researchers has developed an attack on Android devices that can recognize a caller's gender and identity, and even discern speech. Called 'EarSpy,' it is a side-channel attack that explores new possibilities for eavesdropping by capturing motion sensor readings caused by vibrations from the speakers in mobile devices.

The project is an academic effort of researchers from five American universities (Texas A&M University, New Jersey Institute of Technology, Temple University, University of Dayton, and Rutgers University).

EarSpy succeeds at this to a certain degree, which is what makes it worrisome.

Researchers have studied this type of attack before, and found that data could be extracted from the vibrations of smartphone loudspeakers and ear speakers.

But at the time, the vibrations were considered too weak to generate enough viable data, which in turn made it a minor security issue.

However, modern smartphones use increasingly powerful stereo speakers compared to older models. Some smartphone models can generate 50 dB of sound, and some even go beyond 100 dB.

At close range, 100 dB is about as loud as a power tool or a car horn.

Sound source: check. Next is how to steal it.

Smartphones are packed with sensors that make them "smart." These include accelerometers, which measure motion, and gyroscopes, which measure orientation and angular velocity.

On modern phones, these two sensors are extremely sensitive, and can even record the tiniest resonances coming from the speakers.

The researchers showed that the ear speaker of a 2016 OnePlus 3T barely registers on the spectrogram, whereas the stereo ear speakers of a 2019 OnePlus 7T produce significantly more data.

The research team used a OnePlus 7T and a OnePlus 9 in their experiments, along with varying sets of pre-recorded audio played only through the ear speakers of the two devices.

Then, using a third-party app called the ‘Physics Toolbox Sensor Suite,’ they captured accelerometer data during a simulated call and then fed it to MATLAB for analysis.

"We developed a program in MATLAB to analyze the accelerometer data and detect the word region. When a speech is played on the ear speakers, spikes can be noticed in the Z-axis value of the accelerometer," the researchers explained.
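The researchers' MATLAB program is not reproduced here. As a rough illustration of the idea, the following Python sketch flags "word regions" in Z-axis accelerometer readings by thresholding short-time energy. The function names, the 50 ms window, the median-based noise floor, and the synthetic signal are all assumptions made for this example, not details from the paper.

```python
import math


def detect_word_regions(z_values, sample_rate_hz, window_s=0.05, threshold_factor=10.0):
    """Return (start_s, end_s) spans whose short-time energy stands out.

    Illustrative only: the windowing and threshold rule are assumptions,
    not the researchers' MATLAB program.
    """
    # Subtract the mean so the constant gravity offset does not dominate.
    mean = sum(z_values) / len(z_values)
    centered = [z - mean for z in z_values]

    # Average energy of each non-overlapping window.
    win = max(1, round(window_s * sample_rate_hz))
    energies = [
        sum(v * v for v in centered[i:i + win]) / win
        for i in range(0, len(centered) - win + 1, win)
    ]

    # Use the median window as the noise floor (assumes mostly silence).
    noise_floor = sorted(energies)[len(energies) // 2]
    threshold = threshold_factor * noise_floor

    # Merge consecutive above-threshold windows into regions.
    regions, start = [], None
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx
        elif energy <= threshold and start is not None:
            regions.append((start * win / sample_rate_hz, idx * win / sample_rate_hz))
            start = None
    if start is not None:
        regions.append((start * win / sample_rate_hz, len(energies) * win / sample_rate_hz))
    return regions


def synthetic_z_axis(sample_rate_hz=400, seconds=2.0):
    """Gravity plus a faint hum, with a stronger 120 Hz burst from 0.5 s to 1.0 s."""
    samples = []
    for n in range(int(sample_rate_hz * seconds)):
        t = n / sample_rate_hz
        z = 9.81 + 0.001 * math.sin(2 * math.pi * 50 * t)
        if 0.5 <= t < 1.0:
            z += 0.05 * math.sin(2 * math.pi * 120 * t)
        samples.append(z)
    return samples


regions = detect_word_regions(synthetic_z_axis(), sample_rate_hz=400)
print(regions)
```

On real recordings the threshold would need tuning per device, since the strength of the speaker-induced vibration differs between phone models.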

The researchers then fed the extracted data to a machine-learning algorithm that had been trained on readily available datasets to recognize speech content, caller identity, and gender.

Test results varied and depended heavily on the dataset and the quality of the sources, but overall they were promising for eavesdropping via the ear speaker.
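To give a sense of what a classifier's input might look like, here is a minimal Python sketch that turns one frame of accelerometer samples into a few time-domain and frequency-domain features. The specific features (RMS, peak-to-peak range, dominant frequency) and the name `frame_features` are assumptions chosen for illustration; the article does not list the exact feature set the researchers used.

```python
import cmath
import math


def frame_features(frame, sample_rate_hz):
    """Compute a few time- and frequency-domain features for one frame.

    The feature choice is an assumption made for illustration.
    """
    n = len(frame)
    mean = sum(frame) / n
    centered = [x - mean for x in frame]

    # Time domain: RMS amplitude and peak-to-peak range.
    rms = math.sqrt(sum(x * x for x in centered) / n)
    peak_to_peak = max(centered) - min(centered)

    # Frequency domain: naive DFT magnitudes for bins 1..n/2 (DC is skipped).
    magnitudes = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        magnitudes.append(abs(coeff))
    strongest_bin = 1 + max(range(len(magnitudes)), key=magnitudes.__getitem__)
    dominant_hz = strongest_bin * sample_rate_hz / n

    return {"rms": rms, "peak_to_peak": peak_to_peak, "dominant_hz": dominant_hz}


# Usage: a pure 100 Hz vibration sampled at 400 Hz for 0.1 s.
tone = [math.sin(2 * math.pi * 100 * i / 400) for i in range(40)]
features = frame_features(tone, sample_rate_hz=400)
print(features)
```

A feature vector like this, computed per detected word region, is the kind of input a classical classifier could be trained on. The naive DFT is used only to keep the sketch dependency-free; an FFT library would be the practical choice.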

According to the researchers' paper (PDF):

"We evaluate the time and frequency domain features with classical ML algorithms, which show the highest 56.42% accuracy."

"As there are ten different classes here, the accuracy still exhibits five times greater accuracy than a random guess, which implies that vibration due to the ear speaker induced a reasonable amount of distinguishable impact on accelerometer data."
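The "five times" comparison in the quote is simple arithmetic, sketched below in Python. It assumes the ten classes are equally likely, so that random guessing is right 10% of the time; the helper name `lift_over_chance` is made up for this example.

```python
def lift_over_chance(accuracy, num_classes):
    """Ratio of an observed accuracy to uniform random guessing."""
    chance = 1.0 / num_classes
    return accuracy / chance


# Figures quoted above: 56.42% accuracy over ten classes.
lift = lift_over_chance(0.5642, 10)
print(f"chance = {1 / 10:.0%}, lift = {lift:.2f}x")
```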

On the OnePlus 7T, caller gender identification ranged between 77.7% and 98.7%, caller ID classification ranged between 63.0% and 91.2%, and speech recognition ranged between 51.8% and 56.4%.

On the OnePlus 9, gender identification topped out at 88.7%, speaker identification dropped to an average of 73.6%, and speech recognition ranged between 33.3% and 41.6%.

Back in 2020, the researchers achieved higher accuracy when sourcing the data from a phone with its loudspeaker at the highest volume.

In that study, which used the ‘Spearphone’ app, caller gender and ID accuracy reached 99%, while speech recognition accuracy reached 80%.

The arrangement of the device’s hardware components also contributes to making the EarSpy attack dangerous.

Because manufacturers of modern smartphones have to pack more components into tighter spaces, the compact assembly affects how speaker reverberations diffuse through the device.

Countering this type of attack is difficult.

Again, because sound comes from something that vibrates, and there can be no sound without vibration, there is no proper way to nullify the attack.

The only plausible option is to reduce the efficiency of the EarSpy attack by lowering the volume of the ear speaker. A lower volume should help prevent eavesdropping via this side channel, because the vibration it creates is weaker.

Also, it should be more comfortable for the ear.

Android 13 has introduced a restriction in collecting sensor data without permission for sampling data rates beyond 200 Hz. While this prevents speech recognition at the default sampling rate (400 Hz – 500 Hz), it only drops the accuracy by about 10% if the attack is performed at 200 Hz.
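As general sampling-theory background (not a result from the paper), a sensor sampled at a given rate can only faithfully represent vibrations below half that rate, and higher tones fold back to lower apparent frequencies. The Python sketch below compares a 200 Hz cap with a 500 Hz rate for a hypothetical 150 Hz tone; the function names and the example tone are assumptions for illustration.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented without aliasing."""
    return sample_rate_hz / 2.0


def apparent_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone shows up after sampling (aliasing)."""
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)


for rate in (200, 500):
    print(
        f"{rate} Hz sampling: Nyquist limit {nyquist_hz(rate):.0f} Hz, "
        f"a 150 Hz tone appears at {apparent_frequency_hz(150, rate):.0f} Hz"
    )
```

This suggests why a 200 Hz cap degrades the signal rather than eliminating it: content above 100 Hz is distorted by aliasing instead of being removed, so some speaker-dependent information can still reach the classifier.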

The researchers suggest that phone manufacturers should ensure sound pressure stays stable during calls and place the motion sensors in a position where internally-originating vibrations aren’t affecting them or at least have the minimum possible impact.

Published: 03/01/2023