This ‘Trojan Source’ Bug Threatens The Security Of Almost All Computer Code, Researchers Found

Trojan Source, invisible source code vulnerability

Software programs are built from source code. In many programming languages, that code needs to be compiled before it can be executed.

Compilers are essentially what lets human-written code run on a machine. They translate "high-level," human-readable code into a machine-readable format that ultimately consists of binary bits. Machines only understand bits, which is why code needs a compiler (or, in some languages, an interpreter) before it can run.

And this time, pretty much all of the world's computers are vulnerable to an exploit.

Discovered by researchers at the University of Cambridge in the UK, the vulnerability, which they call 'Trojan Source', affects compilers and interpreters, and bad actors can exploit it to slip malicious code into software.

With the exploit, bad actors could, in theory, feed machines their own code instead of the code the developers originally intended.

In other words, hackers could effectively override the instructions in a program without the developers even knowing.

According to the report, the issue lies in Unicode, the standard used to encode digital text.

Using Unicode, computers can exchange text regardless of the language or writing system it is written in.

At this time, Unicode has more than 143,000 characters across 154 different language scripts (in addition to many non-script character sets, such as emojis).

The weakness here involves Unicode's bidirectional algorithm.

Known as 'Bidi', this algorithm governs how Unicode text is displayed, including text that mixes scripts with different display orders.

In other words, Bidi determines whether text runs left-to-right or right-to-left, and special control characters can override that ordering. Unicode also contains homoglyphs, characters that look nearly identical.
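
The override characters themselves are ordinary Unicode code points. As a minimal sketch, the snippet below uses Python's standard unicodedata module to print the name, general category, and Bidi class of the directional control characters usually discussed in this context. The selection of code points is our own assumption of which ones matter; the names and classes come from the Unicode Character Database bundled with Python.

```python
import unicodedata

# Explicit directional formatting characters (embeddings, overrides, isolates)
# plus the implicit marks. This selection is an illustrative assumption.
BIDI_CONTROLS = [
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
    "\u200e", "\u200f", "\u061c",                      # LRM, RLM, ALM
]


def describe(ch: str) -> tuple[str, str, str, str]:
    """Return (code point, Unicode name, general category, Bidi class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),       # "Cf" = format character, normally invisible
        unicodedata.bidirectional(ch),  # e.g. "RLO" for RIGHT-TO-LEFT OVERRIDE
    )


if __name__ == "__main__":
    for ch in BIDI_CONTROLS:
        print(*describe(ch), sep="\t")
```

All of these are format characters (category Cf), which is why an editor usually draws nothing for them even though they change how neighboring text is laid out.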

“In some scenarios, the default ordering set by the Bidi Algorithm may not be sufficient,” explained the researchers at Cambridge. “For these cases, Bidi override control characters enable switching the display ordering of groups of characters.”

These override characters can even reorder text written in a single script.

“Therefore, by placing Bidi override characters exclusively within comments and strings, we can smuggle them into source code in a manner that most compilers will accept. Our key insight is that we can reorder source code characters in such a way that the resulting display order also represents syntactically valid source code,” the research paper stated.

“Bringing all this together, we arrive at a novel supply-chain attack on source code. By injecting Unicode Bidi override characters into comments and strings, an adversary can produce syntactically-valid source code in most modern languages for which the display order of characters presents logic that diverges from the real logic. In effect, we anagram program A into program B.”
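
To make the "program A into program B" idea concrete, here is a minimal Python sketch modeled on the kind of string-literal payload the paper describes. The variable names and comment text are hypothetical, and the control characters are written as escape sequences so nothing invisible hides in the example itself. The point is that the interpreter works on the logical sequence of code points, not on what an editor renders.

```python
# Hypothetical payload, written with escapes so no invisible characters hide here.
# U+202E is RIGHT-TO-LEFT OVERRIDE; U+2066/U+2069 are LEFT-TO-RIGHT ISOLATE and
# POP DIRECTIONAL ISOLATE. Inside a line like  if access_level != '...':
# a Bidi-aware editor may reorder the tail so the string seems to end after
# "user" and the rest looks like a trailing comment.
payload = "user\u202e \u2066# Check if admin\u2069 \u2066"

access_level = "user"


def is_admin(level: str) -> bool:
    # The interpreter compares the logical sequence of code points,
    # not the order in which an editor happens to display them.
    return level != payload


print(repr(payload))           # repr() escapes the invisible format characters
print(is_admin(access_level))  # True: "user" is not equal to the longer payload
```

A reviewer who sees what looks like a check against 'user' would expect the branch to be skipped for ordinary users, while the real comparison is against a longer string and succeeds.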

The attack uses control characters embedded in comments and strings to reorder them. In this example, a comment is made to appear as if it were code.

The issue arises because most programming languages let developers place Bidi control characters inside comments and strings.

According to the researchers, this is risky because compilers and interpreters ignore comments, so control characters hidden there raise no errors, yet they still change how the surrounding code is displayed.

String literals are a problem too, because most programming languages allow them to contain arbitrary characters, including control characters.
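
A quick way to see this in one language is to hand CPython's built-in compile() a small program that carries Bidi controls inside a comment and a string literal. The sketch below assumes CPython 3.9 or later (for the type annotation); the sample source text is hypothetical. The same kind of character placed in ordinary code, outside a comment or string, is not a valid token and fails to compile.

```python
# Comments and string literals may carry Bidi controls; CPython accepts them.
accepted_src = (
    "x = 1  # comment with RLO \u202e inside\n"
    "s = 'a\u2066b\u2069'  # string literal with an isolate pair\n"
)
namespace: dict[str, object] = {}
exec(compile(accepted_src, "<accepted>", "exec"), namespace)
print(namespace["x"], repr(namespace["s"]))

# The same kind of character in ordinary code (outside a comment or string)
# is not a valid token, so compilation fails with a SyntaxError.
rejected_src = "x = 1 \u202e\n"
try:
    compile(rejected_src, "<rejected>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
print("rejected outside comments/strings:", rejected)
```

This is exactly the gap the researchers point to: the characters are legal precisely in the places where they can change what a human reviewer sees.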

“So you can use them in source code that appears innocuous to a human reviewer [that] can actually do something nasty,” said Ross Anderson, a professor of computer security at Cambridge and co-author of the research.

“That’s bad news for projects like Linux and WebKit that accept contributions from random people, subject them to manual review, then incorporate them into critical code.”

Not only is the scope significant, but such attacks can also be extremely hard to spot, since the rendered source code may look perfectly acceptable to developers and even to code reviewers.

A similar attack uses homoglyphs, characters that look nearly identical.
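
The homoglyph variant works because distinct code points can share a glyph, for example Latin "o" (U+006F) and Cyrillic "о" (U+043E). The sketch below is a crude heuristic of our own, not the researchers' method: it guesses each letter's script from the first word of its Unicode name and flags identifiers that mix scripts. The identifiers are hypothetical.

```python
import unicodedata


def scripts_in(identifier: str) -> set[str]:
    """Crude script guess: first word of each letter's Unicode name.

    This is a heuristic assumption, not the Unicode Script property;
    e.g. 'LATIN SMALL LETTER O' -> 'LATIN', 'CYRILLIC SMALL LETTER O' -> 'CYRILLIC'.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def looks_spoofed(identifier: str) -> bool:
    """Flag identifiers whose letters come from more than one script."""
    return len(scripts_in(identifier)) > 1


latin = "sayHello"
spoof = "sayHell\u043e"  # last letter is CYRILLIC SMALL LETTER O (U+043E)

print(latin == spoof)        # False: different code points, same appearance
print(looks_spoofed(latin))  # False
print(looks_spoofed(spoof))  # True
```

A mixed-script check like this misses identifiers written entirely in a look-alike script; more thorough tooling compares identifiers against Unicode's published confusables data instead.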

"If the change in logic is subtle enough to go undetected in subsequent testing, an adversary could introduce targeted vulnerabilities without being detected," the researchers said.

Making things worse, Bidi override characters survive copy-and-paste in most modern browsers, editors, and operating systems.

As a result, this “Trojan” could in theory be used to cripple systems through large-scale supply-chain attacks.

Such attacks could be silent yet malicious, affecting the entire software ecosystem a target relies on.

This is why the researchers described the vulnerability as “an immediate threat” that could lead to “supply-chain compromise across the industry.”

"The fact that the Trojan Source vulnerability affects almost all computer languages makes it a rare opportunity for a system-wide and ecologically valid cross-platform and cross-vendor comparison of responses," the paper stated.

"As powerful supply-chain attacks can be launched easily using these techniques, it is essential for organizations that participate in a software supply chain to implement defenses."

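One simple defense along those lines is to scan source files for Bidi control characters before review or compilation. The sketch below is hypothetical tooling, not a tool from the paper: it reports the line and column of every such character and marks lines where embeddings/overrides or isolates are opened and closed an unequal number of times. The character set and the per-line balance rule are assumptions; real linters and compilers may apply different rules.

```python
import sys

# Openers and closers of explicit Bidi formatting (selection is an assumption).
EMBED_OVERRIDE_OPEN = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
EMBED_OVERRIDE_CLOSE = "\u202c"                                   # PDF
ISOLATE_OPEN = {"\u2066", "\u2067", "\u2068"}                    # LRI, RLI, FSI
ISOLATE_CLOSE = "\u2069"                                          # PDI
BIDI_CONTROLS = EMBED_OVERRIDE_OPEN | ISOLATE_OPEN | {EMBED_OVERRIDE_CLOSE, ISOLATE_CLOSE}


def scan(text: str) -> list[tuple[int, int, str, bool]]:
    """Return (line, column, code point, unbalanced?) for each Bidi control found."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        hits = [(col, ch) for col, ch in enumerate(line, start=1) if ch in BIDI_CONTROLS]
        if not hits:
            continue
        # A line feed is a paragraph separator for the Bidi algorithm, so we
        # compare opener and closer counts per line. This ignores their order,
        # which makes it a crude check rather than a full validation.
        unbalanced = (
            sum(ch in EMBED_OVERRIDE_OPEN for ch in line) != line.count(EMBED_OVERRIDE_CLOSE)
            or sum(ch in ISOLATE_OPEN for ch in line) != line.count(ISOLATE_CLOSE)
        )
        findings.extend((lineno, col, f"U+{ord(ch):04X}", unbalanced) for col, ch in hits)
    return findings


def main(paths: list[str]) -> int:
    """Print findings for each UTF-8 file; return 1 if any Bidi control was found."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for lineno, col, cp, unbalanced in scan(f.read()):
                note = " (unbalanced on this line)" if unbalanced else ""
                print(f"{path}:{lineno}:{col}: Bidi control {cp}{note}")
                status = 1
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    # A non-zero exit status lets a CI job fail when anything is found.
    sys.exit(main(sys.argv[1:]))
```

Run as, for example, `python scan_bidi.py src/*.py` (the file name is hypothetical). Flagging every occurrence, rather than only unbalanced ones, is deliberate: legitimate uses in source code are rare enough that a human can review each hit.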
Published: 03/11/2021