w00tsec: openwrt

This post covers some common techniques that I use to analyze and reverse firmware images. These techniques are particularly useful to dissect malicious firmwares, spot backdoors and detect unwanted modifications.

Backdooring and re-flashing firmware images is becoming mainstream: malicious guys are infecting embedded devices and inserting trojans in order to achieve persistence. Recent articles covered the increasing number of trojanized Android firmwares and routers that are being permanently modified.

Attackers with a privileged network position may MITM your requests and forge fake updates containing malicious firmwares. Writing Evilgrade modules for this is really simple, as most vendors keep failing to deliver updates securely, right ASUS?

All your HTTP packets are belong to us...

Older versions of ASUS firmware were vulnerable to MITM attacks (CVE-2014-2718) because they transmitted updates over HTTP without any security or signature checks. ASUS silently patched the issue in 3.0.0.4.376+ and now verifies RSA signatures via /sbin/rsasign_check:

Valid signature -> nvram_set("rsasign_check", "1")

NoConName 2014 CTF Finals: Vodka

I'll keep my tradition of writing posts based on CTF challenges because ~~everybody upvotes CTF posts on reddit~~ it's cool.

The challenge "Vodka", from NoConName 2014 CTF Finals was created by @MarioVilas, who kindly provided the files here (thanks dude!).

I did not participate in the CTF finals, but I found the challenge really interesting because there were many different ways to solve it, and together they summarize the actions needed to audit a compromised firmware. In my opinion, the best CTF challenges are the ones that require us to develop/use new techniques and improve existing tools.

NoConName 2014 Finals: Vodka
Challenge Category: Forensics
Description: No hints :( just get the flag.

This challenge description is not very intriguing, so I hired a couple of marketing specialists to ~~design a new logo~~ add some Infosec drama and reformulate it:

A mysterious bug affected one of the core routers at a major Internet service provider in Syria. The failure of this router caused the whole country to suddenly lose all connection to the Internet. The Syrian government recorded a traffic capture right before the crash and hired you to perform a forensic analysis.

Download provided: https://github.com/MarioVilas/write-ups/blob/master/ncn-ctf-2014/Vodka/vodka

Network Forensics

The download provided is a packet capture using the PCAP-NG format. Wireshark is too mainstream, so let's convert the PCAP-NG to PCAP and open it using Network Miner:

Network Miner makes it very easy for us to understand what's going on: there's some sort of file transfer via TFTP and the filename seems to be related to an OpenWRT firmware image.
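Network Miner reassembles the transfer for you. If you'd rather script it, here is a minimal Python sketch of TFTP reassembly following RFC 1350: DATA packets carry opcode 3, a 16-bit block number and up to 512 bytes, and a shorter block ends the transfer. It assumes you have already extracted the UDP payloads of that single transfer from the capture (with tshark or scapy, not shown), and it ignores block-number wraparound and the blksize option.

```python
import struct

TFTP_DATA = 3      # DATA opcode (RFC 1350); 1=RRQ, 2=WRQ, 4=ACK, 5=ERROR
BLOCK_SIZE = 512   # default block size when no blksize option is negotiated


def reassemble_tftp(payloads):
    """Rebuild the file carried by one TFTP transfer.

    `payloads` is an iterable of raw UDP payloads (bytes) in capture order.
    Retransmitted DATA blocks are de-duplicated by block number.
    """
    blocks = {}
    for p in payloads:
        if len(p) < 4:
            continue
        opcode, block = struct.unpack("!HH", p[:4])
        if opcode == TFTP_DATA:
            blocks.setdefault(block, p[4:])  # keep the first copy of each block
    data = bytearray()
    n = 1  # block numbers start at 1
    while n in blocks:
        data += blocks[n]
        if len(blocks[n]) < BLOCK_SIZE:  # a short (or empty) block ends the transfer
            break
        n += 1
    return bytes(data)
```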

Firmware structure

We always binwalk all the things, but very few people stop to analyze and understand the firmware structure properly. We know that the firmware image was downloaded using TFTP, a common way for routers to transfer config files and updates, and that it is probably based on the OpenWRT project.

So what does binwalk tell us?

The Common Firmware Environment (CFE) is a firmware interface/bootloader present on Broadcom SoCs. It is analogous to the BIOS on PC platforms and is responsible for CPU initialization and bootstrap code on embedded processors. The CFE is also referred to as PMON and it is generally mapped to mtd0.

The JFFS2/NVRAM filesystem is the non-volatile partition. It stores all the configuration parameters, including router settings, passwords and logs.

Bear in mind that firmware updates generally do not include the CFE/NVRAM partitions. You can access the CFE console over a serial connection, and you can also dump these partitions using dd on a live system or by reading the flash over SPI. Let's focus on the firmware sections included in the provided image (openwrt-wrtsl54gs-squashfs.bin):

TRX (Offset 0x20)

The TRX header is just an encapsulation describing the firmware image: its size, CRC, flags, version information and partition offsets. Binwalk wasn't recognizing the header and the relative offsets properly, so I submitted these two pull requests. Creating custom signatures for binwalk is pretty straightforward.

Some firmware images (like the newer ones from ASUS and Netgear) use this TRX structure but don't include a loader; in that case the Linux kernel and the RootFS offsets may be shifted.

If the firmware includes any extra header before the TRX, you have to add its size to the displayed partition offsets in order to find the real values. Many SOHO firmware images don't include such a header, so the displayed values should be correct in most cases. The downloaded OpenWRT image had the following offsets:

Loader: 0x20 + 0x1C = 0x3C
Kernel: 0x20 + 0x8D8 = 0x8F8
RootFS: 0x20 + 0x7E400 = 0x7E420
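The same arithmetic can be scripted. A minimal sketch, assuming the version-1 TRX layout used by OpenWRT's firmware-utils (magic "HDR0", length, CRC32, flags/version, then three partition offsets, 28 bytes in total) and a known size for whatever header sits in front of it (0x20 here). It does not verify the CRC, and a zero offset is reported as an absent partition, which covers the loader-less case mentioned above.

```python
import struct

TRX_HEADER = struct.Struct("<4sIIIIII")  # magic, len, crc32, flag_version, offsets[3]


def parse_trx(image, base=0):
    """Parse a version-1 TRX header found at `base` inside `image`.

    The partition offsets stored in the header are relative to the TRX
    header itself, so `base` (the size of any header in front of it,
    e.g. a 0x20-byte BinHeader) is added to each of them.
    """
    magic, length, crc, flag_version, *offsets = TRX_HEADER.unpack_from(image, base)
    if magic != b"HDR0":
        raise ValueError("no TRX header at offset 0x%x" % base)
    names = ("loader", "kernel", "rootfs")
    return {
        "length": length,                 # size of the whole TRX image
        "crc32": crc,                     # stored checksum, not verified here
        "flags": flag_version & 0xFFFF,   # low 16 bits
        "version": flag_version >> 16,    # high 16 bits
        # An offset of 0 means the slot is unused (e.g. no loader).
        "partitions": {n: (base + off if off else None) for n, off in zip(names, offsets)},
    }
```

For this image, parse_trx(fw, base=0x20) should reproduce the three offsets listed above.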

In this specific case, we have a BinHeader right before the TRX, indicating the board ID, the firmware date and the hardware date. The struct is described in cyutils.h:

This extra header appears on a few routers like the WRT54G series: the Web GUI checks for this pattern before actually writing the firmware.

We are particularly interested in the fwdate field (Firmware Date), composed of the hex values 07 02 03. According to addpattern.c, the first byte defines the year, the second one the month and the third byte the day the firmware was created. The fwdate seems to be 03-February-2007; save that for later, we will need it =)
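Decoding the BinHeader by hand gets old quickly. The sketch below assumes the 32-byte struct code_header layout from cyutils.h as used by addpattern.c (magic, reserved, fwdate[3], fwvern[3], id[4], hw_ver, unused, flags[2], reserved) and the year-since-2000 interpretation of fwdate described above; the keys of the returned dict are my own naming.

```python
import datetime
import struct

# struct code_header (cyutils.h): magic[4], res1[4], fwdate[3], fwvern[3],
# id[4], hw_ver, unused, flags[2], res2[10] -> 32 bytes, all byte-sized fields
CODE_HEADER = struct.Struct("<4s4s3s3s4sBB2s10s")


def parse_binheader(image):
    """Decode the Linksys-style header that precedes the TRX."""
    (magic, _res1, fwdate, fwvern, board_id,
     hw_ver, _unused, _flags, _res2) = CODE_HEADER.unpack_from(image, 0)
    year, month, day = fwdate  # years since 2000, month, day (per addpattern.c)
    return {
        "magic": magic.decode("ascii", "replace"),
        "fw_date": datetime.date(2000 + year, month, day),
        "fw_version": tuple(fwvern),
        "board_id": board_id.decode("ascii", "replace"),
        "hw_ver": hw_ver,
    }
```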

GZ'd LZMA Loader (Offset 0x3C)

According to the OpenWRT Wiki, the boot loader has no concept of filesystems: it assumes that the start of the TRX data section is executable code.

The boot loader boots into an LZMA program which decompresses the kernel into RAM and executes it. It turns out the boot loader does understand gzip compression, so we have a gzip-compressed LZMA decompression program at 0x3C.

You can find the source code for this lzma-loader here and here. Note the TEXT_START offset at 0x80001000: we may need to adjust the Loading Address on our Disassembler in order to reverse the compiled loader. Don't forget to decompress it (gunzip) before reversing the file.
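Instead of carving the loader with dd and running gunzip, you can let zlib find where the gzip member ends. A minimal sketch, assuming the gzip stream starts exactly at 0x3C in the image:

```python
import zlib


def carve_gzip(image, offset):
    """Decompress one gzip member starting at `offset`.

    Returns (decompressed_bytes, end_offset). zlib stops at the end of the
    gzip member, so whatever follows (the LZMA kernel, the rootfs) is left
    untouched in `unused_data`.
    """
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # 16+ => expect a gzip header
    out = d.decompress(image[offset:])
    if not d.eof:
        raise ValueError("gzip stream truncated")
    return out, len(image) - len(d.unused_data)
```

With the image read into fw, carve_gzip(fw, 0x3C) returns the loader bytes, which you can then load at 0x80001000 in your disassembler. The returned end offset doubles as a sanity check: the kernel should start at or after it.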

Most embedded toolchains would strip the binaries in order to reduce the firmware size. If you want to reverse a friendlier version of the loader, grab the latest OpenWRT ImageBuilder and search for loader.elf:

Woohoo, blue code =)

Note that if we modify the loader to include a backdoor, we would have our very own Router Bootkit, cool isn't it?

LZMA'd Kernel (Offset 0x8F8)

Instead of just putting a kernel directly onto flash, most embedded devices compress the kernel using LZMA. The boot loader boots into an LZMA program which decompresses the kernel into RAM and executes it.
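Python's lzma module can decompress the kernel in the same way. This sketch assumes the kernel is stored as a legacy "LZMA alone" stream (the 13-byte properties/dictionary size/uncompressed size header that binwalk reports); vendor-modified LZMA variants may not decode with it. Because the decoder stops at the end of the stream, unused_data tells us exactly where the kernel ends.

```python
import lzma


def carve_lzma_alone(image, offset):
    """Decompress a legacy .lzma ("LZMA alone") stream starting at `offset`.

    Returns (decompressed_bytes, end_offset). The decoder stops at the
    end-of-stream marker or after the size stored in the header, so the
    remaining bytes (e.g. the SquashFS partition) end up in unused_data.
    """
    d = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    out = d.decompress(image[offset:])
    if not d.eof:
        raise ValueError("LZMA stream truncated or not in .lzma format")
    return out, len(image) - len(d.unused_data)
```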

Binwalk has a signature to find Kernel strings in raw Linux Kernels. The identified string lists the toolchain used to compile the Kernel, as well as the compiled date and version information:

And why did binwalk manage to find all this information in the Kernel? The answer can be found in the toolchain's Makefile:

If we follow the steps from my previous post we can build a customized Kernel for OpenWRT. The generated vmlinux is generally an ELF file, but in our case, the object was stripped using objcopy:

Did you notice the compile date was 03-February-2007? Let's save that for later as well.
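Once the kernel is decompressed, pulling the banner out is a one-regex job. The pattern below assumes the usual "Linux version <release> (<user>@<host>) (gcc version <x>) #<n> <date>" layout of the kernel banner; kernels built with a different toolchain or a patched banner may need a looser pattern.

```python
import re

# release, builder (user@host), gcc version, build number + date
BANNER_RE = re.compile(
    rb"Linux version (\S+) \((\S+)\) \(gcc version (.+?)\) (#\d+[\x20-\x7e]*)"
)


def kernel_banners(kernel):
    """Find 'Linux version ...' banners in a decompressed kernel image."""
    results = []
    for m in BANNER_RE.finditer(kernel):
        release, builder, gcc, build = (g.decode("ascii", "replace") for g in m.groups())
        results.append({"release": release, "builder": builder, "gcc": gcc, "build": build})
    return results
```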

SquashFS (Offset 0x7E420)

The last part is the actual filesystem. Most embedded Linux devices use SquashFS, and many vendors hack it in order to get better compression and faster performance. Fortunately we don't have to worry about that, as Sasquatch handles different SquashFS header/compression formats.

The filesystem has the standard OpenWRT directories and files, including a banner from the 0.9 build (White Russian).

Both binwalk and sasquatch display the SquashFS superblock information, including the creation/last append time:

Did you spot the date 29-October-2014? There's definitely something going on here =)

Directory Tree Diff & Fuzzy Hashing

Now that we have unpacked & unsquashed the firmware, let's use binwally to compare the directory tree and find the needle in the haystack.

After googling the filename (openwrt-wrtsl54gs-squashfs.bin), we get three possible candidates:

- https://downloads.openwrt.org/whiterussian/0.9/default/openwrt-wrtsl54gs-squashfs.bin
- https://downloads.openwrt.org/whiterussian/0.9/micro/openwrt-wrtsl54gs-squashfs.bin
- https://downloads.openwrt.org/whiterussian/0.9/pptp/openwrt-wrtsl54gs-squashfs.bin

OpenWRT offers different builds for the same device because of constraints like limited flash size. Let's download these three candidates, unpack and compare them:

binwally.py ctf/_openwrt-wrtsl54gs-squashfs.bin.extracted/ micro/_openwrt-wrtsl54gs-squashfs.bin.extracted/

The "micro" build has the highest overall match score (99%); let's spot the differences:

binwally.py ctf/_openwrt-wrtsl54gs-squashfs.bin.extracted/ micro/_openwrt-wrtsl54gs-squashfs.bin.extracted/ | grep -E -v "ignored|matches"

After carefully reviewing these files, we notice that the "/etc/profile" was modified to include a call to the nc backdoor.
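binwally does the heavy lifting here, but the underlying idea is easy to sketch. The version below hashes every regular file in both trees with SHA-256 and, for files that differ, computes a rough similarity score with difflib. difflib is a stand-in I picked for ssdeep-style fuzzy hashing (ssdeep is not in the standard library), and it is only practical for small files such as /etc/profile.

```python
import difflib
import hashlib
import os


def file_digests(root):
    """Map each regular file (path relative to `root`) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks (e.g. busybox applets) and device nodes
            with open(path, "rb") as f:
                digests[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return digests


def similarity(a, b):
    """Rough 0-100 similarity of two files (difflib as a stand-in for ssdeep)."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        matcher = difflib.SequenceMatcher(None, fa.read(), fb.read(), autojunk=False)
    return round(100 * matcher.ratio())


def diff_trees(suspect, reference):
    """Compare two extracted filesystems and report added/removed/changed files."""
    a, b = file_digests(suspect), file_digests(reference)
    changed = {
        rel: similarity(os.path.join(suspect, rel), os.path.join(reference, rel))
        for rel in sorted(a.keys() & b.keys())
        if a[rel] != b[rel]
    }
    return {
        "only_in_suspect": sorted(a.keys() - b.keys()),
        "only_in_reference": sorted(b.keys() - a.keys()),
        "changed": changed,
    }
```

Usage would be something like diff_trees("ctf/_openwrt-wrtsl54gs-squashfs.bin.extracted", "micro/_openwrt-wrtsl54gs-squashfs.bin.extracted").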

The LZMA'd Kernel (offset 0x8F8) is the same on both images, even though binwally reports a difference. This happens because binwalk extraction doesn't know when to stop and both files also contain additional data like the SquashFS partition.

The backdoor located at "/bin/nc" is a simple bash script that checks the MD5 from "/etc/profile" and draws a Nyan Cat along with the challenge key. In order to get the proper key, we simply modify the file location to the relative path "./etc/banner", to avoid overlapping with the file from the original system.

After running the file, we get the key NCNdeadb6adec4c77a40c23e04770924d3c5b18face.

This was just too easy right? But what if we didn't have a known template for comparison?

Timeline Analysis

My tool of choice for timeline analysis is Plaso, created by @el_killerdwarf. The tool is Python-based, modular and very fast. What I like most about it is how easily it outputs results to ELK. If you don't know Plaso and the ELK stack, read this quick tutorial and set up your environment.

Let's use log2timeline to create a dump file, pointing to the extracted SquashFS path:

log2timeline.py output.dump squashfs-root/

Let's fire up psort and include data in the timeline:

psort.py -o elastic output.dump

That's all: Plaso uses the filestat parser to extract metadata from the files and outputs the results to Elasticsearch.

We already identified the following dates from the firmware:

03 February 2007 (??:??:??): BinHeader firmware creation date
03 February 2007 (13:16:08): Linux Kernel compile date
29 October 2014 (16:53:25): SquashFS creation or last append time
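Comparing these dates can be automated as a simple outlier test. A minimal sketch, assuming a 30-day tolerance around the median timestamp (an arbitrary threshold of mine) and midnight for the BinHeader date, since its time of day is unknown:

```python
from datetime import datetime
from statistics import median


def flag_outliers(stamps, max_days=30):
    """Return the partitions whose timestamp lies more than `max_days`
    away from the median timestamp of all partitions."""
    mid = median(ts.timestamp() for ts in stamps.values())
    limit = max_days * 86400
    return sorted(name for name, ts in stamps.items() if abs(ts.timestamp() - mid) > limit)


firmware_dates = {
    "binheader": datetime(2007, 2, 3),                # time of day unknown
    "kernel": datetime(2007, 2, 3, 13, 16, 8),
    "squashfs": datetime(2014, 10, 29, 16, 53, 25),
}
print(flag_outliers(firmware_dates))  # ['squashfs']
```

With the three dates above, only the SquashFS timestamp falls outside the window.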

First, let's filter the filesystem attributes: we only want to display the mtime (modified) timestamp, so we use Kibana's micro analysis on the timestamp_desc field to include that value. The filter should look something like this: field must | field timestamp_desc | query: "mtime".

The histogram view is very helpful to get a big picture of what's going on:

We can clearly see that the files included/modified on 2014-10-29 had a malicious nature. The ~~state sponsored~~ attacker did not modify other files from the OpenWRT base image.
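Without Plaso and ELK at hand, a rough version of that histogram can be built straight from the unpacked filesystem. This sketch assumes the extraction tool preserved the original mtimes, only looks at mtime via os.lstat (so busybox symlinks are not followed) and buckets entries per UTC day; it is a stand-in for the filestat parser output, not a replacement for it.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def mtime_histogram(root):
    """Count filesystem entries per modification day (UTC).

    Returns (Counter of day -> count, dict of day -> relative paths).
    """
    days = Counter()
    files_by_day = {}
    for dirpath, dirs, files in os.walk(root):
        for name in dirs + files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: report the link itself, not its target
            day = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).date().isoformat()
            days[day] += 1
            files_by_day.setdefault(day, []).append(os.path.relpath(path, root))
    return days, files_by_day
```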

At this point it is pretty clear that the firmware was modified using the OpenWRT Image Builder, which is a pre-compiled OpenWrt build environment. The BinHeader and the Kernel timestamps were left untouched and the only partition modified was the SquashFS one.

Of course these timestamps, like any kind of metadata, could be tampered with by the malicious hacker. However, they are very helpful during the initial phases, speeding up investigations and narrowing the analysis to a smaller set of data.

ELF Structural Information

I always get impressed when AV vendors manage to profile APT and State-sponsored attackers based on PE timestamps. Techniques like the imphash are generally used exclusively on Windows.

PE Imports are the functions that a piece of software calls from other files (typically DLLs). To track these imports, a hash is created based on library/API names and their specific order within the executable. Because of the way a PE’s import table is generated, we can use the imphash value to identify related malware samples, for example.
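For reference, the imphash recipe boils down to a few lines. This is a simplified sketch operating on an already-parsed, ordered list of (DLL, function) pairs: it lowercases the names, strips the .dll/.ocx/.sys extension, joins "lib.function" entries with commas and takes the MD5. It skips the ordinal-to-name lookup that pefile's get_imphash() performs, so hashes of binaries importing by ordinal won't match pefile's.

```python
import hashlib


def imphash(imports):
    """Simplified imphash over an ordered list of (dll, function) names."""
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        stem, dot, ext = lib.rpartition(".")
        if dot and ext in ("dll", "ocx", "sys"):
            lib = stem  # "kernel32.dll" -> "kernel32"
        parts.append("%s.%s" % (lib, func.lower()))
    # Order matters: the same imports in a different order give another hash.
    return hashlib.md5(",".join(parts).encode()).hexdigest()
```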

Everybody does that for Windows binaries, but what about Linux? VirusTotal recently included detailed ELF information in their engine. We can also use these sections to identify useful information from the binaries, including the toolchain used to compile them.

We generally don't have any timestamp information on the ELF section, but there are many other interesting fields. This quick guide on using strip summarizes some topics:

When an executable is produced from source code, there are two stages - compilation and linking. Compiling takes a source file and produces an object file. Linking concatenates these object files into a single executable. The concatenation occurs by section. For example, the .comment section for the final executable will contain the contents of the .comment section of each object file that was linked into the executable.

If we examine the contents of the .comment section we can see the compiler used, plus the version of the compiler

It's pretty simple to read and parse the .comment sections from ELF files. GNU readelf (part of binutils) and pyelftools include all the functions necessary to parse them.
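If you'd rather not depend on readelf or pyelftools, the section header table is simple enough to walk with struct. A minimal sketch that handles 32/64-bit and little/big-endian ELF files and returns the .comment strings; it ignores corner cases such as extended section numbering.

```python
import struct

SHT_NOBITS = 8  # sections like .bss occupy no bytes in the file


def elf_sections(data):
    """Map section names to their raw contents for a 32- or 64-bit ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                    # EI_CLASS: 1 = ELF32, 2 = ELF64
    e = "<" if data[5] == 1 else ">"       # EI_DATA: 1 = little, 2 = big endian
    if is64:
        hdr = struct.unpack_from(e + "HHIQQQIHHHHHH", data, 16)
        sh_fmt = e + "IIQQQQIIQQ"
    else:
        hdr = struct.unpack_from(e + "HHIIIIIHHHHHH", data, 16)
        sh_fmt = e + "IIIIIIIIII"
    shoff, shentsize, shnum, shstrndx = hdr[5], hdr[10], hdr[11], hdr[12]
    if shoff == 0 or shnum == 0:
        return {}                          # section header table stripped
    shdrs = [struct.unpack_from(sh_fmt, data, shoff + i * shentsize) for i in range(shnum)]
    # Section header fields used: [0] sh_name, [1] sh_type, [4] sh_offset, [5] sh_size
    names_off, names_size = shdrs[shstrndx][4], shdrs[shstrndx][5]
    names = data[names_off:names_off + names_size]
    sections = {}
    for sh in shdrs:
        if sh[1] == SHT_NOBITS:
            continue
        name = names[sh[0]:names.find(b"\0", sh[0])].decode("ascii", "replace")
        sections[name] = data[sh[4]:sh[4] + sh[5]]
    return sections


def comment_strings(data):
    """Return the NUL-separated strings stored in the .comment section, if any."""
    raw = elf_sections(data).get(".comment", b"")
    return [s.decode("ascii", "replace") for s in raw.split(b"\0") if s]
```

Run it over every file in squashfs-root (reading each with open(path, "rb")) and skip the ValueError raised for non-ELF files. With pyelftools, ELFFile(f).get_section_by_name(".comment") gives you the same section.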

I always try to display information from object files using different toolchains in order to find out which one understands the file structure properly. In this specific case, I'm going to use mipsel-linux-gnu-readelf (part of the Emdebian toolchain), but the regular readelf also does the job.

for i in $(find .) ; do echo $i ; mipsel-linux-gnu-readelf -p .comment $i ; done > comment-section.txt

./lib/modules/2.4.30/diag.o

String dump of section '.comment':
[ 1] GCC: (GNU) 3.4.4 (OpenWrt-1.0)

./lib/modules/2.4.30/switch-adm.o

String dump of section '.comment':
[ 1] GCC: (GNU) 3.4.4 (OpenWrt-1.0)

./lib/modules/2.4.30/switch-robo.o

String dump of section '.comment':
[ 1] GCC: (GNU) 3.4.4 (OpenWrt-1.0)

./lib/modules/2.4.30/switch-core.o

String dump of section '.comment':
[ 1] GCC: (GNU) 3.4.4 (OpenWrt-1.0)

./lib/modules/2.4.30/wlcompat.o

String dump of section '.comment':
[ 1] GCC: (GNU) 3.4.4 (OpenWrt-1.0)

Just a few ELF files included the comment section; others got stripped during the compilation/linking phase. If we download the OpenWRT 0.9 sources, we can see that GCC 3.4.4 was indeed used:

TheMoon Worm exploited a command injection to infect Linksys wireless routers with self-replicating malware. If we analyze its .comment section, we can see that it was probably compiled and linked using GCC 4.2.4 and 3.3.2. If we search for a .comment section on the E4200 router targeted by the worm, we can't find any reference, because the toolchain stripped all of them. Having a file compiled with a different toolchain, or containing extra ELF sections that other files don't, is highly suspicious.
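That heuristic can be turned into a quick filter. Given a mapping from file path to its .comment strings, the sketch below treats the most common toolchain set as the baseline and flags every file that deviates. Files without a .comment section are ignored, which is an assumption of mine: on a fully stripped firmware like the E4200 one, the mere presence of a .comment section is already the red flag, and this function won't catch that case.

```python
from collections import Counter


def toolchain_outliers(comments):
    """Flag files whose .comment toolchain strings differ from the majority.

    `comments` maps a file path to the list of strings found in its
    .comment section (an empty list when the section is missing).
    """
    present = {path: frozenset(strings) for path, strings in comments.items() if strings}
    if not present:
        return []
    baseline, _count = Counter(present.values()).most_common(1)[0]
    return sorted(path for path, toolchain in present.items() if toolchain != baseline)
```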

The .comment section for the final executable includes the contents of the .comment section of each object file that was linked into the executable. If we compare the comment section on ASUS RT-AC87U Firmwares v3.0.0.4.378.3885 and v3.0.0.4.376.2769, we can spot an extra line on the newer version from tfat.ko:

If you want to dump all sections from the ELF file you may use this command line (kind of hacky, but works):

for i in $(find .) ; do echo "$i" ; for j in $(readelf -S "$i" | grep \\[ | cut -d"]" -f2 | cut -d " " -f2 | grep -v "Name") ; do mipsel-linux-gnu-readelf -p "$j" "$i" ; done ; done > list.txt

The output will be a bit too verbose, you may want to narrow the analysis to the following sections:

.comment - contains version control information
.modinfo - displays information from a kernel module
.note - comments put there by the compiler/linker toolchain
.debug - contains information for symbol debugging
.interp - contains the name of the dynamic loader

For more information regarding the ELF file structure, check the elf(5) man page and Chapter 5 of the Malware Forensics Field Guide for Linux Systems.

Conclusion

Without further clues or context this information may not be relevant, but in conjunction with other data it helps to get a big picture of what's going on:

Diffing the content from previous firmwares may be useful to find out when backdoors were first installed, modified and/or removed.

Artifact timeline creation and analysis also helps speed up investigations by correlating the vast amount of information found on the system.

The contents from the ELF section will likely reveal the toolchain and the compiler version used to compile a suspect executable. Clues such as this are attribution identifiers, contributing towards identifying the platform used by the attacker to craft his code.

We can use the timestamps from the kernel partition to correlate different firmware images from the same family, for example. We can also compare the timestamps from each partition to find deviations: a firmware header created in 2007, a kernel compiled in 2007 and a SquashFS partition dated 2014 is highly suspicious.

The Firmware.RE project is performing a large scale analysis, providing a better understanding of the security issues related to firmwares. A broader view on firmwares is not only beneficial, but necessary to discover new vulnerabilities and backdoors, correlating different device families and showing how vulnerabilities reappear across different products. This is a really cool project to track how firmwares are evolving and getting security fixes.

All the media outlets are reporting that Embedded Malware is becoming mainstream. This is something totally new and we never heard of this before, right? The high number of Linux SOHO routers with Internet-facing administrative interfaces, the lack of firmware updates and the ease of crafting exploits make them a perfect target for online criminals. The Internet of Threats is wildly insecure, but definitely not unpatchable.

To all infosec people out there: it's important to understand these threats and report them properly to the media. Some top-notch researchers recently uncovered "Massive Botnets" infecting refrigerators, microwaves, gaming consoles, soda machines and tamagotchis. The problem is that they never provided any sort of evidence, no malware samples, no IOCs, and did not write a Hakin9 article describing it.

Refrigerator Botnet? Revd. Pastor Laphroaig says Show the PoC || GTFO

The aim of this post is to provide more information on identifying and executing embedded binaries, describing how to set up your own virtual lab. In case you missed it, head to the first post from the "Analyzing and Running binaries from Firmware Images" series.

TheMoon Worm

Johannes from SANS provided me with a sample of the "TheMoon" malware and posted some interesting information in their Handler's Diary. Their honeypots captured the scanning activity and linked the exploit to a vulnerable CGI script running on specific firmware versions of the following Linksys routers: E4200, E3200, E3000, E2500, E2100L, E2000, E1550, E1500, E1200, E1000, E900.

SANS handlers classified TheMoon as a Worm because of the self-replicating nature of the malware. The worm searches for a "HNAP1" URL to fingerprint and identify potentially vulnerable routers. If you check your firewall and server logs, you may find lots of different IPs probing this URL.
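A quick way to check for this probing in web server logs, assuming the Common/Combined Log Format used by Apache and nginx (firewall logs will need a different pattern):

```python
import re
from collections import Counter

# Client IP and request path from a Common/Combined Log Format line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)')


def hnap_probers(lines):
    """Count requests for /HNAP1 per client IP."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group(2).upper().startswith("/HNAP1"):
            hits[m.group(1)] += 1
    return hits
```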

The worm was named this way because it contains images from the movie "The Moon". It's possible to carve a few PNGs from inside the ELF binary:
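binwalk or foremost will carve these for you; the sketch below does it by hand, looking for the PNG signature and then walking the length/type/data/CRC chunk structure until the IEND chunk. It does not validate chunk CRCs.

```python
import struct

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def carve_pngs(blob):
    """Extract PNG images embedded in a binary by walking their chunks to IEND."""
    images = []
    start = blob.find(PNG_SIG)
    while start != -1:
        pos = start + len(PNG_SIG)
        end = None
        while pos + 8 <= len(blob):
            length, ctype = struct.unpack_from(">I4s", blob, pos)
            pos += 12 + length  # length field + type + data + CRC
            if ctype == b"IEND":
                end = pos
                break
        if end is not None and end <= len(blob):
            images.append(blob[start:end])
            start = blob.find(PNG_SIG, end)
        else:
            start = blob.find(PNG_SIG, start + 1)  # truncated or false positive
    return images
```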

Identifying the Binary

A total of seven different samples were provided: they all seem to be variants from the same malware due to the ssdeep matching score.

Let's start by running the file utility and readelf to identify the architecture (MIPS R3000 / Little Endian):

The EXr.pdf variant (MD5 88a5c5f9c5de5ba612ec96682d61c7bb) had a VirusTotal Detection Rate of 3 / 50 on 2014-02-16.
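file and readelf get the architecture from the first bytes of the ELF header. A minimal sketch reading EI_CLASS, EI_DATA and e_machine; the machine table only lists a few common values (EM_MIPS is 8):

```python
import struct

# A few e_machine values from the ELF specification.
EM_NAMES = {3: "x86", 8: "MIPS", 20: "PowerPC", 40: "ARM", 62: "x86-64"}


def elf_ident(data):
    """Summarize class, byte order and machine from an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[data[4]]               # EI_CLASS
    order = {1: "little", 2: "big"}[data[5]]      # EI_DATA
    fmt = "<H" if order == "little" else ">H"
    (machine,) = struct.unpack_from(fmt, data, 18)  # e_machine follows e_type at 16
    name = EM_NAMES.get(machine, "machine %d" % machine)
    return "ELF%d %s-endian %s" % (bits, order, name)
```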

QEMU

We'll be using QEMU to run the binaries in a controlled environment. I commonly use two different setups to run MIPS Linux binaries, both based on the Malta platform.

OpenWRT MIPS

OpenWRT's Malta CoreLV platform is intended to be used with QEMU (in big or little endian mode). The install procedure is pretty straightforward using OpenWRT Buildroot, the build system for the distribution, which works on Linux, BSD or Mac OS X. In case you don't remember, the authors of the Carna Botnet used it to cross-compile their binaries.

Installing prerequisites (on your favorite Debian Derivative):

Now head to the openwrt folder and set the proper settings for your Linux Kernel, choosing "MIPS Malta CoreLV board (qemu)" for the Target System and "Little Endian" for the subtarget. Don't forget to save the config.

Now build your image (use the -j switch to speed things up if you have multiple cores, e.g. "-j 3"):

Your image will be ready after a couple of minutes. Now you need to install QEMU full system emulation binaries and start it with the right command switches:

To exit the console simply hit CTRL+A followed by C and Q.

If you want to connect your emulated machine to a real network, follow the steps from Aurelien's Blog or simply run the following commands to get Internet access:

If you don't want to compile the Kernel by yourself, you can grab the pre-compiled binaries from here or here (at your own risk).

You may remember that it was not possible to run busybox-simet using the standalone qemu-mips-static. It's possible to fix that by manually patching QEMU or you can run it inside the proper virtual machine (OpenWRT Malta MIPS/Big Endian):

Debian MIPS Linux

I won't describe how to set up Debian MIPS Linux because Zach Cutlip already did an amazing job describing it in this blog post. The process is quite similar to the OpenWRT one, and if you're too lazy to build your own environment, Aurelien provides pre-compiled binaries here. Don't forget to set up your network connections properly.

Dynamic Analysis

In order to emulate the Linksys Environment, let's download and unpack the Firmware from E2500v2 (v1.0.07).

Let's copy and extract the root filesystem (e2500.tar.gz) and the malicious binary (EXr.pdf) to our test machine (Debian MIPS). Remember to copy the worm to the appropriate "/tmp" folder. Back up your QEMU image, start sniffing the connections on the bridged network (tap1 in my case) and bind the necessary pseudo-devices into the chrooted path. You can run the binary directly in your Debian MIPS environment, but using chroot and the target filesystem is highly recommended. If you try to chroot and run the worm without binding these devices, it will refuse to run and won't drop the second-stage binary.

You can use strace to log the syscalls and start your chrooted shell to run the malicious binary. I had some issues using strace on the 2.6.32 Debian MIPS Kernel (vmlinux-2.6.32-5-4kc-malta). The 3.2.0 (vmlinux-3.2.0-4-4kc-malta) version seems to be running fine.

If you don't want to use strace, simply start sh chrooted and run the malware:

The worm tries to remove files with certain extensions and performs a series of system checks. After a few seconds the binary is removed from /tmp/ and three files are written to disk: .L26 (PID), .L26.lunar (Lunar Base URL) and .L26.out (Debug log).
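These file names make a handy indicator when sweeping an extracted or mounted filesystem. A minimal sketch that only matches the three names reported above; confirm where the worm actually writes them from your own strace log.

```python
import os

# File names dropped by TheMoon, as observed in the dynamic analysis.
THEMOON_FILES = {".L26", ".L26.lunar", ".L26.out"}


def find_themoon_artifacts(root):
    """Return the paths under `root` whose file name matches a known artifact."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name in THEMOON_FILES:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```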

It's possible to dump QEMU's physical memory using the pmemsave command by hitting CTRL+A, C (to enter QEMU's administrative interface) and entering:

The 256MB raw dump will be saved on your host's local path. You can now try to use volatility or run strings against it.
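If you just want a quick look without Volatility, here is a minimal Python take on strings that also reports offsets, so hits can be located in the dump. Unlike GNU strings, it only treats space through tilde as printable.

```python
import re


def strings(data, min_len=4):
    """Yield (offset, text) for runs of at least `min_len` printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for m in pattern.finditer(data):
        yield m.start(), m.group().decode("ascii")


def grep_strings(data, needles, min_len=4):
    """Keep only the strings that contain any of the given substrings."""
    return [(off, s) for off, s in strings(data, min_len) if any(n in s for n in needles)]
```

For the 256MB dump, re also accepts an mmap object, so you can pass mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) instead of reading the whole file into memory.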

The worm starts scanning ports 80 and 8080 on a hardcoded list of networks. If the /HNAP1/ URL returns a string identifying one of the targeted routers, the malware sends an HTTP POST trying to exploit a command injection in the vulnerable CGI.

Decoded POST:

TheMoon will also start an HTTPS server ("Lunar Base") on the router using the random port identified on the .L26.lunar file. The certificate's Common Name, Organization and Organizational Unit are hardcoded and other values seem to be random. Trying to find these entries on scans.io SSL certificates datasets would be really interesting.

The HTTPS server hosts three files: gerty.png, lunar.png and favicon.ico:

Rkhunter reports a few warnings on the infected system. I have uploaded the complete output from rkhunter to Pastebin, get it here.

Another useful technique is to compare the contents from the filesystem with a known good template. You can use binwally, WinMerge or binwalk's hashmatch.

Conclusion

I did not spend much time reversing the files and their functions, as the main purpose of this post was to provide information to identify and execute embedded binaries, describing how to set up your own virtual lab using QEMU.

It's still possible to improve the analysis by faking the nvram, running a GDB server with QEMU or using Volatility with the proper profile and debugging structures, but this post is already way too long. You should also have a look at Avatar, from EURECOM. Avatar's goal is to enable complex dynamic analysis of embedded firmware in order to assist in a wide range of security-related activities, including malware analysis, reverse engineering and vulnerability discovery.

Let's keep drawing public awareness to the security issues of the Internet of Threats, persuading manufacturers, ISPs and end users to collaborate in addressing these problems.

w00tsec

Monday, February 9, 2015

Firmware Forensics: Diffs, Timelines, ELFs and Backdoors

Monday, February 17, 2014

Analyzing Malware for Embedded Devices: TheMoon Worm