Monday, February 9, 2015

Firmware Forensics: Diffs, Timelines, ELFs and Backdoors

This post covers some common techniques that I use to analyze and reverse firmware images. These techniques are particularly useful to dissect malicious firmwares, spot backdoors and detect unwanted modifications.

Backdooring and re-flashing firmware images is becoming mainstream: malicious guys are infecting embedded devices and inserting trojans in order to achieve persistence. Recent articles covered the increasing number of trojanized android firmwares and routers that are being permanently modified.

Attackers with a privileged network position may MITM your requests and forge fake updates containing malicious firmwares. Writing Evilgrade modules for this is really simple, as most vendors keep failing to deliver updates securely, right ASUS?

All your HTTP packets are belong to us...
Older versions of ASUS firmwares were vulnerable to MITM attacks (CVE-2014-2718) because it transmitted updates over HTTP and there were no security/signature checks. ASUS silently patched the issue on and they're now verifying RSA signatures via /sbin/rsasign_check.:

Valid signature -> nvram_set("rsasign_check", "1")

NoConName 2014 CTF Finals: Vodka

I'll keep my tradition of writing posts based on CTF challenges because everybody upvotes CTF posts on reddit it's cool.

The challenge "Vodka", from NoConName 2014 CTF Finals was created by @MarioVilas, who kindly provided the files here (thanks dude!).

I did not participate on the CTF finals, but I found the challenge really interesting because there were many different ways to solve it, summarizing the actions needed to audit a compromised firmware. In my opinion, the best CTF challenges are the ones that require us to develop/use new techniques and improve existing tools.

NoConName 2014 Finals: Vodka
Challenge Category: Forensics
Description: No hints :( just get the flag.

This challenge description is not very intriguing, so I hired a couple of marketing specialists to design a new logo add some Infosec drama and reformulate it:

A mysterious bug affected one of the core routers at a major Internet service provider in Syria. The failure of this router caused the whole country to suddenly lose all connection to the Internet. The Syrian government recorded a traffic capture right before the crash and hired you to perform a forensic analysis.

Download provided:

Network Forensics

The download provided is a packet capture using the PCAP-NG format. Wireshark is too mainstream, so let's convert the PCAP-NG to PCAP and open it using Network Miner:

Network Miner makes it very easy for us to understand what's going on: there's some sort of file transfer via TFTP and the filename seems to be related to an OpenWRT firmware image.

Firmware structure

We always binwalk all the things but very few people stop to analyze and understand the firmware structure properly. We know that the firmware image was downloaded using TFTP, a common way used by many routers to transfer config files/updates and it is probably based on the OpenWRT project.

So what does binwalk tell us?

The Commom Firmware Environment (CFE) is a firmware interface/bootloader present on Broadcom SOCs. It is analogous to the BIOS on PC platforms and it is responsible for CPU initialization and bootstrap code on embedded processors. The CFE is also referred as PMON and it is generally mapped to mtd0.

The JFFS2/NVRAM filesystem is the non-volatile partition. They store all the configuration parameters, including router settings, passwords and logs.

Bear in mind firmware updates generally do not include the CFE/NVRAM partition. You can access the CFE console using serial and you can also dump them on a live system using DD or via SPI. Let's focus on the firmware sections included on the provided image (openwrt-wrtsl54gs-squasfs.bin):

TRX (Offset 0x20)

The TRX header is just an encapsulation, describing a series of information from the firmware, including the image size, CRC, flags, version information and partition offsets. Binwalk wasn't recognizing the header and the relative offsets properly so I submitted these two pull requests. Creating custom signatures for binwalk is pretty straightforward.

Some firmwares (like the newer ones from ASUS and Netgear) use this TRX structure but don't include a loader: the Linux Kernel and the RootFS may be shifted on this occasion.

If the firmware includes any extra header before the TRX, you have to sum their size with the displayed partition offsets in order to find the real values. Some firmwares for SOHO modems out there won't include it, so these values should be right on most cases. The downloaded OpenWRT image had the following offsets:

  • Loader: 0x20 + 0x1C = 0x3C
  • Kernel:  0x20 + 0x8D8 = 0x8F8
  • RootFS: 0x20 + 0x7E400 = 0x7E420

In this specific case, we have a BinHeader right before the TRX, indicating the board ID, the FW Date and the Hardware Date. The struct is described on cyutils.h:

This extra header appears on a few routers like the WRT54G series: the Web GUI checks for this pattern before actually writing the firmware.

We are particularly interested on the fwdate field (Firmware Date), composed by the hex values 07 02 03. According to addpattern.c, the first byte defines the year, the second one is the month and the third byte refers to the day the firmware was created. The fwdate seems to be 03-February-2007, save that for later, we will need that =)

GZ'd LZMA Loader (Offset 0x3C)

According to OpenWRT Wiki, the boot loader has no concept of filesystems: it assumes that the start of the TRX data section is executable code.

The boot loader boots into an LZMA program which decompresses the kernel into RAM and executes it. It turns out the boot loader does know gzip compression, so we have a gzip-compressed LZMA decompression program at 0x3C.

You can find the source code for this lzma-loader here and here. Note the TEXT_START offset at 0x80001000: we may need to adjust the Loading Address on our Disassembler in order to reverse the compiled loader. Don't forget to decompress it (gunzip) before reversing the file.

Most embedded toolchains would strip the binaries in order to reduce the firmware size. If you want to reverse a friendlier version of the loader, grab the latest OpenWRT ImageBuilder and search for loader.elf:

Woohoo, blue code =)

Note that if we modify the loader to include a backdoor, we would have our very own Router Bootkit, cool isn't it?

LZMA'd Kernel (Offset 0x8F8)

Instead of just putting a kernel directly onto flash, most embedded devices compress the kernel using LZMA. The boot loader boots into an LZMA program which decompresses the kernel into RAM and executes it.

Binwalk has a signature to find Kernel strings in raw Linux Kernels. The identified string lists the toolchain used to compile the Kernel, as well as the compiled date and version information:

And why did binwalk manage to find all these information from the Kernel? The answer can be found on the toolchain's Makefile:

If we follow the steps from my previous post we can build a customized Kernel for OpenWRT. The generated vmlinux is generally an ELF file, but in our case, the object was stripped using objcopy:

Did you notice the compile date was 03-February-2007? Let's save that for later as well.

SquashFS (Offset 0x72420)

The last part is the actual filesystem. Most embedded Linux devices use SquashFS and many vendors hack it in order to get better compression and faster performance. Hopefully we don't have to worry about that as Sasquatch handles different SquashFS header/compression formats.

The filesystem has the standard OpenWRT directories and files, including a banner from the 0.9 build (White Russian).

Both binwalk and sasquatch display the SquashFS superblock information, including the creation/last append time:

Did you spot the date 29-October-2014? There's definitely something going on here =)

Directory Tree Diff & Fuzzy Hashing

Now that we have unpacked & unsquashed the firmware, let's use binwally to compare the directory tree and find the needle in the haystack.

After googling the filename (openwrt-wrtsl54gs-squashfs.bin), we get three possible candidates:

OpenWRT offers different builds for the same device because of constraints like limited flash size. Let's download these three candidates, unpack and compare them: ctf/_openwrt-wrtsl54gs-squashfs.bin.extracted/ micro/_openwrt-wrtsl54gs-squashfs.bin.extracted/

The "micro" build has the highest overall match score (99%), let's spot the differences: ctf/_openwrt-wrtsl54gs-squashfs.bin.extracted/ micro/_openwrt-wrtsl54gs-squashfs.bin.extracted/ | grep -E -v "ignored|matches"

After carefully reviewing these files, we notice that the "/etc/profile" was modified to include a call to the nc backdoor.

The LZMA'd Kernel (offset 0x8F8) is the same on both images, even though binwally reports a difference. This happens because binwalk extraction doesn't know when to stop and both files also contain additional data like the SquashFS partition.

The backdoor located at "/bin/nc" is a simple bash script that checks the MD5 from "/etc/profile" and draws a Nyan Cat along with the challenge key. In order to get the proper key, we simply modify the file location to the relative path "./etc/banner", to avoid overlapping with the file from the original system.

After running the file, we get the key NCNdeadb6adec4c77a40c23e04770924d3c5b18face.

This was just too easy right? But what if we didn't have a known template for comparison?

Timeline Analysis

My tool of choice to perform timeline analysis is Plaso, created by @el_killerdwarf. The tool is python-based, modular and very fast. What I like most about it is the ease to output results to ELK. If you don't know about Plaso and the ELK stack, read this quick tutorial and set up your environment.

Let's use log2timeline to create a dump file, pointing to the extracted SquashFS path: output.dump squashfs-root/

Let's fire up psort and include data in the timeline: -o elastic output.dump

That's all, Plaso uses the filestat parser to extract metadata from the files, outputting results to Elasticsearch.

We already identified the following dates from the firmware:

  • 03 February 2007 (??:??:??): BinHeader firmware creation date
  • 03 February 2007 (13:16:08): Linux Kernel compile date
  • 29 October  2014 (16:53:25): SquashFS creation or last append time

First let's filter the filesystem attributes: we just want to display the mtime (modified) timestamp, so we are going to perform a micro analysis to include the value. The filter should be something like this: field must | field timestamp_desc | query: "mtime".

The histogram view is very helpful to get a big picture of what's going on:

We can clearly see that the files included/modified on 2014-10-29 had a malicious nature. The state sponsored attacker did not modify other files from the OpenWRT base image.

At this point it is pretty clear that the firmware was modified using the OpenWRT Image Builder, which is a pre-compiled OpenWrt build environment. The BinHeader and the Kernel timestamps were left untouched and the only partition modified was the SquashFS one.

Of course these timestamps, like any kind of metadata, could be tampered by the malicious hacker. However, they are very helpful during the initial phases, speeding up investigations and narrowing the analysis to a smaller set of data.

ELF Structural Information

I always get impressed when AV vendors manage to profile APT and State-sponsored attackers based on PE timestamps. Techniques like the imphash are generally used exclusively on Windows.

PE Imports are the functions that a piece of software calls from other files (typically DLLs). To track these imports, a hash is created based on library/API names and their specific order within the executable. Because of the way a PE’s import table is generated, we can use the imphash value to identify related malware samples, for example.

Everybody does that for Windows binaries but what about Linux? Virustotal recently included detailed ELF information on their engine. We can also use these sections to identify useful information from the binaries, including the toolchain used to compile them.

We generally don't have any timestamp information on the ELF section, but there are many other interesting fields. This quick guide on using strip summarizes some topics:

When an executable is produced from source code, there are two stages - compilation and linking. Compiling takes a source file and produces an object file. Linking concatenates these object files into a single executable. The concatenation occurs by section. For example, the .comment section for the final executable will contain the contents of the .comment section of each object file that was linked into the executable.
If we examine the contents of the .comment section we can see the compiler used, plus the version of the compiler
It's pretty simple to read and parse the .comment sections from ELF files. GNU readelf (part of binutils) and pyelftools include all the necessary functions parse them.

I always try to display information from object files using different toolchains in order to find out which one understands the file structure properly. On this specific case, I'm going to use mipsel-linux-gnu-readelf (part of Emdebian toolchain), but the regular readelf also does the job.

for i in $(find .) ; do echo $i ; mipsel-linux-gnu-readelf -p .comment $i ; done > comment-section.txt

String dump of section '.comment':
  [     1]  GCC: (GNU) 3.4.4 (OpenWrt-1.0)


String dump of section '.comment':
  [     1]  GCC: (GNU) 3.4.4 (OpenWrt-1.0)


String dump of section '.comment':
  [     1]  GCC: (GNU) 3.4.4 (OpenWrt-1.0)


String dump of section '.comment':
  [     1]  GCC: (GNU) 3.4.4 (OpenWrt-1.0)


String dump of section '.comment':
  [     1]  GCC: (GNU) 3.4.4 (OpenWrt-1.0)

Just a few ELF files included the comment section, others got stripped during the compilation/linking phase. If we download OpenWRT 0.9 sources we can see that GCC 3.4.4 was indeed used:

TheMoon Worm exploited a command injection to infect Linksys wireless routers with a self-replicating malware. If we analyze its .comment section, we can see that it was probably compiled and linked using GCC 4.2.4 and 3.3.2. If we search for a .comment section on the router E4200, targeted by the worm, we can't find any reference because the toolchain stripped all of them. Having a file compiled with a different toolchain or containing extra ELF sections (that others files don't) is something highly suspicious.

The .comment section for the final executable includes the contents of the .comment section of each object file that was linked into the executable. If we compare the comment section on ASUS RT-AC87U Firmwares v3. and v3., we can spot an extra line on the newer version from tfat.ko:

If you want to dump all sections from the ELF file you may use this command line (kind of hacky, but works):

for i in $(find .) ; do echo "$i" ; for j in $(readelf -S "$i" | grep \\[ | cut -d"]" -f2 | cut -d " " -f2 | grep -v "Name") ; do mipsel-linux-gnu-readelf -p "$j" "$i" ; done ; done > list.txt

The output will be a bit too verbose, you may want to narrow the analysis to the following sections:

  • .comment - contains version control information
  • .modinfo - displays information from a kernel module
  • .notes - comments put there by the compiler/linker toolchain
  • .debug - contains information for symbol debugging
  • .interp - contains the name of the dynamic loader

For more information regarding the ELF file structure, check the ELF man and the Chapter 5 from Malware Forensics Field Guide for Linux Systems.


Without further clues or context these information may not be relevant, but in conjunction with other data they're helpful to get a big picture of what's going on:

  • Diffing the content from previous firmwares may be useful to find out when backdoors were first installed, modified and/or removed.

  • Artifact timeline creation and analysis also helps to speed up investigations by correlating the vast amount of information found on system.

  • The contents from the ELF section will likely reveal the toolchain and the compiler version used to compile a suspect executable. Clues such as this are attribution identifiers, contributing towards identifying the platform used by the attacker to craft his code.

We can use the timestamps from the kernel partition to correlate different firmwares from the same family, for example. We can also compare the timestamps from each partition to find deviations: a firmware header created on 2007, with a Kernel timestamp from 2007 and a SquashFS partition dated to 2014 is highly suspicious.

The Firmware.RE project is performing a large scale analysis, providing a better understanding of the security issues related to firmwares. A broader view on firmwares is not only beneficial, but necessary to discover new vulnerabilities and backdoors, correlating different device families and showing how vulnerabilities reappear across different products. This is a really cool project to track how firmwares are evolving and getting security fixes.


  1. After extracted, I got these files:
    3C 7E420.squashfs 8F8 8F8.7z

    How to use gunzip to decompress 3C file ?

    1. Binwalk seems to be extracting gzip compressed data automatically, so you don't need to gunzip it:

      $ binwalk -Me openwrt-wrtsl54gs-squashfs.bin

      Scan Time: 2015-02-21 17:11:18
      Target File: openwrt-wrtsl54gs-squashfs.bin
      MD5 Checksum: 89eb04626aef962d5d15d584c500a7ab
      Signatures: 328

      0 0x0 BIN-Header, board ID: W54U, hardware version: 4702, firmware version: 2.8.8, build date: 2007-02-03
      32 0x20 TRX firmware header, little endian, image size: 1323008 bytes, CRC32: 0x6CAC483, flags: 0x0, version: 1, header size: 28 bytes, loader offset: 0x1C, linux kernel offset: 0x8D8, rootfs offset: 0x7E400
      60 0x3C gzip compressed data, maximum compression, from Unix, NULL date (1970-01-01 00:00:00)
      2296 0x8F8 LZMA compressed data, properties: 0x6D, dictionary size: 8388608 bytes, uncompressed size: -1 bytes
      517152 0x7E420 Squashfs filesystem, little endian, version 2.1, size: 805671 bytes, 269 inodes, blocksize: 65536 bytes, created: 2014-10-29 18:53:25

      $ binwalk _openwrt-wrtsl54gs-squashfs.bin.extracted/3C


      $ hexdump -C -s 0x3C openwrt-wrtsl54gs-squashfs.bin | head
      0000003c 1f 8b 08 00 00 00 00 00 02 03 a5 56 41 6c 1b 69 |...........VAl.i|
      0000004c 15 fe fc cf 24 71 d2 b8 4c 1c b7 9a 96 aa 9a bf |....$q..L.......|

      $ hexdump -C 3C | head
      00000000 30 80 0a 3c 00 00 44 21 00 80 05 3c 30 11 a5 24 |0..<..D!...<0..$|
      00000010 00 80 06 3c 78 1e c6 24 00 00 a8 8c 00 00 88 ac |...<x..$........|

    2. I used mipsb processor in IDA to open this file but IDA couldn't decode the instructions. What is the correct processor type of this ?

    3. You can use these checks to find the correct processor:

      $ binwalk 3C -A

      972 0x3CC MIPSEL instructions, function epilogue
      1204 0x4B4 MIPSEL instructions, function epilogue
      1324 0x52C MIPSEL instructions, function epilogue
      1496 0x5D8 MIPSEL instructions, function epilogue
      1776 0x6F0 MIPSEL instructions, function epilogue
      1908 0x774 MIPSEL instructions, function epilogue
      2056 0x808 MIPSEL instructions, function epilogue
      2152 0x868 MIPSEL instructions, function epilogue
      2360 0x938 MIPSEL instructions, function epilogue
      2576 0xA10 MIPSEL instructions, function epilogue
      3696 0xE70 MIPSEL instructions, function epilogue

      $ binwalk 3C --disasm -v
      Scan Time: 2015-02-23 00:05:57
      Target File: 3C
      MD5 Checksum: 3108cdb7f0ab697feff245032b282109

      908 0x38C MIPS executable code, 32/64-bit, little endian, at least 699 valid instructions
      908 0x38C lw $v0, 0x94($sp)
      912 0x390 j 0x30024c
      916 0x394 addu $v1, $v1, $v0
      920 0x398 ori $v0, $v0, 0x1000
      924 0x39C jalr $v0
      928 0x3A0 nop

      Try to disassemble this loader before (in ELF format) ->