Substrate - hooking C on Android

Substrate is one of the best dynamic instrumentation frameworks. It is very flexible and lets you easily hook Java, Objective-C or native C/C++ code in your Android or iOS apps. It can even hook non-exported functions, but that’s a matter for another blog post.

I use Substrate quite a lot during mobile app security testing, along with various other tools. This post is the second of a two-part walkthrough on hooking C code on the iOS and Android platforms using Substrate. It aims to give you a start-to-finish demonstration of how you can hook C functions on Android. Check out the post on iOS here: https://hexplo.it/substrate-hooking-native-code-iosandroid/

Of course, Substrate is by no means the only tool that offers hooking. Plain old library injection can achieve similar results, and there are newer, shinier options such as Frida. Each comes with its pros and cons, for example if you prefer library injection to ptracing. This post is not about comparing hooking approaches.

Hooking C functions on Android

Preparing the environment

I’m going to assume you’re using Linux for building your app code and Substrate modules. Windows or Mac can also be used. You also need a rooted device to load your Substrate modules onto.

PRE-REQ: Substrate on the device

We will be installing CydiaSubstrate on your rooted device and using it to perform our hooking. It can be easily installed via the apk, but I’m not going to document that in this article. Please note that an emulator can be used instead of a physical device. However, installing Substrate on an emulator is tricky. Another blog post will follow for that part.

PRE-REQ: Android SDK & NDK

You will need the Android SDK, which can be obtained from the Android Developer Site. You will also need the Android NDK, which can be obtained from the Android Developer Site as well. As we’ll be using command line tools only, an IDE is not necessary for following this blog post. You’ll also need to install ant to build apps easily.

PRE-REQ: Substrate SDK headers

Substrate modules are built in your development environment and then loaded onto your device, just like Android applications. In order to build them you are going to need the Substrate SDK headers. Instructions on how to install those are found here.

Make sure Substrate works

It’s a good idea to double-check that Substrate modules work before proceeding any further. You can easily do so by installing a harmless Substrate module such as Violet.apk, which just changes the font color of some system menus to pink.

Create a target application

Let’s create a small Android application that includes a native C library. We’ll call this the “target app” (as in hooking target). Once this Android application is loaded, it will execute a function from the included native library which prints a message. The application name will be targetApp1 and the package name is io.koz.targetApp1

You can download the full code for this app (targetApp1) here

First, create a new folder and, in it, the project structure, using the following command (this assumes your paths are set up correctly):

android create project --target android-22 --path targetApp1 --package io.koz.targetApp1 --activity targetApp1

Next comes the source code of the native C library. Create a directory named jni inside targetApp1. Save the following code as targetLib.c inside the jni folder.

#include <string.h>
#include <jni.h>
#include <android/log.h>
#include <unistd.h>
#include <stdarg.h>
#include <stdio.h>

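#define  LOGI(...)  __android_log_print(ANDROID_LOG_INFO, "targetApp1-native", __VA_ARGS__)

/* Lives in our own library; this is the function we will hook later. */
int getAge(void)
{
    LOGI("[i] Verbose - getAge located at %p\n", (void *)&getAge);
    return 21;
}

/* JNI entry point for targetApp1.doThings() on the Java side. */
jstring
Java_io_koz_targetApp1_targetApp1_doThings( JNIEnv* env,
                                                  jobject thiz )
{
    /* arc4random() is implemented in bionic, not in this library. */
    int r = arc4random() % 10000;
    LOGI("[+] John Smith is %d years old.\n", getAge());
    LOGI("[+] The totally reliable random seed is: %d\n\n", r);
    return 0;
}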

As you can see, the doThings() function, when run by the Java part of the app, prints two statements to the system log: one containing the integer 21, returned from the getAge() function, and one containing the result of a call to arc4random(). The implementation of arc4random() lives inside the bionic standard library (on the system), while the implementation of getAge() is inside our native library.

This is just the native library part of the target application. To build it, you need this Android.mk file to be placed in the jni folder with the following contents:

LOCAL_PATH := $(call my-dir)
include $(CLEAR_VARS)
LOCAL_MODULE    := targetLib
LOCAL_SRC_FILES := targetLib.c
LOCAL_LDLIBS := -L$(SYSROOT)/usr/lib -llog
include $(BUILD_SHARED_LIBRARY)

You will also need an Application.mk file in your jni folder to build the library for both ARM and x86 architectures. However, please note that the contents of this blog post have only been tested on ARM, so let me know if there are issues when using x86.

APP_ABI := armeabi,armeabi-v7a,x86

We also need the Java part of the application. Save the following code segment in your target app folder under src/io/koz/targetApp1/targetApp1.java (replace the contents of this file if it already exists):

package io.koz.targetApp1;

import android.app.Activity;
import android.util.Log;
import android.view.View;
import android.widget.Button;
import android.widget.TextView;
import android.os.Bundle;

public class targetApp1 extends Activity
{
    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState)
    {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        final Button button = (Button) findViewById(R.id.button);
        button.setOnClickListener(new View.OnClickListener() {
            public void onClick(View v) {
                doThings();
            }
        });
    }
    public native String doThings();
    static {
        System.loadLibrary("targetLib");
    }
}

Add a button by putting the following code in res/layout/main.xml:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:orientation="vertical"
    android:layout_width="fill_parent"
    android:layout_height="fill_parent"
    >
<TextView
    android:layout_width="fill_parent"
    android:layout_height="wrap_content"
    android:text="Hello World, targetApp1"
    />
<Button
    android:id="@+id/button"
    android:layout_width="wrap_content"
    android:layout_height="wrap_content"
    android:text="Click this to run doThings()"
    />
</LinearLayout>          

Build and run it:

First, ndk-build then ant debug. Install it using adb install bin/targetApp1-debug.apk

Here are two example runs (use adb logcat to view the print statements)

I/targetApp1-native( 2256): [i] Verbose - getAge located at 0x4a777d4d
I/targetApp1-native( 2256): [+] John Smith is 21 years old.
I/targetApp1-native( 2256): [+] The totally reliable random seed is: 9942
I/targetApp1-native( 2256):
I/targetApp1-native( 2256): [i] Verbose - getAge located at 0x4a777d4d
I/targetApp1-native( 2256): [+] John Smith is 21 years old.
I/targetApp1-native( 2256): [+] The totally reliable random seed is: -2411

Let’s verify a few things about the library by running two commands.

  • $ nm -aDC --defined-only libs/armeabi-v7a/libtargetLib.so

output (ignore all starting with ‘_’)

00000d4d T getAge
00000d75 T Java_io_koz_targetApp1_targetApp1_doThings

This command shows which symbols are exported from the binary. Other libraries or applications can call these functions and thereby run the implementation provided by the library. The Java_io_koz_targetApp1_targetApp1_doThings symbol is the one the system calls when it executes the Java function doThings().

The following command shows which symbols the binary needs in order to run but does not define itself, because their implementation lives in a different library, such as bionic.

  • $ nm -aDCu libs/armeabi-v7a/libtargetLib.so

output (check only the Undefined symbols)

         U abort
         U arc4random
         U memcpy
         U raise

You’ll notice that the library calls arc4random(), an undefined symbol.

Create a Substrate Module

Now that the target app is up and running, we’ll create a Substrate module that “attacks” the target app. We will attempt to hook the getAge() and arc4random() functions at runtime and change the value they return.

Hooking external functions

Creating a Substrate Module is similar to creating an android application with a native library.

You can download the full code for this app (nativeHook1) here

android create project --target android-22 --name nativeHook1 --package io.koz.nativeHook1 --path nativeHook1 --activity nativeHookActivity1

In the nativeHook1 folder that was created, the create project command should have created an AndroidManifest.xml file. Replace this with the following:

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
      android:installLocation="internalOnly"
      package="io.koz.nativeHook1"
      android:versionCode="1"
      android:versionName="1.0">
    <application android:label="@string/app_name" android:icon="@drawable/ic_launcher" android:hasCode="false" />
    <uses-permission android:name="cydia.permission.SUBSTRATE" />
</manifest>

Create a folder named jni. In it, create an Android.mk file:

LOCAL_PATH := $(call my-dir)
include $(CLEAR_VARS)
SUBSTRATE_LIB_PATH_ARM := [your/path/to/android/sdk]/extras/saurikit/cydia_substrate/lib/armeabi
SUBSTRATE_LIB_PATH_x86 := [your/path/to/android/sdk]/extras/saurikit/cydia_substrate/lib/x86
LOCAL_MODULE    := nativeHook1.cy
LOCAL_SRC_FILES := nativeHook1.cy.cpp
LOCAL_LDLIBS := -L$(SUBSTRATE_LIB_PATH_ARM) -L$(SUBSTRATE_LIB_PATH_x86) -lsubstrate -lsubstrate-dvm -llog
LOCAL_C_INCLUDES := [your/path/to/android/sdk]/extras/saurikit/cydia_substrate
include $(BUILD_SHARED_LIBRARY)

Adjust the paths above as needed for the location of your SDK.

Also create an Application.mk file with the following contents:

APP_ABI := armeabi,armeabi-v7a,x86

Finally, the code of the module in jni/nativeHook1.cy.cpp:

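#include <stdlib.h>
#include <android/log.h>
#include <substrate.h>

#define LOG_TAG "SUBhook"

#define LOGI(...)  __android_log_print(ANDROID_LOG_INFO, LOG_TAG, __VA_ARGS__)

// Thin wrapper around Substrate's MSHookFunction().
void cigi_hook(void *orig_fcn, void* new_fcn, void **orig_fcn_ptr)
{
    MSHookFunction(orig_fcn, new_fcn, orig_fcn_ptr);
}

// Load this module into app_process (zygote).
MSConfig(MSFilterExecutable, "/system/bin/app_process")

// Filled in by the hook with a pointer to the original implementation.
int (*original_arc4random)(void);

// Replacement: every caller now gets the same "random" value.
int replaced_arc4random(void)
{
    return 1234;
}

// Runs when the module is loaded.
MSInitialize {
    cigi_hook((void *)arc4random,(void*)&replaced_arc4random,(void**)&original_arc4random);
}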

This code replaces the arc4random() function so that it always returns 1234. This is the easy part, because arc4random() is a function of the standard Android libraries and is not defined in the target library code. Thus, there is no need to know the address of the symbol: it is resolved automatically, as the function is exported by bionic.
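To make the three arguments of the hook call more concrete, here is a small sketch that builds with any desktop C compiler. It does not use Substrate at all: it only mimics the same contract with a plain function-pointer slot, whereas MSHookFunction() patches the target function itself. The names random_slot, real_random, fake_hook and replaced_random are made up for this illustration.

```c
#include <stddef.h>

typedef int (*int_fn)(void);

/* Stand-in for the function we want to intercept. */
static int real_random(void)
{
    return 4;
}

/* Hypothetical call site: every caller goes through this slot. */
static int_fn random_slot = real_random;

/* Same argument order as cigi_hook(): target, replacement, and an
 * out-pointer that receives the original implementation. */
static void fake_hook(int_fn *slot, int_fn replacement, int_fn *original_out)
{
    if (original_out != NULL)
        *original_out = *slot;   /* remember the original */
    *slot = replacement;         /* redirect future calls */
}

/* Filled in by fake_hook(), like original_arc4random in the module. */
static int_fn original_random;

/* A replacement that still calls the original and adjusts its result. */
static int replaced_random(void)
{
    return original_random() + 1230;
}
```

After fake_hook(&random_slot, replaced_random, &original_random), a call to random_slot() lands in the replacement, which can still reach the old behaviour through original_random. The module above never calls original_arc4random, but that saved pointer is what you would use if you only wanted to tweak the real result rather than replace it.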

We build the module using ndk-build and ant debug, and install it on the device with adb install bin/nativeHook1-debug.apk. Then, a soft reset of Substrate is required for the new module to be loaded. This can be done either via the relevant button in the Substrate application or by using the following command: adb shell setprop ctl.restart zygote

Results:

I/targetApp1-native( 2814): [i] Verbose - getAge located at 0x4a77dd4d
I/targetApp1-native( 2814): [+] John Smith is 21 years old.
I/targetApp1-native( 2814): [+] The totally reliable random seed is: 1234

I/targetApp1-native( 2814): [i] Verbose - getAge located at 0x4a77dd4d
I/targetApp1-native( 2814): [+] John Smith is 21 years old.
I/targetApp1-native( 2814): [+] The totally reliable random seed is: 1234

Hooking internal exported functions

Now, we will attempt to hook getAge(). In this case, getAge() is an exported symbol, as there are no visibility options set and it is not static. Any other application can call this symbol by referencing its address, which can be looked up using the dlopen()/dlsym() family of functions.

You can download the full code for this app (nativeHook2) here

Let’s change our hooking code to the following:

#include <stdlib.h>
#include <dlfcn.h>
#include <android/log.h>
#include <substrate.h>
#include <stdio.h>

#define LOG_TAG "SUBhook"

#define LOGI(...)  __android_log_print(ANDROID_LOG_INFO, LOG_TAG, __VA_ARGS__)

void cigi_hook(void *orig_fcn, void* new_fcn, void **orig_fcn_ptr)
{
    MSHookFunction(orig_fcn, new_fcn, orig_fcn_ptr);
}
MSConfig(MSFilterExecutable, "/system/bin/app_process")

int (*original_getAge)(void);
int replaced_getAge(void) {
    return 99;
}
int (*original_arc4random)(void);
int replaced_arc4random(void)
{
    return 1234;
}

// Load the library into the current process and resolve an exported symbol.
void* lookup_symbol(const char* libraryname, const char* symbolname)
{
    void *imagehandle = dlopen(libraryname, RTLD_GLOBAL | RTLD_NOW);
    if (imagehandle == NULL) {
        LOGI("(lookup_symbol) dlerror: %s", dlerror());
        return NULL;
    }
    void *sym = dlsym(imagehandle, symbolname);
    if (sym == NULL) {
        LOGI("(lookup_symbol) dlsym didn't work");
    }
    return sym;
}

MSInitialize {
    cigi_hook((void *)arc4random,(void*)&replaced_arc4random,(void**)&original_arc4random);
    void * getAgeSym = lookup_symbol("/data/data/io.koz.targetApp1/lib/libtargetLib.so","getAge");
    // Only hook if the symbol was actually found.
    if (getAgeSym != NULL) {
        cigi_hook(getAgeSym,(void*)&replaced_getAge,(void**)&original_getAge);
    }
}

What is going on here? We use a lookup_symbol() function to look up the address of the getAge() function inside the libtargetLib.so library, shipped with our target application. Note that our Substrate module first runs inside the context of app_process, also known as zygote. This is the initial process from which all Android applications are forked. This process does not have the libtargetLib.so library loaded into its address space, so a normal direct lookup for the symbol would not work. Instead, we manually load the ELF library into zygote’s memory using dlopen(), then look up the symbol using dlsym().

Substrate provides MSGetImageByName() and MSFindSymbol() for the same task, but they didn’t work correctly for me. If you can make them work, please do get in touch!

Results:

E/targetApp1-native( 5272): [+] John Smith is 99 years old.
E/targetApp1-native( 5272): [+] The totally reliable random seed is: 1234

E/targetApp1-native( 5272): [+] John Smith is 99 years old.
E/targetApp1-native( 5272): [+] The totally reliable random seed is: 1234

Success! We’ve hooked both arc4random(), which lives in bionic, and the internal getAge() function.

Hooking internal non-exported functions

But what if getAge() is not exported?

Developers are advised to export the minimum required number of functions (more on this in another blog post). getAge() should have been marked as static. Alternatively, -fvisibility=hidden should have been used, with JNIEXPORT on our doThings() and nowhere else. This would hide all symbols except doThings(), which is exported as it must be called from our Java app (even that symbol can be left non-exported using another technique).
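As a rough sketch of that advice, assuming GCC or Clang: the helper is static, and only the entry point carries a default-visibility attribute (which, as far as I know, is what JNIEXPORT expands to in the NDK's jni.h). EXPORT and reportAge are placeholder names standing in for JNIEXPORT and the real JNI function.

```c
/* Placeholder for JNIEXPORT; GCC/Clang attribute syntax is assumed. */
#define EXPORT __attribute__((visibility("default")))

/* static: internal linkage, so the symbol never reaches the dynamic symbol table. */
static int getAge(void)
{
    return 21;
}

/* The only function meant to be visible from outside the library. */
EXPORT int reportAge(void)
{
    return getAge();
}
```

With -fvisibility=hidden on the compiler command line, the same holds for any non-static function that is not tagged EXPORT, so new helpers stay hidden by default.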

If we recompile the app after marking getAge() as static, the following will happen:

  • $ nm -aDC --defined-only libs/armeabi-v7a/libtargetLib.so

output (ignore all starting with ‘_’ )

00000d75 T Java_io_koz_targetApp2_targetApp2_doThings

As you can see, the getAge symbol is not visible any more. This means that our previous Substrate module will only be able to hook arc4random() but not getAge() because dlsym() will not be able to find the address of the symbol.

You can download the full code for this app (targetApp2) here

Here is the output after running our target app and test Substrate module:

E/HELLOJNI-native(19646): [+] John Smith is 21 years old.
E/HELLOJNI-native(19646): [+] The totally reliable random seed is: 1234

We can still hook the function, provided we can reliably predict where it is in memory. After all, a symbol is essentially a memory pointer.

Locating the function statically

We need to locate the offset where our function lives inside the library. If the symbol were exported, this would be easy. Recall the output of the nm command on the first version of the target library, where getAge() was not static:

00000d4d T getAge
00000d75 T Java_io_koz_targetApp1_targetApp1_doThings

In this case 0xd4d is the offset we are looking for. However, this line no longer appears in the nm output of the library where getAge is not visible.

Open "libtargetLib.so" inside a disassembler of your choice, e.g. Hopper or IDA. objdump could also be used. The goal is to manually locate the getAge function without knowing its name. It should look something like this:

======== B E G I N N I N G   O F   P R O C E D U R E ========
             sub_d2c:
00000d2e         movs       r0, #0x4
00000d30         ldr        r1, = 0x16c2            ; 0xd44 (sub_d2c + 0x18)
00000d32         ldr        r2, = 0x16d0            ; 0xd48 (sub_d2c + 0x1c)
00000d34         ldr        r3, = 0xffffffef        ; 0xd4c (sub_d2c + 0x20)
00000d36         add        r1, pc                  ; 0x23fc
00000d38         add        r2, pc                  ; 0x240c
00000d3a         add        r3, pc                  ; 0xd2d (sub_d2c + 0x1)
00000d3c         blx        __android_log_print@PLT
00000d40         movs       r0, #0x15
00000d42         pop        {r3, pc}                ; endp
00000d44         dd         0x000016c2              ; XREF=sub_d2c+4
00000d48         dd         0x000016d0              ; XREF=sub_d2c+6
00000d4c         dd         0xffffffef              ; XREF=sub_d2c+8

In this case the function is easy to spot, as there are only a few functions to go through and there’s no code obfuscation involved. Also, we know that our target function returns 21 (0x15 in hex, found in movs r0, #0x15) and that it contains a call to the Android log print function.

At the start of the above snippet we can see that the function is located at address 0xd2c and is named sub_d2c. We can use this in our module to hook the function. However, it is not that simple: 0xd2c is the offset from the library base address. The library itself is loaded somewhere in memory, so we need to find that location at runtime and then add the function offset.
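The arithmetic is simple enough to check against the earlier logs. Here is a minimal sketch, assuming the getAge address 0x4a777d4d logged in the first run and the 0xd4d value reported by nm; the helper names are mine.

```c
#include <stdint.h>

/* Load base = runtime address of a symbol minus its offset inside the file. */
static uintptr_t base_from_runtime(uintptr_t runtime_addr, uintptr_t offset)
{
    return runtime_addr - offset;
}

/* Runtime address = load base plus the offset found statically. */
static uintptr_t runtime_from_base(uintptr_t base, uintptr_t offset)
{
    return base + offset;
}
```

base_from_runtime(0x4a777d4d, 0xd4d) gives 0x4a777000, a page-aligned value, which is what you would expect for a load address. Going the other way, a library loaded at that same base would have the code at offset 0xd2c sitting at runtime_from_base(0x4a777000, 0xd2c), i.e. 0x4a777d2c.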

Updating our module to use a function address offset

As you may have noticed in the previous example, our Substrate module loaded the target native library into zygote’s memory before our target application even started. As a side effect, when our target application starts, because it is forked from zygote, it receives an exact copy of zygote’s memory space. Thus, the target native library will have the same base address! We still need to locate that base address. The following code does just that: it loads the library into zygote’s memory and then reads the base address from /proc/self/maps.

#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void * get_base_of_lib_from_maps(const char *soname)
{
  if (soname == NULL)
    return NULL;
  // Make sure the library is mapped into this process before reading the maps.
  void *imagehandle = dlopen(soname, RTLD_LOCAL | RTLD_LAZY);
  if (imagehandle == NULL)
    return NULL;

  FILE *f = NULL;
  char line[200] = {0};
  char *state = NULL;
  char *tok = NULL;
  char *baseAddr = NULL;
  if ((f = fopen("/proc/self/maps", "r")) == NULL)
    return NULL;
  while (fgets(line, sizeof(line), f) != NULL)
  {
    tok = strtok_r(line, "-", &state);   // start address
    baseAddr = tok;
    tok = strtok_r(NULL, "\t ", &state); // end address
    tok = strtok_r(NULL, "\t ", &state); // "r-xp" field
    tok = strtok_r(NULL, "\t ", &state); // "0000000" field
    tok = strtok_r(NULL, "\t ", &state); // "01:02" field
    tok = strtok_r(NULL, "\t ", &state); // "133224" field
    tok = strtok_r(NULL, "\t ", &state); // path field

    if (tok != NULL) {
      // Strip trailing whitespace and the newline from the path.
      int i;
      for (i = (int)strlen(tok)-1; i >= 0; --i) {
        if (!(tok[i] == ' ' || tok[i] == '\r' || tok[i] == '\n' || tok[i] == '\t'))
          break;
        tok[i] = 0;
      }
      size_t toklen = strlen(tok);
      size_t solen = strlen(soname);
      // Match when the mapped path ends with soname.
      if (toklen > 0 && toklen >= solen && strcmp(tok + (toklen - solen), soname) == 0) {
        fclose(f);
        // The first matching mapping is taken as the load base.
        return (void *)(uintptr_t)strtoull(baseAddr, NULL, 16);
      }
    }
  }
  fclose(f);
  return NULL;
}
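If the strtok_r() chain is hard to follow, the same per-line logic can be sketched with sscanf() instead. This is an alternative illustration rather than the code used in the module; maps_line_matches is a name I made up, and paths containing spaces are not handled.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/<pid>/maps.
 * Returns 1 and stores the start address in *start_out when the mapped
 * path ends with `suffix`; returns 0 otherwise. */
static int maps_line_matches(const char *line, const char *suffix,
                             uintptr_t *start_out)
{
    unsigned long long start = 0, end = 0;
    char perms[8] = {0};
    char path[256] = {0};

    /* Fields: start-end perms offset dev inode path.
     * The three %*s conversions skip offset, dev and inode. */
    int n = sscanf(line, "%llx-%llx %7s %*s %*s %*s %255s",
                   &start, &end, perms, path);
    if (n < 4)
        return 0;                 /* anonymous mapping: no path field */

    size_t plen = strlen(path);
    size_t slen = strlen(suffix);
    if (slen == 0 || plen < slen)
        return 0;
    if (strcmp(path + (plen - slen), suffix) != 0)
        return 0;

    *start_out = (uintptr_t)start;
    return 1;
}
```

Fed a line such as "4a777000-4a779000 r-xp 00000000 b3:1c 12345 /data/app-lib/io.koz.targetApp2-1/libtargetLib.so" (the device and inode values here are invented), it reports a match for the suffix "libtargetLib.so" and stores 0x4a777000.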

A different way for locating our base address is by looking up the soinfo structures in memory. This will only work on ARM, not x86, as there’s some magic going on.

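// The offsets below assume the 32-bit soinfo layout of older Android linkers.
void * get_base_of_lib_from_soinfo(const char *soname)
{
  if (soname == NULL)
    return NULL;
  void *imagehandle = dlopen(soname, RTLD_LOCAL | RTLD_LAZY);
  if (imagehandle == NULL)
    return NULL;

  const char *basename;
  const char *searchname;
  int i;
  // The handle returned for libdl.so is treated as the first soinfo entry.
  void * libdl_ptr = dlopen("libdl.so", 3);
  basename = strrchr(soname, '/');
  searchname = basename ? basename + 1 : soname;
  // Walk the list: the name sits at offset 0, the next pointer at byte offset 164.
  for (i = (int) libdl_ptr; i != 0; i = *(int*)(i + 164)) {
    if (!strcmp(searchname, (char*)i)) {
      // The base field sits at byte offset 140; the parentheses keep 140
      // from being scaled by sizeof(unsigned int).
      unsigned int *lbase = (unsigned int*)(i + 140);
      void * baseaddr = (void*)*lbase;
      return baseaddr;
    }
  }
  return NULL;
}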
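The “magic” is just two hard-coded byte offsets, 140 and 164. Here is a sketch of the structure they appear to index into. The field names and order are my recollection of the legacy 32-bit linker's soinfo and should be treated as an assumption; pointers are modelled as uint32_t so the offsets come out the same on any host.

```c
#include <stddef.h>
#include <stdint.h>

#define SOINFO_NAME_LEN 128

/* Assumed 32-bit layout; every pointer is 4 bytes wide. */
struct soinfo32 {
    char     name[SOINFO_NAME_LEN]; /* offset 0: library name            */
    uint32_t phdr;                  /* 128: program header pointer       */
    uint32_t phnum;                 /* 132                               */
    uint32_t entry;                 /* 136                               */
    uint32_t base;                  /* 140: load address of the library  */
    uint32_t size;                  /* 144                               */
    uint32_t unused1;               /* 148                               */
    uint32_t dynamic;               /* 152                               */
    uint32_t unused2;               /* 156                               */
    uint32_t unused3;               /* 160                               */
    uint32_t next;                  /* 164: pointer to the next soinfo   */
};

/* Read the load address out of a modelled entry. */
static uint32_t soinfo_base(const struct soinfo32 *si)
{
    return si->base;
}
```

offsetof(struct soinfo32, base) is 140 and offsetof(struct soinfo32, next) is 164, which matches the two constants in the function above. It also explains why this route is fragile: any linker that changes the structure breaks it.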

So we insert this function into our module source code and update the module as follows. Assuming that static analysis showed getAge() to be at offset 0xd2c, then:

    void* lib_base = get_base_of_lib_from_maps("/data/app-lib/io.koz.targetApp2-1/libtargetLib.so");
//OR
//    void* lib_base = get_base_of_lib_from_soinfo("/data/app-lib/io.koz.targetApp2-1/libtargetLib.so");

    LOGI("lib base is %p",lib_base);
    if (lib_base!=NULL){
        // 0xd2c is the offset found statically; the extra 1 marks it as Thumb code.
        void * getAgeSym = (char *)lib_base + 0xd2d;
        LOGI("getAge() should be at %p. Let's hook it",getAgeSym);
        cigi_hook(getAgeSym,(void*)&replaced_getAge,(void**)&original_getAge);
    }

Note that we used 0xd2d instead of 0xd2c as the offset. This is because the instruction at 0xd2c is a Thumb instruction, so we need to tell the processor by adding 1 to make the address odd.
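On 32-bit ARM, the lowest bit of a branch target selects the instruction set, so the odd address is a flag rather than a real location. A small sketch of the bookkeeping follows; the helper names are mine. OR-ing the bit in rather than adding 1 avoids going wrong if the address is already odd, as with the 0xd4d value nm printed earlier.

```c
#include <stdint.h>

/* Mark an address as Thumb code by setting bit 0. */
static uintptr_t thumb_entry(uintptr_t addr)
{
    return addr | (uintptr_t)1;
}

/* Non-zero when the address carries the Thumb flag. */
static int is_thumb(uintptr_t addr)
{
    return (int)(addr & 1u);
}

/* Where the first instruction actually sits in memory. */
static uintptr_t code_address(uintptr_t addr)
{
    return addr & ~(uintptr_t)1;
}
```

thumb_entry(0xd2c) gives 0xd2d, the offset used in the module, and code_address(0xd2d) takes you back to 0xd2c, the address the disassembler shows.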

And the result:

I/targetApp2-native(25806): [+] John Smith is 99 years old.
I/targetApp2-native(25806): [+] The totally reliable random seed is: 1234

You can find the code for all examples on my GitHub page.
