Archive for October, 2009

Getting Pointers from Leaky Interpreters

Thursday, October 29th, 2009

Note: I haven’t seen this anywhere before but I wouldn’t be surprised if it had been done, so let me know if I should credit someone. It was inspired in some really abstract way by a USENIX Security paper from 2003.

Goal

To create an ActionScript function that takes an Object and computes the address when run in Tamarin.

Introduction

Despite the growing adoption of SDLs and the proliferation of code analysis tools, many commercial applications are still sprinkled with memory corruption bugs. Microsoft has implemented address space layout randomization (ASLR) and data execution prevention (DEP) to make the exploitation of these vulnerabilities much more difficult. Researchers and attackers have been forced to develop application specific strategies to circumvent the mitigations. Mark Dowd and Alex Sotirov wrote a really useful Blackhat USA 2008 paper explaining the implementation decisions made for each of the mitigations (for each version of Windows) and some attacks they’ve developed to take advantage of those decisions. I waited way too long to read this paper — don’t make my mistake. These techniques are not universal and for each “class” of application new exploit techniques must be developed. Thinking about application specific techniques has been fun and produced a few cool gadgets. This note talks about one of them.

Knowing the address of attacker controllable data before triggering an exploit is useful (if not necessary). Many times, a heap spray is enough to place some shellcode or data structure at a known address. Despite their effectiveness, heap sprays just feel dirty. So, out of a desire to pimp my exploits, I set out in search of a way to leak addresses of attacker controlled structures.

Tagged Pointers

Many interpreters represent atomic objects (we’ll call them atoms) as tagged pointers. Each base type is given a tag and an atom is represented by placing this tag in the lower bits while the atoms value is encoded in the upper bits. The Tamarin virtual machine uses 3 bit tags. The Tamarin integer tag is 6, so, for example, the atom for 42 is encoded as:

>>> '0x%08x' %  (42 << 3 | 6)
'0x00000156'

Similarly, a Tamarin Object tag is 2, so an Object at 0xaabbccd0 is encoded as:

>>> '0x%08x' %  (0xaabbccd0 | 2)
'0xaabbccd2'

Integers that don't fit into 29 bits are interned as strings. This technique is quite old and was used on the Lisp Machines and was discussed in both SICP (footnote 8) and at least one [PDF] of the early 'Lambda Papers'.

Tamarin Dictionaries

Tamarin Dictionary objects store Object to Object mappings and are implemented internally as hashtables. The hashtable implementation maintains a table that is the smallest power of 2 greater than the number of entries * 5/4. In other words, the table is never more than 80% full (see the source). When an insert causes the table to become more full, the table is grown and all entries are rehashed (see the source). The hash function operates on atoms; it shifts off the lower 3 tag bits and masks off enough top bits to fit the table (see the source).

The "for x in y" looping construct and the AVM2 instructions "hasnext2", "nextname", and "nextvalue" allow the interpreted program to iterate over a Dictionary (or generic Object -- the difference is not made clear in the AVM2 documentation, but the Tamarin implementation makes a distinction). This iteration is accomplished by walking the hashtable from start of table to end. For example, if all integers inserted into the Dictionary are less than the size of the hashtable, the integers will come back out in ascending order. We will leverage this to determine the value of any atom (well, most of it, anyway).

Integer Sieves

Now that all of the background is out of the way, I can explain the general idea. Since integers are placed into the hashtable by using their value as the key (of course, the any top bits will be masked off), we can determine the value of some other atom by comparing the integers before and after it. Since Object atoms are basically just pointers, we can disclose as many bits of a pointer as we can grow the hashtable. To avoid the problem of a hash collision, we create two Dictionaries, one with all the even integers and one with all the odd integers (up to some power of two -- the larger, the more bits we discover). After creating the Dictionaries, we insert the victim object into both Dictionaries (the value you map it to does not matter for this trick -- in fact, the values are of no use at all). Next, search each Dictionary using the for-in construct, recording the last key visited and breaking when the current key is the victim object. We now have two values, the two values should differ by 17. This is due to the linear probe; when a hashtable insert collides, it uses a quadratic probe to find an empty slot. It begins at hash(atom) + 8 (collides -- even + even = even, odd + even = odd), then tries hash(atom) + 17 (success -- even + odd = odd, odd + odd = even). So, we know that when the two values differ by 17, the lower value is the one from the Dictionary that didn't have the collision. When it isn't 17 (wrapped around), the larger value is from the Dictionary that didn't have the collision. We now have the integer that, when turned into an atom is 8 (aka 1 << 3) smaller than the victim atom. Finally, to get the victim atom from the integer, x: (x << 3 + 8) or more to the point ((x + 1) << 3).

Sample Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
package getptr
{
  import avmplus.System;
  import flash.utils.Dictionary;
 
  print("getptr - dion@semantiscope.com");
 
  function objToPtr(obj) {
    var i;
    var even = new Dictionary();
    var odd = new Dictionary();
 
    print("[+] Creating hashtables");
    for (i = 0; i < (1024 * 1024 * 2); i += 1) {
      even[i * 2] = i;
	  odd[i * 2 + 1] = i;
    }
 
    print("[+] Triggering hashtable insert");
    even[obj] = false;
    odd[obj] = false;
 
    var evenPrev = 0;
    var oddPrev = 0;
    var curr;
 
    print("[+] Searching even hashtable");
    for (curr in even)
    {
      if (curr == obj) { break; }
	  evenPrev = curr;
    }
 
    print("[+] Searching odd hashtable");
    for (curr in odd)
    {
      if (curr == obj) { break; }
	  oddPrev = curr;
    }
 
    var ptr;
    if (evenPrev < oddPrev) {
      ptr = evenPrev;
      if (evenPrev + 8 + 9 != oddPrev) {
        print("[-] Something went wrong " + evenPrev + ", " + oddPrev);
      }
    } else {
      ptr = oddPrev;
      if (oddPrev + 8 + 9 != evenPrev) {
        print("[-] Something went wrong " + oddPrev + ", " + evenPrev);
      }
    }
    ptr = (ptr + 1) * 8;
 
    return ptr;
  }
 
  var victim = "VICTIM";
  var ptr = objToPtr(victim);
 
  print("[+] ptr = " + ptr);
 
  System.debugger();
}

Notes: All test were performed with Tamarin pulled from tamarin-central @ dab354bc047c

To test this code, compile up a Tamarin avmshell, place a breakpoint on SystemClass::debugger(), and run the sample script. Once the debugger hits, check the address spit out. On an XP system, the heap is gonna start pretty low, so the output address will probably be correct (it was for me). For a string, the address + 0xC will be a pointer to the bytes of the string (to help you verify that it works).

I compiled the above ActionScript with asc.jar as suggested by the Tamarin build documentation.

It is worth knowing that Flash Player uses a version of the Tamarin virtual machine, but the sample code will not work directly with Flash. Feel free to reverse it yourself and see the modifications you need to make. I have not spent the time to make my script work with it, but I think it shouldn't be hard.

Epilogue

This is, I think, a cute trick. Who cares? I care because this kind of leak is really hard to automatically check for. Bits are leaked via comparisons. How do you track this kind of information leakage? Maybe someone from academia can pipe up -- maybe no one cares. Knowing the address of some attacker controllable bytes is always good. Regardless of the usefulness, I hope it was interesting.

Happy hunting!

EDIT: Changed "Prologue" to "Epilogue"... wow, I wrote this post too quickly.
EDIT: Fix the typo reported by Jordan.