Archive for the 'Uncategorized' Category

DFLY: syscall frequency

Tuesday, April 29th, 2008

What does the syscall traffic look like for a “make buildkernel”?

% ktrace -di -f /home/dion/projects/syscall-hist/trace1.ktlog -t c make buildkernel
% ktrace -di -f /home/dion/projects/syscall-hist/trace2-cw.ktlog -t cw make buildkernel

Then, I used this “thing” to chew the log files and spit out a nicer form. This code is pretty loosely based on the source to kdump.

#include <sys/ktrace.h>
#include <sys/time.h>

#include <err.h>
#include <errno.h>
#include <inttypes.h>	/* uint64_t, PRIu64 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>	/* strerror() */

static void process_ktrace_log(FILE *logf);
 
uint64_t syscalls[512] = { 0 };
 
int main(int argc, char *argv[]) {
    FILE *logf;

    if (argc < 2)
        errx(1, "usage: %s <ktrace-log>", argv[0]);

    logf = fopen(argv[1], "r");
    if (logf == NULL)
        err(1, "fopen: %s", argv[1]);
    process_ktrace_log(logf);
    fclose(logf);

    return 0;
}
 
static void process_ktrace_log(FILE *logf) {
    struct ktr_header ktrhdr;
    struct timeval tv_curr, tv_first, *tvp_first = NULL;
    int ktrlen;
    char *m = NULL;
    int m_size = 0;
 
    FILE *logsc = fopen("sc.log", "w");
    FILE *logcsw = fopen("csw.log", "w");
 
    while (fread(&ktrhdr, sizeof(struct ktr_header), 1, logf) == 1) {
        if ((ktrlen = ktrhdr.ktr_len) < 0)
            errx(1, "bogus length 0x%x", ktrlen);
        if (ktrlen > m_size) {
            m = realloc(m, ktrlen + 1);
            if (m == NULL)
                errx(1, "%s", strerror(ENOMEM));
            m_size = ktrlen;
        }
 
        if (ktrlen && fread(m, ktrlen, 1, logf) == 0)
            errx(1, "data too short");
 
        if (!tvp_first) {
            tv_first = ktrhdr.ktr_time;
            tvp_first = &tv_first;
        }
 
        timersub(&ktrhdr.ktr_time, tvp_first, &tv_curr);
 
        switch (ktrhdr.ktr_type) {
            case KTR_SYSCALL:
                {
                    struct ktr_syscall *ktrsc = (struct ktr_syscall *)m;
                    fprintf(logsc, "%16" PRIu64 " %d\n",
                            (uint64_t)tv_curr.tv_sec * 1000000 +
                                      tv_curr.tv_usec,
                            ktrsc->ktr_code);
                    syscalls[ktrsc->ktr_code]++;
                }
                break;
            case KTR_CSW:
                {
                    /* Context-switch records aren't analyzed yet;
                     * csw.log stays empty for now. */
                    struct ktr_csw *ktrcsw = (struct ktr_csw *)m;
                    (void)ktrcsw;
                }
                break;
            default:
                break;
        }
    }
 
    fclose(logsc);
    fclose(logcsw);
 
    {
        FILE *schistlog = fopen("schist.log", "w");
        int i;
 
        for(i = 0; i < 512; i++) {
            if (syscalls[i]) fprintf(schistlog, "%d %" PRIu64 "\n", i, syscalls[i]);
        }
 
        fclose(schistlog);
    }
}

Then I used this gnuplot input to generate a quick plot:

set terminal png
set output 'schist-1m.png'
set xrange [0:60000000]
plot "sc.log" using 1:2 with dots

Finally, I used this little python script (I’m guessing some awk/sed genius could do it on the command line, but I can’t):

#!/usr/pkg/bin/python2.4
 
import sys
import re
 
sc = open("/usr/include/sys/syscall.h")
scdict = {}
for line in sc.readlines():
    mo = re.match(r"^#define\s+SYS_(\S+)\s+(\d+)\s*$", line)
    if mo is not None:
        scdict[mo.group(2)] = mo.group(1)
sc.close()
 
dpair_re = re.compile(r"^\s*(\d+) (\d+)$")
for line in sys.stdin.readlines():
    mo = dpair_re.match(line)
    if mo is not None:
        print '%15s %3s %s' % (mo.group(2), mo.group(1), scdict[mo.group(1)])

Then I sorted it on the command line:
% ./schist.py < schist.log | sort -rn > schist-chew.log

And the top 15 are: (full results)

        1047837 197 mmap
         828677   5 open
         412613   3 read
         355321   6 close
         343725 476 fstat
         287935  73 munmap
         278365 475 stat
         114136   4 write
         108059 342 sigaction
          88929 199 lseek
          64747 477 lstat
          42090  92 fcntl
          40985 472 set_tls_area
          29434 253 issetugid
          27946  59 execve

[EDIT] Added the graph and final results.

[SHA, Right] What’s Joux Talkin’ Bout?

Tuesday, April 29th, 2008
"Differential Collisions in SHA-0"

Chabaud & Joux begin by attacking a weakened SHA-0 variation.  The first version, which
they call SHI1, is equivalent to SHA-0 with the addition operations in the A_{i+1} update
and the f_i functions all replaced by XOR.  This removes the only sources of
non-linearity: the f_i functions and the modular additions.

The analysis of SHI1 begins by examining the effect of perturbations of the vector
W_{i} (0 <= i < 80) directly (instead of trying to study perturbations of the
message block, M, or of W_i (0 <= i < 16)).  In other words, suppose we directly supply the
vector W, instead of deriving it with the expansion function E.  Since we are mounting a
differential attack, we will start with some vector W and study how perturbations of this
vector affect the final result.

The paper gives the following differential path to correct a flip of W^{i}_{1}:
W^{i+1}_{6}
W^{i+2}_{1}
W^{i+3}_{31}
W^{i+4}_{31}
W^{i+5}_{31}
This differential path gives the bits we need to flip to "undo" a given bit flip (bit
1 of W^{i} or W^{i}_{1} for this example).

TODO: Give the output of the python test script -- showing the correction?
TODO: Add image from the Joux paper (Figure 2)

Performing the above flips, we can produce a second vector, W', which will produce the
same result -- a collision!
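
To make the path concrete, here is a minimal Python sketch (mine, not the sha_exp_rev.py
script mentioned below; the names and the simplified round function are my own) that runs
the fully linearized SHI1 rounds over a directly supplied 80-word W and checks that the
six flips above cancel out.

# A hypothetical sketch of the linearized SHI1 rounds.  The round constants
# K_i and the final feed-forward are omitted: they are identical for both
# computations and so cannot affect whether the outputs collide.
import random

MASK = 0xffffffff

def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK

def shi1_rounds(W):
    """SHI1 rounds over a directly supplied 80-word W: '+' and f_i are all XOR."""
    A, B, C, D, E = 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0
    for i in range(80):
        A, B, C, D, E = (W[i] ^ rol(A, 5) ^ B ^ C ^ D ^ E,
                         A, rol(B, 30), C, D)
    return (A, B, C, D, E)

W = [random.getrandbits(32) for _ in range(80)]

i = 10                    # any round i <= 74 leaves room for the corrections
Wp = list(W)
Wp[i]     ^= 1 << 1       # the perturbation:  W^{i}_{1}
Wp[i + 1] ^= 1 << 6       # correction:        W^{i+1}_{6}
Wp[i + 2] ^= 1 << 1       # correction:        W^{i+2}_{1}
Wp[i + 3] ^= 1 << 31      # correction:        W^{i+3}_{31}
Wp[i + 4] ^= 1 << 31      # correction:        W^{i+4}_{31}
Wp[i + 5] ^= 1 << 31      # correction:        W^{i+5}_{31}

assert shi1_rounds(W) == shi1_rounds(Wp)   # identical outputs: a collision on W

Removing any one of the corrections should make the assertion fail, which is a handy
sanity check on the path.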

This shows we can produce a collision if we can supply both W and W' in their
entirety, but SHI1 contains an expansion function producing W from the message block
M.  The last 64 words of W are derived from a recurrence primed with the first 16
words (which is M).  Some values we could choose for W (or W') would be impossible
to generate using any M.  This leads to our next step; we must ensure that both W
and W' can result from the expansion process.

Let's start by representing the perturbations as an error vector, m_{0}.  m_{0} is
80 bits long and will contain a 1 in position i if we intend to negate W^{i}_{1}
(for now, we will only concern ourselves with correcting flips in bit 1 -- see
Note 1 in the paper).  m_{0} must be 0 for i >= 75 "since a perturbation in
round i is never corrected before round i + 6, and since all perturbations must be
corrected by round 80."

The intuitive explanation of the process
----------------------------------------

Important observation:
* The expansion process does not interleave bits!  It operates on each of the 32 bit
positions independently, so per bit position it is just a linear function from 16 bits
to 80 bits.

1. Find valid perturbations -- these are deduced by ensuring they fit the expansion
recurrence relation.  It is important to see that since the compression function
starts primed with some A - E, the recurrence must actually start at the 11th word
(5 steps [A-E] behind the 16th word, where the recurrence is normally defined).

The search is a simple brute force over a space of 2^16.

We will call the chosen error vector e_{0}

TODO: Include the functions taken from sha_exp_rev.py that compute valid error
vectors.
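
The sha_exp_rev.py functions aren't reproduced here, but a hypothetical sketch of the
search (my own code and names, implementing one reading of the constraints above: the
bitwise recurrence must already hold from index 11, with the five virtual rounds before
round 0 carrying no perturbation, and rounds 75 and later must be perturbation-free)
could look like this.

# Hypothetical sketch, not the sha_exp_rev.py functions the TODO refers to.

def expand_bits(seed16):
    """Extend a 16-bit seed to 80 bits with e_i = e_{i-3} ^ e_{i-8} ^ e_{i-14} ^ e_{i-16}."""
    e = list(seed16)
    for i in range(16, 80):
        e.append(e[i - 3] ^ e[i - 8] ^ e[i - 14] ^ e[i - 16])
    return e

def bit(e, i):
    # Rounds before round 0 (the A-E priming steps) carry no perturbation.
    return e[i] if i >= 0 else 0

def valid_error_vectors():
    found = []
    for seed_val in range(1 << 16):                       # the whole 2^16 space
        seed = [(seed_val >> i) & 1 for i in range(16)]
        e = expand_bits(seed)
        # The recurrence must already hold for 11 <= i < 16 ...
        ok = all(e[i] == bit(e, i - 3) ^ bit(e, i - 8) ^ bit(e, i - 14) ^ bit(e, i - 16)
                 for i in range(11, 16))
        # ... and every perturbation must be corrected by round 80.
        ok = ok and not any(e[75:])
        if ok and any(e):
            found.append(e)
    return found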

2. Now, derive the global differential mask (which is M in the paper -- M is also
the message block... bad naming).  The global differential mask is derived by correcting
the flips found in the previous step with the differential path described in the
previous section.  Since SHI1 defines every combining operation in the
compression function as XOR, we can XOR together the differential paths for all the bits
flipped in e_{0} to arrive at the global mask.  We will call the global mask
G.  We only need the first 16 words of this mask since those define the rest of
it (via the expansion function).

NOTE: Maybe worth pointing out that this will generate a valid W' because e_{0}
satisfies (9).
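
A hypothetical sketch of this step (again my own code, not the paper's): each set bit of
the error vector contributes its perturbation plus the five corrections, and XORing all
of those contributions gives G.

# (round offset, bit position): the perturbation itself plus its five corrections
PATH = [(0, 1), (1, 6), (2, 1), (3, 31), (4, 31), (5, 31)]

def global_mask(e):
    G = [0] * 80
    for i, flip in enumerate(e):
        if flip:                       # e_i == 0 for i >= 75, so i + 5 <= 79
            for off, b in PATH:
                G[i + off] ^= 1 << b
    return G[:16]                      # the expansion regenerates words 16..79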

3. Given the global mask G and *any* input message M, SHI1(M) == SHI1(M \xor G).
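
Tying the sketches together (still my own illustration, reusing the hypothetical
shi1_rounds, valid_error_vectors, and global_mask functions from the snippets above),
the claim can be checked directly.

# End-to-end check: expand both M and M ^ G through the word-level expansion
# and confirm that the linearized compression collides.
import random

def expand_words(block16):
    W = list(block16)
    for i in range(16, 80):
        W.append(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16])
    return W

e = valid_error_vectors()[0]                       # any valid, non-zero error vector
G = global_mask(e)
M = [random.getrandbits(32) for _ in range(16)]    # *any* message block
Mp = [m ^ g for m, g in zip(M, G)]

assert shi1_rounds(expand_words(M)) == shi1_rounds(expand_words(Mp))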

Collision!  Hooray!  Wait.  That's just SHI1.  It's all linear.  We just solved an
algebra equation.  Oh.

[SHA, Right] The Algorithm

Tuesday, April 29th, 2008

SHA-0 (or something more clever here)

SHA-0 was specified in the first Secure Hash Standard (SHS) (current draft), issued in 1993. Two years later, NIST issued a small change to the SHA-0 expansion function; the revised function is known as SHA-1. NIST gave no public justification for the change, but it is widely assumed the NSA had found a weakness, probably a collision attack.

Structure of the Hash Function

The SHA hash function works on blocks of 512 bits and outputs a hash value of 160 bits. The function proceeds as follows:

  1. Pad the message by appending a single 1 bit, then a string of 0 bits, then the length (in bits) of the unpadded message represented as a 64-bit big-endian integer. The number of zero bits is the minimum needed to pad the message out to the next 512-bit boundary.
  2. The five state variables are initialized to fixed constants:
    A = 0x67452301
    B = 0xefcdab89
    C = 0x98badcfe
    D = 0x10325476
    E = 0xc3d2e1f0
  3. For each message block, call the compression function using the current state, A through E, and the current block. The compression function will return the new state. Pseudocode:
    state[i + 1] = compress(current_block, state[i])
  4. The output of the function is the concatenation of the state variables (a sketch of these steps follows the list). Pseudocode:
    return A ++ B ++ C ++ D ++ E
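
A rough sketch of steps 1-4 (my own illustration, not pseudocode from the standard;
compress(block, state) is assumed to behave like the function sketched in the next
section):

import struct

def sha0(msg):
    # Step 2: the five fixed initial state words.
    state = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0]

    # Step 1: a single 1 bit, zero bits out to 56 mod 64 bytes, then the
    # original length in bits as a 64-bit big-endian integer.
    length = len(msg) * 8
    msg += b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + struct.pack(">Q", length)

    # Step 3: chain the compression function over each 512-bit block.
    for off in range(0, len(msg), 64):
        state = compress(msg[off:off + 64], state)

    # Step 4: concatenate the five state words.
    return "".join("%08x" % s for s in state)

For example, sha0(b"abc") pads to a single 512-bit block and runs one compression.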

The Compression Function

The compression function is where all the “good stuff” happens.

compress(M, h) is computed as:

  1. Divide the 512-bit message block, M, into 16 32-bit words W_0, W_1, \ldots, W_{15}.
  2. Expand the 16 words into 80 words using the expansion function defined by the recurrence relation (for 16 \le i < 80):
    W_i = W_{i-3} \oplus W_{i-8} \oplus W_{i-14} \oplus W_{i-16}
  3. Initialize the state variables, A, B, C, D, E, using h
  4. For each word in W, W_i, compute the new state as:
    \begin{align*}A_{i+1} &= (W_i + (A_i \lll 5) + f_i(B_i, C_i, D_i) + E_i + K_i) \mod 2^{32}\\B_{i+1} &= A_i\\C_{i+1} &= B_i \lll 30\\D_{i+1} &= C_i\\E_{i+1} &= D_i\end{align*}
    The function f_i(B,C,D) is defined below.
  5. The output is given as (each addition is mod 2^{32}):
    (A_0 + A_{80}, B_0 + B_{80}, C_0 + C_{80}, D_0 + D_{80}, E_0 + E_{80})
The functions "f_i(X,Y,Z)" and constants "K_i" are defined as:
  If i is in the range:
    [ 0, 20)   "IF"   f_i = (X & Y) | (~X & Z)            K_i = 0x5A827999
    [20, 40)   "XOR"  f_i = X ^ Y ^ Z                     K_i = 0x6ED9EBA1
    [40, 60)   "MAJ"  f_i = (X & Y) | (X & Z) | (Y & Z)   K_i = 0x8F1BBCDC
    [60, 80)   "XOR"  f_i = X ^ Y ^ Z                     K_i = 0xCA62C1D6

NOTE: Add a link to the python implementation.
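
Until then, here is a minimal sketch of compress(M, h) as described above (my own code,
not the implementation the note refers to):

import struct

MASK = 0xffffffff

def rol(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK

def f(i, x, y, z):
    if i < 20:
        return (x & y) | (~x & z)             # "IF"
    if i < 40 or i >= 60:
        return x ^ y ^ z                      # "XOR"
    return (x & y) | (x & z) | (y & z)        # "MAJ"

K = [0x5a827999, 0x6ed9eba1, 0x8f1bbcdc, 0xca62c1d6]

def compress(block, state):
    # Steps 1-2: split the 64-byte block into 16 words and expand to 80.
    W = list(struct.unpack(">16L", block))
    for i in range(16, 80):
        W.append(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16])

    # Steps 3-4: initialize A..E from h and iterate the round function.
    A, B, C, D, E = state
    for i in range(80):
        A, B, C, D, E = ((rol(A, 5) + f(i, B, C, D) + E + W[i] + K[i // 20]) & MASK,
                         A, rol(B, 30), C, D)

    # Step 5: feed-forward additions mod 2^32.
    return [(h_i + v) & MASK for h_i, v in zip(state, (A, B, C, D, E))]

Changing the expansion loop to append rol(W[i-3] ^ W[i-8] ^ W[i-14] ^ W[i-16], 1)
instead turns this into SHA-1; that one-bit rotation is the entire difference between
the two functions.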

[SHA, Right] Introduction: A French Guy, an Israeli Guy, and a Chinese Gal Walk into a Bar

Wednesday, March 12th, 2008

… and discuss how best to attack SHA.

I was talking with a math guy here about the requirement for SHA-384 in a product we are developing. I had heard that MD5 was “insecure” and not to be trusted for collision resistance, but I hadn’t heard anything about SHA-1. Despite hearing these claims around the internet, I’ve never investigated further. My co-worker mentioned that SHA-1 was also on the way out and NIST will probably issue something declaring the larger members of the SHA family (SHA-384 and SHA-512) to be the new standards (I’ve since learned that NIST has a call for proposals for new hash functions — in the same way they picked AES). Never having implemented a secure hash function, I was curious what was involved and where the “one-wayed-ness” came from. This investigation led me from doing a fast implementation of SHA-512 (and, almost by default, SHA-384) to the original paper on differential attacks on DES to a bunch of work by three groups hitting the SHA functions pretty hard. I feel like I should try and explain some of the attacks I’ve learned (along with source code) — they’re clever and not difficult to understand once digested.

Before I begin, here are some references for those following along at home. I’m sorry for the crappy links; I know Springer and ACM are useless for anyone outside of academia. I’ll have to upload local copies — these papers are surprisingly hard to find.

C as a target

Saturday, December 1st, 2007

Much of the code I write must be in C. I’ve decided C needs a metalanguage. The preprocessor just doesn’t cut it. I want a full language. I don’t care if the C preprocessor is Turing complete; it simply doesn’t suit my needs.

Lately, this has resulted in me spending time developing a code generator that spits out C based on a domain-specific description. The generator has turned out to work far better than I expected. It shields me from the task of writing the many safety checks required when doing C buffer arithmetic. It also writes the code necessary to sanitize parameters and adds any extra safety checks I’ve decided are necessary for certain types of data. Apart from carrying the burden of writing the tedious and error-prone safety code, the generator abstracts certain implementation decisions I’ve had to make along the way. Since the generator is designed to make these decisions easy to swap, a comparison between methods or, more likely, a customer’s last-minute change becomes easy. Additionally, the current generator can emit code for both C and Python. Having Python code that accomplishes the same task lets us verify some other parts of our system. Finally, pre-computing pieces of data (used to boost performance) is safe in this scheme; there is no need to worry about recomputing that table, length, or offset every time x, or anything x depends on, changes. This kind of thing is by no means a new idea: domain-specific languages began receiving attention in the late 90s. I am just now starting to see their benefit.

Much of the design of my generator (for lack of a better word) is inspired by the code I’ve been writing for Simon Peyton Jones’ Implementing Functional Languages: A Tutorial. The book has been great. It’s the most Haskell I’ve written, and I’m really enjoying it. The combination of the iterative design and the exercises with expanding scope makes it easy to learn. Even the model for the pretty printer was useful for the C code generation. The patterns (I hate that word in this context) used are easy to recognize, and I had no problem implementing tweaked versions in Python.

One last thing: the map/reduce combination is useful. It made much of my Python easy to write and understand.

Notes for next time: why doesn’t he like a direct tree grammar?

Remember to: Add references to all the ML compiled to C papers I’ve been reading.