P For Paranoia OR a quick way of overwriting a partition with random-like data

(General Surgeon’s warning: The following post contains doses of paranoia which might exceed your recommended daily dosage. Fnord!).
A lot of the data sanitisation literature advises overwriting partitions with random data (btw, SANS Institute research claims that even a single pass with /dev/zero is enough to stop magnetic force microscopy (MFM), but YPMV). Leaving Gutmann-like techniques aside, generating random data in practice takes a long time on your average system, which does not contain a cryptographic accelerator. To speed things up, /dev/urandom can be used in lieu of /dev/random: when read, the non-blocking /dev/urandom device returns as many bytes as are requested, even if the entropy pool is depleted. The resulting stream is therefore, at least in theory, not as cryptographically sound as that of /dev/random, but it is produced faster.
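If you want to put a number on "faster" for your own box, a rough measurement can be sketched as follows. This is only an illustration, not a benchmark methodology: the function name `read_throughput` and its default sizes are my own assumptions, and the result depends heavily on kernel version and hardware.

```python
import time


def read_throughput(path, total_bytes=16 * 1024 * 1024, chunk_size=1024 * 1024):
    """Read up to total_bytes from path and return (bytes_read, MiB_per_second).

    Hypothetical helper for illustration; short reads and early EOF are tolerated.
    """
    remaining = total_bytes
    start = time.perf_counter()
    # buffering=0 keeps Python's own buffering out of the measurement.
    with open(path, "rb", buffering=0) as source:
        while remaining > 0:
            data = source.read(min(chunk_size, remaining))
            if not data:  # EOF on a regular file
                break
            remaining -= len(data)
    elapsed = time.perf_counter() - start
    bytes_read = total_bytes - remaining
    mib = bytes_read / (1024 * 1024)
    rate = mib / elapsed if elapsed > 0 else float("inf")
    return bytes_read, rate
```

Calling `read_throughput("/dev/urandom")` gives a figure to compare against whatever alternative you are considering. Be careful when pointing it at /dev/random: on kernels where that device blocks, ask for a much smaller `total_bytes` or the call may sit there for a long time.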
Assuming that time is of the essence and your paranoia level is low, there is an alternative that both provides random-like data (so you do not have to fall back to /dev/zero and keep your fingers crossed) and is significantly faster. Enter Truecrypt. Truecrypt creates encrypted partitions using a variety of algorithms that have been submitted to peer review and are deemed secure for general usage. I can hear Johnny Sceptical shouting “Hey, wait a minute now, this is NOT random data, what the heck are you talking about?”. First of all, Truecrypt headers aside, let’s see what ent reports. For those of you not familiar with ent, it is a tool that performs a statistical analysis of a given file (or bitstream, if you tell it so), giving you an idea about its entropy and other very useful statistics. For more information, see man 1 ent.
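To make the "bits per byte" figure less magical, here is a minimal sketch of the Shannon entropy calculation over byte values. It is not ent's source code, only the textbook formula, and the function name `shannon_entropy_bits_per_byte` is an assumption of mine.

```python
import math
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value (0..255) occurs
    entropy = 0.0
    for count in counts.values():
        probability = count / total
        entropy -= probability * math.log2(probability)
    return entropy
```

A buffer in which every one of the 256 byte values appears equally often scores 8 bits per byte, while a buffer made of a single repeated byte scores 0. Data that looks random should land very close to 8.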
For the purposes of this demonstration, I have created the following files:

  • an AES encrypted container
  • an equivalent size file getting data from /dev/urandom (I know, but I was in a hurry)
  • a well defined binary object in the form of a shared library
  • a system configuration file
  • a seed file which contains a mixture of English and Chinese literature, some C code, and strings(1) output from the non-encrypted swap (wink-wink, nudge-nudge)

Let’s do some ent analysis and see what results we get (for the hastily written Perl code, look at the end of the article).

    ################################################################################
    processing file: P_for_Paranoia.tc 16777216 bytes
    Entropy = 7.999988 bits per byte.
    Optimum compression would reduce the size
    of this 16777216 byte file by 0 percent.
    Chi square distribution for 16777216 samples is 288.04, and randomly
    would exceed this value 10.00 percent of the times.
    Arithmetic mean value of data bytes is 127.4834 (127.5 = random).
    Monte Carlo value for Pi is 3.141790185 (error 0.01 percent).
    Serial correlation coefficient is 0.000414 (totally uncorrelated = 0.0).
    ################################################################################
    processing file: P_for_Paranoia.ur 16777216 bytes
    Entropy = 7.999989 bits per byte.
    Optimum compression would reduce the size
    of this 16777216 byte file by 0 percent.
    Chi square distribution for 16777216 samples is 244.56, and randomly
    would exceed this value 50.00 percent of the times.
    Arithmetic mean value of data bytes is 127.4896 (127.5 = random).
    Monte Carlo value for Pi is 3.143757139 (error 0.07 percent).
    Serial correlation coefficient is -0.000063 (totally uncorrelated = 0.0).
    ################################################################################
    processing file: seed 16671329 bytes
    Entropy = 5.751438 bits per byte.
    Optimum compression would reduce the size
    of this 16671329 byte file by 28 percent.
    Chi square distribution for 16671329 samples is 101326138.53, and randomly
    would exceed this value 0.01 percent of the times.
    Arithmetic mean value of data bytes is 82.9071 (127.5 = random).
    Monte Carlo value for Pi is 3.969926804 (error 26.37 percent).
    Serial correlation coefficient is 0.349229 (totally uncorrelated = 0.0).
    ################################################################################
    processing file: /etc/passwd 1854 bytes
    Entropy = 4.898835 bits per byte.
    Optimum compression would reduce the size
    of this 1854 byte file by 38 percent.
    Chi square distribution for 1854 samples is 20243.47, and randomly
    would exceed this value 0.01 percent of the times.
    Arithmetic mean value of data bytes is 86.1019 (127.5 = random).
    Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
    Serial correlation coefficient is 0.181177 (totally uncorrelated = 0.0).
    ################################################################################
    processing file: /usr/lib/firefox-4.0.1/libxul.so 31852744 bytes
    Entropy = 5.666035 bits per byte.
    Optimum compression would reduce the size
    of this 31852744 byte file by 29 percent.
    Chi square distribution for 31852744 samples is 899704400.21, and randomly
    would exceed this value 0.01 percent of the times.
    Arithmetic mean value of data bytes is 74.9209 (127.5 = random).
    Monte Carlo value for Pi is 3.563090648 (error 13.42 percent).
    Serial correlation coefficient is 0.391466 (totally uncorrelated = 0.0).

Focusing on entropy, we see that

    Truecrypt: Entropy = 7.999988 bits per byte.
    /dev/urandom: Entropy = 7.999989 bits per byte.

which are directly comparable (if you trust ent, that is), much better than a well-structured binary file (5.666035 bits per byte), and head and shoulders above our seed file results (5.751438 bits per byte; the seed file is a conglomerate unlikely to be encountered in practice). The chi-square percentages differ by a factor of 5 in our example (10.00 percent for Truecrypt versus 50.00 percent for /dev/urandom), in favor of the /dev/urandom data, but the Truecrypt figure is still way higher than the 0.01 percent reported for our other test cases.
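For the curious, the chi-square statistic and the arithmetic mean from the reports above can be approximated with a few lines. This is a sketch under my own assumptions (the name `byte_statistics` is hypothetical), not ent's implementation, and it deliberately stops at the raw statistic: turning it into the "would exceed this value N percent of the times" figure needs the chi-square distribution, which the standard library does not provide.

```python
def byte_statistics(data: bytes):
    """Return (chi_square, mean) for the byte values in data.

    chi_square compares the observed count of each byte value against
    the count expected from a uniform distribution (len(data) / 256).
    """
    if not data:
        raise ValueError("need at least one byte")
    counts = [0] * 256
    for value in data:
        counts[value] += 1
    expected = len(data) / 256
    chi_square = sum((count - expected) ** 2 / expected for count in counts)
    mean = sum(data) / len(data)
    return chi_square, mean
```

With 256 possible byte values there are 255 degrees of freedom, so uniformly distributed data should give a statistic somewhere around 255. That is the neighbourhood of the 288.04 and 244.56 reported above, as opposed to the 101326138.53 of the seed file.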
From the above, there is a strong indication that when you need random-like data and /dev/urandom is too slow, for example when you want to “randomize” your swap area (as I will elaborate on in an upcoming post), a Truecrypt volume will do in a pinch.

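    #!/usr/bin/env perl
    use warnings;
    use File::stat;
    # a 5 min script (AKA no strict compliance) to supplement results for a blog article
    # why perl? Nostalgia :-)
    my @subjects = qw(P_for_Paranoia.tc P_for_Paranoia.ur seed /etc/passwd /usr/lib/firefox-4.0.1/libxul.so);

    # Print a banner with the file name and size, followed by ent's report.
    sub analyzeEnt {
        my ($file) = @_;
        my $st = stat($file) or die "cannot stat $file: $!\n";
        my $sz = $st->size;
        # List-form pipe open runs ent directly, without passing the name through a shell.
        open(my $fh, '-|', 'ent', $file) or die "cannot run ent: $!\n";
        my $ent = do { local $/; <$fh> };    # slurp the whole report
        close($fh);
        print "#" x 80 . "\nprocessing file: $file $sz bytes\n" . $ent . "\n";
    }

    foreach my $subject (@subjects) {
        analyzeEnt($subject);
    }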

    4 comments

    1. Interesting article. So is using /dev/urandom actually slower than doing truecrypt? The results of such an experiment would be a nice addition to this post. Waiting for the upcoming post 🙂


    2. Thanks for the comment. On my box Truecrypt is way faster for a 64GB partition on a SATA drive (I am poor, I know). Average Truecrypt throughput is approx. 116MB/s, whereas dcfldd is way slower (along the lines of 8-10MB/s, but perhaps I am using the wrong settings 🙂). Usual caveats apply.


    3. Nice. BTW, does dcfldd offer anything more than a progress bar when copying from /dev/urandom? (Or, why not use dd?) Anyway, for my wipes I would prefer the Secure Erase internal instruction, which is NIST 800-88 compliant ( http://cmrr.ucsd.edu/people/Hughes/SecureErase.shtml ). It also takes care of DCO and HPA areas, which dd and similar methods do not… You are the one who said P for Paranoia 🙂 And since it’s done at the disc controller level, I guess it’s faster than the usual software methods, although I’d love to see some measurements. I suggest you take a look at it; I bet you’ll find it interesting. There’s also a Full Disc Encryption – Secure Erase (FDE-SE) option that works like you can imagine. In fact, secure erase (enhanced secure erase) in newer hardware-encrypted drives works in seconds: it’s just a command to change and forget the encryption key!


    4. Thanks for the links, but some of your (valid, don’t get me wrong) remarks are beside the point of this article, if not outright off-topic. You are talking about media sanitization (“HDDerase.exe is a DOS-based utility that securely erases “sanitizes” all data on ATA hard disk drives in Intel architecture computers (PCs).”); I am talking about something else. Wait for the next article in the series and you will see what I am getting at, as opposed to posting with a nitpicking attitude and addressing issues that are completely off-topic.
      “For paranoid-level security, the cypt-text in an FDE disk drive could be eliminated by a Normal OW SE done after the FDE E-SE.” <-- LOL
      PS: I also have the feeling we had this conversation before, a long time ago (a year or two?), on a security-related mailing list (although I cannot tell from your disposable email address; how about using a real one next time?). If that's the case, hi again 🙂 Shame you are not using your real alias/email though…
