What, you’ve never wanted to be randomly consistent? Meaning, you’ve never wanted something that might as well be random, but that still gives you consistent answers for the same input? The solution just requires a small bit of insight and a small bit of code. Let’s check it out.

Scenario

At work, we were trying to figure out a way to store only a certain percentage of requests in our cache. We didn’t particularly care which requests were stored and which weren’t, but we did want to be able to determine quickly whether something should be in the cache without asking the cache. In other words: controlled randomness.

At first, this seems like an untenable position, but consider it a bit deeper: what always returns the same result, given the same input, and is suitably “random”? (Mitt Romney doesn’t count: he’s out on the first requirement.) Enter hashing algorithms.

The Hash

Cryptographic hashing algorithms are designed so that their outputs are, for practical purposes, uniformly distributed over the output range. For this algorithm, we’ll use one of the best-known hashing algorithms out there: MD5 (128 bits). Honestly, the algorithm itself is far shorter than even this sentence. Let’s see it:

import hashlib

class ConsistentDecision(object):
    """
        Used to *consistently* decide on an
        action based on an input value.
    """
    def __init__(self, percentage):
        # The percentage of values to select
        self.perc = int(percentage)

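    def ask(self, value):
        # md5() needs bytes on Python 3, so encode the string form first
        digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
        # Read the hex digest as an integer and bucket it into 0..99
        return (int(digest, 16) % 100) < self.perc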

Note: This is a Python class just for convenience; it clearly doesn’t have to be.

Why does this work?

Quite simply, we hash the input value, read the digest as one big integer, and mod it by 100. This gives us a number between 0 and 99. Because we used a cryptographic hashing function, those numbers are evenly distributed, so we check whether ours falls into the self.perc bucket (i.e. the percentage of values we want to select). If it does, we simply say True.
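To see both properties at once, here is a minimal sketch that uses the same hash-and-mod idea as a plain function. The names bucket and selected and the request-N keys are made up for illustration; they are not part of the class above.

```python
import hashlib


def bucket(value):
    """Map any value to a stable bucket number in the range 0..99."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def selected(value, percentage):
    """Return True for roughly `percentage` percent of all inputs."""
    return bucket(value) < percentage


# Hypothetical cache keys, purely for illustration.
keys = ["request-%d" % i for i in range(10000)]

# Consistent: asking twice about the same key gives the same answer.
assert selected(keys[0], 25) == selected(keys[0], 25)

# "Random": close to a quarter of the keys should be selected.
hits = sum(selected(key, 25) for key in keys)
print("selected %.1f%% of keys" % (100.0 * hits / len(keys)))
```

The printed share should land near 25%, and it will be the same on every run and every machine, which is exactly what lets you skip asking the cache.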

Why are you telling me this?

Mainly, I just found this to be a pretty cool little algorithm. That, and the original implementation included huge arrays, a 159-line class, and hilarity when I pointed out this simpler version. Moral of this story: know your algorithms, kids.

One week at work, I was trying to figure out how to stash my working copy in SVN, only to discover that I could not. Everywhere I looked, I saw nothing but huge “svn stash” projects with incredibly complex syntaxes and impossible-to-remember commands. I missed git.

Why was it so hard?

During that long week, I hacked together a simple solution that you can drop into your local bin and use, just like git stash. Everything exists in one python file, without any external dependencies (besides SVN, obviously). Have fun.

I want this

Yeah, so did I. Here you go: svnstash on GitHub

How do I use this?

Quite well…erm, I mean, read the documentation.

Fun With RTMP

For work, I have to use RTMP. A lot. This is just some of the fun I’ve had with it.

Getting

You’re going to need the following:

  • A modern Linux distro
  • rtmpdump (can be installed with your distro’s package manager)
  • An RTMP stream that you have permission to copy and download

iptables Magic

First, in order to capture RTMP streams, you’re going to need to redirect the traffic to something you can intercept (i.e. back to your machine). Since you’re using Linux, this is incredibly simple:

$# iptables -t nat -A OUTPUT -p tcp --dport 1935 \
	 -m owner \! --uid-owner root -j REDIRECT

To delete the iptables rule:

$# iptables -t nat -D OUTPUT -p tcp --dport 1935 \
	-m owner \! --uid-owner root -j REDIRECT
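The two commands differ only in -A (append) versus -D (delete). If you end up scripting this, a small sketch like the following keeps the rule in one place. The function names redirect_rule and apply_rule are hypothetical, and anything that actually applies the rule still has to run as root.

```python
import subprocess


def redirect_rule(action):
    """Build the iptables command from above; action is "-A" or "-D"."""
    if action not in ("-A", "-D"):
        raise ValueError("action must be -A (add) or -D (delete)")
    # No shell is involved here, so the "!" needs no backslash.
    return [
        "iptables", "-t", "nat", action, "OUTPUT",
        "-p", "tcp", "--dport", "1935",
        "-m", "owner", "!", "--uid-owner", "root",
        "-j", "REDIRECT",
    ]


def apply_rule(action):
    """Run the command; raises CalledProcessError if iptables fails."""
    subprocess.check_call(redirect_rule(action))


if __name__ == "__main__":
    # Only print what would be run, without touching the firewall.
    print(" ".join(redirect_rule("-A")))
    print(" ".join(redirect_rule("-D")))
```

Calling apply_rule("-A") before starting the interceptor and apply_rule("-D") afterwards mirrors the two manual steps.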

Hijacking the requests

The following command will create the RTMP interceptor and respawn it once it downloads a stream completely.

$# rtmpsuck

To kill it, press Ctrl+C, and you’re good.

Note

Notice that these commands are being run as root. You need to run them as root; otherwise you won’t be able to intercept the RTMP calls. Also, root is mentioned in the iptables command: this tells iptables to redirect a request to rtmpsuck only if it wasn’t root who made the request, so the connections that rtmpsuck itself makes as root are left alone.

Legal

Don’t be evil. Respect the laws of your respective country. I am not responsible for how you use this. I only talk about this topic because I find it interesting, and there’s nothing wrong with learning.

So, it turns out today that my external hard drive has a few bad blocks on it, and since it’s mainly used as an archive, I feel no pressing need to replace it (stupid, yes, I know). But until I get some more external backup space, I’m just going to mark some bad blocks and move on with it. Since this is something that we will all hit at some point, let’s go over the basics.

warning!

If your drive has bad blocks that the operating system can see, then chances are it’s time for a new drive. Typically, the drive will transparently remap bad sectors when it finds them, so if you’re seeing bad blocks, that is really bad news. Though you can prolong the life of your drive by adding these blocks to the bad block list in your filesystem, it is almost always better (in the short and long run) to get a new drive. You’ll save yourself hours (sometimes days) of scanning the drive, and the looming threat of a dead drive will be gone. Do yourself a favor: get a new drive. But if you can’t…here we go.

What’re we gonna do, chief?

Well, cadet, we’re going to scan the entire drive. “The entire drive?!”, you say. Yes, the entire drive. It’s a few simple, easy, quick steps (for you, not the computer), so buckle down and let’s find and annihilate the bad blocks!

Assumptions

  • The partition having trouble is /dev/sda1. Adjust to suit your needs.
  • You have root access.

ext2/3/4

For these file systems, it’s really simple to do this:

$# e2fsck -c -c -k -C 0 /dev/sda1

reiserfs

Well, this is a bit more involved, but it comes with a lot more satisfaction (and nerd points).

Find that block size

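First, we need to find out our block size so that we can do a proper scan of the hard drive to find bad blocks. Using one of the following, determine your block size. Note that tune2fs and dumpe2fs come from e2fsprogs and only read ext2/3/4 superblocks; blockdev works on any block device.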

$# tune2fs -l /dev/sda1 | grep -i "block size"
Block size: 4096

$# dumpe2fs -h /dev/sda1 | grep "Block size:"
Block size: 4096

$# blockdev --getbsz /dev/sda1
4096
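If the filesystem is already mounted, you can also ask the kernel from Python. This is a sketch, not a replacement for the tools above: it assumes a mount point (the "/" below is just a placeholder), and it reports the block size the filesystem advertises through statvfs, which on some filesystems is a preferred I/O size rather than the on-disk block size. The helper name fs_block_size is made up.

```python
import os


def fs_block_size(mount_point):
    """Return the block size reported by statvfs for a mounted filesystem."""
    return os.statvfs(mount_point).f_bsize


# "/" is a placeholder; use the mount point of the partition you care about.
print(fs_block_size("/"))
```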

Scan that drive

This one is really simple. Be sure to substitute your drive and block size appropriately.

Be sure you have days to spare for large drives.

$# badblocks -n -b 4096 -o badblocks /dev/sda1

Here’s what’s going on:

  1. -n: run in read-write, non-destructive mode (your data will be safe)
  2. -b 4096: your block size
  3. -o badblocks: the file in which to store the list of bad blocks
  4. /dev/sda1: the partition to scan
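badblocks writes one block number per line to the output file. Before handing that file to the filesystem tools, it can be reassuring to see how much damage it actually lists. A rough sketch, assuming the file is named badblocks and the block size is 4096 as in the command above; summarize is a made-up helper name:

```python
import os


def summarize(lines, block_size=4096):
    """Count the bad blocks listed and the number of bytes they cover."""
    blocks = [int(line) for line in lines if line.strip()]
    return len(blocks), len(blocks) * block_size


# Only report if the scan has already produced its output file.
if os.path.exists("badblocks"):
    with open("badblocks") as handle:
        count, size = summarize(handle)
    print("%d bad blocks (%d bytes)" % (count, size))
```

An empty file means the scan found nothing, in which case there is nothing to mark.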

Annihilate those bad blocks

$# reiserfsck --fix-fixable \
	--badblocks badblocks /dev/sda1

Verbinator

Verbinator is a tool to help you understand German. First, you might ask yourself why you would ever want only the verbs of a sentence translated, but how often has it happened that you read an entire sentence, understand every other word, then get to a verb and wonder what it’s doing? Yeah, that’s what I thought. This is for that case.

Why not Google Translator / Babel Fish?

Well, if you’re looking for a new translator and actually found this, that answers that question. Those translators will only help you so much before you just get lost in their nonsensicalilityness. Plus, when those guys look at sentences, they often fail spectacularly at identifying verb tenses. Verbinator is a tool to help you understand the verbs in sentences and what they’re doing.

What’s with all the magic?

Buttons tend to need text, and I figured magic was a good way to describe it because, on the back end, there’s quite a bit of processing and “magic” going on. In reality, the magic is nothing more than insane algorithms and crunching of the sentence.

How accurate is all this?

This is actually pretty accurate – for doing language processing, it does rather well for itself. It’s by no means 100% accurate in every case, but for a majority of German sentences, it will get the job done.

Umm, what are you talking about?

Verbinator! Keep up!

What if I find something wrong?

If you think that you found an error with the translation, chances are you did. You see, natural language processing is incredibly difficult to get 100% right (if it were easy, this site would not exist). So, the errors you’re seeing most likely arise from some ambiguity in German that could not be resolved programmatically: the site tried its best, but that wasn’t good enough. Not even the magic and dark magic that the site uses are good enough for every case.

No, what if I really find something wrong?

Well, then you might want to report an issue on GitHub Issues.

Where else can I use this?

Check out the bookmarklet.