Random (and Readable) String Generation

For web applications, it’s useful to be able to generate different kinds of random strings. Default passwords, CAPTCHAs, and API keys are just a few examples.

I finally revamped my string-generating utility when I wanted to generate a human-readable string. I’d like each Pownce user to have a “syndication key” for several potential new features, including custom (non-public) feeds, upcoming event feeds, and email posting. A readable key helps for email posting, since nobody wants to remember something like “cp59p18sv3wwi4z0”. Blech. That’s just fine for API keys, though.

Flickr generates a nice readable key from actual short words (5-6 characters) for its email posting. I didn’t really want to maintain a dictionary of words and figured the form used in this example was probably good enough. It generates something like “fof91yoj”.
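A minimal sketch of a generator for that kind of key, assuming a template of consonant-vowel-consonant, two digits, consonant-vowel-consonant. The template is inferred from the “fof91yoj” example, not taken from the post's code, and `readable_key`, `PATTERN`, and `POOLS` are hypothetical names. It uses Python's `secrets` module rather than `random`, since these keys act as credentials.

```python
import secrets
import string

VOWELS = "aeiou"
# Everything else counts as a consonant here, including "y" (as in "yoj").
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)
DIGITS = string.digits

# Hypothetical template inferred from "fof91yoj":
# c = consonant, v = vowel, d = digit.
PATTERN = "cvcddcvc"
POOLS = {"c": CONSONANTS, "v": VOWELS, "d": DIGITS}


def readable_key(pattern=PATTERN):
    """Return a random string whose characters follow the c/v/d template."""
    return "".join(secrets.choice(POOLS[slot]) for slot in pattern)


if __name__ == "__main__":
    print(readable_key())  # output varies; shaped like "fof91yoj"
```

With 21 consonants, 5 vowels, and 10 digits, this template allows 21^4 × 5^2 × 10^2 = 486,202,500 combinations, far fewer than a random 16-character key like “cp59p18sv3wwi4z0”. That is the price of readability.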

So here’s the final result.

The only thing you’ll need to change is where the BANNED_WORDS and BANNED_PHRASES lists come from. Pownce is becoming a total rat’s nest, and I’m too lazy to untangle them for the example.
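A minimal sketch of how the two lists might be used, assuming each run of letters in a key (“fof” and “yoj” in “fof91yoj”) is checked against BANNED_WORDS, and each entry of BANNED_PHRASES is banned anywhere in the key. The list contents, `is_valid_string`, `generate_valid_string`, and `random_alnum` are hypothetical stand-ins, not the post's code. Storing the words in a frozenset makes each lookup a hash check, and capping the number of attempts keeps the retry loop from running indefinitely.

```python
import re
import secrets
import string

# Hypothetical placeholder lists; load the real ones from wherever you keep them.
BANNED_WORDS = frozenset({"bad", "dud"})
BANNED_PHRASES = ("ugh", "zzz")


def is_valid_string(s, banned_words=BANNED_WORDS, banned_phrases=BANNED_PHRASES):
    """Return False if any run of letters is a banned word or any banned phrase appears."""
    lowered = s.lower()
    # Assumption: a "word" is a maximal run of letters between digits.
    if any(run in banned_words for run in re.findall(r"[a-z]+", lowered)):
        return False
    # Assumption: phrases are banned anywhere in the string.
    return not any(phrase in lowered for phrase in banned_phrases)


def generate_valid_string(generate, is_valid=is_valid_string, max_attempts=100):
    """Call generate() until is_valid() accepts a result, giving up after max_attempts."""
    for _ in range(max_attempts):
        candidate = generate()
        if is_valid(candidate):
            return candidate
    raise RuntimeError(f"no valid string after {max_attempts} attempts")


def random_alnum(length=8):
    """Stand-in generator: lowercase letters and digits."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_valid_string(random_alnum))
```

If a fraction p of candidates is rejected, all 100 attempts fail with probability p^100 (10^-200 for p = 0.01). The error path is effectively unreachable, but the worst-case running time is still bounded.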

Is there a library (Python or Django) that does this already? I didn’t search too hard because I really wanted an excuse to write code that uses lambda.

12 Comments

  1. Posted January 8, 2008 at 5:48 pm

    Yeah, I can’t tell you how many times I’ve rewritten code to do almost exactly this same thing. The output of yours is pretty nice, though, and easy enough to remember. I haven’t found any Python code that does this as nicely as yours does, but who cares? You got a chance to use lambda! I’ve been working on my open source project and have gotten to use generators (yield) a bunch of times… It’s almost as much fun as lambda ;)

    Didn’t get a chance to tell you I liked your new layout, especially the background pattern. Hope you’re having a good week!

    Cheers!
    -Ken

  2. Posted January 8, 2008 at 6:04 pm

    RFC 1751 specifies a standard for representing 128-bit numbers as a series of short English words, and there’s an implementation in PyCrypto. I’ve never used it, and I’m not sure whether it’s easy to get it to produce shorter phrases for smaller numbers.

    I’ve never seen anything with your exact algorithm, but I’ve often seen apps use a hex digest of a hash (probably cropped to fewer digits).
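A minimal sketch of the cropped-digest approach mentioned above, using Python's standard `hashlib` and `hmac` modules. The function names and the server-side `secret` are hypothetical. Hashing fresh random bytes gives a one-off key; keying an HMAC with a server secret gives a stable per-user key that cannot be recomputed from the (guessable) user ID alone.

```python
import hashlib
import hmac
import os


def random_digest_key(length=16):
    """Hash 32 random bytes and keep the first `length` hex characters."""
    return hashlib.sha256(os.urandom(32)).hexdigest()[:length]


def derived_digest_key(secret, user_id, length=16):
    """Deterministic per-user key: an HMAC-SHA256 of the user ID under a server secret.

    `secret` (bytes) and `user_id` are hypothetical inputs.
    """
    mac = hmac.new(secret, str(user_id).encode(), hashlib.sha256)
    return mac.hexdigest()[:length]


if __name__ == "__main__":
    print(random_digest_key())
    print(derived_digest_key(b"server-secret", 42))
```

Cropping to 16 hex characters keeps 64 bits of the digest. A plain hash of a user ID or timestamp, without a secret, would be just as predictable as its input.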

  3. mikeal

    Posted January 8, 2008 at 7:06 pm

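    import uuid

    def randomness(characters=6):
        # uuid1() is built from the current time and the host's network address, so it is unique but partly guessable; uuid.uuid4() is random.
        return str(uuid.uuid1()).replace('-', '')[:characters]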

  4. Posted January 8, 2008 at 7:53 pm

    Similar… don’t know if it’s as “nice”:

    http://zopelabs.com/cookbook/1059673251/txt_src

  5. zylox

    Posted January 9, 2008 at 7:26 am

    From the shell, I like to use APG ( http://www.adel.nursat.kz/apg/ ). It has a nice feature called “pronounceable password generation” that gives you a list like this:

    Shucowuzil
    fojWinur
    foHolbAg
    citpoytsut
    ThoorfimJi
    siefyawJen

    You can also check the generated passwords against a dictionary file. Anyway, it’s not Python, but the same approach could be implemented in Python.
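A rough Python take on APG-style output, assuming a simple syllable model (an onset consonant, a vowel sound, and an optional closing consonant) with occasional capitalization. This is not APG's actual algorithm; the syllable tables and the `syllable` and `pronounceable` functions are hypothetical.

```python
import secrets

# Hypothetical syllable tables; tune them to taste.
ONSETS = ["b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p",
          "r", "s", "t", "v", "w", "z", "ch", "sh", "th"]
NUCLEI = ["a", "e", "i", "o", "u", "ie", "oo", "ou"]
# "" appears twice so open syllables (no closing consonant) come up more often.
CODAS = ["", "", "b", "d", "g", "l", "m", "n", "r", "s", "t"]


def syllable():
    """One onset-vowel(-coda) syllable, capitalized with probability 1/4."""
    s = secrets.choice(ONSETS) + secrets.choice(NUCLEI) + secrets.choice(CODAS)
    return s.capitalize() if secrets.randbelow(4) == 0 else s


def pronounceable(syllables=3):
    """Join several random syllables into a pronounceable password."""
    return "".join(syllable() for _ in range(syllables))


if __name__ == "__main__":
    for _ in range(6):
        print(pronounceable())
```

Each syllable is 2 to 5 letters long, so the default three syllables give passwords of 6 to 15 letters.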

  6. Posted January 9, 2008 at 9:52 am

    You might find django.contrib.auth.UserManager.make_random_password useful. Or not.

  7. Posted January 9, 2008 at 10:44 am

    I’d just like to point out that generate_random_string() has no upper bound on how long it can take to return. Because it keeps retrying until it gets an acceptable string, a long run of BANNED_WORDS generated one after another is unlikely but always possible, so predicting the execution time of this function is basically impossible.

    I would just remove all vowels from your generation pool, as well as the digit 0, and skip the is_valid_string() check. It’s pretty much impossible to generate a “naughty” word without vowels.

    Using lambdas for this kind of thing also strikes me as rather… inefficient? I’m not sure whether Python is smart enough to optimize this into a simple loop, but I sure hope so. If not, a plain for loop might be better.
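A minimal sketch of the vowel-free suggestion: drop the vowels and the digit 0, then skip the filter entirely. As an extra assumption beyond the comment, this sketch also drops “y”, which can act as a vowel. It uses a plain generator expression instead of lambdas; `SAFE_CHARS` and `vowel_free_key` are hypothetical names.

```python
import secrets
import string

# Lowercase letters minus vowels (and "y", which can act as one),
# plus digits minus "0" (which reads like the vowel "o").
SAFE_CHARS = "".join(
    c for c in string.ascii_lowercase + string.digits if c not in "aeiouy0"
)


def vowel_free_key(length=16):
    """Random key with no vowels, so no banned-word check or retry loop is needed."""
    return "".join(secrets.choice(SAFE_CHARS) for _ in range(length))


if __name__ == "__main__":
    print(vowel_free_key())
```

That leaves 29 characters (20 letters and 9 digits), about 4.86 bits each, so a 16-character key carries roughly 78 bits. With no retry loop, the running time is fixed.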

  8. Posted January 9, 2008 at 12:04 pm

    Hey Leah,

    You should allow your blog to display more than one post on the front page. And where are the previous/next page links? :)

    Also, the blog looks fine in Firefox, but your title is cut in half (horizontally) in IE 7. (I have to use it at work, please don’t stone me :) )

  9. Posted January 9, 2008 at 7:00 pm

    I have a hard time imagining the content of BANNED_PHRASES. Is it an instance where two tame words turn evil when combined, like dog and walk are OK, but dogwalk is bad?

    What about this instead of the try…pass stuff?

    if s in BANNED_WORDS: return None

    Maybe make BANNED_WORDS into a dictionary or a set or something with a faster lookup. Maybe it already is.

    Anyhow, I saw your slides about “be nice to the database” on Pownce. If the database is the bottleneck, would it be possible to split the site across lots of servers, each with its own database? You would have to partition the data so that each user’s friends are all on the same database.

    For example, maybe the metal kids are all on one box and the punk rock kids are on a different one. Since they never talk to each other, it’s OK.

    Finally, our ancestors died to give us the right to put docstrings on our functions. We should honor their sacrifice :)

  10. ShyWolf

    Posted January 18, 2008 at 6:15 am

    I don’t think that such a library exists already for Python.
    “Easy to remember but fairly secure” is a subject on which points of view differ so much that it’s hard to find a good tradeoff.

    To explain better: passwords or shared keys generated that way seem too easily breakable to me (mostly because I could look at the code too), and they would probably look too hard to memorize to my boss.

    He’d resort to copy/paste for that anyway, and at that point it wouldn’t make much difference if they’re readable or not.

    Honestly, besides UIDs and temporary passwords, I don’t see much use for automatic string generation. To achieve what you want (shared feeds, for example), there are other ways that are probably saner and more secure.

    Also, I have many “moral” issues with using things like lambda and yield. I have used them, but I somehow always end up feeling guilty about it.
    Somewhere in my past (I don’t remember when or where), I was nagged about having a single return point in functions (yield), and as for lambda, well, it’s basically an “I don’t feel like typing def myfunction()” kind of construct.

    Disclaimer: I’m an engineer by training, so that might explain a lot, including my raging paranoia.
    I’m happy not to be a JS programmer.

  11. Posted January 18, 2008 at 2:50 pm

    That’s awesome, thanks for sharing; it looks pretty handy. That uncov idiot should just shut up. I think you should get a gag order on him for libel or harassment or something. His attitude is just uncalled for.

  12. nick

    Posted January 19, 2008 at 11:36 am

    @Matt Wilson:

    No, because metal kids are friends with punk kids and mods and shit (see ‘SLC Punk’). You’re thinking of some ideal dream world that probably only exists in SFV. It’s called MapReduce anyway… or you can distribute the content using a B-tree or something for faster lookup.