Tiny URLs based on pk

I’ve been wanting to make a short url for Pownce permalink pages for an upcoming project. Pownce note permalink pages on the website currently look like this: http://pownce.com/leahculver/notes/2477365/

The generic form is: http://pownce.com/<sender username>/notes/<note id>/

While these URLs are pretty descriptive and simple, they’re also a bit long. The note id is really the only piece of information we need from the URL; the sender username is just fluff. I’d like to have an alternate, much shorter URL that redirects to the final (longer) URL.

I think the following set of characters look pretty good in URLs:
23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ
Basically, 0-9a-zA-Z minus the confusing characters: 01loIO.

Now, it’s fairly simple to convert a base 10 note id to this strange base 56 set assuming that the character order (digit-lower-upper) is maintained. The (very freshman-like) code is here.
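As a rough illustration (this is not the linked code), the conversion could be sketched in Python like this. The names ALPHABET and encode_id are made up for the example; only the character set and its order come from the text above.

```python
# The 56-character set from above, in digit-lower-upper order.
ALPHABET = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"


def encode_id(note_id, alphabet=ALPHABET):
    """Convert a non-negative base 10 integer to a string in the given alphabet."""
    if note_id < 0:
        raise ValueError("note_id must be non-negative")
    base = len(alphabet)
    if note_id == 0:
        return alphabet[0]
    digits = []
    while note_id:
        # divmod yields the remaining quotient and the next digit.
        note_id, remainder = divmod(note_id, base)
        digits.append(alphabet[remainder])
    # Digits were collected least-significant first, so reverse them.
    return "".join(reversed(digits))


print(encode_id(2477365))  # g7YF
```

With this alphabet, any id below 56^4 (about 9.8 million) fits in four characters or fewer.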

The result is that something like http://pownce.com/~g7YF/ will redirect to http://pownce.com/leahculver/notes/2477365/
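Going the other way, the redirect has to turn the short code back into a note id. A minimal sketch, assuming hypothetical helpers decode_code and permalink (in a real app the sender username would come from looking the note up by id):

```python
ALPHABET = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
# Map each character to its numeric value (its position in the alphabet).
CHAR_VALUES = {char: value for value, char in enumerate(ALPHABET)}


def decode_code(code):
    """Convert a base 56 short code back to the base 10 note id."""
    if not code:
        raise ValueError("empty code")
    note_id = 0
    for char in code:
        if char not in CHAR_VALUES:
            raise ValueError("invalid character: %r" % char)
        note_id = note_id * len(ALPHABET) + CHAR_VALUES[char]
    return note_id


def permalink(username, note_id):
    """Build the long permalink URL (username lookup is assumed to happen elsewhere)."""
    return "http://pownce.com/%s/notes/%d/" % (username, note_id)


note_id = decode_code("g7YF")
print(note_id)                           # 2477365
print(permalink("leahculver", note_id))  # http://pownce.com/leahculver/notes/2477365/
```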

This works very well for finding a note page based off a note id. It’s also super handy for making a generic tiny-urlizing service. The only data that needs to be stored is the real URL and a numerical id - the unique primary key for the database row! I have no desire to run a tiny-urlizing service, but this would probably be how I’d do it.
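To make the generic service idea concrete, here is a small sketch using Python’s built-in sqlite3 module with an in-memory database. The table name, column names, and the shorten/expand functions are all assumptions for illustration, not anything Pownce actually runs.

```python
import sqlite3

ALPHABET = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
BASE = len(ALPHABET)


def encode_id(number):
    """Base 10 integer -> short code."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_code(code):
    """Short code -> base 10 integer (raises ValueError on bad characters)."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


# The only stored data: a numeric primary key and the real URL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT NOT NULL)")


def shorten(url):
    """Insert the URL and derive the short code from the new row's primary key."""
    cursor = db.execute("INSERT INTO urls (url) VALUES (?)", (url,))
    db.commit()
    return encode_id(cursor.lastrowid)


def expand(code):
    """Look up the real URL for a short code, or None if there is no such row."""
    row = db.execute("SELECT url FROM urls WHERE id = ?", (decode_code(code),)).fetchone()
    return row[0] if row else None


code = shorten("http://pownce.com/leahculver/notes/2477365/")
print(code)          # 3 (the first row gets id 1)
print(expand(code))  # http://pownce.com/leahculver/notes/2477365/
```

Because the code is derived from the primary key, there is no separate short-code column and no uniqueness check on insert.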

Note: Pownce user profile pages are in the form http://pownce.com/<username>/ so a non-username character, such as ‘~’, can signify that this is a note permalink as opposed to a profile page. I like the tilde, it’s got style… a bit confusing for the Unix folks though. Since the code isn’t in production yet, other character suggestions are welcome.
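Here is a sketch of how the prefix could separate the two kinds of URL, using plain regular expressions. The set of characters allowed in usernames (letters, digits, underscore) is an assumption for the example, and classify is a hypothetical helper.

```python
import re

ALPHABET = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"

# A tilde followed by one or more characters from the base 56 set.
NOTE_PATH = re.compile(r"^/~([%s]+)/$" % re.escape(ALPHABET))
# Assumed username characters; the real rules may differ.
PROFILE_PATH = re.compile(r"^/([A-Za-z0-9_]+)/$")


def classify(path):
    """Return (kind, value) for a URL path, checking the short note form first."""
    match = NOTE_PATH.match(path)
    if match:
        return ("note", match.group(1))
    match = PROFILE_PATH.match(path)
    if match:
        return ("profile", match.group(1))
    return ("unknown", None)


print(classify("/~g7YF/"))       # ('note', 'g7YF')
print(classify("/leahculver/"))  # ('profile', 'leahculver')
```

Since ‘~’ is not in the assumed username set, the two patterns can never match the same path.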

30 Comments

  1. Francis

    Posted June 17, 2008 at 11:39 am | Permalink

    Sure it works for me. Google’s got a habit of using single letters: http://www.google.com/a/ is for Google Apps for domains, and they have a couple of others. Maybe pownce.com/n/g7yf for notes and /p/ for person or profile. Like that. I think even Gallery (gallery.sf.net) uses things like /v/ for viewing a photo and /d/ for an album (why d? maybe directory?).

    It helps to plan it all out in advance because you never want it to change.

  2. Leah

    Posted June 17, 2008 at 11:50 am | Permalink

    Francis - Good idea but ‘n/’ would be one extra character and someone’s username is already ‘n’…

  3. Posted June 17, 2008 at 12:11 pm | Permalink

    You could retool the pwnce.com domain into a tinyurl-type service.

    http://pwnce.com/g7YF

    If the short URL isn’t valid, then direct to the equivalent pownce.com URL (as it does now):

    http://pwnce.com/leahculver -> http://pownce.com/leahculver

  4. Posted June 17, 2008 at 12:12 pm | Permalink

    I’d remove vowels so you can’t make bad or inappropriate words in your urls. The result is something I like to call “bsafe” ids.

  5. Posted June 17, 2008 at 12:21 pm | Permalink

    Django has a built-in redirect thingy that would convert it to http://pownce.com/r/<content_type_id>/<model_id>/, i.e. http://pownce.com/r/5/10/. See it here:

    http://www.djangoproject.com/documentation/redirects/

  6. Igor

    Posted June 17, 2008 at 12:35 pm | Permalink

    How about a #…

  7. Leah

    Posted June 17, 2008 at 12:49 pm | Permalink

    Myles - I like the redirects app - it’s good for mapping old broken URLs to new ones. However, it’s not as efficient for tiny URLs. The “converting bases” way is one less db column since you don’t need to store the new URL. It’s also faster to add an entry, since no iteration is needed to verify uniqueness.

  8. Leah

    Posted June 17, 2008 at 12:49 pm | Permalink

    Andrew - Heh. I think ‘fck’ would be just as bad though…

  9. Posted June 17, 2008 at 1:06 pm | Permalink

    I just wrote a base62 encoder for the tinysong website at Grooveshark. I left in similar-looking characters because I figured people would be copying/pasting the urls rather than trying to remember and type them. I do the insert into the table, grab the auto_incremented primary key, then base62 encode it. One of your commenters suggested removing vowels, but that limits the “fun” urls like http://tinysong.com/eaT :).

  10. Posted June 17, 2008 at 2:07 pm | Permalink

    Francis’ idea could be implemented by just adding a letter to the start of the “code” - so n[code] - at least you have some future proofing should you ever want to allow short URLs for other areas of pownce. But, as you say - it’s an extra letter - is it worth it? Will you ever want to expand shortened URLs to other areas?

    Oh - http://pownce.com/n/ 404s and if somebody has the account, surely you could give them a free premium account, apologise and politely move them along ;)

    ~ is usually used to symbolize a user, so not really sure on the use of it here - hmmm….

  11. Matt Froese

    Posted June 17, 2008 at 4:14 pm | Permalink

    I don’t think you would really need to store the ids and real URLs separately. Think of Hex to base 10 or Oct to base 10. Create your own number system.

  12. Posted June 17, 2008 at 5:13 pm | Permalink

    It seems like the fact that profile pages are accessed right off the root may really pinch later on. So at the very least, you need a namespace to accommodate, for lack of a better word, controller class methods like this one.

    You’re certainly stuck with the http://pownce.com/<username>/ route, but I bet it would pay dividends down the road to carve out a non-profile namespace to be extensible even farther past a tinyurl service, and the tilde may not be a bad symbol. So if you use the http://pownce.com/~g7YF/ pattern here you lose it just like you lost every other character to profile pages.

    So if you went with something like http://pownce.com/~/n/g7YF you’ll be able to carve yourself out some breathing room for other extensions.

  13. Posted June 18, 2008 at 3:36 pm | Permalink

    Take a look at Z-Base32. It’s a character selection that’s been specifically picked to be easy for people to dictate, etc. One of the interesting insights is that the extra bits you get from distinguishing between upper and lowercase are really not worth it in terms of how much harder they make the strings to read and transcribe.

  14. Posted June 19, 2008 at 9:56 am | Permalink

    Quick comment: “basestring“ is a Python type, which you might not want to mask by having it be the name of an argument/local var.

  15. Ohako

    Posted June 20, 2008 at 6:39 am | Permalink

    This works a treat! Thanks very much for your work. Oh, I found this page by Googling ‘making tiny urls’, and yours was the first site that had a roll-your-own approach. Very classy.

  16. Posted June 23, 2008 at 9:35 pm | Permalink

    Fun with Ruby.

    # My Ruby encoder
    to_base = Proc.new {|number, table, base| base ||= table.length; number.zero? ? "" : to_base.call(number / base, table, base) + table[number % base].to_s }

    # Leah’s example data
    t = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ".split("")
    n = 2477365
    puts to_base.call(n, t)
    # prints g7YF

    Pastie: http://pastie.org/private/fiuddepzue7cbodjyfxoqq

  17. Posted June 27, 2008 at 1:50 pm | Permalink

    I should think that using only one letter case would simplify communicating the url over a non-digital medium (case insensitive).

    Also, I would probably use a sub-domain rather than a special character to make things simpler for the non-computer people.

    So …

    http://notes.pownce.com/abc123

    (or if you’re into less typing)

    http://n.pownce.com/abc123

  18. Posted June 27, 2008 at 10:55 pm | Permalink

    Another way you can easily do this, assuming you know the content-type ID of the object, is to just drop in this URL pattern:

    (r'^(?P<content_type_id>\d+)/(?P<object_id>\d+)/$', 'django.views.defaults.shortcut'),

    And voila! Instant short URLs.

  19. Posted June 28, 2008 at 9:34 am | Permalink

    @Mason even more fun with Python :-)

    gen_token=lambda n,cs='23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ':(lambda b=len(cs):gen_token(n//b,cs)+cs[n%b])() if n else ''

    usage: gen_token(n)

    @Leah why not use a different host name for the redirect app? ie. http://r.pownce.com/

  20. Posted June 28, 2008 at 11:04 am | Permalink

    The solution can be easy using the pownce.com/n/12345 idea. The URLs use regexes, so if you want to avoid a clash with a username ‘n’ you just have to add the tiny URL regex rule above the users rule. If pownce.com/n/12345 is not valid, then go to pownce.com/n/

  21. Posted July 1, 2008 at 7:08 pm | Permalink

    Why not an exclamation mark? I’ve always liked them, and the only UNIX related use I can think of is `!!` on the command line, which inserts the last command. Actually, I think that’s only Linux, but anyways.

    ~ would probably be confusing, though.

    Make sure, whatever you do, that you send a 301 Moved Permanently header. I mean, I’m sure you know that, but just thought I’d mention it. Really terrible for SEO otherwise.

  22. Jon

    Posted July 7, 2008 at 3:58 pm | Permalink

    Just wondering…what’s the benefit of url-encoding the note ids, when you can just use the note id itself?

  23. Leah

    Posted July 7, 2008 at 4:18 pm | Permalink

    James - The default view is great for using the ids and it’s nice to know it exists. However, I wanted to make the URL suffix shorter than the note id by using more url-safe chars.

  24. kevin

    Posted July 13, 2008 at 2:09 pm | Permalink
  25. Chris Milton

    Posted July 14, 2008 at 9:34 am | Permalink

    Leah,

    Have you considered the caret? “^”.
    http://en.wikipedia.org/wiki/Caret

    Chris

  26. Mark

    Posted July 16, 2008 at 7:37 am | Permalink

    Late on the uptake, yes, but I love doing little things like this. It’s faster to push the characters to a list, reverse the list, and join the elements. About 15% faster. Also, you realize that this occasionally (1/56th of the time) sticks a 0 (encoded as ‘2’) on the front? A do-while loop would be simpler, but Python misses out on that one:

    def int_to_anybase(num, alphabet):
        base = len(alphabet)
        # zero never enters the loop below, so handle it up front
        if num == 0:
            return alphabet[0]
        arr = []
        while num:
            rem = num % base
            num = num // base
            arr.append(alphabet[rem])
        # digits were collected least-significant first
        arr.reverse()
        return ''.join(arr)

    Here’s a do-while in the language of the gods:

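    #include <algorithm>
    #include <iterator>
    #include <list>
    #include <sstream>
    #include <string>

    std::string int_to_anybase(unsigned num, const std::string &alphabet)
    {
        std::ostringstream out;
        std::list<char> list;
        size_t base = alphabet.length();
        // the do-while body runs once even for num == 0, giving alphabet[0]
        do {
            unsigned rem = num % base;
            num /= base;
            list.push_front(alphabet[rem]);
        } while (num);
        std::copy(list.begin(), list.end(), std::ostream_iterator<char>(out));
        return out.str();
    }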

  27. Mark

    Posted July 16, 2008 at 7:39 am | Permalink

    My template arguments got zapped, but you get the idea.

  28. Posted July 22, 2008 at 6:58 pm | Permalink

    It would be nice if Pownce had a tinyurl integrated within the system so that I could shorten the links in a post, similar to what Twhirl or Twitterfox did.

  29. Charlie La Mothe

    Posted July 31, 2008 at 7:31 pm | Permalink
  30. Mark Bate

    Posted August 7, 2008 at 2:35 pm | Permalink

    nice idea.
    will the permalink be optional or automagical?
    It probably won’t matter either way, other than a little number crunching on the server.

    as far as characters to use.. the tilde is quite nice, it’s a funky little character.
    I personally think you should try and pick a character that plays with the permalink idea. generally permalinks are something someone thinks is important and should be drawn to attention, kind of like a pinned notice.
    Characters that might resemble pushpins could be appropriate? eg. ‘ or ` (side on push pins?), *, +, @ (front on push pins?)