I’ve been wanting to make a short url for Pownce permalink pages for an upcoming project. Pownce note permalink pages on the website currently look like this: http://pownce.com/leahculver/notes/2477365/
The generic form is: http://pownce.com/<sender username>/notes/<note id>/
While these URLs are pretty descriptive and simple, they’re also a bit long. The note id is really the only piece of information we need from the URL; the sender username is just fluff. I’d like to have an alternate, much shorter URL that redirects to the final (longer) URL.
I think the following set of characters look pretty good in URLs:
23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ
Basically, 0-9a-zA-Z minus the confusing characters: 01loIO.
Now, it’s fairly simple to convert a base 10 note id to this strange base 56 set assuming that the character order (digit-lower-upper) is maintained. The (very freshman-like) code is here.
The result is that something like http://pownce.com/~g7YF/ will redirect to http://pownce.com/leahculver/notes/2477365/
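For illustration, here is a minimal sketch of the base 10 to base 56 conversion described above. It is not the linked code; the names `ALPHABET`, `encode` and `decode` are made up for this example, and the only thing taken from the post is the character set and its digit-lower-upper ordering.

```python
# Character set from the post: 0-9a-zA-Z minus the confusing 0, 1, l, o, I, O.
ALPHABET = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 56


def encode(note_id):
    """Convert a non-negative integer id to a base 56 string."""
    if note_id < 0:
        raise ValueError("note_id must be non-negative")
    if note_id == 0:
        return ALPHABET[0]
    chars = []
    while note_id:
        # Peel off the least significant base 56 digit each time round.
        note_id, rem = divmod(note_id, BASE)
        chars.append(ALPHABET[rem])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(chars))


def decode(token):
    """Convert a base 56 string back to the integer id."""
    value = 0
    for ch in token:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * BASE + ALPHABET.index(ch)
    return value


print(encode(2477365))  # g7YF
print(decode("g7YF"))   # 2477365
```

Decoding is the step the redirect needs: the short suffix turns straight back into the note id, with no lookup table in between.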
This works very well for finding a note page based off a note id. It’s also super handy for making a generic tiny-urlizing service. The only data that needs to be stored is the real URL and a numerical id - the unique primary key for the database row! I have no desire to run a tiny-urlizing service, but this would probably be how I’d do it.
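As a rough sketch of that generic tiny-urlizing idea (assumptions throughout: an in-memory SQLite table called `links`, and the hypothetical helpers `shorten` and `expand`), the short token is nothing more than the row's primary key written in base 56:

```python
import sqlite3

ALPHABET = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
BASE = len(ALPHABET)


def encode(number):
    """Non-negative integer -> base 56 string."""
    chars = []
    while True:
        number, rem = divmod(number, BASE)
        chars.append(ALPHABET[rem])
        if number == 0:
            break
    return "".join(reversed(chars))


def decode(token):
    """Base 56 string -> integer (ValueError on unknown characters)."""
    value = 0
    for ch in token:
        value = value * BASE + ALPHABET.index(ch)
    return value


# Hypothetical storage: just the real URL and an auto-incrementing id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT NOT NULL)"
)


def shorten(url):
    """Store the URL and return the base 56 form of its new primary key."""
    cur = conn.execute("INSERT INTO links (url) VALUES (?)", (url,))
    conn.commit()
    return encode(cur.lastrowid)


def expand(token):
    """Return the stored URL for a token, or None if there is no such row."""
    try:
        row = conn.execute(
            "SELECT url FROM links WHERE id = ?", (decode(token),)
        ).fetchone()
    except (ValueError, OverflowError):
        return None
    return row[0] if row else None
```

Because the token is derived from the primary key, there is no separate short-code column to store and no uniqueness check to run when adding a row.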
Note: Pownce user profile pages are in the form http://pownce.com/<username>/ so a non-username character, such as ‘~’ can signify that this is a note permalink as opposed to a profile page. I like the tilde, it’s got style… a bit confusing for the Unix folks though. Since the code isn’t in production yet, other character suggestions are welcome.
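To make the tilde idea concrete, here is a small sketch of how a path could be told apart as a note permalink or a profile page. The `route` function and both patterns are hypothetical, and the username character set (letters, digits, underscore) is an assumption, not Pownce's actual rule.

```python
import re

ALPHABET = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
BASE = len(ALPHABET)

# "~" followed by one or more characters from the base 56 alphabet.
NOTE_RE = re.compile(r"^/~([%s]+)/$" % re.escape(ALPHABET))
# Assumed username rule: letters, digits and underscores only.
PROFILE_RE = re.compile(r"^/([A-Za-z0-9_]+)/$")


def decode(token):
    """Base 56 string -> integer note id."""
    value = 0
    for ch in token:
        value = value * BASE + ALPHABET.index(ch)
    return value


def route(path):
    """Classify a path as a short note link, a profile page, or neither."""
    match = NOTE_RE.match(path)
    if match:
        return ("note", decode(match.group(1)))
    match = PROFILE_RE.match(path)
    if match:
        return ("profile", match.group(1))
    return None


print(route("/~g7YF/"))       # ('note', 2477365)
print(route("/leahculver/"))  # ('profile', 'leahculver')
```

Since ‘~’ can never appear in a username under this assumption, the two patterns cannot collide, whichever order they are checked in. Building the final redirect would still need one lookup to find the note's sender.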
30 Comments
Francis
Sure it works for me. Google’s got a habit of using single letters http://www.google.com/a/ is for the google apps for domains, and they have a couple others. Maybe pownce.com/n/g7yf for notes and /p/ for person or profile. Like that. I think even gallery (gallery.sf.net) uses things like /v/ for viewing a photo and /d/ for album (why d? maybe directory?).
It helps to plan it all out in advance because you never want it to change.
Leah
Francis - Good idea but ‘n/’ would be one extra character and someone’s username is already ‘n’…
Tamal
You could retool the pwnce.com domain into a tinyurl-type service.
http://pwnce.com/g7YF
If the short URL isn’t valid, then direct to the equivalent pownce.com URL (as it does now):
http://pwnce.com/leahculver -> http://pownce.com/leahculver
Andrew Gwozdziewycz
I’d remove vowels so you can’t make bad or inappropriate words in your urls. The result is something I like to call “bsafe” ids.
Myles Braithwaite
Django has a built in redirect thingy that would convert it to http://pownce.com/r/<content_type_id>/<model_id>/ i.e. http://pownce.com/r/5/10/ see it here:
http://www.djangoproject.com/documentation/redirects/
Igor
How about a #…
Leah
Myles - I like the redirects app - it’s good for mapping old broken URLs to new ones. However, it’s not as efficient for tiny URLs. The “converting bases” way is one less db column since you don’t need to store the new URL. It’s also faster to add an entry, since no iteration is needed to verify uniqueness.
Leah
Andrew - Heh. I think ‘fck’ would be just as bad though…
Skyler Slade
I just wrote a base62 encoder for the tinysong website at Grooveshark. I left in similar-looking characters because I figured people would be copying/pasting the urls rather than trying to remember and type them. I do the insert into the table, grab the auto_incremented primary key, then base62 encode it. One of your commenters suggested removing vowels, but that limits the “fun” urls like http://tinysong.com/eaT :).
Adam
Francis’ idea could be implemented by just adding a letter to the start of the “code” - so n[code] - at least you have some future proofing should you ever want to allow short URLs for other areas of pownce. But, as you say - it’s an extra letter - is it worth it? Will you ever want to expand shortened URLs to other areas?
Oh - http://pownce.com/n/ 404s and if somebody has the account, surely you could give them a free premium account, apologise and politely move them along
~ is usually used to symbolize a user, so not really sure on the use of it here - hmmm….
Matt Froese
I don’t think you would really need to store the ids and real URLs separately. Think of Hex to base 10 or Oct to base 10. Create your own number system.
Dean Landolt
It seems like the fact that profile pages are accessed right off the root may really pinch later on. So at the very least, you need a namespace to accommodate, for lack of a better word, controller class methods like this one.
You’re certainly stuck with the http://pownce.com/<username>/ route, but I bet it would pay dividends down the road to carve out a non-profile namespace to be extensible even farther past a tinyurl service, and the tilde may not be a bad symbol. So if you use the http://pownce.com/~g7YF/ pattern here you lose it just like you lost every other character to profile pages.
So if you went with something like http://pownce.com/~/n/g7YF you’ll be able to carve yourself out some breathing room for other extensions.
Mark
Take a look at Z-Base32. It’s a character selection that’s been specifically picked to be easy for people to dictate, etc. One of the interesting insights is that the extra bits you get from distinguishing between upper and lowercase are really not worth it in terms of how much harder they make the strings to read and transcribe.
Paul Smith
Quick comment: “basestring” is a Python type, which you might not want to mask by having it be the name of an argument/local var.
Ohako
This works a treat! Thanks very much for your work. Oh, I found this page by Googling ‘making tiny urls’, and yours was the first site that had a roll-your-own approach. Very classy.
Mason
Fun with Ruby.
# My Ruby encoder
to_base = Proc.new {|number, table, base| base ||= table.length; number.zero? ? "" : to_base.call(number / base, table, base) + table[number % base].to_s }
# Leah’s example data
t = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ".split("")
n = 2477365
puts to_base.call(n, t)
# prints g7YF
Pastie: http://pastie.org/private/fiuddepzue7cbodjyfxoqq
Kristoph
I should think that using only one letter case would simplify communicating the url over a non-digital medium (case insensitive).
Also, I would probably use a sub-domain rather than a special character to make things simpler for the non-computer people.
So …
http://notes.pownce.com/abc123
(or if you’re into less typing)
http://n.pownce.com/abc123
James Bennett
Another way you can easily do this, assuming you know the content-type ID of the object, is to just drop in this URL pattern:
(r'^(?P<content_type_id>\d+)/(?P<object_id>\d+)/$', 'django.views.defaults.shortcut'),
And voila! Instant short URLs.
cpinto
@Mason even more fun with Python
gen_token=lambda n,cs='23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ':(lambda b=len(cs):gen_token((n-(n%b))//b,cs)+cs[n%b])() if n else ''
usage: gen_token(n)
@Leah why not use a different host name for the redirect app? ie. http://r.pownce.com/
Abraham Estrada
The solution can be easy using the pownce.com/n/12345 idea. The URLs use regexes, so if you want to avoid a clash with a username ‘n’ you just have to put the tiny URL regex rule above the users rule. If pownce.com/n/12345 is not valid, then go to pownce.com/n/
Tyler Menezes
Why not an exclamation mark? I’ve always liked them, and the only UNIX related use I can think of is `!!` on the command line, which inserts the last command. Actually, I think that’s only Linux, but anyways.
~ would probably be confusing, though.
Make sure, whatever you do, that you send a 301 Moved Permanently header. I mean, I’m sure you know that, but just thought I’d mention it. Really terrible for SEO otherwise.
Jon
Just wondering…what’s the benefit of url-encoding the note ids, when you can just use the note id itself?
Leah
James - The default view is great for using the ids and it’s nice to know it exists. However, I wanted to make the URL suffix shorter than the note id by using more url-safe chars.
kevin
http://clickontyler.com/blog/2007/10/foo9-url-shortener/
Chris Milton
Leah,
Have you considered the caret? “^”.
http://en.wikipedia.org/wiki/Caret
Chris
Mark
Late on the uptake, yes, but I love doing little things like this. It’s faster to push the characters to a list, reverse the list, and join the elements. About 15% faster. Also, you realize that this occasionally (1/56th of the time) sticks a 0 (encoded as ‘2’) on the front? A do-while loop would be simpler, but Python misses out on that one:
def int_to_anybase(num, alphabet):
    if num == 0:
        return alphabet[0]
    base = len(alphabet)
    arr = []
    while num:
        # take the least significant digit, then drop it from num
        rem = num % base
        num = num // base
        arr.append(alphabet[rem])
    # digits were collected in reverse order
    arr.reverse()
    return ''.join(arr)
Here’s a do-while in the language of the gods:
std::string int_to_anybase(unsigned num, const std::string &alphabet)
{
    std::ostringstream out;
    std::list list;
    size_t base = alphabet.length();
    do {
        unsigned rem = num % base;
        num /= base;
        list.push_front(alphabet[rem]);
    } while (num);
    std::copy(list.begin(), list.end(), std::ostream_iterator(out));
    return out.str();
}
Mark
My template arguments got zapped, but you get the idea.
Cesar Noel Quinon
It would be nice if Pownce had a tinyurl integrated within the system so that I could shorten the links in a post. Similar to what they did with Twhirl or Twitterfox.
Charlie La Mothe
http://code.djangoproject.com/changeset/8162#file15
Mark Bate
nice idea.
will the permalink be optional or automagical?
It probably won’t matter either way, other than a little number crunching on the server.
as far as characters to use.. the tilde is quite nice, it’s a funky little character.
I personally think you should try and pick a character that plays with the permalink idea. generally permalinks are something someone thinks is important and should be drawn to attention, kind of like a pinned notice.
Characters that might resemble pushpins could be appropriate? eg. ‘ or ` (side on push pins?), *, +, @ (front on push pins?)