I felt a bit stupid on Saturday when I came across this nice chunk of code:

I’m looking to find 18 random friends of a user to display on their homepage.
So why is this so stupid? This bit of code runs on every member page including friend pages and is a fairly slow query. Django QuerySets are lazy, meaning that they aren’t evaluated until their data is needed. So I wasn’t really caching the result of this query at all. Here’s my current fix:

If anyone has any better suggestions, please let me know. I’m especially interested in hearing a runtime comparison of sorting by MySQL to Python random.shuffle to list.pop. Say the average user has 100 friends and the limit is usually 18 random friends.
Damn it. Perhaps Pownce just needs a Top 18.
22 Comments
Jeremy Osterhouse
If your problem is only that QuerySets are lazy, then why not just force its evaluation?
Offhand, you could cache list(f) instead of f.
Leah Culver
Yeah, sorry I forgot to mention that self.get_friends() returns a list. It performs list(self.friends.all()) which forces the QuerySet to evaluate.
Ryan
Hey Leah,
I think the ability to tag friends that you want on your homepage (as an alternative to random) would be nice.
Seems like it could be strange randomly seeing different friends with each subsequent login, if you know you’ll frequently want specific friends there.
BTW, did you just take an image of that code in your editor and link to it? heh
Richard Kiss
I believe what Jeremy is suggesting is the clever idea to add something small to your original code.
From
CACHE_FRIENDS_LIST = ‘friends_of_’
PROFILE_FRIENDS = 18
# get the friends list
f = cache.get(CACHE_FRIENDS_LIST+str(u.id))
if not f:
- f = m.friends.order_by(’?')[:PROFILE_FRIENDS]
+ f = list(m.friends.order_by(’?')[:PROFILE_FRIENDS])
cache.set(CACHE_FRIENDS_LIST+str(u.id), f)
Good job on Pownce, by the way. If you pass on an invite, I’ll sign up.
x
Random is beautiful. I personally don’t agree with the ramifications of choosing order for my “friends”.
pseudo-code (10 or so minutes in AS.)
property kRandomness : {}
on run
set kFriends to {”1″, “2″, “3″, “4″, “5″, “6″, “7″, “8″, “9″, “10″}
set kMaxRandFriends to 4
set kFriendCount to count of kFriends
set kRandFriends to {}
set kRandomness to {}
repeat kMaxRandFriends times
set kRandTemp to kGenerateRandom(kFriendCount, kRandomness)
set end of kRandomness to kRandTemp
set end of kRandFriends to item kRandTemp of kFriends
end repeat
return kRandFriends
end run
on kGenerateRandom(kFriendCount, kList)
set kRandomNumber to (random number from 1 to kFriendCount)
if kRandomness contains kRandomNumber then
kGenerateRandom(kFriendCount, kList)
else
return kRandomNumber
end if
end kGenerateRandom
Sorry - for the awful language choice of the moment, but compare and contrast and you might find something useful.
And when you get that working, add code to make the viewing user list first in the random list. ; )
Leah Culver
Richard - I was trying to get 2 birds with 1 stone in this re-write by not doing the SQL orderby. The list() takes care of the QuerySet evaluation, but I was trying to make getting random friends more efficient in general. Also, I like the OO feel of the get_random_friends inside of the UserProfile object.
Richard Kiss
If I had to guess, I would suspect that the SQL orderby would be faster. Of course, to really know, one would have to test, which I’m too lazy to do.
Just for kicks, here’s a more concise version of your new method using built-in Python libraries:
def get_random_friends(self, limit=None):
all_friends = self.get_friends()
if not limit or limit >= len(all_friends):
random.shuffle(all_friends)
return all_friends
return random.sample(all_friends, limit)
As always, not very tested.
Vijay Chakravarthy
Actually, you should just use a reservoir algorithm.
http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fportal.acm.org%2Fft_gateway.cfm%3Fid%3D3165%26type%3Dpdf%26coll%3DACM%26dl%3DACM%26CFID%3D15151515%26CFTOKEN%3D6184618&ei=WJW3Rof2K5WeepqzqdMK&usg=AFQjCNFpI3bFikNqwd1Iu-OSj-aBASt3iQ&sig2=OBPHmgTNmd_7dI7epe_xwQ
The basic idea is - if you want 18 random friends, then fetch the first 18 out of say 200, then incrementally (based on a random number) toss out or keep subsequent ones.
If you keep, then obviously you will need to remove one from the reservoir.
This will work even for large datasets, and you can choose a cutoff easily..
Manish
Leah, you could try store an index with each friend, then inside the UserProfile object generate 18 random numbers and do “select * from FRIENDS_TABLE where FRIEND_INDEX IN (….)”
This might be a better solution than “order by” or storing the whole friend list in the cache. Think of the scenario when pownce has 100-500k+ users and friends count gets upto 500-1000+.
Jerome Chan
So the order is random! Could you turn off the randomness when an account has 18 or lesser friends? There’s no point in randomizing it.
http://pownce.com/eviltofu/notes/452606/
Sam
Hey,
this isn’t strictly relevant to the post, but how come you chose python for Pownce. I’m fairly new to web development, and I just wondered if I should stick to PHP or if it’s worth using something else that may be better.
Thanks.
JOSH
When creating friend sets:
if you dont select a friend to the new set it still creates the set, but when you try to add a friend and Create, it basicly tries to create that set label again which then already exists.
There should be some extra logic in there
emad
Leah,
At YellowBot, we sort based on when your friends were last logged in…a way for you to know that activity (i.e. they wrote a review or tagged a location) may have just happened with those you want to keep track of.
If you want to sort randomly, you can also use this in mysql: SELECT … FROM … ORDER BY rand() LIMIT 18;

qubitz
m.friends.sort_by{rand}[1..18]
Cameron
Leah,
I know you don’t like Ruby, so I had to show you a nice solution…
m.friends.values_at(*(0..m.length-1).to_a.sort_by{rand}[0..17])
This generates 18 random indices and selects those friends.
matt
When you post Django code could you post a link to dpaste (http://dpaste.com) so we (I…) don’t have have to retype everything? Thanks. Later.
Leah Culver
Matt - I love dpaste, however it deletes stuff after 30 days if not viewed by anyone.
Sorry about the images. I’m pretty lazy.
Joe Stump
Just wanted to echo the “SELECT * FROM foo WHERE userID = 5 ORDER BY RAND LIMIT 0,18″ statement. Since you’re doing a (presumably) primary key lookup on userID it’ll be highly optimized so, considering most users have less than a few hundred friends, ordering by RAND() isn’t horrible.
That all being said, ordering by RAND() is normally not a good thing because MySQL selects out the entire set and then randomly selects rows until your LIMIT is fulfilled.
Mike Malone
“If you want to sort randomly, you can also use this in mysql: SELECT … FROM … ORDER BY rand() LIMIT 18;”
Bad idea if you have a large dataset. Doing an order by rand() in MySQL will force the SQL engine to generate a random number for ever item in your result set, then do a sort based on those numbers… which is fine if you have a small result set, but gets very icky when you get into thousands of rows (or hundreds of thousands).
wes
Leah — as of yesterday, it appears that anyone with less than 18 friends isn’t having their friends list show up at all. Just an FYI!
Zelnox
That’s a nice font for code. What is it called?
Guilherme
I don’t think I like the Idea of random friends. Maybe you should use other criteria, like in Orkut (which is very popular here in Brazil), it lists the friends with the most recent login (at least I think that’s how they do). Other idea would be to rank your friends by number of interactions, those who leaves more messages, or something like that.
Great work!
Still waiting for my invitation