My 18 Random Friends

I felt a bit stupid on Saturday when I came across this nice chunk of code:

Random Friends Code Before

I’m looking to find 18 random friends of a user to display on their homepage.

So why is this so stupid? This bit of code runs on every member page including friend pages and is a fairly slow query. Django QuerySets are lazy, meaning that they aren’t evaluated until their data is needed. So I wasn’t really caching the result of this query at all. Here’s my current fix:

Random Friends Code After

If anyone has any better suggestions, please let me know. I’m especially interested in hearing a runtime comparison of sorting by MySQL to Python random.shuffle to list.pop. Say the average user has 100 friends and the limit is usually 18 random friends.

Damn it. Perhaps Pownce just needs a Top 18.

22 Comments

  1. Posted August 5, 2007 at 3:02 pm | Permalink

    If your problem is only that QuerySets are lazy, then why not just force its evaluation?

    Offhand, you could cache list(f) instead of f.

  2. Posted August 5, 2007 at 4:13 pm | Permalink

    Yeah, sorry I forgot to mention that self.get_friends() returns a list. It performs list(self.friends.all()) which forces the QuerySet to evaluate.

  3. Posted August 5, 2007 at 8:16 pm | Permalink

    Hey Leah,

    I think the ability to tag friends that you want on your homepage (as an alternative to random) would be nice.

    Seems like it could be strange randomly seeing different friends with each subsequent login, if you know you’ll frequently want specific friends there.

    BTW, did you just take an image of that code in your editor and link to it? heh

  4. Posted August 5, 2007 at 10:10 pm | Permalink

    I believe what Jeremy is suggesting is the clever idea to add something small to your original code.

    From

    CACHE_FRIENDS_LIST = ‘friends_of_’
    PROFILE_FRIENDS = 18

    # get the friends list
    f = cache.get(CACHE_FRIENDS_LIST+str(u.id))
    if not f:
    - f = m.friends.order_by(’?')[:PROFILE_FRIENDS]
    + f = list(m.friends.order_by(’?')[:PROFILE_FRIENDS])
    cache.set(CACHE_FRIENDS_LIST+str(u.id), f)

    Good job on Pownce, by the way. If you pass on an invite, I’ll sign up.

  5. x

    Posted August 5, 2007 at 10:55 pm | Permalink

    Random is beautiful. I personally don’t agree with the ramifications of choosing order for my “friends”.

    pseudo-code (10 or so minutes in AS.)

    property kRandomness : {}

    on run
    set kFriends to {”1″, “2″, “3″, “4″, “5″, “6″, “7″, “8″, “9″, “10″}

    set kMaxRandFriends to 4
    set kFriendCount to count of kFriends

    set kRandFriends to {}
    set kRandomness to {}

    repeat kMaxRandFriends times
    set kRandTemp to kGenerateRandom(kFriendCount, kRandomness)
    set end of kRandomness to kRandTemp
    set end of kRandFriends to item kRandTemp of kFriends
    end repeat

    return kRandFriends
    end run

    on kGenerateRandom(kFriendCount, kList)
    set kRandomNumber to (random number from 1 to kFriendCount)
    if kRandomness contains kRandomNumber then
    kGenerateRandom(kFriendCount, kList)
    else
    return kRandomNumber
    end if
    end kGenerateRandom

    Sorry - for the awful language choice of the moment, but compare and contrast and you might find something useful.

    And when you get that working, add code to make the viewing user list first in the random list. ; )

  6. Posted August 6, 2007 at 10:21 am | Permalink

    Richard - I was trying to get 2 birds with 1 stone in this re-write by not doing the SQL orderby. The list() takes care of the QuerySet evaluation, but I was trying to make getting random friends more efficient in general. Also, I like the OO feel of the get_random_friends inside of the UserProfile object.

  7. Posted August 6, 2007 at 1:54 pm | Permalink

    If I had to guess, I would suspect that the SQL orderby would be faster. Of course, to really know, one would have to test, which I’m too lazy to do.

    Just for kicks, here’s a more concise version of your new method using built-in Python libraries:

    def get_random_friends(self, limit=None):
    all_friends = self.get_friends()
    if not limit or limit >= len(all_friends):
    random.shuffle(all_friends)
    return all_friends
    return random.sample(all_friends, limit)

    As always, not very tested.

  8. Posted August 6, 2007 at 2:43 pm | Permalink

    Actually, you should just use a reservoir algorithm.
    http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fportal.acm.org%2Fft_gateway.cfm%3Fid%3D3165%26type%3Dpdf%26coll%3DACM%26dl%3DACM%26CFID%3D15151515%26CFTOKEN%3D6184618&ei=WJW3Rof2K5WeepqzqdMK&usg=AFQjCNFpI3bFikNqwd1Iu-OSj-aBASt3iQ&sig2=OBPHmgTNmd_7dI7epe_xwQ

    The basic idea is - if you want 18 random friends, then fetch the first 18 out of say 200, then incrementally (based on a random number) toss out or keep subsequent ones.
    If you keep, then obviously you will need to remove one from the reservoir.

    This will work even for large datasets, and you can choose a cutoff easily..

  9. Posted August 6, 2007 at 5:49 pm | Permalink

    Leah, you could try store an index with each friend, then inside the UserProfile object generate 18 random numbers and do “select * from FRIENDS_TABLE where FRIEND_INDEX IN (….)”

    This might be a better solution than “order by” or storing the whole friend list in the cache. Think of the scenario when pownce has 100-500k+ users and friends count gets upto 500-1000+.

  10. Posted August 7, 2007 at 5:52 am | Permalink

    So the order is random! Could you turn off the randomness when an account has 18 or lesser friends? There’s no point in randomizing it.

    http://pownce.com/eviltofu/notes/452606/

  11. Sam

    Posted August 7, 2007 at 6:41 am | Permalink

    Hey,

    this isn’t strictly relevant to the post, but how come you chose python for Pownce. I’m fairly new to web development, and I just wondered if I should stick to PHP or if it’s worth using something else that may be better.

    Thanks.

  12. Posted August 7, 2007 at 8:21 am | Permalink

    When creating friend sets:

    if you dont select a friend to the new set it still creates the set, but when you try to add a friend and Create, it basicly tries to create that set label again which then already exists.

    There should be some extra logic in there

  13. Posted August 7, 2007 at 11:10 am | Permalink

    Leah,

    At YellowBot, we sort based on when your friends were last logged in…a way for you to know that activity (i.e. they wrote a review or tagged a location) may have just happened with those you want to keep track of.

    If you want to sort randomly, you can also use this in mysql: SELECT … FROM … ORDER BY rand() LIMIT 18;
    :-)

  14. qubitz

    Posted August 8, 2007 at 3:07 pm | Permalink

    m.friends.sort_by{rand}[1..18]

  15. Cameron

    Posted August 8, 2007 at 3:31 pm | Permalink

    Leah,

    I know you don’t like Ruby, so I had to show you a nice solution…

    m.friends.values_at(*(0..m.length-1).to_a.sort_by{rand}[0..17])

    This generates 18 random indices and selects those friends.

  16. Posted August 9, 2007 at 10:18 am | Permalink

    When you post Django code could you post a link to dpaste (http://dpaste.com) so we (I…) don’t have have to retype everything? Thanks. Later.

  17. Posted August 9, 2007 at 2:47 pm | Permalink

    Matt - I love dpaste, however it deletes stuff after 30 days if not viewed by anyone.

    Sorry about the images. I’m pretty lazy.

  18. Posted August 9, 2007 at 11:33 pm | Permalink

    Just wanted to echo the “SELECT * FROM foo WHERE userID = 5 ORDER BY RAND LIMIT 0,18″ statement. Since you’re doing a (presumably) primary key lookup on userID it’ll be highly optimized so, considering most users have less than a few hundred friends, ordering by RAND() isn’t horrible.

    That all being said, ordering by RAND() is normally not a good thing because MySQL selects out the entire set and then randomly selects rows until your LIMIT is fulfilled.

  19. Posted August 10, 2007 at 9:28 pm | Permalink

    “If you want to sort randomly, you can also use this in mysql: SELECT … FROM … ORDER BY rand() LIMIT 18;”

    Bad idea if you have a large dataset. Doing an order by rand() in MySQL will force the SQL engine to generate a random number for ever item in your result set, then do a sort based on those numbers… which is fine if you have a small result set, but gets very icky when you get into thousands of rows (or hundreds of thousands).

  20. wes

    Posted August 13, 2007 at 8:50 am | Permalink

    Leah — as of yesterday, it appears that anyone with less than 18 friends isn’t having their friends list show up at all. Just an FYI! :)

  21. Zelnox

    Posted August 17, 2007 at 10:39 am | Permalink

    That’s a nice font for code. What is it called?

  22. Guilherme

    Posted August 17, 2007 at 1:37 pm | Permalink

    I don’t think I like the Idea of random friends. Maybe you should use other criteria, like in Orkut (which is very popular here in Brazil), it lists the friends with the most recent login (at least I think that’s how they do). Other idea would be to rank your friends by number of interactions, those who leaves more messages, or something like that.

    Great work!

    Still waiting for my invitation ;)