Django Site Module

So I finally took a look at the queries I’m generating with Django. Wow, my queries suck. Here’s an example of the suckiness:

SELECT `django_site`.`id`,`django_site`.`domain`,`django_site`.`name` FROM `django_site` WHERE (`django_site`.`id` = 1)
SELECT `django_site`.`id`,`django_site`.`domain`,`django_site`.`name` FROM `django_site` WHERE (`django_site`.`id` = 1)
SELECT `django_site`.`id`,`django_site`.`domain`,`django_site`.`name` FROM `django_site` WHERE (`django_site`.`id` = 1)

Three times??

The Django Site module is a funny beast. It consists of a database table with columns id, name, and domain. Then in the settings file, there is a SITE_ID which is used to get the current site from the database.

Umm… Why not just put the site name and domain in the settings file? Why do the database query? Especially since each site already has a unique SITE_ID in the settings file? Is there something I’m missing here?

Brad rewrote this post as:

So there be this bitch of a module for the DJANGO that does site shit but it’s always be like querying the database for the same shits all overs the time over and over again and it be like doing nthing useful so I removed it and then it all did the same thing but without the site module so I recommend to you, by lloyalest readers, that you not to be using it plz k thx.

ThE DNDS.

16 Comments

  1. Posted July 17, 2007 at 5:57 pm | Permalink

    Can’t believe you actually posted my über-intelligent 15 second version.

  2. Posted July 17, 2007 at 6:21 pm | Permalink

    I think I can answer this one, but if someone smarter wants to come along and confirm or correct me, that’s cool too.

    Django was originally built as part of Ellington (a commercial CMS that I worked on for a year and a half). Ellington (and therefore Django) supports multiple sites being served from one database. The use case that inspired this is as follows:

    A newspaper company (The Lawrence Journal-World, where I worked) has several media properties. For a partial example, they have a news site (LJWorld.com), an entertainment site (Lawrence.com), and a sports site (KUSports.com). Often, these sites will share the same content. A story may be on both LJWorld.com and Lawrence.com, or on KUSports.com and LJWorld.com, for example.

    The sites module facilitates this. By a model (like, say, Story) having a ManyToMany relationship with sites, one story can belong to multiple sites. And, it’s easy for editors and reporters (i.e. non-technical folks) to assign them to the appropriate sites from the admin area.

    That’s the long story. The short story is that if you only have one site (like, say, pownce.com), you don’t need (or probably want) to use the sites app.

  3. Posted July 17, 2007 at 7:15 pm | Permalink

    Yup, Jeff pretty much nailed it.

    As for why it’s a table: if it wasn’t a table then objects couldn’t (properly) refer back to it. Referential integrity’s extremely important to data cleanliness; it would violate any number of best practices to have a site ID that didn’t refer back to a table.

    Still, I’m curious about what situations you’re running into that generate so many queries against the sites table; that doesn’t happen anywhere in my code. I have to wonder if there might be a bug somewhere that causes all those extra queries; think you might be able to share the offending code so we can take a look?

  4. Posted July 17, 2007 at 8:53 pm | Permalink

    Jeff - thanks for the explanation. I can see how it was useful in that case, but now it seems a bit particular to content publishing applications. I hope that as Pownce grows, we might contribute our own stuff to Django (perhaps with a different slant).

  5. Posted July 17, 2007 at 8:59 pm | Permalink

    Jacob - I was using it in a template context processor so my designer wouldn’t hard-code the site name and domain. In fact, Pownce went by a code name during most of the development. Anyways, as soon as I pointed the context processor code to a static string in the settings file, the queries went away. When are the template context processors evaluated? Is it per-page and per-usage?

  6. Posted July 17, 2007 at 10:03 pm | Permalink

    I also like how it’s SELECTing django_site.id when it already has that information.

  7. Posted July 17, 2007 at 10:25 pm | Permalink

    you know those posts you wrote about computer science classes? i wish my school would have merged programming and sql courses, now i know how to do sql, and i know how to program but i don’t know how to make them work together well.

  8. Posted July 18, 2007 at 12:48 am | Permalink

    I tried to remove the sites module from my django project too, but then the flatpages fallback middleware complained, because there is a dependency on the sites module …

  9. Posted July 18, 2007 at 4:09 am | Permalink

    Leah - I had the same problem and found a fairly simple solution…
    I’ve added the two variables to the settings.py file

    e.g.:
    SITE_NAME = 'Something'
    SITE_URL = 'http://something.com'

    and then import the settings object into context_processors.py with:
    from django.conf import settings
    # then read the values as settings.SITE_NAME and settings.SITE_URL

    This works really well for one-site projects (the majority of my sites). While I do understand the usefulness of contrib.sites for larger multi-sites, I think there’s too much overhead for one-site projects.

    BTW, great job with Pownce! I’ve just got my invite yesterday and have yet to get my friends to join, but I like it so far.

  10. Posted July 18, 2007 at 6:01 am | Permalink

    You know, you wouldn’t be having this problem if you’d used Ruby on Rails. Sorry, I had to. Blame Canada? P.S. Hello from a fellow Minnesotan.

  11. Posted July 18, 2007 at 9:52 am | Permalink

    @Leah: ah, that makes sense — context processors are executed each time a context (er, a RequestContext, that is) is initialized; depending on how things are written multiple contexts could be created on each request.

    Sounds like your fix was easy enough; it should also be trivial to modify the context processor to cache its data (it’s a good practice to cache things like context processors if possible since it can be hard to predict how often they’ll get called).

    @Mark: you’re probably just trolling, but I’ll pretend you asked a question: when you do an ORM lookup in Django it selects all columns by default (including whatever columns you used in the lookup). That’s because the time it takes for a database to return all columns is almost identical to the time it takes to return a subset of columns. In fact, in some edge cases fewer columns can actually be slower.

    Oh, and Django enumerates the columns explicitly because SELECT * is evil.

  12. Posted July 18, 2007 at 10:21 am | Permalink

    I’m not sure what the request model looks like for Django, but we cache read-only queries like this in the request object so that it only gets fetched once per request. There’s also a tier of memcached between our webapps and the database, but making sure cached stuff doesn’t go stale is tricky.

  13. Posted July 18, 2007 at 2:27 pm | Permalink

    Bob and Jacob - thanks for the tip about caching the context processors! I’ll do that.

  14. Posted July 18, 2007 at 6:26 pm | Permalink

    Django ate my baby!

    (I had to say it)

  15. Johnson Rice

    Posted July 18, 2007 at 8:14 pm | Permalink

    Doubtfully a good time…

    I’m just thinking the “Public” “Private” and “# Recipients” labels on each message should be subtly color coded so they stand out slightly more.

    At least private messages should be… I’d like to know a little more clearly when a message is directed AT me specifically, or from me to someone specifically.

    Although, that suggestion is probably ill timed and out of place here… Maybe.. I dunno. *shrug*

  16. Posted July 25, 2007 at 6:57 pm | Permalink

    Do you use any special IDE to write your python code? It seems outside of pydev or wingide, there aren’t really any good python IDEs. Pydev and wingide aren’t that good anyways, so I’ve gone back to using good old emacs to do the job.