Using Query of Queries to improve performance
May 28, 2008
I'm pretty sure that this has been blogged about before but I thought it was important enough to write about as some people may not have seen it or forgotten!
I am currently building an application that requires a lot of database interaction. I've done all the usual stuff to improve performance such as query caching and also building indexes for the database (indexes are an essential part of optimising your application!) I'm now looking at using Query or Queries to reduce the amount of database interaction.
My current dilemma is whether I should pull out a large amount of unfiltered data from the database and cache it (for 30 mins), I can then uses Query or Queries to filter the results so that I only have to connect to the database once in that 30 minute period. I've found that this does improve performance, except for when the query cache has expired. It's a trade off, I was just wondering if I should have lots of fast executing cached queries or one cached query which is slow to execute every 30 minutes.
A prime example of this would be a e-commerce shop which has lots of products. For each product and category page you might be showing the newest products in the current category, products on offer in the current category, best sellers in the current category. This page would require three queries to be executed to build it. If you have 10 categories, this will result in 30 cached queries.
Individual cached queries approach:
<cfquery name="qryBestSellers" datasource="#Request.Datasource#" cachedWithin="#CreateTimeSpan(0,30,0,0)#">
SELECT
p.product_id
, p.product_title
, c.category_title
FROM Products p
INNER JOIN Categories c
ON ( p.product_category_id = c.category_id )
WHERE c.category_id = <cfqueryparam value="#Val(URL.category_id)#" cfsqltype="CF_SQL_INTEGER" />
ORDER BY
p.product_sold desc
</cfquery>
<cfquery name="qryNewProducts" datasource="#Request.Datasource#" cachedWithin="#CreateTimeSpan(0,30,0,0)#">
SELECT
p.product_id
, p.product_title
, c.category_title
FROM Products p
INNER JOIN Categories c
ON ( p.product_category_id = c.category_id )
WHERE c.category_id = <cfqueryparam value="#Val(URL.category_id)#" cfsqltype="CF_SQL_INTEGER" />
ORDER BY
p.product_created desc
</cfquery>
<cfquery name="qryOffers" datasource="#Request.Datasource#" cachedWithin="#CreateTimeSpan(0,30,0,0)#">
SELECT
p.product_id
, p.product_title
, c.category_title
FROM Products p
INNER JOIN Categories c
ON ( p.product_category_id = c.category_id )
WHERE c.category_id = <cfqueryparam value="#Val(URL.category_id)#" cfsqltype="CF_SQL_INTEGER" />
AND p.product_onoffer = <cfqueryparam value="1" cfsqltype="CF_SQL_BIT" />
ORDER BY
p.product_title
</cfquery>
Query of Queries approach
<!--- get all products from the database --->
<cfquery name="qryProducts" datasource="#Request.Datasource#" cachedWithin="#CreateTimeSpan(0,30,0,0)#">
SELECT
p.product_id
, p.product_onoffer
, p.product_title
, p.product_sold
, p.product_created
, c.category_title
FROM Products p
INNER JOIN Categories c
ON ( p.product_category_id = c.category_id )
</cfquery>
<cfquery name="qryBestSellers" dbtype="query">
SELECT
product_id
, product_title
, category_title
FROM qryProducts
WHERE category_id = <cfqueryparam value="#Val(URL.category_id)#" cfsqltype="CF_SQL_INTEGER" />
ORDER BY
product_sold desc
</cfquery>
<cfquery name="qryNewProducts" dbtype="query">
SELECT
product_id
, product_title
, category_title
FROM qryProducts
WHERE category_id = <cfqueryparam value="#Val(URL.category_id)#" cfsqltype="CF_SQL_INTEGER" />
ORDER BY
product_created desc
</cfquery>
<cfquery name="qryOffers" dbtype="query">
SELECT
product_id
, product_title
, category_title
FROM qryProducts
WHERE category_id = <cfqueryparam value="#Val(URL.category_id)#" cfsqltype="CF_SQL_INTEGER" />
AND product_onoffer = <cfqueryparam value="1" cfsqltype="CF_SQL_BIT" />
ORDER BY
product_title
</cfquery>
This is much faster to run than to connect to the database each time. Using cahcedWithin does pose one more problem, which is what happens if the database is changed whilst the query is cached. The way to solve this is to call the <cfobjectcache action="clear" /> tag when you update the database. However, you need to be aware that this clears the cache for the server, not just your application.
My conclusion is that as with most things, it depends. If you have 1000 categories and of those only 100 or so are viewed regualrly then you are proably better off using indivdual cahced queries, otherwise it is definately worth investigating the Query or Queries approach.
- Posted in:
- SQL
- ColdFusion
8 comments
Leave a comment
If you found this post useful, interesting or just plain wrong, let me know - I like feedback :)





I didn't know about cfobjectcache, but as you say it's a bit of a sledgehammer. A more granular way to flush cached queries is to specify the cachedwithin value as a variable which defaults to your normal timespan. When the data changes you can call the query with a zero timespan which will effectively flush the query (and only that one):
// Query (cached by default)
<cffunction name="getData" returntype="query">
<cfargument name="timespan" required="false" default="#createTimespan(0,0,30,0)#">
<cfset var q="">
<cfquery name="q" datasource="#dsn#" cachedwithin="#arguments.timespan#">
SELECT ...etc
</cfquery>
<cfreturn q>
</cffunction>
// To flush the query, call with a zero timespan
<cfset flushedData = getData(createTimespan(0,0,0,0)>
Comment by Julian Halliwell – May 30, 2008
You're right, that would work well in my example. I've never been a fan of this method, although it does work, as to clear the cache then you have to replicate all queries that access the data that has changed exactly (even the whitespace!). Also, it just seems odd having to call a select statement after every insert or update to the database when you don't actually want the data. Having said that, as you seem to have your SQL statements inside a CFC then I suppose you could quite easily change it to do that for you. For example:
<!--- Query (cached by default) --->
<cffunction name="getData" returntype="query">
<cfargument name="timespan" required="false" default="#createTimespan(0,0,30,0)#">
<cfset var q="">
<cfquery name="q" datasource="#dsn#" cachedwithin="#arguments.timespan#">
SELECT ...etc
</cfquery>
<cfreturn q>
</cffunction>
<cffunction name="updateData">
<cfargument name="params" required="true">
<cfset var q="">
<cfquery name="q" datasource="#dsn#">
UPDATE ...etc
</cfquery>
<!--- Clear the Query Cache --->
<cfset getData(timespan=CreateTimespan(0,0,0,-1))>
</cffunction>
Comment by John Whish – May 30, 2008
I know what you mean about running a query when you don't actually want any data, but it's worth an extra call to get the best of both worlds: cached data which is also responsive to changes (which are likely to be infrequent).
Comment by Julian Halliwell – May 30, 2008
Like you I have all my queries inside cfcs, although I do allow parameters to be passed to them (to build the where statements), which means I have multiple cached queries for the same table, so the cfobjectcache does the job (with a sledgehammer!). Using the CreateTimeSpan method and a DAO approach, you'd have something like this...
<cfcomponent>
<cffunction name="getAllRecords" returntype="query">
<cfargument name="timespan" required="false" default="#CreateTimespan(0,0,30,0)#" />
<!--- get all records from the table and cache --->
<cfset var q = "" />
<cfquery name="q" datasource="#dsn#" cachewithin="#Arguments.timespan#">
SELECT ...
</cfquery>
<cfreturn q />
</cffunction>
<cffunction name="getById" returntype="query">
<cfargument name="id" required="true" />
<!--- get the cached data --->
<cfset var qCached = getAllRecords() />
<cfset var q = "" />
<!--- query cached data to get actual record --->
<cfquery name="q" dbtype="query">
SELECT ...
FROM qCached
WHERE id = <cfqueryparam value="#Arguments.id#" cfsqltype="CF_SQL_INTEGER" />
</cfquery>
<cfreturn q />
</cffunction>
<cffunction name="update">
<cfargument name="params" required="true" />
<cfset var q = "" />
<!--- query cached data to get actual record --->
<cfquery name="q" datasource="#dsn#">
UDPATE...
</cfquery>
<!---
Need to clear the cache, can use:
<cfobjectcache action="clear" />
or
<cfset getAllRecords(timespan=CreateTimespan(0,0,0,-1)) />
--->
<cfset getAllRecords(timespan=CreateTimespan(0,0,0,-1)) />
</cffunction>
</cfcomponent>
Is that how you build your CFCs?
Comment by John Whish – May 30, 2008
<cfunction name="update" returntype="void">
<cfscript>
instance.DAO.update(newstuff);
flushRecords();
</cfscript>
</cffunction>
<cffunction name="flushRecords" returntype="void">
<cfset instance.DAO.getAllRecords(createTimespan(0,0,0,0))>
</cffunction>
That way you can use it for deletes or any other change operations as well. But how you dish things things out will depend on your preferred model architecture.
Comment by Julian Halliwell – May 30, 2008
I have a different approach that I use for my applications and I wanted to get your thoughts on it. In the example below I extract all the addresses from my database and store them in the application scope. I then do a query of queries whenever I need to get the data, parsing in any parameters, etc.
<!--- Extract addresses from database --->
<cffunction name="getAddresses" access="public" returntype="query" hint="Extract addresses from database">
<cfset var getAddresses="" />
<cfif not StructKeyExists(application,"getAddresses")>
<cflock type="exclusive" scope="application" timeout="30">
<cfquery name="application.getAddresses" datasource="#variables.ds#" username="#variables.dsUsername#" password="#variables.dsPassword#">
select *
from addresses
order by town
</cfquery>
</cflock>
</cfif>
<cfquery name="getAddresses" dbtype="query">
select *
from application.getAddresses
</cfquery>
<cfreturn getAddresses />
</cffunction>
If I want to extract one address I can do so as follows and no additional calls to my database are required.
<!--- Extract address from database --->
<cffunction name="getAddress" access="public" returntype="struct" hint="Extract address from database" output="false">
<cfargument name="addressID" type="numeric" required="true" default="0" />
<cfscript>
var getAddress="";
var getAddresses=application.addressesCFC.getAddresses();
var address="";
</cfscript>
<cfquery name="getAddress" dbtype="query">
select *
from getAddresses
where addressID=<cfqueryparam value="#arguments.addressID#" cfsqltype="cf_sql_integer" />
</cfquery>
<cfscript>
address=application.addressesCFC.defineDefaults();
if(getAddress.RecordCount){
StructUpdate(address,"addressID",getAddress.addressID);
StructUpdate(address,"address1",getAddress.address1);
StructUpdate(address,"address2",getAddress.address2);
StructUpdate(address,"address3",getAddress.address3);
StructUpdate(address,"town",getAddress.town);
StructUpdate(address,"county",getAddress.county);
StructUpdate(address,"postcode",getAddress.postcode);
StructUpdate(address,"country",getAddress.country);
StructUpdate(address,"telephone",getAddress.telephone);
StructUpdate(address,"fax",getAddress.fax);
StructUpdate(address,"googleMap",getAddress.googleMap);
}else{
StructInsert(address,"custom404",1);
}
return address;
</cfscript>
</cffunction>
Finally, if I modify any of the addresses in my database I can call the following function to clear my initial query from the application scope.
<!--- Remove addresses query from application scope --->
<cffunction name="resetAddresses" access="public" returntype="void" hint="Remove addresses query from application scope" output="false">
<cfset var temp=StructDelete(application,"GetAddresses") />
</cffunction>
Is there any benefit to using cachedwithin to store the query instead of the application scope?
Many thanks,
Simon
Comment by Simon Bingham – June 10, 2008
I too have my db calls inside CFCs. The example above was largely to demonstrate my point, rather than wade into the ORM/DAO/Gateway debate.
"Is there any benefit to using cachedwithin to store the query instead of the application scope?"
Blimey that's a big question and I'm not an expert on this. I'm guessing that you also store your CFC in the application scope as well, in which case it does seem logical to hold the data in the application scope as well.
I believe the advantages of cachedwithin over application scope are:
? Each query can have it's own timespan for caching. So a table that is regularly updated may be cached for 10 mins, whereas one that rarely changes could be cached for 24 hours.
? ColdFusion intelligently controls the cached queries so that if the number of cached queries exceed the cache limit then ColdFusion will flush them. With the application scope it will just fill up the memory and potentially take your server out (or at least affect the performance).
? If you're using the application scope then you need to be using locks (even if they are read only) which adds overhead.
? The query cache is specifically designed to cache queries so I'm guessing that the performance should be pretty good.
? Finally, you could never do <cfdump ="#application#" /> !!!
If your interested in my thoughts on your code then (and remember there is no one correct way!) my thoughts are:
? In your getAddresses function, if the variable "application.getAddresses" doesn't exist then you run the cfquery and then query the data again in your Query of Query. If the variable exists just return it, if it doesn't run the query, then return it. I'm not sure what the QofQ is for.
? In your getAddress function I'd be tempted to loop over getAddress.columnlist and dynamically populate the with the StructInsert function. This means that if you ever add a column then it will just appear in your struct :-)
I hope this is useful. Feel free to disagree!
Comment by John Whish – June 10, 2008
Thank you for your comments.
Caching seems to be a better approach. I can certainly see the benefits of it.
"In your getAddresses function, if the variable 'application.getAddresses' doesn't exist then you run the cfquery and then query the data again in your Query of Query. If the variable exists just return it, if it doesn't run the query, then return it. I'm not sure what the QofQ is for."
I agree. The addresses code is a bad example really. I usually use the query of queries if I need to pass any parameters into the query.
Simon
Comment by Simon Bingham – June 10, 2008