Use Subqueries - they are fast!
I was looking through some legacy code to track down why a product export script was timing out and came across this (I've formatted it so that it fits into a blog post a bit better ).
<!--- get product information --->
<cfquery name="qryProductExport" datasource="#dsn#">
SELECT
product_id
, product_title
, product_description
, product_costexvat
, product_imagemain
, option_stockcode
, option_id
, option_title
, category_title
FROM [Products]
INNER JOIN [Options]
ON ( product_id = option_product_id )
INNER JOIN [Categories]
ON ( category_id = product_category_id )
WHERE product_active = 1
ORDER BY product_title
</cfquery>
<!--- get minimum product quantity that can be ordered --->
<cfquery name="qryMinQty" datasource="#dsn#">
SELECT
MIN(pricebreak_quantityMin) AS pricebreak_quantityMin
, pricebreak_product_id
FROM [PriceBreaks]
WHERE pricebreak_product_id in (
SELECT product_id
FROM [Products]
WHERE product_active = 1
)
GROUP BY pricebreak_product_id
</cfquery>
<!--- add new column to store min quantity in --->
<cfset QueryAddColumn(
qryProductExport
, "min_quantity"
, ArrayNew(1) )
/>
<cfloop query="qryProductExport">
<!--- do this with QofQ to reduce database I/O... --->
<cfquery name="qryProductMinQty" dbtype="query">
SELECT
pricebreak_quantityMin
FROM qryMinQty
WHERE pricebreak_product_id =
<cfqueryparam
value="#qryProductExport.product_id#"
cfsqltype="cf_sql_integer"
/>
</cfquery>
<!--- update min quantity column for this product... --->
<cfset QuerySetCell(
qryProductExport
, "min_quantity"
, qryProductMinQty.pricebreak_quantityMin
, qryProductExport.CurrentRow )
/>
</cfloop>
The code above was actually timing out - the loop is where it grinds to a halt. It probably worked fine when there were a hundred product options. A while back I posted about how the performance gain with subquery instead of group. In this instance a subquery can be used to get the data that the section query was pulling out of the database. By rewriting the SQL statement to use a subquery to get the minimum quantity, instead of two seperate queries which are then combined, I've got the execution time down to a few seconds.
Here is the new code using a subquery.
<cfquery name="qryProductExport" datasource="#dsn#">
SELECT
product_id
, product_title
, product_description
, product_costexvat
, product_imagemain
, option_stockcode
, option_id
, option_title
, category_title
, (
SELECT MIN(pricebreak_quantityMin)
FROM PriceBreaks
WHERE Products.product_id = PriceBreaks.pricebreak_product_id
) AS min_quantity
FROM [Products]
INNER JOIN [Options]
ON ( product_id = option_product_id )
INNER JOIN [Categories]
ON ( category_id = product_category_id )
WHERE product_active = 1
ORDER BY product_title
</cfquery>
The point of this post? Often when we need to get a specific recordset from a database we tend to use our favourite language to manipulate the data, when in fact by learning a bit more SQL you can make a huge difference to performance and write cleaner code. I'm not an SQL guru so if I can do it so can you!
After some really good feedback from Ben Nadel and Brian Kotek (two guys who really know their stuff!) You would probably be better off using this version of the query as it should cope better with large tables. You can see the speeds of the two queries in the comments below.
<cfquery name="qryProductExport" datasource="#dsn#">
SELECT product_id
, product_title
, product_description
, product_updated
, product_costexvat
, product_imagemain
, option_stockcode
, option_id
, option_title
, category_title
, min_quantity
FROM [Products]
INNER JOIN [Options]
ON ( product_id = option_product_id )
INNER JOIN [Categories]
ON ( category_id = product_category_id )
INNER JOIN (
SELECT
pricebreak_product_id
, MIN(pricebreak_quantityMin) AS min_quantity
FROM PriceBreaks
GROUP BY pricebreak_product_id
) AS TempRecordSet
ON ( Products.product_id = TempRecordSet.pricebreak_product_id )
WHERE product_active = 1
ORDER BY product_title
</cfquery>
- Posted in:
- SQL
- ColdFusion


Comment by Simon Bingham – September 05, 2008
SELECT
FROM
. . . [Products]
INNER JOIN
(
. . . SELECT
. . . . . pricebreak_product_id,
. . . . . MIN(pricebreak_quantityMin)
. . . FROM PriceBreaks
. . . GROUP BY pricebreak_product_id
) AS temp
ON
. . . Products.product_id = temp.pricebreak_product_id
I find that depending on the amount of data, this can perform pretty fast because it's only doing the derived table once rather than the sub-select for each record.
Comment by Ben Nadel – September 05, 2008
@Ben, Really good point. I'm going to try your suggestion that to see which is faster. When I changed the code I decided not to do it like that as I have a lot of products that are inactive which would be included in the subquery recordset so figured it would be faster but there is only one way to find out...!
Comment by John Whish – September 05, 2008
The results are (on SQL Server 2000):
john : 2750ms
ben : 2547ms
Comment by John Whish – September 05, 2008
Comment by Ben Nadel – September 05, 2008
john : 2656ms
ben : 2719ms
For reference the other SQL statement is:
SELECT product_id
, product_title
, product_description
, product_updated
, product_costexvat
, product_imagemain
, option_stockcode
, option_id
, option_title
, category_title
, min_quantity
FROM [Products]
INNER JOIN [Options] ON ( product_id = option_product_id )
INNER JOIN [Categories] ON ( category_id = product_category_id )
INNER JOIN (
SELECT
pricebreak_product_id
, MIN(pricebreak_quantityMin) as min_quantity
FROM PriceBreaks
GROUP BY pricebreak_product_id
) as TempRecordSet
ON ( Products.product_id = TempRecordSet.pricebreak_product_id )
WHERE product_active = 1
ORDER BY product_title
Comment by John Whish – September 05, 2008
Comment by John Whish – September 05, 2008
Comment by Ben Nadel – September 05, 2008
Actually, it's been interesting to compare techniques; I'm tempted to test further when I have the time to see which is best under different conditions. Thanks for your input Ben.
Comment by John Whish – September 05, 2008
I think it has to do with the ratio of record to sub-queries.
Comment by Ben Nadel – September 05, 2008
Comment by Brian Kotek – September 05, 2008
Comment by John Whish – September 06, 2008
Still, I think the readability of the code is more important (up until you run into a performance problem). When I look at your subquery example and the derived table example, one looks much clearer to me.
To me, the subquery accurately depicts what you are doing. The other approach is a bunch of code to try to get the same result, but without the same clarity.
Next, imagine if you want a different subquery from another table. What will the code look like then? Will the purpose be clear when you return to it in a year?
Comment by Steve Bryant – September 08, 2008
Normally I'm not a fan of premature optimization, but this is such a minor difference that personally I wouldn't see any good reason to use the subquery when it is almost guaranteed to be an issue in the future, since databases only get bigger in nearly all cases. Again though, this may just be my own personal bias and experience. I learned the hard way many years ago about the problems that subqueries can cause over time.
Comment by Brian Kotek – September 08, 2008
Comment by John Whish – September 08, 2008
Oh well, not the first time and probably won't be the last.
Comment by Steve Bryant – September 08, 2008
The other reason I might advise people to look at derived tables if this is new to them, is that these let you do things that would be impossible with a normal query, or that would require multiple separate queries to achieve. Derived tables are quite possibly the most powerful and yet most underused feature of SQL.
Comment by Brian Kotek – September 08, 2008
Comment by Ben Nadel – September 09, 2008