Tuesday, November 28, 2006

boolean queries technique for monthly summary report

The other night at edmug Richard Campbell presented some good SQL Querying Tips & Techniques. A lot of them were about sql 2005 which was nice to see (espeically using Common Table Expressions which are quite handy in the case of recursion).

The coolest thing I found was in regards to a cross tab query. The example used was to get sales amount by month for each employee. As you can see in the following example this is done with a lot of subqueries that builds a hideous execution plan.

SELECT Salespeople.Salesperson, SUM(S1.Quantity*S1.Price) AS Total,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=1 AND S2.Sales_ID = S1.Sales_ID) AS Jan,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=2 AND S2.Sales_ID = S1.Sales_ID) AS Feb,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=3 AND S2.Sales_ID = S1.Sales_ID) AS Mar,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=4 AND S2.Sales_ID = S1.Sales_ID) AS Apr,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=5 AND S2.Sales_ID = S1.Sales_ID) AS May,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=6 AND S2.Sales_ID = S1.Sales_ID) AS Jun,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=7 AND S2.Sales_ID = S1.Sales_ID) AS Jul,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=8 AND S2.Sales_ID = S1.Sales_ID) AS Aug,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=9 AND S2.Sales_ID = S1.Sales_ID) AS Sep,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=10 AND S2.Sales_ID = S1.Sales_ID) AS Oct,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=11 AND S2.Sales_ID = S1.Sales_ID) AS Nov,
(SELECT SUM(S2.Quantity*S2.Price) FROM Sales AS S2 WHERE
DatePart(mm,S2.Invoice_Date)=12 AND S2.Sales_ID = S1.Sales_ID) AS Dec
FROM Sales AS S1 INNER JOIN Salespeople ON S1.Sales_ID = Salespeople.Sales_ID
GROUP BY Salespeople.Salesperson, S1.Sales_ID;

A Russiam mathemetician by the name of Rozenshtein has a great query (I beleive it was called a boolean query)

SELECT Salespeople.Salesperson, SUM(Sales.Quantity*Sales.Price) AS Total,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-1)))) AS Jan,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-2)))) AS Feb,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-3)))) AS Mar,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-4)))) AS Apr,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-5)))) AS May,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-6)))) AS Jun,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-7)))) AS Jul,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-8)))) AS Aug,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-9)))) AS Sep,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-10)))) AS Oct,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-11)))) AS Nov,
SUM(Sales.Quantity*Sales.Price*(1-ABS(SIGN(DatePart(mm,Sales.Invoice_Date)-12)))) AS Dec
FROM Sales INNER JOIN Salespeople ON Sales.Sales_ID = Salespeople.Sales_ID
GROUP BY Salespeople.Salesperson;


Breaking this down we take the numeric month out of the Invoice date with the DatePart(mm, Sales.Invoice_date) section. Next the value is run through the SIGN method. SIGN converts a number to a +1 if greater than 0, 0 if the value is 0, or -1 if the number is less than 0. Then convert the number to a positive using ABS (absolute value). Then subtract the value from 1 so now we have a boolean 0 or -1 for the current row. Then multiply the data (in this case the queantiy * price) by the value we have been calculated. So if the row is in the current month it will be multiplied by -1, otherwise by 0 (and hence equal 0 so it will not be included in the sum).

Here is a table showing the value at each step


Subtraction SIGN ABS Subtraction
Jan 2 1 1 0
Feb 1 1 1 0
Mar 0 0 0 1
Apr -1 -1 1 0
May -2 -1 1 0
Jun -3 -1 1 0

If you look at the execution plan for this method you will see it is drastically more efficient which really helps with large amounts of data. This technique does not have to be used for dates. A good example Richard Campbell showed was using CharIndex to find a name (as it will return -1 if not found, 0 if an exact match, or greater than 0 if contained within the word). Basically anything you can turn into a numeric to say you have matched something vs. not matched then this technique should be applicable.

No comments: