Monday, September 6, 2010

SQL Server performance - querying by date

In every application that includes a database we will probably need to query some table by some date column -

SELECT Id, MyDate
FROM #CompareDates
WHERE MyDate = GETDATE()


Although this is a very simple syntax, I will show you how it can be a performance hit.

We'll start by filling a table with some random dates(and one row of a specific date just to make sure it exists) and add a clustered index on this column.

The data we are looking for is all the rows with today's date.

DECLARE @myNow DATETIME
SET @myNow = '2010-09-06 22:47'

SELECT *
FROM #CompareDates
WHERE MyDate = @myNow


This query will work very fast, but, because the column 'MyDate' contains time and not just date - the result won't be as expected, only rows with the exact date & time will appear.

We can see in the execution plan that it uses 'clustered index seek' and results with one row which is the row we added manually with exact time.



We can try using DATEDIFF to eliminate the time in the column:
SELECT *
FROM #CompareDates
WHERE DATEDIFF(day,MyDate,@myNow)=0


This will get the exact result, but, using a function over the 'MyDate' column causes the sql server to scan the entire index!!




If the column does not contain time, we can eliminate the time in the '@myNow' parameter:
DECLARE @myNowNoTime DATETIME

SET @myNowNoTime = dateadd(dd,datediff(dd,0,GETDATE()),0)

SELECT *
FROM #CompareDates
WHERE MyDate = @myNowNoTime


This will also perform well as the 1st query, but again...what if 'myDate' column contains time??

The solution is to avoid doing functions (like DATEDIFF, DATEADD or any user defined function) over 'MyDate' column.

The solution is very simple - date range.
First I've created two functions that one returns the begining of the day (time will be '00:00:00') and the other the end of the day (time will be 23:59:59:997)
CREATE FUNCTION [dbo].[fnGetBeginOfDay]
(@myDate DATETIME)
RETURNS DATETIME
BEGIN
RETURN CAST(FLOOR(CAST(@myDate AS DECIMAL(12, 5))) AS DATETIME)
END
GO

CREATE FUNCTION [dbo].[fnGetEndOfDay]
(@myDate DATETIME )
RETURNS DATETIME
BEGIN
RETURN DATEADD(ms, -3, DATEADD(day, 1, (dbo.fnGetBeginOfDay(@myDate))))
END
GO


Than the query will look like this:
DECLARE @beginOfDay DATETIME, @endOfDay DATETIME
SET @beginOfDay = dbo.fnGetBeginOfDay(GETDATE())
SET @endOfDay = dbo.fnGetEndOfDay(GETDATE())

SELECT *
FROM #CompareDates with (index(idx_MyDate))
WHERE MyDate BETWEEN @beginOfDay AND @endOfDay


And the execution plan looks like this, back to the wonderfull 'index seek' and returns the expected amount of rows:



Additional facts:
1. Testing the same queries with NONCLUSTERED index shows that the date range solution is as bad as DATEDIFF - both result with table scan

2. Adding to (1) a table hint (with(index(idx_MyDate))) to both queries shows that while datediff uses 'index scan', date range uses 'index seek' (which is obviously better).

Conclusions:

1. When querying using a date column which is included in an index - use 'date range' solution.

2. If the index is nonclustered - check if a table hint makes the difference.


Till next time....
Diego

Full sample test source code:
--fill table with random dates
CREATE TABLE #CompareDates (Id int identity(1,1), MyDate DATETIME)

DECLARE @myNow DATETIME
SET @myNow = '2010-09-06 22:47'
DECLARE @i int, @myDate DATETIME
SET @i = 0

WHILE @i < 10000
BEGIN
--random date (10x to Ben Nadel for the useful & short function :-)
SET @myDate = CAST(CAST( @myNow AS INT ) -5 * RAND( CAST( CAST( NEWID() AS BINARY(8) ) AS INT ) )AS DATETIME)

INSERT INTO #CompareDates (MyDate)
SELECT @myDate
SET @i=@i+1

IF @i=500
INSERT INTO #CompareDates (MyDate) VALUES(@myNow)
END

CREATE CLUSTERED INDEX idx_MyDate ON #CompareDates(MyDate)

SELECT * FROM #CompareDates

-----TESTING:

--fast, looks for specific date including time, not quite we are looking for..
SELECT *
FROM #CompareDates
WHERE MyDate = @myNow

--exact results, scans entire index
SELECT *
FROM #CompareDates
WHERE DATEDIFF(day,MyDate,@myNow)=0

--if the column does not contain time, you can eliminate time in the parameter
DECLARE @myNowNoTime DATETIME
SET @myNowNoTime = dateadd(dd,datediff(dd,0,@myNow),0)

SELECT *
FROM #CompareDates
WHERE MyDate = @myNowNoTime

--if column contains time, use range
DECLARE @beginOfDay DATETIME, @endOfDay DATETIME
SET @beginOfDay = dbo.fnGetBeginOfDay(@myNow)
SET @endOfDay = dbo.fnGetEndOfDay(@myNow)

SELECT *
FROM #CompareDates with (index(idx_MyDate))
WHERE MyDate BETWEEN @beginOfDay AND @endOfDay

4 comments:

  1. One other solution is to create a computed column and an index, but that is usually an overhead...

    http://blog.sqlauthority.com/2010/08/22/sql-server-computed-columns-index-and-performance/

    Gil.

    ReplyDelete
  2. Detailed explanation and elegant solution - thank you.

    ReplyDelete
  3. Also refer this post for more examples and shortcuts
    http://beyondrelational.com/modules/2/blogs/70/posts/10899/understanding-datetime-column-part-iii.aspx

    ReplyDelete
    Replies
    1. Madhivanan, thank you for your contribution

      Delete