
Friday, November 27, 2009

Sort spills in tempdb due to incorrect behaviour by the query optimizer...

This one is really bothering me. I tried and tried and tried, but failed to understand why it is happening...

To give you a brief description: a query runs in 25-45 ms for x rows, but the same query runs 15-20 times slower if I add just 1 row, or a dozen more rows in some other cases (you will see this later). If I look at the execution plans, they are totally identical. The only difference is that the slower query does its sorting in tempdb instead of in the user database where it is executing...

Here we go :

create table SQLDBA (a1 int primary key clustered, a2 int, a3 char(2000))

begin tran
declare @i int
set @i = 1
while @i <= 200000
begin
    insert into SQLDBA values (@i, rand() * 2000000, replicate('a', 2000))
    set @i = @i + 1
end
commit tran

update statistics SQLDBA with fullscan

So, we created a table with 3 columns and inserted 200,000 rows into it.
Now I will run 2 queries:

set statistics time on

/* Good Query */
declare @a1 int,@a2 int , @a3 char(2000)
select @a1=a1 ,@a2=a2,@a3=a3 from SQLDBA
Where a1 <3000
order by a2

/* Bad Query */
declare @a1 int,@a2 int , @a3 char(2000)
select @a1=a1 ,@a2=a2,@a3=a3 from SQLDBA
Where a1 <3079
order by a2

Notice that there are only 79 more rows. But if you execute them on SQL Server 2005 SP3, as I did on a machine with 2 GB RAM, the first query runs in approximately 15-22 ms while the second runs in 320-410 ms. Try it yourself; you will see some difference in duration as it is hardware specific. Also, in my case it ran fine up to 3078 rows. Yes, the execution time for a1 < 3000 and a1 < 3078 is almost identical. If I add just one more row, the performance degrades.

NOTE: You might have to find this threshold on your own machine. In my case it was 3078; the moment I used 3079... Booooommmmmmmmm.
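To locate the threshold on your own machine, a rough sketch like the following can probe a range of predicate values one at a time and print the elapsed time for each. The range 3070-3085 is just where the flip happened for me; adjust it for your hardware. It builds the query as dynamic SQL so each value is a literal in its own statement:

```sql
-- Hedged sketch: probe candidate threshold values (range is illustrative)
declare @limit int, @sql nvarchar(400), @t0 datetime
set @limit = 3070
while @limit <= 3085
begin
    -- Build the test query with the current literal in the predicate
    set @sql = N'declare @a1 int, @a2 int, @a3 char(2000) '
             + N'select @a1 = a1, @a2 = a2, @a3 = a3 from SQLDBA '
             + N'where a1 < ' + cast(@limit as nvarchar(10))
             + N' order by a2'
    set @t0 = getdate()
    exec (@sql)
    -- Print elapsed milliseconds for this predicate value
    print 'a1 < ' + cast(@limit as varchar(10)) + ' : '
        + cast(datediff(ms, @t0, getdate()) as varchar(10)) + ' ms'
    set @limit = @limit + 1
end
```

A jump from tens of milliseconds to hundreds between two consecutive values marks the spill threshold.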

So why is it happening :D ......

- Setting STATISTICS PROFILE ON showed that the subtree cost is 2.5% higher, which is fine.

- Setting STATISTICS IO ON showed:

Table 'SQLDBA'. Scan count 1, logical reads 1007, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 34 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.

Table 'SQLDBA'. Scan count 1, logical reads 1034, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 344 ms.

So it looks like this is neither a CPU issue (the query that runs faster actually uses more CPU time) nor an IO issue, as there are no physical reads.

One thing I found in the XML output of the execution plan was that the good query always showed a compile time of 2 while the bad query showed 0... this again seemed strange to me.

Totally stuck ..........

Then I searched many DMVs related to query execution and found sys.dm_exec_query_memory_grants, which shows a row for each query that is waiting for a memory grant, or that waited and then acquired one, but not for queries that did not have to wait for a memory grant.

Surprisingly, it showed only one query in the output, and that was the query taking more time. Strange but true. Why is it waiting for memory when there is enough RAM on my server and max server memory is unlimited?

sys.dm_exec_query_stats also shows that the bad query does around 100 physical reads while the good one does none.
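A hedged sketch of how that can be pulled from sys.dm_exec_query_stats; the LIKE filter on the table name is just a convenience for finding the two test queries in the plan cache:

```sql
-- Compare physical reads for cached queries touching SQLDBA
select qs.execution_count,
       qs.total_physical_reads,
       qs.total_elapsed_time / 1000 as total_elapsed_ms,  -- DMV reports microseconds
       substring(st.text, 1, 200) as query_text
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
where st.text like '%SQLDBA%'
order by qs.total_physical_reads desc
```

The bad query should surface at the top with its physical reads, while the good query's row shows zero.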

Then I queried sys.dm_exec_query_memory_grants:

select granted_memory_kb,used_memory_kb,max_used_memory_kb
from sys.dm_exec_query_memory_grants
where session_id in (51,52)

I found that the good query uses very little of the memory granted to it, but the bad query uses the full grant and more than that ........

Because whatever extra memory it needs for sorting comes from tempdb:

select num_of_reads,num_of_bytes_read,num_of_writes,num_of_bytes_written
From sys.dm_io_virtual_file_stats(db_id('tempdb'),1)
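Keep in mind that sys.dm_io_virtual_file_stats is cumulative since the server started, so to attribute I/O to one query you need a before/after delta. A minimal sketch, assuming the suspect query runs in the same batch:

```sql
-- Snapshot tempdb data-file write counters before the query
declare @writes_before bigint, @bytes_before bigint
select @writes_before = num_of_writes,
       @bytes_before  = num_of_bytes_written
from sys.dm_io_virtual_file_stats(db_id('tempdb'), 1)

-- ... run the suspect query here ...

-- The delta is the tempdb write activity during this window
select num_of_writes - @writes_before        as writes_delta,
       num_of_bytes_written - @bytes_before  as bytes_written_delta
from sys.dm_io_virtual_file_stats(db_id('tempdb'), 1)
```

On a quiet test server the delta is a reasonable proxy for what the sort spilled; on a busy one, other sessions' tempdb activity will pollute it.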

So it looks like the optimizer is not able to grant the bad query the right amount of memory to keep it out of tempdb....... NOPE, I think I am incorrect, because if you look at the number of rows in the WHERE condition, the amount of memory granted looks directly proportional. So granting memory to the query does not seem to be the problem here; the problem is why the query uses more memory than its grant, such that it has to spill to tempdb.

Let me give you one more example that will open your eyes further :D ...

set statistics time on
declare @a1 int,@a2 int , @a3 char(2000)
select top 100 @a1=a1 ,@a2=a2,@a3=a3 from SQLDBA
Where a1 <3000
order by a2


declare @a1 int,@a2 int , @a3 char(2000)
select top 101 @a1=a1 ,@a2=a2,@a3=a3 from SQLDBA
Where a1 <3000
order by a2

Now, these queries are identical in every respect except that in the second query I am doing SELECT TOP 101... just 1 row extra... and see the difference... HELL...
And the story is the same here if you go through the DMVs etc. Even wrapping them in a stored procedure or using different hints won't help...

However, you will see that in this second example 1 MB of memory is granted to both queries and neither of them consumes the full grant... but the bad query still spills its sort to tempdb. So it is proved that the memory granted to the queries by the optimizer is correct; it is the way the query sorts in tempdb, and its usage of the granted memory, that looks incorrect here...

So what is happening ........ :D ... Why does just one extra row make the query spill to tempdb for sorting? (Use Profiler to see whether the sort is spilling; the Sort Warnings event fires when it does.)


On my machine I have 2 GB RAM... I capped max server memory to 100 MB...
And then the queries were running fine: top 100, top 101 and the previous query all run well, except that the bad query takes more time once in every 7-8 attempts (sometimes less, sometimes more)...
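For reference, this is how the cap can be applied ('max server memory (MB)' is an advanced option, so it has to be exposed first; remember to set it back to its original value afterwards):

```sql
-- Expose advanced options, then cap max server memory at 100 MB
exec sp_configure 'show advanced options', 1
reconfigure
exec sp_configure 'max server memory (MB)', 100
reconfigure
```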

Looks like something related to memory regions or chunks... not sure, I may be incorrect... Something is really wrong...

Filing a bug ..

Happy learning ...
