This is a quick tip on query performance tuning. As always, don't forget to measure the performance against real data before applying any changes to a production environment.
Slow running queries, especially reports, often include a seemingly inoffensive range of dates in the where clause. Ultimately, the end-user wants to know "how many sales were completed last week" or "how many products were added in January". Take this simple example:
As long as the date range uses dates in the same order as the primary key and clustered index, we can substitute it for a range of ids. Generally, a column that stores the date and time of when the row was added is a good candidate for this technique. Let's try rewriting the query like this:
That's it! The query should now run considerably faster with only a slight increase in complexity.
Slow running queries, especially reports, often include a seemingly inoffensive range of dates in the where clause. Ultimately, the end-user wants to know "how many sales were completed last week" or "how many products were added in January". Take this simple example:
select * from T1 where created_on between @startDate and @endDate -- other criteriaThis query needs an index on the created_on column to perform well. However, if the where clause also references other columns, the optimizer might decide to ignore that index and use a clustered index scan instead, mainly to avoid bookmark lookups. The end result is that the query has to read most, if not all, of the table to find candidate rows. This issue only gets worse if the query joins other tables using the primary key.
As long as the date range uses dates in the same order as the primary key and clustered index, we can substitute it for a range of ids. Generally, a column that stores the date and time of when the row was added is a good candidate for this technique. Let's try rewriting the query like this:
declare @startId int select top 1 @startId = id from T1 where created_on >= @startDate order by created_on, id declare @endId int select top 1 @endId = id from T1 where created_on <= @endDate order by created_on desc, id desc select * from T1 where id between @startId and @endId -- other criteriaIt looks a lot more complicated but the basic idea is simple. First, we figure out the id of the first row in the date range. Then we do the same for the last row. These two queries will use the same index on created_on as the original query but they will run very fast since it's a simple index seek. Finally, the main query uses the same logic as before except with start and end ids instead of dates. Now the clustered index scan can skip most of the table and concentrate on the rows that are in the correct range.
That's it! The query should now run considerably faster with only a slight increase in complexity.
No comments:
Post a Comment