Sunday, November 06, 2011

What does Avoid non SARGable where clause mean?

SARGable is the short form of  "Search ARGument able". A condition in the SQL Query is said to be SARGable if the database engine can take advantage of an available Indexes and do an INDEX SEEK instead of Table Scan / Index scan to speed up the execution of that query.

One of the major mistakes developers do which makes a query non-SARGable is they use functions directly on a column in the WHERE Clause. The next common mistake i have seen is the issues created because of "implicit data type conversions". In this post I would explain those with few examples.

--Sample table
CREATE TABLE tblSARGTest
(
ProductID INT IDENTITY PRIMARY KEY,
ProductName VARCHAR(50) NOT NULL,
Manufacturing_Date DATETIME NOT NULL
)
GO

--Non clustered index on Manufacturing Date
CREATE NONCLUSTERED INDEX [nc_ix_manufacturing_dt] ON [dbo].[tblSARGTest] 
(
[Manufacturing_Date] ASC
)
GO

--Non clustered index on Product Name
CREATE NONCLUSTERED INDEX [nc_ix_productName] ON [dbo].[tblSARGTest] 
(
[ProductName] ASC
)
GO

-- Populate the table with some random 10K records. 
-- Refer my earlier post on Generating random test records
SET NOCOUNT ON
GO
INSERT INTO tblSARGTest 
SELECT 
dbo.udf_StringGenerator('A', 50),
GETDATE() + (RAND()* (39 * 365))
Go 10000

Now that we have a table with sample data and Indexes for demonstration purpose lets see some sample queries to understand the concept better.

Example 1: Find total records for the year 2011

--Query 1:: Non SARGable query 
SELECT COUNT(*) FROM tblSARGTest 
WHERE YEAR(Manufacturing_Date) = 2011

--Query 2:: SARGable query
SELECT COUNT(*) FROM tblSARGTest
WHERE Manufacturing_Date >= '01-01-2011' AND 
               Manufacturing_Date < '01-01-2012'

Both the queries would return the same number of records but "Query 2" would execute faster as it is SARGable. In "Query 1" we are using the function YEAR directly on the column which is a mistake so Query optimizer cannot make use of the appropriate indexes properly. Check the below screenshot to see the first query is making use of Index Scan while the second query is making use of Index Seek. Also we can see the difference it has on the "Query Cost".


Example 2: List all Products which starts with 'ab'

--Query 3: Non-SARGable
SELECT COUNT(*) FROM tblSARGTest WHERE LEFT(ProductName,2) = 'ab'
GO

--Query 4: SARGable
SELECT COUNT(*) FROM tblSARGTest WHERE ProductName LIKE 'ab%'
GO

In Query 3 we have again used a function on a column directly in the WHERE clause. So it would be slower and at the same time Query 4 would be faster.


Example 3: List all products which has the text 'ab' anywhere within it!

--Query 4: Non-SARGable
SELECT COUNT(*) FROM tblSARGTest WHERE ProductName LIKE '%ab%'
GO

Using wild card on both side of the search string is bad and it would result in a Table scan or an Index scan only. If we need to do these sort of searches then it is better to consider "Full text" searching option.

If we have wild card at the start of the search string, SQL Server has to evaluate each and every row irrespective of whether there is an Index or not!! The Query 4.1 shown below would also can't make use of INDEX SEEK.

--Query 4.1: Non-SARGable
SELECT COUNT(*) FROM tblSARGTest WHERE ProductName LIKE '%ab'
GO

So bottom-line is never prefix a search string with a wild card as it would make the query non-SARGable.

Example 4: Never use an expression which uses the same column name on both sides of the operator!

--Though there is a Clustered Index on ProductID it would not be used!
SELECT COUNT(*) FROM tblSARGTest WHERE ProductID = ProductID
GO


Example 5: Beware of Implicit data type conversion

Whenever implicit conversions occur while evaluating two values against each other, the value with the lowest data type in precedence is converted to the higher precedence. While doing so, SQL Server will try to choose the conversion that is least likely either to fail due to an overflow or to lose precision.

For example, if in a query we are equating an INT column with a BIGINT column then the INT column will be converted to an BIGINT!

Why does SQL Server do so? Why can't it do the vice-versa? Its simple, any INT value would be a subset of a BIGINT whereas if it tries to convert BIGINT into INT an overflow error would be thrown as INT cannot accommodate a BIGINT in it! So if we are careless we might see a performance hit on the query.

SET STATISTICS PROFILE ON
GO

--Query 6: Non-Sargable.
SELECT COUNT(*) FROM tblSARGTest WHERE ProductName LIKE N'ab%'
GO

--Query 7: Sargable
SELECT COUNT(*) FROM tblSARGTest WHERE ProductName LIKE 'ab%'
GO

In "Query 6" the unicode N'ab%' will force an implicit conversion of the ProductName column into NVARCHAR. The end result is we would get the costly Index Scan instead of an Index Seek.


So almost always use the right type for the columns in the queries.

1 comment:

Vidhi Shah said...

Hey there. That was a very good explaination. Thanks for it. I had a doubt as i am a new beeie..
In the Example 3 shown, you suggested not to use wildcards on both the side like '%ab%' as it will perform whole table scan for searching (correct me if i am wrong). So what is the alternate for such a query!?