Leading Wildcards Aren’t A Good Default

SQL Server has LIKE, which lets us do various pattern matches, but most commonly we use the percent side as a wildcard, such that “where lastname  like ‘Smi%’” matches Smith and Smit and Smithy. It’s a useful tool. It can also be used to do a contains type search, something along the lines of “where firstname like ‘%nd%’” which would match Andy and Sandi and Tandy and whatever else.

If you dig into show SQL Server uses indexes searches that use a trailing wild card typically optimize well assuming the number of rows it expects to match are reasonable (statistics). Leading wild cards almost always drive SQL to do an index scan or a table scan, which are usually more expensive.

That’s not intuitive to someone that doesn’t live and breath SQL Server. They see it as the column is indexed, I should be able to search it quickly. Because they don’t have a mental model that implies slowness the most common pattern I see in application searches is for the developers to do the contains type search with leading and trailing wildcards. The result is slow performance, even in cases where much better performance is possible.

My preference is to use the trailing wildcard, and then either let the user type in the leading wild card if they need it, or give them a “contains” checkbox to check. That leaves them with flexibility, but gives on average much better performance.

It’s our job to catch this as they use it and educate them. Sometimes they listen, sometimes the queries just stay slow!