SQL Server Transact-SQL General Tips
By: Brad McGehee
Don’t include code, variables, or parameters that don’t do anything. This may sound obvious, but I have seen it in some off-the-shelf SQL Server-based applications. For example, you may see code like this:
SELECT column_name FROM table_name
WHERE 1 = 0
When this query is run, no rows will be returned. Obviously, this is a simple example (and most of the cases where I have seen this done have been very long queries). A query like this (even if part of a larger query) doesn’t accomplish anything useful and doesn’t need to be run. It just wastes SQL Server resources. In addition, I have seen more than one case where such dead code actually causes SQL Server to throw errors, preventing the code from running at all. [6.5, 7.0, 2000, 2005] Updated 1-24-2006
Don’t be afraid to make liberal use of in-line and block comments in your Transact-SQL code. They will not affect the performance of your application, and they will enhance your productivity when you have to come back to the code and try to modify it. [6.5, 7.0, 2000, 2005] Updated 1-24-2006
If possible, avoid using SQL Server cursors. They generally use a lot of SQL Server resources and reduce the performance and scalability of your applications. If you need to perform row-by-row operations, try to find another method to perform the task. Some options are to perform the task at the client, use tempdb tables, use derived tables, use a correlated subquery, or use the CASE statement. More often than not, there are non-cursor techniques that can be used to perform the same tasks as a SQL Server cursor. [6.5, 7.0, 2000, 2005] Updated 1-24-2006
If your users perform many ad hoc queries on your SQL Server data, and you find that many of these “poorly-written” queries take up an excessive amount of SQL Server resources, consider using the “query governor cost limit” configuration option to limit how long a query can run.
This option allows you to specify the maximum number of “seconds” a query may run. Whenever the query optimizer estimates that a particular query will exceed this limit, the query is aborted before it even begins.
Although the value you set for this setting is stated as “seconds,” it does not mean seconds like we think of seconds. Instead, it relates to the actual estimated cost of the query as calculated by the query optimizer. You may have to experiment with this value until you find one that meets your needs.
There are two ways to set this option. First, you can change it at the server level (all queries running on the server are affected by it) using sp_configure “query governor cost limit,” or you can set it at the connection level (only this connection is affected) by using the SET QUERY_GOVERNOR_COST_LIMIT command. [7.0, 2000, 2005] Updated 1-24-2006
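To make both approaches concrete, here is a sketch of each; the cost limit of 300 is an illustrative value you would tune for your own workload:

```sql
-- Server level: abort any query whose estimated cost exceeds 300.
-- "query governor cost limit" is an advanced option, so it must be exposed first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'query governor cost limit', 300;
RECONFIGURE;

-- Connection level: affects only the current session.
-- A value of 0 turns the governor off for this connection.
SET QUERY_GOVERNOR_COST_LIMIT 300;
```

Remember that the number is compared against the optimizer’s estimated cost, not wall-clock seconds, so expect to experiment before settling on a value.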
You may have heard of a SET command called SET ROWCOUNT. Like the TOP operator, it is designed to limit how many rows are returned from a SELECT statement. In effect, the SET ROWCOUNT and the TOP operator perform the same function.
While in most cases, using either option works equally efficiently, there are some instances (such as rows returned from an unsorted heap) where the TOP operator is more efficient than using SET ROWCOUNT. Because of this, using the TOP operator is preferable to using SET ROWCOUNT to limit the number of rows returned by a query. [6.5, 7.0, 2000, 2005] Updated 1-24-2006
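The two techniques look like this side by side; the example assumes the Northwind Orders table used elsewhere in this article:

```sql
-- Preferred: TOP is part of the query itself, so the optimizer
-- can plan around it at compile time.
SELECT TOP 10 OrderID, CustomerID
FROM Orders
ORDER BY OrderDate DESC;

-- Same result using SET ROWCOUNT, which is generally less efficient.
SET ROWCOUNT 10;
SELECT OrderID, CustomerID
FROM Orders
ORDER BY OrderDate DESC;
SET ROWCOUNT 0;  -- always reset, or every later query in the session stays limited
```

Note the trailing SET ROWCOUNT 0: forgetting to reset it is a common source of mysteriously truncated result sets.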
If you have the choice of using a join or a subquery to perform the same task within a query, generally the join is faster. But this is not always the case, and you may want to test the query using both methods to determine which is faster for your particular application. [6.5, 7.0, 2000, 2005] Updated 1-24-2006
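As a simple illustration of the two equivalent forms (again using Northwind tables), you can compare execution plans for a query that finds customers who have placed at least one order:

```sql
-- Subquery form
SELECT CompanyName
FROM Customers
WHERE CustomerID IN (SELECT CustomerID FROM Orders);

-- Join form: often, but not always, faster. Test both in your environment.
SELECT DISTINCT c.CompanyName
FROM Customers c
JOIN Orders o ON o.CustomerID = c.CustomerID;
```

The DISTINCT in the join form is needed because the join can return one row per order, while the subquery form returns one row per customer.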
If you need to create a primary key (using a value meaningless to the record, other than providing a unique value for each record), many developers will use either an identity column (with an integer data type) or a uniqueidentifier data type.
If your application can use either option, then you will most likely want to choose the identity column over the uniqueidentifier column.
The reason is that the identity column (using the integer data type) takes up only 4 bytes, while the uniqueidentifier column takes 16 bytes. Using an identity column will create a smaller and faster index. [7.0, 2000, 2005] Updated 1-24-2006
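A sketch of the two designs, using hypothetical table names, makes the size difference visible:

```sql
-- 4-byte surrogate key: a smaller, faster primary key index
CREATE TABLE CustomersInt (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    CustomerName NVARCHAR(50)
);

-- 16-byte surrogate key: four times wider, so the index is larger
-- and more pages must be read to traverse it
CREATE TABLE CustomersGuid (
    CustomerID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
    CustomerName NVARCHAR(50)
);
```

The 4-byte key also propagates to every foreign key and nonclustered index that references it, so the savings multiply across the schema.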
If your application requires you to create temporary tables for global or per-connection use, consider the possibility of creating indexes for these temporary tables. While most temporary tables probably won’t need, or even use, an index, some larger temporary tables can benefit from them. A properly designed index on a temporary table can be as great a benefit as a properly designed index on a standard database table.
In order to determine if indexes will help the performance of your applications using temporary tables, you will probably have to perform some testing. [6.5, 7.0, 2000, 2005] Updated 1-24-2006
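For example, an index on a larger temporary table might be created like this (the table and column names are hypothetical):

```sql
CREATE TABLE #OrderSummary (
    OrderID INT,
    OrderTotal MONEY
);

-- ... populate #OrderSummary with many rows here ...

-- Index the column your later queries filter or join on
CREATE CLUSTERED INDEX IX_OrderSummary_OrderID
    ON #OrderSummary (OrderID);
```

Creating the index after the table is populated is often faster than inserting into an already-indexed table, but as the tip says, only testing will tell you whether the index pays for itself.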
Suppose you have data in your table that represents the logical information of “Yes” and “No” and you want to give the results of a query to someone who isn’t working all day with computers. Such people may not know that a 1 is the logical representation of TRUE while a 0 represents FALSE. Sure, you can handle this at the presentation layer. But what if someone comes to your desk, begging for immediate help? Here’s a little trick to make BITs (or any other 0 and 1 data) look a bit more intuitive:
CREATE TABLE MyBits
(
id INT IDENTITY(1,1) PRIMARY KEY,
bool BIT
)
GO
INSERT INTO MyBits SELECT 0 UNION ALL SELECT 1
GO
SELECT id, bool, SUBSTRING('YesNo', 4 - 3 * bool, 3) AS YesNo
FROM MyBits
GO
DROP TABLE MyBits

id          bool YesNo
----------- ---- -----
1           0    No
2           1    Yes

(2 row(s) affected)
How does this work? The trick happens inside the SUBSTRING function, precisely when calculating the start value for the SUBSTRING. If our column “bool” contains a 0, the calculation looks like SUBSTRING('YesNo', 4 - 3 * 0, 3), which resolves to SUBSTRING('YesNo', 4, 3) and therefore correctly returns 'No'. We actually use another feature of SUBSTRING here: if the string is shorter than our requested length, SUBSTRING simply returns the shorter string without filling up the missing spaces. Finally, in case a 1 is in our “bool” column, the calculation goes like SUBSTRING('YesNo', 4 - 3 * 1, 3), which is SUBSTRING('YesNo', 1, 3), and that is 'Yes'. [7.0, 2000, 2005] Added 5-9-2005
Instead of using temporary tables, consider using derived tables. A derived table is the result of using a SELECT statement in the FROM clause of an existing SELECT statement. By using derived tables instead of temporary tables, you can reduce I/O and often boost your application’s performance. [7.0, 2000, 2005] Updated 6-12-2006
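For instance, rather than materializing per-order totals into a temporary table and then querying it, a derived table does the same work in one statement (the example uses Northwind’s [Order Details] table; the 1000 threshold is illustrative):

```sql
-- The inner SELECT is the derived table; it exists only for this query,
-- so no temporary table is created or written to tempdb by your code.
SELECT d.OrderID, d.OrderTotal
FROM (SELECT OrderID, SUM(UnitPrice * Quantity) AS OrderTotal
      FROM [Order Details]
      GROUP BY OrderID) AS d
WHERE d.OrderTotal > 1000;
```

Because the optimizer sees the whole operation at once, it can often produce a better plan than the two-step temporary-table version.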
Sometimes, it is handy to be able to perform some calculation on one or more columns of a record, and then take the result of that calculation and then add it to similar calculations performed on other related records to find a grand total.
For example, let’s say you want to find the total dollar cost of an invoice. An invoice will generally involve a header record and one or more detail records. Each detail record will represent a line item on the invoice. In order to calculate the total dollar cost of an invoice, based on two or more line items, you would need to multiply the quantity of each item sold times the price of each item. Then, you would need to add the total price of each line item together in order to get the total dollar cost of the entire invoice. To keep this example simple, let’s ignore things like discounts, taxes, shipping, etc.
One way to accomplish this task would be to use a cursor, as we see below (we are using the Northwind database for this example code):
DECLARE @LineTotal money --Declare variables
DECLARE @InvoiceTotal money
SET @LineTotal = 0 --Set variables to 0
SET @InvoiceTotal = 0
DECLARE Line_Item_Cursor CURSOR FOR --Declare the cursor
SELECT UnitPrice*Quantity --Multiply unit price times quantity ordered
FROM [order details]
WHERE orderid = 10248 --We are only concerned with invoice 10248
OPEN Line_Item_Cursor --Open the cursor
FETCH NEXT FROM Line_Item_Cursor INTO @LineTotal --Fetch first record
WHILE @@FETCH_STATUS = 0
BEGIN
SET @InvoiceTotal = @InvoiceTotal + @LineTotal --Summarize line items
FETCH NEXT FROM Line_Item_Cursor INTO @LineTotal
END
CLOSE Line_Item_Cursor --Close cursor
DEALLOCATE Line_Item_Cursor --Deallocate cursor
SELECT @InvoiceTotal InvoiceTotal --Display total value of invoice
The result for invoice number 10248 is $440.00.
The cursor selects all of the line items for invoice number 10248, multiplies the quantity ordered by the unit price to get a line item total for each record, and then adds up all of the line item totals to calculate the total dollar amount of the invoice.
This all works well, but the code is long and hard to read, and performance is not great because a cursor is used. Ideally, for best performance, we need to find another way to accomplish the same goal as above, but without using a cursor.
Instead of using a cursor, let’s rewrite the above code using set-based Transact-SQL instead of a cursor. Here’s what the code looks like:
DECLARE @InvoiceTotal money
SELECT @InvoiceTotal = sum(UnitPrice*Quantity)
FROM [order details]
WHERE orderid = 10248
SELECT @InvoiceTotal InvoiceTotal
The result for invoice number 10248 is $440.00.
Right away, it is obvious that this is a lot less code and that it is more readable. What may not be obvious is that it uses fewer server resources and performs faster. In our example, with few rows, the time difference is very small, but if many rows are involved, the time difference between the techniques can be substantial.
The secret here is to use the Transact-SQL SUM function to summarize the line item totals for you, instead of relying on a cursor. You can use this same technique to help reduce your dependency on resource-hogging cursors in much of your Transact-SQL code. [6.5, 7.0, 2000, 2005] Updated 6-12-2006
While views are often convenient to use, especially for restricting users from seeing data they should not see, they aren’t always good for performance. So if database performance is your goal, avoid using views (SQL Server 2000/2005 Indexed Views are another story).
When the Query Optimizer gets a request to run a view, it runs it just as if you had run the view’s SELECT statement from Query Analyzer or Management Studio. In fact, a view runs slightly slower than the same SELECT statement run directly from Query Analyzer or Management Studio, although in simple cases the difference is too small for you to notice.
Another issue with views is that they are often combined (nested) with other code, such as being embedded within another view, a stored procedure, or other T-SQL script. Doing so often makes it more difficult to identify potential performance issues.
Views don’t allow you to add more restrictive WHERE clauses as needed. In other words, they can’t accept input parameters, which are often needed to restrict the number of records returned. I have seen lazy developers write generic views that return hundreds of thousands of unnecessary rows, and then use other code, such as client code, to filter out the few records that are actually needed. This is a great waste of SQL Server’s resources.
Instead of embedding SELECT statements in a view, put them in a stored procedure for optimum performance. Not only do you get an added performance boost (in many cases), you can also use the stored procedure to restrict user access to table columns, just as you can with views. [6.5, 7.0, 2000, 2005] Updated 6-12-2006
When using the WHILE statement, don’t avoid the use of BREAK just because some people consider it bad programming form. Often when creating Transact-SQL code using the WHILE statement, you can avoid using BREAK by moving a few lines of code around. If this works in your case, then by all means don’t use BREAK. But if your efforts to avoid using BREAK require you to add additional lines of code that makes your code run slower, then don’t do that. Sometimes, using BREAK can speed up the execution of your WHILE statements. [6.5, 7.0, 2000, 2005] Updated 6-12-2006
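A minimal sketch of a legitimate BREAK: instead of carrying an extra “stop” flag in the loop condition and testing it on every iteration, exit directly when the terminating condition is hit (the error check here is illustrative):

```sql
DECLARE @counter INT;
SET @counter = 1;

WHILE @counter <= 1000
BEGIN
    -- ... do the work for this iteration here ...

    IF @@ERROR <> 0
        BREAK;  -- exit immediately; no extra flag variable, no extra test per pass

    SET @counter = @counter + 1;
END
```

The alternative, a @keepGoing flag checked both inside the loop and in the WHILE condition, adds lines and evaluations without making the logic any clearer.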
One of the advantages of using SQL Server for n-tier applications is that you can offload much (if not most) of the data processing work from the other tiers and place it on SQL Server. The more work you can perform within SQL Server, the fewer the network roundtrips that need to be made between the various tiers and SQL Server. And generally the fewer the network roundtrips, the more scalable and faster the application becomes.
But in some applications, such as those that involve complex math, SQL Server has traditionally been weak. In these cases, complex math often could not be performed within SQL Server; instead, it had to be performed on another tier, causing more network roundtrips than desired.
By using user-defined functions (UDFs), this is becoming less of a problem. UDFs allow developers to perform many complex math functions from within SQL Server, functions that previously could only be performed outside of SQL Server. By taking advantage of UDFs, more work can stay with SQL Server instead of being shuttled to another tier, reducing network roundtrips, and potentially boosting your application’s performance.
Obviously, boosting your application’s performance is not as simple as moving math functions to SQL Server, but it is one feature of SQL Server 2000/2005 that developers can take advantage of in order to boost their application’s scalability and performance. [2000, 2005] Updated 6-12-2006
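As a simple sketch of the idea, a hypothetical scalar UDF can keep a financial calculation on the server instead of shipping rows to another tier (the function name and formula are illustrative):

```sql
-- Compound interest calculated inside SQL Server rather than in client code
CREATE FUNCTION dbo.udf_CompoundInterest
    (@principal MONEY, @rate FLOAT, @years INT)
RETURNS MONEY
AS
BEGIN
    RETURN @principal * POWER(1 + @rate, @years);
END
GO

-- Usable directly in any SELECT, so no extra roundtrip is needed
SELECT dbo.udf_CompoundInterest(1000, 0.05, 10) AS FutureValue;
```

Keep in mind that scalar UDFs invoked per row carry their own overhead, so they pay off most when they replace a network roundtrip, not when they replace simple inline expressions.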
SQL Server 2000/2005 offers a data type called “table.” Its main purpose is for the temporary storage of a set of rows. A variable, of type “table,” behaves as if it is a local variable. And like local variables, it has a limited scope, which is within the batch, function, or stored procedure in which it was declared. In most cases, a table variable can be used like a normal table. SELECTs, INSERTs, UPDATEs, and DELETEs can all be made against a table variable.
For better performance, if you need a temporary table in your Transact-SQL code, consider using a table variable instead of creating a conventional temporary table. Table variables are often faster, but not always. In addition, table variables found in stored procedures result in fewer compilations (than when using temporary tables), and transactions using table variables only last as long as the duration of an update on the table variable, requiring less locking and logging resources. [2000, 2005] Updated 10-02-2006
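Declaring and using a table variable looks almost identical to a temporary table; this sketch reworks the earlier temporary-table example against Northwind’s [Order Details]:

```sql
-- A table variable: scoped to this batch, no explicit DROP needed
DECLARE @OrderTotals TABLE (
    OrderID INT PRIMARY KEY,
    OrderTotal MONEY
);

INSERT INTO @OrderTotals (OrderID, OrderTotal)
SELECT OrderID, SUM(UnitPrice * Quantity)
FROM [Order Details]
GROUP BY OrderID;

SELECT OrderID, OrderTotal
FROM @OrderTotals
WHERE OrderTotal > 1000;
```

One caveat worth testing: table variables carry no statistics, so for large row counts a conventional temporary table with an index may still win.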
Don’t repeatedly use the same function to calculate the same result within your Transact-SQL code. For example, if you need to reuse the length of a string throughout your code, perform the LEN function once on the string, assign the result to a variable, and then use this variable as needed. Don’t recalculate the same value by calling the LEN function each time you need it, as this wastes SQL Server resources and hurts performance. [6.5, 7.0, 2000, 2005] Updated 10-02-2006
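A minimal sketch of the pattern, with hypothetical variable names:

```sql
DECLARE @name VARCHAR(100), @nameLen INT;
SET @name = 'Some example string';

-- Calculate the length once ...
SET @nameLen = LEN(@name);

-- ... then reuse the variable instead of calling LEN again each time
IF @nameLen > 10
    PRINT 'Long value: ' + CAST(@nameLen AS VARCHAR(10)) + ' characters';
```

The saving per call is tiny, but inside loops or queries that touch many rows, the repeated function calls add up.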
Many developers choose to use an identity column as their primary key. By design, an identity column does not guarantee that each newly created row will be consecutively numbered. This means there will most likely be occasional gaps in the identity column numbering scheme. For most applications, occasional gaps in the identity column present no problems.
On the other hand, some developers don’t like these occasional gaps and try to avoid them. With some clever use of INSTEAD OF triggers in SQL Server 2000/2005, it is possible to prevent these numbering gaps. But at what cost?
Trying to force an identity column to number consecutively without gaps can lead to locking and scalability problems, hurting performance. So the recommendation is not to try to get around the identity column’s built-in behavior. If you do, expect performance problems. [2000, 2005] Updated 10-02-2006
If you use the BULK INSERT command to import data into SQL Server, seriously consider using the TABLOCK hint along with it. This will prevent SQL Server from running out of locks during very large imports, and will also boost performance due to the reduction of lock contention. [7.0, 2000, 2005] Added 11-22-2004
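A sketch of the hint in context; the table name and file path are hypothetical, and the terminators assume a tab-delimited file:

```sql
BULK INSERT dbo.SalesStaging
FROM 'C:\imports\sales.txt'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n',
    TABLOCK  -- one table-level lock instead of many row/page locks
);
```

With TABLOCK, the load takes a single bulk update lock for the duration of the operation, which avoids lock escalation and can also enable minimally logged inserts under the right recovery model.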
To help identify long running queries, use the SQL Server Profiler Create Trace Wizard to run the “TSQL By Duration” trace. You can specify the length of the long running queries you are trying to identify (such as over 1000 milliseconds), and then have these recorded in a log for you to investigate later. [7.0]