SQL Server Transact-SQL General Tips

from SQL-Server-Performance.com

SQL Server Transact-SQL General Tips

By : Brad McGehee  

Don’t include code, variable, or parameters that don’t do anything. This may sound obvious, but I have seen this in some off-the-shelf SQL Server-based applications. For example, you may see code like this:

SELECT column_name FROM table_name
WHERE 1 = 0

When this query is run, no rows will be returned. Obviously, this is a simple example (and most of the cases where I have seen this done have been very long queries). A query like this (even if part of a larger query) doesn’t perform anything useful, and doesn’t need to be run. It is just wasting SQL Server resources. In addition, I have seen more than one case where such dead code actually causes SQL Server to throw errors, preventing the code from even running. [6.5, 7.0, 2000, 2000, 2005] Updated 1-24-2006

*****

Don’t be afraid to make liberal use of in-line and block comments in your Transact-SQL code, they will not affect the performance of your application, and they will enhance your productivity when you have to come back to the code and try to modify it. [6.5, 7.0, 2000, 2005] Updated 1-24-2006

*****

If possible, avoid using SQL Server cursors. They generally use a lot of SQL Server resources and reduce the performance and scalability of your applications. If you need to perform row-by-row operations, try to find another method to perform the task. Some options are to perform the task at the client, use tempdb tables, use derived tables, use a correlated subquery, or use the CASE statement. More often than not, there are non-cursor techniques that can be used to perform the same tasks as a SQL Server cursor. [6.5, 7.0, 2000, 2005] Updated 1-24-2006

*****

If your users perform many ad hoc queries on your SQL Server data, and you find that many of these “poorly-written” queries take up an excessive amount of SQL Server resources, consider using the “query governor cost limit” configuration option to limit how long a query can run.

This option allows you to specify the maximum amount of “seconds” a query will run, and whenever the query optimizer determines that a particular query will exceed the maximum limit, the query will be aborted before it even begins.

Although the value you set for this setting is stated as “seconds,” it does not mean seconds like we think of seconds. Instead, it relates to the actual estimated cost of the query as calculated by the query optimizer. You may have to experiment with this value until you find one that meets your needs.

There are two ways to set this option. First, you can change it at the server level (all queries running on the server are affected by it) using sp_configure “query governor cost limit,” or you can set it at the connection level (only this connection is affected) by using the SET QUERY_GOVERNOR_COST_LIMIT command. [7.0, 2000, 2005] Updated 1-24-2006

*****

You may have heard of a SET command called SET ROWCOUNT. Like the TOP operator, it is designed to limit how many rows are returned from a SELECT statement. In effect, the SET ROWCOUNT and the TOP operator perform the same function.

While in most cases, using either option works equally efficiently, there are some instances (such as rows returned from an unsorted heap) where the TOP operator is more efficient than using SET ROWCOUNT. Because of this, using the TOP operator is preferable to using SET ROWCOUNT to limit the number of rows returned by a query. [6.5, 7.0, 2000, 2005] Updated 1-24-2006

*****

If you have the choice of using a join or a subquery to perform the same task within a query, generally the join is faster. But this is not always the case, and you may want to test the query using both methods to determine which is faster for your particular application. [6.5, 7.0, 2000, 2005] Updated 1-24-2006

*****

If you need to create a primary key (using a value meaningless to the record, other than providing a unique value for a record), many developers will use either an identity column (with an integer data type) or an uniqueindentifier data type.

If your application can use either option, then you will most likely want to choose the identity column over the uniqueindentifier column.

The reason for this is that the identity column (using the integer data type) only takes up 4 bytes, while the uniqueindentifier column takes 16 bytes. Using an identifier column will create a smaller and faster index. [7.0, 2000, 2005] Updated 1-24-2006

*****

If your application requires you to create temporary tables for use on a global or per connection use, consider the possibility of creating indexes for these temporary tables. While most temporary tables probably won’t need, or even use an index, some larger temporary tables can benefit from them. A properly designed index on a temporary table can be as great a benefit as a properly designed index on a standard database table.

In order to determine if indexes will help the performance of your applications using temporary tables, you will probably have to perform some testing. [6.5, 7.0, 2000, 2005] Updated 1-24-2006

*****

Suppose you have data in your table that represents the logical information of “Yes” and “No” and you want to give the results of a query to someone who isn’t working all day with computers. For such people, they may not know that a 1 is the logical representation of TRUE while a 0 represents FALSE. Sure, you can do this at the presentational layer. But what if someone comes to your desk, begging for immediate help? Here’s a little trick to make BITs (or any other 0 and 1 data) look a bit more intuitive:

CREATE TABLE MyBits  ( id INT IDENTITY(1,1) PRIMARY KEY  , bool BIT )  GO    INSERT INTO MyBits SELECT 0   UNION ALL  SELECT 1 GO SELECT  id  , bool  , SUBSTRING('YesNo', 4 - 3 * bool, 3) as YesNo FROM MyBits  GO  DROP TABLE MyBits    id          bool YesNo  ----------- ---- -----  1           0    No 2           1    Yes   (2 row(s) affected)

How does this work? The trick happens inside the SUBSTRING function. Precisely, when calculating the start value for the SUBSTRING. If our column “bool” contains a 0, the calculation looks like SUBSTRING(‘YesNo’, 4 – 3 * 0, 3), which resolves to SUBSTRING(‘YesNo’, 4, 3) and therefore, correctly returns ‘No’. We actually use here another feature of SUBSTRING. If the string is shorter than our requested length, SUBSTRING simply returns the shorter string without filling up the missing spaces. Finally, in case a 1 is in our “bool” column, the calculation goes like SUBSTRING(‘YesNo’, 4 – 3 * 1, 4), which is SUBSTRING(‘YesNo, 1, 3) and that is ‘Yes’. [7.0, 2000, 2005] Added 5-9-2005


Instead of using temporary tables, consider using a derived table instead. A derived table is the result of using a SELECT statement in the FROM clause of an existing SELECT statement. By using derived tables instead of temporary tables, you can reduce I/O and often boost your application’s performance. [7.0, 2000, 2005] Updated 6-12-2006

*****

Sometimes, it is handy to be able to perform some calculation on one or more columns of a record, and then take the result of that calculation and then add it to similar calculations performed on other related records to find a grand total.

For example, let’s say you want to find the total dollar cost of an invoice. An invoice will generally involve a header record and one or more detail records. Each detail record will represent a line item on the invoice. In order to calculate the total dollar cost of an invoice, based on two or more line items, you would need to multiply the quantity of each item sold times the price of each item. Then, you would need to add the total price of each line item together in order to get the total dollar cost of the entire invoice. To keep this example simple, let’s ignore things like discounts, taxes, shipping, etc.

One way to accomplish this task would be to use a cursor; like we see below (we are using the Northwind database for this example code):

DECLARE @LineTotal money –Declare variables
DECLARE @InvoiceTotal money
SET @LineTotal = 0 –Set variables to 0
SET @InvoiceTotal = 0

DECLARE Line_Item_Cursor CURSOR FOR –Declare the cursor

SELECT UnitPrice*Quantity –Multiply unit price times quantity ordered
FROM [order details]
WHERE orderid = 10248 –We are only concerned with invoice 10248

OPEN Line_Item_Cursor –Open the cursor
FETCH NEXT FROM Line_Item_Cursor INTO @LineTotal –Fetch next record
WHILE @@FETCH_STATUS = 0

BEGIN
SET @InvoiceTotal = @InvoiceTotal + @LineTotal –Summarize line items
FETCH NEXT FROM Line_Item_Cursor INTO @LineTotal
END

CLOSE Line_Item_Cursor –Close cursor
DEALLOCATE Line_Item_Cursor –Deallocate cursor
SELECT @InvoiceTotal InvoiceTotal –Display total value of invoice

The result for invoice number 10248 is $440.00.

What the cursor does is to select all of the line items for invoice number 10248, then multiply the quantity ordered times the price to get a line item total, and then it takes each of the line item totals for each record and then adds them all up in order to calculate the total dollar amount for the invoice.

This all works well, but the code is long and hard to read, and performance is not great because a cursor is used. Ideally, for best performance, we need to find another way to accomplish the same goal as above, but without using a cursor.

Instead of using a cursor, let’s rewrite the above code using set-based Transact-SQL instead of a cursor. Here’s what the code looks like:

DECLARE @InvoiceTotal money
SELECT @InvoiceTotal = sum(UnitPrice*Quantity)
FROM [order details]
WHERE orderid = 10248
SELECT @InvoiceTotal InvoiceTotal

The result for invoice number 10248 is $440.00.

Right away, it is obvious that this is a lot less code and that is it more readable. What may not be obvious is that it uses less server resources and performs faster. In our example–with few rows–the time difference is very small, but if many rows are involved, the time difference between the techniques can be substantial.

The secret here is to use the Transact-SQL “sum” function to summarize the line item totals for you, instead of relying on a cursor. You can use this same technique to help reduce your dependency on using resource-hogging cursors in much of your Transact-SQL code. [6.5, 7.0, 2000, 2005] Updated 6-12-2006

*****

While views are often convenient to use, especially for restricting users from seeing data they should not see, they aren’t always good for performance. So if database performance is your goal, avoid using views (SQL Server 2000/2005 Indexed Views are another story).

When the Query Optimizer gets a request to run a view, it runs it just as if you had run the view’s SELECT statement from the Query Analyzer or Management Studio. If fact, a view runs slightly slower than the same SELECT statement run from the Query Analyzer or Management Studio–but you probably would not notice the small difference–as it is small in simple examples like this.

Another issue with views is that they are often combined (nested) with other code, such as being embedded within another view, a stored procedure, or other T-SQL script. Doing so often makes it more difficult to identify potential performance issues.

Views don’t allow you to add more restrictive WHERE clauses as needed. In other words, they can’t accept input parameters, which are often needed to restrict the amount of records returned. I have seen lazy developers write generic views that return hundreds of thousands of unnecessary rows, and then user other code, such as client code, to filter only those few records that are needed. This is a great waste of SQL Server’s resources.

Instead of embedding SELECT statements in a view, put them in a stored procedure for optimum performance. Not only do you get an added performance boost (in many cases), you can also use the stored procedure to restrict user access to table columns, just as you can with views. [6.5, 7.0, 2000, 2005] Updated 6-12-2006


When using the WHILE statement, don’t avoid the use of BREAK just because some people consider it bad programming form. Often when creating Transact-SQL code using the WHILE statement, you can avoid using BREAK by moving a few lines of code around. If this works in your case, then by all means don’t use BREAK. But if your efforts to avoid using BREAK require you to add additional lines of code that makes your code run slower, then don’t do that. Sometimes, using BREAK can speed up the execution of your WHILE statements. [6.5, 7.0, 2000, 2005] Updated 6-12-2006

*****

One of the advantages of using SQL Server for n-tier applications is that you can offload much (if not most) of the data processing work from the other tiers and place it on SQL Server. The more work you can perform within SQL Server, the fewer the network roundtrips that need to be made between the various tiers and SQL Server. And generally the fewer the network roundtrips, the more scalable and faster the application becomes.

But in some applications, such as those than involve complex math, SQL Server has traditionally been weak. In these cases, complex math often could not be performed within SQL Server, instead it had to be performed on another tier, causing more network roundtrips than desired.

By using user-defined functions (UDFs), this is becoming less of a problem. UDFs allow developers to perform many complex math functions from within SQL Server, functions that previously could only be performed outside of SQL Server. By taking advantage of UDFs, more work can stay with SQL Server instead of being shuttled to another tier, reducing network roundtrips, and potentially boosting your application’s performance.

Obviously, boosting your application’s performance is not as simple as moving math functions to SQL Server, but it is one feature of SQL Server 2000/2005 that developers can take advantage of in order to boost their application’s scalability and performance. [2000, 2005] Updated 6-12-2006

*****

SQL Server 2000/2005 offers a data type called “table.” Its main purpose is for the temporary storage of a set of rows. A variable, of type “table,” behaves as if it is a local variable. And like local variables, it has a limited scope, which is within the batch, function, or stored procedure in which it was declared. In most cases, a table variable can be used like a normal table. SELECTs, INSERTs, UPDATEs, and DELETEs can all be made against a table variable.

For better performance, if you need a temporary table in your Transact-SQL code, consider using a table variable instead of creating a conventional temporary table. Table variables are often faster, but not always. In addition, table variables found in stored procedures result in fewer compilations (than when using temporary tables), and transactions using table variables only last as long as the duration of an update on the table variable, requiring less locking and logging resources. [2000, 2005] Updated 10-02-2006

*****

Don’t repeatedly reuse the same function to calculate the same result over and over within your Transact-SQL code. For example, if you need to reuse the value of the length of a string over and over within your code, perform the LEN function once on the string, and this assign the result to a variable, and then use this variable, over and over, as needed in your code. Don’t recalculate the same value over and over again by reusing the LEN function each time you need the value, as it wastes SQL Server resources and hurts performance. [6.5, 7.0, 2000, 2005] Updated 10-02-2006

*****

Many developers choose to use an identify column at their primary key. By design, an identity column does not guarantee that that each newly created row will be consecutively numbered. This means there will most likely be occasional gaps in the identity column numbering scheme. For most applications, occasional gaps in the identity column present no problems.

On the other hand, some developers don’t like these occasional gaps, trying to avoid them. With some clever use of INSTEAD OF triggers in SQL Server 2000,2005, it is possible prevent these numbering gaps. But at what cost?

The problem with trying to force an identify column to number consecutively without gaps can lead to locking and scalability problems, hurting performance. So the recommendation is not to try to get around the identify column’s built-in method of working. If you do, expect performance problems. [2000, 2005] Updated 10-02-2006

*****

If you use the BULK INSERT to import data into SQL Server, seriously consider using the TABLOCK hint along with it. This will prevent SQL Server from running out of locks during vary large imports, and also boost performance due to the reduction of lock contention. [7.0, 2000, 2005] Added 11-22-2004

*****

To help identify long running queries, use the SQL Server Profiler Create Trace Wizard to run the “TSQL By Duration” trace. You can specify the length of the long running queries you are trying to identify (such as over 1000 milliseconds), and then have these recorded in a log for you to investigate later. [7.0]

About these ads

2 Comments

  1. Posted April 24, 2013 at 12:58 pm | Permalink

    Aw, this was an extremely good post. Spending some time and actual effort to generate a good article… but what can I say… I
    put things off a whole lot and don’t seem to get anything done.


Post a Comment

Required fields are marked *
*
*

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: