Friday, August 31, 2012

Introduction to JOINs – (Basic of JOINs, SQL SERVER)

Download SQL Script used throughout in this article to practice along. Now, take a quick look at the following two tables I have created.

INNER JOIN

This join returns rows when there is at least one match in both the tables.

OUTER JOIN

There are three different Outer Join methods.
LEFT OUTER JOIN
This join returns all the rows from the left table in conjunction with the matching rows from the right table. If there are no columns matching in the right table, it returns NULL values.
RIGHT OUTER JOIN
This join returns all the rows from the right table in conjunction with the matching rows from the left table. If there are no columns matching in the left table, it returns NULL values.
FULL OUTER JOIN
This join combines left outer join and right outer join. It returns row from either table when the conditions are met and returns null value when there is no match.

CROSS JOIN

This join is a Cartesian join that does not necessitate any condition to join. The resultset contains records that are multiplication of record number from both the tables.

Additional Notes related to JOIN:

The following are three classic examples to display where Outer Join is useful. You will notice several instances where developers write query as given below.
SELECT t1.*FROM Table1 t1WHERE t1.ID NOT IN (SELECT t2.ID FROM Table2 t2)GO
The query demonstrated above can be easily replaced by Outer Join. Indeed, replacing it by Outer Join is the best practice. The query that gives same result as above is displayed here using Outer Join and WHERE clause in join.
/* LEFT JOIN - WHERE NULL */SELECT t1.*,t2.*FROM Table1 t1LEFT JOIN Table2 t2 ON t1.ID t2.IDWHERE t2.ID IS NULL
The above example can also be created using Right Outer Join.
NOT INNER JOIN
Remember, the term Not Inner Join does not exist in database terminology. However, when full Outer Join is used along with WHERE condition, as explained in the above two examples, it will give you exclusive result to Inner Join. This join will give all the results that were not present in Inner Join.
You can download the complete SQL Script here, but for the sake of complicity I am including the same script here.
USE AdventureWorks
GO
CREATE TABLE table1(ID INTValue VARCHAR(10))INSERT INTO Table1 (IDValue)SELECT 1,'First'UNION ALLSELECT 2,'Second'UNION ALLSELECT 3,'Third'UNION ALLSELECT 4,'Fourth'UNION ALLSELECT 5,'Fifth'GOCREATE TABLE table2(ID INTValue VARCHAR(10))INSERT INTO Table2 (IDValue)SELECT 1,'First'UNION ALLSELECT 2,'Second'UNION ALLSELECT 3,'Third'UNION ALLSELECT 6,'Sixth'UNION ALLSELECT 7,'Seventh'UNION ALLSELECT 8,'Eighth'GOSELECT *FROM Table1SELECT *FROM Table2
GO
USE AdventureWorks
GO
/* INNER JOIN */SELECT t1.*,t2.*FROM Table1 t1INNER JOIN Table2 t2 ON t1.ID t2.ID
GO
/* LEFT JOIN */SELECT t1.*,t2.*FROM Table1 t1LEFT JOIN Table2 t2 ON t1.ID t2.ID
GO
/* RIGHT JOIN */SELECT t1.*,t2.*FROM Table1 t1RIGHT JOIN Table2 t2 ON t1.ID t2.ID
GO
/* OUTER JOIN */SELECT t1.*,t2.*FROM Table1 t1FULL OUTER JOIN Table2 t2 ON t1.ID t2.ID
GO
/* LEFT JOIN - WHERE NULL */SELECT t1.*,t2.*FROM Table1 t1LEFT JOIN Table2 t2 ON t1.ID t2.IDWHERE t2.ID IS NULLGO/* RIGHT JOIN - WHERE NULL */SELECT t1.*,t2.*FROM Table1 t1RIGHT JOIN Table2 t2 ON t1.ID t2.IDWHERE t1.ID IS NULLGO/* OUTER JOIN - WHERE NULL */SELECT t1.*,t2.*FROM Table1 t1FULL OUTER JOIN Table2 t2 ON t1.ID t2.IDWHERE t1.ID IS NULL OR t2.ID IS NULLGO/* CROSS JOIN */SELECT t1.*,t2.*FROM Table1 t1CROSS JOIN Table2 t2
GO
DROP TABLE table1DROP TABLE table2
GO
I hope this article fulfills its purpose. I would like to have feedback from my blog readers. Please suggest me where do you all want me to take this article next.

Thursday, August 30, 2012

Tricks to speed up your SQL query


We all knows that the SQL provides the basic functionality, in terms of what can be searched for or filtered by. 

But some more advanced functions may be obtained, if the user knows a few tricks. Here are given few very basic tricks which can help to speed up the executions of your SQL scripts

Introduction

We all knows that the SQL provides very basic functionality, in terms of what can be searched for or filtered by.
But some more advanced functions may be obtained, if the user knows a few tricks. Here are given few very basic tricks which can help to speed up the executions of your SQL scripts.

Objective

There are many online sites and books available to teach you the SQL scripts and optimization techniques. Few of them exclusively focus on tricks and few focus on finding solutions. My goal here is to provide few basic SQL tricks that can help to boost the query performance and streamline some sql writting guidelines.

Using the code

Given below are little known tips that you can use to ensure your Transact-SQL queries are performing in the most efficient manner possible.





1.  Avoid '*' in select query. 
   Restrict the queries result set by returning only the particular columns from the table and not all the table's columns. The      sql query becomes faster if you use the actual columns names in SELECT statement instead of than '*'.

2. Avoid  COUNT(*) in select statement to check the existence of records in table.
   Instead use IF EXISTS() to check records.

   - Write the query as:
   IF EXISTS (SELECT * FROM table_name WHERE column_name = ‘xxx’)

   - Instead of :
   SELECT COUNT(*) FROM table_name WHERE column_name = ‘xxx’
 
3.  Use alternate of SELECT COUNT(*).                                                                                                                                                     Use an alternative way instead of the SELECT COUNT(*) statement to count the number of records in table.
   - SELECT COUNT(*) statement makes a full table scan to return the total table's row count which can take an extremely long time for large tables.
     Use alternate query to achieve the same
     SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2

4. Use TOP keyword or the SET ROWCOUNT to fetch first Nth row from table.
   - TOP or SET ROWCOUNT clause restricts the number of result and returns the  smaller result set. This helps to reduce the data trafic between server and client. 

5. Use ORDER BY clause with Primary or Indexed column of table
   - Fetching the result set order by primary column or Indexed column added big performance benefits as SQL server don't have to perform the extra overheads to rearrange data. Also try to implement the ORDER BY clause on Integer column instead of VARCHAR or STRING column.

6. Avoid ORDER BY on multiple columns.
   - Implementing the order by clause on multiple column degrade the query performance as the SQL server has to run data sorting algorithm independently on each column or result set.

7. Use 'WHERE' instead of 'HAVING'
  - 'HAVING' clause is used to filter the rows after all the rows are selected. It is just like a filter who filter data from selected list.
  - 'WHERE' clause work along with select statement to select only respective rows Do not use HAVING clause for any other purposes. 

8. Avoid Mathematical expression on column.                                                                                                                                         Avoid mathematical expression on column in WHERE clause. We should avoid computation on columns as far as possible and hence we will get an index scan instead of a seek
   - For example : SELECT * FROM Orders WHERE OrderID*3 = 33000 
   degrade the performace as query performing calculation on column 

9. Minimize the number of subquery block in your query.
   The more number of sub query makes the execution plan complicated. Keep the script logic as simple as possible. 

10. Use 'LIKE' clause in query instead of SUBSTR() function.
     The SUBSTR() function first extract the value from data and then matching result get added in result set. LIKE clause has potential great advantage over SUBSTR() whenever string data need to compare                                                                                        - Write the query as:
   SELECT id, first_name, age FROM student_details WHERE first_name LIKE 'ABC%'

   - Instead of :
   SELECT id, first_name, age FROM student_details WHERE SUBSTR(first_name,1,3) = 'ABC';

11. Use 'BETWEEN' operator instead of >= and <= operators to select data in range.

12. Wisely use the EXISTS,IN clauses in sub query select statement.
   - IN has the slowest performance as data is filtered between the range.
   - IN is efficient when most of the filter criteria is in the sub-query.
   - EXISTS is efficient when most of the filter criteria is in the main query.

13. Avoid 'NOT IN' in select clause.                                                                                                                                                            Because when we use “NOT IN” in SQL queries, the query optimizer uses 'Nested table scan' technique to perform the activity

14. Use Stored Procedure, functions(UDF) and views instead of heavy-duty queries.
   - The application must first convert the binary value into a character string (which doubles its size, thus increasing network traffic and taking more time) before it can be sent to the server. And when the server receives the character string, it then has to convert it back to the binary format (taking even more time). This can reduce network traffic as your client will send to the server only stored procedures, function or view name (perhaps with some parameters) instead of large heavy-duty queries text. The use of SP, UDF or view also improves DB security as this can be used to facilitate permission management. You can restrict user access to table columns they should not see.

15. Use 'SET NOCOUNT ON' statement into your stored procedures or UDF.
    Bydefault the SQL server returns the number of affected rows for each query in SP or UDF functionality. If the SP or UDF involves lots of insert,update or delete statement (DML scripts) then this messeging mechanism will consume lots of I/O resource as for every query execution it send back the number or rows affected.
    'SET NOCOUNT ON' statement at the begining of SP or UDF body skips the messeging and never return how many rows altered through DML scripts.


16. Use constraints instead of triggers. 
    Constraints are logical conditions setup on table which allow/restrict the operation on table. Where as Triggers are special block of program (like Stored procedure) which get execute on certain action on table like insert,update, delete. Triggers and constraints both are used to maintain the integrity of database but constraints have more benefits over triggers as it uses very less resources.

17. Use table variable over temp table object.                                                                                                                                          Use table variables objects instead of temporary tables object. Table variable object declare with '@' symbol where as temp table object is declare with '#' sign as the suffix of table name.
    #tempTable -> Temp. table object
    @tempTable -> in-memory table variable
    A table variable is created in memory, and so performs slightly better than #temp tables. Also because there is even less locking and logging in a table variable.
    Table variables are automatically cleared when the procedure or function goes out of scope, so you don't have to remember to drop or clear the data. The SQL server mentain very less log related table variable and log activity is truncated immediately.
    while #temp table log activity persists until the log hits a checkpoint, is manually truncated, or when the server restarts

18. use UNION ALL statement instead of UNION, whenever possible
    The UNION ALL statement is much faster than UNION, because UNION ALL statement does not look for duplicate rows, while the UNION statement does look for duplicate rows, whether they exist or not 

19. Use DISTINCT only whenever it is neccessery to use.
    DISTINCT clause filter the result for unique combination of all columns in result rows which results in some performance degradation. You should use this clause only when it is absolutely necessary and should be used with minimum columns on table.   
20. Use Indexes on table
Database indexes are similar to those you can find in libraries. They allow the database to find the requested information faster, just like a library index will allow a reader to find what they’re looking for without loosing time.
An Index can be created on a single column or a combination of columns in a database table. A table index is a database structure that arranges the values of one or more columns in a database table in specific order.





Conclusion

There are many more performance improvement techniques in SQL script writing. But the given above are very generic techniques which can be adopted and implemented in our everyday scripts and can make a habit of writing optimized sql scripts