Okay, so the title may be a little lame :-) , but I think it suits the basic analogy to describe indexing quite well. Today I was asked on the MSDN forums, how indexing really works, in SQL Server. I began type a lot of techno babble and then thought to myself, will the OP even understand what I am talking about? The answer is probably not. I deleted all my text and came up with a simple analogy that is easy for everyone to visualize. Essentially, indexing in SQL Server works like a phonebook. That’s right ladies and gentleman, a phonebook.
How is indexing like a phonebook? Imagine that you have a phonebook in front of you and you have a handful of bookmarks. If you want to find all the phone numbers where the person’s name begins with A, you will place a bookmark at the first page of that letter and on each subsequent page of that letter. If you want a more granular bookmark, you can create a composite key, that contains information about names, within the letter A. These other keys will provide density details that allow the optimizer to make better cardinality estimates, or in your case allow you to find the pages you are looking for in the phonebook more quickly. For example, you have a composite index on (A,Adam’ Haines). The first letter will allow you to quickly identify all letters that start with A and the second column will allow you to more quickly identify names that are equal to ‘Adam Haines’ . You cannot directly retrieve the phone number for ‘Adam Haines’ , without using the letter bookmark because ‘Adam Haines’ is contained within the letter A. Essentially, we have use one of our letter A bookmarks and then jump to ‘Adam Haines’. Think about it this way. If I asked you to find all the phone numbers where the person’s name started with the letter A, you would place bookmarks on each page where the name starts with the letter A. If I asked you to find all the phone numbers where the person name is ‘Adam Haines’, your bookmarks would be useless because you are looking for a specific name and not a letter. Without knowing the letter, we have to flip through or scan the pages to find the value. If you want to be able to search for a specific name, you have to place a bookmark on the page where the name exists. Let’s see this in action.
We will start by creating our table structure:
IF EXISTS(SELECT 1 FROM sys.tables WHERE NAME = 'PhoneBook')
DROP TABLE dbo.PhoneBook;
CREATE TABLE dbo.PhoneBook(
PRIMARY KEY (Letter,Name,PhoneNumber)
INSERT INTO dbo.PhoneBook VALUES ('A','Adam''s DB Consulting','555-555-5555');
INSERT INTO dbo.PhoneBook VALUES ('A','Alex BI Consulting','555-555-1234');
Now we can run a few simple queries. The first query will filter the data using a filter on the letter A. This query will yield an index seek because Letter is the first column of our index.
WHERE [Letter] = 'A' --Index seek
In the second query will use the same letter, but we will also include a specific name we want to find. The query plan is the same as the prior. On larger datasets, this query should be less costly because we are using a more granular predicate, so the optimizer should be able to find the row more quickly.
WHERE [Letter] = 'A' AND Name = 'Adam''s DB Consulting' --Index seek
In the third query we will try searching for just the specific name.
WHERE Name = 'Adam''s DB Consulting' --Clustered Index scan
Wait what the heck happened? The optimizer scanned the index even though the Name column is included in the index key!!!!! The answer is quite simple. As I stated earlier, the only column that is “seekable” is the first column in the index key. The other columns are used for density vectors, which help with cardinality estimates. In no way do the other columns help satisfy a predicate, where the first column is not present. Think back to where we put that bookmark. We put the bookmark on the letter A and the names within A, but we did not include the letter, in our search. How do you know where that name is specifically, without knowing the letter? The reality is we cant, so we have to scan. This is why we have to create an index on the name column.
--Create an index on Name to get an index seek
CREATE UNIQUE NONCLUSTERED INDEX ncl_idx_PhoneBook_PhoneNumber ON dbo.[PhoneBook](Name);
Now lets run the same query.
WHERE Name = 'Adam''s DB Consulting' --Nonclustered Index seek
I hope this analogy has made things a little clearer. Now this by no means all there is to know about indexing, but it is a good fundamental look at how they work. Let me know if you guys have anything to add, or anything I may have missed.