How does the database column indexing work in SharePoint?

No comments

Introduction

Being a SharePoint developer, administrator and architect, we quite often discuss about SharePoint list items loading issue and when it comes to list items loading performance issue, we straight away go for indexing the columns to speed up the performance. So in this article I will focus on what exactly happens in the SQL database when we index a column.

How does the SharePoint store the list items?

Before we head on to indexing, we must know the SharePoint list item data structure. This will help to SharePoint architect while designing the application. The SharePoint database is a heavily database by the designed by Microsoft, it has a complex data structure in the back end. So being a developer or administrator we should never directly talk to the SharePoint database in fact not recommended by Microsoft, for any changes we must go with the SharePoint UI or PowerShell or by any other coding.

 

There is a single table, i.e. “AllUserData” in all content database which stores all SharePoint list items I mean this is a common table for all the lists in the site collection. There are lots of internal columns in the table than what the number columns we create in SharePoint list – so what we see in the list it just a few columns. For example: 64 nvarchar, 16 units, 12 floats and so on.

If we look at the Microsoft documentation on the “AllUserData” table structure, we will be surprised to see how complex it is. Please click here to know more about “AllUserData” table structure.

AllUserTable

As per me this is one of the major reasons of being slowness. Had Microsoft been designed this way like – for each list/library there should be a dedicated table, then the speed might have been better. Anyway, we need to move on to whatever we have.

 

 

What is indexing in database table?

A database index is a data structure that improves the speed of data retrieval operations from a database table and it works based on key value pair technique – the core binary search and linear search algorithms are built on the indexing technique. In other words, we can say – an index is a copy of selected columns of data from an original table that can be searched very efficiently, which establish a direct link to the complete row of data from where it was copied from. Again, there are various types of indexing in database like cluster and non cluster, etc. I have not been going into that since the purpose of this article is different. In a short note – indexing is a technique using which we can get the fastest query result from database table. Indexing costs us additional writes and storage space in the file system.

To make it more realistic, we can give the example of the indexing in the Book. Here, the main content inside the book is the main data table and indexed pages which we generally see in the beginning of the each book where it mentions which content or chapter of the book is located in which page, this is just equivalent to hash table in the database. Using the indexed page number – we can very quickly navigate to a specific chapter or page. Exactly indexing in database table also works in a similar way.

Indexing Structure3

 

 

Usually without indexing how does the data get fetched from a SQL table?

Let’s write the below SQL query – this displays all purchase order details from “Purchase Order” table where “OrderName” is ‘PO001_Laptop‘. If there are 10,000 items (rows) in the SQL table, this query will traverse thru the 10,000 rows one by one to check the below condition, even though the “PO001_Laptop” is not available in many of the rows but still this query will scan thru all the rows.

Then, finally it displays the result which is a time consuming process and hits the query performance. In this technique we cannot skip even one row, the query has to scan thru right from the first row to last row. Imagine if we have millions of records in the database table how long would it take to complete this query?


SELECT * FROM PurchaseOrder
WHERE OrderName = 'PO001_Laptop'

For the same query – if we create index on “OrderName” column – as per the above diagram this will create an entry in indexing table (hash table) and pointer will map to the original data row, whenever any query comes to SQL, first this will check the indexing table (hash table) and directly will fetch the matching items from the original data source table and will skip the checking for rest all unwanted rows where items are not matched. This is how indexing improves the performance of SQL queries.

How does indexing column work in SharePoint?

The list items are stored in the “AllUserData” table in the SQL. For every defined indexed column the SharePoint stores the index value for every list item in a separate table, i.e. “NameValuePair ” table which we have seen in the above.  Let’s say we have 20,0000 items in the list, which means that we have 20,000 rows in “AllUserData” and 20,000 additional rows in the “NameValuePair” table (used for indexing).

So the SharePoint list items view requests are served based on the key value in the “NameValuePair” table and pick the appropriate mapping items from the main “AllUserData” table which is the actual data source. This is how the SharePoint indexing column speeds up the query performance.

2.1

Column Indexing is recommended in SQL table for the sake of better performance?

So far we have learned what is indexing and how indexed column improve the performance of the query. Now we need to see the the other side of the coin as well – I mean is column indexing recommended?

There is no straight answer for this – it all depends on the needs and the way you operate your database table. As we have learned that the column indexing costs us additional writes and storage space, so if our application requires more insert/update operation, we might want to use tables without indexes, but if it requires more data retrieval operations, we should go for indexing table.

 

 

Why SharePoint is limited to have 20 indexed column in a list?

There might be other reasons, but this is my analysis. As we have seen that the drawback of indexed column is that – the indexed column indices require additional space on the disk as the indices are stored together in a table using the MyISAM engine, this file may rapidly reach to the threshold limits of the given file system means will exhaust all disk space when many columns from the same list are indexed. So we should try to minimize the indexed column number as much as possible.

What are the SharePoint columns types are supported/unsupported for indexing?

SUPPORTED COLUMN TYPES

  • Title (but not in a document library)
  • Single line of text
  • Choice field (but not multi choice)
  • Number
  • Currency
  • Date/ Time
  • Lookup (but not multi value)
  • Person or Group (but not multi value)

UNSUPPORTED COLUMN TYPES

  • Multiple lines of text
  • Hyperlink/Picture
  • Custom Field Types
  • Calculated Field
  • Boolean (yes/no)
  • UI version
  • Checked out to
  • Content type ID

Summary – what we had here?

Thus, in this article we have learned that what exactly the column indexing is in the database and how it improves the performance of SQL queries or SharePoint list view queries.