23,560 total views, 36 views today
Being a SharePoint developer, administrator, and architect, we quite often discuss the SharePoint list items loading issue and when it comes to SharePoint “list” items loading performance issue, we straight away go for indexing the columns to speed up the performance. So in this article, I will focus on what exactly happens in the SQL database when we index a column.
How does the SharePoint store the list items?
Before we head on to indexing, we must know the SharePoint list item data structure. This will help the SharePoint architect while designing the application. The SharePoint database is a heavy database designed by Microsoft, it has a complex data structure in the back end. So being a developer or administrator we should never directly talk to the SharePoint database in fact not recommended by Microsoft, for any changes we must go with the SharePoint UI or PowerShell or by any other coding.
There is a single table, i.e. “AllUserData” in all content databases which stores all SharePoint list items I mean this is a common table for all the lists in the site collection. There are lots of internal columns in the table than what the number columns we create in the SharePoint list – so what we see in the list it just a few columns. For example 64 nvarchar, 16 units, 12 floats, and so on.
If we look at the Microsoft documentation on the “AllUserData” table structure, we will be surprised to see how complex it is. Please click here to know more about “AllUserData” table structure.
As per me, this is one of the major reasons for being slow. Had Microsoft been designed this way like – for each list/library there should be a dedicated table, then the speed might have been better. Anyway, we need to move on to whatever we have.
What is indexing in the database table?
A database index is a data structure that improves the speed of data retrieval operations from a database table and it works based on key-value pair technique – the core binary search and linear search algorithms are built on the indexing technique. In other words, we can say – an index is a copy of selected columns of data from an original table that can be searched very efficiently, which establishes a direct link to the complete row of data from where it was copied from. Again, there are various types of indexing in the database like cluster and non-cluster, etc. I have not been going into that since the purpose of this article is different. In a short note – indexing is a technique using which we can get the fastest query result from the database table. Indexing costs us additional writes and storage space in the file system.
To make it more realistic, we can give an example of the indexing in the Book. Here, the main content inside the book is the main data table and indexed pages which we generally see at the beginning of each book where it mentions which content or chapter of the book is located on which page, this is just equivalent to hash table in the database. Using the indexed page number – we can very quickly navigate to a specific chapter or page. Exactly indexing in the database table also works in a similar way.
Usually without indexing how does the data get fetched from a SQL table?
Let’s write the below SQL query – this displays all purchase order details from the “Purchase Order” table where “OrderName” is ‘PO001_Laptop‘. If there are 10,000 items (rows) in the SQL table, this query will traverse thru the 10,000 rows one by one to check the below condition, even though the “PO001_Laptop” is not available in many of the rows but still, this query will scan thru all the rows.
Then, finally, it displays the result which is a time-consuming process and hits the query performance. In this technique we cannot skip even one row, the query has to scan thru right from the first row to last row. Imagine if we have millions of records in the database table how long would it take to complete this query?
SELECT * FROM PurchaseOrder
WHERE OrderName = 'PO001_Laptop'
For the same query – if we create an index on “OrderName” column – as per the above diagram this will create an entry in indexing table (hash table) and the pointer will map to the original data row, whenever any query comes to SQL, first this will check the indexing table (hash table) and directly will fetch the matching items from the original data source table and will skip the checking for rest all unwanted rows where items are not matched. This is how indexing improves the performance of SQL queries.
How does indexing column work in SharePoint?
The list items are stored in the “AllUserData” table in the SQL. For every defined indexed column the SharePoint stores the index value for every list item in a separate table, i.e. “NameValuePair ” table which we have seen in the above. Let’s say we have 20,0000 items in the list, which means that we have 20,000 rows in “AllUserData” and 20,000 additional rows in the “NameValuePair” table (used for indexing).
So the SharePoint list items view requests are served based on the key value in the “NameValuePair” table and pick the appropriate mapping items from the main “AllUserData” table which is the actual data source. This is how the SharePoint indexing column speeds up the query performance.
Column Indexing is recommended in the SQL table for the sake of better performance?
So far we have learned what is indexing and how indexed column improves the performance of the query. Now we need to see the other side of the coin as well – I mean is column indexing recommended?
There is no straight answer for this – it all depends on the needs and the way you operate your database table. As we have learned that the column indexing costs us additional writes and storage space, so if our application requires more insert/update operation, we might want to use tables without indexes, but if it requires more data retrieval operations, we should go for indexing table.
Why SharePoint is limited to have 20 indexed columns in a list?
There might be other reasons, but this is my analysis. As we have seen that the drawback of an indexed column is that – the indexed column indices require additional space on the disk as the indices are stored together in a table using the MyISAM engine, this file may rapidly reach to the threshold limits of the given file system means will exhaust all disk space when many columns from the same list are indexed. So we should try to minimize the indexed column number as much as possible.
What are the SharePoint columns types that are supported/unsupported for indexing?
SUPPORTED COLUMN TYPES
- Title (but not in a document library)
- Single line of text
- Choice field (but not multi choice)
- Date/ Time
- Lookup (but not multi value)
- Person or Group (but not multi value)
UNSUPPORTED COLUMN TYPES
- Multiple lines of text
- Custom Field Types
- Calculated Field
- Boolean (yes/no)
- UI version
- Checked out to
- Content type ID
Summary – what we had here?
Thus, in this article, we have learned that what exactly the column indexing is in the database and how it improves the performance of SQL queries or SharePoint list view queries.
- SharePoint: Show item modified in the past 14 days in list web part
- List view threshold error(5000 items issue) in SharePoint On-Premise/SharePoint Online
- How to create indexed column in SharePoint on-premise and online – O365?
- Office 365: Getting started with SharePoint PnP PowerShell – installation
- In 2 steps convert a classic SharePoint page to modern using PnP
- Office 365: Retrieve hub sites and associated sites using PnP Powershell
- Create a modern team site using PnP PowerShell in SharePoint
- In 4 steps access SharePoint online data using postman tool
- SharePoint admin center: Learn SharePoint online administration in an hour – step by step
- SharePoint REST API: GET vs POST vs PUT vs DELETE vs PATCH
- Office 365: Understanding the hub site in SharePoint online