3 best ways Read and Update SharePoint Online Records over a million

Streamlining Your SharePoint Online Data Management: How to read and update SharePoint Online records quickly 3 ways

No comments

Loading

In this read and update SharePoint Online records article, we will learn the best way to read and update larger lists containing millions of records. Generally, we get the list view threshold error if we try to read list items over 5000. With this error, many people might think that we cannot store and manipulate more than 5000 items in a list or document library, but in reality, the SharePoint Online document library can store 30 million records per document library, but for optimum performance, Microsoft recommends syncing no more than 300,000 files across all document libraries.

Now let’s get into the actual agenda of this article.

Read and update SharePoint Online records (over a million)

Below are the recommended approaches that we have experienced.

Method 1: Using Power Automate get and update list items

To read more than 5000 items using the Power Automate Do Until loop, refer to our previous article: Best way to get more than 5000 items in Power Automate from SharePoint Online list.

To update more than 5000 items using the Power Automate Do Until loop, refer to our previous article: Best way to update SharePoint list with more than 5000 items using Power Automate.

Notes:

  • To get more than 5000 items, we need to use pagination in the Power Automate Get Items query. The pagination threshold limit is 100000(0.1 million).
  • Though the pagination upper limit is 0.1 million (100000), the query will not give us the 0.1 million records, it depends on the metadata and data size, Sometimes, it will bring only the 30000 to 40000 items. So, if you have the larger document library, we need to run it manually by configuring the new threshold limit value each time, which increases the overhead of running the flow.
  • In this approach, for each batch execution, you need to capture the last item ID from the previous Get Items query and pass it on to the next batch of Get Items queries until it gets completed.
  • This approach is slow, but it will work if you have time to manage flow manually until completion.
  • Generally, this will be the scheduler or manually triggered flow that is needed for some administrative maintenance or one-time execution.

The Get Items BadRequest Error

If your document library is large and contains huge amounts of data, we will get the below error when we try to pass the upper limit of the pagination threshold (0.1 million) in the Get Items query:

“BadRequest. The action ‘Get_items’ has an aggregated page results size of more than ‘233568356’ bytes. This exceeded the maximum size ‘209715200’ bytes allowed.”

Method 2: Using the PowerShell Script get and update more than 5000 items

To read more than 5000 items using the PowerShell script, refer to our previous article: SharePoint Online: Best way to get more than 5000 list items using PowerShell script.

To update more than 5000 items using the PowerShell script, refer to our previous article: Best 2 ways to Bulk update SharePoint list items using PowerShell in SharePoint Online.

Method 3 (Recommended): Get SharePoint list items using PnP PowerShell

Using the below PnP PowerShell, we can read over a million records from the SharePoint Online document library:


####The below script is used to read over millions of items or documents from SharePoint Online and update metadata dynamically.#############

cls

$PSshell = Get-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorVariable err -ErrorAction SilentlyContinue
if($PSshell -eq $null)
{
Add-PSSnapin "Microsoft.SharePoint.PowerShell"
}

$fileName = "GetandUpateOverMillionsItemsInSPOLog"

#'yyyyMMddhhmm yyyyMMdd
$enddate = (Get-Date).tostring("yyyyMMddhhmmss")
#$filename = $enddate + '_VMReport.doc'
$logFileName = $fileName +"_"+ $enddate+"_Log.txt"
$invocation = (Get-Variable MyInvocation).Value
$directoryPath = Split-Path $invocation.MyCommand.Path

$directoryPathForLog=$directoryPath+"\"+"LogFiles"
if(!(Test-Path -path $directoryPathForLog))
{
New-Item -ItemType directory -Path $directoryPathForLog
#Write-Host "Please Provide Proper Log Path" -ForegroundColor Red
}
#$logPath = $directoryPath + "\" + $logFileName

$logPath = $directoryPathForLog + "\" + $logFileName

$isLogFileCreated = $False

function Write-Log([string]$logMsg)
{
if(!$isLogFileCreated){
Write-Host "Creating Log File..."
if(!(Test-Path -path $directoryPath))
{
Write-Host "Please Provide Proper Log Path" -ForegroundColor Red
}
else
{
$script:isLogFileCreated = $True
Write-Host "Log File ($logFileName) Created..."
[string]$logMessage = [System.String]::Format("[$(Get-Date)] - {0}", $logMsg)
Add-Content -Path $logPath -Value $logMessage
}
}
else
{
[string]$logMessage = [System.String]::Format("[$(Get-Date)] - {0}", $logMsg)
Add-Content -Path $logPath -Value $logMessage
}
}

#Configuration parameters
$SiteURL = "https://globalsharepoint2020.sharepoint.com/sites/GSDRnD" #Here should be your site URL.
$ListOrDocumentLibraryName= "DocumentLibraryWithMoreThan5KItems" #Here it should be your document library or list name.

#Here you need to pass the the "InternalName" of the selected columns you want to retrieve.
$SelectedFieldsForTheQuery = @("ID","Title","YourColumn3","YourColumn4","YourColumn5") #Here you can pass all your desired columns internal names.
$CSVFilePath = "C:\Temp\YourQueriedListItems.csv" #This is the path where the exported CSV will be stored.
$ItemsDataCollectionFromDocLibraryOrListArray= @()

Try
{

#Connect to PnP Online
Connect-PnPOnline -Url $SiteURL -Credentials (Get-Credential)
$Counter = 0

#PageSize:The number of items to retrieve per page request, this number should be less than or equal to 5000, above 5000 this will throw list view threshold error.
$ListItemsCollection = Get-PnPListItem -List $ListOrDocumentLibraryName -Fields $SelectedFieldsForTheQuery -PageSize 2000

#Get all items from the configured list or document library.
$ListItemsCollection | ForEach-Object {
$OneListItem = Get-PnPProperty -ClientObject $_ -Property FieldValuesAsText
$OneListRow = New-Object PSObject
$Counter++
ForEach($OneField in $SelectedFieldsForTheQuery)
{
$OneListRow | Add-Member -MemberType NoteProperty $OneField $OneListItem[$OneField]
}
Write-Progress -PercentComplete ($Counter / $($ListItemsCollection.Count) * 100) -Activity "Exporting List Items..." -Status "Exporting Item $Counter of $($ListItemsCollection.Count)"
$ItemsDataCollectionFromDocLibraryOrListArray += $OneListRow
}
#Export the result Array to CSV file
$ItemsDataCollectionFromDocLibraryOrListArray | Export-CSV $CSVFilePath -NoTypeInformation

}
Catch
{

$ErrorMessage = $_.Exception.Message +"in while reading list items!:"
Write-Host $ErrorMessage -BackgroundColor Red
Write-Log $ErrorMessage
}



Method 4 (Recommended): Get SharePoint list items using PnP PowerShell dynamically

Using the below PnP PowerShell script we can get SharePoint list items dynamically:


#Dynamically get SharePoint List Item based on the condition

$ConfiguredContentTypeID="Your Content Type ID from the document library"

$ListItemsCollection | ForEach-Object {
$OneListItem = Get-PnPProperty -ClientObject $_ -Property FieldValuesAsText
$OneListRow = New-Object PSObject
$Counter++

$DynamicContentTypeIDFromDocumentLibrary=$OneListItem.FieldValues.ContentTypeId
$DynamicFileName=$OneListItem.FieldValues.FileRef
$DynamicFileNameDirPath=$OneListItem.FieldValues.FileDirRef
$DynamicDocumentStatusValue=$OneListItem.FieldValues.DocumentStatus
if ($DynamicDocumentStatusValue -eq "" -or $DynamicDocumentStatusValue -eq $null) -and ($DynamicContentTypeIDFromDocumentLibrary -eq $ConfiguredContentTypeID)
{

ForEach($OneField in $SelectedFieldsForTheQuery)
{
$OneListRow | Add-Member -MemberType NoteProperty $OneField $OneListItem[$OneField]
}
Write-Progress -PercentComplete ($Counter / $($ListItemsCollection.Count) * 100) -Activity "Exporting List Items..." -Status "Exporting Item $Counter of $($ListItemsCollection.Count)"
$ItemsDataCollectionFromDocLibraryOrListArray += $OneListRow

}

#Dynamically get SharePoint List Item based on the condition - ends here.

Method 5 (Recommended): Update SharePoint list items using PnP PowerShell dynamically

By injecting the below PnP PowerShell script into the above script inside the loop, we can update the SharePoint list items.


#Dynamically bulk update SharePoint list items based on the condition

$ConfiguredContentTypeID="Your Content Type ID from the document library"

$ListItemsCollection | ForEach-Object {
$OneListItem = Get-PnPProperty -ClientObject $_ -Property FieldValuesAsText
$OneListRow = New-Object PSObject
$Counter++

$DynamicContentTypeIDFromDocumentLibrary=$OneListItem.FieldValues.ContentTypeId
$DynamicFileName=$OneListItem.FieldValues.FileRef
$DynamicFileNameDirPath=$OneListItem.FieldValues.FileDirRef
$DynamicDocumentStatusValue=$OneListItem.FieldValues.DocumentStatus
if($DynamicDocumentStatusValue -eq "" -or $DynamicDocumentStatusValue -eq $null) -and ($DynamicContentTypeIDFromDocumentLibrary -eq $ConfiguredContentTypeID)
{

Set-PnPListItem -List $ListOrDocumentLibraryName -Identity $OneListItem.ID -Values @{"DocumentStatus" = "Approved"} -SystemUpdate

}

#Dynamically bulk update SharePoint list items based on the condition - ends here

 

Notes:

  • In the above script, we are updating the document status column value with “approved” if it is empty.
  • Here we have used the Set-PnPListItem command to update the column value and the “SystemUpdate” parameter to not update the modified date or modified by column.
  • In the above script, we are first getting all items where the document status column value is empty for certain content types and updating the document status column value.
  • We will get the ContentType ID, not the name, from Get-PnPProperty, so compare the actual content type ID from the site with the content type ID for each item level. The following example shows how to obtain the content type ID from the SharePoint Online site.

How to get Content Type ID from SharePoint list

To get the Content Type ID from a list or document library in SharePoint Online, we need to follow the below steps:

Step 1: Go to your document library or list settings page.

Document library settings page in SharePoint Online
Document library settings page in SharePoint Online

Step 2: Click on the content type for which you want to get the ID.

At the end of the URL bar, we could see “ctype=GUID”—this is the content type ID.

Get Content Type ID from SharePoint list
Get Content Type ID from SharePoint list

Now, you can pass your content type ID to the above scripts when you want to read document details from a given content type ID using the PnP PowerShell.

Conclusion: Which method should we opt for (Read and update millions of records in SharePoint Online)?

Deciding on the method mentioned above would be based on your requirements and needs. In general, if you are dealing with a larger list or library, we can follow the below guidelines in the decision-making process:

  • The Power Automate will not be suitable for the larger list; the Power Automate is designed for light-weight event-driven-based data. If you have time and patience, we recommend using the Power Automate scheduler flow or the manually trigger flow for the relatively smaller list or library (less than 0.1 million). We may encounter the timed-out error when running Power Automate for an extended period of time. We may also experience resource throttling.
  • If your list is relatively large, use PowerShell instead of Power Automate; it is faster. We may encounter the “connection timed out” error when running a long-running script.
  • If you have over a million records in your document library, we must go with the PnP PowerShell method. This method is the most efficient and faster than the other two methods.

Summary: Read and update SharePoint Online records (over a million)

Thus, in this article, we have learned the following with respect to getting and updating millions of records in SharePoint Online:

  • How to read and update SharePoint list items using the Power Automate flow
  • How to read and update SharePoint list items using the PowerShell script
  • How to read and update SharePoint list items using the PnP PowerShell script

See Also: SharePoint Online PowerShell Get List Items

You may also like the following SharePoint PowerShell tutorials:

Buy the premium version of SharePoint Online & Office 365 administration eBook from here:


Buy SharePoint Online & Office 365 Administration eBook

 

About Post Author

Do you have a better solution or question on this topic? Please leave a comment