Read this document prior to applying the November 2017 CU to SharePoint 2013
One of the improvements in the November 14, 2017, update for SharePoint Foundation 2013 (KB4011256) is managing the [dbo].[TimerJobHistory] table in the [SharePoint_Config] database, which can experience significant growth if it is not maintained on a regular schedule. In the latest update (15.0.4981.1002), the “timer job will store latest 10,000 job records instead of 7 days job history in the TimerJobHistory table so that the records will not take too many disk space.”
In previous versions, the timer job, [Delete Job History], was scheduled in the SharePoint Central Administration tool to purge records older than 7 days by default on a daily basis, with the option to increase or decrease the timer job execution frequency. The timer job, [Delete Job History], invokes the [dbo].[proc_DeleteOldTimerJobHistory] stored procedure. If this timer job is not enabled or does not complete successfully, the [dbo].[TimerJobHistory] table can grow to hundreds of millions of records and hundreds of Gigabytes, which can impact performance and availability.
In the current version, a new stored procedure, [dbo].[proc_DeleteOldTimerJobHistoryByRows], is created to limit the record count of the [dbo].[TimerJobHistory] table to 10,000 records. Both stored procedures delete records from three tables: [dbo].[TimerJobHistory], [dbo].[TimerScheduledJobs] and [dbo].[TimerRunningJobs]. Of the three, the largest table is [dbo].[TimerJobHistory]. The [dbo].[proc_DeleteOldTimerJobHistory] stored procedure deletes records from the largest table based on records that are older than 7 days from when it is executed; whereas, the [dbo].[proc_DeleteOldTimerJobHistoryByRows] stored procedure deletes records from the largest table based on the record count that is greater than 10,000.
You can clearly see these changes if you take a look at your SharePoint_Config database. If you look at the stored procedures prior to Nov 2016 CU you will not find the [dbo].[proc_DeleteOldTimerJobHistoryByRows].
After you install the Nov 2017 CU and run PSConfig you will see this new procedure added to database.
You can also see this in the log files while running PSConfig.
12/07/2017 00:33:08.82 psconfigui (0x0E9C) 0x3B80 SharePoint Foundation Upgrade SPConfigurationDatabaseSequence ajyzd DEBUG Retrieving the definition for [proc_DeleteOldTimerJobHistoryByRows] from the database. 00000000-0000-0000-0000-000000000000
12/07/2017 00:33:08.82 psconfigui (0x0E9C) 0x3B80 SharePoint Foundation Upgrade SPConfigurationDatabaseSequence ajyzf DEBUG The definition for [proc_DeleteOldTimerJobHistoryByRows] does not exist in the database. 00000000-0000-0000-0000-000000000000
12/07/2017 00:33:08.82 psconfigui (0x0E9C) 0x3B80 SharePoint Foundation Upgrade SPConfigurationDatabaseSequence ajyzk DEBUG Creating stored procedure or function [proc_DeleteOldTimerJobHistoryByRows]. 00000000-0000-0000-0000-000000000000
This new enhancement eliminates the possibility of the [SharePoint_Config] database becoming bloated. By imposing a ceiling on the [dbo].[TimerJobHistory] record count, a number of benefits are recognized such as disk space consumption for the database and its backups, I/O activity, backup impact on network throughput as well as maintenance activities (i.e. CHECKDB, Index, etc).
The new logic is also built into the execution when timer jobs are executed. The process goes something like this.
- Execute one or more stored procs to complete intended task in timer job
- Execute the AddTimerJobHistory proc
- Execute the DeleteOldTimerJobHistoryByRows proc
You can clearly see this pattern on all timer jobs is your turn up ULS tracing to VerboseEx using Set-SPLogLevel.
So you maybe thinking awesome I never have to worry about TimeJobHistory table bloat again. And indeed, this will clearly keep your TimerJobHistory table in check BUT what happens if previously your Delete Job History timer job hasn’t been running successfully for quite some time.
This week while working with a customer they patched their SharePoint 2013 farm with Nov 2017 CU and immediately ran into a critical issue within their SharePoint environment. The SharePoint admin applied Nov 2017 CU patch to all the servers in the farm and then started PSConfig process on first server. After 15 minutes or so PSConfig failed with a time out exception. Their SQL DBA notified the SharePoint admin that their 64 gig SQL server CPU was pegged at 100%.
A colleague of mine a SQL Server PFE started troubleshooting SQL Server and immediately spotted the issue. The TimeJobHistory table had 190 million rows dating back to last year. Due to the large table size there were hundreds of the [dbo].[proc_DeleteOldTimerJobHistoryByRows] sessions running causing excessive CPU pressures and thread pool waits.
When looking at the ULS logs on the SharePoint servers there were numerous SQL Server timeouts and slow queries.
So how do you resolve this issue of an extremely large TimeJobHistory table after you installed Nov 2017 CU. You will see some scripts that tell you to adjust DaysToKeepProperty on Delete Job History property and slowly purge data from TimeJobHistory table. While some of these solutions may work prior to this Nov 2017 CU installation, it will no longer since its simply executing the Delete Timer History job. My recommendation is to open a support case with Microsoft and they will provide you a supported fix that will quickly get your timer job history table back in order.
So, BEFORE you go and install the Nov 2017 CU on your SharePoint Farm please do some house keeping and validate the Delete Timer Job History job has been running successfully. If you have a SharePoint Config database that is large in size, check the row count and size of [dbo].[TimerJobHistory] table. If you have millions of rows execute Delete Job History timer job adjusting the DaysToKeepHistory size to slowly purge this table. The other option is to contact MS Support. If you explain to engineer in detail that your [dbo].[TimerJobHistory] table has grown to an unmanageable size they can provide a fix for you to clean up the data.