
Troubleshooting Rapid Growth in Databases and Transaction Log Files in Exchange Server 2007 and 2010


A few years back, a very detailed blog post was released on Troubleshooting Exchange 2007 Store Log/Database growth issues.

We wanted to revisit this topic with Exchange 2010 in mind. While the troubleshooting steps needed are virtually the same, we thought it would be useful to condense the steps a bit, make a few updates and provide links to a few newer KB articles.

The below list of steps is a walkthrough of an approach that would likely be used when calling Microsoft Support for assistance with this issue. It also provides some insight as to what we are looking for and why. It is not a complete list of every possible troubleshooting step, as some causes are simply not seen quite as much as others.

Also note that these steps are most useful when we are seeing “rapid” or unexpected growth in the database file on disk or in the number of transaction logs being generated. An example of this is when an Administrator notices that a transaction log file drive is close to running out of space but had several GB free the day before. Looking through historical records, the Administrator notes that approximately 2 to 3 GB of logs have been backed up daily for several months, but the server is currently generating 2 to 3 GB of logs per hour. This is obviously a red flag for the log creation rate. The same principle applies to the database in scenarios where the rapid log growth is associated with new content creation.

In other cases, the database size or the number of transaction log files may increase, but only as a symptom of something else going on with the server. For example, if backups have been failing for a few days and the log files are not getting purged, the log file disk will start to fill up and appear to have more logs than usual. In this example the cause isn’t rapid log growth; rather, the backups responsible for purging the logs are failing and must be fixed. Another example is with the database: if retention settings have been modified or online maintenance has not been completing, the database will begin to grow on disk and eat up free space. These scenarios and a few others are also discussed in the “Proactive monitoring and mitigation efforts” section of the previously published blog.

It should be noted that in some cases, you may run into a scenario where the database size is expanding rapidly, but you do not see log growth at a rapid rate. (When rapid log growth is driven by new content creation, we would expect the database to grow along with the transaction logs.) This is often referred to as database “bloat” or database “space leak”. The steps to troubleshoot this specific issue can be a little more invasive, as you can see in some of the analysis steps listed here (taking databases offline, various kinds of dumps, etc.), and it may be better to engage support for assistance if a reason for the growth cannot be found.

Once you have established that the rate of growth for the database and transaction log files is abnormal, we would begin troubleshooting the issue by doing the following steps. Note that in some cases the steps can be done out of order, but the below provides general suggested guidance based on our experiences in support.

Step 1

Use Exchange User Monitor (Exmon) server side to determine if a specific user is causing the log growth problems.

  1. Sort on the CPU (%) column and look at the top 5 users that are consuming the most CPU inside the Store process. Then check the Log Bytes column for those users to see whether one of them accounts for the log growth.
  2. If that does not reveal a likely user, sort on the Log Bytes column to look for any users that could be contributing to the log growth.

If the user shown in Exmon is a “?”, this is representative of a HUB/Transport related problem generating the logs. Query the message tracking logs using the Message Tracking Log tool in the Exchange Management Console’s Toolbox to check for any large messages that might be running through the system. See Step 14 for a PowerShell script to accomplish the same task, or use the quick check below.
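
As a quick, hedged example, the following one-liner returns the ten largest messages recorded in the local server’s own tracking logs; the one-day window and the property list are illustrative assumptions, and the Step 14 script performs the same search more thoroughly across all Hub Transport servers:

# Sketch: ten largest messages in this server's message tracking logs over the last day
Get-MessageTrackingLog -ResultSize Unlimited -Start (Get-Date).AddDays(-1) |
    Sort-Object TotalBytes -Descending |
    Select-Object -First 10 Timestamp, EventId, Sender, MessageSubject, TotalBytes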

Step 2

With Exchange 2007 Service Pack 2 Update Rollup 2 and higher, you can use KB972705 to troubleshoot abnormal database or log growth by adding the registry values it describes. These registry values monitor RPC activity and log an event if the thresholds are exceeded, including details about the operation and the user that caused it. (These registry values are not currently available in Exchange Server 2010.)

Check for any excessive ExCDO warning events related to appointments in the application log on the server (examples are 8230 or 8264 events). If events for recurring meetings are found, then try to regenerate the calendar data server side via a process called POOF. See http://blogs.msdn.com/stephen_griffin/archive/2007/02/21/poof-your-calender-really.aspx for more information on what this is. A sketch for pulling these events with PowerShell follows the event examples below.

Event Type: Warning
Event Source: EXCDO
Event Category: General
Event ID: 8230
Description: An inconsistency was detected in username@domain.com: /Calendar/<calendar item> .EML. The calendar is being repaired. If other errors occur with this calendar, please view the calendar using Microsoft Outlook Web Access. If a problem persists, please recreate the calendar or the containing mailbox.

Event Type: Warning
Event ID : 8264
Category : General
Source : EXCDO
Type : Warning
Message : The recurring appointment expansion in mailbox <someone's address> has taken too long. The free/busy information for this calendar may be inaccurate. This may be the result of many very old recurring appointments. To correct this, please remove them or change their start date to a more recent date.

Important: If 8230 events are consistently seen on an Exchange server, have the affected user delete and recreate that appointment to remove any corruption.
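
To make checking for these events easier, here is a minimal sketch that pulls recent EXCDO 8230/8264 warnings from the Application log on the Mailbox server; the one-day window and CSV path are assumptions to adjust for your environment:

# Sketch: export recent EXCDO 8230/8264 warnings from the Application log for review
Get-EventLog -LogName Application -Source "EXCDO" -After (Get-Date).AddDays(-1) |
    Where-Object { $_.EventID -eq 8230 -or $_.EventID -eq 8264 } |
    Select-Object TimeGenerated, EventID, Message |
    Export-Csv C:\Temp\ExcdoEvents.csv -NoTypeInformation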

Step 3

Collect and parse the IIS log files from the Client Access servers used by the affected Mailbox server. You can use Log Parser Studio to easily parse the IIS log files and look for repeated sync attempts from a single user account and other suspicious activity. For example, a user with an abnormally high number of sync attempts and errors would be a red flag. If a user is found and suspected to be a cause of the growth, you can follow the suggestions given in steps 5 and 6.

Once Log Parser Studio is launched, you will see convenient tabs to search per protocol:

[Screenshot: Log Parser Studio showing per-protocol query tabs]

Some example queries for this issue would be:

[Screenshot: example Log Parser Studio queries for this issue]
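
If you prefer to script the search instead of using the Log Parser Studio GUI, here is a hedged sketch that drives the same Log Parser 2.2 engine from PowerShell to count ActiveSync requests per user and device. The install path, IIS log location, and the query itself are all assumptions to adapt to your environment:

# Sketch: count ActiveSync requests per user/device from the CAS IIS logs
# Both paths below are assumptions - adjust them for your server
$logParser = "C:\Program Files (x86)\Log Parser 2.2\LogParser.exe"
$iisLogs   = "C:\inetpub\logs\LogFiles\W3SVC1\*.log"

$query = @"
SELECT cs-username, cs(User-Agent), COUNT(*) AS Hits
FROM $iisLogs
WHERE cs-uri-stem LIKE '%Microsoft-Server-ActiveSync%'
GROUP BY cs-username, cs(User-Agent)
ORDER BY Hits DESC
"@

& $logParser -i:IISW3C -o:CSV $query > ActiveSyncHits.csv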

Step 4

If a suspected user is found via Exmon, the event logs, KB972705, or parsing the IIS log files, then do one of the following (a combined Set-CASMailbox sketch follows this list):

  • Disable MAPI access to the user’s mailbox using the following steps (Recommended):
    • Run

      Set-CASMailbox -Identity <Username> -MAPIEnabled $False

    • Move the mailbox to another Mailbox Store. Note: This is necessary to disconnect the user from the store due to the Store Mailbox and DSAccess caches. Otherwise you could potentially be waiting for over 2 hours and 15 minutes for this setting to take effect. Moving the mailbox effectively kills the user’s MAPI session to the server, and after the move, the user’s access to the store via a MAPI-enabled client will be disabled.
  • Disable the user’s AD account temporarily
  • Kill their TCP connection with TCPView
  • Call the user to have them close Outlook or turn off their mobile device for immediate relief
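
As a quick reference, here is a minimal sketch that disables the most common client entry points for a suspect mailbox in one pass; <Username> is a placeholder for the user you suspect, and remember that, as noted above, an existing cached MAPI session may not drop until the mailbox is moved:

# Sketch: cut off the most common client protocols for a suspect mailbox
Set-CASMailbox -Identity "<Username>" -MAPIEnabled $false `
    -ActiveSyncEnabled $false -OWAEnabled $false -ImapEnabled $false -PopEnabled $false

# Verify the change took effect
Get-CASMailbox -Identity "<Username>" |
    Format-List MAPIEnabled, ActiveSyncEnabled, OWAEnabled, ImapEnabled, PopEnabled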

Step 5

If closing the client or devices, or killing their sessions, seems to stop the log growth issue, then we need to do the following to see if this is OST or Outlook profile related:

Have the user launch Outlook while holding down the Ctrl key, which will prompt to ask whether you would like to run Outlook in safe mode. If launching Outlook in safe mode resolves the log growth issue, then concentrate on which add-ins could be contributing to this problem.

For a mobile device, consider a full resync or a new sync profile. Also check for any messages in the drafts folder or outbox on the device. A corrupted meeting or calendar entry is commonly found to be causing the issue with the device as well.

If you can gain access to the user’s machine, then do the following:

1. Launch Outlook to confirm the log file growth issue on the server.

2. If log growth is confirmed, do one of the following:

  • Check the user’s Outbox for any messages.
    • If the user is running in Cached mode, set the Outlook client to Work Offline. Doing this helps stop a stuck message from being sent from the Outbox and sometimes causes the message to NDR.
    • If the user is running in Online mode, then try moving the message to another folder to prevent Outlook or the Hub Transport server from processing it.
    • After each one of the steps above, check the Exchange server to see if log growth has ceased.
  • Call Microsoft Product Support to enable debug logging of the Outlook client to determine possible root cause.

3. Follow the Running Process Explorer instructions in the article below to dump out the DLLs that are loaded in the Outlook process. Name the file username.txt. This helps check for any third-party Outlook add-ins that may be causing the excessive log growth.
970920  Using Process Explorer to List dlls Running Under the Outlook.exe Process
http://support.microsoft.com/kb/970920

4. Check the Sync Issues folder for any errors that might be occurring.

Let’s attempt to narrow this down further to see if the problem is truly in the OST or something possibly Outlook Profile related:

  • Run ScanPST against the user’s OST file to check for possible corruption.
  • With the Outlook client shut down, rename the user’s OST file to something else and then launch Outlook to create a new OST file. If the problem does not occur, we know the problem is within the OST itself.

If the problem still occurs after renaming the OST, then recreate the user’s profile to see if the issue is profile related.

Step 6

Ask Questions:

  • Is the user using any other type of device besides a mobile device?
  • Question the end user if at all possible to understand what they might have been doing at the time the problem started occurring. It’s possible that the user imported a large amount of data from a PST file, which could cause log growth server side, or that some other erratic behavior resulted from a user action.

Step 7

Check to ensure File Level Antivirus exclusions are set correctly for both files and processes per http://technet.microsoft.com/en-us/library/bb332342(v=exchg.141).aspx

Step 8

If Exmon and the above methods do not provide the data necessary to find root cause, then collect a portion of the Store transaction log files (100 is a good start) during the problem period and parse them following the directions in http://blogs.msdn.com/scottos/archive/2007/11/07/remix-using-powershell-to-parse-ese-transaction-logs.aspx to look for possible patterns, such as high counts of IPM.Appointment. This gives you a high-level view of whether something is looping or messages are being sent at a high rate. Note: This tool may or may not provide any benefit depending on the data stored in the log files, but it will sometimes show MIME-encoded data that helps with your investigation. A sketch for gathering the logs to parse follows.
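
As a starting point, here is a minimal sketch for collecting the logs to parse; the source and destination paths are assumptions, so point them at the affected database’s log directory and any folder with enough free space:

# Sketch: copy the 100 most recently written transaction logs to a working folder
# so they can be parsed offline with the script from the blog post linked above
$logDir  = "F:\ExchangeLogs\DB01"   # assumption: the affected database's log path
$workDir = "D:\LogCopies"           # assumption: a folder with enough free space

New-Item -ItemType Directory -Path $workDir -Force | Out-Null
Get-ChildItem -Path $logDir -Filter *.log |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 100 |
    Copy-Item -Destination $workDir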

Step 9

If nothing is found by parsing the transaction log files, we can check for a rogue, corrupted, or large message in transit:

1. Check current queues against all HUB Transport Servers for stuck or queued messages:

get-exchangeserver | where {$_.IsHubTransportServer -eq "true"} | Get-Queue | where {$_.DeliveryType -eq "MapiDelivery"} | Select-Object Identity, NextHopDomain, Status, MessageCount | Export-Csv HubQueues.csv

Review the queues for any that are in retry or that have a lot of messages queued.

Export out message sizes in MB in all Hub Transport queues to see if any large messages are being sent through the queues:

get-exchangeserver | where {$_.ishubtransportserver -eq "true"} | get-message -ResultSize Unlimited | Select-Object Identity,Subject,Status,LastError,RetryCount,Queue,@{Name="Message Size MB";expression={$_.Size.ToMB()}} | Sort-Object -Property "Message Size MB" -Descending | Export-Csv HubMessages.csv

Export out message sizes in Bytes in all Hub Transport queues:

get-exchangeserver | where {$_.ishubtransportserver -eq "true"} | get-message -ResultSize Unlimited | Select-Object Identity,Subject,Status,LastError,RetryCount,Queue,Size | Sort-Object -Property Size -Descending | Export-Csv HubMessages.csv

2. Check users’ Outboxes for any large, looping, or stranded messages that might be affecting overall log growth.

Get-Mailbox -ResultSize Unlimited | Get-MailboxFolderStatistics -FolderScope Outbox | Sort-Object FolderSize -Descending | Select-Object Identity,Name,FolderType,ItemsInFolder,@{Name="FolderSize MB";expression={$_.FolderSize.ToMB()}} | Export-Csv OutboxItems.csv

Note: This does not get information for users that are running in cached mode.

Step 10

Utilize the MSExchangeIS Client\Jet Log Record Bytes/sec and MSExchangeIS Client\RPC Operations/sec Perfmon counters to see if there is a particular client protocol that may be generating excessive logs. If a particular protocol is found to be higher than the others for a sustained period of time, consider shutting down the service hosting that protocol. For example, if Outlook Web Access is the protocol generating the potential log growth, stop the World Wide Web Publishing Service (W3SVC) to confirm that log growth stops. If it does, collecting IIS logs from the CAS/MBX Exchange servers involved will help provide insight into what action the user was performing that caused this to occur.
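
If the server has PowerShell 2.0 or later available, a hedged sketch like the following can sample those counters for review; the exact counter set name and instance names vary by version, so list them first and adjust the paths if they differ:

# Sketch: discover the per-client Store counters, then sample them for five minutes
# Counter set spelling/instances can differ by Exchange version - verify with -ListSet first
Get-Counter -ListSet "MSExchangeIS Client*" | Select-Object -ExpandProperty Paths

Get-Counter -Counter "\MSExchangeIS Client(*)\Jet Log Record Bytes/sec",
                     "\MSExchangeIS Client(*)\RPC Operations/sec" `
            -SampleInterval 5 -MaxSamples 60 |
    Export-Counter -Path C:\Temp\StoreClientCounters.blg -FileFormat BLG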

Step 11

Run the following commands from the Exchange Management Shell to export current user operation rates:

To export to CSV File:

get-logonstatistics |select-object username,Windows2000account,identity,messagingoperationcount,otheroperationcount,
progressoperationcount,streamoperationcount,tableoperationcount,totaloperationcount | where {$_.totaloperationcount -gt 1000} | sort-object totaloperationcount -descending| export-csv LogonStats.csv

To view realtime data:

get-logonstatistics |select-object username,Windows2000account,identity,messagingoperationcount,otheroperationcount,
progressoperationcount,streamoperationcount,tableoperationcount,totaloperationcount | where {$_.totaloperationcount -gt 1000} | sort-object totaloperationcount -descending| ft

Key things to look for:

In the example below, the Administrator account was storming the testuser account with email. Notice that two users are active here: the first entry is the Administrator submitting all of the messages, and the second entry’s Windows2000Account references a HUB server with an Identity of testuser. The HUB server entry also has *no* UserName, which is a giveaway right there. This can give you a better understanding of which parties are involved in these high rates of operations.

UserName : Administrator
Windows2000Account : DOMAIN\Administrator
Identity : /o=First Organization/ou=First Administrative Group/cn=Recipients/cn=Administrator
MessagingOperationCount : 1724
OtherOperationCount : 384
ProgressOperationCount : 0
StreamOperationCount : 0
TableOperationCount : 576
TotalOperationCount : 2684

UserName :
Windows2000Account : DOMAIN\E12-HUB$
Identity : /o= First Organization/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=testuser
MessagingOperationCount : 630
OtherOperationCount : 361
ProgressOperationCount : 0
StreamOperationCount : 0
TableOperationCount : 0
TotalOperationCount : 1091

Step 12

Enable Perfmon/Perfwiz logging on the server. Collect data through the problem times and then review it for any irregular activity. You can find the Perfwiz data collection script for Exchange 2007/2010 here: http://blogs.technet.com/b/mikelag/archive/2010/07/09/exchange-2007-2010-performance-data-collection-script.aspx

Step 13

Run ExTRA (Exchange Troubleshooting Assistant) via the Toolbox in the Exchange Management Console to look for any functions (via FCL logging) that may be consuming excessive time within the Store process. This needs to be launched during the problem period. http://blogs.technet.com/mikelag/archive/2008/08/21/using-extra-to-find-long-running-transactions-inside-store.aspx shows how to use FCL logging only, but it is best to include Perfmon, Exmon, and FCL logging via this tool to capture the most data. The steps shown are valid for both Exchange 2007 and Exchange 2010.

Step 14

Export message tracking log data from the affected Mailbox server.

Method 1

Download the ExLogGrowthCollector script and place it on the Mailbox server that experienced the issue. Run ExLogGrowthCollector.ps1 from the Exchange Management Shell. Enter the name of the Mailbox server that you would like to trace and the start and end times, then click the Collect Logs button.

[Screenshot: ExLogGrowthCollector log collection form]

Note: This script exports all mail traffic to/from the specified Mailbox server across all HUB servers between the times specified. This helps provide insight into any large or looping messages that might have caused the log growth issue.

Method 2

Copy and paste the following into Notepad, save it as msgtrackexport.ps1, and then run it on the affected Mailbox server. Open the resulting MsgTrack.csv in Excel for review. This is similar to the GUI version, but requires entering the parameters manually.

#Export Tracking Log data from affected server specifying Start/End Times
Write-host "Script to export out Mailbox Tracking Log Information"
Write-Host "#####################################################"
Write-Host
$server = Read-Host "Enter Mailbox server Name"
$start = Read-host "Enter start date and time in the format of MM/DD/YYYY hh:mmAM"
$end = Read-host "Enter end date and time in the format of MM/DD/YYYY hh:mmPM"
$fqdn = $(get-exchangeserver $server).fqdn
Write-Host "Writing data out to csv file..... "
Get-ExchangeServer | where {$_.IsHubTransportServer -eq "True" -or $_.name -eq "$server"} | Get-MessageTrackingLog -ResultSize Unlimited -Start $start -End $end  | where {$_.ServerHostname -eq $server -or $_.clienthostname -eq $server -or $_.clienthostname -eq $fqdn} | sort-object totalbytes -Descending | export-csv MsgTrack.csv -NoType
Write-Host "Completed!! You can now open the MsgTrack.csv file in Excel for review"

Method 3

You can also use the Process Tracking Log Tool at http://blogs.technet.com/b/exchange/archive/2011/10/21/updated-process-tracking-log-ptl-tool-for-use-with-exchange-2007-and-exchange-2010.aspx to provide some very useful reports.

Step 15

Save off a copy of the application and system logs from the affected server and review them for any events that could be contributing to this problem.
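
On Windows Server 2008 and later you can grab these with wevtutil (the output paths below are assumptions); on Windows Server 2003, save them from Event Viewer instead:

# Sketch: save copies of the Application and System event logs for later review
wevtutil epl Application C:\Temp\Application.evtx
wevtutil epl System C:\Temp\System.evtx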

Step 16

Enable IIS extended logging on the CAS and Mailbox server roles to add the sc-bytes and cs-bytes fields; these fields help track large messages being sent via IIS protocols and also help identify usage patterns (Additional Details).
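
On an IIS 7.x Client Access server this can be scripted with the WebAdministration module; the sketch below is a hedged example (the site name and field list are assumptions, and on IIS 6 / Windows Server 2003 you would set the fields through IIS Manager instead):

# Sketch: add BytesSent (sc-bytes) and BytesRecv (cs-bytes) to the W3C log fields
# of the Default Web Site on an IIS 7.x CAS server. Site name is an assumption.
Import-Module WebAdministration

Set-WebConfigurationProperty -PSPath 'MACHINE/WEBROOT/APPHOST' `
    -Filter "system.applicationHost/sites/site[@name='Default Web Site']/logFile" `
    -Name 'logExtFileFlags' `
    -Value 'Date,Time,ClientIP,UserName,ServerIP,Method,UriStem,UriQuery,HttpStatus,Win32Status,BytesSent,BytesRecv,TimeTaken,UserAgent'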

Step 17

Get a process dump of the Store process during the time of the log growth. (Use this as a last measure once all prior activities have been exhausted, and prior to calling Microsoft for assistance. These issues are sometimes intermittent, and the quicker you can obtain data from the server, the better, as this will help provide Microsoft with information on what the underlying cause might be.)

  • Download the latest version of Procdump from http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx and extract it to a directory on the Exchange server
  • Open a command prompt and change into the directory to which Procdump was extracted in the previous step.
  • Type

    procdump -mp -s 120 -n 2 store.exe d:\DebugData

    This will dump the data to D:\DebugData. Change this to whatever directory has enough space to hold two full dumps of the store.exe process. Check how much memory store.exe is currently consuming in Task Manager for a rough estimate of the space needed for each dump.
    Important: If Procdump is being run against a store that is on a clustered server, make sure that you set the Exchange Information Store resource to not affect the group. If the entire store dump cannot be written out in 300 seconds, the cluster service will kill the store service, ruining any chance of collecting the appropriate data on the server.

Open a case with Microsoft Product Support Services to get this data looked at.

Most current related KB articles

2814847 - Rapid growth in transaction logs, CPU use, and memory consumption in Exchange Server 2010 when a user syncs a mailbox by using an iOS 6.1 or 6.1.1-based device

2621266 - An Exchange Server 2010 database store grows unexpectedly large

996191 - Troubleshooting Fast Growing Transaction Logs on Microsoft Exchange 2000 Server and Exchange Server 2003

Kevin Carker
(based on a blog post written by Mike Lagase)

