SharePoint 2013 – DCS keeps crashing – distributedcacheservice.exe System.UriFormatException

Grr, what's up with the Distributed Cache not working out of the box?

This was happening on my newly upgraded SharePoint farm.  Every time I did an FBA login, the service would crash.  (Or so I thought: it turns out the login tries to save the identity to the token cache, but a failure there doesn't matter to the overall login process.)  I found some posts online about the client DLLs not being correct.  It seems they may have shipped DLLs with the same version number even though the code is different!  Thinking that was the problem, I stopped the service in Central Administration, then uninstalled the AppFabric 1.1 software from Uninstall Programs.  I then downloaded a fresh copy of it and the 1.1 Cumulative Update and installed both.  After that I started the service and it quit crashing.

I did notice that the reinstall didn't deploy the DLLs to the GAC, so the service runs, but when a web application tries to do anything with the cache, it fails because it can't find the DLLs (see below).  For example, you will find that you cannot do in-place site upgrades; you will get the following error:

Exception: Could not load file or assembly 'Microsoft.ApplicationServer.Caching.Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified.

This happens if OWSTIMER.exe can't find the AppFabric 1.1 DLLs (site upgrade runs in the timer service). You can add them to the GAC (the recommended way) or copy them to the BIN directory in the SharePoint root (where OWSTIMER.exe lives). To get them into the GAC, either use the /gac option of the installer on the command line, or use the gacutil.exe utility to load them from the C:\Program Files\AppFabric 1.1 for Windows Server folder.
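Here is a rough sketch of the gacutil.exe route. The gacutil.exe path is an assumption (it depends on which Windows SDK or Visual Studio version is installed), and the file filter is my guess at which DLLs matter based on the error above, so adjust both for your server.

```powershell
# Assumption: default AppFabric install folder, as named in this post.
$appFabricDir = 'C:\Program Files\AppFabric 1.1 for Windows Server'

# Assumption: gacutil.exe location varies by SDK version; point this at your copy.
$gacutil = 'C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools\gacutil.exe'

# Install each caching assembly found in the AppFabric folder into the GAC.
Get-ChildItem -Path $appFabricDir -Filter 'Microsoft.ApplicationServer.Caching.*.dll' |
    ForEach-Object {
        Write-Host "Installing $($_.Name) into the GAC"
        & $gacutil /i $_.FullName
    }

# OWSTIMER.exe only picks up the assemblies after a restart of the timer service.
Restart-Service -Name SPTimerV4
```

Run it from an elevated prompt, since writing to the GAC needs administrator rights.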

Ok, so it's up and running again, right?  No.  You will get the following error:

Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0009>:SubStatus<ES0001>:Cache referred to does not exist. Contact administrator or use the Cache administration tool to create a Cache. 

This is because a fresh install of AppFabric doesn't set up the default cache stores for SharePoint.  That is done in the pre-req installer process.  You need to run the following PowerShell command in a SharePoint Management Shell:

Add-SPDistributedCacheServiceInstance
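If you want to confirm the command did something, a check along these lines should work from the same SharePoint Management Shell. This is a sketch rather than a guaranteed recipe: Get-Cache can only answer while the cache host is actually running, and the 'Distributed Cache' type name is what I would expect to see on a SharePoint 2013 farm.

```powershell
# Show the Distributed Cache service instance(s) SharePoint knows about.
Get-SPServiceInstance |
    Where-Object { $_.TypeName -eq 'Distributed Cache' } |
    Select-Object Server, Status

# Connect the AppFabric cmdlets to the local cache cluster, then list the named caches.
# SharePoint's caches should show up here (for example, a logon token cache).
Use-CacheCluster
Get-Cache
```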

Once this is done, you will find that the distributed cache may start crashing again (back to where we started damn it):

Application: DistributedCacheService.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: Microsoft.ApplicationServer.Caching.DataCacheException

Stack:

at Microsoft.ApplicationServer.Caching.VelocityWindowsService.ThrowCallback(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()

You won't find anything helpful in the ULS logs, but if you take a deeper look into the event logs (Applications and Services Logs->Microsoft->Windows->Application Server-System Services->Microsoft-Windows-Application Server-System Services/Admin) you will find a detailed error:

ErrorCode<UnspecifiedErrorCode>:SubStatus<ES0001>:Invalid URI: The hostname could not be parsed.
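If you would rather not click through Event Viewer, something like the following should pull the same entries from PowerShell. It is a minimal sketch that assumes the log name matches the path given above; the event count of 50 is arbitrary.

```powershell
# Log name taken from the Event Viewer path above.
$logName = 'Microsoft-Windows-Application Server-System Services/Admin'

# Read the most recent events and keep only the errors, with the full message text.
Get-WinEvent -LogName $logName -MaxEvents 50 |
    Where-Object { $_.LevelDisplayName -eq 'Error' } |
    Format-List TimeCreated, Id, Message
```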

The config file for the distributed cache is here:

C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe.config

I made two changes to this file:

  1. Updated the account to the Farm account
  2. Changed the host name to the fully qualified DNS name

Then I realized that I had some entries in my hosts file for the local server name.  I took those out too.
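To see whether you are in the same situation, a quick check like this lists any active hosts file entries for the local machine and shows the fully qualified name that DNS returns. It is only a sketch, and it assumes the hosts file is in its standard Windows location.

```powershell
# Standard location of the Windows hosts file.
$hostsFile = Join-Path $env:SystemRoot 'System32\drivers\etc\hosts'

# List non-comment lines that mention this machine's name.
Select-String -Path $hostsFile -Pattern $env:COMPUTERNAME -SimpleMatch |
    Where-Object { $_.Line -notmatch '^\s*#' }

# Show the fully qualified name the machine resolves to.
[System.Net.Dns]::GetHostEntry($env:COMPUTERNAME).HostName
```

If the first command prints nothing, there are no active hosts file entries for the server name.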

I then switched back to the PowerShell window and ran the following:

Start-CacheHost -HostName <ComputerName> -CachePort 22233

After waiting a few seconds, I ran:

Get-CacheHost

You should see the service is in an "UP" state. 

HostName : CachePort  Service Name            Service Status Version Info
--------------------  ------------            -------------- ------------
blah:22233            AppFabricCachingService UP             3 [3,3][1,3]
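The start-and-check sequence can also be run in one go. In this sketch the host name is a made-up placeholder (use whatever name the cache host is registered under), the 10 second wait is arbitrary, and -HostName is the parameter name as I understand the AppFabric 1.1 cmdlets. Use-CacheCluster is included because the AppFabric cmdlets typically need a cluster connection before they will answer.

```powershell
# Hypothetical host name; replace with the name your cache host is registered under.
$cacheHost = 'server01.contoso.local'

# Connect this PowerShell session to the local cache cluster configuration.
Use-CacheCluster

# Start the cache host on the default SharePoint cache port.
Start-CacheHost -HostName $cacheHost -CachePort 22233

# Give the service a few seconds to come up, then list the host status.
Start-Sleep -Seconds 10
Get-CacheHost
```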

Once you have verified this, open a browser to a SharePoint team site.  You should see that you are now getting cache hits in the ULS logs:

11/25/2012 16:06:00.97  OWSTIMER.EXE (0x0510)                    0x08B0 SharePoint Server              General                        ahjne Verbose  Looking for a cached value matching cb69ce2d-a0dc-4771-9af6-9559c61e007c in the Profile Property Cache cache. 5cc3e49b-e920-00b0-677e-a7fd04a0711b
11/25/2012 16:06:00.97  OWSTIMER.EXE (0x0510)                    0x08B0 SharePoint Server              General                        ahjnh Verbose  Cache hit. 5cc3e49b-e920-00b0-677e-a7fd04a0711b
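You can also search the ULS logs for these entries from the SharePoint Management Shell instead of opening the log files. This is a sketch: the ten minute window and the message filter are my own choices, and since these are Verbose entries they will only be there if verbose logging is turned on for that category.

```powershell
# Look through the last ten minutes of ULS entries on this server for cache hits.
Get-SPLogEvent -StartTime (Get-Date).AddMinutes(-10) |
    Where-Object { $_.Message -like '*Cache hit*' } |
    Select-Object Timestamp, Process, Level, Message
```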

Chris