The workflow failed to start due to an internal error

Yeah, that error. Like an STD, it's the one you hope you don't get! There are many reasons this error can creep up on you. The most common is that the content types are corrupted and cannot be assigned to the task and history lists.

There are several blog posts that try to help here:

  • Uninstall and re-install the workflow features
    • In essence, this should be the fallback, since it should remove the content types and put them back. As I found out, it didn't work in my case
  • If you are building your own workflow, ensure that your DLLs have been loaded and that the assembly manifests match up
  • Make sure your InfoPath forms are in the right places and haven't been modified
    • Typically, the forms are stored at the site collection level under _Catalogs/wfpub
  • Make sure your web.config file has not been corrupted and that all the workflow entries exist (e.g., System.Workflow.ComponentModel.WorkflowCompiler)

So what was special in my case that none of these worked? The simple answer: an orphaned web. We were doing some migration work and started to move a very large web from one web application to another. We stopped the process halfway through, then deleted the failed web. Unfortunately, we didn't realize that this left an orphaned web behind. When a web is orphaned like this, the transaction that does the deleting doesn't complete all of its work, and you are left with residual data in the database.

How did I reach this conclusion? I tested workflows in another site collection in the same web application, and they worked (ruling out corruption from cumulative updates or damaged features). I tested the workflows in two different webs in the same site collection, and both failed. I tested an import on a different QA server and workflows still worked (but since I didn't exactly replicate what we did last time, stopping halfway through, it wasn't a reliable test).

I kept at it, re-firing the workflow, and saw that it would execute on the second or third attempt, but it wouldn't create any tasks. That led me to believe that something was wrong with the task list. I looked at the working task list and saw that the content types DID exist on the list, but in the broken site, they did not. I decided to manually add the content type via PowerShell and bingo… a helpful error ("Object reference not found"). Some searching turned up this post:

http://social.technet.microsoft.com/Forums/zh/sharepoint2010programming/thread/dc211298-75b4-4c1a-8c95-acf6d610ed6f

Each content type has to have at least one field. Looking at the content types, they didn't have any fields! How weird, right? That made me realize the content types were corrupt, so I needed to delete all the content types, and their parents, that didn't have fields. The chain started at Publishing Approval Workflow Task (en-US) -> SharePoint Server Workflow Task -> Workflow Task. Workflow Task sits directly under the system content type "Task"; if you lose "Task", you are screwed!
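To make the "delete the field-less chain" step concrete, here is a minimal Python sketch. It models the hierarchy as a plain in-memory dictionary (a hypothetical stand-in, not the SharePoint object model), walks up from the leaf content type, and returns the field-less types child first, since SharePoint generally refuses to delete a content type that still has children inheriting from it. It stops before the protected "Task" type and also at the first ancestor that still has fields; the field names in the example data are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ContentType:
    name: str
    parent: Optional[str]  # name of the parent content type; None for the root
    fields: List[str] = field(default_factory=list)


def deletion_order(types: Dict[str, ContentType], leaf: str, protected: Set[str]) -> List[str]:
    """Walk up from `leaf` and return the field-less content types, child first.

    Stops before any protected type (e.g. the system "Task" type) and at the
    first ancestor that still has fields, since that one is not corrupt.
    """
    order: List[str] = []
    current = types.get(leaf)
    while current is not None and current.name not in protected:
        if current.fields:
            break
        order.append(current.name)
        current = types.get(current.parent) if current.parent else None
    return order


def example_hierarchy() -> Dict[str, ContentType]:
    """Hypothetical snapshot of the broken chain; field names are illustrative."""
    chain = [
        ContentType("Item", None, ["Title"]),
        ContentType("Task", "Item", ["Title", "Status", "AssignedTo"]),
        ContentType("Workflow Task", "Task"),  # no fields: corrupt
        ContentType("SharePoint Server Workflow Task", "Workflow Task"),
        ContentType("Publishing Approval Workflow Task (en-US)", "SharePoint Server Workflow Task"),
    ]
    return {ct.name: ct for ct in chain}


if __name__ == "__main__":
    print(deletion_order(example_hierarchy(), "Publishing Approval Workflow Task (en-US)", {"Task"}))
```

On the example data this yields the two workflow-specific types first and "Workflow Task" last, which is the same order the deletions happened in below.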

Luckily, we didn't have many workflows, so I decided to delete all the workflow content types from every task list (which means the tasks have to be deleted too). I was able to delete the first two easily (by turning off their features), but when I went to delete "Workflow Task", it would not delete because it was part of a feature (which meant changing the bit column in the database for the content type)! I turned off the feature (695b6570-a48b-4a8e-8ea5-26ea7fc1d162), but it still would not delete because it was in use! So I turned to the handy SPContentTypeUsage class (as blogged here – http://sharepoint.mindsharpblogs.com/NancyB/archive/2011/01/17/Finding-children-the-easy-way-(Content-Type-children,-that-is).aspx), which let me find where the content type was being used. To my amazement… a deleted/orphaned web showed up.
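The usage check boils down to comparing where a content type is used against the webs that still exist. The Python sketch below illustrates that comparison with a hypothetical `Usage` record (it is not the shape of the real SPContentTypeUsage class) and a list of live web URLs; any usage left over points at a web the object model can no longer see. URLs are compared case-insensitively and without a trailing slash, which is an assumption made here for matching.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Usage:
    # Hypothetical record shape: the web and list where the content type is in use.
    web_url: str
    list_title: str


def _normalize(url: str) -> str:
    # Treat URLs as case-insensitive and ignore a trailing slash.
    return url.rstrip("/").lower()


def orphaned_usages(usages: Iterable[Usage], live_webs: Iterable[str]) -> List[Usage]:
    """Return the usages whose web is not among the webs that still exist."""
    live = {_normalize(w) for w in live_webs}
    return [u for u in usages if _normalize(u.web_url) not in live]


if __name__ == "__main__":
    found = orphaned_usages(
        [Usage("/sites/projects", "Tasks"), Usage("/sites/projects/Migrated/", "Workflow Tasks")],
        ["/sites/projects", "/sites/projects/qa"],
    )
    print(found)
```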

Once a web is deleted, you have no visibility into it through the object model. That meant a trip to the database tables was the next step. Remember the residual data I mentioned earlier? In our case, the ContentType and ContentTypeUsage tables had yucky residual rows in them. I had to manually remove the ContentTypeUsage records, then manually delete the ContentType row for the orphaned web.
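The order of that cleanup matters: usage rows first, then the content type row. The sketch below reproduces that order against a toy SQLite database. The table names come from this post, but the columns, the sample rows, and the foreign key are hypothetical, chosen only to make the ordering requirement explicit; this is not the real SharePoint content database schema. Both deletes run in one transaction, so a failure leaves nothing half-deleted.

```python
import sqlite3
from typing import Tuple


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory toy model; columns and the foreign key are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(
        """
        CREATE TABLE ContentType (
            Id    INTEGER PRIMARY KEY,
            Name  TEXT NOT NULL,
            WebId TEXT NOT NULL
        );
        CREATE TABLE ContentTypeUsage (
            ContentTypeId INTEGER NOT NULL REFERENCES ContentType (Id),
            WebId         TEXT NOT NULL
        );
        """
    )
    return conn


def purge_orphaned_web(conn: sqlite3.Connection, web_id: str) -> Tuple[int, int]:
    """Delete the web's usage rows, then its content type rows, in one transaction.

    Returns (usage_rows_deleted, content_type_rows_deleted).
    """
    with conn:  # commits if both deletes succeed, rolls back if either raises
        usages = conn.execute("DELETE FROM ContentTypeUsage WHERE WebId = ?", (web_id,)).rowcount
        types = conn.execute("DELETE FROM ContentType WHERE WebId = ?", (web_id,)).rowcount
    return usages, types


def seed(conn: sqlite3.Connection) -> None:
    """Hypothetical sample rows: one live web and one orphaned web."""
    with conn:
        conn.executemany(
            "INSERT INTO ContentType (Id, Name, WebId) VALUES (?, ?, ?)",
            [(1, "Workflow Task", "live-web"), (2, "Workflow Task", "orphaned-web")],
        )
        conn.executemany(
            "INSERT INTO ContentTypeUsage (ContentTypeId, WebId) VALUES (?, ?)",
            [(1, "live-web"), (2, "orphaned-web"), (2, "orphaned-web")],
        )


if __name__ == "__main__":
    db = build_toy_db()
    seed(db)
    print(purge_orphaned_web(db, "orphaned-web"))
```

Trying to delete the ContentType row first fails in this toy model because usage rows still reference it, which is the same reason the usage records had to go first.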

After that, I was able to remove the content type via the object model. I then reactivated the workflow features, and my problem was solved! All the content types now show their fields, and all workflows work as expected!

Enjoy,
Chris