Extending the Ceres Engine with custom flows and operators

So what the heck does that title mean?  For those of you who are not familiar with Search (which is the majority of you out there), the actual engine is called "Ceres", as in the dwarf planet in our solar system (Wikipedia).  Keeping with the universe theme, the nodes of the search engine together form a "constellation".  If you take a minute, you will find several references to Constellation in the various classes inside the assemblies.  But enough about the universe, what about extending the Ceres engine?

When it comes to search, many of you are already familiar with the various node types that make up the system.  These include:

  • Admin
  • Content Processing
  • Query
  • Indexing
  • Analytics

But that's the easy part, and so are the architecture design aspects!  This post will take you down a rabbit hole that you may never come out of!  For the purpose of this post, we are interested in the Content Processing component AND the Query component.  If you dive into the core of the Content Processing component, you will find that it is made up of a series of flows.  You can find the registered flows in the "C:\Program Files\Microsoft Office Servers\15.0\Search\Resources\Bundles" directory.  I will describe what these DLLs are and how they get generated later in the post.  Here is the full list (in the future I will update this post with the purpose of each of these flows):

  • Microsoft.ContentAlignmentFlow
  • Microsoft.CustomDictionaryDeployment
  • Microsoft.ThesaurusDeployment
  • Microsoft.CXDDeploymentCaseInSensitive
  • Microsoft.CXDDeploymentCaseSensitive
  • Microsoft.PeopleAnalyticsOutputFlow
  • Microsoft.PeopleAnalyticsFeederFlow
  • Microsoft.ProductivitySearchFlow
  • Microsoft.SearchAnalyticsFeederFlow
  • Microsoft.SearchAnalyticsInputFlow
  • Microsoft.SearchAnalyticsOutputFlow
  • Microsoft.SearchAuthorityInputFlow
  • Microsoft.SearchClicksAnalysisInputFlow
  • Microsoft.SearchDemotedInputFlow
  • Microsoft.SearchReportsAnalysisInputFlow
  • Microsoft.UsageAnalyticsFeederFlow
  • Microsoft.UsageAnalyticsReportingAPIDumperFlow
  • Microsoft.UsageAnalyticsUpdateFlow
  • Microsoft.CrawlerFlow
  • Microsoft.CrawlerAcronymExtractionSubFlow
  • Microsoft.CrawlerAlertsDataGenerationSubFlow
  • Microsoft.CrawlerAliasNormalizationSubFlow
  • Microsoft.CrawlerComputeFileTypeSubFlow
  • Microsoft.CrawlerCCAMetadataGenerationSubFlow
  • Microsoft.CrawlerContentEnrichmentSubFlow
  • Microsoft.CrawlerDefinitionClassificationSubFlow
  • Microsoft.CrawlerDocumentSignatureGenerationSubFlow
  • Microsoft.CrawlerDocumentSummaryGenerationSubFlow
  • Microsoft.CrawlerHowToClassificationSubFlow
  • Microsoft.CrawlerLanguageDetectorSubFlow
  • Microsoft.CrawlerLinkDeleteSubFlow
  • Microsoft.CrawlerNoIndexSubFlow
  • Microsoft.CrawlerPhoneNumberNormalizationSubFlow
  • Microsoft.CrawlerSearchAnalyticsSubFlow
  • Microsoft.CrawlerTermExtractorSubFlow
  • Microsoft.CrawlerWordBreakerSubFlow
  • Microsoft.SharePointSearchProviderFlow
  • Microsoft.PeopleExpertiseSubFlow
  • Microsoft.PeopleFuzzyNameMatchingSubFlow
  • Microsoft.PeopleKeywordParsingSubFlow
  • Microsoft.PeopleLinguisticsSubFlow
  • Microsoft.PeopleResultRetrievalAndProcessingSubFlow
  • Microsoft.PeopleSearchFlow
  • Microsoft.PeopleSecuritySubFlow
  • Microsoft.OpenSearchProviderFlow
  • Microsoft.ExchangeSearchProviderFlow
  • Microsoft.DocParsingSubFlow
  • Microsoft.MetadataExtractorSubFlow
  • Microsoft.AcronymDefinitionProviderFlow
  • Microsoft.BestBetProviderFlow
  • Microsoft.QueryClassificationDictionaryCompilationFlow
  • Microsoft.RemoteSharepointFlow
  • Microsoft.PersonalFavoritesProviderFlow
  • Microsoft.QueryRuleConditionMatchingSubFlow
  • Microsoft.CrawlerDocumentRetrievalSubFlow
  • Microsoft.CrawlerIndexingSubFlow
  • Microsoft.CrawlerPropertyMappingSubFlow
  • Microsoft.CrawlerSecurityInsertSubFlow
  • Microsoft.OOTBEntityExtractionSubFlow
  • Microsoft.CustomEntityExtractionSubFlow

The most important flow is Microsoft.CrawlerFlow.  This is the master flow, and it defines the order in which all the other flows are executed.  A flow is simply an XML document that defines the flows and operators that should be executed on an item processed by the engine.  The XML makes up an OperatorGraph.  Each Operator has a name and a type attribute.  The type attribute is made up of the namespace of the class that contains the operator's code, followed by the name property of a special attribute added to that class.  Each operator is deserialized into an instance of its class as the flow is "parsed".  As you review the XML, you should see that the order of execution is determined by the "operatorMoniker" element, which holds the name of the next operator to execute.  The first part of this file looks like the following:

 

<?xml version="1.0" encoding="utf-8"?>
<OperatorGraph dslVersion="1.0.0.0" name="" xmlns="http://schemas.microsoft.com/ceres/studio/2009/10/flow">
  <Operators>

    <Operator name="FlowInput" type="Microsoft.Ceres.Evaluation.Operators.Core.Input">
      <Targets>
        <Target breakpointEnabled="false">
          <operatorMoniker name="//Init" />
        </Target>
      </Targets>
      <Properties>
        <Property name="inputName" value="&quot;CSS&quot;" />
        <Property name="useDisk" value="False" />
        <Property name="sortedPrefix" value="0" />
        <Property name="updatePerfomanceCounters" value="True" />
      </Properties>
      <OutputSchema>
        <Field name="content" type="Bucket" />
        <Field name="id" type="String" />
        <Field name="source" type="String" />
        <Field name="data" type="Blob" />
        <Field name="getpath" type="String" />
        <Field name="encoding" type="String" />
        <Field name="collection" type="String" />
        <Field name="operationCode" type="String" />
      </OutputSchema>
    </Operator>

    <Operator name="Init" type="Microsoft.Ceres.ContentEngine.Operators.BuiltIn.Mapper">
      <Targets>
        <Target breakpointEnabled="false">
          <operatorMoniker name="//Operation Router" />
        </Target>
      </Targets>
      <Properties>
        <Property name="expressions" value="{&quot;externalId&quot;=&quot;ToInt64(Substring(id, 7))&quot;}"/>
        <Property name="fieldsToRemove" />
        <Property name="adaptableType" value="True" />
      </Properties>
      <OutputSchema>
        <Field name="tenantId" type="Guid" canBeNull="true" expression="IfThenElse(BucketHasField(content, &quot;012357BD-1113-171D-1F25-292BB0B0B0B0:#104&quot;), ToGuidFromObject(GetFieldFromBucket(content, &quot;012357BD-1113-171D-1F25-292BB0B0B0B0:#104&quot;)), ToGuid(&quot;0C37852B-34D0-418E-91C6-2AC25AF4BE5B&quot;))" />
        <Field name="isdir" type="Boolean" canBeNull="true" expression="NullValue(ToBoolean(GetFieldFromBucket(content, &quot;isdirectory&quot;)),false)" />
        <Field name="noindex" type="Boolean" canBeNull="true" expression="NullValue(ToBoolean(GetFieldFromBucket(content, &quot;noindex&quot;)),false)" />
        <Field name="oldnoindex" type="Boolean" canBeNull="true" expression="NullValue(ToBoolean(GetFieldFromBucket(content, &quot;oldnoindex&quot;)),false)" />
        <Field name="getpath" type="String" expression="IfThenElse(BucketHasField(content, &quot;path_1&quot;), GetStringFromBucket(content, &quot;path_1&quot;), GetStringFromBucket(content, &quot;path&quot;))" />
        <Field name="extrapath" type="String" expression="GetStringFromBucket(content, &quot;path_1&quot;)" />
        <Field name="size" type="Int32" expression="TryToInt32(GetFieldFromBucket(content, &quot;size&quot;))" />
        <Field name="docaclms" type="Blob" expression="GetFieldFromBucket(content, &quot;docaclms&quot;)" />
        <Field name="docaclsp" type="Blob" expression="GetFieldFromBucket(content, &quot;spacl&quot;)" />
        <Field name="docaclmeta" type="String" expression="IfThenElse(BucketHasField(content, &quot;2EDEBA9A-0FA8-4020-8A8B-30C3CDF34CCD:docaclmeta&quot;), GetStringFromBucket(content, &quot;2EDEBA9A-0FA8-4020-8A8B-30C3CDF34CCD:docaclmeta&quot;), GetStringFromBucket(content, &quot;docaclmeta&quot;))" />
        <Field name="docaclgrantaccesstoall" type="Boolean" canBeNull="true" expression="NullValue(ToBoolean(GetFieldFromBucket(content, &quot;grantaccesstoall&quot;)),false)" />
        <Field name="externalId" type="Int64" expression="&quot;ToInt64(Substring(id, 7))&quot;" />
        <Field name="sitecollectionid" type="Guid" canBeNull="true" expression="ToGuid(GetStringFromBucket(content, &quot;00130329-0000-0130-C000-000000131346:ows_SiteID&quot;))" />
        <Field name="fallbackLanguage" type="String" expression="&quot;en&quot;" />
        <Field name="Path" type="String" expression="GetStringFromBucket(content, &quot;49691C90-7E17-101A-A91C-08002B2ECDA9:#9&quot;)"/>
        <Field name="SiteID" type="String" expression="GetStringFromBucket(content, &quot;00130329-0000-0130-C000-000000131346:ows_SiteID&quot;)" />
        <Field name="ContentSourceID" type="Int64" canBeNull="true" expression="IfThenElse(BucketHasField(content, &quot;012357BD-1113-171D-1F25-292BB0B0B0B0:#662&quot;), ToInt64(GetFieldFromBucket(content, &quot;012357BD-1113-171D-1F25-292BB0B0B0B0:#662&quot;)), ToInt64(-1))" />
        <Field name="Attachments" type="List&lt;Stream&gt;" canbenull="true" expression="GetFieldFromBucket(content, &quot;attachments&quot;)"/>
        <Field name="FileExtension" type="String" expression="IfThenElse(BucketHasField(content, &quot;0B63E343-9CCC-11D0-BCDB-00805FCCCE04:FileExtension&quot;), GetStringFromBucket(content, &quot;0B63E343-9CCC-11D0-BCDB-00805FCCCE04:FileExtension&quot;), &quot;&quot;)" />
      </OutputSchema>
    </Operator>

 

As you can see, the first operator that is executed is of the type "Microsoft.Ceres.Evaluation.Operators.Core.Input".  This means that if you look in the "Microsoft.Ceres.Evaluation.Operators.Core" namespace, you will find a class that is decorated like the following:

 

[Serializable, Operator("Input", MinInputCount=0, MaxInputCount=0)]
public class InputOperator :
    TypedOperatorBase<InputOperator>, IMemoryUsingOperator, IOutputTypeConfigurableOperator
{
    // ... class body omitted ...
}

 

You should note that the class is marked as serializable and that the Operator attribute has been added with the name "Input".  Again, the combination of the namespace of the class and the name of the attribute are used to find the operator when the flow is executed.
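
To make the lookup rule concrete, here is a minimal sketch of how a type string such as "Namespace.Name" could be matched to a class by reflection.  It is not the Ceres implementation: DemoOperatorAttribute, the Demo.Operators.Core namespace and OperatorLookup are all made-up stand-ins so that the example compiles without the search assemblies.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Stand-in for the engine's Operator attribute (hypothetical, illustration only).
[AttributeUsage(AttributeTargets.Class)]
public sealed class DemoOperatorAttribute : Attribute
{
    public string Name { get; }
    public DemoOperatorAttribute(string name) { Name = name; }
}

namespace Demo.Operators.Core
{
    // The class is called InputOperator, but the attribute names it "Input".
    [DemoOperator("Input")]
    public class InputOperator { }
}

public static class OperatorLookup
{
    // Finds the class whose namespace + "." + attribute name equals typeName.
    public static Type Resolve(Assembly assembly, string typeName)
    {
        return assembly.GetTypes().FirstOrDefault(t =>
        {
            var attr = t.GetCustomAttribute<DemoOperatorAttribute>();
            return attr != null && t.Namespace + "." + attr.Name == typeName;
        });
    }

    public static void Main()
    {
        Type found = Resolve(typeof(OperatorLookup).Assembly, "Demo.Operators.Core.Input");
        Console.WriteLine(found == null ? "not found" : found.FullName);
    }
}
```

Note that the class name itself never appears in the type string; only the namespace and the attribute's name do, which is why "Demo.Operators.Core.InputOperator" would not resolve in this sketch.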

One of the flows that I am most interested in is the Microsoft.CrawlerContentEnrichmentSubFlow.  As some of you are aware, you can "extend" the engine (I really don't like that word in the context of Content Enrichment now that I know how to do flow insertion) by using a web service to add your own logic that creates new crawled properties on items that pass through the engine.  You can find more information about content enrichment, and examples of using it, at http://msdn.microsoft.com/en-us/library/jj163968.aspx.  Now, Microsoft is going to tell you that this is the only supported way to extend the Ceres engine.  And that is correct.  What I am about to show you has never been done outside of Microsoft, and if you venture down this path, you do so on your own.

Anyway, the problem with the Content Enrichment Service (CES) is that it is not flexible and it uses stupid old technology called web services.  That means it is sending big ugly XML around on the wire…not JSON.  Bummer.  That's not the only thing.  When you look at the pipeline and all the things you are indexing, if you do not put a trigger on the CES, EVERY single item will be passed to your service.  You would then need all kinds of logic to determine the type of the item and which properties exist on it (by looping through them all), and so many other weird bad things that it just makes me cringe.  If you do put a trigger on it, you limit yourself to implementing a very targeted set of logic.  You have no ability to add more than one CES, each with its own trigger and logic.  Huh?  Big feature gap here.  I'm not a fan.  So for people who just don't want to multiply the time it takes to do a crawl by 100 to 1000x because they implemented CES, you need a better option.  A faster option.  A more reliable and performant option.  One that lives in the engine, not outside of it.  If you want to know how to do this…keep reading!

Ok, so this is all simple so far.  But how does one add a new flow to the Ceres engine and then implement your own Operators?  Well, this is much more difficult than you think!

The first step is to create an operator class that inherits from TypedOperatorBase<T>, where T is the operator class itself. This is an abstract class, and you must implement the method called ValidateAndType. You can see most of this in the operator example above. The next step is to add the Serializable and Operator attributes to the class. Ok, fair enough, now what do we do? If you look at the XML of an operator, you will see that it can carry properties, and that those properties are simply deserialized into the properties of the class. Ok, so add some properties. In my example, I create a class with one property:

 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres.Evaluation;
using Microsoft.Ceres.Evaluation.Operators;
using Microsoft.Ceres.Evaluation.Operators.PlugIns;

namespace CustomOperator
{
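    // Referenced from flow XML as type="CustomOperator.CustomOperator":
    // the namespace plus the name given in the Operator attribute.
    [Serializable, Operator("CustomOperator")]
    public class CustomOperator : TypedOperatorBase<CustomOperator>
    {
        private string custom = "";

        public CustomOperator()
        {
            // Default value, used when the flow XML does not set the property.
            this.custom = "Chris Givens was here";
        }

        // Set from <Property name="custom" value="..." /> in the flow XML.
        [Property(Name="custom")]
        public string Custom
        {
            get { return custom; }
            set { custom = value; }
        }

        // Declares the output record set type to be the same as the single input's type.
        protected override void ValidateAndType(OperatorStatus status, IList<Microsoft.Ceres.Evaluation.Operators.Graphs.Edge> inputEdges)
        {
            status.SetSingleOutput(base.SingleInput.RecordSetType);
        }
    }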
}
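
For readers who want a feel for how Property elements in the flow XML could end up on the class, the following is a rough sketch using plain reflection.  It is an assumption about the general technique, not the engine's actual deserializer: DemoPropertyAttribute, DemoOperator and PropertyBinder are hypothetical names invented for this example.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

// Stand-in for the engine's Property attribute (hypothetical, illustration only).
[AttributeUsage(AttributeTargets.Property)]
public sealed class DemoPropertyAttribute : Attribute
{
    public string Name { get; set; }
}

public class DemoOperator
{
    [DemoProperty(Name = "custom")]
    public string Custom { get; set; } = "default";
}

public static class PropertyBinder
{
    // Copies each <Property name="..." value="..."/> onto the property whose
    // attribute carries the same name; unmatched elements are ignored.
    public static void Bind(object target, XElement operatorElement)
    {
        foreach (XElement p in operatorElement.Descendants("Property"))
        {
            string name = (string)p.Attribute("name");
            PropertyInfo match = target.GetType().GetProperties().FirstOrDefault(pi =>
            {
                var attr = pi.GetCustomAttribute<DemoPropertyAttribute>();
                return attr != null && attr.Name == name;
            });
            if (match != null && p.Attribute("value") != null)
            {
                object value = Convert.ChangeType((string)p.Attribute("value"), match.PropertyType);
                match.SetValue(target, value);
            }
        }
    }

    public static void Main()
    {
        XElement xml = XElement.Parse(
            "<Operator name='CustomOperator'><Properties>" +
            "<Property name='custom' value='2048' /></Properties></Operator>");
        var op = new DemoOperator();
        Bind(op, xml);
        Console.WriteLine(op.Custom);
    }
}
```

The sample XML here has no namespace; a real flow file declares one on OperatorGraph, so element lookups against it would need to be namespace-qualified.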

 

Ok, great.  So now what do we do?  Well, I wasn't sure if the system would just pick up the assembly from the GAC dynamically so I figured, let's just deploy the solution and try to add a flow with the operator in it.  Here's how you do that:

Deploy the assembly to the GAC…easy, right-click the project, select "Deploy"

Next, create a new flow (xml file) that uses the operator:

 

<?xml version="1.0" encoding="utf-8" ?>
<OperatorGraph dslVersion="1.0.0.0" name="CustomFlow" xmlns="http://schemas.microsoft.com/ceres/studio/2009/10/flow">
  <Operators>   

    <Operator name="SubFlowInput" type="Microsoft.Ceres.ContentEngine.Operators.BuiltIn.SubFlow.SubFlowInput">
      <Targets>
        <Target breakpointEnabled="false">         
          <operatorMoniker name="//CustomOperator" />
          <!--
          <operatorMoniker name="//SubFlowOutput" />
          -->
        </Target>
      </Targets>
      <Properties>
        <Property name="adaptableType" value="True" />
      </Properties>
    </Operator>
       
    <Operator name="CustomOperator" type="CustomOperator.CustomOperator">                                         
      <Targets>
        <Target breakpointEnabled="false">
          <operatorMoniker name="//SubFlowOutput" />
        </Target>
      </Targets>
      <Properties>
        <Property name="custom" value="2048"/>
      </Properties>
    </Operator>  
   
    <Operator name="SubFlowOutput" type="Microsoft.Ceres.ContentEngine.Operators.BuiltIn.SubFlow.SubFlowOutput" />
 
  </Operators>
</OperatorGraph>
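
Before deploying a flow file like this one, it can be handy to check that every operatorMoniker points at an operator that actually exists.  The sketch below is one way to do that with System.Xml.Linq; FlowInspector and DescribeEdges are names made up for this example, and the only thing taken from the flow format is what is visible in the XML above (the namespace URI, the Operator and operatorMoniker elements, and the "//Name" moniker style).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FlowInspector
{
    static readonly XNamespace Flow = "http://schemas.microsoft.com/ceres/studio/2009/10/flow";

    // Returns one "source -> target" line per operatorMoniker, flagging unknown targets.
    public static List<string> DescribeEdges(string flowXml)
    {
        XDocument doc = XDocument.Parse(flowXml);
        List<XElement> operators = doc.Descendants(Flow + "Operator").ToList();
        var names = new HashSet<string>(operators.Select(o => (string)o.Attribute("name")));
        var lines = new List<string>();
        foreach (XElement op in operators)
        {
            foreach (XElement moniker in op.Descendants(Flow + "operatorMoniker"))
            {
                // Monikers are written as "//OperatorName"; strip the leading slashes.
                string target = ((string)moniker.Attribute("name") ?? "").TrimStart('/');
                string note = names.Contains(target) ? "" : " (no such operator!)";
                lines.Add((string)op.Attribute("name") + " -> " + target + note);
            }
        }
        return lines;
    }

    public static void Main()
    {
        string xml = @"<OperatorGraph dslVersion='1.0.0.0' name='CustomFlow'
            xmlns='http://schemas.microsoft.com/ceres/studio/2009/10/flow'>
          <Operators>
            <Operator name='SubFlowInput' type='Microsoft.Ceres.ContentEngine.Operators.BuiltIn.SubFlow.SubFlowInput'>
              <Targets><Target><operatorMoniker name='//CustomOperator' /></Target></Targets>
            </Operator>
            <Operator name='CustomOperator' type='CustomOperator.CustomOperator'>
              <Targets><Target><operatorMoniker name='//SubFlowOutput' /></Target></Targets>
            </Operator>
            <Operator name='SubFlowOutput' type='Microsoft.Ceres.ContentEngine.Operators.BuiltIn.SubFlow.SubFlowOutput' />
          </Operators>
        </OperatorGraph>";
        foreach (string line in DescribeEdges(xml))
        {
            Console.WriteLine(line);
        }
    }
}
```

Commented-out monikers are skipped because XML comments are not elements.  If the result comes back empty for a file that clearly has operators, the xmlns value is the first thing to check: the namespace URI has to match exactly, and even a stray leading space makes it a different namespace.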

 

Connect to the Ceres engine and try to deploy the flow:

 

Add-PSSnapin Microsoft.SharePoint.PowerShell
& "C:\Program Files\Microsoft Office Servers\15.0\Search\Scripts\ceresshell.ps1"
Connect-System -Uri (Get-SPEnterpriseSearchServiceApplication).SystemManagerLocations[0] -ServiceIdentity contoso\sp_farm
Connect-Engine -NodeTypes InterActionEngine
$flowname = "CustomFlow"
Remove-Flow $flowname
Get-Content C:\CustomOperator\CustomOperator\$flowname.xml | Out-String | Add-Flow $flowname
Stop-Flow -FlowName $flowname -ForceAll

 

You will get an error saying that the system cannot find the operator called CustomOperator.CustomOperator.  Bummer.  So that didn't work.  So how do I "register" my operator with the engine?  Well, it turns out that there is so much more that needs to be done than simply creating an operator class.  You also need to create several other classes with special attributes attached to them.  Sooo…here we go!

First off, you will need to create a Producer class.  This producer is really the class that does all the work.  The operator is really just a way to get some parameters into the producer.  As you can see the Producer inherits from SingleOutputProducer<T>, where T is your operator class. Here is an example of the producer:

 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres;
using Microsoft.Ceres.Evaluation;
using Microsoft.Ceres.Evaluation.DataModel;
using Microsoft.Ceres.Evaluation.DataModel.Types;
using Microsoft.Ceres.Evaluation.Processing;
using Microsoft.Ceres.Evaluation.Processing.Producers;

namespace CustomOperator
{
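    public class CustomProducer : SingleOutputProducer<CustomOperator>
    {
        private CustomOperator op;
        private IRecordSetTypeDescriptor type;
        private IEvaluationContext context;

        // NOTE: holder is never assigned in this listing, so it is null-checked
        // before use; otherwise ProcessRecord would throw a NullReferenceException.
        private IUpdateableRecord holder;
        //private Item holderItem;

        public CustomProducer(CustomOperator op, IRecordSetTypeDescriptor type, IEvaluationContext context)
        {
            this.op = op;
            this.type = type;
            this.context = context;
        }

        // Called once per record flowing through the operator.
        public override void ProcessRecord(IRecord record)
        {
            if (this.holder != null)
            {
                this.holder.UpdateFrom(record);
            }

            // Hand the record on to the next operator in the flow.
            base.SetNextRecord(record);
        }
    }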
}

 

Next up is to create a NamedPlugInSource.  Operators are also called "PlugIns".  These plugins must be registered with the system in order for you to use them.  If you review all the operator assemblies, you will see that there is always some kind of *PlugInSource class that has the role of adding plugins to the Ceres core system.  My plugin source has only one operator, my CustomOperator:

using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres.CoreServices.Services;
using Microsoft.Ceres.CoreServices.Services.Node;

using Microsoft.Ceres.Evaluation.Operators;
using Microsoft.Ceres.Evaluation.Operators.PlugIns;

namespace CustomOperator
{
    [DynamicComponent]
    public class CustomPlugInSource : NamedPlugInSource<OperatorBase>
    {
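        public static OperatorBase PlugIn1()
        {
            // Debug trace: records that the factory method was called.
            File.AppendAllText(@"c:\temp\search.txt", "PlugIn1");
            return new CustomOperator();
        }

        protected override void AddPlugIns()
        {
            // Debug trace: records that AddPlugIns was called.
            File.AppendAllText(@"c:\temp\search.txt", "AddPlugIns");
            // Register the factory method for the CustomOperator type.
            Func<OperatorBase> f = PlugIn1;
            base.Add(typeof(CustomOperator),f);
        }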
    }
}

Now that you have the plugin source built, you will notice that it has been decorated with the "DynamicComponent" attribute.  This is where the "Ah-ha" moment kicks in.  By adding this attribute to the class, Ceres knows that it must start this class as a managed component in the system.  However, simply deploying the assembly to the GAC will not get Ceres to recognize it and load the components.  We'll get to that soon, we still have lots more to talk about!

Next up is an Evaluator.  An Evaluator is responsible for actually making the call to the producer.  In my example I create a class that inherits from ProducerEvaluator<T>, where T is my CustomOperator.  ProducerEvaluator is again an abstract class, with one method called GetProducer.  You must instantiate your producer here and return it.  There are many types of producers, but I have not had the time to document all of them as of yet. Soon though!

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres.CoreServices.Services;
using Microsoft.Ceres.CoreServices.Services.Node;
using Microsoft.Ceres.CoreServices.Services.Container;
using Microsoft.Ceres.Evaluation.Processing;
using Microsoft.Ceres.Evaluation.Processing.Producers;
using Microsoft.Ceres.Evaluation.Operators;

using Microsoft.Ceres.Evaluation.DataModel;
using Microsoft.Ceres.Evaluation.Operators.Graphs;

namespace CustomOperator
{
    public class CustomEvaluator : ProducerEvaluator<CustomOperator>
    {       
        /*
        protected override IRecordSet SetupOutput(Edge outputEdge, IList<IRecordSet> inputs)
        {
            CustomProducer cp = new CustomProducer();                       
            return null ;
        }
         */

        protected override IRecordProducer GetProducer(CustomOperator op, Microsoft.Ceres.Evaluation.DataModel.Types.IRecordSetTypeDescriptor type, IEvaluationContext context)
        {
            return new CustomProducer(op, type, context);
        }
    }
}

Next on the list is an EvaluatorBinder.  The evaluator binder is responsible for registering an operator with an evaluator.  This class inherits from AbstractEvaluatorBinder and needs to implement the AddBoundOperators and BindEvaluator methods:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres.Evaluation.Processing;
using Microsoft.Ceres.Evaluation.Operators;

namespace CustomOperator
{
    public class CustomEvaluatorBinder : AbstractEvaluatorBinder
    {
        protected override void AddBoundOperators()
        {
          base.Add(typeof(CustomOperator));
        }

        public override Evaluator BindEvaluator(OperatorBase op, IEvaluationContext context)
        {
            if (op is CustomOperator)
            {
                return new CustomEvaluator();
            }

            return null;
        }
    }
}

Last on the list is the EvaluatorBinderSource.  Similar to a PlugInSource, this will also be decorated with the DynamicComponent attribute which will instantiate and register the evaluators.  Here is the binder source:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres.CoreServices.Services;
using Microsoft.Ceres.CoreServices.Services.DependencyInjection;
using Microsoft.Ceres.CoreServices.Services.Node;
using Microsoft.Ceres.CoreServices.Services.Container;
using Microsoft.Ceres.Evaluation.Processing;
using Microsoft.Ceres.Evaluation.Operators;

namespace CustomOperator
{
    [DynamicComponent]
    public class CustomEvaluatorBinderSource : AbstractContainerManaged
    {
        [Exposed]
        public IEvaluatorBinder CieEvaluatorBinder
        {
            get
            {
                this.exampleBinder = new CustomEvaluatorBinder();
                return this.exampleBinder;
            }
        }

        private CustomEvaluatorBinder exampleBinder;
    }
}

You now have everything you need to add a new flow and operator to the Ceres engine!  Kinda.  If you deploy the code at this point and try to run the install script above, you will still get the same error!  This is because the assemblies only get loaded when you restart the Host Controller service.  NOTE:  You can read more about the Host Controller service in the MSPress book on SharePoint that Randy Williams and I have coming out very soon.  Ok, so restart the service.  Try the commands…NO GO…bummer.  But I did everything you said, Chris!  Why doesn't it recognize my operator? Well…going back to my previous statement, Ceres nodes don't look at the entire GAC and analyze every class. That would be WAAAY too expensive. So each node only examines the assemblies it is told to. This was the final magic step that I stumbled upon very luckily.

Each node that is started (via the NodeRunner.exe process) is fed its own configuration file that drives the WCF configuration.  This file is stored in C:\Program Files\Microsoft Office Servers\15.0\Search\Runtime\1.0\noderunner.exe.config.  It is a very generic file, not much going on here.  The NodeController code also looks for another file and feeds some special values into the process in addition to the regular app.config file.  These files are stored in the Ceres node directory, which is C:\Program Files\Microsoft Office Servers\15.0\Data\Office Server\Applications\Search\Nodes\<RandomNodeID>.  Each role that has been assigned to the server gets a directory under this path.  Since most of what we are doing is related to the ContentProcessingComponent, let's look there first.  If you open and explore this directory, you will find a nodeprofile.xml file.  It looks like this…tell me if you notice anything interesting:

<?xml version="1.0" encoding="utf-8"?>
<NodeProfile xmlns="http://schemas.microsoft.com/ceres/hostcontroller/2011/08/nodeprofile">
  <AutoStart xmlns="">true</AutoStart>
  <Stopped xmlns="">false</Stopped>
  <Modules xmlns="" />
  <Properties xmlns="">
    <Property Key="Managed.Node.Name" Type="string" Value="ContentProcessingComponent1" />
    <Property Key="Managed.SystemManager.ConstellationName" Type="string" Value="A99B1A" />
    <Property Key="Managed.Node.SystemName" Type="string" Value="A99B1A" />
    <Property Key="Managed.SystemManager.ConstellationVersion" Type="int" Value="-1" />
    <Property Key="Managed.Runtime.Version" Type="string" Value="1.0" />
    <Property Key="Managed.Node.LocalSystemManager" Type="bool" Value="False" />
    <Property Key="Managed.Node.ShutdownOnComponentFailed" Type="bool" Value="True" />
    <Property Key="Managed.Node.ProcessPriorityClass" Type="string" Value="BelowNormal" />
    <Property Key="Managed.Node.DynamicAssemblies" Type="string" Value="Microsoft.Ceres.ContentEngine.AnnotationPrimitives, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Bundles, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Component, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.DataModel.RecordSerializer, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Fields, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.LiveEvaluators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.NlpEvaluators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.NlpOperators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Operators.BuiltIn, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Parsing.Component, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Parsing.Evaluators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Parsing.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Processing, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Processing.BuiltIn, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Properties, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.AliasLookup, Version=15.0.0.0, Culture=neutral, 
PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.RecordCache, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.RecordType, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Repository, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Services, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.SubmitterComponent, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Types, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Util, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Processing.Mars, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.DataModel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.DataModel.Types, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Engine, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Engine.WcfTransport, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Operators.BuiltIn, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Operators.Core, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Operators.Parsing, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Processing, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Processing.BuiltIn, Version=15.0.0.0, Culture=neutral, 
PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Services, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.DocumentModel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Admin, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.ContentRouter, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Services, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Utils, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Schema.SchemaCatalogProxy, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Query.MarsLookupComponent, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.FastServerMessages, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Schema.SchemaCatalog, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.AnnotationStore, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.Automata, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.Dictionaries, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.DictionaryInterface, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.Ese.Interop, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.RichFields, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.RichTypes, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.StringDistance, Version=15.0.0.0, Culture=neutral, 
PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.Transformers, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.IndexTokenizer, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.AnalysisEngine.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchAnalytics.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.UsageAnalytics.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;CustomOperator, Version=1.0.0.0, Culture=neutral, PublicKeyToken=7d300eac1b9f50c2" />
    <Property Key="Managed.Node.SearchServiceApplicationName" Type="string" Value="14087e61-67e2-4245-b23d-0e52c6dcf704" />
    <Property Key="Managed.Node.SystemDisplayName" Type="string" Value="0a1ee46f-59f2-49b7-bfca-bb4d20adaf1a" />
    <Property Key="Managed.Node.BasePort" Type="int" Value="17042" />
    <Property Key="Managed.Node.BasePort.4" Type="int" Value="17046" />
    <Property Key="PortShared" Type="bool" Value="True" />
  </Properties>
</NodeProfile>

 

If you guessed the "Managed.Node.DynamicAssemblies" property…then you are very smart! [:D]  Yep…that is what we are looking for.  Those are the only assemblies that will be loaded into the AppDomain.  Only these assemblies will be interrogated for the DynamicComponent attribute.  Great!  So as you can see, I have added my CustomOperator assembly to the list.  Let's try again and run the script.  Dang it!  NO GO!  It still doesn't like my CustomOperator.CustomOperator operator!   Grrr….so at this point, I'm really wondering if my assembly is getting loaded…after a browse in the ULS logs…I see these:

08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Inactive] to [Configuring and eventSent=False] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiyyo Verbose  Microsoft.Ceres.CoreServices.Management.ManagementServer : Registered agent CustomOperator.CustomEvaluatorBinderSource.ComponentManager of type Microsoft.Ceres.CoreServices.Services.Container.IComponentManagerManagementAgent 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Configuring] to [Configured and eventSent=False] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Configured] to [Resolving and eventSent=True] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Resolving] to [Readying and eventSent=False] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Readying] to [Ready and eventSent=True] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Ready] to [Activating and eventSent=False] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Activating] to [Active and eventSent=True] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywj Medium   ComponentManager(CustomOperator.CustomPlugInSource) : CustomOperator.CustomPlugInSource [Active] started 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiyv9 Verbose  ComponentManager(CustomOperator.CustomPlugInSource) : ***** QUEUESENTINEL finished task for CustomOperator.CustomPlugInSource: CustomOperator.CustomPlugInSource[Active]state Active 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Inactive] to [Configuring and eventSent=False] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Configuring] to [Configured and eventSent=False] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Configured] to [Resolving and eventSent=True] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Resolving] to [Readying and eventSent=False] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Readying] to [Ready and eventSent=True] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Ready] to [Activating and eventSent=False] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Activating] to [Active and eventSent=True] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywj Medium   ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomOperator.CustomEvaluatorBinderSource [Active] started 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiyv9 Verbose  ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : ***** QUEUESENTINEL finished task for CustomOperator.CustomEvaluatorBinderSource: CustomOperator.CustomEvaluatorBinderSource[Active]state Active 
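
If you don't feel like eyeballing every line, the state transitions can be pulled out with a few lines of script.  This is just a rough Python sketch that assumes the ULS lines look exactly like the ones above ("ComponentManager(name) : X moved from [A] to [B and eventSent=…]").  The function name final_states is mine, it is not part of SharePoint or Ceres.

```python
import re

# Matches the state-transition lines shown above, for example:
# "ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource
#  moved from [Ready] to [Activating and eventSent=False]"
TRANSITION = re.compile(
    r"ComponentManager\((?P<component>[^)]+)\)\s*:\s*\S+ moved from "
    r"\[(?P<old>\w+)\] to \[(?P<new>\w+) and eventSent=(?:True|False)\]"
)


def final_states(log_lines):
    """Return {component name: last state it moved to} for the given lines.

    Lines that are not state transitions (agent registration, "started",
    QUEUESENTINEL, ...) are simply skipped.
    """
    states = {}
    for line in log_lines:
        match = TRANSITION.search(line)
        if match:
            states[match.group("component")] = match.group("new")
    return states


if __name__ == "__main__":
    # Two lines shortened from the log excerpt above (hypothetical trimming).
    sample = [
        "aiywq High ComponentManager(CustomOperator.CustomPlugInSource) : "
        "CustomPlugInSource moved from [Ready] to [Activating and eventSent=False]",
        "aiywq High ComponentManager(CustomOperator.CustomPlugInSource) : "
        "CustomPlugInSource moved from [Activating] to [Active and eventSent=True]",
    ]
    for component, state in final_states(sample).items():
        print(component, "->", state)
```

A component that ends in Active made it all the way through the lifecycle.  One that is stuck in an earlier state, or missing completely, is where I would start digging.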

Ok…they ARE being loaded.  So what the hell is going on?  Well…the clue WAS in the log files.  After running the PowerShell to attempt to add the flow, I noticed something.  It was the name of the component that is actually being used to register a flow.  It's called QueryProcessingComponent1.  Well ok, so it seems that even though the content processing node does all the work, the query processing component manages all the registration of the plugins and operators.  After going back to the node directory, I find the QueryProcessingComponent1 directory and find that it too has a NodeProfile.xml file.  Bingo.  After adding the assembly to the Managed.Node.DynamicAssemblies property in that file and restarting the host controller one more time, I again attempt to add a custom flow with a custom operator.
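
Since the same edit has to be made in more than one NodeProfile.xml, here is a minimal Python sketch of it.  It assumes the file has the shape shown earlier (Property elements with Key and Value attributes, and a semicolon-separated list of assembly names in Managed.Node.DynamicAssemblies).  The function name add_dynamic_assembly is made up for illustration, and I have not verified it against every node type.

```python
import xml.etree.ElementTree as ET

# Property name as it appears in the post; the rest of the layout is assumed
# to match the NodeProfile.xml fragment shown above.
DYNAMIC_ASSEMBLIES_KEY = "Managed.Node.DynamicAssemblies"


def add_dynamic_assembly(profile_xml, assembly_name):
    """Return the profile XML text with assembly_name appended to the
    Managed.Node.DynamicAssemblies list (unchanged if already listed)."""
    root = ET.fromstring(profile_xml)
    for element in root.iter():
        # Compare on the local tag name so an XML namespace, if any, is ignored.
        if element.tag.rsplit("}", 1)[-1] != "Property":
            continue
        if element.get("Key") != DYNAMIC_ASSEMBLIES_KEY:
            continue
        entries = [e.strip() for e in element.get("Value", "").split(";") if e.strip()]
        if assembly_name not in entries:
            entries.append(assembly_name)
        element.set("Value", ";".join(entries))
        return ET.tostring(root, encoding="unicode")
    raise ValueError("no %s property found" % DYNAMIC_ASSEMBLIES_KEY)


if __name__ == "__main__":
    # Tiny stand-in profile; a real one lists many more assemblies.
    profile = (
        '<NodeProfile><Properties>'
        '<Property Key="Managed.Node.DynamicAssemblies" Type="string" '
        'Value="Microsoft.Ceres.Evaluation.Operators, Version=15.0.0.0, '
        'Culture=neutral, PublicKeyToken=71e9bce111e9429c" />'
        '</Properties></NodeProfile>'
    )
    custom = ("CustomOperator, Version=1.0.0.0, Culture=neutral, "
              "PublicKeyToken=7d300eac1b9f50c2")
    print(add_dynamic_assembly(profile, custom))
```

The function works on text on purpose, so you can read the file, keep a backup copy, and only write the result back once you are happy with it.  ElementTree may change the file's whitespace and attribute quoting when it serializes, so compare the output with the original first.  You still need to restart the host controller afterwards, same as above.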

YYYYYEEEESSSS!!!!  NO ERROR…………..I successfully inserted my flow and operator into the Ceres engine!  Now, what part is missing?  Well, even though the flow is now installed and working, it is not a part of the main flow ("Microsoft.CrawlerFlow").  I would need to insert the flow into that main file and then redeploy it.  The main issue with that is that not all of the operators are recognized by the system.  Yeah, weird I know.  This is part of the installation/deployment of the search service application and is there by default.  If you ever want to make changes, you would need to add all the possible assemblies to the query processing component and then update the main flow.

In terms of debugging, you can attach to the NodeRunner.exe processes and debug your operator and evaluators.  Easy.

Now for some clean up.  All those bundles of flows at the top of this post.  How did they get there?  Well, what happens is each time you upload a flow, it will generate a new assembly with the flow added to it as a resource.  If you were to reflect on any of the assemblies above, you can get the flow xml out of the assembly.  But this is also easily done using the Windows PowerShell commands above.

I will be posting all the code for this project on code.msdn.microsoft.com.  You can use it as a starting point for implementing your own flows and operators.  But you are probably asking, why would I do something that is not supported?  Well, it's the same reason you want to keep your job.  The customer wants high performance and needs to implement way more than the Content Enrichment Service can provide, and saying no will stop any chance you have of completing an incredibly cool and awesome project.  Now, why is this not supported if you CAN do it?  Well, as you can see, it is VERY complex.  Only a few people in the world are going to be able to build these, deploy them and successfully use them.  So you are still asking yourself…why did you post this if we can't really do it…great question!

BECAUSE I WANT IT SUPPORTED.  If we band together and find various use cases for doing this, the product team will have no choice but to train the Level I, II, and III support people on how to troubleshoot these.  As of right now, it is simply unsupported because the support people don't even know what a flow or an operator is when it comes to supporting SharePoint Search (update: met with the search team and the COE support team *is* familiar with flows, so we're one step closer to support).  It would be my goal to get some ISVs to start playing around with creating custom flows and operators to make SharePoint Search a BEAST (not that it isn't already, cuz it's the best on the market right now, sorry Google Appliance but you suck big time)!  So…there you have it.  Do with it what you will, have fun, be smart and as always…enjoy!

Chris

Creating Security Descriptors for MOSS 2007 Managed Protocol Handlers

A few months ago I found this great blog by John Kozell at Microsoft.  He showed us how to build custom SharePoint Search extensions, which I thought was very innovative and included in my SharePoint 2007 Search Customization course.  Here is a link to the code that I modified to use the new .NET 3.0 classes.

http://blogs.msdn.com/johnkoz/archive/2009/03/05/creating-security-descriptors-for-moss-2007-managed-protocol-handlers.aspx

Enjoy!
Chris