Shredded Storage Whitepaper

This is the data and information from the codeplex project on the Shredded Storage testing framework.  This was to be released several months ago, but things just didn't work out the way we all planned.  So I'm posting it here:

Purpose of this whitepaper

Shredded Storage has up to this point been a very elusive feature of SharePoint 2013. There have been several posts in the community that have attempted to tackle this incredibly advanced topic, and although some have been spot on in most regards, they were far from complete when describing the intricate dance that occurs throughout the many layers of SharePoint and Office. Each of the authors had different goals and motivations for creating this whitepaper. These ranged from simple curiosity to the much more directed and measurable product vendor implications for features like RBS and deduplication. The result of our ambitions is this highly technical whitepaper and supporting code that uncovers how Shredded Storage works in various scenarios and how it can be improved in the future. We have also provided a guide on how you can reach the same conclusions we did.

Introduction to Shredded Storage

Shredded Storage is a new data platform improvement in SharePoint 2013 related to the management of large binary objects. It is designed to accomplish three tasks: reduce storage, optimize bandwidth, and optimize file I/O. The idea behind Shredded Storage is that if subsequent writes back to the database contain parts that have not changed, there is no need to save those parts again, and you therefore save on storage, network, and disk I/O. The feature was originally designed to handle heavy write scenarios such as co-authoring.

All of this is accomplished by breaking the blobs apart into smaller pieces. It really is as simple as that, but the details are much more complex. Imagine manually ripping a piece of paper into a bunch of pieces. If the paper has an image on it, you could piece it back together fairly easily, because your brain can recognize where the pieces belong. Now take a blank piece of paper. How easy would it be to put back together if you randomly rip it apart? Actually, still fairly easy, because the pieces fit together nicely along the tears. Now put that blank piece of paper through a shredder that cuts everything to the same size. How easy is it to reassemble now? Not very. You would need something to guide you in putting it back together, maybe some ultraviolet markers. That is where the implementation of Shredded Storage comes into play.

Shredded Storage is much more than simply a set of stored procedures in a SQL Server database. It is actually a tiered solution made up of three layers: the SharePoint layer (which includes the basic object model and other services like CellStorage.svc and OneNote.ashx that support Office clients and Office Web Apps), the Cobalt layer, and the database backend. The combination of these three parts is what defines Shredded Storage. However, as you will learn, the Windows operating system also has a MAJOR role to play in everything.

This white paper will dive very deeply into all three layers
and how they work with each other. 

SharePoint Layer (Object Model)

When it comes to downloading and uploading files, SharePoint uses the object model to call the SPFile methods SaveBinary and OpenBinary. These methods in turn call into the SPRequest class, and that is where the magic happens. SPRequest, which is not managed code but lives in the COM portion of the SharePoint application, makes a call to generate an SPFileStreamStore based on the SPFile information. The SPFileStreamStore is managed by the SPFileStreamManager class, which is the key class for managing the Cobalt layer on the SharePoint side. You will also see some other Microsoft.SharePoint.Cobalt* classes that help with the interaction.

Writing Files

When writing a file to the SharePoint database, the SPFileStreamManager executes its ComputeStreamsToWrite method. This method is in charge of creating an SPFileStreamStore object, which is then used to create a CobaltStreamRevokableContainer object. That object contains a CobaltFilePartition, which in turn contains a schema obtained from the SPHostBlobStore.
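
To make those entry points concrete, here is a minimal sketch of the object model calls that funnel into ComputeStreamsToWrite. The site URL, library and file path are hypothetical; everything below SPFile and SPFileCollection (SPRequest, SPFileStreamManager, the SPFileStreamStore) is internal and runs on your behalf.

using System;
using System.IO;
using Microsoft.SharePoint;

class WritePathSketch
{
    static void Main()
    {
        // Hypothetical URL, library and file path.
        using (SPSite site = new SPSite("http://intranet.contoso.com"))
        using (SPWeb web = site.OpenWeb())
        {
            byte[] content = File.ReadAllBytes(@"C:\temp\report.docx");

            // New file: SPFileCollection.Add pushes the stream down the write path.
            SPFile file = web.GetFolder("Shared Documents").Files.Add("report.docx", content, true);

            // Subsequent save: SaveBinary is the call that ends up in
            // ComputeStreamsToWrite, which decides what actually gets persisted.
            file.SaveBinary(content);
        }
    }
}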

Reading Files

When reading a file from the database, the SPFileStreamStore is again the main class involved. The most important methods are GetBlobsById and GetBlobsAfterBsn; these are where blobs are retrieved from the database. A set of stream ids is passed to the proc_GetStreamsById stored procedure and the first chunk of data is returned. A data reader is opened and each blob is converted to an SPCoordinatedStreamBuffer. So where does the FileReadChunkSize come into play? It is used as an input parameter to the GetStreamsById stored procedure to tell SQL Server to break the file content into smaller parts (the procedure is only called once, and for each requested shred it returns no more than the FileReadChunkSize). After the first batch of data, if there is more to be returned, subsequent calls are made to the GetStream stored procedure, passing in the offset for the content, but again returning at most the FileReadChunkSize per shred. These parts are aggregated by the SPCoordinatedStreamBuffer, and once all parts of the shreds have been retrieved they are put back together into the actual shreds for consumption by the Cobalt layer.

These fully reassembled shreds are turned into Cobalt HostBlobStore.Blob objects and a list of these is returned from the method. The data of each shred is converted to a Cobalt Atom.

The net result is that the value of FileReadChunkSize does affect the number and size of the result sets returned to the web front end. If this setting is several multiples smaller than the FileWriteChunkSize, you will see a roughly proportional increase in the time it takes to download the file.
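
As a back-of-the-envelope illustration of that effect, the sketch below uses purely illustrative numbers (a 64 KB shred, matching the default FileWriteChunkSize watermark, and a hypothetical 16 KB FileReadChunkSize) to show how the number of result sets needed per shred grows as the read chunk shrinks.

using System;

class ReadChunkArithmetic
{
    static void Main()
    {
        // Illustrative numbers only.
        long shredSize = 64 * 1024;           // default FileWriteChunkSize watermark
        long fileReadChunkSize = 16 * 1024;   // hypothetical FileReadChunkSize

        // Each shred comes back in pieces of at most FileReadChunkSize, so the
        // number of result sets per shred is roughly ceil(shredSize / readChunk).
        long callsPerShred = (shredSize + fileReadChunkSize - 1) / fileReadChunkSize;

        // Prints 4 for these numbers: four result sets to reassemble one shred.
        Console.WriteLine("{0} result sets to reassemble one {1}-byte shred",
            callsPerShred, shredSize);
    }
}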

SharePoint Layer (CellStorage.svc)

A separate but related feature to Shredded Storage is the ability of Office clients to do delta updates when working with Office XML based documents that reside in SharePoint. As part of the process, Office clients attempt to call the CellStorage WCF service. If the service is available, the WFE only sends the requested pieces of a file to the client. On a save operation, only the delta changes to a file are sent back to the WFE. Note that when this service is not available or an error occurs, the Office client falls back to the normal HTTP PUT mechanism, as if SharePoint were a regular file share, and the entire file is sent on each save. When delta updates and Shredded Storage are both in play, you get an optimization both from the client to the WFE and from the WFE to SQL Server.

Delta updates only support Office XML documents. The reason is that deep inside CellStorage.svc a call is made to create a pointer to an XmlReader object. The older binary file formats are not based on XML, so attempting to read them this way is futile; for those files you won't see any calls to CellStorage.svc, just the regular HTTP PUT calls. The CellStorage protocol (which is part of the Cobalt assemblies) supports several command types:

  • GetDocMetaInfo
  • WhoAmI
  • ServerTime
  • Cell (get and set)
  • Coauth
  • SchemaLock
  • ReleaseLock
There is a standard process to this:

  • First step is to send a request for the Document MetaInfo
  • Second step is to actually get the parts of the document that are being viewed/edited at that moment (cell get)
  • Third step is to request to start editing the document (requesting a schema lock)
  • Fourth step is to send back any changes that a person makes (cell set)
  • Last step is to tell the server you are done (release the schema lock)

During this entire process, the client pings the sharedaccess.asmx web service, roughly every 20 seconds, to ensure that it is the only one editing the document. As part of the request, it looks for the ETag to change. If it has changed, someone else updated the document, the version you have is now old, and you will need to refresh your copy or overwrite what they did. This scenario should not normally happen, but the client checks for it anyway.

CellStorage.svc requests and responses use an older XML format for the messages. This is a costly approach because XML is bloated compared to the JSON format used in Office Web Apps communications. We hope one day this protocol will change to support the newer format and improve performance even further.

SharePoint 2013 has several WCF services aside from the CellStorage service. Many of these services reside in the ISAPI directory of the SharePoint root (C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\ISAPI). If you open the web.config file in the ISAPI directory, you will find that all the WCF service bindings are configured with a maxBufferSize of "4194304".

This limits the size of the files for which Office clients will be able to download the file shreds. As you will see later, I/O performance decreases dramatically when you set a FileWriteChunkSize higher than 4 MB.
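
If you want to verify this on your own farm, the following hedged sketch simply loads that web.config and prints every binding element that carries a maxBufferSize attribute. The 15-hive path is the default install location and is an assumption; adjust it for your environment.

using System;
using System.Xml;

class InspectIsapiBindings
{
    static void Main()
    {
        // Default install location (assumption) of the ISAPI web.config.
        string path = @"C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\ISAPI\web.config";

        XmlDocument doc = new XmlDocument();
        doc.Load(path);

        // List every binding element that declares a maxBufferSize attribute.
        foreach (XmlNode node in doc.SelectNodes("//*[@maxBufferSize]"))
        {
            XmlAttribute received = node.Attributes["maxReceivedMessageSize"];
            Console.WriteLine("{0}: maxBufferSize={1}, maxReceivedMessageSize={2}",
                node.Name,
                node.Attributes["maxBufferSize"].Value,
                received != null ? received.Value : "(not set)");
        }
    }
}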

Office Web Apps 2013 – (Onenote.ashx)

Office Web Apps has an entirely different set of multi-user editing features. These features use the OneNote.ashx endpoint. This API works hand in hand with Cobalt to "lock" sections of a document when multiple users are using it. This multi-user locking operation generates "partitions" in the shredded storage database store. Partitions are sets of shreds in the content database that correspond to what each user is changing in a file during their editing session.

When using regular object model calls, shreds are normally broken apart along the boundaries defined by the Cobalt schema of the file being edited. When multi-user editing is used, an even more granular breakdown of the shreds is performed (down to the separate XML files inside the Office XML document) and mapped to the internal XML components being edited. When the editing sessions are closed, the partitions are collapsed back into a single partition. Having an editing session "end" is a very important part of the process. If a user doesn't close their session, the partitions are left in the database until the service cleans them up. In our code testing, we found it possible to lock yourself out of editing when you had another session open. From a coding standpoint, this is the correct way to do it. The default timeout for an Office Web Apps session is 5 minutes.

The process of starting an editing session works like this:

  • User clicks on an Office XML document in a library
  • A SharePoint page called WOPIFrame.aspx is called. This page sets up the necessary access tokens that OWA will use for communicating with SharePoint
  • At this point the IFrame redirects you to the OWA server and initializes the session via the WordViewerFrame.aspx page and the docdatahandler.ashx HTTP handler
  • A JSON object that fully describes the document is sent to the browser. This is a very complex set of JSON objects that map to the XML elements inside the Office XML document
  • A user starts an editing session with OWA, which initiates the first call to OneNote.ashx. From this point on, all operations are done through OneNote.ashx
  • Every few seconds the browser pings OneNote.ashx to request any changes. This JSON format and protocol could be the subject of a whitepaper on its own

As the above conversation between the client browser and Office Web Apps takes place, the Office Web Apps server is talking to CellStorage.svc through the WordCompanionServer class. This class references the HostEnvironment class for information on where the file is located and which request adapters are implemented to proxy the calls between OWA and the target. The two most important adapters are the ICellStorageAdapter and the ICoauthAdapter. The ICellStorageAdapter is an interface implemented by the CellStorage.svc WCF service mentioned above.

Now that you have some background on the process and the interaction of these components, it's time to look at the Cobalt layer, which is where the majority of the code behind CellStorage.svc resides.

Cobalt

The shredding part of the process is implemented by a much improved Cobalt API (Microsoft.Cobalt). Cobalt is the layer responsible for implementing a break-up schema that dissects files into smaller parts, commonly called shreds in SharePoint. Cobalt is designed to be used with any type of file, not just Office file types. The Cobalt assembly also contains the code that handles all the FSSHTTP requests discussed earlier. Cobalt operations include:

  • TBD

Cobalt supports several different schemas that define how a file is broken apart. Cobalt will accept a file, determine what type of file it is (similar to how search determines a file's format), and then apply an algorithm based on that file type. This means the type of file matters. You cannot upload an Office XML file and expect the same results as a simple image file of the same size. Not only does the algorithm change based on file type, it also changes based on the size of the file. Table 1 shows the paths that are taken and the algorithm that is applied when each path is picked.

TABLE 1:

TBD

Each algorithm will break apart the blobs in a different way. Therefore the same sized file saved as a Word Office XML document will break apart differently than an older, purely binary Word document. You may also find that images (blobs within blobs, in the case of Office XML documents) are treated differently and may get their own shred.

As Cobalt breaks apart the blob, there is the possibility of two extra shreds being added to the set that store configuration information about the other shreds. These configuration shreds help the web front end determine whether the shreds have changed and new shreds need to be committed to the database.

 

Database

SPFarm Configuration

In SharePoint 2013, the out-of-box default for the maximum size of these parts is 64 KB; however, this can be configured to be larger. This FileWriteChunkSize setting is a major focus of this whitepaper.
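
For reference, here is a minimal sketch of changing that watermark, assuming the FileWriteChunkSize property exposed on the farm's content SPWebService (the property name is the one used throughout this whitepaper). The 1 MB value is purely illustrative; test any change before applying it to a production farm.

using System;
using Microsoft.SharePoint.Administration;

class ConfigureWriteChunkSize
{
    static void Main()
    {
        // The watermark lives on the farm-wide content web service.
        SPWebService contentService = SPWebService.ContentService;

        Console.WriteLine("Current FileWriteChunkSize: {0} bytes", contentService.FileWriteChunkSize);

        // Illustrative value only (1 MB); this applies to the whole farm.
        contentService.FileWriteChunkSize = 1 * 1024 * 1024;
        contentService.Update();
    }
}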

It is again important to note that the shreds will not simply equal the file size divided by the write chunk size. The setting is a watermark used as a guide by the Cobalt algorithm involved.

These configuration shreds are used by the WFE to determine which shreds need to be saved back to the content database. This keeps the WFE from having to download the entire set of shreds to do a comparison. The configuration shreds add a small amount to each blob added to the database. Note, however, that there are some circumstances where only one shred is created that includes both configuration and data.

This break-up of the blob into shreds doesn't achieve any storage optimization until you turn on versioning on a document library. When versioning has been enabled, each time a user saves a file, the SharePoint WFE breaks the blob apart into segments using the same method it used before. Each of these shreds is then compared with the shreds already in the database. If a shred did not change, there is no need to send it to the database for persistence. This database-level de-duping operation has been shown to achieve 30-40% storage savings. This is where you gain network optimization between the WFE and SQL Server. You also gain from a file I/O standpoint: you are now sending and saving only a small part of the file rather than the entire file. This also matters because if you had done a similar operation in SharePoint 2010 and only updated the metadata for a file, you would have ended up with another full copy of the file in the database. With Shredded Storage this is not the case, and your storage is optimized greatly. Lastly, if you have implemented a log shipping disaster recovery strategy, you will notice that smaller writes produce smaller transaction logs and drive a more efficient log shipping experience.

Content Database – Overview

Once Cobalt and SharePoint have determined the Cobalt HostBlobStore.Blob(s) that need to be saved, each one is passed to the content database via the corresponding write stored procedure.

Shredded Storage optimization is document focused, which calls for a quick description of how SharePoint saves files. When you upload a document to a document library or add an item with an attachment, SharePoint creates a new list item with a unique id assigned to it. A unique document id is also created for the file (or, when the target is not a document library, for the attachment on the item). The two are tied to each other and both are unique GUIDs. When you upload the same file to a second list, a new list item and document id are created that have no relation to the first set. When it comes to Shredded Storage, it is this document id that matters. In summary, if you upload the same document to two different document libraries, you gain no Shredded Storage benefit.
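
A quick way to see this for yourself is to upload identical bytes to two libraries and compare the ids SharePoint hands back. The sketch below is hypothetical (URL and library names are placeholders); the differing UniqueId values are the document ids that keep the two copies from sharing any shreds.

using System;
using Microsoft.SharePoint;

class TwoLibrariesNoSharing
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://intranet.contoso.com"))
        using (SPWeb web = site.OpenWeb())
        {
            // Any identical payload will do for the comparison.
            byte[] content = new byte[128 * 1024];

            SPFile first = web.GetFolder("Shared Documents").Files.Add("sample.docx", content, true);
            SPFile second = web.GetFolder("Archive").Files.Add("sample.docx", content, true);

            // Different document ids mean different shred sets: the two copies
            // share nothing in the content database.
            Console.WriteLine("First:  {0}", first.UniqueId);
            Console.WriteLine("Second: {0}", second.UniqueId);
        }
    }
}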

Content Database – Tables

There are four tables in the database that manage the shredded storage blobs. These include:

  • dbo.DocStreams
  • dbo.DocsToStreams
  • dbo.AllDocs
  • dbo.AllDocVersions

In SharePoint 2010, dbo.AllDocStreams stored the document stream and related data for documents with content streams. In SharePoint Server 2013, dbo.DocStreams replaces dbo.AllDocStreams, and each row stores a portion of the BLOB.

The improved protocols associated with Shredded Storage identify the rows in the new DocStreams table that need to be updated to support a change and update the BLOB associated with that change in the corresponding rows. Several new columns are present in the DocStreams table that represent a shredded BLOB, including:

  • BSN: The BSN of the stream binary piece.
  • Data: Contains a subset of the binary data of the stream binary piece, unless the stream binary piece is stored in Remote BLOB Storage.
  • Offset: The offset into the stream binary piece where this subset data belongs.
  • Length: The size, in bytes, of this subset data of the stream binary piece.
  • RbsId: If this stream binary piece is stored in remote BLOB storage, this value MUST contain the remote BLOB storage identifier of a subset of the binary data of the stream binary piece. Otherwise it MUST be NULL.

A new DocsToStreams table contains a pointer to the corresponding row in dbo.DocStreams. The BLOB Sequence Number (BSN) is used to manage the BLOB sequence across dbo.AllDocVersions, dbo.DocsToStreams, and dbo.DocStreams. NextBSN is used to manage the last BSN for each BLOB.

dbo.AllDocs contains a single row per file similar to
SharePoint Server 2010.

dbo.AllDocVersions contains one or more rows per file and
one row per file version.
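
For poking around a TEST content database (querying SharePoint databases directly is unsupported, so never do this against production), the hedged sketch below groups the dbo.DocStreams rows by BSN using only the columns listed above. The connection string is hypothetical.

using System;
using System.Data.SqlClient;

class InspectDocStreams
{
    static void Main()
    {
        // Hypothetical connection string to a restored TEST content database.
        string connectionString = "Server=.;Database=WSS_Content_Test;Integrated Security=true";

        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT BSN, COUNT(*) AS Pieces, SUM(CAST([Length] AS bigint)) AS Bytes " +
            "FROM dbo.DocStreams GROUP BY BSN ORDER BY Bytes DESC", conn))
        {
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                // One line per stream binary piece (BSN): how many DocStreams rows
                // it spans and how many bytes those rows add up to.
                while (reader.Read())
                {
                    Console.WriteLine("BSN {0}: {1} row(s), {2} bytes",
                        reader["BSN"], reader["Pieces"], reader["Bytes"]);
                }
            }
        }
    }
}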

Content Database – Stored Procedures

The main stored procedures in the process include:

  • TBD

Read and Write Scenarios

Client to WFE (Read)

TBD

WFE to Database (Read)

The following results from our testing compare the FileReadChunkSize to the size of the file (rather than to the FileWriteChunkSize):

FileReadChunkSize (% of file size) and the resulting performance hit:

  • Greater than 12.5% of the file size: typical/normal read operation
  • 6% to 12.5% of the file size: roughly a 10% hit on the read operation
  • 3% to 6% of the file size: roughly a 20% hit on the read operation
  • Less than 3% of the file size: roughly a 50% hit on the read operation

These values suggest that the default setting anticipates an average file size under 512 KB. It should be noted that when you install SharePoint 2013, all of the files that accompany the software are under this limit.

Client to WFE (Write)

These calls can be measured using Fiddler tracing. You can simulate an Office 2010 client by disabling Shredded Storage and attempting to download a file from SharePoint.

WFE to Database (Write)

TBD…

Remote Blob Storage

Two quick facts: Shredded Storage does not require RBS, and RBS does not require Shredded Storage. However, you should be aware that when Shredded Storage is used with RBS, there are side effects to having both enabled. RBS's real value is realized only when you are working with larger files. When combined with Shredded Storage's default maximum shred size of 64 KB, the actual implementation of RBS can have a negative impact on the performance of retrieving a complete blob from the storage subsystem. However, some RBS implementations present capabilities that Shredded Storage does not. One such feature is the ability to actually de-duplicate the shreds. De-duplication monitors for when the same blob is being saved and then creates a pointer to a single instance of the blob. This solves the problem of uploading the same document to multiple libraries and having it consume the same amount of space again in the disk subsystem. As previously mentioned, this is something that Shredded Storage does not do in its current implementation but, depending on your RBS provider, RBS can.

 

Deduplication

In testing this feature when enabled alongside shredded storage, we found that the de-duping feature actually causes a performance hit when writing files with a target shred size of 64 KB, and it loses the majority of its efficiency when the shredded storage watermark is maxed out. Proper testing should be done to ensure that any changes you make to the Shredded Storage settings, or features you have implemented on your disk subsystem, are compatible with your performance requirements.

 

Simple Shredded Storage Facts

Shredded Storage can provide network optimizations beyond the WFE-to-SQL hop when configured with the proper set of tools. Thus far, you have been presented with the utilization of Cobalt on the WFE-to-SQL Server side of the wire. Originally, Cobalt was designed with multi-user editing in mind. This fits SharePoint perfectly, as it is a collaboration platform, and a common pattern is allowing users to collaborate on documents at the same time, whether in a viewing or an editing capacity.

Disabling Shredded Storage

There are properties that allow you to attempt to turn off shredded storage. One approach is to set the FileWriteChunkSize so that the shreds are incredibly large, up to 2 GB, which makes it likely that only one shred will be created. Be warned that raising the shredded storage setting above 4 MB will incur a large I/O hit for larger files. It is not recommended that you set this value above 4 MB.

It should be noted that you CAN disable shredded storage. This is accomplished NOT by changing the FileWriteChunkSize, but by the *FILTERED OUT*

This operation is not something you should proceed with unless you are doing it in a test farm. If you have installed SharePoint 2013, let users upload information and files, and then subsequently disable Shredded Storage, NONE of the files will be modifiable again unless you delete the file and add it back OR re-enable shredded storage.

Shredded Storage Testing Framework

As part of this whitepaper we have published a complete
Shredded Storage testing framework (including the source code) that will allow
you to build your own tests and to validate our tests to confirm our
results.  The following section is
designed to walk you through how to use the tool and analyze the results.

Installing the Tool

Follow these steps to install the tool:

  • Download the tool from http://shreddedstorage.codeplex.com
  • Run the db script or restore the test database to SQL Server
  • Install SQL Server 2008 R2 or later with the SQL Profiler tools
  • Install the Office SDK 2.0 or higher
  • Install Fiddler
  • Copy SPProfiler.exe, SPProfiler.exe.config and MonitorSharePoint.tdf to the C:\temp\system32 directory

How the tool works

First and foremost: the tool IS NOT MULTITHREADED. Please take this into consideration when modifying the code, as it is not possible to modify it to support multi-threaded calls with the various logging components used.

Fiddler integration – You can wrap any call in the StartProxy method to record HTTP requests and responses and see what kind of traffic is being generated. These tests help analyze the difference between older non-Cobalt calls and Cobalt calls.

SQL Profiler integration – You can start the SQL Profiler tool at any time by calling StartTrace().

Time monitoring – You can record the time it takes for any method call by calling StartTimer().

TestResult class – All results should be saved to a TestResult object. Once the object is populated, you can simply save it to the database with the SaveTestResult() method.

Generating content – There are several methods that support the dynamic generation of Word and PowerPoint files. You can use these methods to create a directory of a specific size, then call UploadDirectory (described below) to test the upload of the files.
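
Pulling those helpers together, a test method built on the tool might look roughly like the sketch below. The names (StartProxy, StartTrace, StartTimer, UploadDirectory, TestResult, SaveTestResult) are the ones described above, but their exact signatures are assumptions on my part, so treat this as pseudocode and check the CodePlex source before copying it.

// Rough sketch only: helper names come from the descriptions above, signatures are guesses.
void RunUploadScenario()
{
    StartProxy();    // begin capturing HTTP traffic through Fiddler
    StartTrace();    // begin the SQL Profiler trace
    StartTimer();    // begin timing the operation

    // Upload a locally generated directory of test files (paths are hypothetical).
    UploadDirectory(@"C:\temp\testfiles", "http://intranet.contoso.com/Shared Documents");

    // Populate a TestResult with the timings, trace data and notes, then persist it.
    TestResult result = new TestResult();
    SaveTestResult(result);
}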

Reading Results

The following SQL queries, provided with the tool, will help you analyze your data:

  • TBD

Performance Tests

The most basic tests revolve around uploading and downloading files. This can be easily tested by creating sets of files of varying sizes and uploading and downloading them several times. That is but one piece of the testing that is needed for shredded storage.

The following list outlines some of the methods you can call to run various tests in your environment. You can build your own calls using PowerShell or by modifying the tool's code.

 

Method Name: RunDynamicUploadTest
Shredded Storage on: Yes
What does it test: Uploads a file starting at a target file size, then halves the FileWriteChunkSize as many times as you tell it.
What should you see: The number of shreds increases as the FileWriteChunkSize decreases.

Method Name: RunDynamicDownloadTest
Shredded Storage on: Yes
What does it test: Downloads a file starting at a target FileReadChunkSize and halves the value as many times as you tell it.
What should you see: The number of read stored procedure calls increases and the file takes longer to download.

Method Name: RunImageUploadTest
Shredded Storage on: Yes
What does it test: Generates a Word document with images and then uploads it.
What should you see: The entire file is shredded; the image is not separated out into its own shred.

Method Name: RunOfficeDownload
Shredded Storage on: Yes/No
What does it test: Uses the Office client to open a file stored in SharePoint.
What should you see: The amount of data in the HTTP calls is almost exactly the same size as the file itself.

Method Name: UploadDirectory
Shredded Storage on: Yes/No
What does it test: Uploads a local directory to the SharePoint server.
What should you see: Use this to test a batch of file uploads for comparison between SP2010 and SP2013. It is also a good way to test various file sizes against FileWriteChunkSize settings.

Method Name: RunOWATest
Shredded Storage on: Yes
What does it test: Tests the HTTP calls between the client and the OWA server; you also see the SQL I/O on the backend.
What should you see: This method is not fully implemented, as the OneNote.ashx protocol is VERY difficult to reverse engineer.

Method Name: RunShreddedStorageDisableTest
Shredded Storage on: Both
What does it test: Demonstrates that it is very bad to disable shredded storage using the ServerDebugFlags.
What should you see: Any subsequent uploads of files will break after disabling shredded storage.

Method Name: AnalyzeFarm
Shredded Storage on: N/A
What does it test: Analyzes all the files in your farm and gives you stats on the files contained in all your content databases.
What should you see: You will need these stats to pick the proper setting for your FileReadChunkSize.

Method Name: TestOpenLargeFileFromOffice
Shredded Storage on: Yes
What does it test: Analyzes the effect the WCF 4MB limit has on the Cobalt layer in SharePoint.
What should you see: The SQL I/O takes a HUGE hit when the FileWriteChunkSize is set above 4MB. However, for smaller files the actual amount of time it takes for the client to render is not affected.

Method Name: RunMaxWriteSettingTest
Shredded Storage on: Yes
What does it test: Sets the FileWriteChunkSize to its maximum setting of 2GB and compares how the system runs against the default setting of 64KB.
What should you see: All the files are single sets of shreds versus multiple shreds with the lower value.

Method Name: TBD – modify the WCF settings and run a large file test

Method Name: TBD – enable deduplication and monitor file sizes

Method Name: TBD – enable RBS and monitor upload speeds

Method Name: TBD – enable RBS and monitor download speeds

Method Name: RunManualTest
Shredded Storage on: Yes/No
What does it test: Starts recording any of the actions you perform and then saves them to the database when you are finished.
What should you see: This will assist with any custom testing you want to do without writing any code.

Method Name: RunModifiedPropertyVersionTest
Shredded Storage on: Yes
What does it test: Looks at what happens when you simply change a property like "Title" and don't change the file bytes.
What should you see: The entire file gets written back when a single shred exists. This is UNEXPECTED behavior.

Method Name: RunModifiedVersionTest
Shredded Storage on: Yes
What does it test: Generates a Word file, uploads the file, then modifies it by adding text to the end and uploads it again.
What should you see: This is the best case scenario for shredded storage. The count and total size of the shreds don't change drastically, as they do in the worst case scenario test.

Method Name: RunChangingVersionTest
Shredded Storage on: Yes
What does it test: Generates a file, uploads the file, then generates a completely new file and uploads it with the same name.
What should you see: This is a worst case scenario test. The total number of shreds increases in the database with each version uploaded. You will also see the overhead that shredded storage adds to the base file. By the time you get to version 10, you should have 10x the number of shreds and size.

 

Comparison to SharePoint 2010

TBD

Death of an MCM

As many of you are well aware, Microsoft Learning (which has nothing to do with the product team or the larger corporate Microsoft components, so when you tweet about this, tweet @MSLearning instead of @microsoft) has decided to end the Masters programs (MCM, MCA, etc).  Several posts explain what's going on:

  • http://www.theregister.co.uk/2013/08/31/microsoft_cans_three_pinnacle_certifications_sparking_user_fury/
  • https://connect.microsoft.com/SQLServer/feedback/details/799431/please-dont-get-rid-of-the-mcm-and-mca-programs
  • http://www.zdnet.com/microsoft-abruptly-pulls-masters-certification-hints-a-replacement-may-come-7000020093/

As a top-selling author for the Microsoft Learning (MSL) courseware library and a longtime MCT, I have always been privy to several details about the underpinnings of MSL.  I remember the day that they approached me about the MCM program (well before it was made public).  They asked me if a "Masters" program would interest me.  I told them no, as I had already built many, many days of training for every nook and cranny of SharePoint and I didn't see any value in paying a bunch of money to learn stuff I already knew.  I also had several large projects under my belt and my name was already well known, so again, for myself, there was no value add.  However, that being said, there were many people in the world that wanted to be known for their experience and get some kind of recognition for the knowledge and depth they had in SharePoint, yet they were never ones to use twitter, facebook or any other means (like speaking at conferences) to let the world know.  These were also individuals that work on top secret military projects and can't even talk about the work they do even if they wanted to.  These people were also those that would never contribute on the forums or in the community in any other way, so MVP was also not an option for them.  And hence, the market for the MCM *was* and *is* there.

The program started to much fanfare. MSL put a lot of time and money into the program to make sure the content was top of the line (although it did have mistakes in it every now and then).  That time and money is really what I want to blog about.  For those of you with your MBA, running a business is relatively tough for most.  The final equation is P = R – C (Profit = Revenue – Costs).  When you look at the cost of building a regular Microsoft Official Curriculum (MOC) course, the cost is well into the $1M range.  Why?  Because you have to pay for the content (powerpoints, student and instructor manuals, labs, VM images) to be created by someone.  Not only that, but you have to pay for someone to tech edit it, check the grammar, translate it to the top 5 languages, then transform it into the final output(s), manage the program deployment to partners and support it.  This is for a 5-day course that typically is pretty average in terms of content and, as most MCTs will tell you, not very exciting (although I have tried hard over the past few years to get the quality up on the MOC side…just ask Dan Holme and Chris Beckett).  So what does a typical average MOC course bring in?  Well, considering it is run all over the world in 1000s of centers every week at $40+/day/student, I think you can do the math.  Over the lifetime of a MOC course, it will bring in several millions.

So, given that a simple 5-day MOC course costs $1M to create and brings in several multiples above that in revenue, how much do you think a Masters course takes to create, and what revenue does it generate?  I don't know the exact costs, but I'm sure it is several multiples above a basic MOC course.  Why does it cost so much, and can you even measure the costs?  Let's try:

  • Content creation – most of the content must be pulled from product team members.  This is stuff that doesn't have MSDN documentation and takes the time and resources of internal staff to pull together.  A lot of it is built before MSDN documentation is even available.  Let's say this is one person per module, with a month to build each module.  Billing at $200/hr, 160 hrs is $32K per module, which across all the modules lands somewhere around $480K, but then think of the time and effort of the product team to support the content generation, immeasurable in terms of productivity loss and $$$.
  • Instructors – yeah, someone has to pay for Spence, et al. to fly over the pond all the time to teach those classes.  As you can imagine, flying in experts to teach the classes costs flights, hotels, and their regular rate (remember, these are experts and their billable rate is well over $200/hr).  Let's say each person is a $3000 flight, a $1500 hotel, and $10,000 for the week ($14,500), with at least 5 instructors each rotation.
  • Program management – the hidden costs of actually managing the program.  The MSL staff that registers you, keeps track of your progress, assigns you the cert, etc., etc.  Two people to manage it?  Each getting paid $100k/yr?  Then there are the legal aspects of the program; what do those cost?
  • Lab creation and management – the labs have to be built, hosted, managed…that also costs money
  • Translation – zero…they aren't translated

 Let's look at the cost (only the money you pay to MSL) for you to take the course(s):

  • $16,000 ($8000 for MCTs) 

Hmm, how many people take the program each rotation…up to 20.  Rotations per year…3-4?  How many multiples lower in revenue is that compared to a MOC course? Do the enrollments even pay for the program???  I'm thinking maybe break-even after the first 2-3 rotations.  So then, what would the cost need to be to pay for the program or make it as profitable as the MOC courses?  I'm guessing at least twice as much.  So if they up the cost to $32K to keep the program alive…are you going to take it?  What else could you do with $32K?  Uhh, maybe get your MBA so you can analyze a program that is losing money, or whose profit-to-effort ratio is lower than everything else you do?  Yeah, I'm thinking you'll go for the MBA or that CS masters/PhD degree over a program that you could easily do without, if you just work hard and study without being spoon-fed it.

So ok, you increase the price.  Now what?  Scale.  There are very few people that can teach the curriculum and answer the tough questions that are asked in the presentations.  I know my blog posts have come up several times in MCM rotations about lots of topics, and the answers require tact and research and a strong connection to the product team (props to the people that teach it right now; everyone that teaches it is someone whose opinion I respect).  So to scale means we need to ensure that people can teach the modules and answer these questions.  One answer: let's regurgitate the MCMs and get them to teach…ok…but the main reason they took it was to do projects and make lots of money (or their employer sent them to get the cert so the employer would make lots of money).  MCM teaching doesn't pay that well; it is just an instructor gig and not as cool as doing real-world top-secret projects. And teaching is oh so very different than consulting.  There are certifications for teaching because it is tough to do, and then throw in students who are taking a program because they want to be know-it-alls…yeah…that is hard.  So, little to no scale.  Hmm…as a business owner and executive, I'm saying "cut it".

Now multiply all that effort times all the programs they have an MCM for…yikes!  That is a massive set of programs.  I can, and other people that own their own businesses can, see how a not-so-profitable adventure gets reallocated based on budget and resources to more profitable ones.

Now…that all being said.  The best conversations I have ever had have been with MCMs (Miguel, Chris, Shannon, etc).  I'm highly technical.  My pet peeve is people that think they know what they are talking about and have no clue.  I can point out several, but let's skip that.  You have no idea how awesome it is to be able to sit down with someone, explain to them what you just went through, and have them understand every aspect, appreciate it, and save it for later.  And similarly, for them to be able to tell me stories about this and that which are just highly technical and worth my time.  I will miss having the technical filter of the MCM certification to keep me from having a convo with someone who won't even know what I'm talking about.  And no, MVP is an award, not a certification; it is not a validation of your technical prowess, of being able to solve any problem in SharePoint, or of being able to stand in front of a customer and defend your architecture.  The MCM is a validation that you have been beaten up, scratched, kicked, thrown into the lion's pit, and emerged a king (or queen) ready for battle in the field against anything thrown at you.

In summary, although the lower levels in MSL love and are committed to the program, when you get to the higher levels like director, VP, and Sr. VP, it is always about money and budget.  If you don't get the spin of my blog post's title, then you probably haven't read Death of a Salesman; here are some of the better quotes about the illusion of the MCM program in terms of business and success:

http://www.shmoop.com/death-of-a-salesman/dreams-hopes-plans-quotes.html

So, rest in peace MCM, you were loved by many, hated by few. May you be resurrected again in another life.

With sadness,
Chris

Office Web Apps 2013 OneNote Site Notebook 500 error

I ran into this the other day.  I thought something might be broken on the OWA servers, but after reviewing them, I didn't find anything wrong with OWA.

Amazingly, what I found was that the site URL had been changed while the link to the site notebook stayed the same!  Clicking the link causes OWA to attempt to open the site notebook, and it fails miserably.  You will need to update your navigation node to point to your new site URL.

Chris

Programmatically working with Managed Navigation

There are some pretty sweet posts on how to do this here:

The biggest piece of it is that everything is done via the property bags of the term set and terms.  You just need to know what the property names are, which include (a hedged sketch follows the list):

  • _Sys_Nav_SimpleLinkUrl
  • _Sys_Nav_TargetUrl
  • _Sys_Nav_IsNavigationTermSet
  • _Sys_Facet_IsFactedTermSet
  • _Sys_Nav_AttachedWeb_SiteId
  • _Sys_Nav_AttachedWeb_WebId
  • _Sys_Nav_AttachedWeb_OriginalUrl
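
As a hedged sketch of what that looks like with the server-side taxonomy API (the term store name, term set GUID, term label and target URL below are all placeholders), you write the property names straight into the term set and term property bags and commit:

using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Taxonomy;

class TagNavigationTermSet
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://intranet.contoso.com"))
        {
            TaxonomySession session = new TaxonomySession(site);
            TermStore store = session.TermStores["Managed Metadata Service"];   // placeholder name
            TermSet termSet = store.GetTermSet(new Guid("11111111-2222-3333-4444-555555555555"));   // placeholder id

            // Mark the term set as a navigation term set.
            termSet.SetCustomProperty("_Sys_Nav_IsNavigationTermSet", "True");

            // Point one of its terms at a fixed URL instead of a friendly URL.
            Term term = termSet.GetTerms("Products", false)[0];   // placeholder term label
            term.SetLocalCustomProperty("_Sys_Nav_TargetUrl", "/sites/catalog/products.aspx");

            store.CommitAll();
        }
    }
}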

Chris

Programmatically working with Device Channels

There are several questions about how to create and assign device channels programmatically.  I just attempted to do the same and found the following:

  • You can create a device channel using code or Windows PowerShell, as they are stored in a list
  • Creating a master page to device channel mapping is not technically available via the object model or Windows PowerShell (at least via the APIs that they have provided you)

To create a device channel, you can do this:

using (SPSite site = new SPSite("http://intranet.contoso.com"))
using (SPWeb web = site.RootWeb)
{
    SPList list = web.Lists.TryGetList("Device Channels");

    SPListItem li = list.AddItem();
    li["Name"] = "Windows Phone";
    li["ContentType"] = "Device Channel";
    li["Active"] = true;
    // alias can contain no spaces
    li["Alias"] = "WindowsPhone";
    li["Description"] = "The windows phone mobile channel";
    li["Device Inclusion Rules"] = "Windows Phone";
    li.Update();
}

If you look at how the master page settings page is laid out, it shows you all the device channels and any master pages that are tied to them.  When you look at the code, you will find that the settings are converted into a MasterPageMappingsFile object (in the Microsoft.SharePoint.Publishing.Mobile namespace). It inherits from a base class called MappingsFile<T>, both of which are marked as internal, so you cannot use them.  When you review how it builds the list of mappings, it does so using a file called __DeviceChannelMappings.aspx that is stored at "/_catalogs/masterpage/__DeviceChannelMappings.aspx".  It looks like this:

<%@ Reference VirtualPath="~CustomMasterUrlForMapping0" %><%@ Reference VirtualPath="~CustomMasterUrlForMapping1" %><%@ Page Language="C#" Inherits="Microsoft.SharePoint.Publishing.Internal.WebControls.MappingsFileBasePage" %><html xmlns:mso="urn:schemas-microsoft-com:office:office" xmlns:msdt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"><%@ Register Tagprefix="SharePoint" Namespace="Microsoft.SharePoint.WebControls" Assembly="Microsoft.SharePoint, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c" %>
<head>
<meta name="WebPartPageExpansion" content="full" />
<!--[if gte mso 9]>
<SharePoint:CTFieldRefs runat=server Prefix="mso:" FieldList="FileLeafRef"><xml>
 
<mso:CustomDocumentProperties>
<mso:ContentTypeId msdt:dt="string">0x010100FDA260FD09A244B183A666F2AE2475A6</mso:ContentTypeId>
</mso:CustomDocumentProperties>
</xml></SharePoint:CTFieldRefs><![endif]-->
</head><body><mappings>
  <mapping>
    <channelAlias>WindowsPhone</channelAlias>
    <masterUrl href="/_catalogs/masterpage/windowsphone.intranet.master" token="~sitecollection/_catalogs/masterpage/windowsphone.intranet.master" />
  </mapping>
  <defaultChannelMapping>
    <siteMasterUrl token="~sitecollection/_catalogs/masterpage/seattle.master" href="/_catalogs/masterpage/seattle.master" />
    <systemMasterUrl token="~sitecollection/_catalogs/masterpage/seattle.master" href="/_catalogs/masterpage/seattle.master" />
    <alternateCssUrl token="" href="" />
    <themedCssFolderUrl token="" href="" isthemeshared="false" />
  </defaultChannelMapping>
</mappings></body></html>

Now that you know where the values are stored, you can programmatically modify the file by downloading it, changing it, and uploading it again.  It should be noted that the file format may change in the future, which is most likely why they have locked it down from an object model standpoint.
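
Here is a hedged sketch of that round trip. Because __DeviceChannelMappings.aspx is an .aspx page wrapping the <mappings> XML, a targeted string replacement is used rather than an XML parser; the master page names are placeholders, and since the format may change this is strictly at-your-own-risk tinkering.

using System;
using System.Text;
using Microsoft.SharePoint;

class UpdateDeviceChannelMappings
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://intranet.contoso.com"))
        {
            SPWeb web = site.RootWeb;
            SPFile mappingsFile = web.GetFile("/_catalogs/masterpage/__DeviceChannelMappings.aspx");

            // Download the page content as text.
            string content = Encoding.UTF8.GetString(mappingsFile.OpenBinary());

            // Swap the master page assigned to the WindowsPhone channel (placeholder names).
            string updated = content.Replace("windowsphone.intranet.master", "windowsphone.v2.master");

            // Upload the modified content back to the master page gallery.
            mappingsFile.SaveBinary(Encoding.UTF8.GetBytes(updated));
        }
    }
}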

Enjoy!
Chris

Extending the Ceres Engine with custom flows and operators

So what the heck does that title mean?  Well, for those of you that are not familiar with Search (which is a majority of you out there), the actual engine is called "Ceres", as in the dwarf planet in our solar system (Wikipedia).  Keeping with the universe theme, there is also a constellation of nodes in the search engine that make up the universe of bodies in the engine.  If you take a minute, you will find several references to Constellation in the various classes inside the assemblies.  But enough about the universe; what about extending the Ceres engine?

When it comes to search, many of you are already familiar with the various node types that make up the system.  These include:

  • Admin
  • Content Processing
  • Query
  • Indexing
  • Analytics

But that's the easy part, and so are the architecture design aspects!  This post will take you into a rabbit hole that you may never come out of!  For the purposes of this post, we are interested in the Content Processing component AND the Query component.  If you dive into the core of the Content Processing component, you will find that it is made up of a series of flows.  You can find the registered flows in the "C:\Program Files\Microsoft Office Servers\15.0\Search\Resources\Bundles" directory; I will describe what these dlls are and how they get generated later in the post.  Here is the full list (in the future I will update this post with the purpose of each of these flows):

  • Microsoft.ContentAlignmentFlow
  • Microsoft.CustomDictionaryDeployment
  • Microsoft.ThesaurusDeployment
  • Microsoft.CXDDeploymentCaseInSensitive
  • Microsoft.CXDDeploymentCaseSensitive
  • Microsoft.PeopleAnalyticsOutputFlow
  • Microsoft.PeopleAnalyticsFeederFlow
  • Microsoft.ProductivitySearchFlow
  • Microsoft.SearchAnalyticsFeederFlow
  • Microsoft.SearchAnalyticsInputFlow
  • Microsoft.SearchAnalyticsOutputFlow
  • Microsoft.SearchAuthorityInputFlow
  • Microsoft.SearchClicksAnalysisInputFlow
  • Microsoft.SearchDemotedInputFlow
  • Microsoft.SearchReportsAnalysisInputFlow
  • Microsoft.UsageAnalyticsFeederFlow
  • Microsoft.UsageAnalyticsReportingAPIDumperFlow
  • Microsoft.UsageAnalyticsUpdateFlow
  • Microsoft.CrawlerFlow
  • Microsoft.CrawlerAcronymExtractionSubFlow
  • Microsoft.CrawlerAlertsDataGenerationSubFlow
  • Microsoft.CrawlerAliasNormalizationSubFlow
  • Microsoft.CrawlerComputeFileTypeSubFlow
  • Microsoft.CrawlerCCAMetadataGenerationSubFlow
  • Microsoft.CrawlerContentEnrichmentSubFlow
  • Microsoft.CrawlerDefinitionClassificationSubFlow
  • Microsoft.CrawlerDocumentSignatureGenerationSubFlow
  • Microsoft.CrawlerDocumentSummaryGenerationSubFlow
  • Microsoft.CrawlerHowToClassificationSubFlow
  • Microsoft.CrawlerLanguageDetectorSubFlow
  • Microsoft.CrawlerLinkDeleteSubFlow
  • Microsoft.CrawlerNoIndexSubFlow
  • Microsoft.CrawlerPhoneNumberNormalizationSubFlow
  • Microsoft.CrawlerSearchAnalyticsSubFlow
  • Microsoft.CrawlerTermExtractorSubFlow
  • Microsoft.CrawlerWordBreakerSubFlow
  • Microsoft.SharePointSearchProviderFlow
  • Microsoft.PeopleExpertiseSubFlow
  • Microsoft.PeopleFuzzyNameMatchingSubFlow
  • Microsoft.PeopleKeywordParsingSubFlow
  • Microsoft.PeopleLinguisticsSubFlow
  • Microsoft.PeopleResultRetrievalAndProcessingSubFlow
  • Microsoft.PeopleSearchFlow
  • Microsoft.PeopleSecuritySubFlow
  • Microsoft.OpenSearchProviderFlow
  • Microsoft.ExchangeSearchProviderFlow
  • Microsoft.DocParsingSubFlow
  • Microsoft.MetadataExtractorSubFlow
  • Microsoft.AcronymDefinitionProviderFlow
  • Microsoft.BestBetProviderFlow
  • Microsoft.QueryClassificationDictionaryCompilationFlow
  • Microsoft.RemoteSharepointFlow
  • Microsoft.PersonalFavoritesProviderFlow
  • Microsoft.QueryRuleConditionMatchingSubFlow
  • Microsoft.CrawlerDocumentRetrievalSubFlow
  • Microsoft.CrawlerIndexingSubFlow
  • Microsoft.CrawlerPropertyMappingSubFlow
  • Microsoft.CrawlerSecurityInsertSubFlow
  • Microsoft.OOTBEntityExtractionSubFlow
  • Microsoft.CustomEntityExtractionSubFlow

The most important flow is Microsoft.CrawlerFlow.  This is the master flow and it defines the order in which all the other flows are executed.  A flow is simply an XML document that defines the flows and operators that should be executed against an item processed by the engine.  The XML makes up an OperatorGraph.  Each Operator has a name and a type attribute.  The type attribute is made up of the namespace where the class that contains the code for the operator lives, plus the name given in a special attribute added to the class.  Each operator is deserialized into an instance of that class as the flow is "parsed".  As you review the XML, you will see that the path through the flow is determined by the "operatorMoniker" elements, which name the next operator to be executed.  The first part of this file looks like the following:

 

<?xml version="1.0" encoding="utf-8"?>
<OperatorGraph dslVersion="1.0.0.0" name="" xmlns="http://schemas.microsoft.com/ceres/studio/2009/10/flow">
  <Operators>

    <Operator name="FlowInput" type="Microsoft.Ceres.Evaluation.Operators.Core.Input">
      <Targets>
        <Target breakpointEnabled="false">
          <operatorMoniker name="//Init" />
        </Target>
      </Targets>
      <Properties>
        <Property name="inputName" value="&quot;CSS&quot;" />
        <Property name="useDisk" value="False" />
        <Property name="sortedPrefix" value="0" />
        <Property name="updatePerfomanceCounters" value="True" />
      </Properties>
      <OutputSchema>
        <Field name="content" type="Bucket" />
        <Field name="id" type="String" />
        <Field name="source" type="String" />
        <Field name="data" type="Blob" />
        <Field name="getpath" type="String" />
        <Field name="encoding" type="String" />
        <Field name="collection" type="String" />
        <Field name="operationCode" type="String" />
      </OutputSchema>
    </Operator>

    <Operator name="Init" type="Microsoft.Ceres.ContentEngine.Operators.BuiltIn.Mapper">
      <Targets>
        <Target breakpointEnabled="false">
          <operatorMoniker name="//Operation Router" />
        </Target>
      </Targets>
      <Properties>
        <Property name="expressions" value="{&quot;externalId&quot;=&quot;ToInt64(Substring(id, 7))&quot;}"/>
        <Property name="fieldsToRemove" />
        <Property name="adaptableType" value="True" />
      </Properties>
      <OutputSchema>
        <Field name="tenantId" type="Guid" canBeNull="true" expression="IfThenElse(BucketHasField(content, &quot;012357BD-1113-171D-1F25-292BB0B0B0B0:#104&quot;), ToGuidFromObject(GetFieldFromBucket(content, &quot;012357BD-1113-171D-1F25-292BB0B0B0B0:#104&quot;)), ToGuid(&quot;0C37852B-34D0-418E-91C6-2AC25AF4BE5B&quot;))" />
        <Field name="isdir" type="Boolean" canBeNull="true" expression="NullValue(ToBoolean(GetFieldFromBucket(content, &quot;isdirectory&quot;)),false)" />
        <Field name="noindex" type="Boolean" canBeNull="true" expression="NullValue(ToBoolean(GetFieldFromBucket(content, &quot;noindex&quot;)),false)" />
        <Field name="oldnoindex" type="Boolean" canBeNull="true" expression="NullValue(ToBoolean(GetFieldFromBucket(content, &quot;oldnoindex&quot;)),false)" />
        <Field name="getpath" type="String" expression="IfThenElse(BucketHasField(content, &quot;path_1&quot;), GetStringFromBucket(content, &quot;path_1&quot;), GetStringFromBucket(content, &quot;path&quot;))" />
        <Field name="extrapath" type="String" expression="GetStringFromBucket(content, &quot;path_1&quot;)" />
        <Field name="size" type="Int32" expression="TryToInt32(GetFieldFromBucket(content, &quot;size&quot;))" />
        <Field name="docaclms" type="Blob" expression="GetFieldFromBucket(content, &quot;docaclms&quot;)" />
        <Field name="docaclsp" type="Blob" expression="GetFieldFromBucket(content, &quot;spacl&quot;)" />
        <Field name="docaclmeta" type="String" expression="IfThenElse(BucketHasField(content, &quot;2EDEBA9A-0FA8-4020-8A8B-30C3CDF34CCD:docaclmeta&quot;), GetStringFromBucket(content, &quot;2EDEBA9A-0FA8-4020-8A8B-30C3CDF34CCD:docaclmeta&quot;), GetStringFromBucket(content, &quot;docaclmeta&quot;))" />
        <Field name="docaclgrantaccesstoall" type="Boolean" canBeNull="true" expression="NullValue(ToBoolean(GetFieldFromBucket(content, &quot;grantaccesstoall&quot;)),false)" />
        <Field name="externalId" type="Int64" expression="&quot;ToInt64(Substring(id, 7))&quot;" />
        <Field name="sitecollectionid" type="Guid" canBeNull="true" expression="ToGuid(GetStringFromBucket(content, &quot;00130329-0000-0130-C000-000000131346:ows_SiteID&quot;))" />
        <Field name="fallbackLanguage" type="String" expression="&quot;en&quot;" />
        <Field name="Path" type="String" expression="GetStringFromBucket(content, &quot;49691C90-7E17-101A-A91C-08002B2ECDA9:#9&quot;)"/>
        <Field name="SiteID" type="String" expression="GetStringFromBucket(content, &quot;00130329-0000-0130-C000-000000131346:ows_SiteID&quot;)" />
  <Field name="ContentSourceID" type="Int64" canBeNull="true" expression="IfThenElse(BucketHasField(content, &quot;012357BD-1113-171D-1F25-292BB0B0B0B0:#662&quot;), ToInt64(GetFieldFromBucket(content, &quot;012357BD-1113-171D-1F25-292BB0B0B0B0:#662&quot;)), ToInt64(-1))" />
  <Field name="Attachments" type="List&lt;Stream&gt;" canbenull="true" expression="GetFieldFromBucket(content, &quot;attachments&quot;)"/>
  <Field name="FileExtension" type="String" expression="IfThenElse(BucketHasField(content, &quot;0B63E343-9CCC-11D0-BCDB-00805FCCCE04:FileExtension&quot;), GetStringFromBucket(content, &quot;0B63E343-9CCC-11D0-BCDB-00805FCCCE04:FileExtension&quot;), &quot;&quot;)" />
      </OutputSchema>
    </Operator>

 

As you can see the first operator that is executed is of the type "Microsoft.Ceres.Evaluation.Operators.Core.Input".  This means that if you look in the "Microsoft.Ceres.Evaluation.Operators" namespace, you will find a class that is decorated like the following:

 

[Serializable, Operator("Input", MinInputCount=0, MaxInputCount=0)]
public class InputOperator :
    TypedOperatorBase<InputOperator>, IMemoryUsingOperator, IOutputTypeConfigurableOperator
{
    // ...
}

 

You should note that the class is marked as serializable and that the Operator attribute has been added with the name "Input".  Again, the combination of the namespace of the class and the name from the attribute is used to find the operator when the flow is executed.

One of the flows that I am most interested in is Microsoft.CrawlerContentEnrichmentSubFlow.  As some of you are aware, you can "extend" it (I really don't like that word used in the context of Content Enrichment now that I know how to do flow insertion) using a web service to add your own logic and create new crawled properties on items that pass through the engine.  You can find more information about content enrichment and examples of using it at http://msdn.microsoft.com/en-us/library/jj163968.aspx.  Now, Microsoft is going to tell you that this is the only supported way to extend the Ceres engine.  And that is correct.  What I am about to show you has never been done outside of Microsoft, and if you venture down this path, you do so on your own.  Anyway, the problem with the Content Enrichment Service (CES) is that it is not flexible and it uses old technology: web services.  That means it is sending big ugly XML around on the wire…not JSON.  Bummer.  That's not the only thing.  When you look at the pipeline and all the things you are indexing, if you do not put a trigger on the CES, EVERY single item will be passed to your service.  You would then need all kinds of logic to determine the type of the item and what properties exist on it by looping through them all, and so many other weird, bad things that it just makes me cringe.  Now, if you do put a trigger on it, you are limiting yourself to implementing a very targeted set of logic.  You have no ability to add more than one CES with different triggers and different logic. Huh?  Big feature gap here.  I'm not a fan.  So for people that just don't want to multiply their crawl times by 100 to 1000x because they implemented CES, you need a better option.  A faster option.  A more reliable and performant option.  One that lives in the engine, not outside of it.  If you want to know how to do this…keep reading!

Ok, so this is all simple so far.  But how do you add a new flow to the Ceres engine and then implement your own operators?  Well, this is much more difficult than you might think!

The first step is to create an operator class that inherits from TypedOperatorBase<T>, where T is the operator class itself.  This is an abstract class, and you must implement the method called ValidateAndType.  You can see most of this in the operator example above.  The next step is to add the Serializable and Operator attributes to the class.  Ok, fair enough, now what do we do?  If you look at the XML of an operator, you will see that you can specify properties and that those properties are simply deserialized into the properties of the class.  Ok, so add some properties.  In my example, I create a class with one property:

 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres.Evaluation;
using Microsoft.Ceres.Evaluation.Operators;
using Microsoft.Ceres.Evaluation.Operators.PlugIns;

namespace CustomOperator
{
    [Serializable, Operator("CustomOperator")]
    public class CustomOperator : TypedOperatorBase<CustomOperator>
    {
        private string custom = "";

        public CustomOperator()
        {
            this.custom = "Chris Givens was here";
        }

        [Property(Name="custom")]
        public string Custom
        {
            get { return custom; }
            set { custom = value; }
        }

        protected override void ValidateAndType(OperatorStatus status, IList<Microsoft.Ceres.Evaluation.Operators.Graphs.Edge> inputEdges)
        {
            status.SetSingleOutput(base.SingleInput.RecordSetType);           
        }
    }
}

 

Ok, great.  So now what do we do?  Well, I wasn't sure if the system would just pick up the assembly from the GAC dynamically so I figured, let's just deploy the solution and try to add a flow with the operator in it.  Here's how you do that:

Deploy the assembly to the GAC…easy: right-click the project and select "Deploy".

Next, create a new flow (xml file) that uses the operator:

 

<?xml version="1.0" encoding="utf-8" ?>
<OperatorGraph dslVersion="1.0.0.0" name="CustomFlow" xmlns="http://schemas.microsoft.com/ceres/studio/2009/10/flow">
  <Operators>   

    <Operator name="SubFlowInput" type="Microsoft.Ceres.ContentEngine.Operators.BuiltIn.SubFlow.SubFlowInput">
      <Targets>
        <Target breakpointEnabled="false">         
          <operatorMoniker name="//CustomOperator" />
          <!--
          <operatorMoniker name="//SubFlowOutput" />
          -->
        </Target>
      </Targets>
      <Properties>
        <Property name="adaptableType" value="True" />
      </Properties>
    </Operator>
       
    <Operator name="CustomOperator" type="CustomOperator.CustomOperator">                                         
      <Targets>
        <Target breakpointEnabled="false">
          <operatorMoniker name="//SubFlowOutput" />
        </Target>
      </Targets>
      <Properties>
        <Property name="custom" value="2048"/>
      </Properties>
    </Operator>  
   
    <Operator name="SubFlowOutput" type="Microsoft.Ceres.ContentEngine.Operators.BuiltIn.SubFlow.SubFlowOutput" />
 
  </Operators>
</OperatorGraph>

 

Connect to the ceres engine and try to deploy the flow:

 

Add-PsSnapin Microsoft.SharePoint.Powershell
& "C:Program FilesMicrosoft Office Servers15.0SearchScriptsceresshell.ps1"
Connect-System -Uri (Get-SPEnterpriseSearchServiceApplication).SystemManagerLocations[0] -ServiceIdentity contososp_farm
Connect-Engine -NodeTypes InterActionEngine
$flowname = "CustomFlow"
Remove-Flow $flowname
Get-Content C:\CustomOperator\CustomOperator\$flowname.xml | Out-String | Add-Flow $flowname
Stop-Flow -FlowName $flowname -ForceAll

 

You will get an error saying that the system cannot find the operator called CustomOperator.CustomOperator.  Bummer.  So that didn't work.  So how do I "register" my operator with the engine?  Well, it turns out that there is much more to be done than simply creating an operator class.  You also need to create several other classes with special attributes attached to them.  Sooo…here we go!

First off, you will need to create a Producer class.  The producer is really the class that does all the work; the operator is just a way to get some parameters into the producer.  As you can see, the producer inherits from SingleOutputProducer<T>, where T is your operator class.  Here is an example of the producer:

 

 using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres;
using Microsoft.Ceres.Evaluation;
using Microsoft.Ceres.Evaluation.DataModel;
using Microsoft.Ceres.Evaluation.DataModel.Types;
using Microsoft.Ceres.Evaluation.Processing;
using Microsoft.Ceres.Evaluation.Processing.Producers;

namespace CustomOperator
{
    public class CustomProducer : SingleOutputProducer<CustomOperator>
    {
        private CustomOperator op;
        private IRecordSetTypeDescriptor type;
        private IEvaluationContext context;
       
        public CustomProducer(CustomOperator op, IRecordSetTypeDescriptor type, IEvaluationContext context)
        {
            this.op = op;
            this.type = type;
            this.context = context;
        }

        // NOTE: holder is never assigned in this stripped-down example; the only part that
        // matters for the walkthrough is that each record is passed through to the output.
        private IUpdateableRecord holder;
        //private Item holderItem;

        public override void ProcessRecord(IRecord record)
        {
            if (this.holder != null)
            {
                this.holder.UpdateFrom(record);
            }

            base.SetNextRecord(record);
        }
    }
}

 

Next up is to create a NamedPlugInSource.  Operators are also called "PlugIns", and these plug-ins must be registered with the system in order for you to use them.  If you review all the operator assemblies, you will see that there is always some kind of *PlugInSource class whose role is to add plug-ins to the Ceres core system.  For my plug-in source, I only have one operator, and that is my CustomOperator:

 using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres.CoreServices.Services;
using Microsoft.Ceres.CoreServices.Services.Node;

using Microsoft.Ceres.Evaluation.Operators;
using Microsoft.Ceres.Evaluation.Operators.PlugIns;

namespace CustomOperator
{
    [DynamicComponent]
    public class CustomPlugInSource : NamedPlugInSource<OperatorBase>
    {
        public static OperatorBase PlugIn1()
        {
            File.AppendAllText(@"c: empsearch.txt", "PlugIn1");
            return new CustomOperator();
        }

        protected override void AddPlugIns()
        {
            File.AppendAllText(@"c: empsearch.txt", "AddPlugIns");
            Func<OperatorBase> f = PlugIn1;
            base.Add(typeof(CustomOperator),f);
        }
    }
}

Now that you have the plug-in built, you will notice that it has been decorated with the "DynamicComponent" attribute.  This is where the "ah-ha" moment kicks in.  By adding this attribute, Ceres knows that it must start this class as a managed component in the system.  However, simply deploying this to the GAC will not get Ceres to recognize the assembly and load the components.  We'll get to that soon; we still have lots more to talk about!

Next up is an Evaluator.  An evaluator is responsible for actually making the call to the producer.  In my example I create a class that inherits from ProducerEvaluator<T>, where T is my CustomOperator.  ProducerEvaluator is again an abstract class with one method called GetProducer.  You must instantiate your producer here and return it.  There are many types of producers, but I have not had the time to document all of them yet.  Soon though!

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres.CoreServices.Services;
using Microsoft.Ceres.CoreServices.Services.Node;
using Microsoft.Ceres.CoreServices.Services.Container;
using Microsoft.Ceres.Evaluation.Processing;
using Microsoft.Ceres.Evaluation.Processing.Producers;
using Microsoft.Ceres.Evaluation.Operators;

using Microsoft.Ceres.Evaluation.DataModel;
using Microsoft.Ceres.Evaluation.Operators.Graphs;

namespace CustomOperator
{
    public class CustomEvaluator : ProducerEvaluator<CustomOperator>
    {       
        /*
        protected override IRecordSet SetupOutput(Edge outputEdge, IList<IRecordSet> inputs)
        {
            CustomProducer cp = new CustomProducer();                       
            return null ;
        }
         */

        protected override IRecordProducer GetProducer(CustomOperator op, Microsoft.Ceres.Evaluation.DataModel.Types.IRecordSetTypeDescriptor type, IEvaluationContext context)
        {
            return new CustomProducer(op, type, context);
        }
    }
}

Next on the list is an EvaluatorBinder.  The evaluator binder is responsible for registering an operator with an evaluator.  This class inherits from AbstractEvaluatorBinder and needs to implement the AddBoundOperators and BindEvaluator methods:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres.Evaluation.Processing;
using Microsoft.Ceres.Evaluation.Operators;

namespace CustomOperator
{
    public class CustomEvaluatorBinder : AbstractEvaluatorBinder
    {
        protected override void AddBoundOperators()
        {
          base.Add(typeof(CustomOperator));
        }

        public override Evaluator BindEvaluator(OperatorBase op, IEvaluationContext context)
        {
            if (op is CustomOperator)
            {
                return new CustomEvaluator();
            }

            return null;
        }
    }
}

Last on the list is the EvaluatorBinderSource.  Similar to a PlugInSource, this class is also decorated with the DynamicComponent attribute, which gets it instantiated so that it can register the evaluators it exposes.  Here is the binder source:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

using Microsoft.Ceres.CoreServices.Services;
using Microsoft.Ceres.CoreServices.Services.DependencyInjection;
using Microsoft.Ceres.CoreServices.Services.Node;
using Microsoft.Ceres.CoreServices.Services.Container;
using Microsoft.Ceres.Evaluation.Processing;
using Microsoft.Ceres.Evaluation.Operators;

namespace CustomOperator
{
    [DynamicComponent]
    public class CustomEvaluatorBinderSource : AbstractContainerManaged
    {
        [Exposed]
        public IEvaluatorBinder CieEvaluatorBinder
        {
            get
            {
                this.exampleBinder = new CustomEvaluatorBinder();
                return this.exampleBinder;
            }
        }

        private CustomEvaluatorBinder exampleBinder;
    }
}

You now have everything you need to add a new flow and operator to the Ceres engine!  Kinda.  If you deploy the code at this point and run the install script above, you will still get the same error!  This is because the assemblies only get loaded when you restart the Host Controller service.  NOTE:  You can read more about the Host Controller service in the MSPress book on SharePoint that Randy Williams and I have coming out very soon.  Ok, so restart the service.  Try the commands…NO GO…bummer.  But I did everything you said, Chris!  Why doesn't it recognize my operator?  Well, going back to my previous statement, Ceres nodes don't look at the entire GAC and analyze every class; that would be WAAAY too expensive.  A node only loads the assemblies it is told to load.  This was the final magic step that I stumbled upon very luckily.
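
By the way, you will be restarting the Search Host Controller a lot while iterating on this.  A tiny helper like the sketch below can save some clicking; the service name "SPSearchHostController" is an assumption you should verify on your farm first (net stop/start or services.msc obviously work just as well):

using System;
using System.ServiceProcess;   // add a reference to System.ServiceProcess.dll

class RestartHostController
{
    static void Main()
    {
        // Assumed service name for the SharePoint Search Host Controller; check Get-Service first.
        using (ServiceController sc = new ServiceController("SPSearchHostController"))
        {
            sc.Stop();
            sc.WaitForStatus(ServiceControllerStatus.Stopped, TimeSpan.FromMinutes(2));
            sc.Start();
            sc.WaitForStatus(ServiceControllerStatus.Running, TimeSpan.FromMinutes(5));
            Console.WriteLine("Search Host Controller restarted; dynamic assemblies will be re-read.");
        }
    }
}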

For each node that is started (via the NodeRunner.exe process), the node is fed its own configuration file that drives the WCF configuration.  This file is stored in C:\Program Files\Microsoft Office Servers\15.0\Search\Runtime\1.0\noderunner.exe.config.  It is a very generic file; not much going on here.  As part of the NodeController code, it will look for another file and feed some special values into the process in addition to the regular app.config file.  These files are stored in the Ceres node directory, which is C:\Program Files\Microsoft Office Servers\15.0\Data\Office Server\Applications\Search\Nodes\<RandomNodeID>.  Each role that has been assigned to the server will get a directory under this path.  Since most of what we are doing is related to the ContentProcessingComponent, let's look there first.  If you open and explore this directory, what you will find is a nodeprofile.xml file.  It looks like this…tell me if you notice anything interesting:

<?xml version="1.0" encoding="utf-8"?>
<NodeProfile xmlns="http://schemas.microsoft.com/ceres/hostcontroller/2011/08/nodeprofile">
  <AutoStart xmlns="">true</AutoStart>
  <Stopped xmlns="">false</Stopped>
  <Modules xmlns="" />
  <Properties xmlns="">
    <Property Key="Managed.Node.Name" Type="string" Value="ContentProcessingComponent1" />
    <Property Key="Managed.SystemManager.ConstellationName" Type="string" Value="A99B1A" />
    <Property Key="Managed.Node.SystemName" Type="string" Value="A99B1A" />
    <Property Key="Managed.SystemManager.ConstellationVersion" Type="int" Value="-1" />
    <Property Key="Managed.Runtime.Version" Type="string" Value="1.0" />
    <Property Key="Managed.Node.LocalSystemManager" Type="bool" Value="False" />
    <Property Key="Managed.Node.ShutdownOnComponentFailed" Type="bool" Value="True" />
    <Property Key="Managed.Node.ProcessPriorityClass" Type="string" Value="BelowNormal" />
    <Property Key="Managed.Node.DynamicAssemblies" Type="string" Value="Microsoft.Ceres.ContentEngine.AnnotationPrimitives, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Bundles, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Component, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.DataModel.RecordSerializer, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Fields, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.LiveEvaluators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.NlpEvaluators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.NlpOperators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Operators.BuiltIn, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Parsing.Component, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Parsing.Evaluators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Parsing.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Processing, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Processing.BuiltIn, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Properties, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.AliasLookup, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.RecordCache, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.RecordType, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Repository, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Services, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.SubmitterComponent, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Types, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Util, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.ContentEngine.Processing.Mars, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.DataModel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.DataModel.Types, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Engine, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Engine.WcfTransport, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Operators.BuiltIn, Version=15.0.0.0, Culture=neutral, 
PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Operators.Core, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Operators.Parsing, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Processing, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Processing.BuiltIn, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.Evaluation.Services, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.DocumentModel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Admin, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.ContentRouter, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Services, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Utils, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Schema.SchemaCatalogProxy, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Query.MarsLookupComponent, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.FastServerMessages, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchCore.Schema.SchemaCatalog, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.AnnotationStore, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.Automata, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.Dictionaries, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.DictionaryInterface, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.Ese.Interop, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.RichFields, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.RichTypes, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.StringDistance, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.Transformers, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.NlpBase.IndexTokenizer, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.AnalysisEngine.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.SearchAnalytics.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;Microsoft.Ceres.UsageAnalytics.Operators, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c;CustomOperator, Version=1.0.0.0, Culture=neutral, PublicKeyToken=7d300eac1b9f50c2" />
    <Property Key="Managed.Node.SearchServiceApplicationName" Type="string" Value="14087e61-67e2-4245-b23d-0e52c6dcf704" />
    <Property Key="Managed.Node.SystemDisplayName" Type="string" Value="0a1ee46f-59f2-49b7-bfca-bb4d20adaf1a" />
    <Property Key="Managed.Node.BasePort" Type="int" Value="17042" />
    <Property Key="Managed.Node.BasePort.4" Type="int" Value="17046" />
    <Property Key="PortShared" Type="bool" Value="True" />
  </Properties>
</NodeProfile>

 

If you guessed the "Managed.Node.DynamicAssemblies" property…then you are very smart! [:D]  Yep…that is what we are looking for.  Those are the only assemblies that will be loaded into the AppDomain, and only those assemblies will be interrogated for the DynamicComponent attribute.  Great!  So as you can see, I have added my CustomOperator assembly to the list.  Let's try again and run the script.  Dang it!  NO GO!  It still doesn't like my CustomOperator.CustomOperator operator!   Grrr…so at this point, I'm really wondering if my assembly is getting loaded at all…and after a browse through the ULS logs, I see these:

08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Inactive] to [Configuring and eventSent=False] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiyyo Verbose  Microsoft.Ceres.CoreServices.Management.ManagementServer : Registered agent CustomOperator.CustomEvaluatorBinderSource.ComponentManager of type Microsoft.Ceres.CoreServices.Services.Container.IComponentManagerManagementAgent 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Configuring] to [Configured and eventSent=False] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Configured] to [Resolving and eventSent=True] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Resolving] to [Readying and eventSent=False] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Readying] to [Ready and eventSent=True] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Ready] to [Activating and eventSent=False] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomPlugInSource) : CustomPlugInSource moved from [Activating] to [Active and eventSent=True] 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiywj Medium   ComponentManager(CustomOperator.CustomPlugInSource) : CustomOperator.CustomPlugInSource [Active] started 
08/20/2013 22:32:32.18  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x0784 Search                         Search Platform Services       aiyv9 Verbose  ComponentManager(CustomOperator.CustomPlugInSource) : ***** QUEUESENTINEL finished task for CustomOperator.CustomPlugInSource: CustomOperator.CustomPlugInSource[Active]state Active 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Inactive] to [Configuring and eventSent=False] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Configuring] to [Configured and eventSent=False] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Configured] to [Resolving and eventSent=True] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Resolving] to [Readying and eventSent=False] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Readying] to [Ready and eventSent=True] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Ready] to [Activating and eventSent=False] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywq High     ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomEvaluatorBinderSource moved from [Activating] to [Active and eventSent=True] 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiywj Medium   ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : CustomOperator.CustomEvaluatorBinderSource [Active] started 
08/20/2013 22:32:32.20  NodeRunnerContent1-0a1ee46f-59f (0x3994) 0x39E4 Search                         Search Platform Services       aiyv9 Verbose  ComponentManager(CustomOperator.CustomEvaluatorBinderSource) : ***** QUEUESENTINEL finished task for CustomOperator.CustomEvaluatorBinderSource: CustomOperator.CustomEvaluatorBinderSource[Active]state Active 

Ok…they ARE being loaded.  So what the hell is going on?  Well…the clue WAS in the log files.  After running the PowerShell to attempt to add the flow, I noticed something: the name of the component that is actually being used to register a flow.  It's called QueryProcessingComponent1.  Well ok, so it seems that even though the content processing node does all the work, the query processing component manages all the registration of the plug-ins and operators.  Going back to the node directory, I find the QueryProcessingComponent1 directory and find that it too has a NodeProfile.xml file.  Bingo.  After adding the assembly to that property and restarting the host controller one more time, I again attempt to add a custom flow with a custom operator.

YYYYYEEEESSSS!!!!  NO ERROR…………..I successfully inserted my flow and operator into the Ceres engine!  Now, what part is missing?  Well, even though the flow is now installed and working, it is not part of the main flow ("Microsoft.CrawlerFlow").  I would need to insert my flow into that main flow and then redeploy it.  The main issue with that is that not all of the operators are recognized by the system.  Yeah, weird, I know.  The main flow is part of the installation/deployment of the search service application and is there by default.  If you ever want to make changes, you would need to add all the possible assemblies to the query processing component and then update the main flow.
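
Since you end up hand-editing that DynamicAssemblies property for both the content processing and the query processing nodes, here is a minimal sketch of doing the same edit in code.  Treat it as illustrative only: the <RandomNodeID> folder name comes from the discussion above and differs per farm, and you should back the file up (and stop the Host Controller) before touching it.

using System;
using System.Linq;
using System.Xml.Linq;

class PatchNodeProfile
{
    static void Main()
    {
        // Substitute the real GUID folder and the component directory (ContentProcessingComponent1
        // and/or QueryProcessingComponent1) for your farm.
        string profilePath = @"C:\Program Files\Microsoft Office Servers\15.0\Data\Office Server" +
                             @"\Applications\Search\Nodes\<RandomNodeID>\QueryProcessingComponent1\NodeProfile.xml";
        string assemblyRef = "CustomOperator, Version=1.0.0.0, Culture=neutral, PublicKeyToken=7d300eac1b9f50c2";

        XDocument doc = XDocument.Load(profilePath);

        // The <Properties> element resets xmlns to "", so the Property elements are unqualified.
        XElement prop = doc.Descendants("Property")
                           .First(p => (string)p.Attribute("Key") == "Managed.Node.DynamicAssemblies");

        string current = (string)prop.Attribute("Value");
        if (!current.Contains(assemblyRef))
        {
            prop.SetAttributeValue("Value", current + ";" + assemblyRef);
            doc.Save(profilePath);
            Console.WriteLine("Added " + assemblyRef);
        }
    }
}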

In terms of debugging, you can attach to the NodeRunner.exe processes and debug your operator and evaluators.  Easy.
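
One trick that helps: there are usually several noderunner.exe processes running, so it pays to know which PID to attach to.  A small helper like the one below (my own, reusing the same crude c:\temp\search.txt logging as the plug-in source above), called from CustomProducer.ProcessRecord, will tell you exactly which process your operator is running in; then just use Debug > Attach to Process on that PID.

using System;
using System.Diagnostics;
using System.IO;

public static class OperatorTrace
{
    // Call OperatorTrace.Touch("ProcessRecord") from the producer to log the hosting process id.
    public static void Touch(string stage)
    {
        File.AppendAllText(@"c:\temp\search.txt",
            string.Format("{0:o} pid={1} {2}{3}",
                DateTime.Now, Process.GetCurrentProcess().Id, stage, Environment.NewLine));
    }
}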

Now for some cleanup.  All those bundles of flows at the top of this post: how did they get there?  Well, each time you upload a flow, the engine generates a new assembly with the flow added to it as a resource.  If you were to reflect on any of the assemblies above, you could get the flow XML out of the assembly.  But this is also easily done using the Windows PowerShell commands above.
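
For the curious, here is a minimal sketch of pulling the embedded flow XML back out of one of those generated bundle assemblies with plain reflection; the file path is just a placeholder, so point it at one of the bundle assemblies listed earlier.

using System;
using System.IO;
using System.Reflection;

class DumpFlowXml
{
    static void Main()
    {
        // Placeholder path: point this at one of the generated flow bundle assemblies.
        Assembly bundle = Assembly.LoadFrom(@"C:\temp\SomeFlowBundle.dll");

        foreach (string resourceName in bundle.GetManifestResourceNames())
        {
            using (Stream stream = bundle.GetManifestResourceStream(resourceName))
            using (StreamReader reader = new StreamReader(stream))
            {
                Console.WriteLine("=== " + resourceName + " ===");
                Console.WriteLine(reader.ReadToEnd());   // the OperatorGraph XML for the flow
            }
        }
    }
}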

I will be posting all the code for this project on code.msdn.microsoft.com.  You can use it as a starting point for implementing your own flows and operators.  But you are probably asking: why would I do something that is not supported?  Well, it's the same reason you want to keep your job.  The customer wants high performance and needs to implement way more than the Content Enrichment Service can provide, and saying no will stop any chance you have of completing an incredibly cool and awesome project.  Now, why is this not supported if you CAN do it?  Well, as you can see, it is VERY complex.  Only a few people in the world are going to be able to build these, deploy them and successfully use them.  So you are still asking yourself…why did you post this if we can't really do it?  Great question!

BECAUSE I WANT IT SUPPORTED.  If we band together and find various use cases for doing this, the product team will have no choice but to train the Level I, II, and III support people on how to troubleshoot these.  As of right now, it is simply unsupported because the support people don't even know what a flow and an operator are when it comes to supporting SharePoint Search (update: I met with the search team, and the COE support team *is* familiar with flows, so we're one step closer to support).  It would be my goal to get some ISVs to start playing around with creating custom flows and operators to make SharePoint Search a BEAST (not that it already isn't, cuz it's the best on the market right now; sorry Google Appliance, but you suck big time)!  So…there you have it.  Do with it what you will, have fun, be smart and as always…enjoy!

Chris

BCS, OData and Subscriptions – How to get it working!

So what have I been working on for the past two weeks?  Well, other than consulting clients, books and working on my garden, I have also been involved with the Microsoft Learning SharePoint 2013 Advanced Development Microsoft Official Curriculum (MOC) course 20489 that will be available later in the year (sorry, no link just yet, but it will be here when it is released).  I was able to finish up two chapters on Search quickly, as that is one of my main fortes, but then decided to take on what I thought was the middle of the two Business Connectivity Services (BCS) chapters.  For those of you not familiar with BCS, you can find a great overview here.  It turns out the module was the hardest one!  Why?  Because it covers things that no one has ever done before (outside of the product team, that is). 

So what is the big deal about what I worked on?  You are probably saying to yourself…BCS has been around for a while, right?  Well, yes, this is very true, and there are several great posts about how to expose external data using SharePoint Designer and Visual Studio using the various BDC model types (Database, WCF, .NET Connectivity and Custom).  You can also find out how to implement stereotyped methods that support CRUD operations and search indexing (link here).  Given all that content, there is a game-changing set of features that were added to BCS in SharePoint 2013 that add a whole new level of complexity.  These features include:

• OData-based BDC models
• The event subscription (subscribe/unsubscribe) stereotyped methods

There are plenty of posts on OData in general (this one from MSDN is pretty awesome if you are just getting started) and a few posts on how to set up a BDC OData model.  And although my fellow SharePoint MVP Scot Hillier did a presentation on the subscriber model at the last SharePoint Conference, it was only in the context of a database model.  When it comes to integrating the two features (OData and the subscriber methods) together, that is where a massive black hole exists, and that is the focus of this blog post. 

The first step to getting this whole thing to work is to create an OData service.  This is very simple with the tools provided by Visual Studio, and the steps to do this are provided in this MSDN post.
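
If you have never generated one of these, the service class the wizard produces looks roughly like the sketch below.  The names ("Northwind" for the service, "NorthwindEntities" for the generated entity context) and the wide-open access rules are assumptions for this walkthrough, not something to ship; the Subscribe and Unsubscribe operations added in the later steps live on this same class (it is declared partial here only so the pieces shown throughout this post could sit in separate files).

using System.Data.Services;
using System.Data.Services.Common;

// Assumed names: "Northwind" for the service, "NorthwindEntities" for the generated EF context.
public partial class Northwind : DataService<NorthwindEntities>
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        // Wide open for the demo only; lock these rules down in anything real.
        config.SetEntitySetAccessRule("*", EntitySetRights.All);
        config.SetServiceOperationAccessRule("*", ServiceOperationRights.All);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V3;
    }
}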

    The next step is to build your basic BCS model using the new item template wizard provided in Visual Studio 2012.  This has also been nicely blogged about by several of my colleagues, and there is an article on MSDN.  The important thing to note about the MSDN article I reference is that it uses an OData feed hosted by http://services.odata.org.  Since you do not own this service, you will not be able to extend it to implement the subscribe and unsubscribe methods that I discuss later in this post.  Therefore, you can follow the steps in the article, but use a local instance of your OData service. 

     

    Once the service has been generated, you must add some supporting methods to your OData service to accept data from SharePoint when a subscription occurs.  There are some gotchas here.  Currently there is no real guidance on how to set this up properly, and the little that does exist gives mixed signals about how to successfully set up the communication layers.  In my example below, you will see that I am using GET for the HTTP method.  This was the only way I was able to get the method parameters to populate in the web method of the OData service.  As you will see later, there are also some very important BDC method properties that must be set in order for all of this to work:

    [WebGet]
            public string Subscribe(string DeliveryURL, int EventType, string EntityName, string SelectColumns)
            {
                //HttpRequest req = System.Web.HttpContext.Current.Request;                       

                // Generate a new Guid that will function as the subscriptionId.
                string subscriptionId = Guid.NewGuid().ToString();

                // EventType is a non-nullable int, so only the string parameters need a null check.
                if (DeliveryURL == null || EntityName == null || SelectColumns == null)
                    throw new Exception("Missing parameters");

                // This sproc will be used to create the subscription in the database.
                string subscribeSproc = "SubscribeEntity";

                string sqlConn = "Data Source=.;Initial Catalog=Northwind;uid=sa;pwd=Pa$$w0rd";

                // Create connection to database.
                using (SqlConnection conn = new SqlConnection(sqlConn))
                {
                    SqlCommand cmd = conn.CreateCommand();
                    cmd.CommandText = subscribeSproc;
                    cmd.CommandType = CommandType.StoredProcedure;

                    cmd.Parameters.Add(new SqlParameter("@SubscriptionId", subscriptionId));
                    cmd.Parameters.Add(new SqlParameter("@EntityName", EntityName));
                    cmd.Parameters.Add(new SqlParameter("@EventType", EventType));
                    cmd.Parameters.Add(new SqlParameter("@DeliveryAddress", DeliveryURL));
                    cmd.Parameters.Add(new SqlParameter("@SelectColumns", SelectColumns));

                    try
                    {
                        conn.Open();
                        cmd.ExecuteNonQuery();
                    }
                    catch (Exception)
                    {
                        // Rethrow without resetting the original stack trace.
                        throw;
                    }
                    finally
                    {
                        conn.Close();
                    }

                    return subscriptionId;
                }
            }

     [WebGet]
            public void Unsubscribe(string subscriptionId)
            {
                HttpRequest req = System.Web.HttpContext.Current.Request;
               
                // This sproc will be used to remove the subscription from the database.
                string subscribeSproc = "UnsubscribeEntity";

                string sqlConn = "Data Source=.;Initial Catalog=Northwind;uid=sa;pwd=Pa$$w0rd";

                // Create connection to database.
                using (SqlConnection conn = new SqlConnection(sqlConn))
                {
                    SqlCommand cmd = conn.CreateCommand();
                    cmd.CommandText = subscribeSproc;
                    cmd.CommandType = CommandType.StoredProcedure;

                    cmd.Parameters.Add(new SqlParameter("@SubscriptionId", subscriptionId));

                    try
                    {
                        conn.Open();
                        cmd.ExecuteNonQuery();
                    }
                    catch (Exception)
                    {
                        // Rethrow without resetting the original stack trace.
                        throw;
                    }
                    finally
                    {
                        conn.Close();
                    }               
                }
            }

    On the BCS side, you need to add the stereotyped methods that will send the data to the web methods you just created in the last step.  This includes the EntitySubscriber and EntityUnsubscriber methods.  First, let's review the EntitySubscriber method.  In the method XML below, you will notice that I am sending the OData web method parameters in the query string.  You can use the '@' parameter notation, just like in regular BDC models, to token-replace the values.  You should also use the HTML-encoded '&amp;' to signify the ampersand (this was one of the things that took me a while to figure out).  Notice the various method properties.  They include:

    • ODataEntityUrl – this is appended to the ODataServiceURL property of the LobSystemInstance (note that later, when doing an explicit subscription call, the notification callback URL will NOT be used)
    • ODataHttpMethod – the type of HTTP method you will perform (GET, POST, MERGE, etc.).  I was never able to get POST to work with a Visual Studio generated OData layer; more on that later.
    • ODataPayloadKind – this is one of the more confusing aspects of OData.  You can find the enumeration for ODataPayloadKind here, but there is very little documentation on how it works between SharePoint and the custom methods you generate on the OData service side.  It took me forever to figure out that the "Entity" payload just doesn't work.  After running through just about every permutation of HTTP methods, payloads and formats, I finally found a working combination with the "Property" payload.
    • ODataFormat – this was another painful setting to determine.  When you create your OData service, it expects a very specific Content-Type HTTP header to be sent, and which header it expects is based on the version of Visual Studio you have.  I learned this the hard way, but things started to make sense for me after I reviewed this awesome post about how OData service generation and its layers work in the Microsoft world and how to customize the behavior after generating it.  For more information on supported OData versions, check out this post.  In several examples, you may see the format set to "application/atom+xml".  Well, that format is not supported by the generated OData service!  What you will end up with is an HTTP exception being sent to the calling client (in this case SharePoint) that says "Unsupported media type".  This is very unfortunate.  Why?  Because the error occurs last in the call stack of the web method…AFTER your web method code has run and created the subscription successfully!  In order to catch this type of event, you must override the HandleException method of the OData service and roll back any subscriptions that were created, using some kind of instance variable (see the sketch after this list)!  This would apply to anything that happens that would result in an error as the response is being sent back to the client.
    • ODataServiceOperation – still haven't figured out what this does!
    • NotificationParserType – this will be explored more below
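
    To make that ODataFormat point concrete, here is a rough sketch of the HandleException override.  The class and helper names are mine (the same assumed service class from the earlier step), it assumes Subscribe() stashes its new id in lastSubscriptionId, and the check for HTTP 415 is simply the "Unsupported media type" case described above:

    using System.Data.Services;

    // Same assumed service class as before; only the members relevant to the rollback are shown.
    public partial class Northwind : DataService<NorthwindEntities>
    {
        // Set by the Subscribe() [WebGet] method when it inserts a subscription row.
        private string lastSubscriptionId;

        protected override void HandleException(HandleExceptionArgs args)
        {
            // 415 is raised while the response is being serialized, i.e. AFTER Subscribe()
            // has already created the subscription, so this is the place to undo it.
            if (args.ResponseStatusCode == 415 && lastSubscriptionId != null)
            {
                RollbackSubscription(lastSubscriptionId);
            }
            base.HandleException(args);
        }

        private void RollbackSubscription(string subscriptionId)
        {
            // Hypothetical helper: call the UnsubscribeEntity sproc exactly like Unsubscribe() does.
        }
    }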

    Here is the working method XML for the Subscribe method:

    <Method Name="SubscribeCustomer" DefaultDisplayName="Customer Subscribe" IsStatic="true">
                  <Properties>
                    <Property Name="ODataEntityUrl" Type="System.String">/Subscribe?DeliveryURL='@DeliveryURL'&amp;EventType=@EventType&amp;EntityName='@EntityName'&amp;SelectColumns='@SelectColumns'</Property>
                    <Property Name="ODataHttpMethod" Type="System.String">GET</Property>                
                    <Property Name="ODataPayloadKind" Type="System.String">Property</Property>                
                    <Property Name="ODataFormat" Type="System.String">application/json;odata=verbose</Property>
                    <Property Name="ODataServiceOperation" Type="System.Boolean">false</Property>
     
                 </Properties>
                  <AccessControlList>
                    <AccessControlEntry Principal="NT AuthorityAuthenticated Users">
                      <Right BdcRight="Edit" />
                      <Right BdcRight="Execute" />
                      <Right BdcRight="SetPermissions" />
                      <Right BdcRight="SelectableInClients" />
                    </AccessControlEntry>
                  </AccessControlList>
                  <Parameters>
                    <Parameter Direction="In" Name="@DeliveryURL">
                      <TypeDescriptor TypeName="System.String" Name="DeliveryURL" >
                        <Properties>                      
                          <Property Name="IsDeliveryAddress" Type="System.Boolean">true</Property>
                        </Properties>
                      </TypeDescriptor>
                    </Parameter>
                    <Parameter Direction="In" Name="@EventType">
                      <TypeDescriptor TypeName="System.Int32" Name="EventType" >
                        <Properties>
                          <Property Name="IsEventType" Type="System.Boolean">true</Property>
                        </Properties>                    
                      </TypeDescriptor>
                    </Parameter>
                    <Parameter Direction="In" Name="@EntityName">
                      <TypeDescriptor TypeName="System.String" Name="EntityName" >
                        <DefaultValues>
                          <DefaultValue MethodInstanceName="SubscribeCustomer" Type="System.String">Customers</DefaultValue>
                        </DefaultValues>
                      </TypeDescriptor>
                    </Parameter>
                    <Parameter Direction="In" Name="@SelectColumns">
                      <TypeDescriptor TypeName="System.String" Name="SelectColumns" >
                        <DefaultValues>
                          <DefaultValue MethodInstanceName="SubscribeCustomer" Type="System.String">*</DefaultValue>
                        </DefaultValues>
                      </TypeDescriptor>
                    </Parameter>
                    <Parameter Direction="Return" Name="SubscribeReturn">
                      <TypeDescriptor Name="SubscriptionId" TypeName="System.String" >
                        <Properties>
                          <Property Name="SubscriptionIdName" Type="System.String">SubscriptionId</Property>
                        </Properties>                        
                      </TypeDescriptor>                                        
                    </Parameter>
                  </Parameters>
                  <MethodInstances>
                    <MethodInstance Type="EventSubscriber" ReturnParameterName="SubscribeReturn" ReturnTypeDescriptorPath="SubscriptionId" Default="true" Name="SubscribeCustomer" DefaultDisplayName="Customer Subscribe">
                      <AccessControlList>
                        <AccessControlEntry Principal="NT AuthorityAuthenticated Users">
                          <Right BdcRight="Edit" />
                          <Right BdcRight="Execute" />
                          <Right BdcRight="SetPermissions" />
                          <Right BdcRight="SelectableInClients" />
                        </AccessControlEntry>
                      </AccessControlList>
                    </MethodInstance>
                  </MethodInstances>
                </Method>

    Next is the unsubscribe method.  Notice how SharePoint must pass back the subscription id that lives in the external system.  The name of the SubscriptionIdName property will always be SubscriptionId.  This subscription id must be saved somewhere, but the question is…where?

    <Method Name="UnSubscribeCustomer" DefaultDisplayName="Customer Unsubscribe">
                  <Properties>
                    <Property Name="ODataEntityUrl" Type="System.String">/UnSubscribe?SubscriptionId='@SubscriptionId'</Property>
                    <Property Name="ODataHttpMethod" Type="System.String">GET</Property>
                    <Property Name="ODataPayloadKind" Type="System.String">Property</Property>
                    <Property Name="ODataServiceOperation" Type="System.Boolean">false</Property>
                  </Properties>
                  <AccessControlList>
                    <AccessControlEntry Principal="NT AuthorityAuthenticated Users">
                      <Right BdcRight="Edit" />
                      <Right BdcRight="Execute" />
                      <Right BdcRight="SetPermissions" />
                      <Right BdcRight="SelectableInClients" />
                    </AccessControlEntry>
                    <AccessControlEntry Principal="Contosodomain users">
                      <Right BdcRight="Edit" />
                      <Right BdcRight="Execute" />
                      <Right BdcRight="SetPermissions" />
                      <Right BdcRight="SelectableInClients" />
                    </AccessControlEntry>
                  </AccessControlList>
                  <Parameters>
                    <Parameter Name="@SubscriptionId" Direction="In">
                      <TypeDescriptor Name="SubscriptionId" TypeName="System.String">
                        <Properties>
                          <Property Name="SubscriptionIdName" Type="System.String">SubscriptionId</Property>
                        </Properties>                   
                      </TypeDescriptor>
                    </Parameter>
                  </Parameters>
                  <MethodInstances>
                    <MethodInstance Name="UnSubscribeCustomer" DefaultDisplayName="Customer
                 Unsubscribe" Type="EventUnsubscriber" Default="true">
                      <AccessControlList>
                        <AccessControlEntry Principal="NT AuthorityAuthenticated Users">
                          <Right BdcRight="Edit" />
                          <Right BdcRight="Execute" />
                          <Right BdcRight="SetPermissions" />
                          <Right BdcRight="SelectableInClients" />
                        </AccessControlEntry>
                      </AccessControlList>
                    </MethodInstance>
                  </MethodInstances>
                </Method>

    Now that those items are set up, you need to deploy your BCS model and set permissions.  This is a very common activity, so I'll skip the details in this blog post.  However, I will say that it is annoying that the user who uploads the model is not automatically added as an admin with permissions to the model and methods (and there is no option on the import page to add them) [:(]

    Now that the model is deployed, the next step is to enable a feature that enables subscription support, which brings us back to the question brought up before: where does SharePoint store the subscription id of the external system?  In a list, of course!  To create this list, there are two features you can enable.  One is called BCSEvents, the other is called ExternalSubscription.  The funny thing about these two features and their relationship is that the BCSEvents feature is made up of just a feature activation receiver, and that receiver has only one goal: to activate the ExternalSubscription feature.  In addition to this interesting design, you will find that BCSEvents is a hidden feature, whereas the ExternalSubscription feature is actually visible on the web features settings page.  What does the ExternalSubscription feature do?  It creates our list, of course!  This list is called "External Subscriptions Store".  It is a hidden list (it can be unhidden using PowerShell), it lives in the "_private/ExtSubs" folder, and it has no views from which you can see the data, so again Windows PowerShell is the way to go if you want to see what lives in the list; a quick server-side alternative is sketched below:
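
    This is just a minimal sketch (run it on a farm server with the server object model available; the site URL is an assumption) that dumps the columns and rows of that hidden list:

    using System;
    using Microsoft.SharePoint;

    class DumpExternalSubscriptions
    {
        static void Main()
        {
            using (SPSite site = new SPSite("http://intranet.contoso.com"))   // assumed site URL
            using (SPWeb web = site.OpenWeb())
            {
                // Hidden list created by the ExternalSubscription feature; it has no views.
                SPList list = web.Lists.TryGetList("External Subscriptions Store");
                if (list == null)
                {
                    Console.WriteLine("ExternalSubscription feature is not activated on this web.");
                    return;
                }

                foreach (SPListItem item in list.Items)
                {
                    foreach (SPField field in list.Fields)
                    {
                        if (!field.Hidden)
                            Console.WriteLine("{0} = {1}", field.InternalName, item[field.Id]);
                    }
                    Console.WriteLine("----");
                }
            }
        }
    }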

    Next you need to create a subscription.  This can be done explicitly or implicitly.  The explicit way is to make a call to the entity's subscribe method as shown here (as previously pointed out above, the notification callback url is ignored in an OData Model):

    function SubscribeEntity() {
        var notificationCallback = new SP.BusinessData.Runtime.NotificationCallback(context, "http://localhost:19739/northwind.svc");
        var url = web.get_url();
        notificationCallback.set_notificationContext(url);
        context.load(notificationCallback);
        var subscription = entity.subscribe(1, notificationCallback, "administrator@contoso.com", "SubscribeCustomer", lobSystemInstance);
        context.load(subscription);
        context.executeQueryAsync(OnSubscribeSuccess, failmethod);
    }

     //these are the helper methods and variables

     var context;
    var web;
    var user;
    var entity;
    var lob;
    var lobSystemInstance;
    var lobSystemInstances;

    // This code runs when the DOM is ready and creates a context object which is needed to use the SharePoint object model
    $(document).ready(function () {
        context = SP.ClientContext.get_current();
        web = context.get_web();
        context.load(web);

        entity = web.getAppBdcCatalog().getEntity("NorthwindModel", "Customers");
        context.load(entity);

        lob = entity.getLobSystem();
        context.load(lob);

        lobSystemInstances = lob.getLobSystemInstances();
        context.load(lobSystemInstances);

        context.executeQueryAsync(GetLobSubscribesystemInstance, failmethod);
    });

    // Initialize the LobSystemInstance.
    function GetLobSubscribesystemInstance() {
        var $$enum_1_0 = lobSystemInstances.getEnumerator();
        while ($$enum_1_0.moveNext()) {
            var instance = $$enum_1_0.get_current();
            lobSystemInstance = instance;
            context.load(lobSystemInstance);
            break;
        }
        context.executeQueryAsync(SubscribeEntity, failmethod);
    }

    Subscriptions can be one of three types (you can learn more about event types here):

    • ItemAdded (1)
    • ItemUpdated (2)
    • ItemDeleted (3)

    Note that there is no event type that supports ItemAdding, ItemUpdating or ItemDeleting.  This means you cannot cancel the insertion, update or deletion in the external source; you can only expect to receive a notification after the event has occurred.

    The implicit way is to create an alert or to set up an event receiver.  This means you should set up an external list pointing to your OData model.  You can then use the ribbon to create an alert, which will in turn execute the web method call to create the subscription.  Note that you must set up the outgoing email settings on your farm, or the alert ribbon button will not display!  If an error occurs when creating an event receiver, you will be passed the web method exception from your OData service, which can be very helpful for troubleshooting (a sketch of registering such a receiver follows). 
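
    Here is a minimal sketch of the event receiver route; the site URL, list title, receiver assembly and class names are placeholders for whatever you deploy.  Registering the receiver is what implicitly triggers the EventSubscriber call to your OData service:

    using System;
    using Microsoft.SharePoint;

    class RegisterExternalListReceiver
    {
        static void Main()
        {
            using (SPSite site = new SPSite("http://intranet.contoso.com"))       // assumed site URL
            using (SPWeb web = site.OpenWeb())
            {
                SPList externalList = web.Lists["Northwind Customers"];           // assumed external list title

                // Assembly and class names below are placeholders for your deployed receiver.
                externalList.EventReceivers.Add(
                    SPEventReceiverType.ItemAdded,
                    "ExternalListReceivers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=1234567890abcdef",
                    "ExternalListReceivers.CustomerItemAddedReceiver");
            }
        }
    }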

    NOTE:  When you create an external list, there are several items that seem to get cached in the external list's properties, which will require you to delete the list and then re-create it.  This means that as you are testing your solution, you should create a Windows PowerShell script that will remove your BDC model, re-deploy it, remove the external list and then add it back.

    Once this has all been completed, you can now start telling SharePoint that things have changed.  As much work as we have done to this point, it is really rather simple compared to the amount of work needed for this component of the ecosystem.  There are several approaches you could take to do this:

    • Triggers on the database to populate a table monitored by SQL Server to send events directly to SharePoint
    • Triggers on the database to populate a table monitored by a windows service
    • No triggers and just a simple row timestamp monitoring that checks for any inserts/updates/deletes and sends the notification
    • Code that sends changes to an event queue like MSMQ or BizTalk that will then send it to SharePoint

    Each of these has advantages and drawbacks in terms of time and complexity.  No matter what, you need some component that will tell SharePoint that something has changed.  In the code samples I provide, there is a simple console application that will allow you to send the notification to SharePoint for testing purposes; a stripped-down version is sketched below.
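
    A stripped-down sketch of such a test sender is shown here.  The delivery URL, credentials and content type are assumptions for illustration only: use the DeliveryAddress value that your Subscribe sproc stored, and point the file at one of the two ATOM formats described next.

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    class NotifySharePoint
    {
        static void Main()
        {
            // Use the DeliveryAddress that SharePoint passed to Subscribe() and that the sproc stored.
            string deliveryUrl = "<the stored DeliveryAddress from the subscription row>";

            // One of the two ATOM messages described below, saved to a file for testing.
            string atomMessage = File.ReadAllText(@"c:\temp\notification.xml");

            using (WebClient client = new WebClient())
            {
                client.UseDefaultCredentials = true;                                       // test/farm account
                client.Encoding = Encoding.UTF8;
                client.Headers[HttpRequestHeader.ContentType] = "application/atom+xml";    // assumed content type
                string response = client.UploadString(deliveryUrl, "POST", atomMessage);
                Console.WriteLine("Notification sent; response: " + response);
            }
        }
    }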

    So now that you have something that can send a message to SharePoint, what does that message look like?  This process of communication is undocumented anywhere, until now, and it is the real meat of this post!  It turns out that there are two message parsers that come out of the box with SharePoint: an IdentityParser and an ODataEntryContentNotificationParser.  The difference between the two is that one only tells SharePoint that a set of identities has changed, while the other can actually pass the changed properties of the item to SharePoint.  Each requires a completely different style of ATOM message to be sent.

    In the case of the IdentityParser, it is looking for a message that looks like the code snippet below.  This particular piece of XML must have a valid XPath to "/a:feed/a:entry/a:content/m:properties/b:BcsItemIdentity".  If it does not, then any call to "retrieve the item" in your event receiver will fail.  The message will still be received and the event receiver will still execute, as long as you don't make calls to the various properties that are not available without the id.  You should also be aware that nothing outside of that XPath is ever looked at; those elements can be anything you like, as they are not validated or used:

    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <feed xml:base="http://services.odata.org/OData/OData.svc/"
    xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
    xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
    xmlns:b="http://schemas.microsoft.com/bcs/2012/"
    xmlns="http://www.w3.org/2005/Atom">
    <entry>
    <title type="text">Customers</title>
    <id>http://www.northwind.com/customers</id>
    <author>
    <name>External System</name>
    </author>
    <content type="application/xml">
    <m:properties>                            
    <b:BcsItemIdentity m:type="Edm.String"><CustomerID>ALFKI</CustomerID></b:BcsItemIdentity>
    <d:Name>Customer</d:Name>
    </m:properties>
    </content>
    </entry>
    </feed>

    In the case of the ODataEntryContentNotificationParser, you must pass an XML message that has a valid XPath to "/a:entry/a:link/m:inline/a:entry".  The XML node at this XPath must itself be a valid ATOM entry.  Again, everything outside the XPath seems to be ignored, and only the embedded ATOM message is used:

    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <atom:entry xml:base="http://sphvm-92723:90/WcfDataService2.svc" xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns:atom="http://www.w3.org/2005/Atom">
    <atom:category term="NorthwindModel.EntitySubscribe" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
      <content type="application/xml">
         <m:properties>
          <d:SubscriptionId m:type="Edm.Int32">1</d:SubscriptionId>
          <d:EntityName>Customers</d:EntityName>
          <d:DeliveryURL>{11}</d:DeliveryURL>
          <d:EventType m:type="Edm.Int32">{12}</d:EventType>
          <d:UserId m:null="true" />
          <d:SubscribeTime m:type="Edm.Binary">AAAAAAAABE4=</d:SubscribeTime>
          <d:SelectColumns>*</d:SelectColumns>
        </m:properties>
      </content>
      <id>OuterId</id>
      <atom:id>http://sphvm-92723:90/WcfDataService2.svc/EntitySubscribes(1)</atom:id>
      <atom:link href="EntitySubscribe(1)" rel="self" title="EntitySubscribe"/>
      <atom:link href="Customers(2147483647)" rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/customers" type="application/atom+xml;type=entry">
        <m:inline>
          <entry xml:base="http://sphvm-92723:90/WcfDataService2.svc/" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom">
            <id>http://sphvm-92723:90/WcfDataService2.svc/Customers('57849')</id>
            <title type="text" />
            <updated>2012-04-30T11:50:20Z</updated>
            <author>
            <name />
            </author>
            <link rel="edit" title="Customer" href="Customers('57849')" />
            <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/Orders" type="application/atom+xml;type=feed" title="Orders" href="Customers('57849')/Orders" />
            <link rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/CustomerDemographics" type="application/atom+xml;type=feed" title="CustomerDemographics" href="Customers('57849')/CustomerDemographics" />
            <category term="NorthwindModel.Customer" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
            <content type="application/xml">
              <m:properties>
                <d:CustomerID>{0}</d:CustomerID>
                <d:CompanyName>{1}</d:CompanyName>
                <d:ContactName>{2}</d:ContactName>
                <d:ContactTitle>{3}</d:ContactTitle>
                <d:Address>{4}</d:Address>
                <d:City>{5}</d:City>
                <d:Region>{6}</d:Region>
                <d:PostalCode>{7}</d:PostalCode>
                <d:Country>{8}</d:Country>
                <d:Phone>{9}</d:Phone>
                <d:Fax>{10}</d:Fax>
              </m:properties>
            </content>
          </entry>
        </m:inline>
      </atom:link>
      <title>New Customer entry is added</title>
      <updated>2011-07-12T09:21:53Z</updated>
    </atom:entry>
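
    Before wiring a payload like this into an external system, it can help to confirm that the node the parser looks for actually resolves.  The following is a small standalone check of my own (not part of SharePoint); it only verifies that the "/a:entry/a:link/m:inline/a:entry" path exists in a file supplied on the command line.

    using System;
    using System.Xml;

    class PayloadCheck
    {
        static void Main(string[] args)
        {
            // args[0]: path to the notification XML you plan to send
            var doc = new XmlDocument();
            doc.Load(args[0]);

            var ns = new XmlNamespaceManager(doc.NameTable);
            ns.AddNamespace("a", "http://www.w3.org/2005/Atom");
            ns.AddNamespace("m", "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata");

            XmlNode inlineEntry = doc.SelectSingleNode("/a:entry/a:link/m:inline/a:entry", ns);
            Console.WriteLine(inlineEntry != null
                ? "Inline entry found - the ODataEntryContentNotificationParser should accept this payload."
                : "Inline entry missing - the parser will not be able to find the changed item.");
        }
    }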

    In addition to the two out of the box parsers, there is a setting that specifies "Custom".  By implementing our own NotificationParser, we can format the message in a much simpler and more efficient way, such as JSON.  The main member to implement is the abstract ChangedItemIdentity property; the virtual ChangedEntityInstance property already has a default implementation that uses that identity to look up the entity instance.  The parser is handed the notification headers and the message byte array during initialization, and it is your responsibility to parse the message and hand back the identity of the changed item (a hypothetical JSON-based example follows the class definition below).

    public abstract class NotificationParser
    {
        // Methods
        protected NotificationParser()
        {
        }

        public void Initialize(NameValueCollection headers, byte[] message, IEntity entity, ILobSystemInstance lobSystemInstance)
        {
            if (message == null)
            {
                message = new byte[0];
            }
            if (entity == null)
            {
                throw new ArgumentNullException("entity");
            }
            if (lobSystemInstance == null)
            {
                throw new ArgumentNullException("lobSystemInstance");
            }
            this.NotificationHeaders = headers;
            this.NotificationMessage = message;
            this.Entity = entity;
            this.LobSystemInstance = lobSystemInstance;
        }

        // Properties
        public virtual IEntityInstance ChangedEntityInstance
        {
            get
            {
                Identity changedItemIdentity = this.ChangedItemIdentity;
                return this.Entity.FindSpecific(changedItemIdentity, this.LobSystemInstance);
            }
        }

        public abstract Identity ChangedItemIdentity { get; }
        protected IEntity Entity { get; private set; }
        protected ILobSystemInstance LobSystemInstance { get; private set; }
        public NameValueCollection NotificationHeaders { get; private set; }
        public byte[] NotificationMessage { get; private set; }
    }
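
    A custom parser is not included in the decompiled code above, so the following is only a minimal sketch of what one could look like.  It assumes the external system posts a small JSON body such as {"CustomerID":"ALFKI"}, that the entity has a single string identifier, and that JavaScriptSerializer is an acceptable deserializer; the class name and JSON shape are hypothetical, and the namespace of the NotificationParser base class is whatever the decompiled snippet above lives in.

    using System.Collections.Generic;
    using System.Text;
    using System.Web.Script.Serialization;
    using Microsoft.BusinessData.Runtime;   // Identity

    public class JsonNotificationParser : NotificationParser
    {
        // Expects a payload such as {"CustomerID":"ALFKI"} (hypothetical format).
        public override Identity ChangedItemIdentity
        {
            get
            {
                string json = Encoding.UTF8.GetString(this.NotificationMessage);
                var values = new JavaScriptSerializer()
                    .Deserialize<Dictionary<string, string>>(json);

                // Build the BCS identity from the single identifier value.
                return new Identity(values["CustomerID"]);
            }
        }
    }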

    Summary:

    Now that you have all the pieces, you can download the code I have placed on the code.msdn.microsoft.com site here.  This code has a BCS OData model fully working with the subscriber methods.  As code generation techniques have become more commonplace, OData layers generated via Visual Studio have become more common as well.  It will be well worth implementing these new BDC method stereotypes in your OData models and OData services to provide the ability to be notified when data changes in your remote systems!

    SharePoint’s Navigation Struggles – A Bit of History

    SharePoint has always struggled with navigation.  At every customer I have worked with, we have either torn out the navigation completely or had to customize it in some way to achieve their goals.  SharePoint 2013 is no exception to these navigation woes, and if anything, we have taken a step backwards.  In looking at what customers want, I have found there are three types of navigation:

    • Global-global – this navigation exists across the top of all sites in a SharePoint farm and is exactly the same everywhere
    • Local-global – this navigation is the site collection navigation and is the same across the site collection
    • Local-local – this navigation is the site navigation and shows important lists and pages on a site

    To be clear, SharePoint has never had "Global-Global" navigation.  It has however had the "Local-global" and "local-local" navigation, with some helper tools for inter-site collection navigation.  In order to fully point out this backwards momentum, let's take a look at some of the previous out of the box UIs:

    2007:

    [Screenshot: SharePoint 2007 tabbed top navigation and quick launch]

    As you can see, we started with some nice tabbed navigation plus the quick launch.  It was nice and simple, and we also had the breadcrumb to get us back up to anywhere in the site tree.  However, to get really valuable navigation, you had to enable the PublishingSite and PublishingWeb features, which gave you fly-out navigation of subsites:

    [Screenshot: SharePoint 2007 publishing fly-out navigation]

    2010:

    [Screenshot: SharePoint 2010 navigation]

    In 2010, we lost the breadcrumb control that sat directly above the main content placeholder, but we gained the folder icon above the ribbon.  Again, we have to enable the publishing features to get any real value out of the local-global.

    2013:

    [Screenshot: SharePoint 2013 navigation inside the ribbon]

    In 2013, the local-global was moved from directly below the ribbon to inside the ribbon.  We also lost the folder icon.

    As you can see, we have lost (removed from the master pages, though it still exists in the code base) some great functionality over the years.

    Based on the marketing hype, many of us assumed that Managed Metadata navigation would be the solution to the "global-global" navigation problem SharePoint has had for many years, but alas it is not:

    [Screenshot: Managed Metadata navigation settings]

    Unfortunately, if you have actually tried to implement it, you will have run into the following errors:

    Trying to use the term set more than once:

    [Screenshot: error when reusing the term set for site navigation]

    Using Windows PowerShell to set it:

    # Assuming $navSettings is the web's navigation settings object, e.g.:
    # $navSettings = New-Object Microsoft.SharePoint.Publishing.Navigation.WebNavigationSettings($web)
    $navSettings.CurrentNavigation.Source = 1
    $navSettings.CurrentNavigation.TermStoreId = New-Object System.Guid("6ffccd26-5aba-44a5-83a9-60a468261054")
    $navSettings.CurrentNavigation.TermSetId = New-Object System.Guid("420d7ef6-6040-4138-8e15-1d04773955ba")
    $navSettings.GlobalNavigation.Source = 1
    $navSettings.GlobalNavigation.TermStoreId = New-Object System.Guid("6ffccd26-5aba-44a5-83a9-60a468261054")
    $navSettings.GlobalNavigation.TermSetId = New-Object System.Guid("420d7ef6-6040-4138-8e15-1d04773955ba")
    $navSettings.Update()

    This gets you the dreaded "Error loading navigation: The Managed Navigation term set is improperly attached to the site":



    So where does that leave us?  It means we have to move the local-global of 2013 out of the ribbon and back to where it was in 2010 (directly below the ribbon), implement our own global navigation provider in the ribbon, and add the breadcrumb back to the main content placeholder:

    [Screenshot: customized navigation with a global provider in the ribbon and local-global below it]

    How did I do this?  Well, it's not pretty.  Especially when you realize you have to make your own custom master page; not only that, but just about every site definition has some variation on the basic seattle.master, which shows up in the Search Center and My Site templates.  That means you pretty much have to implement a custom master page for each one (which I have done in 2013).  Although it is a lot of work, it is well worth it when your customer gives you the thumbs up and is actually able to navigate the sites inside and outside of SharePoint with ease!  A few steps to get here:

    • Create a custom navigation provider that points to a global list with a hierarchy of elements (or wherever you keep the data); a minimal provider sketch follows this list
    • Remove the ribbon navigation (local-global) and replace it with your navigation provider's menu control
    • Put the removed ribbon navigation directly below the ribbon and add the older CSS to give it the top and bottom border
    • Add the breadcrumb directly above the main content placeholder
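
    The navigation provider itself is just an ASP.NET site map provider.  The following is only a rough sketch, assuming a flat custom list named "GlobalNavigation" with a plain-text "LinkUrl" column at a hard-coded portal URL; all of those names are placeholders, and caching, error handling and security trimming are omitted.

    using System.Web;
    using Microsoft.SharePoint;

    public class GlobalNavProvider : StaticSiteMapProvider
    {
        private SiteMapNode _root;

        public override SiteMapNode BuildSiteMap()
        {
            lock (this)
            {
                if (_root != null)
                {
                    return _root;
                }

                _root = new SiteMapNode(this, "root", "/", "Home");
                AddNode(_root);

                // Read the list of links and hang them off the root node.
                SPSecurity.RunWithElevatedPrivileges(delegate()
                {
                    using (SPSite site = new SPSite("http://portal"))   // placeholder URL
                    using (SPWeb web = site.OpenWeb())
                    {
                        SPList list = web.Lists["GlobalNavigation"];    // placeholder list
                        foreach (SPListItem item in list.Items)
                        {
                            SiteMapNode node = new SiteMapNode(
                                this,
                                item.ID.ToString(),
                                item["LinkUrl"] as string,
                                item.Title);
                            AddNode(node, _root);
                        }
                    }
                });

                return _root;
            }
        }

        protected override SiteMapNode GetRootNodeCore()
        {
            return BuildSiteMap();
        }
    }

    The provider then gets registered in web.config under the siteMap/providers section and surfaced through an AspMenu control in the custom master page, in place of the ribbon's local-global menu.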

    As a second spin on this…what does it mean for O365 customers that want to migrate their intranet, or anything else they did, to SharePoint Online?  It's not good, I'm afraid.  You aren't allowed to deploy your own server-side code, so custom navigation providers are out.  That means you won't get global navigation in SharePoint Online unless you figure out a way to do it via a custom <DIV> that is populated by JavaScript from some data source (preferably an SPList via REST).  But then you have to implement all the flyout code yourself (unless there is some reliable and simple SDK you can find on the internet that plugs into your SP REST calls, and you hope that SharePoint CSS doesn't mess with it).  And don't forget about the lovely JavaScript security that prevents cross-domain calls.

    One other option I considered for the global-global was to put it in the first row of the 2013 page.  I had lots of issues with the div tags and flyouts of the basic ASP.NET and SharePoint menu controls.  Although I'm sure I could figure it out at some point, it ended up being much easier to put the global-global in the ribbon and move the local-global down.

    So, with this little bit of education, hopefully we can get someone on the product team to get serious about providing this type of functionality (global-global) to meet our customers' needs without us having to do anything.  I still think MMS is the way to go, but I'd guess there are issues with doing this in O365, and that's why it was limited to one site collection.

    Chris

    Taking Office Web Apps 2013 and SharePoint 2013 integration one step further

    Office Web Apps and SharePoint are integrated in several very cool ways.  For example, you can see the callout menu on a document library to create new documents:

    You can view document previews in the callout of a document:

    You can see the document preview in the callout of a search result:

    One of the things I noticed right off the bat with Office Web Apps 2013 was that the callout menu has the same options for all Office content of a given type, regardless of where the document is stored.  This causes some issues when users try to click on the links.

    An example of this is the "Follow" link.  If you click on the "Follow" link of a document that is stored in a file share, it will error with the following:

    I'm sure you could get into the display template and remove that action, but it wasn't as important as some of the other items on my plate when it comes to OWA and SharePoint Search integration.  For example, you will also notice that you do not get the ability to view a document that lives in the file share in the thumbnail previewer.  I find this unacceptable and so did my customer!  So…I started out on the path to figure out how to get it working!  Here's what I came up with.

    The first thing I thought was cool about Office Web Apps is the ability to "embed" documents that live pretty much anywhere into your web pages.  If you open the "http://svr-owa/op/generate.aspx" page, you will see you have the ability to create an embedded link:

    Once you create the link to a file share document, you can put it on any of your HTML or SharePoint pages.  However, after creating the link, if you try to open it, you will typically get a "File not found" error:

    It turns out this is the generic wrapper error for just about any error that happens in OWA.  The main reason the files won't display is permission issues on the Office Web Apps server.  You see, it doesn't open the file as you; it opens it as the Office Web Apps service identity!  You can find out what that identity is by opening IIS Manager on the OWA server and looking for the OpenFromUrlWeb app pool:

    By default, this account is set to "Network Service".  This account can't do much with secured network resources unless you assign it those permissions.  That means doing the whole "give DOMAIN\COMPUTERNAME$ access to your file share" routine for every OWA server in your OWA farm.  It also implies that any software running on your OWA servers will then have access to your file share.  If only OWA is installed you should be OK, but don't forget about all the other services running as NETWORK SERVICE in the services applet.  I'm not a big fan of this, so in my case I used a specific OWA domain account that has at least read access to the shares that contain your data.  NOTE:  This change is not supported by Microsoft, but let's be clear about what unsupported means.  There are two types of unsupported features: ones they don't have the scripts for at the first level of support, and ones they do have at the second and third levels of support.  In my eyes, this falls into the first category and is a simple change, but they have not trained anyone on how to do this or troubleshoot it, so it is at your own discretion.

    So why does OWA work like this?  The reason lies in the way that Office Web Apps must access a file in order to render it.  When the file is stored in SharePoint, SharePoint passes an OAuth access token that OWA can use to access the file as the user, which ensures that OWA only ever accesses files the user can access.  When accessing through the OpenFromUrl path, OWA has to access the file as itself using regular Windows auth, and that has security implications.  Search only shows files the user has access to, so the method below isn't really a security hole by itself, but things get interesting if a savvy user figures out how to construct the web page viewer URL for a file they don't have access to.  That effectively elevates their privileges and lets them look at the file (though not change it).  In that case, you should place a "Deny" ACL for the OWA account (whether a domain account or Network Service) on the document to prevent it from being read.
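
    If you want to script that deny ACL rather than click through the file share UI, something like the sketch below works; the account name and path are placeholders, and this is only an illustration of the idea, not a hardened tool.

    using System.IO;
    using System.Security.AccessControl;

    class DenyOwaAccount
    {
        static void Main()
        {
            // Placeholders: the sensitive document and the OWA service account.
            var file = new FileInfo(@"\\fileserver\share\sensitive.docx");
            FileSecurity acl = file.GetAccessControl();

            // Deny read access to the OWA identity so it cannot render this document.
            acl.AddAccessRule(new FileSystemAccessRule(
                @"CONTOSO\svc-owa",
                FileSystemRights.Read,
                AccessControlType.Deny));

            file.SetAccessControl(acl);
        }
    }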

    If you aren't comfortable making these changes, then don't.  You could also take the not-so-great approach of simply copying your file share (all 5TB of it) into SharePoint to gain the functionality.

    Once that is done, you should now be able to see your documents open via the generated embed link:

    So now that that works, how do I get SharePoint Search to open the file as a preview?  Hmmm…tricky.  Let's look at how it handles files that live in SharePoint.  To do that, we have to open the search display templates for Office documents.  The first one is for Word documents and is called "Item_Word_HoverPanel.html" (yes, each file type has a different display template, so you'll have to repeat these steps for each).  You will notice some JavaScript that looks for a specific property ("ServerRedirectedEmbedURL"):

    If this property exists (it is only populated by search if the content lives in SharePoint and is of a specific file type), the template renders the Office Web Apps preview area in the callout.  To get OWA to work for file share files, we have to add a URL that opens the file in the browser view or in the embedded iframe.  I wasn't able to get the iframe to work, but I did get a link that users can click to open the file in another browser window.  You can do this by modifying the file to look like the following:

     <!--#_
            var i = 0;
            var wacurlExist = !Srch.U.e(ctx.CurrentItem.ServerRedirectedURL) && !Srch.U.e(ctx.CurrentItem.ServerRedirectedEmbedURL);
            var id = ctx.CurrentItem.csr_id;
            ctx.CurrentItem.csr_FileType = Srch.Res.file_Word;
            ctx.CurrentItem.csr_ShowFollowLink = true;
            ctx.CurrentItem.csr_ShowViewLibrary = true;
            ctx.currentItem_IsOfficeDocument = true;
            var find = '/';
            var re = new RegExp(find, 'g');
            
            function replaceAll(find, replace, str) {
      return str.replace(new RegExp(find, 'g'), replace);
    }
            
    _#-->
            <div class="ms-srch-hover-innerContainer ms-srch-hover-wacSize" id="_#= $htmlEncode(id + HP.ids.inner) =#_">
                <div class="ms-srch-hover-arrowBorder" id="_#= $htmlEncode(id + HP.ids.arrowBorder) =#_"></div>
                <div class="ms-srch-hover-arrow" id="_#= $htmlEncode(id + HP.ids.arrow) =#_"></div>
                <div class="ms-srch-hover-content" id="_#= $htmlEncode(id + HP.ids.content) =#_" data-displaytemplate="WordHoverPanel">
                    <div id="_#= $htmlEncode(id + HP.ids.header) =#_" class="ms-srch-hover-header">
                        _#= ctx.RenderHeader(ctx) =#_
                    </div>
                    <div id="_#= $htmlEncode(id + HP.ids.body) =#_" class="ms-srch-hover-body">
                    <!--#_
                    if ((ctx.CurrentItem.FileType == "docx") && Srch.U.n(ctx.CurrentItem.ServerRedirectedEmbedURL))
                    {
                            ctx.CurrentItem.csr_DataShown = true;
                            ctx.currentItem_ShowChangedBySnippet = true;

                    _#-->
    <a href="https://svr-owa.contoso.com/op/view.aspx?src=_#= $urlHtmlEncode(ctx.CurrentItem.Path.replace('file:','').replace(re,'%5C')) =#_" target="_blank">View File</a>

                    <!--#_
                    }
                    _#-->

    <!--#_
                        if(!Srch.U.n(ctx.CurrentItem.ServerRedirectedEmbedURL))
                        {
                            ctx.CurrentItem.csr_DataShown = true;
                            ctx.currentItem_ShowChangedBySnippet = true;
    _#-->
                            <div class="ms-srch-hover-viewerContainer">
                                <iframe id="_#= $htmlEncode(id + HP.ids.viewer) =#_" src="_#= $urlHtmlEncode(ctx.CurrentItem.ServerRedirectedEmbedURL) =#_" scrolling="no" frameborder="0px" class="ms-srch-hover-viewer"></iframe>
                            </div>
                            <div class="ms-srch-hover-wacImageContainer">
                            <img id="_#= $htmlEncode(id + HP.ids.preview) =#_" alt="_#= $htmlEncode(Srch.Res.item_Alt_Preview) =#_" onload="this.parentNode.style.display='block';" />
                            </div>
    <!--#_
                        }
                        else
                        {
                            ctx.CurrentItem.csr_ShowLastModifiedTime = true;
                            ctx.CurrentItem.csr_ShowAuthors = true;
                        }

                        if(!Srch.U.e(ctx.CurrentItem.SectionNames))
                        {
                            ctx.CurrentItem.csr_DataShown = true;
    _#-->
                            <div class="ms-srch-hover-subTitle"><h3 class="ms-soften">_#= $htmlEncode(Srch.Res.hp_SectionHeadings) =#_</h3></div>
    <!--#_
                            var sectionNames = Srch.U.getArray(ctx.CurrentItem.SectionNames);

                            var sectionIndexes = Srch.U.getArray(ctx.CurrentItem.SectionIndexes);
                            if(!Srch.U.n(sectionIndexes) && sectionIndexes.length != sectionNames.length)
                            {
                                sectionIndexes = null;
                            }

                            var hitHighlightedSectionNames = Srch.U.getHighlightedProperty(id, ctx.CurrentItem, "sectionnames");
                            if(!Srch.U.n(hitHighlightedSectionNames) && hitHighlightedSectionNames.length != sectionNames.length)
                            {
                                hitHighlightedSectionNames = null;
                            }

                            var numberOfSectionsToDisplay = Math.min(Srch.SU.maxLinesForMultiValuedProperty, sectionNames.length);
                            var sectionsToDisplay = new Array();

                            var usingHitHighlightedSectionNames = Srch.SU.getSectionsForDisplay(
                                hitHighlightedSectionNames,
                                numberOfSectionsToDisplay,
                                sectionsToDisplay);

                            for(i = 0; i < sectionsToDisplay.length; ++i)
                            {
                                var index = sectionsToDisplay[i];
                                if(Srch.U.n(index))
                                {
                                    continue;
                                }

                                var tooltipEncoded = $htmlEncode(sectionNames[index]);

                                var htmlEncodedSectionName = "";
                                if(usingHitHighlightedSectionNames)
                                {
                                    htmlEncodedSectionName = hitHighlightedSectionNames[index];
                                }
                                else
                                {
                                    htmlEncodedSectionName = tooltipEncoded;
                                }
    _#-->
                                <div class="ms-srch-hover-text ms-srch-ellipsis" id="_#= $htmlEncode(id + HP.ids.sectionName + i) =#_" title="_#= tooltipEncoded =#_">
    <!--#_
                                    if(!Srch.U.n(sectionIndexes) && sectionIndexes.length >= i && !Srch.U.e(sectionIndexes[index]) && wacurlExist)
                                    {
                                        var encodedSectionIndex = "&wdparaid=" + $urlKeyValueEncode(sectionIndexes[index]);
    _#-->
                                        <a clicktype="HoverSection" linkIndex="_#= $htmlEncode(i) =#_" href="_#= $urlHtmlEncode(ctx.CurrentItem.ServerRedirectedURL + encodedSectionIndex) =#_" target="_blank">
                                            _#= htmlEncodedSectionName =#_
                                        </a>
    <!--#_
                                    }
                                    else
                                    {
    _#-->
                                        _#= htmlEncodedSectionName =#_
    <!--#_
                                    }
    _#-->
                                </div>
    <!--#_
                            }
                        }
    _#-->
                        _#= ctx.RenderBody(ctx) =#_
                    </div>
                    <div id="_#= $htmlEncode(id + HP.ids.actions) =#_" class="ms-srch-hover-actions">
                        _#= ctx.RenderFooter(ctx) =#_
                    </div>
                </div>
    <!--#_
                if(!Srch.U.n(ctx.CurrentItem.ServerRedirectedEmbedURL)){
                    AddPostRenderCallback(ctx, function(){
                        HP.loadViewer(ctx.CurrentItem.id, ctx.CurrentItem.id + HP.ids.inner, ctx.CurrentItem.id + HP.ids.viewer, ctx.CurrentItem.id + HP.ids.preview, ctx.CurrentItem.ServerRedirectedEmbedURL, ctx.CurrentItem.ServerRedirectedPreviewURL);
                    });
                }
    _#-->
            </div>

    This will get the "View File" link to display for the file.  Users can then click it to open the file in a new browser window and view the files!

    Enjoy!
    Chris

    Content Type Hub publishing in mixed mode sites (14 vs 15) – Upgrade and migration planning

    I noticed this post today from Brad Teed…It's a good one!:

    http://sharepointsblog.com/2013/05/06/sharepoint-2013-content-type-hub/

    I would have assumed that the content types would be the same in either mode, but evidently not!  There is a check in the code at Microsoft.SharePoint.Taxonomy.ContentTypeSync.Internal.PackageInfo.ValidatePackageVersion().  It checks the hub site collection's compatibility level and then checks the current web's site collection compatibility level.  If they don't match, it errors out.  This is a bummer for those that used the Content Type Hub in 2010 and must now upgrade all site collections to "15" at the same time in order to get updates from the hub.
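
    You can see the same condition from the outside with a quick check of SPSite.CompatibilityLevel on the hub and on a consuming site collection.  The sketch below is just a diagnostic aid with placeholder URLs, not the internal validation code.

    using System;
    using Microsoft.SharePoint;

    class CompatibilityCheck
    {
        static void Main()
        {
            // Placeholder URLs for the content type hub and a subscribing site collection.
            using (SPSite hub = new SPSite("http://portal/sites/cthub"))
            using (SPSite consumer = new SPSite("http://portal/sites/team"))
            {
                Console.WriteLine("Hub: {0}, Consumer: {1}",
                    hub.CompatibilityLevel, consumer.CompatibilityLevel);

                if (hub.CompatibilityLevel != consumer.CompatibilityLevel)
                {
                    // Mirrors the condition that makes ValidatePackageVersion() reject the package.
                    Console.WriteLine("Compatibility levels differ - content type publishing will error out.");
                }
            }
        }
    }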

    However, there is something else interesting here.  The APIs being used by content type hub publishing are the Content Deployment APIs, so the package being checked is a simple export/import package like you would create for anything else.  The thing that jumps out at me, which I would have assumed was possible, is exporting from 2010 and importing into 2013 using those APIs, but it looks like that is not a good path to take given that they specifically block you from doing it.  This tells me that you should upgrade your older sites to 2013 first, then do the export, then do the import into 2013.

    Chris