So as some of you know, I'm on each of the MS Press IT Pro books that will be released on SharePoint 2013. In one of the books, I'm talking about the architecture and how the content databases have changed. One of those changes is Shredded Storage. Yes, I have read Bill Baer's post here and he has a second one here, and I agree with everything he says (even though he doesn't really use the internal architecture's terminology in several cases), except for one point which I bring up at the end. I have also read Dan Holme's blog here. He pretty much comes to the same conclusions that I do. He even nailed it with the max 64K chunk size as the default (but the files don't max out at the 64K limit, they end up somewhere below it).
So, even after reading all this TechNet and SPC12 mumbo jumbo, I really wanted to see this stuff working for myself, and that required digging in deep to learn how it really works, given all the blogs that are incorrect. In that exploration of the assemblies, classes, tables and stored procedures I have had a few aha moments. Some facts:
- Shredded Storage is document focused and, from a storage standpoint, only valuable with versioning turned on. It is not content database or farm focused. What does that mean? It means that when a document is "shredded" (SPHostBlob), the shreds are specific to the document and no database-wide hash of the shreds is computed. In practice, if you upload the same file in two different places, the same shreds will be created twice and no optimization takes place.
- This means that SharePoint makes a "better" effort at managing blobs. It is not what I would call "great" or "stellar" like the solution that StorSimple built. You are still going to need an RBS solution that will aggregate the new "small shredded blobs" in a de-duping fashion. But BE VERY CAREFUL WITH THIS: as Jeremy points out here, the RBS performance hit for small shreds is not worth it! At this point, I'd say that using Shredded Storage is more of a performance hit than it is worth, because you now have a table that will have anywhere from 10-4500 shreds for each file. Multiply that by however many files you have...and that is a very large number of rows in a single table. You must also take into account the CPU cycles it takes to append the shreds together before the file is sent back to the client (this is NOT done by a coalesce on the SQL Server). All for the sake of reducing your storage by a marginal amount?
- Shredded Storage works by creating what I am calling a "stream map". This map is stored in the "DocsToStreams" table. The first time you upload a file it is shredded into smaller parts (except in a few cases where it does not shred the document and just stores a single blob). When you upload a second, updated file, the WFE will query for two non-file based shreds (these add around 10K or more to each file). These configuration shreds hold shred information that the WFE uses to determine which shreds need to be saved back to the database. As part of that, a new stream map is built that uses any old shreds that didn't change, plus the new shreds. All shreds in a "stream map" are kept in a specific order called the BSN, from the lowest BSN to the highest BSN. To serve the file, these shreds are put back together in that order (minus the configuration shreds) to re-create the file, which is then passed back to the calling client.
- Some important classes to note in the object model:
- SPFileStreamManager - Responsible for Computing the Streams (ComputeStreamsToWrite)
- SPFileStreamStore - Saves the SPHostBlobs back to the database (PutBlobs), and gets blobs from the database (GetBlobs*)
- When using full Office clients, the full files are sent to SharePoint (only if the first call to cellstorage.svc fails...keep reading below). This is done by making an HTTP PUT to the url of the document. SharePoint, via its Cobalt classes (CobaltStream, which derives from the core Cobalt classes), is responsible for doing the actual shredding and comparisons. I did NOT see the Office client (2010 or 2013) do any type of intelligent saving of documents based on PackageParts in Office files (again, this does work if the first call succeeds).
- I did confirm that the shreds are generated no matter what the file type is. This makes me think the shredding is somewhat random (up to the default 64K limit for a shred) and will not catch everything perfectly (i.e. half the change lands in one shred and half in another, rather than in one shredded component). I tried to figure out how the shredding works, but things get really crazy when you get into the Cobalt classes because there are too many abstract classes lying around.
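To make the "stream map" idea from the bullets above concrete, here is a toy Python sketch. It is NOT how the Cobalt classes do it: the fixed 64K chunking, the SHA-256 comparison, the class name ToyShredStore and the shred ids are all assumptions made for illustration (the real shredding algorithm is exactly the part I could not work out, and the BSN ordering is simplified here to plain list order). It only shows the two behaviors described above: unchanged shreds are reused across versions of one document, and nothing is shared across documents.

```python
import hashlib

SHRED_SIZE = 64 * 1024  # assumption: fixed-size chunks; the real algorithm differs


class ToyShredStore:
    """Toy model: shreds are keyed per document, one stream map per version."""

    def __init__(self):
        self.shreds = {}       # (doc_id, shred_id) -> bytes
        self.stream_maps = {}  # (doc_id, version) -> ordered list of shred ids
        self.next_id = 1

    def save(self, doc_id, version, data):
        """Store a version; return how many new shreds had to be written."""
        # Only the previous version of the SAME document is consulted.
        prev = self.stream_maps.get((doc_id, version - 1), [])
        known = {
            hashlib.sha256(self.shreds[(doc_id, sid)]).hexdigest(): sid
            for sid in prev
        }
        stream_map, written = [], 0
        for offset in range(0, len(data), SHRED_SIZE):
            chunk = data[offset:offset + SHRED_SIZE]
            sid = known.get(hashlib.sha256(chunk).hexdigest())
            if sid is None:  # changed or new chunk: write a new shred
                sid = self.next_id
                self.next_id += 1
                self.shreds[(doc_id, sid)] = chunk
                written += 1
            stream_map.append(sid)
        self.stream_maps[(doc_id, version)] = stream_map
        return written

    def read(self, doc_id, version):
        """Rebuild the file by appending the shreds in stream-map order."""
        ids = self.stream_maps[(doc_id, version)]
        return b"".join(self.shreds[(doc_id, sid)] for sid in ids)


store = ToyShredStore()
v1 = bytes(i % 251 for i in range(200 * 1024))  # 200 KB of sample data
v2 = bytearray(v1)
v2[70000] ^= 0xFF                               # change one byte in the 2nd chunk
v2 = bytes(v2)

written_v1 = store.save("docA", 1, v1)    # first upload: every chunk is written
written_v2 = store.save("docA", 2, v2)    # update: only the changed chunk
written_copy = store.save("docB", 1, v1)  # same bytes, other document: no reuse
print(written_v1, written_v2, written_copy)
```

In this toy model the second version only writes the one changed shred, but uploading the identical file as a different document writes everything again, and every read has to join all the shreds back together on the front end.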
UPDATE: See this blog post for my more recent experiences with editing with Office Clients and intelligent updating (it does work)...
Back to one of the above points. I was not able to get the Office clients (2010 or 2013) to do any type of smart updating as mentioned in Bill's post (with Word and PowerPoint with a particular client build). In other words, I wanted only the changes I had made to be sent to SharePoint. I had never really tested this before and was looking forward to seeing it in action, but alas, it doesn't seem to do it at all. I'm guessing only in a multi-user editing mode (OWA?) will you see this type of feature being utilized (UPDATE: and this guess was correct! See below!). Just as an FYI, here's what I did with Office and SharePoint OM calls:
- Used our awesome friend Fiddler to monitor the traffic
- Uploaded a PowerPoint file to SharePoint
- Opened the PowerPoint in PowerPoint Client
- Removed a slide from the PowerPoint, saved it
- Office sends the entire file to SharePoint (the Content-Length header of the PUT request shows that this IS the case)
- SharePoint does the shredding and creates any new shreds
- Put the slide back in (Ctrl-Z), saved the file, and again the whole file is sent
- A new version and similar shreds are created, but some are retained from the first upload/second update
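If you want to repeat the check from the steps above, the comparison is simply the PUT's Content-Length against the size of the file on disk. A minimal Python sketch follows; the function name, the 5% tolerance and the byte counts in the example are all made up for illustration and are not numbers from my capture.

```python
def looks_like_full_upload(content_length, file_size, tolerance=0.05):
    """Return True when a PUT body is within `tolerance` of the full file size.

    content_length: value of the Content-Length header seen in the capture.
    file_size: size in bytes of the saved document on disk.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    return abs(content_length - file_size) <= tolerance * file_size


# Hypothetical numbers: a 2.4 MB deck and two possible PUT sizes.
print(looks_like_full_upload(2_400_000, 2_400_000))  # whole file on the wire
print(looks_like_full_upload(18_000, 2_400_000))     # only a small delta
```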
If anyone knows how to get this working reliably with Office Clients and SharePoint OM, please let me know. Otherwise, I'm going to have to say we are getting duped on this whole "delta" changes from Office client nonsense and there is no network optimization going on between Office Client and WFE. In this scenario, Shredded Storage is really just saving us a few bytes here and there (if versioning is turned on), which does reduce the number of writes, but at the cost of more "reads" and CPU to rebuild the files.
UPDATE: Office Web Apps and Shredded Storage - this is where you will see the wire optimization between the client (OWA) and the WFE. It works like this. When a request for a file is made from SharePoint, OWA will ask the WFE to give it the file. The file will be built by the WFE from the shreds. When two users open the file for editing, a new type of shred container is created called a "partition". This partition contains the shreds that each user is working on. This is where the shreds get broken down to their smaller XML pieces. As each part of a document is changed, new partitions are created. When someone wants to see what another person has done, they will request this new partition and it (and only it) will get sent to the client. Any changes that are made are also sent individually; the entire file is not.
So where does that put us? Here are the reality and the correct details about Shredded Storage (as every blog on the internet about Shredded Storage is wrong). You have to test Shredded Storage with 3 things in mind to see if you really are getting any benefits (whether storage or network based).
- With versioning turned on (you gain the storage benefit, without it, you don't gain any storage benefit)
- With Office Web Apps (you gain the client to WFE network optimization and the "partitioning" effect in multi-user editing)
- When using Office or SharePoint clients, no matter what, you gain a WFE to SQL Server network optimization, but only when writing a document back (there is no wire benefit between Office and SharePoint clients and the WFE when the first call to cellstorage.svc fails)
If you want more information at a super deep level, buy our MS Press book in October!