ATTENTION: Database Leak in SharePoint 2010!!!

I have been doing a massive Oracle Portal migration, 3000+ pages of content, with a targeted subset of that in my dev envrionment at just over 3GB of content.  I noticed the other day that my development content database kept getting bigger and bigger.  Running a "dbcc checkdb" pointed me to the largest tables.  After clearing the Audit logs, EventLogs and everything else I could think of, then shrinking the database files, the databse size was still WAY bigger than what it was supposed to be (it should be 35GB, but it is currently 72GB).  The dbcc command pointed to the AllDocStreams table as the source of the problem.  I thought it possible the new web recycle bin may be the culprit, so I set out to find the root cause and prove to the community that "Redmond, we have a problem"

My first test was to create a web and then delete it over and over again.  That didn't change the row sizes in the database.  Next test was to do the same thing, only add a document to the doc library.  That didn't cause any leaks.  The last step was to enable versioning and to upload a document a few times, then delete the webs…VIOLA!  Found the leak!  Want to try this on your own…here is the PowerShell that proves that we have a big issue here:

function CheckRows($table, $conn)
{
$sql = "select count(*) as count from $table";
$cmd = new-object system.data.sqlclient.sqlcommand($sql, $conn)

$reader = $cmd.executereader()

while($reader.read())
{
$line = $reader["count"].tostring()
}

$reader.close()

"$table has " + $line + " rows"
}

function ReportRowCounts($cdb)
{

$connString = $cdb.LegacyDatabaseConnectionString

$conn = new-object system.data.sqlclient.sqlconnection
$conn.connectionstring = $connstring
$conn.open()

CheckRows "AllDocs" $conn
CheckRows "AllDocStreams" $conn
CheckRows "AllDocVersions" $conn

$conn.close()

}

function UploadFile($web)
{

$list = $web.lists["Shared Documents"]
$list.EnableModeration = $true
$list.EnableMinorVersions = $true
$list.contenttypesenabled = $true
$list.update()

$spFolder = $list.rootfolder
$spFileCollection = $spFolder.Files
$file = get-item "c:scriptsexport.zip"
$spfile = $spFileCollection.Add($file.name,$file.OpenRead(),$true)
$spfile.checkout()
$spfile.checkin("checkin")
$spfile.approve("approved")

}

$cdb = get-spcontentdatabase "WSS_Content_Contoso"
$web = get-spweb
http://www.contoso.com

ReportRowCounts($cdb)

#create a site
$subweb = $web.webs.add("Test1", "Test1", "", 1033, "STS#0", $false, $false);

ReportRowCounts($cdb)

$subweb.delete()

ReportRowCounts($cdb)

$subweb = $web.webs.add("Test1", "Test1", "", 1033, "STS#0", $false, $false);
UploadFile $subweb
UploadFile $subweb
UploadFile $subweb
UploadFile $subweb
UploadFile $subweb

ReportRowCounts($cdb)

$subweb.delete()

ReportRowCounts($cdb)

You will see the AllDocStreams table continues to get larger and larger.  This is the table that has your BLOBS!  The worst table to have a database leak!  From a high level, what does this mean?  It means if you turn on versioning, and say upload a 1Gb file and then make 5 changes to it, you will have 5GB of storage used in your database.  If you then delete the web, you will lose file pointers in the database and that 5GB will remain in the database forever and never get cleaned up.  Now, how likely is it that you are creating and deleting webs in production?  Not too many people will have a fancy process setup like this, but in Development and QA, you will.  I can't afford to keep restorting a "clean" copy of my development databases everytime it gets too big.

NOTE:  This bug shows up in all builds after SP1 (the web recycle build), my environment is the latest release build of 14.6112.5000.

Here's how to determine if you are having a problem and how BIG it is:

$global:count = 0
$global:size = 0

function CheckDatabase($cdb)
{
$connString = $cdb.LegacyDatabaseConnectionString
$conn = new-object system.data.sqlclient.sqlconnection
$conn.connectionstring = $connstring
$conn.open()

$sql = "select count(*) as count, sum(datalength(content)) as size from alldocstreams where id not in (select id from alldocs)";
$cmd = new-object system.data.sqlclient.sqlcommand($sql, $conn)
$reader = $cmd.executereader()

while($reader.read())
{
$count = [double]::parse($reader["count"].tostring())
$size = [double]::parse($reader["size"].tostring())

}

$reader.close()
$conn.close()

$global:Count += $count
$global:size += $size

}

$cdbs = get-spcontentdatabase

foreach ($cdb in $cdbs)
{
CheckDatabase $cdb
}

"Orphaned rows: " + $global:count
"Orphaned rows size: " + $global:size + " bytes"

Microsoft…you need to get us a cumulative updatehot fix RIGHT NOW.

UPDATE:  AUGUST 2012 Cumulative Update fixes this bug:
http://technet.microsoft.com/en-us/sharepoint/ff800847.aspx

Not one to say enjoy too, but at least I know what is going on now and so do you!
Chris

Follow me on twitter: @givenscj

Check out the previous blog post:
https://blogs.architectingconnectedsystems.com/blogs/cjg/archive/2012/04/20/Project-Server-PSI-WCF-bug-_2D00_-WSEC_5F00_CAT_5F00_UID-and-WSEC_5F00_GRP_5F00_UID.aspx