
SharePoint or, as a previous boss would have called it, “bloody SharePoint”, has a powerful search service that can index not only SharePoint documents and sites but also external content. Unfortunately, the flexibility of the service breeds considerable complexity, and the out-of-the-box behaviour leaves a little to be desired.

Our most recent issue was that we had the standard definition of the ‘Title’ managed property, which has the crawled property ‘ows_Title’ as its highest priority. This, you would think, would mean that the Title entered on the List Item for a document via the Edit Properties form would be the title shown for each document in search results. You would be wrong.

It turns out that there is a feature called ‘Optimistic Title Override’ that tries to guess a better title from the content of the first page of the document (for example, a large, bold line of text). Similar features override the Author (CreatedBy) and LastModifiedTime.

In SharePoint 2010 these features can be disabled via the registry. In 2013 they can’t be disabled – thankfully we are running 2010 at the moment!
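For reference, the disable switch lives under the Gathering Manager key on the crawl servers. The registry fragment below is what I understand the 2010 values to be, but verify the exact names and path on your own farm, and note that a restart of the SharePoint Search service plus a full crawl are needed before results change:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Global\Gathering Manager]
"EnableOptimisticTitleOverride"=dword:00000000
"EnableLastModifiedOverride"=dword:00000000
"EnableAuthorOverride"=dword:00000000
```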

We’ve learnt a lot of lessons during the implementation of our SharePoint project. The number one lesson to take home is this: SharePoint can’t handle a large number of lists (Document Libraries) in a single site.

We tried a structure with 20,000+ libraries in a single site and, while accessing each of the libraries was fine, it was impossible to enumerate the lists in SharePoint Designer, and deploying content type changes took an eternity. Our new structure has 20 libraries, each containing 1,000 folders (Document Sets). None of the previous problems exist.
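A flat structure like our new one still needs a deterministic way to decide which library and folder a given item lands in. The sketch below is one illustrative way to do that bucketing; the hashing scheme, names, and counts are my own assumptions, not a description of what SharePoint itself does:

```python
# Illustrative sketch: spread ~20,000 case folders across
# 20 document libraries of 1,000 folders each, instead of
# creating 20,000 sibling libraries in one site.
import hashlib

LIBRARIES = 20
FOLDERS_PER_LIBRARY = 1000

def locate(case_id: str) -> tuple[str, str]:
    """Map a case identifier to a (library, folder) pair deterministically."""
    digest = int(hashlib.md5(case_id.encode("utf-8")).hexdigest(), 16)
    bucket = digest % (LIBRARIES * FOLDERS_PER_LIBRARY)
    library = bucket // FOLDERS_PER_LIBRARY   # 0..19
    folder = bucket % FOLDERS_PER_LIBRARY     # 0..999
    return f"Library{library:02d}", f"Case{folder:04d}"

lib, folder = locate("CASE-000123")
```

Because the mapping is a pure function of the identifier, any part of the solution can compute an item’s location without a lookup table.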

The SharePoint limitations documentation talks about the number of items in a list and various other limitations, but never mentions any practical limitation on the number of lists per site… so let this be a warning: a top-heavy design will topple over.

In trying a “bring the kitchen sink” approach to market, SharePoint ends up almost failing to recognise that the kitchen is for cooking. My workplace has been implementing a relatively complex solution that involves the creation of more than 20,000 document libraries, each containing up to 10,000 documents. As such, we have constantly been challenged by the inherent limitations in SharePoint’s ability to handle large lists and large numbers of lists.

You might rightly ask, “why not just change the solution design to fit nicely within the SharePoint limitations?” True, that would be a reasonable suggestion, were it not that flexibility is part of the whole SharePoint rationale. My issue is that many of these limitations seem to stem from the underlying schema and could easily be overcome if Microsoft dedicated resources to ‘fixing’ (IMHO) the schema.

SharePoint stores every list in the AllLists table of the content database, and all ListItems (documents etc.) in the AllUserData table. The AllUserData table has a fixed number of varchar, integer, and other columns, and jams each ListItem into one or more of these rows. It’s a highly inefficient design, but a highly flexible one.
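To make the “jamming” concrete, here is an illustrative sketch (not SharePoint’s actual code) of how a wide, generic table maps typed fields onto fixed slot columns. The column-name style (nvarchar1, int1, …) mirrors AllUserData, but the slot counts and logic are invented for illustration:

```python
# Illustrative: map a list item's typed fields onto a fixed pool of
# generic slot columns, the way a wide table like AllUserData does.
SLOT_COUNTS = {"nvarchar": 64, "int": 16, "float": 12, "datetime": 8}

def map_item_to_row(fields):
    """Jam a list item's fields into generic slot columns.

    fields: list of (name, slot_type, value) tuples in schema order.
    Returns {generic column name: value}, e.g. {"nvarchar1": ...}.
    """
    next_slot = {slot_type: 1 for slot_type in SLOT_COUNTS}
    row = {}
    for name, slot_type, value in fields:
        n = next_slot[slot_type]
        if n > SLOT_COUNTS[slot_type]:
            # The real table spills into additional rows per item;
            # this sketch just flags the overflow instead.
            raise OverflowError(f"out of {slot_type} slots for {name!r}")
        row[f"{slot_type}{n}"] = value
        next_slot[slot_type] = n + 1
    return row

row = map_item_to_row([
    ("Title", "nvarchar", "Quarterly report"),
    ("CaseNumber", "int", 42),
    ("Owner", "nvarchar", "jsmith"),
])
# 'Title' lands in nvarchar1, 'Owner' in nvarchar2, 'CaseNumber' in int1.
```

Note what is lost: the column called nvarchar2 means “Owner” for this content type but something entirely different for the next, which is exactly why the table is useless to query directly.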

The downside of the design is that all our data for 20,000+ lists and 650,000+ documents sits in two tables. One of those tables is littered with columns we don’t need, and our data is totally inaccessible via SQL due to the convoluted schema. So what could be done? Microsoft could take a CRM-like approach and instead create a new ‘ContentTypeData’ table for each content type, with columns that actually match the content type schema.
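As a sketch of what that per-content-type approach could look like, the snippet below generates a dedicated table from a content type definition. The table naming, type mapping, and field names are my own assumptions for illustration, not anything Microsoft has specified:

```python
# Hedged sketch: emit a dedicated SQL table per content type, with
# real typed columns, instead of one generic wide table. SQL Server
# indexing and SSRS could then work against these columns directly.
SQL_TYPES = {"text": "NVARCHAR(255)", "int": "INT", "date": "DATETIME2"}

def content_type_table(name, fields):
    """Emit CREATE TABLE DDL for one content type.

    fields: {column name: logical type}, e.g. {"Title": "text"}.
    """
    columns = [f"    [{column}] {SQL_TYPES[t]}" for column, t in fields.items()]
    body = ",\n".join(["    [ItemId] INT PRIMARY KEY"] + columns)
    return f"CREATE TABLE [ContentTypeData_{name}] (\n{body}\n);"

ddl = content_type_table("Invoice", {"Title": "text", "Amount": "int", "Due": "date"})
print(ddl)
```

With a schema like this, “show me all invoices over a threshold” is a one-line indexed query rather than a trawl through generic slot columns.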

Advantages: no more separate index tables (SQL Server indexing can do the work), and tables/views that SSRS can query directly, instead of needing to create some XML service or a data warehouse to get the data to SSRS.

Disadvantages: changing the underlying data structure need not impact the object model at all, so aside from requiring a large upgrade process whenever the schema changes, I can’t see any real disadvantages. Can you?