Integrating Content & Search Results with SharePoint

How We Built the SharePoint Connector for Confluence - Part 2

Integrated Search
The goal for integrated search is to have both SharePoint and Confluence show the same search results containing content from both systems.

Integrated Search Overview
Integrated search was implemented by setting up MOSS to search Confluence and redirecting the Confluence search to the MOSS search site. The redirect was done rather easily through a custom Confluence plug-in setting; setting up MOSS to search Confluence took more work and is discussed below.

Initially we considered writing a custom Protocol Handler [4] to search Confluence, but quickly realized that using the Web site content source in MOSS was going to be much quicker to implement. Unfortunately, we found that the Web site search didn't work if the site required forms-based authentication, which is a typical setup for Confluence. Fortunately, our network engineer, Jerry Rasmussen, found a Knowledge Base article [5] that discusses the problem and provides a solution (via a hotfix).

The manual setup for search was fairly involved, requiring the user to configure several entities, one of which required editing an XML file and loading it through the AddRule.exe tool described in the Knowledge Base article. This was because the hotfix only updated the API, not any of the data entry screens. To spare users this involved process, we created custom search configuration screens.

In the end, using the hotfix provided in the KB article worked but wasn't a complete solution, because Web site searches from MOSS don't store ACL [6] information when the content is being crawled by the search engine. The solution was to implement a custom security trimmer [7].

The custom search configuration screens and custom security trimmer are discussed below.

Custom Search Configuration
What we needed was an easy way to configure the following search items:

  • Content Source
  • Crawl Rule (with forms-based authentication configuration)
  • Scope
  • Registered Security Trimmer
To allow for searching multiple Confluence sources, we decided to create one screen for managing the list of Confluence search sources and another for creating an individual source. As with the site administration screen for the Web parts, we needed a custom action that referenced our configuration screen.

<CustomAction
    Id="ConfluenceSearchSettings"
    GroupId="Search"
    Location="Office.Server.ServiceProvider.Administration"
    Sequence="33300"
    Title="Confluence Search Settings">
    <UrlAction Url="_layouts/Atlassian/ManageConfluenceSearch.aspx" />
</CustomAction>

Unlike the Web parts, this custom action is defined in a feature scoped to the farm level instead of the site collection. The result of this custom action is the link shown in the Shared Services Administration home page under the "Search" group (see Figure 2). The Shared Services Administration home page is found by going to SharePoint 3.0 Central Administration and clicking on the SSP link (e.g., "SharedServices1").

Similar to the site administration screen (see Figure 3 and Figure 4), we used existing SharePoint administration screens as the basis for our two administration screens. To keep the created entities together, a tagging mechanism was used. The ContentSource class has a Tag property that we used to store a semicolon-delimited string of IDs for the security trimmer, crawl rule, and scope.
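The tagging idea can be illustrated with a minimal sketch. The ConfluenceSearchTag class, its property names, and the order of the three IDs are assumptions made for illustration; the article only says that the IDs are stored as a semicolon-delimited string in ContentSource.Tag. The sketch has no SharePoint dependency, so the resulting string would still have to be assigned to the Tag property by the caller.

```csharp
using System;

// Hypothetical helper (not from the original connector): packs and unpacks
// the three IDs kept in ContentSource.Tag as "trimmer;crawlRule;scope".
public sealed class ConfluenceSearchTag
{
    public string TrimmerId { get; }
    public string CrawlRuleId { get; }
    public string ScopeId { get; }

    public ConfluenceSearchTag(string trimmerId, string crawlRuleId, string scopeId)
    {
        TrimmerId = Require(trimmerId, nameof(trimmerId));
        CrawlRuleId = Require(crawlRuleId, nameof(crawlRuleId));
        ScopeId = Require(scopeId, nameof(scopeId));
    }

    // An ID that is empty or contains the delimiter could not be read back.
    private static string Require(string value, string name)
    {
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("ID must not be empty.", name);
        if (value.Contains(";"))
            throw new ArgumentException("ID must not contain ';'.", name);
        return value;
    }

    // Produces the semicolon-delimited string to store in the Tag property.
    public override string ToString()
    {
        return string.Join(";", TrimmerId, CrawlRuleId, ScopeId);
    }

    // Reads a stored tag back into its three IDs.
    public static ConfluenceSearchTag Parse(string tag)
    {
        if (tag == null) throw new ArgumentNullException(nameof(tag));
        string[] parts = tag.Split(';');
        if (parts.Length != 3)
            throw new FormatException("Expected three semicolon-delimited IDs.");
        return new ConfluenceSearchTag(parts[0], parts[1], parts[2]);
    }
}
```

Keeping the encoding in one place means the management screen can find the crawl rule, scope, and trimmer that belong to a content source (for example, when deleting it) by parsing a single string.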

The more difficult aspect was figuring out how to register a security trimmer programmatically [8]. Documentation on how to do this with stsadm is easy to find, but figuring out how to do it without shelling out to stsadm was a little more challenging.

Custom Security Trimmer
A custom security trimmer is a .NET class that implements the ISecurityTrimmer interface and runs at query time to determine which URLs in the search results the current user has access to. The primary method for a security trimmer is the CheckAccess method.

public BitArray CheckAccess(
    IList<string> documentCrawlUrls,
    IDictionary<string, object> sessionProperties)

The implementation of the security trimmer takes a subset of the documentCrawlUrls provided by culling out pages we didn't want to show in the search results. This was a way to implement a more sophisticated exclusion crawl rule than you can through the search configuration. Then we simply call a Confluence Web Service that takes in a set of URLs and returns an array of Booleans. Finally, we merge the skipped URLs with the Confluence permissions and return a BitArray.
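The three steps described above (cull, ask Confluence, merge) can be sketched as follows. This is not the connector's actual code: the ConfluenceTrimming class and the isExcluded and checkPermissions delegates are hypothetical stand-ins for the exclusion logic and the Confluence Web Service call, so that the sketch stays self-contained and free of SharePoint references.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ConfluenceTrimming
{
    // isExcluded: hypothetical rule for pages that should never be shown.
    // checkPermissions: hypothetical stand-in for the Confluence Web Service,
    // returning one Boolean per URL, in the same order as the URLs passed in.
    public static BitArray CheckAccess(
        IList<string> documentCrawlUrls,
        Predicate<string> isExcluded,
        Func<IList<string>, bool[]> checkPermissions)
    {
        // All bits start as false, so culled URLs stay hidden by default.
        var result = new BitArray(documentCrawlUrls.Count);

        // Step 1: cull unwanted pages, remembering each survivor's position.
        var candidates = new List<string>();
        var candidateIndexes = new List<int>();
        for (int i = 0; i < documentCrawlUrls.Count; i++)
        {
            if (isExcluded(documentCrawlUrls[i])) continue;
            candidates.Add(documentCrawlUrls[i]);
            candidateIndexes.Add(i);
        }
        if (candidates.Count == 0) return result;

        // Step 2: a single round trip for the whole batch of remaining URLs.
        bool[] allowed = checkPermissions(candidates);
        if (allowed.Length != candidates.Count)
            throw new InvalidOperationException(
                "Permission check returned an unexpected number of results.");

        // Step 3: merge the answers back into their original positions.
        for (int j = 0; j < allowed.Length; j++)
            result[candidateIndexes[j]] = allowed[j];

        return result;
    }
}
```

In a real trimmer, the two delegates would be supplied by the class that implements CheckAccess with the two-parameter signature shown earlier. Sending all remaining URLs in one call keeps the number of round trips to Confluence at one per CheckAccess invocation.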

Since this security trimmer runs at query time, performance must be a consideration. However, the security trimmer is only given a relatively small number of URLs if the ratio of URLs accessible by the user to total URLs is relatively high. Basically, the query engine wants to show enough results to fill a page. If the search results page shows 10 results per page, the query engine may provide 15 URLs at a time. If the user has access to fewer than 10 of them, another 15 may be requested. When the user requests the next search results page, the security trimmer is invoked again.
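To make the cost concrete, the batching behavior described above can be modeled with a small simulation. This is only an assumed model of the query engine, built from the 10-per-page and 15-per-batch figures in the example; the TrimmerBatchingModel class and its parameters are hypothetical, and the real engine's batching logic is not documented here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TrimmerBatchingModel
{
    // Counts how many trimmer calls this simplified model needs to fill one
    // results page. rankedUrls is the full ranked result list; trimmer plays
    // the role of CheckAccess for one batch of URLs.
    public static int CountTrimmerCalls(
        IList<string> rankedUrls,
        Func<IList<string>, BitArray> trimmer,
        int pageSize,
        int batchSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        int calls = 0, visible = 0, offset = 0;

        // Keep asking for batches until the page is full or results run out.
        while (visible < pageSize && offset < rankedUrls.Count)
        {
            int count = Math.Min(batchSize, rankedUrls.Count - offset);
            var batch = new List<string>(count);
            for (int i = 0; i < count; i++) batch.Add(rankedUrls[offset + i]);

            BitArray access = trimmer(batch);
            calls++;

            // Count the URLs in this batch that the user is allowed to see.
            foreach (bool allowed in access)
                if (allowed) visible++;

            offset += count;
        }
        return calls;
    }
}
```

Under this model, with 45 ranked URLs, a page size of 10, and a batch size of 15, a user who can see everything triggers one trimmer call, while a user who can see only every third URL triggers two. The lower the ratio of accessible URLs, the more calls (and Confluence round trips) each results page costs.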


More Stories By Kirk Liemohn

Kirk Liemohn is a principal software engineer with ThreeWill. His recent project experience includes Microsoft Office SharePoint Server (MOSS) enterprise search projects as well as a Windows SharePoint Services (WSS) business analysis portal. Kirk manages a SharePoint blog at http://www.implementingsharepoint.com.

More Stories By Chris Edwards

Chris Edwards is a senior software engineer with ThreeWill. His project roles have ranged from development/technical lead to development resource. He is certified as MCSD using Microsoft .NET and as MCTS: SharePoint Services 3.0, Application Development. Chris manages resource links related to WSS at http://wssresourceguide.com.
