Tuesday 8 May 2007

Crawled properties for HTML documents

We are currently running an interesting project that uses the MOSS index engine together with Tamino XML Server. One part of the project is to index HTML documents stored in Tamino XML Server and to present the search results through the MOSS Query Web Service.

The HTML contains the classic meta tag <meta name="Keywords" content="">, which some crawlers still use. Here we ran into a problem. After the content source had been set up and the HTML documents had been crawled, the "Keywords" property did not show up in the crawled properties view, although other meta tags did, e.g. <meta name="Region" content="Sweden">. For some reason the "Keywords" meta tag is skipped.
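When a meta tag does not show up among the crawled properties, it can help to first confirm that it is actually present and well-formed in the HTML being crawled. Below is a minimal Python sketch using the standard library's `html.parser` that lists the name/content pairs of all meta tags in a document. It does not reproduce how the MOSS crawler filters meta tags; it only shows what is in the markup. The function name `list_meta_tags` and the sample document are made up for illustration.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            # Keep the original casing of the name, e.g. "Keywords".
            # A missing content attribute is reported as an empty string.
            self.meta[name] = attributes.get("content") or ""


def list_meta_tags(document: str) -> dict:
    """Hypothetical helper: return {name: content} for every named meta tag."""
    collector = MetaTagCollector()
    collector.feed(document)
    collector.close()
    return collector.meta


sample = """<html><head>
<meta name="Keywords" content="tamino, xml, search">
<meta name="Region" content="Sweden">
</head><body>...</body></html>"""

print(list_meta_tags(sample))
```

Running it on the sample lists both the Keywords and the Region tag. If Keywords is listed here but still missing from the crawled properties view, the problem lies with the crawler rather than the markup, which is the situation we ran into.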

The solution in this case was to add another meta tag, "KeywordsCust", containing the same information as the "Keywords" tag in the source. After a full crawl the new meta tag appeared in the crawled properties. Then just map it to the "Keywords" managed property and you are ready to go.
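If the extra tag has to be added by post-processing the HTML before it is crawled, the copy could be done roughly as in the sketch below. This assumes you are able to rewrite the documents at some point before the crawl; `add_custom_keywords_tag` is a hypothetical helper, not part of MOSS or Tamino. It copies the content of the first Keywords meta tag into a new KeywordsCust tag placed directly after it, and returns the document unchanged if there is no Keywords tag or a KeywordsCust tag already exists.

```python
import html
from html.parser import HTMLParser


class KeywordsTagFinder(HTMLParser):
    """Records all meta tag names plus the raw text and content of the first Keywords tag."""

    def __init__(self):
        super().__init__()
        self.names = set()
        self.raw_tag = None
        self.content = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        self.names.add(name)
        if name == "keywords" and self.raw_tag is None:
            # get_starttag_text() returns the tag exactly as written in the source.
            self.raw_tag = self.get_starttag_text()
            # html.parser has already unescaped entities in the attribute value.
            self.content = attributes.get("content") or ""


def add_custom_keywords_tag(document: str, new_name: str = "KeywordsCust") -> str:
    """Hypothetical helper: duplicate the Keywords meta tag under a new name."""
    finder = KeywordsTagFinder()
    finder.feed(document)
    finder.close()
    if finder.raw_tag is None or new_name.lower() in finder.names:
        return document  # nothing to copy, or the copy is already there
    # Re-escape the value so characters like & and " stay valid inside the attribute.
    new_tag = '<meta name="{}" content="{}">'.format(
        new_name, html.escape(finder.content, quote=True))
    # Insert the copy right after the original tag; the original is left in place.
    return document.replace(finder.raw_tag, finder.raw_tag + "\n" + new_tag, 1)


page = '<head>\n<meta name="Keywords" content="tamino, xml">\n</head>'
print(add_custom_keywords_tag(page))
```

Keeping the original Keywords tag means other crawlers that still read it are unaffected, and the check for an existing KeywordsCust tag makes the step safe to run more than once before a full crawl.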

Two test tools for the Query Web Service in MOSS / SPPS
http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=89b3cda7-aad9-4919-8faf-34ef9b28c57b
http://www.mosssearch.com/searchwebservice.html
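With the mapping in place, the "Keywords" managed property can be used in a query sent to the Query Web Service, for example from one of the test tools above. The sketch below builds a QueryPacket XML string in Python. The element names, the urn:Microsoft.Search.Query namespace, the "STRING"/"MSSQLFT" query types and the keyword query `Keywords:"tamino"` follow the MOSS 2007 query schema as we understand it; treat them as assumptions and check them against the SharePoint Server SDK. `build_query_packet` is a hypothetical helper.

```python
from xml.sax.saxutils import escape

# Assumed MOSS 2007 QueryPacket layout; verify against the SharePoint Server SDK.
QUERY_PACKET_TEMPLATE = (
    '<QueryPacket xmlns="urn:Microsoft.Search.Query" Revision="1000">'
    '<Query domain="QDomain">'
    '<SupportedFormats>'
    '<Format>urn:Microsoft.Search.Response.Document.Document</Format>'
    '</SupportedFormats>'
    '<Context>'
    '<QueryText language="en-US" type="{query_type}">{query_text}</QueryText>'
    '</Context>'
    '</Query>'
    '</QueryPacket>'
)


def build_query_packet(query_text: str, query_type: str = "STRING") -> str:
    """Return a QueryPacket string for the Query Web Service (hypothetical helper).

    query_type is assumed to be "STRING" (keyword syntax) or "MSSQLFT" (SQL syntax).
    """
    if query_type not in ("STRING", "MSSQLFT"):
        raise ValueError("unsupported query type: " + query_type)
    # Escape &, < and > so the query text cannot break the XML.
    return QUERY_PACKET_TEMPLATE.format(
        query_type=query_type, query_text=escape(query_text))


# Keyword query restricted to the managed property mapped from KeywordsCust.
print(build_query_packet('Keywords:"tamino"'))
```

As far as we know, the resulting string is what goes into the queryXml parameter of the service's Query method (search.asmx under /_vti_bin/ on the MOSS site), which is also what the test tools let you paste in.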

1 comment:

Anonymous said...

thanks for the post. I referenced you at http://forums.technet.microsoft.com/en-US/sharepointsearch/thread/ea8c9ee1-85b9-4f92-b382-76818138800e/