Thursday, July 21, 2011

Pipeline Extensibility - Logging

Currently, the only way to log information from a pipeline extensibility stage is to write a file to the LocalLow directory of the user running the FAST service. I usually use this to debug what data is coming into the pipeline while I am transforming it. The small console application below simply copies the pipeline's input file into that directory.

Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq
Imports System.Text

Namespace Search
    Class PipelineLogger

        Shared Sub Main(ByVal args As String())
            ' args(0) is the input file supplied by the pipeline (%(input)s);
            ' args(1) is the name of the user running the FAST service.
            WriteOutInputFile(args(0), args(1))
        End Sub

        ' Copy the input file to a location the application has permission to write to.
        Private Shared Sub WriteOutInputFile(ByVal inputFile As String, ByVal username As String)
            Dim localLow As String = "c:\users\" & username & "\appdata\LocalLow"

            ' CreateDirectory does nothing if the directory already exists.
            Dim pipelineInputData As String = IO.Path.Combine(localLow, "PipelineExtensibilityLog")
            IO.Directory.CreateDirectory(pipelineInputData)

            ' Name each copy by timestamp so successive documents do not overwrite each other.
            Dim outFile As String = IO.Path.Combine(pipelineInputData, DateTime.Now.ToString("yyyyMMddHHmmss.ffff") & ".xml")
            IO.File.Copy(inputFile, outFile)
        End Sub

    End Class
End Namespace

Then, in %FASTSEARCH%\etc\pipelineextensibility.xml, I list the crawled properties that I want to debug
(MOSS2010.Search is the user running the FAST service):
<Run command="C:\FASTSearch\etc\PipelineLogger\PipelineLogger.exe %(input)s MOSS2010.Search">
    <Input>
        <CrawledProperty propertySet="d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1" varType="31" propertyName="VIEWS"/>
        <CrawledProperty propertySet="11280615-F653-448F-8ED8-2915008789F2" varType="31" propertyName="url"/>
    </Input>
    <Output>
    </Output>
</Run>

Pipeline Extensibility - Checking for a Value

This post covers an interesting quirk I found when retrieving the value of a web (non-SharePoint) property in pipeline extensibility.
Say, for example, you have a web page with a meta tag named COST.

<meta name="COST" content="2000">
When the value comes into your custom pipeline extensibility .exe, it looks like this:

<CrawledProperty propertySet="d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1" varType="31" propertyName="COST">2000  2000</CrawledProperty>

Notice that the value appears twice in the property. The character that separates the two copies is U+2029, the Unicode paragraph separator (ChrW(&H2029) in VB.NET).

To work around this, split the value on that character and take the first entry.

Dim cArray As Char() = {ChrW(&H2029)}

Dim postCategoryValueArray As String() = catXE.Value.Trim().Split(cArray, System.StringSplitOptions.RemoveEmptyEntries)

Dim propValue As String = postCategoryValueArray(0)
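
For reference, here is a minimal, self-contained sketch of the same workaround that can be run outside the pipeline. It is only an illustration: the FirstValue helper, the module name, and the in-memory sample document are my own hypothetical names, and the <Document> wrapper element is an assumption about the shape of the input file, so check it against a file captured with the logger before relying on it.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module SplitCrawledPropertyExample

    ' U+2029 (paragraph separator) is the delimiter seen between the duplicated values.
    Private ReadOnly Separator As Char() = {ChrW(&H2029)}

    ' Hypothetical helper: returns the first non-empty value of the named
    ' crawled property, or Nothing when the property is missing or empty.
    Function FirstValue(ByVal doc As XDocument, ByVal propertyName As String) As String
        Dim prop As XElement = doc.Descendants("CrawledProperty").FirstOrDefault(
            Function(e) e.Attribute("propertyName") IsNot Nothing AndAlso
                        e.Attribute("propertyName").Value = propertyName)
        If prop Is Nothing Then Return Nothing

        Dim parts As String() = prop.Value.Trim().Split(Separator, StringSplitOptions.RemoveEmptyEntries)
        Return If(parts.Length > 0, parts(0).Trim(), Nothing)
    End Function

    Sub Main()
        ' Sample input built in memory. The <Document> root is an assumption,
        ' not something confirmed by the post.
        Dim doc As New XDocument(
            New XElement("Document",
                New XElement("CrawledProperty",
                    New XAttribute("propertyName", "COST"),
                    "2000" & ChrW(&H2029) & "2000")))

        Console.WriteLine(FirstValue(doc, "COST"))
    End Sub

End Module
```

Returning Nothing for a missing property is a design choice in this sketch. It lets the caller decide whether an absent meta tag should be skipped or treated as an error, instead of failing on an empty array index.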