Shad Storhaug edited this page Jan 23, 2015 · 16 revisions

BoboBrowse.Net is a faceted browse engine implemented on top of Lucene.Net. It was originally ported from the Java library developed by John Wang (http://javasoze.github.com/bobo/ ). While Lucene.Net is good with unstructured data, BoboBrowse.Net fills in the missing piece to handle semi-structured and structured data.

Bobo Browse is an information retrieval technology that provides navigational browsing into a semi-structured dataset. Beyond the result set from queries and selections, Bobo Browse also provides the facets from this point of browsing.

Because this is, for the most part, a line-by-line port of the Java version of Bobo-Browse, this guide is not intended to replace the original documentation. Instead, it supplements the existing Java documentation to get .NET developers up to speed on the differences.

Differences Between the Java Version and This One

For the most part, we remained true to the original API, but some refactoring was done to account for differences between Java and .NET conventions, generics, and scoping. Here are some general rules to keep in mind when reading documentation samples that are based on the Java version.

  1. All types from the com.browseengine.bobo.api Java package are in the root namespace BoboBrowse.Net. Other packages follow the same namespace conventions as they do in Java.
  2. The casing of all methods and properties was changed to Pascal case.
  3. Many set[Method] and get[Method] methods were changed to be properties. So, if you don't see a corresponding method, you should look for a property with the same name (minus the set or get prefix). For example, the BrowseRequest.setCount() method was changed to be a property named BrowseRequest.Count.
  4. All interface names were changed to follow the .NET convention of starting with an "I" prefix. For example, the FacetCountCollector interface is now named IFacetCountCollector.
  5. Java code refers to the generic type FacetHandler<D> as a raw FacetHandler type or with a wildcard generic FacetHandler<?>, neither of which is allowed in .NET. An IFacetHandler interface was therefore created so the type can be passed from one place to another without referring to the generic type D. So, wherever you see FacetHandler or FacetHandler<?> as a variable type in the Java documentation, use IFacetHandler; wherever you see FacetHandler<D> in the Java documentation, use FacetHandler<D> in .NET as well.
  6. The Close() method was renamed to Dispose(), and the types that expose it implement IDisposable, following the .NET convention for releasing resources deterministically. So instead of calling Close() explicitly in a try-finally block, you should use a using statement in .NET.
  7. A boolean parameter autoClose was added to each static factory method of BoboIndexReader to indicate whether the underlying IndexReader should be closed (and disposed) when Dispose is called on BoboIndexReader. By default, the reader is not closed. This is so it is easy to nest the using statements without getting an error message from Lucene.Net when the IndexReader is disposed twice.
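The ownership idea behind rules 6 and 7 can be sketched without any Lucene.Net or BoboBrowse.Net dependency. The types below (InnerReader, WrappingReader, OwnershipDemo) are hypothetical stand-ins invented for this illustration, not library classes, and the exception thrown on a second Dispose is only an assumption that imitates the double-dispose error mentioned above.

```csharp
using System;

// Hypothetical stand-in for an underlying reader; not a Lucene.Net type.
public sealed class InnerReader : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        // Assumption for illustration: fail loudly if disposed twice.
        if (IsDisposed) throw new ObjectDisposedException(nameof(InnerReader));
        IsDisposed = true;
    }
}

// Hypothetical decorator; autoClose mirrors the parameter described in rule 7.
public sealed class WrappingReader : IDisposable
{
    private readonly InnerReader inner;
    private readonly bool autoClose;

    public WrappingReader(InnerReader inner, bool autoClose = false)
    {
        this.inner = inner;
        this.autoClose = autoClose;
    }

    public void Dispose()
    {
        // Dispose the inner reader only when this wrapper owns it.
        if (autoClose) inner.Dispose();
    }
}

public static class OwnershipDemo
{
    // Nested using statements: each object is disposed exactly once.
    public static bool NestedUsing()
    {
        var inner = new InnerReader();
        using (inner)
        using (var wrapper = new WrappingReader(inner))
        {
            // Work with the wrapper here.
        }
        return inner.IsDisposed;
    }

    // A single using statement: the wrapper owns and disposes the inner reader.
    public static bool WrapperOwnsInner()
    {
        var inner = new InnerReader();
        using (var wrapper = new WrappingReader(inner, autoClose: true))
        {
            // Work with the wrapper here.
        }
        return inner.IsDisposed;
    }
}
```

With autoClose left at false, nesting both objects in using statements disposes the inner reader once. Passing autoClose: true in the nested form would dispose it twice, which is the situation the default is meant to avoid.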

It is helpful to have a working understanding of Lucene.Net, since Bobo-Browse uses a Lucene.Net IndexReader to obtain index data. You would first use Lucene.Net to build an index as shown in Create a Browse Index, then use Bobo-Browse to create a browse request.

Concepts

Here are some of the basic concepts or objects to understand:

  • BrowseSelection: A selection or filter to be applied, e.g. Color=Red. Note that a browse selection can contain multiple selected options, which will typically expand the number of hits.
  • FacetSpec: Specifies how facets are to be returned on the result object, e.g. Top 10 facets of car types ordered by count with a min count of 5.
  • BrowseRequest: A set of BrowseSelections, a keyword text query, and a set of FacetSpecs.
  • BrowseFacet: A facet (a string value with a hit count).
  • BrowseResult: Result of a browse operation.
  • FacetHandler: A plugin into the browse engine that knows how to manipulate facet data. See the list of Prebuilt Facet Handler Types.
  • BoboIndexReader: A Lucene.Net IndexReader containing a List of FacetHandlers.
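To make the selection and facet concepts above concrete, here is a minimal in-memory sketch using plain LINQ. It does not use BoboBrowse.Net and does not reflect how the library computes facets from a Lucene.Net index; FacetSketch, Select, CountFacets, and SampleDocs are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FacetSketch
{
    // Keep documents whose value for the field matches any selected value
    // (multiple values in one selection act as an OR).
    public static List<Dictionary<string, string>> Select(
        IEnumerable<Dictionary<string, string>> docs, string field, params string[] values)
    {
        return docs
            .Where(d => d.TryGetValue(field, out var v) && values.Contains(v))
            .ToList();
    }

    // Count hits per value of the field, drop values below minHitCount,
    // order by hit count descending, and keep at most maxCount facets.
    public static List<KeyValuePair<string, int>> CountFacets(
        IEnumerable<Dictionary<string, string>> hits, string field, int minHitCount, int maxCount)
    {
        return hits
            .Where(d => d.ContainsKey(field))
            .GroupBy(d => d[field])
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .Where(f => f.Value >= minHitCount)
            .OrderByDescending(f => f.Value)
            .ThenBy(f => f.Key, StringComparer.Ordinal)
            .Take(maxCount)
            .ToList();
    }

    // Four made-up documents with two string fields each.
    public static List<Dictionary<string, string>> SampleDocs()
    {
        return new List<Dictionary<string, string>>
        {
            new Dictionary<string, string> { ["color"] = "red",   ["category"] = "sedan" },
            new Dictionary<string, string> { ["color"] = "red",   ["category"] = "suv" },
            new Dictionary<string, string> { ["color"] = "blue",  ["category"] = "sedan" },
            new Dictionary<string, string> { ["color"] = "green", ["category"] = "sedan" },
        };
    }
}
```

On the sample data, selecting color "red" alone matches two documents, while selecting "red" and "blue" together matches three, which is the expansion described under BrowseSelection. Counting the "category" facet over those three hits with a minimum hit count of 2 leaves only "sedan" with 2 hits, in the spirit of a FacetSpec with a min count.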

A typical workflow goes something like this:

  1. Build a Lucene.Net index (using Lucene.Net). Typically, this is done out-of-band with the rest of the workflow: an index is built once and then searched multiple times. The fields used for BoboBrowse.Net are always of the string data type and are usually not tokenized.
  2. Set up facet handlers for the target fields in the Lucene.Net index. This can be done using Spring.Net XML configuration or during each request.
  3. Create a browse request instance and populate it with browse selections, an optional query, FacetSpec objects, and optional sorting instructions.
  4. Get an open Lucene.Net IndexReader.
  5. Decorate the Lucene.Net IndexReader with a BoboIndexReader.
  6. Create a BoboBrowser instance and pass it the BoboIndexReader.
  7. Call the BoboBrowser.Browse() method.
  8. Process the BrowseResult for display. The BrowseResult can contain a single page of hits (corresponding to Lucene.Net documents), a total hit count (all documents that matched), a list of BrowseFacets that match the search with the number of hits per facet, and other optional data depending on the options selected in the BrowseRequest.
  9. Clean up by calling Dispose() on the BoboIndexReader, BoboBrowser, and BrowseResult objects.

Example

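// namespaces assumed from Lucene.Net 3.0.x and the namespace rules described above
using System;
using System.Collections.Generic;
using BoboBrowse.Net;
using BoboBrowse.Net.Facets;
using BoboBrowse.Net.Facets.Impl;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// define facet handlers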

// color facet handler
SimpleFacetHandler colorHandler = new SimpleFacetHandler("color");

// category facet handler
SimpleFacetHandler categoryHandler = new SimpleFacetHandler("category");


// opening a lucene index
Directory idx = FSDirectory.Open(new System.IO.DirectoryInfo("myidx"));

using (IndexReader reader = IndexReader.Open(idx, true))
{

	// decorate it with a bobo index reader
	using (BoboIndexReader boboReader = BoboIndexReader.GetInstance(
		reader, new IFacetHandler[] { colorHandler, categoryHandler }))
	{
		// creating a browse request
		BrowseRequest br = new BrowseRequest();
		br.Count = 10;
		br.Offset = 0;

		// add a selection
		BrowseSelection sel = new BrowseSelection("color");
		sel.AddValue("red");
		br.AddSelection(sel);

		// parse a query
		QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, 
			"contents", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30));
		Query q = parser.Parse("cool car");
		br.Query = q;

		// add the facet output specs
		FacetSpec colorSpec = new FacetSpec();
		colorSpec.OrderBy = FacetSpec.FacetSortSpec.OrderHitsDesc;

		FacetSpec categorySpec = new FacetSpec();
		categorySpec.MinHitCount = 2;
		categorySpec.OrderBy = FacetSpec.FacetSortSpec.OrderHitsDesc;

		br.SetFacetSpec("color", colorSpec);
		br.SetFacetSpec("category", categorySpec);

		// sort by color in descending order
		SortField colorSort = new SortField("color", SortField.STRING, true);    

		br.Sort = new SortField[] { colorSort };

		// perform browse
		IBrowsable browser = new BoboBrowser(boboReader);

		using (BrowseResult result = browser.Browse(br))
		{
			// search query result
			int totalHits = result.NumHits;
			BrowseHit[] hits = result.Hits;

			IDictionary<string, IFacetAccessible> facetMap = result.FacetMap;
			// obtain facet result
			IFacetAccessible colorFacets = facetMap["color"];
			IEnumerable<BrowseFacet> facetVals = colorFacets.GetFacets();
			foreach (BrowseFacet bf in facetVals)
			{
				Console.WriteLine(bf.Value + " " + bf.FacetValueHitCount);
			}
			// no explicit cleanup needed: the using blocks dispose the result and both readers
		}
	}
}

Resources

Bobo-Browse Resources

Lucene Resources