Umbraco Examine Search

I present to you my latest research on Umbraco Examine. I was really surprised at how easy it is to work with this module.
At the end of the article, you will find the documents behind this work.
What is Examine?
Examine is a provider-based indexer/searcher API that wraps the Lucene.Net indexing/searching engine. It is easy to work with and lets you index and query almost any content in a site.
Umbraco Examine is the Umbraco implementation of Examine. Examine itself is not exclusive to Umbraco and can be used as a completely standalone component in any project that needs a fast index.
License:
Examine is released under the Microsoft Public License (Ms-PL), which means we can use it free of charge, but we may not use the Examine name, logo, or trademarks. More info - here.
Implementation
- Basic configuration
The default configuration ships with the main Umbraco installation, so you can start using Examine search immediately.
Examine Terminology
Indexer - The indexer is the object that stores data in the index.
Searcher - The searcher is the object that searches the data stored in the index.
Index Set - An index set defines an index: where the index is saved and how information is stored in it.
Naming conventions - Our Indexer, Searcher, and associated Index Set must all be named according to convention so that they match.
Conventions:
{name}Indexer
{name}Searcher
{name}IndexSet
Examples:
ExternalIndexer
ExternalSearcher
ExternalIndexSet
After installing an Umbraco site, look in the config folder. It contains all the config files that Umbraco needs.
- /Config/ExamineIndex.config
By default, three index sets are configured: InternalIndexSet, InternalMemberIndexSet and ExternalIndexSet.
All index sets with the 'Internal' prefix are used by Umbraco itself, and we do not need to change anything there. The same applies to the searchers and indexers.
Umbraco provides the ExternalIndexSet for us:
<IndexSet SetName="ExternalIndexSet"
IndexPath="~/App_Data/TEMP/ExamineIndexes/External/" />
We can extend this configuration, as shown later under Custom configuration.
For now, we can search all site content (all fields and document types).
- /Config/ExamineSettings.config
This file contains the ExamineIndexProviders and ExamineSearchProviders sections. By default, each section contains three providers, internal and external, matching the three index sets.
ExternalIndexer
<add name="ExternalIndexer"
type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"/>
We can extend it with these additional properties:
dataService - the type that this provider will instantiate in order to query Umbraco for the data that it requires. Generally this shouldn't need to change unless you want to use test data from a non-Umbraco source or you have very custom requirements.
indexSet - explicitly specifies the index set to use. Generally, this is wired up based on naming conventions.
supportUnpublished - set this if you want the indexer to index content that is not published.
supportProtected - set this if you want the indexer to index content that is protected.
runAsync - processes the queue files into the index asynchronously. Unless you are testing, this should always be true.
interval - how often, in seconds, the async service processes the file queue.
analyzer - the Lucene.Net analyzer to use when storing data. See: http://www.aaron-powell.com/lucene-analyzer
enableDefaultEventHandler - automatically listens for Umbraco events and indexes when required.
logLevel - "Info" or "Verbose". Info is the default; Verbose shows more detailed logs.
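As a sketch of how these attributes fit together, the ExternalIndexer entry could be extended as shown here. The attribute names come from the list above; the values are illustrative assumptions rather than recommendations, and the analyzer shown is Lucene.Net's StandardAnalyzer.

```xml
<!-- Illustrative values only; adjust them to your own site. -->
<add name="ExternalIndexer"
     type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
     indexSet="ExternalIndexSet"
     supportUnpublished="false"
     supportProtected="true"
     runAsync="true"
     interval="10"
     analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"
     enableDefaultEventHandler="true" />
```

Setting indexSet explicitly is optional here, because the names already follow the {name}Indexer / {name}IndexSet convention.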
ExternalSearcher
<add name="ExternalSearcher"
type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
We can extend it with this additional property:
indexSet - explicitly specifies the index set to use. Generally, this is wired up based on naming conventions.
Note: In the ExamineSearchProviders section, defaultProvider must be the name of our search provider, e.g. ExternalSearcher. This is the default.
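A sketch of how the searcher section might look with the default provider and an explicit index set is shown here. The internal searchers that ship with Umbraco are left out for brevity, and the explicit indexSet attribute is optional because the names already match by convention.

```xml
<ExamineSearchProviders defaultProvider="ExternalSearcher">
  <providers>
    <!-- Internal searchers omitted; only the external one is shown. -->
    <add name="ExternalSearcher"
         type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
         indexSet="ExternalIndexSet" />
  </providers>
</ExamineSearchProviders>
```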
- Custom configuration
<IndexSet SetName="ExternalIndexSet"
          IndexPath="~/App_Data/TEMP/ExamineIndexes/External/">
  <IndexAttributeFields>
    <!-- List here all page properties
         that we want to be indexed. -->
    <add Name="id" />
    <add Name="version" />
    <add Name="parentID" />
    <add Name="writerID" />
    <add Name="creatorID" />
  </IndexAttributeFields>
  <IndexUserFields>
    <!-- List here all custom site properties
         that we want to be indexed. -->
    <add Name="testTitle" EnableSorting="true" />
    <add Name="testDescription" EnableSorting="true" />
  </IndexUserFields>
  <IncludeNodeTypes>
    <!-- List here all site document types
         that we want to be indexed. -->
    <add Name="Test"/>
  </IncludeNodeTypes>
  <ExcludeNodeTypes>
    <!-- List here all site document types
         that we do NOT want to be indexed. -->
  </ExcludeNodeTypes>
</IndexSet>
Basic code examples
Creating custom searcher:
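A minimal sketch of a search helper built on the ExternalSearcher might look like this. It assumes the Examine version bundled with Umbraco and the testTitle and testDescription fields from the custom index set above; the SiteSearch class and FindPageIds method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Examine;
using Examine.SearchCriteria;

public static class SiteSearch
{
    // Hypothetical helper: returns the ids of nodes whose testTitle
    // or testDescription field matches the given term.
    public static IEnumerable<int> FindPageIds(string term)
    {
        // Look up the searcher by the name configured in ExamineSettings.config.
        var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];

        // BooleanOperation.Or makes the first clause optional (SHOULD).
        var criteria = searcher.CreateSearchCriteria(BooleanOperation.Or);

        var query = criteria
            .Field("testTitle", term)
            .Or()
            .Field("testDescription", term)
            .Compile();

        // Each result carries the node id, a relevance score and the indexed fields.
        return searcher.Search(query).Select(result => result.Id).ToList();
    }
}
```

The term is passed to Lucene as-is, so input coming from users should be validated first (for example, rejecting empty strings) before it reaches a helper like this.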
Difference between Lucene boolean clauses – MUST and SHOULD
Assume that there are two clauses: Clause A and Clause B.
Clause A has SHOULD, Clause B has SHOULD
The document is a hit if it satisfies at least one of the clauses (A or B).
Clause A has MUST, Clause B has SHOULD
In this case, the document is a hit whenever it satisfies clause A, whether or not it satisfies clause B; clause B only influences the score.
If the document does not satisfy clause A, it is not a hit, whether or not it satisfies clause B.
Clause A has MUST, Clause B has MUST
In this case, the document is a hit only when it satisfies both clauses. If it fails to satisfy either one of them, it is not a hit.
There is a third operator, MUST_NOT. Use it for clauses that must not match the documents being returned.
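In Examine's fluent API, these Lucene operators correspond roughly to And() (MUST), Or() (SHOULD) and Not() (MUST_NOT). The sketch below assumes the Examine version bundled with Umbraco; the class name, method name and search values are hypothetical.

```csharp
using Examine;
using Examine.SearchCriteria;

public static class ClauseExamples
{
    // Hypothetical query: testTitle MUST contain "umbraco",
    // testDescription MUST NOT contain "draft".
    public static ISearchResults TitleWithoutDrafts()
    {
        var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];

        // The default operation passed here applies to the first clause.
        var criteria = searcher.CreateSearchCriteria(BooleanOperation.And);

        var query = criteria
            .Field("testTitle", "umbraco")       // MUST
            .Not()
            .Field("testDescription", "draft")   // MUST_NOT
            .Compile();

        return searcher.Search(query);
    }
}
```

Replacing Not() with Or() would turn the second clause into a SHOULD clause, which, next to a MUST clause, affects only the ranking of the hits.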
More examples:
Fuzzy:
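A fuzzy search matches terms that are similar to the input, which helps with misspellings. This sketch assumes the Fuzzy() string extension from the Examine.LuceneEngine.SearchCriteria namespace; the class name, method name and the 0.7 similarity value are illustrative.

```csharp
using Examine;
using Examine.LuceneEngine.SearchCriteria;
using Examine.SearchCriteria;

public static class FuzzyExample
{
    public static ISearchResults FuzzyTitleSearch(string term)
    {
        var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
        var criteria = searcher.CreateSearchCriteria(BooleanOperation.Or);

        // Fuzzy(0.7f) asks Lucene for terms similar to the input;
        // values closer to 1 require a closer match.
        var query = criteria
            .Field("testTitle", term.Fuzzy(0.7f))
            .Compile();

        return searcher.Search(query);
    }
}
```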
Boosting:
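Boosting raises the relevance of matches in one field over matches in another. This sketch assumes the Boost() string extension from the Examine.LuceneEngine.SearchCriteria namespace; the class name, method name and the boost factor of 5 are illustrative.

```csharp
using Examine;
using Examine.LuceneEngine.SearchCriteria;
using Examine.SearchCriteria;

public static class BoostExample
{
    public static ISearchResults TitleFirstSearch(string term)
    {
        var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
        var criteria = searcher.CreateSearchCriteria(BooleanOperation.Or);

        // A match in testTitle counts more than a match in testDescription.
        var query = criteria
            .Field("testTitle", term.Boost(5))
            .Or()
            .Field("testDescription", term)
            .Compile();

        return searcher.Search(query);
    }
}
```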
Building raw Lucene query:
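When the fluent API is not expressive enough, a query can be written directly in Lucene query syntax, where + marks a MUST clause, - a MUST_NOT clause, ~ a fuzzy term and ^ a boost. This sketch assumes the RawQuery method on the search criteria; the class name, method name and query text are hypothetical.

```csharp
using Examine;
using Examine.SearchCriteria;

public static class RawQueryExample
{
    public static ISearchResults RawSearch()
    {
        var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
        var criteria = searcher.CreateSearchCriteria();

        // testTitle must contain "umbraco" (boosted), testDescription may
        // fuzzily match "examine" and must not contain "draft".
        var query = criteria.RawQuery(
            "+testTitle:umbraco^5 testDescription:examine~0.7 -testDescription:draft");

        return searcher.Search(query);
    }
}
```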
Examine interface in Umbraco back office
Additional modules
Umbraco indexing for PDF files
This module adds a new indexer and searcher, called "PDFIndexer" and "PDFSearcher", to ~/Config/ExamineSettings.config,
and a new index set, called "PDFIndexSet", to ~/Config/ExamineIndex.config.
After that, you can search the selected PDF files through the Examine search API in the same way as in the other examples, but you need to specify the PDF searcher:
var searcher = ExamineManager
.Instance
.SearchProviderCollection["PDFSearcher"];
You can install this module via NuGet, either through the user interface or by typing this command in the Package Manager Console in Visual Studio:
Install-Package UmbracoCms.UmbracoExamine.PDF
Conclusion and negatives
Lucene is a very fast search engine, and with Umbraco Examine built on top of it, we have a very powerful tool in our hands.
As you can see, working with and managing all these resources is very easy, so I have nothing negative to say about Umbraco Examine.
Downloads: