Categorization quality depends on the category definitions: the more accurate the definition, the better the categorization results. To define effective categories, you can work on several aspects:
You can use keywords, that is, evidence terms and their respective confidence values. Evidence terms can be simple terms or phrases, for which you can choose to apply stemming analysis or to preserve the phrase order.
A new category can start with one simple term already defined: the name of the category. The category name can appear as text or as the keyword @implied. Either form appears only when the category class for this category has the Generate evidence from category name option selected.
You can define patterns using regular expressions to match specific terms like phone numbers or social security numbers. The Content Intelligence Services Administration Guide provides the detailed procedure for defining patterns.
You can set property rules that define category assignments according to the values of repository attributes.
You can use evidence from other categories by setting category links.
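To illustrate the kind of pattern mentioned above for phone numbers and social security numbers, here is a minimal Python sketch. The two expressions and the find_pattern_evidence helper are hypothetical, cover only common US formats, and are not taken from CIS; the pattern syntax that CIS accepts may differ, so refer to the Administration Guide for the actual procedure.

```python
import re

# Hypothetical patterns for illustration only (US formats).
# The regular expression syntax accepted by CIS may differ.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pattern_evidence(text):
    """Return every match of each named pattern found in the text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = "Call 555-123-4567 about SSN 123-45-6789."
matches = find_pattern_evidence(sample)
print(matches)
```

Testing a pattern against a few sample strings like this, before adding it to a category definition, helps confirm that it matches the intended terms and nothing else.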
When you have created a taxonomy and provided evidence terms for each category, test how well the category definitions guide the CIS server in categorizing documents.
Submit some test documents to the CIS server. If the CIS server does not assign some documents to the categories you expect, consider revising the category thresholds or the evidence associated with the categories.
If documents appear in a category where they do not belong, the evidence for that category is too broad: consider adding terms or increasing the confidence thresholds. If documents that belong in the category do not appear in it, the evidence is too restrictive: consider lowering the thresholds.
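One way to review a test run is to compare, for each category, the documents you expected with the documents that were actually assigned. The following Python sketch assumes you have both lists at hand; the review_category function and the file names are hypothetical and are not part of CIS.

```python
def review_category(expected_docs, assigned_docs):
    """Compare expected and actual assignments for one category."""
    expected = set(expected_docs)
    assigned = set(assigned_docs)
    return {
        # Assigned but not expected: the evidence may be too broad.
        "unexpected": sorted(assigned - expected),
        # Expected but not assigned: the evidence may be too restrictive.
        "missing": sorted(expected - assigned),
    }


report = review_category(
    expected_docs=["a.doc", "b.doc", "c.doc"],
    assigned_docs=["b.doc", "c.doc", "d.doc"],
)
print(report)
```

A long "unexpected" list points toward tightening the evidence or raising thresholds; a long "missing" list points toward lowering them.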
If the category owner is required to approve too many documents, you can lower the on-target threshold while leaving the candidate threshold unchanged.
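The effect of the two thresholds can be sketched as follows. This Python example assumes that a document scoring at or above the on-target threshold is assigned directly, and that a document scoring between the candidate and on-target thresholds waits for the category owner's approval. The function name, the scores, and the threshold values are illustrative assumptions, not CIS defaults.

```python
def classify(score, candidate_threshold, on_target_threshold):
    """Return the assumed outcome for one document score."""
    if score >= on_target_threshold:
        return "assigned"
    if score >= candidate_threshold:
        return "candidate"  # assumed to require owner approval
    return "not assigned"


def pending_approvals(scores, candidate_threshold, on_target_threshold):
    """Count the documents that fall into the candidate band."""
    return sum(
        classify(s, candidate_threshold, on_target_threshold) == "candidate"
        for s in scores
    )


scores = [45, 62, 71, 88]  # hypothetical confidence scores
before = pending_approvals(scores, candidate_threshold=50, on_target_threshold=80)
after = pending_approvals(scores, candidate_threshold=50, on_target_threshold=70)
print(before, after)
```

With these sample values, lowering the on-target threshold from 80 to 70 narrows the candidate band, so fewer documents wait for approval while the candidate threshold stays the same.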