Each category definition lists the words and phrases that serve as evidence that a document belongs to the category. These words and phrases are called evidence terms. When CIS server analyzes a document, it looks for these terms, and determines whether to score a hit based on which terms it finds. For each document, CIS calculates the document score per category.
Each evidence term has a confidence value. The confidence value specifies how certain CIS server can be about scoring a hit for a document when it contains the term. The confidence values are also applicable when using category links as evidence for a category.
The following table illustrates the most common confidence values.
Table 15.2. Confidence values for evidence terms
Confidence value (0 – 100) | Description |
---|---|
High (75) | A term with high confidence is strong evidence that a document belongs to the category. For example, if a document includes the text IBM, CIS server can be nearly certain that the document relates to the category International Business Machines. Therefore, the confidence level for the term IBM is High. |
Medium (50) | A term with medium confidence is a good indicator that a document belongs to the category. |
Low (15) | A term with low confidence only suggests that the category might be appropriate. For example, if a document includes the text Big Blue, CIS server cannot be certain that it refers to International Business Machines. The confidence level is Low, meaning that CIS server should score a hit for the category International Business Machines only if it encounters the text Big Blue together with other evidence for the same category in the document. |
Supporting | This evidence by itself does not cause CIS server to score a hit for a document. However, it increases the confidence level of other evidence found in the same document. |
Exclude | If any evidence term found in a document has this confidence level, the document is not assigned to the category. For example, suppose you have a category for the company Apple Computers. The term Apple is certainly evidence of the category. However, if the term fruit appears in the same document, you can be fairly sure that Apple refers to the fruit and not the company. To capture this fact, you would add fruit as an excluded evidence term to the Apple Computers category. |
Required | Required terms must appear in the document, but they are not taken into account for the document score. If you define several required terms for a category, the document must contain at least one of them. If the category defines only required terms, a single required term is sufficient to assign the document to the category. If the category also defines other evidence terms, the document must contain at least one required term and also reach a confidence score high enough for the category. |
If the document score meets or exceeds the category's on-target threshold, CIS server assigns the document to the category. If the score is lower than the on-target threshold but higher than or equal to the candidate threshold, CIS server assigns the document to the category as a pending candidate. In that case, the category owner must review and approve the document to complete the assignment. If the score is lower than the candidate threshold, CIS server does not assign the document to the category.
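
The assignment rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not CIS server code: the function `classify_document`, the `Assignment` enum, the exact-match comparison of terms, and the threshold values 80 and 50 are all hypothetical. The document score is passed in as an input because the way CIS server calculates it is not described here.

```python
from enum import Enum


class Assignment(Enum):
    ASSIGNED = "assigned"
    PENDING_CANDIDATE = "pending candidate"
    NOT_ASSIGNED = "not assigned"


def classify_document(score, on_target, candidate, found_terms=(),
                      required_terms=(), excluded_terms=(),
                      only_required_defined=False):
    """Hypothetical helper that mirrors the rules in this section.

    score                 -- document score for the category (computed elsewhere)
    on_target, candidate  -- the category thresholds
    found_terms           -- evidence terms found in the document
    only_required_defined -- True if the category defines only required terms
    """
    found = set(found_terms)

    # Exclude: any excluded term found blocks the assignment.
    if found & set(excluded_terms):
        return Assignment.NOT_ASSIGNED

    # Required: at least one required term must be present.
    required = set(required_terms)
    if required:
        if not (found & required):
            return Assignment.NOT_ASSIGNED
        # With only required terms defined, one match is sufficient.
        if only_required_defined:
            return Assignment.ASSIGNED

    # Thresholds: on-target first, then candidate.
    if score >= on_target:
        return Assignment.ASSIGNED
    if score >= candidate:
        return Assignment.PENDING_CANDIDATE
    return Assignment.NOT_ASSIGNED


# Assumed thresholds for illustration: on-target 80, candidate 50.
print(classify_document(85, 80, 50).value)   # assigned
print(classify_document(60, 80, 50).value)   # pending candidate
print(classify_document(30, 80, 50).value)   # not assigned
print(classify_document(95, 80, 50,
                        found_terms={"Apple", "fruit"},
                        excluded_terms={"fruit"}).value)  # not assigned
```

Checking excluded and required terms before the thresholds reflects the table above: an excluded term overrides any score, and required terms act as a gate without contributing to the score.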