IndexingTools (DLESE Tools API Documentation v1.6.0)

DLESE Tools
v1.6.0

org.dlese.dpc.index.writer
Class IndexingTools

java.lang.Object
  org.dlese.dpc.index.writer.IndexingTools

public class IndexingTools
extends Object

Tools to aid in indexing.

Author:: John Weatherley

Field Summary
`static String`	`adminDefaultFieldName` Admin default field 'admindefault'
`static String`	`defaultFieldName` Default field 'default'
`static String`	`PHRASE_SEPARATOR` String used to separate and preserve phrases indexed as text, includes leading and trailing white space.
`static String`	`stemsFieldName` Stems field 'stems'

Constructor Summary
`IndexingTools()`

Method Summary
`static void`	`addToAdminDefaultField(org.apache.lucene.document.Document myDoc, String content)` Indexes the given text into the admin default field.
`static void`	`addToDefaultAndStemsFields(org.apache.lucene.document.Document myDoc, String content)` Indexes the given text into the default and stems fields.
`static String`	`encodeToTerm(String text)` Same as {org.dlese.dpc.index.SimpleLuceneIndex#encodeToTerm(String)}.
`static String`	`encodeToTerm(String text, boolean encodeWildCards)` Same as {org.dlese.dpc.index.SimpleLuceneIndex#encodeToTerm(String,boolean)}.
`static String[]`	`extractSeparatePhrasesFromString(String separatedPhrases)` Extracts the phrases from a String that was created using the method `makeSeparatePhrasesFromNodes(List nodes)` or `makeSeparatePhrasesFromStrings(List strings)`.
`static String[]`	`extractStringsFromString(String separatedWords)` Extracts the words from a String that was created using the method `makeStringFromNodes(List nodes)`.
`static String[]`	`getAnalyzedTerms(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)` Extracts all terms in any field from a Lucene query using the given `Analyzer`.
`static org.apache.lucene.analysis.Token[]`	`getAnalyzedTokens(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)` Extracts all `Token`s from a Lucene query using the given `Analyzer`.
`static StringBuffer`	`getAnalyzerOutput(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)` Creates a StringBuffer to display the tokens created by a given analyzer.
`static String`	`makeSeparatePhrasesFromNodes(List nodes)` Creates a String separated by the phrase separator term from the text of each of the Element or Attributes dom4j Nodes provided.
`static String`	`makeSeparatePhrasesFromStrings(List strings)` Creates a String separated by the phrase separator term from each of the Strings provided.
`static String`	`makeSeparatePhrasesFromStrings(String[] strings)` Creates a String separated by the phrase separator term from each of the Strings provided.
`static String`	`makeStringFromNodes(List nodes)` Creates a String separated by spaces from the text of each of the Element or Attributes dom4j Nodes provided.
`static String`	`tokenizeID(String ID)` Tokenizes a DLESE ID by replacing the char - with a blank space.
`static String`	`tokenizeURI(String uri)` Tokenizes a URI by replacing the unindexable chars with a blank space.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

defaultFieldName

public static final String defaultFieldName

Default field 'default'

See Also:: Constant Field Values

stemsFieldName

public static final String stemsFieldName

Stems field 'stems'

See Also:: Constant Field Values

adminDefaultFieldName

public static final String adminDefaultFieldName

Admin default field 'admindefault'

See Also:: Constant Field Values

PHRASE_SEPARATOR

public static final String PHRASE_SEPARATOR

String used to separate and preserve phrases indexed as text, includes leading and trailing white space.

See Also:: Constant Field Values

Constructor Detail

IndexingTools

public IndexingTools()

Method Detail

addToDefaultAndStemsFields

public static final void addToDefaultAndStemsFields(org.apache.lucene.document.Document myDoc,
                                                    String content)

Indexes the given text into the default and stems fields.

Parameters:: myDoc - Document to add to; content - Content to add

addToAdminDefaultField

public static final void addToAdminDefaultField(org.apache.lucene.document.Document myDoc,
                                                String content)

Indexes the given text into the admin default field.

Parameters:: myDoc - Document to add to; content - Content to add

makeSeparatePhrasesFromNodes

public static final String makeSeparatePhrasesFromNodes(List nodes)

Creates a String separated by the phrase separator term from the text of each of the Element or Attributes dom4j Nodes provided. The input list may be null.

A call to this method might look like:
String value = makeSeparatePhrasesFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));

Parameters:: nodes - List of Elements or Attributes
Returns:: A String or null

makeSeparatePhrasesFromStrings

public static final String makeSeparatePhrasesFromStrings(List strings)

Creates a String separated by the phrase separator term from each of the Strings provided. The input list may be null.

Parameters:: strings - List of Strings or null
Returns:: A String or null

makeSeparatePhrasesFromStrings

public static final String makeSeparatePhrasesFromStrings(String[] strings)

Creates a String separated by the phrase separator term from each of the Strings provided. The input array may be null.

Parameters:: strings - Array of Strings or null
Returns:: A String or null

extractSeparatePhrasesFromString

public static final String[] extractSeparatePhrasesFromString(String separatedPhrases)

Extracts the phrases from a String that was created using the method makeSeparatePhrasesFromNodes(List nodes) or makeSeparatePhrasesFromStrings(List strings).

Parameters:: separatedPhrases - String that contains the phrase separator to separate phrases
Returns:: An array of phrase Strings or null if the input is null
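
The round trip between the makeSeparatePhrases methods and extractSeparatePhrasesFromString can be pictured with a small stand-alone sketch. It does not call IndexingTools. The class name, the method names and the separator value " __sep__ " are all hypothetical, since the actual value of PHRASE_SEPARATOR is not shown on this page (only that it includes leading and trailing white space).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Stand-alone sketch; it does not call the DLESE IndexingTools class. */
final class PhraseSeparatorSketch {

    // Hypothetical separator. The real PHRASE_SEPARATOR value is not given in
    // this documentation, which only says it has leading and trailing white space.
    static final String SEPARATOR = " __sep__ ";

    /** Joins phrases with the separator; returns null for null input. */
    static String join(List<String> phrases) {
        if (phrases == null) {
            return null;
        }
        return String.join(SEPARATOR, phrases);
    }

    /** Splits a joined String back into phrases; returns null for null input. */
    static String[] split(String separatedPhrases) {
        if (separatedPhrases == null) {
            return null;
        }
        // Pattern.quote treats the separator as literal text, not as a regex.
        return separatedPhrases.split(Pattern.quote(SEPARATOR));
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("ocean currents", "plate tectonics"));
        System.out.println(joined);
        System.out.println(Arrays.toString(split(joined)));
    }
}
```

Because the separator sits between whole phrases, each multi-word phrase comes back intact after splitting, which is the property the field description calls preserving phrases.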

makeStringFromNodes

public static final String makeStringFromNodes(List nodes)

Creates a String separated by spaces from the text of each of the Element or Attributes dom4j Nodes provided. The input list may be null.

A call to this method might look like:
String value = makeStringFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));

Parameters:: nodes - List of dom4j Nodes of Elements or Attributes
Returns:: A String or null

extractStringsFromString

public static final String[] extractStringsFromString(String separatedWords)

Extracts the words from a String that was created using the method makeStringFromNodes(List nodes).

Parameters:: separatedWords - String of words separated by spaces, as created by makeStringFromNodes(List nodes)
Returns:: An array of word Strings
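
As a rough illustration of space-separated joining and word extraction, the stand-alone sketch below joins text values with single spaces and then splits the result on white space. It is not the IndexingTools implementation. The class and method names are hypothetical, and plain Strings stand in for the text of dom4j Nodes.

```java
import java.util.Arrays;
import java.util.List;

/** Stand-alone sketch; plain Strings stand in for the text of dom4j Nodes. */
final class SpaceJoinedWordsSketch {

    /** Joins the text values with single spaces; returns null for null input. */
    static String join(List<String> texts) {
        if (texts == null) {
            return null;
        }
        return String.join(" ", texts);
    }

    /** Splits on runs of white space (an assumed reading of "words"). */
    static String[] splitWords(String separatedWords) {
        if (separatedWords == null) {
            return null;
        }
        String trimmed = separatedWords.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("ocean currents", "tides"));
        System.out.println(joined);
        System.out.println(Arrays.toString(splitWords(joined)));
    }
}
```

In this sketch a two-word value such as "ocean currents" comes back as two separate words, so the original phrase boundaries are lost once values are joined with spaces.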

tokenizeID

public static final String tokenizeID(String ID)

Tokenizes a DLESE ID by replacing the char - with a blank space.

Parameters:: ID - The ID String
Returns:: The tokenized ID
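
The replacement described above amounts to a single character substitution. The sketch below is a stand-alone illustration, not the IndexingTools source. The class name and the sample ID are hypothetical.

```java
/** Stand-alone sketch of replacing '-' with a blank space in an ID. */
final class TokenizeIdSketch {

    /** Returns the ID with every '-' replaced by a single space. */
    static String tokenizeId(String id) {
        return id.replace('-', ' ');
    }

    public static void main(String[] args) {
        // Illustrative ID value only.
        System.out.println(tokenizeId("DLESE-000-000-000-001"));
    }
}
```

After the substitution, each hyphen-delimited part of the ID is a separate white-space-delimited word.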

tokenizeURI

public static final String tokenizeURI(String uri)

Tokenizes a URI by replacing the unindexable chars with a blank space.

Parameters:: uri - A URL or URI
Returns:: The tokenized URI
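
This page does not list which characters count as unindexable. The sketch below assumes, purely for illustration, that every character other than an ASCII letter or digit is replaced by a blank space. The real method may use a different set. The class name and sample URL are hypothetical.

```java
/** Stand-alone sketch; the set of replaced characters is an assumption. */
final class TokenizeUriSketch {

    /** Replaces each character that is not an ASCII letter or digit with a space. */
    static String tokenizeUri(String uri) {
        return uri.replaceAll("[^A-Za-z0-9]", " ");
    }

    public static void main(String[] args) {
        System.out.println(tokenizeUri("http://www.dlese.org/library/index.jsp"));
    }
}
```

Each replaced character becomes its own space, so a sequence such as "://" leaves three consecutive spaces in the result.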

encodeToTerm

public static final String encodeToTerm(String text)

Same as {org.dlese.dpc.index.SimpleLuceneIndex#encodeToTerm(String)}.

Parameters:: text - Text
Returns:: Encoded text

encodeToTerm

public static final String encodeToTerm(String text,
                                        boolean encodeWildCards)

Same as {org.dlese.dpc.index.SimpleLuceneIndex#encodeToTerm(String,boolean)}.

Parameters:: text - Text; encodeWildCards - True to encode the '*' wildcard char, false to leave unencoded.
Returns:: Encoded text

getAnalyzedTokens

public static final org.apache.lucene.analysis.Token[] getAnalyzedTokens(String textToParse,
                                                                         String field,
                                                                         org.apache.lucene.analysis.Analyzer analyzer)

Extracts all Tokens from a Lucene query using the given Analyzer.

Parameters:: textToParse - The text to analyze with the analyzer; analyzer - The analyzer to use; field - The field this Analyzer should interpret the text as, or null to use 'default'
Returns:: The Tokens generated by the analyzer

getAnalyzedTerms

public static final String[] getAnalyzedTerms(String textToParse,
                                              String field,
                                              org.apache.lucene.analysis.Analyzer analyzer)

Extracts all terms in any field from a Lucene query using the given Analyzer.

Parameters:: textToParse - The text to analyze with the analyzer; analyzer - The analyzer to use; field - The field this Analyzer should interpret the text as, or null to use 'default'
Returns:: The terms generated by the analyzer

getAnalyzerOutput

public static final StringBuffer getAnalyzerOutput(String textToParse,
                                                   String field,
                                                   org.apache.lucene.analysis.Analyzer analyzer)

Creates a StringBuffer to display the tokens created by a given analyzer. Output is of the form: [token1] [token2].

Parameters:: textToParse - The text to analyze with the analyzer; analyzer - The analyzer to use; field - The lucene field name, or null to use default
Returns:: A StringBuffer containing the tokens produced by the analyzer, each enclosed in square brackets
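
The bracketed output form can be sketched without Lucene. In the stand-alone example below, a hypothetical analyze method (lowercasing and splitting on white space) stands in for a real Analyzer, and only the "[token1] [token2]" formatting follows the description above. All class and method names are assumptions.

```java
import java.util.Locale;

/** Stand-alone sketch of the "[token1] [token2]" display format. */
final class AnalyzerOutputSketch {

    /** Hypothetical stand-in for an Analyzer: lowercases and splits on white space. */
    static String[] analyze(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.toLowerCase(Locale.ROOT).split("\\s+");
    }

    /** Wraps each token in square brackets, separated by single spaces. */
    static StringBuffer format(String[] tokens) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append('[').append(tokens[i]).append(']');
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(format(analyze("Ocean  Currents")));
    }
}
```

Displaying tokens this way makes it easy to see where an analyzer split the text and how it normalized each token.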
