An Investigation of the Latent Semantic Analysis Technique for Document Retrieval
Abstract
The application of latent semantic analysis (LSA) to information retrieval promises
better performance by overcoming some limitations that plague traditional term-matching
techniques. These techniques retrieve documents by matching query terms against
document terms. However, traditional retrieval techniques have not adequately
served users’ needs. While users want to search through
information based on conceptual content, natural languages have limited the
expression of these concepts. They present the synonymy problem (a situation where
several words may have the same meaning) and the polysemy problem (a situation
where a word may have several meanings). Because of these natural language
problems, the individual words in a user’s query may not explicitly specify the
user’s intended concept, which may result in the retrieval of some irrelevant
documents. LSA seems to be a promising technique for overcoming these natural
language problems, especially the synonymy problem. It exploits the global
relationships between terms and documents and maps both terms and documents
into a proximity space, where closely related terms and documents
lie close to each other. Queries are then mapped into this
space, and documents are retrieved based on similarity measures. In this report,
the performance of LSA in document retrieval is investigated and compared with
that of traditional term-matching techniques.
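
The pipeline summarized above can be illustrated with a minimal sketch. It is not the implementation evaluated in this report: the toy corpus, the raw term-count weighting, the choice of k = 2, and the function names (term_vector, fold_in, lsa_scores, term_match_scores) are all assumptions made for illustration. The sketch builds a term-document matrix, truncates its singular value decomposition to obtain the proximity space, folds a query into that space using one common convention (q^T U_k S_k^{-1}), and ranks documents by cosine similarity, alongside a plain term-matching baseline.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only).
docs = [
    "car engine repair",
    "automobile engine repair",
    "automobile dealer",
    "fruit banana apple",
    "apple fruit juice",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}


def term_vector(text):
    """Raw term-count vector over the corpus vocabulary (unknown words are ignored)."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v


# Term-document matrix: one row per term, one column per document.
A = np.column_stack([term_vector(d) for d in docs])

# Truncated SVD: keep only the k largest singular values (k = 2 is an assumption).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk = U[:, :k], s[:k]
Vk = Vt[:k, :].T  # each row is one document in the k-dimensional space


def fold_in(text):
    """Map a query into the latent space: q_hat = q^T U_k S_k^{-1}."""
    return term_vector(text) @ Uk / sk


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def lsa_scores(query):
    """Cosine similarity between the folded-in query and each document."""
    q = fold_in(query)
    return [cosine(q, d) for d in Vk]


def term_match_scores(query):
    """Baseline: cosine similarity in the original term space."""
    q = term_vector(query)
    return [cosine(q, A[:, j]) for j in range(A.shape[1])]


if __name__ == "__main__":
    for name, scorer in [("term matching", term_match_scores), ("LSA", lsa_scores)]:
        print(name, [round(x, 3) for x in scorer("car")])
```

For the query "car", the term-matching baseline gives a zero score to "automobile engine repair" because the two share no terms, whereas the LSA scores place that document close to the query, since "car" and "automobile" co-occur with the same terms in this toy corpus. This is the synonymy effect described above; with so small a corpus, the numbers are illustrative rather than evidence of retrieval quality.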