An Investigation of the Latent Semantic Analysis Technique for Document Retrieval
Latent semantic analysis (LSA), applied to information retrieval, promises better performance by overcoming some of the limitations that plague traditional term-matching techniques. These techniques retrieve documents by matching the terms in a query against the terms in each document. However, they have not adequately served users' needs. Users want to search for information by its conceptual content, but natural language limits how precisely those concepts can be expressed. It presents the synonymy problem (several words may have the same meaning) and the polysemy problem (a single word may have several meanings). Because of these problems, the individual words in a user's query may not explicitly specify the concept the user intends, which may cause irrelevant documents to be retrieved and relevant ones to be missed. LSA appears to be a promising technique for overcoming these problems, especially synonymy. It exploits the global relationships between terms and documents to map both into a common proximity space, in which closely related terms and documents lie close to each other. Queries are then mapped into the same space, and documents are retrieved according to a similarity measure. This report investigates the performance of LSA in document retrieval and compares it with that of traditional term-matching techniques.
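To make the mapping into a proximity space concrete, the following is a minimal Python sketch. It assumes the standard truncated singular value decomposition (SVD) formulation of LSA, NumPy, and cosine similarity as the similarity measure. The vocabulary, the four-document term-document matrix, and the function names (`lsa`, `fold_in_query`, `term_match_scores`, `rank_documents`) are hypothetical and chosen only for illustration. Documents are represented by the columns of the truncated V^T matrix, and a query vector q is "folded in" to the same space as S_k^-1 U_k^T q.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
# d0 = {car, engine}, d1 = {automobile, engine}, d2 = {car, automobile}, d3 = {flower, petal}
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)


def term_match_scores(A, q):
    """Baseline term matching: count of query terms shared with each document."""
    return A.T @ q


def lsa(A, k):
    """Truncated SVD: keep only the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in_query(q, U_k, s_k):
    """Map a term-space query into the k-dimensional latent space: q_hat = S_k^-1 U_k^T q."""
    return (U_k.T @ q) / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_documents(A, q, k):
    """Rank documents by cosine similarity to the query in the latent space."""
    U_k, s_k, Vt_k = lsa(A, k)
    q_hat = fold_in_query(q, U_k, s_k)
    # Column j of Vt_k holds document j's coordinates in the latent space
    # (folding in column j of A with fold_in_query gives exactly this vector).
    scores = [cosine(q_hat, Vt_k[:, j]) for j in range(A.shape[1])]
    ranking = sorted(range(A.shape[1]), key=lambda j: -scores[j])
    return ranking, scores


if __name__ == "__main__":
    query = np.array([0, 1, 0, 0, 0], dtype=float)  # the single-term query "automobile"
    print("term-matching scores:", term_match_scores(A, query))
    ranking, scores = rank_documents(A, query, k=2)
    print("LSA scores:", np.round(scores, 3))
    print("LSA ranking:", ranking)
```

With k = 2, document 0 (which contains "car" and "engine" but not "automobile") should receive a term-matching score of 0 for the query "automobile", while its LSA cosine score is close to 1. The shared term "engine" and the co-occurrence of "car" and "automobile" in document 2 place documents 0 to 2 close together in the latent space, which is the synonymy effect described above. In practice the number of retained dimensions k, the term weighting applied to A, and the similarity measure are all tunable choices.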