General and Domain-Specific Non-Topical Terms in Web Documents: An Alternative Approach to Natural Language Query Expansion

Document Type: Research

Author

Abstract

This paper presents a new approach to query expansion in search engines through non-topical terms (NTTs). Such terms can be used in conjunction with topical terms (TTs) to improve the precision of retrieval results. In the first phase of the research, 20 topical queries in two domains (Health and the Social Sciences) were run in Google, and 200 Web documents (800 pages) from the retrieved lists were textually analysed. A total of 1071 NTTs were identified. Frequency analysis showed that 14.5% of the NTTs were shared between the two domains and 85.5% were domain-specific; 62.4% were non-topical and 37.6% were semi-topical; and 65% occurred before, and 35% after, their respective TTs. Findings of the second phase showed that query expansion through NTTs, particularly in the exact title and URL search options, reduced the number of retrieval hits considerably and led to more precise and manageable results. There was a significant difference between Health and the Social Sciences in retrieval results for keyword and exact phrase searches, but no significant difference for exact title and exact URL searches. The ratios of exact phrase, exact title, and exact URL retrieval frequencies to keyword frequency also differed significantly between the two domains. The paper concludes that searching would be more effective, and precision higher, if searchers add non-topical terms to their initial query. Retrieval results would also be more relevant if the query is carried out in the 'exact title' or 'URL search' options. It proposes that a knowledge-based list of NTTs can be developed and implemented by search engines to help searchers expand their queries. Specialised search engines can develop their own lists of NTTs by carrying out research using a similar methodology.
Keywords: Query expansion/natural language/non-topical terms/search engines
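
As an illustration of the kind of query expansion described in the abstract, the following minimal sketch builds the four search variants (keyword, exact phrase, exact title, exact URL) from a topical term and a non-topical term. It is not the authors' procedure. The function names, the sample terms, and the mapping of the "exact title" and "URL search" options to Google's `intitle:` and `allinurl:` operators are assumptions made for this example.

```python
def expand_query(topical_term, ntt, ntt_before=True):
    """Combine a topical term (TT) with a non-topical term (NTT).

    Hypothetical helper: the abstract reports that NTTs can occur before
    or after the TT, so the position is left as a parameter.
    """
    phrase = f"{ntt} {topical_term}" if ntt_before else f"{topical_term} {ntt}"
    return {
        # Plain keyword search: all words, any position.
        "keyword": phrase,
        # Exact phrase search: words must appear together, in order.
        "exact_phrase": f'"{phrase}"',
        # Assumed stand-in for the "exact title" option.
        "exact_title": f'intitle:"{phrase}"',
        # Assumed stand-in for the "URL search" option.
        "exact_url": f"allinurl:{phrase}",
    }


def ratio_to_keyword(hits, keyword_hits):
    """Ratio of a restricted search's hit count to the keyword hit count."""
    if keyword_hits == 0:
        raise ValueError("keyword_hits must be non-zero")
    return hits / keyword_hits


# Illustrative terms only; they are not taken from the study's NTT list.
for name, query in expand_query("diabetes", "introduction to").items():
    print(f"{name}: {query}")
```

Hit counts for each variant would have to be collected from the search engine separately; `ratio_to_keyword` only shows how the ratios compared in the study could be computed once those counts are known.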
