Evaluation of Natural Language Processing Techniques for Information Retrieval

Authors

  • Hope Nabankema, Makerere University

DOI:

https://doi.org/10.47941/ejikm.1752

Keywords:

Natural Language Processing, Information Retrieval, Named Entity Recognition, Sentiment Analysis, Word Embeddings, Semantic Parsing, Domain-specific Adaptations, Continuous Evaluation

Abstract

Purpose: The main objective of the study was to evaluate Natural Language Processing (NLP) techniques for information retrieval.

Methodology: The study adopted a desktop research methodology. Desk research draws on secondary data, that is, data that can be collected without fieldwork. Because it gathers data from existing resources, it is often considered a low-cost technique compared to field research; the main costs are executive time, telephone charges and directories. The study therefore relied on already published studies, reports and statistics, which were readily accessed through online journals and libraries.

Findings: The findings reveal a contextual and methodological gap relating to Natural Language Processing techniques for information retrieval. A preliminary empirical review revealed that NLP methods significantly improved the accuracy and efficiency of information retrieval systems. Through systematic evaluation, various NLP techniques, including tokenization, named entity recognition, semantic parsing, and word embeddings, were found to enhance retrieval performance across diverse datasets and domains. By considering context and user intent, researchers aimed to develop more contextually aware and personalized information retrieval systems. The study emphasized the need for further research on hybrid approaches and domain-specific adaptations, and highlighted the potential of NLP to transform information access and utilization.

Unique Contribution to Theory, Practice and Policy: Information Foraging theory, Relevance theory and Cognitive Load theory may be used to anchor future studies on Natural Language Processing techniques. The study provided recommendations to enhance information retrieval systems. It suggested integrating advanced NLP techniques such as named entity recognition and sentiment analysis to improve query understanding and document relevance. Additionally, the study recommended adopting word embeddings and semantic parsing techniques for better semantic understanding of user queries and documents. It emphasized the importance of domain-specific adaptations and continuous evaluation of NLP techniques to tailor them to specific application domains and keep pace with evolving user needs and technological advancements.

Published

2024-03-27

How to Cite

Nabankema, H. (2024). Evaluation of Natural Language Processing Techniques for Information Retrieval. European Journal of Information and Knowledge Management, 3(1), 38–49. https://doi.org/10.47941/ejikm.1752

Section

Articles