Evaluation of Natural Language Processing Techniques for Information Retrieval
DOI:
https://doi.org/10.47941/ejikm.1752
Keywords:
Natural Language Processing, Information Retrieval, Named Entity Recognition, Sentiment Analysis, Word Embeddings, Semantic Parsing, Domain-specific Adaptations, Continuous Evaluation
Abstract
Purpose: The main objective of the study was to evaluate Natural Language Processing (NLP) techniques for information retrieval.
Methodology: The study adopted a desk research methodology. Desk research relies on secondary data, that is, data that can be collected without fieldwork. Because it draws on existing resources, it is often considered a low-cost technique compared to field research; the main costs are executive's time, telephone charges and directories. The study therefore relied on already published studies, reports and statistics, which were accessed through online journals and libraries.
Findings: The findings reveal contextual and methodological gaps in the literature on Natural Language Processing techniques for information retrieval. A preliminary empirical review indicated that NLP methods significantly improved the accuracy and efficiency of information retrieval systems. Systematic evaluation found that various NLP techniques, including tokenization, named entity recognition, semantic parsing, and word embeddings, enhanced retrieval performance across diverse datasets and domains. By considering context and user intent, researchers aimed to develop more contextually aware and personalized information retrieval systems. The study emphasized the need for further research on hybrid approaches and domain-specific adaptations, and highlighted the potential of NLP to transform information access and use.
Unique Contribution to Theory, Practice and Policy: Information Foraging theory, Relevance theory and Cognitive Load theory may be used to anchor future studies on Natural Language Processing techniques. To enhance information retrieval systems, the study recommended integrating advanced NLP techniques such as named entity recognition and sentiment analysis to improve query understanding and document relevance. It also recommended adopting word embeddings and semantic parsing for better semantic understanding of user queries and documents. Finally, it emphasized domain-specific adaptation and continuous evaluation of NLP techniques, so that they are tailored to specific application domains and keep pace with evolving user needs and technological advances.
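As an illustration of what evaluating an NLP technique for retrieval can look like in practice, the following is a minimal sketch and not the study's own method. It assumes a toy corpus, a single query and hand-made relevance judgments, all hypothetical. Documents are tokenized, ranked by TF-IDF cosine similarity, and scored with precision at k. The function names (tokenize, build_idf, rank, precision_at_k) and the smoothed IDF formula are choices made for this example only.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus, query and relevance judgments (illustration only).
DOCS = {
    "d1": "Named entity recognition improves biomedical search.",
    "d2": "Word embeddings capture semantic similarity between terms.",
    "d3": "Cooking recipes for a quick weeknight dinner.",
    "d4": "Semantic parsing maps user queries to structured meaning.",
}
QUERY = "semantic search with word embeddings"
RELEVANT = {"d2", "d4"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document ids ordered by decreasing similarity to the query."""
    ids = list(docs)
    token_lists = [tokenize(docs[i]) for i in ids]
    idf = build_idf(token_lists)
    query_vec = vectorize(tokenize(query), idf)
    scored = [(cosine(query_vec, vectorize(toks, idf)), i)
              for i, toks in zip(ids, token_lists)]
    # Sort by score (descending), breaking ties by document id.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for i in ranking[:k] if i in relevant) / k


if __name__ == "__main__":
    ranking = rank(QUERY, DOCS)
    print("ranking:", ranking)
    for k in (1, 2, 3):
        print(f"P@{k} = {precision_at_k(ranking, RELEVANT, k):.2f}")
```

Re-running the same scoring after changing one component (for example a different tokenizer, or an embedding-based similarity in place of TF-IDF) gives a simple before-and-after comparison, which is one way to read the recommendation of continuous evaluation. On this toy data, purely lexical matching ranks a document that merely shares the word "search" above a relevant one, which is the kind of gap semantic techniques are meant to close.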
License
Copyright (c) 2024 Hope Nabankema
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution (CC-BY) 4.0 License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.