Bangla natural language processing: A comprehensive review of classical machine learning and deep learning based methods



Sen, Ovishake, Fuad, Mohtasim, Islam, Md Nazrul, Rabbi, Jakaria, Masud, Mehedi, Hasan, Md. Kamrul, Awal, Md. Abdul, Fime, Awal Ahmed, Fuad, Md. Tahmid Hasan, Sikder, Delowar and Iftee, Md. Akil Raihan. 2022. "Bangla natural language processing: A comprehensive review of classical machine learning and deep learning based methods." IEEE Access. 10, pp. 38999-39044. https://doi.org/10.1109/ACCESS.2022.3165563

ERA Journal ID: 210567
Article Category: Article
Authors: Sen, Ovishake, Fuad, Mohtasim, Islam, Md Nazrul, Rabbi, Jakaria, Masud, Mehedi, Hasan, Md. Kamrul, Awal, Md. Abdul, Fime, Awal Ahmed, Fuad, Md. Tahmid Hasan, Sikder, Delowar and Iftee, Md. Akil Raihan
Journal Title: IEEE Access
Journal Citation: 10, pp. 38999-39044
Number of Pages: 46
Year: 2022
Publisher: IEEE (Institute of Electrical and Electronics Engineers)
Place of Publication: United States
ISSN: 2169-3536
Digital Object Identifier (DOI): https://doi.org/10.1109/ACCESS.2022.3165563
Web Address (URL): https://ieeexplore.ieee.org/document/9751052
Abstract

The Bangla language is the seventh most spoken language, with 265 million native and non-native speakers worldwide. However, English is the predominant language of online resources, technical knowledge, journals, and documentation. Consequently, many Bangla-speaking people with limited command of English face hurdles in using English resources. To bridge the gap between limited support and increasing demand, researchers have conducted many experiments and developed valuable tools and techniques to create and process Bangla language materials. Many efforts are also ongoing to make the Bangla language easy to use in online and technical domains. Some review papers examine the past, present, and future trends of Bangla Natural Language Processing (BNLP), but these studies mainly concentrate on specific BNLP domains, such as sentiment analysis, speech recognition, optical character recognition, and text summarization. There is an apparent scarcity of resources that provide a comprehensive review of recent BNLP tools and methods. Therefore, in this paper, we present a thorough analysis of 75 BNLP research papers and categorize them into 11 categories, namely Information Extraction, Machine Translation, Named Entity Recognition, Parsing, Parts of Speech Tagging, Question Answering System, Sentiment Analysis, Spam and Fake Detection, Text Summarization, Word Sense Disambiguation, and Speech Processing and Recognition. We study articles published between 1999 and 2021, 50% of which were published after 2015. Furthermore, we discuss classical, machine learning, and deep learning approaches with different datasets while addressing the limitations and the current and future trends of BNLP.

Keywords: Bangla natural language processing; sentiment analysis; speech recognition; support vector machine; artificial neural network; long short-term memory; gated recurrent unit; convolutional neural network
ANZSRC Field of Research 2020: 460208. Natural language processing
Byline Affiliations: Khulna University of Engineering and Technology, Bangladesh
Taif University, Saudi Arabia
Khulna University, Bangladesh
Permalink: https://research.usq.edu.au/item/10093x/bangla-natural-language-processing-a-comprehensive-review-of-classical-machine-learning-and-deep-learning-based-methods
