Detecting Depression Using Twitter Data by Incorporating Hybrid Feature Representation

A Comparative Machine Learning Approach

Authors

  • Parveen Kumari, Sushant University, Gurugram, Haryana, India 122003
  • Alpana Jijja, Sushant University, Gurugram, Haryana, India 122003

DOI:

https://doi.org/10.57159/jcmm.4.6.25240

Keywords:

NLP, Depression Detection, Logistic Regression, Social Media, Text Analytics

Abstract

Depression is a critical global mental health challenge that often remains undiagnosed because conventional screening techniques are limited and subjective. The growing use of social media platforms offers new avenues for understanding human emotions, as individuals increasingly share their thoughts, moods, and experiences online. Leveraging this digital footprint, the present study introduces a machine learning (ML)-driven approach for the automated detection of depression using Twitter data. A dataset of 205,271 posts was collected and preprocessed with several natural language processing (NLP) techniques, including tokenization, stop-word removal, lemmatization, and sentiment polarity assessment, to extract meaningful textual features. Six ML models were trained and evaluated: Support Vector Machine (SVM), Logistic Regression, Decision Tree, AdaBoost, Naïve Bayes, and K-Nearest Neighbors (KNN). Each model was assessed using accuracy, precision, recall, and F1-score. Among the tested models, Logistic Regression achieved the highest accuracy (92%), followed by SVM at 90%, while KNN trailed at 70%. The results indicate that linear and ensemble-based classifiers are more effective than distance-based models at handling high-dimensional text data. Overall, this study offers a comparative evaluation of ML algorithms for depression detection and underscores the potential of NLP and social media analytics for scalable, data-driven mental health monitoring systems.
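As a rough illustration of two steps the abstract names, the sketch below shows a simplified tokenization and stop-word removal pass, followed by the four reported evaluation metrics computed from true and predicted labels. It is not the authors' implementation: the `STOP_WORDS` list, the URL and mention stripping, the function names `preprocess` and `binary_metrics`, and the example labels are all assumptions made for illustration, and lemmatization and sentiment polarity are omitted because they need an external NLP library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real pipeline would rely on a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "the", "i", "am", "is", "so", "and", "to", "of"}


def preprocess(post):
    """Lowercase a post, strip URLs and @mentions, tokenize, drop stop words."""
    text = re.sub(r"https?://\S+|@\w+", " ", post.lower())
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for the positive class."""
    pairs = Counter(zip(y_true, y_pred))
    tp = pairs[(positive, positive)]
    fp = sum(n for (t, p), n in pairs.items() if p == positive and t != positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    total = len(y_true)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up example post and labels (1 = depressive, 0 = non-depressive).
    print(preprocess("I am so tired of everything @bob https://t.co/x"))
    print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
```

Accuracy alone can mislead when the two classes are imbalanced, which is one reason to report precision, recall, and F1 alongside it when comparing classifiers.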

Published

31-12-2025

How to Cite

Kumari, P., & Jijja, A. (2025). Detecting Depression Using Twitter Data by Incorporating Hybrid Feature Representation: A Comparative Machine Learning Approach. Journal of Computers, Mechanical and Management, 4(6), 30–47. https://doi.org/10.57159/jcmm.4.6.25240