Peer Reviewed
Origins, Styles, and Applications of Text Analytics in Social Science Research
Textual analysis is grounded in the conceptual schemes of traditional qualitative and quantitative content analysis, whose hybridization has produced methodological styles widely used across the social sciences. This paper reviews the origins and evolution of text analysis within the domains of traditional content analysis. Emphasis is given to the conceptual schemas and operational structure of latent semantic analysis and its capacity to detect topical clusters in large corpora. Further, I describe the operations of Entity–Aspect Sentiment Analysis, which are designed to measure and assess sentiments/opinions within specific contextual domains of textual data. Then, I conceptualize and elaborate on the potential of streamlining latent semantic analysis and Entity–Aspect Sentiment Analysis, complemented by Correspondence Analysis, into an integrated operational scheme that would detect the topic structure, assess the contextual sentiment/opinion for each detected topic, test for statistical dependence of sentiments/opinions across topical domains, and graphically display conceptual maps of sentiments in topic space.
Keywords: origins of content analysis; text analytics; latent semantic analysis; sentiment analysis; topic maps; social sciences
The rapid development of data mining and text analytics, especially in the fields of communications, linguistics, sociology, and psychology, has significantly contributed to the evolution of various families of textual analysis techniques in social science research [1][2][3][4][5][6]. In its early stages, a generic form of communication content analysis was used in academic works [7][8] that focused on the comparative or controversial narratives of historical incidents [9], the expressions of nationalism in children’s books [10], inaugural presidential speeches [11][12], the topic discovery of published academic articles [13][14][15][16], and more recently the analysis of textbooks via natural language processing [17]. Quantitative textual analysis was often identified as the most prominent content analysis technique in testing empirical hypotheses using textual data to evaluate social theories [5][18][19][20][21].
The main function of textual content analysis is to systematically analyze the content of different types of communication [22][23]. The conceptual and operational challenges of the technique have been explicitly discussed in the methodological literature. For example, Shapiro and Markoff [24] outlined a framework of the definitional peculiarities of content analysis, classifying them by (a) scientific purpose (descriptive vs. inferential vs. taxonomic), (b) methodological orientation (quantitative vs. qualitative), (c) extraction of contextual meaning (manifest vs. latent), (d) unit of analysis (words, sentences, etc.), and (e) measurement quality (issues of validity and reliability).
Content analysis is a family of formal methodological techniques that systematically convert textual materials into numeric representations of manifest and latent meanings. In the last few decades, despite the obstacles posed by the social context of digital texts and the reliability issues of computer-assisted coding schemes, there has been increasing interest in examining the applicability of text mining in sociological studies [25][26][27][28]. The hybridization of content analysis techniques was extensively discussed by Roberts [19], who recognized the need for a methodological synthesis that ascribes cultural meanings derived from textual data and validates coding schemes suitable for statistical analysis.
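As a minimal sketch of what converting text into numeric representations can look like at the manifest level, the Python snippet below builds a document-term count matrix from a toy corpus. The three example sentences, the tokenizer, and the function name `document_term_matrix` are illustrative assumptions and are not taken from the reviewed literature. Capturing latent meaning would require a further step, such as a decomposition of this matrix.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in doc i."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix


# Hypothetical toy corpus, invented purely for illustration.
docs = [
    "The press praised the speech.",
    "The speech angered the press.",
    "Textbooks describe the war.",
]

vocab, dtm = document_term_matrix(docs)

# Print one line per term with its counts across the three documents.
for term, column in zip(vocab, zip(*dtm)):
    print(f"{term:10s} {column}")
```

Each row of `dtm` is one document and each column one vocabulary term, so the word becomes the unit of analysis. A different unit (sentences, paragraphs) would only change what is passed in as `docs`.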
The scope of this paper is to trace the historical origins of traditional content analysis and to exemplify the importance of epistemological and methodological syntheses of traditional text analytics and mainstream multivariate techniques, which have contributed significantly to socio-cognitive and socio-cultural studies. Specifically, I streamline, at the conceptual level, the operations of the foundational models of latent semantic analysis (LSA), Entity–Aspect Sentiment Analysis (EASA), and Correspondence Analysis (CA), and argue that such an operational synthesis not only detects topics, describes sentiments, and classifies sentiments within the context of topic domains, but also assesses the degree of dependence of sentiments across topic domains. Overall, this paper traces the origins and evolution of content analysis and, grounded in socio-cognitive frameworks, presents the operational structure of a streamlined, integrated model.
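To make the dependence step concrete, the sketch below computes a Pearson chi-square statistic for a topic-by-sentiment contingency table. It covers only that one step: it assumes topics have already been detected (for example by LSA) and sentiments already classified (for example by EASA). The counts, labels, and the function name `chi_square_independence` are hypothetical and serve only as an illustration; this is not the author's implementation.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and Pearson
    residuals for a two-way contingency table given as a list of rows.
    Assumes every row and column has a non-zero total."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of topic and sentiment.
            expected = row_totals[i] * col_totals[j] / n
            r = (observed - expected) / expected ** 0.5
            res_row.append(r)
            chi2 += r * r
        residuals.append(res_row)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof, residuals


# Hypothetical counts: rows are detected topics, columns are sentiment classes.
topics = ["topic A", "topic B", "topic C"]
sentiments = ["negative", "neutral", "positive"]
table = [
    [30, 10, 10],
    [10, 30, 10],
    [10, 10, 30],
]

chi2, dof, residuals = chi_square_independence(table)
n = sum(map(sum, table))
total_inertia = chi2 / n  # the total inertia that CA splits across dimensions

print(f"chi-square = {chi2:.2f}, dof = {dof}, total inertia = {total_inertia:.2f}")
```

For this toy table the statistic is 48 on 4 degrees of freedom, well above the 0.05 critical value of 9.488, so independence of sentiment and topic would be rejected. Correspondence Analysis would then apply a singular value decomposition to these residuals, scaled by 1/sqrt(n), to obtain the coordinates used for mapping sentiments in topic space.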

References

  1. Németh, R.; Koltai, J. The potential of automated text analytics in social knowledge building. In Pathways Between Social Science and Computational Social Science: Theories, Methods, and Interpretations; Springer International Publishing: Cham, Switzerland, 2021; pp. 49–70.
  2. Stone, P.J. Thematic text analysis: New agendas for analyzing text content. In Text Analysis for the Social Sciences; Routledge: London, UK, 2020; pp. 35–54.
  3. Krippendorff, K. Content Analysis: An Introduction to Its Methodology; Sage Publications: Thousand Oaks, CA, USA, 2018.
  4. Neuendorf, K.A.; Kumar, A. Content analysis. In The International Encyclopedia of Political Communication; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2015; pp. 1–10.
  5. Franzosi, R. (Ed.) Content Analysis; Sage Publications: New York, NY, USA, 2008.
  6. Berelson, B. Content Analysis in Communication Research; Media studies: A reader; Free Press: Los Angeles, CA, USA, 2000; pp. 200–209.
  7. Lippmann, W. Public Opinion; Macmillan: New York, NY, USA, 1922.
  8. Simpson, G.E. The Negro in the Philadelphia Press; University of Pennsylvania Press: Philadelphia, PA, USA, 1936.
  9. Walworth, A. School Histories at War: A Study of the Treatment of Our Wars in the Secondary School History Books of the United States and in Those of Its Former Enemies; Harvard University Press: Cambridge, MA, USA, 1938.
  10. Martin, H. Nationalism in children’s literature. Libr. Q. 1936, 6, 405–418.
  11. Janis, I. The problem of validating content analysis. In The Content Analysis Reader; SAGE: Atlanta, GA, USA, 1965; pp. 358–366.
  12. Janis, I.L.; Fadner, R.H. A coefficient of imbalance for content analysis. Psychometrika 1943, 8, 105–119.
  13. Rainoff, T.J. Wave-like fluctuations of creative productivity in the development of West-European physics in the eighteenth and nineteenth centuries. Isis 1929, 12, 287–319.
  14. Becker, H.P. Distribution of space in the American Journal of Sociology, 1895–1927. Am. J. Sociol. 1930, 36, 461–466.
  15. Shanas, E. The American Journal of Sociology through fifty years. Am. J. Sociol. 1945, 50, 522–533.
  16. Tannenbaum, P.H.; Greenberg, B.S. Mass communication. Annu. Rev. Psychol. 1968, 19, 351–386.
  17. Esfahani, M.N. Content Analysis of Textbooks via Natural Language Processing. Am. J. Educ. Pract. 2024, 8, 36–54.
  18. Frade, C. Social theory and the digital: The institutionalisation of digital sociology. Acta Sociol. 2025, 68, 41–56.
  19. Roberts, C.W. A conceptual framework for quantitative text analysis. Qual. Quant. 2000, 34, 259–274.
  20. Lasswell, H.D. Propaganda Technique in the World War; Peter Smith: New York, NY, USA, 1938.
  21. Lasswell, H.D.; Lerner, D.; de Sola Pool, I. The Comparative Study of Symbols: An Introduction; Stanford University Press: Stanford, CA, USA, 1952.
  22. Belsey, C. Textual analysis as a research method. Res. Methods Engl. Stud. 2013, 2, 160–178.
  23. Berelson, B. Content Analysis. In Handbook of Social Psychology; Lindzey, G., Ed.; Addison-Wesley: Reading, MA, USA, 1954; pp. 488–522.
  24. Shapiro, G.; Markoff, J. A matter of definition. In Text Analysis for the Social Sciences; Roberts, C.W., Ed.; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 1997; pp. 9–31.
  25. Ignatow, G. Theoretical Foundations for Digital Text Analysis. J. Theory Soc. Behaviour. 2015, 46.
  26. Bail, C.A. The cultural environment: Measuring culture with big data. Theory Soc. 2014, 43, 465–482.
  27. Mohr, J.W.; Bogdanov, P. Introduction—Topic models: What they are and why they matter. Poetics 2013, 41, 545–569.
  28. Ignatow, G.; Evangelopoulos, N.; Zougris, K. Sentiment analysis of polarizing topics in social media: News site readers’ comments on the Trayvon Martin controversy. In Communication and Information Technologies Annual; Media Cultures; Emerald Group Publishing: Bradford, UK, 2016; pp. 259–284.
Online Date: 06 Jun 2025