António Pedro Costa
Text originally published in Content Analysis Supported by Software
Content Analysis is a technique for analysing data collected from a variety of sources, preferably data expressed in text or images. The nature of these documents can vary widely: archival material, literary texts, reports, news, evaluative comments on a given situation, diaries and autobiographies, articles selected through a literature review, interview transcripts, texts requested on a specific subject, field notes, etc. The same can be said of the nature of the images: photographs, films, book illustrations, etc.
1.1. History of Content Analysis Technique
We have already mentioned that Content Analysis is a natural, spontaneous process that we all use when we underline ideas in a text and try to organize them. The history of Content Analysis as a scientific method, subject therefore to controlled and systematic procedures, goes back to the First World War, when it served as an instrument for studying the political propaganda disseminated in the mass media; the main reference from that period is Harold Lasswell's Propaganda Technique in the World War (1927).
During World War II the technique was used in the analysis of newspapers, with the purpose of detecting signs of Nazi propaganda in the North American media. Also noteworthy is the collective work by Lasswell and Leites entitled Language of Politics (1949).
Since then, with greater or lesser epistemological and methodological hesitation (often seeking to reinforce its quantitative character), Content Analysis has been applied in many fields of the human sciences, such as linguistics (discourse analysis), anthropology (thematic analyses of the discourses of the mentally ill), history (systematic analysis of documents), etc., a tendency consecrated and developed after the Allerton House conference of 1955 (Krippendorff, 1990; Vala, 1986). A congress dedicated to this theme became necessary because the technique had begun to decline in the face of criticism and attacks from various quarters. The most compelling criticism concerned, in particular, one of its 'constitutive defects', namely 'the encoder's intervention in establishing the meaning of the text' (Ghiglione & Matalon, 1992, p. 180). Nowadays, however, it is rare to find research that makes no use of it, whether exclusively or combined with other data collection and analysis techniques (e.g. questionnaires), as a means of constructing other instruments (again, the questionnaire), or as a central methodology.
Currently, the use of software to support this technique allows faster, more rigorous and highly complex processes to be performed safely; there are already more than two dozen software packages, such as NVivo (www.qsrinternational.com), Atlas.ti (www.atlasti.com) and MaxQDA (www.maxqda.com). More recently, programs that run in cloud computing have begun to emerge, such as webQDA (www.webqda.net). One advantage of this innovation is that it enables collaborative work in small or large groups, and the analysis of large volumes of data, in ways that were not possible before (Costa, 2016). Next, we will describe the evolution of all these instruments and discuss their advantages in more detail.
1.2. The concept of Content Analysis
In general terms, this technique seeks to 'arrange' the 'manifest content' of the most diverse types of communication into an organized, systematic and, as far as possible, quantified set of categories of signification, so that it can be interpreted in light of the diverse factors that led to its production.
The concept of Content Analysis has evolved over time. In a first phase, under the influence of Berelson (1952, apud Krippendorff, 1990), one of the classics of this technique, the major concern was to describe and quantify the manifest contents of the documents under analysis; in this positivist perspective, the technique focused on denotations, that is, on the surface meaning of discourse.
In addition to its descriptive function and its incidence on denotations, Content Analysis assumes an "inferential function, in search of a meaning that is far beyond what is immediately apprehensible, and which awaits the opportunity to be uncovered" (Amado, Costa, & Crusoe, 2017, p. 303). It is also interested, therefore, in the connotations of discourses, which often have more to do with what lies between the lines (ellipses, implications and the tone itself) than with what is explicit (Esteves, 2006; Morgado, 2012).
The inferential process, however, must obey rules and be subject to some control, so that the analysts' imagination does not lead them into "naive or wild inferences" (Vala, 1986, p. 103). This concern raises the need for mechanisms that confer reliability and allow the entire analysis process to be validated; this new step is reflected in the definition of Content Analysis offered by Krippendorff (1990), one of the most recognized authors in this field: "a research technique that allows valid and replicable inferences to be made from the data to their context" (p. 28).
Replicability thus emerges as fundamental, so as to inspire confidence, from a technical point of view, in the process developed for identifying categories. As Lima (2013, p. 8) says, "it is important that the classification procedures be consensual so that different people can carry out this classification in a similar way. It is equally essential that the Content Analysis process be transparent, public and verifiable." It can be said, then, that rigor is attained by applying the appropriate procedures together with a clear and well-adjusted description of them, one that does not omit the definition of each category or subcategory and that makes patent, in tables or matrices, some of the intermediate and final moments and results; rigor, therefore, is not to be confused with statistical analysis.
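The demand that "different people can carry out this classification in a similar way" is often operationalized as inter-coder agreement. As a minimal illustrative sketch (not part of the original text; the category names and data are hypothetical), the following Python code computes simple percent agreement and Cohen's kappa, a common chance-corrected agreement measure, for two coders who classified the same text segments:

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of segments that the two coders placed in the same category."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for chance: kappa = (po - pe) / (1 - pe)."""
    n = len(coder_a)
    po = percent_agreement(coder_a, coder_b)
    # Expected chance agreement, from each coder's category proportions
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    pe = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical codings of eight interview segments by two analysts
a = ["motivation", "barrier", "barrier", "strategy",
     "motivation", "barrier", "strategy", "strategy"]
b = ["motivation", "barrier", "strategy", "strategy",
     "motivation", "barrier", "barrier", "strategy"]

print(percent_agreement(a, b))          # 0.75
print(round(cohens_kappa(a, b), 3))     # 0.619
```

In a real study the segments would come from the coded corpus, and a measure such as Krippendorff's alpha may be preferred when there are more than two coders or missing codings; the sketch above only shows the principle of making agreement verifiable.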
On the other hand, the production of inferences is based on the establishment of relationships, based on logical and pertinent deductions, between four differentiated poles:
1. The data. These, in turn, can be analysed according to certain perspectives, based (among other aspects, and taking into account the research questions and objectives) on:
a. what is said (in this case, it is a thematic analysis, the most common in Content Analysis, which can focus on the distinction of themes, the delimitation of categories and subcategories within those themes, and the calculation of their relative frequency in the documental corpus as a whole);
b. by whom it is said (for example, the affinities between the message and the status or psychological state of the subject);
c. to whom it is said (analysis of relations, establishing the affinities between the message and its recipients);
d. for what purpose (analysis of the objectives of a particular message);
e. with what results (evaluative analysis, for example, of recipients’ responses to the communication).
2. The frames of reference of those who produced the communication (intentions, social representations, presuppositions, ‘states of mind’, values and symbols, as well as biographical aspects and personality traits of the author of the communication, etc.);
3. The conditions of production, or the context of the emergence of the data in question (the local context and the social, cultural, historical and political circumstances in which the document was produced, as reflected therein);
4. The reference frames of the analysts, who must be theoretically and methodologically prepared to make their interpretations. That is, analysts must know and mobilize frames of reference absorbed, in large part, from one or more theories of the human and social sciences; they must know how to use intuition and creativity in identifying and delimiting topics, categories and subcategories; and they must be equipped with a know-how-to-do and a know-how-to-be that allow them to make adequate decisions in the face of the data and to escape uncontrolled subjectivity and lack of ethics.
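The relative-frequency calculation mentioned in point 1a above is straightforward once segments have been coded. As an illustrative sketch (the document names and categories are hypothetical, not taken from the original text), the following Python code counts coded segments per category across a small corpus and reports each category's share:

```python
from collections import Counter

# Hypothetical coded segments: (document, category) pairs resulting
# from a thematic analysis of an interview corpus
coded_segments = [
    ("interview_1", "motivation"), ("interview_1", "barrier"),
    ("interview_2", "barrier"),    ("interview_2", "barrier"),
    ("interview_2", "strategy"),   ("interview_3", "motivation"),
]

counts = Counter(category for _, category in coded_segments)
total = sum(counts.values())

# Relative frequency of each category in the corpus as a whole
for category, n in counts.most_common():
    print(f"{category}: {n} segments ({n / total:.0%})")
```

QDA packages such as those named in section 1.1 produce this kind of frequency matrix automatically; the point of the sketch is only that the figures reported in tables or matrices are reproducible from the coded segments themselves.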
The definition offered by Robert and Bouillaguet (1997) seems to us one of the most comprehensive, encompassing both the descriptive, objective perspective and the subjective, inferential one: "Content Analysis stricto sensu is defined as a technique that enables the methodical, objective and sometimes quantitative examination of the content of certain texts, in order to classify and interpret their constituent elements, which are not fully accessible to immediate reading" (p. 4).
In a previous text (Amado, Costa, & Crusoe, 2017) we summarized all these considerations in the following terms: "We can therefore say that the most important aspect of Content Analysis is that it allows, in addition to a rigorous and objective representation of the analysed contents (discourse, interview, text, article, etc.) through their codification and classification by categories and subcategories, progress (fruitful, systematic, verifiable and to some extent replicable) towards capturing their meaning (at the cost of interpretive inferences derived from, or inspired by, the researcher's theoretical frameworks), through less obvious areas constituted by the said 'context' or 'conditions' of production. We believe that it is this aspect that allows us to creatively apply Content Analysis to a wide range of documents (communications), especially those that translate subjective views of the world, so that the researcher can 'adopt' the role of the actor and see the world from his/her place, as proposed by research of an interactionist and phenomenological nature" (p. 306).