Thursday 4 June 2020

Using automated transcription software for qualitative research: an example (part 2)


This blog post contains some examples of speech-to-text transcription apps which could be useful for qualitative researchers, e.g. to quickly transcribe and summarise meetings, interviews, or conversations. After an overview of some key features, I reflect on some key considerations for using these apps – for example the use of digital/automated methods and ethics, inclusion, and privacy. This is the second post in a 2-part introduction to using automated speech-to-text apps – see part 1 for an overview and background information.

In this post, I reflect on my experiences of using Otter.ai to record, take notes, and embed photos during the Talking Maps exhibition at the Weston Library in Oxford (I’ll write a separate blog post about this exhibition, as it was fantastic!). At this exhibition, we joined a large group as part of a guided tour with Stewart Ackland from the Map Department at the Bodleian Library. With permission, this tour was recorded using the Otter.ai app on my smartphone (Samsung Galaxy s10). 

Of course, you can do a lot of the following tasks manually (or by using Natural Language Processing features in a programming language/environment, NVivo, or similar). However, these in-built features in Otter.ai could be very useful for those who are new to automatic ways of transcribing, summarising, and displaying qualitative data (or would benefit from having these features in an accessible, engaging, and free mobile/computer app).

Automatic word frequencies



Otter.ai automatically identifies key words, i.e. the most frequently mentioned words in the recording. Once it has finished processing the conversation, it displays these as a list at the top of the transcription. The words are ordered by how frequently they are mentioned, and you can click on any of them to highlight it throughout the transcript. Otter can also generate a word cloud from these frequent words, with the size of each word proportional to its frequency.



Word clouds are by no means a sophisticated way to analyse text, but they do provide a quick, easy, and engaging way to see which words are most prevalent in your transcript. For example, the photos at the beginning of this blog (parts 1 and 2) are word clouds created from the text of the posts, including words I've frequently used such as 'transcription', 'otter.ai' and 'example' (see the bottom of this article for the citation). In the word cloud above, you can see that our conversation at the map exhibition was (unsurprisingly!) about maps, and things related to maps (country, area, land, ocean, world, Europe, people, etc.). It's important to note that the transcript has been automatically cleaned so that common English words (e.g. "so", "if", "and") are removed and don't affect the frequency of the words you might be most interested in.
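To make the idea concrete, here is a minimal sketch (in Python, not Otter.ai's own code) of the kind of processing described above: counting word frequencies in a transcript after removing common English "stop words" so they don't dominate the list. The example sentence and the tiny stop-word list are illustrative assumptions, not data from the app.

```python
# Count the most frequent words in a transcript, ignoring common stop words.
from collections import Counter
import re

# A tiny illustrative stop-word list; real tools use much longer ones.
STOP_WORDS = {"so", "if", "and", "the", "a", "of", "to", "is", "it", "this"}

def key_words(transcript: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent words, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

example = "This map shows the field boundaries, and the field system of the area."
print(key_words(example, 3))  # 'field' appears twice, so it tops the list
```

A word cloud is then just a visual rendering of this same frequency table, with font size proportional to each word's count.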

Exploring word frequencies

In the previous example, you can see that 'field' was one of the key words in this conversation about maps. If you want to quickly find out more about this word, you can click on the key word and it will highlight every occurrence in the transcript (like using Ctrl+F in a document). Let's have a look at where 'field' is mentioned in the Talking Maps transcript.

 


When we navigate to the mentions of ‘field’ in the transcript, we can see that this is clustered around 17 minutes in – the exhibition guide is talking about a very interesting map from the 1600s, which depicts common agricultural practices at the time.
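Conceptually, this "jump to mentions" feature is a search over timestamped transcript segments. The sketch below shows the idea in Python; the segment data format (timestamp, text) and the example lines are assumptions for illustration, not Otter.ai's export format.

```python
# Find the timestamps at which a key word is mentioned in a transcript.

def find_mentions(segments, word):
    """Return the timestamps (in seconds) of segments containing `word`."""
    word = word.lower()
    return [t for t, text in segments if word in text.lower().split()]

# Invented example segments, loosely modelled on the tour described above.
segments = [
    (900.0, "And over here we have an estate map"),
    (1020.0, "each field is drawn with its boundaries"),
    (1040.0, "the open field system was common then"),
]
print(find_mentions(segments, "field"))  # → [1020.0, 1040.0]
```

Plotting these timestamps along the length of the recording gives you exactly the kind of time bar Otter.ai displays at the bottom of the transcript.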

As you might be able to tell from the excerpt above, one downside to Otter.ai is that it transcribes almost everything that is said. This can be an issue because, naturally, humans do not always speak in coherent, flowing sentences; they change direction mid-sentence (and pause, 'umm' and 'err' a lot). You can end up with a lot of repetition, broken sentences, and some sentences that don't make sense. It's therefore useful to listen to your recording as you edit (which you can do easily within the Otter.ai application, or elsewhere). When you edit your transcript in Otter.ai, it automatically realigns your text with the audio, which is useful. You can also see that it has highlighted the position of the word 'field' along the time bar at the bottom of the transcript, which makes it easy to skip to the word you are interested in.

Editing, photos, and speaker assignment

As I mentioned before, no transcription software perfectly understands every word that is said – particularly if there are different accents, speeds and tones of speaking, multiple people trying to speak at once (as was the case with our conversation in the museum), or if acronyms and unusual place names are used. However, you can easily edit any mistakes while listening to the recording, before you export the file for further analysis. Over time, Otter.ai will learn to pick up words that you say often, and you can also teach Otter names, words, acronyms, and phrases to improve the accuracy of the transcription (up to 5 custom words for free, or 200 if you upgrade).
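One way to think about a custom vocabulary is as a post-correction dictionary: a mapping from mis-heard strings to the names or acronyms you actually said. The toy Python sketch below illustrates that idea only; the mis-transcription pairs are invented examples, and this is not how Otter.ai's vocabulary feature is implemented.

```python
# Post-correct a transcript using a small dictionary of known mis-hearings.
import re

# Invented misheard/correct pairs for illustration.
CUSTOM_VOCAB = {
    "bodlean": "Bodleian",
    "otter a i": "Otter.ai",
}

def apply_vocab(text: str) -> str:
    """Replace known mis-transcriptions, ignoring case."""
    for wrong, right in CUSTOM_VOCAB.items():
        text = re.sub(re.escape(wrong), right, text, flags=re.IGNORECASE)
    return text

print(apply_vocab("We toured the Bodlean library with otter a i running."))
# → "We toured the Bodleian library with Otter.ai running."
```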

The examples below also show how you can easily integrate photos within the flow of text. This can be done by taking a photo on your smartphone, for example, while also recording on Otter.ai (on the mobile app). This is quite useful to refer to, so you know exactly what the speaker is referring to in the conversation (in this case, unsurprisingly, it’s maps again!). It's also a nice feature for researchers interested in mobile research methods (particularly those involving walking interviews, smartphones, and/or human-technology interactions), however background noise and the recording of multiple participants might be an issue here.


You might have noticed that in the pictures above, the person who is speaking is labelled as 'Speaker 1'. At first, the speaker's name was blank; once I had labelled it, Otter began to scan through the transcript and automatically label 'Speaker 1' whenever it detected that person speaking. This is mostly accurate (ish), but you might want to double-check by listening back through your recording. You can also save the names of (or code names for) 'suggested speakers' in the Otter app. I've found this useful when recording regularly occurring meetings, for example those with my PhD supervisors.

Is there anything I should consider before using it?

Otter.ai is not 100% accurate, and it might not be the best, most reliable (or most time- or cost-effective) choice for everyone. Otter can struggle to recognise the voices of different speakers and to pick up some accents, and it is also quite limited in the languages it recognises (although this is something the company is improving). It also requires a clear recording with little or no background noise, and can struggle to transcribe multiple voices when people speak at once (although it did work rather well for me in a museum with lots of people talking in the background!). Further to this, Otter can miss out quite a bit of punctuation (or, on the other hand, overuse it, putting unexpected full stops in place of a natural pause), which requires further editing. Finally, particularly if you are using your mobile phone to record meetings and interviews, it is worth noting where the microphones are on your device to ensure that you can record two or more voices (e.g. most smartphones have mics on the top and bottom of the handset).

As with any digital research tool, you might want to critically evaluate the ways that technology includes (and excludes) individuals and groups of people. Ethics, inclusivity, and power relations are all important considerations here, including how this affects the knowledge produced by the research encounter. If you’re interested in digital research methods and ethics, this is a topic of interest in digital geographies, for example – the RGS-IBG Digital Geographies Research Group hosts and promotes some great events and resources. Considering the explosion of the use of digital tools during the coronavirus pandemic and social distancing measures, this LSE Impact Blog post outlines some practical and ethical considerations of carrying out qualitative research under lockdown (this Google Docs on ‘doing fieldwork in a pandemic’, edited by Deborah Lupton, also contains some excellent resources).

Importantly, the use of speech-to-text applications (including Otter.ai) for research purposes raises important concerns regarding privacy and security. This is because sections of your recorded information could be used for training and quality-testing purposes - see the Otter.ai FAQ "Is my data safe?" for more information on this, and view their full privacy policy here. It is important to carefully consider the privacy and security of any application or service you use for transcription, particularly if you are responsible for handling sensitive data. It is also important to think about how using apps like Otter.ai fits in with your institution's GDPR and ethics guidelines, and/or the guidelines of the organisation you are collecting data for. As best practice, you should gain informed consent from anyone you wish to record using Otter.ai (or similar apps). You should also think about whether your institution's ethics committee needs to be aware that you intend to use this method of recording and data storage.

Conclusion: are automated speech-to-text apps useful for qualitative research?

Automated speech-to-text applications have the potential to be incredibly useful, if used with consideration and for suitable applications. Apps like Otter.ai can save you a large amount of time by allowing a computer to perform the labour-intensive task of transcription for you. They can also help by identifying emerging themes, highlighting key words, embedding photographs, and visualising your text (for example, as word clouds).

However, speech-to-text apps are not 100% there yet in terms of accuracy and reliability, and so require a certain amount of manual editing after the transcript has been generated. That said, some manual editing isn't necessarily a bad thing: listening through recordings again can help you gain a better understanding of the data you have collected. As with many digital methods, these apps may also raise concerns regarding the ethics, privacy, and security of data collection, processing, and storage.

In sum, artificial intelligence and machine learning in speech recognition have certainly come a long way, and apps like Otter.ai will only continue to improve. Speech-to-text transcription is an exciting and continuously developing area, with great potential to improve working conditions for social scientists and other researchers. I'd definitely recommend looking at Otter.ai and testing different speech-to-text transcription apps for yourself, to see what works best for you and your research!

Links to some useful resources:

The word cloud in the blog title image was created in R from the text of this article. R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.r-project.org/. The following packages were used: tm (v0.7-7; Feinerer & Hornik, 2019), readtext (v0.76; Benoit & Obeng, 2020), wordcloud2 (v0.2.2; Lang, 2020), RColorBrewer (v1.1-2; Neuwirth, 2014), and wordcloud (v2.6; Fellows, 2018).
