Focus groups

This chapter provides a brief analysis of data collected from focus groups conducted across four groups of interest, with three rounds of focus groups organized for each.

Focus group structure

The first round focused on citizens of different age groups: those aged 25–35, 35–45, 45–65, and 66 and older. The second round addressed citizens with different income levels: low, medium, and high. The third round concentrated on citizens living in areas with different access to public transportation, comparing urban residents who have easy access to public transport with rural residents who live far from public transport stops.

Participants were asked about their regular modes of transportation and why they choose them, whether they feel limited in how they travel, their relationship with technology, their experiences with automated transportation, and their views on the advantages of and concerns about self-driving buses. They also discussed the minimum requirements for a perfect CCAM solution.

Each focus group comprised 6 to 10 participants. Four groups were formed for each age category in the first round, three for each income level in the second round, and two for each living-area category in the third round. Unfortunately, no income-level focus groups were available from Hamburg.

The methodology

The methodology used for this analysis relies on natural language processing (NLP), implemented in Python, to uncover themes within textual data. The objective is to identify and analyze key topics segmented by demographic group, providing insight into recurring patterns and themes. The process began with cleaning the text: special characters, excess whitespace, and irrelevant elements were removed, leaving only alphanumeric characters and single spaces.
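The cleaning step described above can be sketched as follows. This is a minimal illustration rather than the script used in the study: the function name clean_text and the sample sentence are hypothetical, and the pattern assumes ASCII text, so accented or umlauted letters would be dropped unless the character class is widened.

```python
import re


def clean_text(text: str) -> str:
    """Keep only alphanumeric characters and single spaces (ASCII assumed)."""
    # Replace every character that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Collapse runs of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical participant statement, used only to show the effect.
sample = "I  take the bus -- it's cheap (and on time)!"
print(clean_text(sample))  # -> I take the bus it s cheap and on time
```

Note that removing apostrophes splits contractions such as "it's" into two tokens; the stray single letters are normally discarded later by the vectorizer's tokenizer.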

The datasets

Next, the datasets were grouped by demographic category, and the text from each group was analyzed independently. The text was then vectorized using a Bag-of-Words approach, specifically the CountVectorizer class from the scikit-learn library, which converts the text data into a document-term matrix. Each row of this matrix corresponds to a document and each column to a word, with the cells holding word counts; the matrix serves as input for topic modeling.

Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA), as implemented in scikit-learn, was then applied to the document-term matrix to uncover latent topics within the text. The model identifies key themes by associating clusters of words with specific topics. For each demographic group, the top keywords representing these topics were extracted, highlighting the most significant patterns. Finally, the results were summarized in plain text, providing interpretable descriptions of the topics discovered for each column and demographic group. The major advantage of this methodology is that it ensures a systematic and reproducible approach.