Data Science Institute builds AI technologies to support book project about the Black experience at Vanderbilt


Rosevelt L. Noble, BS’98, PhD’03, director of the Bishop Joseph Johnson Black Cultural Center and senior lecturer in sociology, is participating in an ongoing collaboration with the Data Science Institute to support the research and writing of his book, Lost in the Ivy. The book, to be published by Vanderbilt University Press in fall 2023, draws on interviews with more than 500 Black students and alumni about their experiences at Vanderbilt. 

Rosevelt Noble

Noble’s interest in the project was sparked when he attended the 2007 dedication of Murray House on The Martha Rivers Ingram Commons, named in honor of Walter Murray Jr., BA’70, MM’74, the university’s first African American Board of Trust member. After getting to know the Murray family, Noble began looking deeper into the Black experience at the university.

“Beyond the historical facts, I also became fascinated by the manner in which students coped with life at Vanderbilt and how this experience impacted their lives beyond college,” Noble said in 2014.

Following a suggestion by Vice Provost for Faculty Affairs Tracey George, Noble and Vanderbilt University Press Director Gianna Mosser contacted the DSI team about ways to support Lost in the Ivy, particularly through the use of technology to uncover common themes and topics from myriad interviews.  

“We were very excited by the possibility to work together because there had just been tremendous advances in using AI to explore text through natural language processing at the time,” said Jesse Spencer-Smith, chief data scientist at the DSI. “This was at the very beginning of the wider use of large language models and transformers, which are now commonly referred to as AI Foundation models. We are excited to partner with Professor Noble and VUP in some of the very first applications of these models to real-world problems.” 

“As an inherently quantitative researcher and thinker, the Lost in the Ivy project gave me a newfound appreciation for the depth and richness of information that one can glean from qualitative methods,” Noble said. “Working with the DSI provided a means of synthesizing the information that otherwise would have been a tremendously daunting and time-consuming task.” 

Automated identification of potential quotes
Potential quotes for use in the book were automatically identified once interview transcripts were fed into the AI model created by Bell and her team. (Vanderbilt Data Science Institute)

Using natural language processing, Senior Data Scientist Charreau Bell, BE’09, PhD’18, and undergraduate student researcher Immanual John Milton worked with Noble to identify themes across interviews, such as invisibility and Opportunity Vanderbilt. They were also able to link each theme to the statements most relevant to it, which made it faster to identify potential quotes for the book.
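The article does not say how relevance between a theme and a statement was scored. As a rough illustration only, the sketch below ranks statements against a short theme description using TF-IDF weights and cosine similarity, which is one common baseline and not necessarily what the DSI team used. The function names, the theme description and the three sample sentences are all hypothetical placeholders, not material from the interviews.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_statements(theme_description, statements, top_k=3):
    """Return the top_k (score, statement) pairs most similar to the theme."""
    vectors = tfidf_vectors([theme_description] + statements)
    theme_vec, statement_vecs = vectors[0], vectors[1:]
    scored = [(cosine(theme_vec, v), s) for v, s in zip(statement_vecs, statements)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical placeholder sentences (NOT quotes from the interviews).
sample_statements = [
    "I often felt unseen in my classes, like nobody noticed me.",
    "The scholarship covered my tuition and housing.",
    "We met every week at the cultural center.",
]
sample_theme = "feeling unseen unnoticed invisible"

for score, statement in rank_statements(sample_theme, sample_statements):
    print(f"{score:.3f}  {statement}")
```

A word-overlap baseline like this misses paraphrases (for example, "noticed" versus "unnoticed"), which is one reason a project of this kind might prefer embeddings from a language model for the similarity step.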

“I’ve always loved looking for and finding connections between people and data. Within this project, I found those connections and, more importantly, I knew the project would have a clear positive impact on the Vanderbilt community,” Milton said. “Through our work, we delved into interviews and stumbled across shared experiences that were unexpected, but insightful. One such theme was of invisibility. It was heartbreaking to discover that Black students throughout Vanderbilt’s history have that shared experience. I hope our work can spark future research in this area and hopefully lead to change.” 

Visualizations for further insights
Visualizing data gave Noble and his team the ability to develop further insights from the interviews. (Vanderbilt Data Science Institute)

To cut down on manual work, the team built back-end technology to extract demographics from the interviews and to generate visualizations of how themes intersect. These visualizations make it simpler and more accurate to glean insights across dimensions such as graduation year, student affiliations and Greek life.
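To give a sense of the kind of tallying that could sit behind such visualizations, here is a minimal sketch that counts how often pairs of themes appear in the same interview and how often each theme appears within a demographic group. It assumes each interview has already been reduced to a record with a list of theme labels and one demographic field. The record layout, the field names and the toy values are hypothetical and do not come from the project's data.

```python
from collections import Counter
from itertools import combinations


def theme_pair_counts(records):
    """Count how many interviews mention each pair of themes together."""
    counts = Counter()
    for record in records:
        # De-duplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(record["themes"])), 2):
            counts[pair] += 1
    return counts


def theme_counts_by(records, attribute):
    """Count interviews mentioning each theme, grouped by one attribute."""
    table = {}
    for record in records:
        group = record.get(attribute, "unknown")
        table.setdefault(group, Counter()).update(set(record["themes"]))
    return table


# Toy placeholder records (hypothetical, not project data).
sample_records = [
    {"group": "group_1", "themes": ["theme_a", "theme_b"]},
    {"group": "group_1", "themes": ["theme_a"]},
    {"group": "group_2", "themes": ["theme_a", "theme_b", "theme_c"]},
]

print(theme_pair_counts(sample_records))
print(theme_counts_by(sample_records, "group"))
```

Tables like these can be passed to a plotting library to draw a heat map or grouped bar chart. The article does not specify which charts the team produced beyond the pyLDAvis bubble plot.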

“I was utterly amazed at the power of AI and more specifically, natural language processing. The processing speed of the algorithms and the graphics generated took an inherently qualitative project and strangely made it feel more quantitative,” Noble said. “Working with the DSI elevated my perspective on the usefulness of the data and the overall project.”        

“Leveraging data science modeling approaches within this project has been powerful in revealing shared experiences for Black Vanderbilt students,” said Bell, who is also director of the data science minor at Vanderbilt. “Dr. Noble’s ongoing vision to provide an accessible and holistic view into the Black experience at Vanderbilt has generated catalogued photos, stories and video interviews of more than 500 students and alums. This provides a rich set of mixed media data to understand, learn from and plan a more inclusive future for our university.”

Topic modeling
Themes were identified through topic modeling. The bubble plot was generated with pyLDAvis. (Vanderbilt Data Science Institute)