New ‘text mining’ tech tools boon for Vanderbilt researchers

(L to R) Catherine Lee and Michael Stuart, assistant professors of accounting, and Clifford Anderson, director for scholarly communications at Vanderbilt Libraries (Steve Green/Vanderbilt University)

Vanderbilt University scholars can now take advantage of new technological tools to extract and analyze huge amounts of text, potentially opening new research opportunities across disciplines.

For example, Owen faculty members Catherine Lee, Michael Stuart and Richard Willis have been working with Vanderbilt Libraries to conduct a semantic analysis of historical earnings conference calls of publicly traded firms. They are using a new application programming interface (API), a “back door portal” to the current and historical information in the LexisNexis Academic database, according to Hilary Craiglow, director of the Walker Management Library at the Owen Graduate School of Management.

Access to the LexisNexis API was made possible by a partnership between Owen and the library, and it benefits the entire Vanderbilt research community. “LexisNexis Academic content, with thousands of newspapers, journals, transcripts and wire services, covers a wide breadth of disciplines including law, business, political science, education, government and many others,” says Jody Combs, interim dean of libraries. “This project also demonstrates the increasingly critical role for libraries in facilitating this type of analysis — not just in the sense of acquiring access to content, but in providing expertise with analytical tools and platforms.”

The Owen project uses a small portion of the database, which delivers documents in XML format. “Earnings conference calls of publicly traded firms contain a wealth of information that is just beginning to be explored,” says Stuart, assistant professor of accounting. “These tools put us on the cutting edge of accounting research.”

Jonathan Gilligan (Vanderbilt University)

Meanwhile, Jonathan Gilligan, associate professor of earth and environmental sciences, and graduate student John Nay, working under a National Science Foundation grant to study water conservation in U.S. cities, are using the library’s Television News Archive and the new LexisNexis API to complete a comprehensive study of newspaper and television coverage of water issues.

“We have access to hundreds of thousands of television news stories and millions of newspaper articles,” says Gilligan, who serves as associate director for research at the Vanderbilt Climate Change Research Network. “Advances in topic modeling and sentiment analysis will allow us to study patterns and trends in news coverage more comprehensively than we could if we had to search for and read each article individually. Our tools also will complement traditional search tools in helping researchers identify specific articles to read closely. While our work focuses on environmental and water issues, the topic and sentiment database we will generate will cover a broad range of topics over the past several decades and is likely to be useful to researchers in all disciplines.”


Integral to the Owen research project and others is the campus’s growing XQuery expertise. “As the demand for digital scholarship support rises across campus, libraries are building on their deep knowledge of databases with new programming skills, like XQuery,” says Clifford Anderson, director for scholarly communications at the library. “In particular, XQuery is a good match for the digital humanities; digital humanists frequently look for patterns among large quantities of loosely structured documents.” Anderson was project director of the National Endowment for the Humanities-sponsored XQuery Summer Institute at Vanderbilt last year.

Other faculty and staff members using XQuery for their research projects include: Steve Wernke, associate professor of anthropology, Linked Open Gazetteer of the Andean Region (LOGAR); David Michelson, assistant professor of the history of Christianity, Syriaca.org: The Syriac Reference Portal; Steve Baskauf, senior lecturer in biological sciences, Bioimages; Todd Hughes, director for instructional technologies at the Center for Second Language Studies, Irish Place Names in Tennessee Project; and the library’s Corpus Baudelaire project.

The LexisNexis Academic API will be available through May 2016. Campus community members who would like to learn more about XQuery are invited to a working group that meets from 3 to 4:30 p.m. every Friday at the Central Library. Newcomers may join at any time.

For more information, email Clifford Anderson or Hilary Craiglow.