Vanderbilt Bytes into Data Science: New programs in a revolutionary field will foster discovery, collaboration and learning across campus
Sep. 6, 2018, 12:04 PM
By Ryan Underwood, BA’96
The robot-assisted future vividly depicted in pop culture a half-century ago—from Stanley Kubrick’s film 2001: A Space Odyssey to The Jetsons—looks more like reality every day.
Artificial intelligence (AI) systems are helping with everything from running hospitals more smoothly to suggesting effective courses of cancer treatment. Similarly, machine learning continues to improve areas like financial investing, transportation, and voice-activated services like Siri and Alexa.
Data science, as the umbrella field is called, likely will revolutionize a number of other disciplines in the coming years. Yet Vanderbilt experts say we’ve barely reached the beginning stages of unlocking the full potential of data science—or understanding how to use it responsibly.
“Over the next decade, data science is estimated to have a significant impact across all sectors of the economy, from health care to transportation, manufacturing, construction and urban living,” reads the final report of Vanderbilt’s Data Science Visions group, a faculty-led initiative formed through a 2017 Trans-Institutional Programs award. “Investing in data science expertise that is broadly available to, and adoptable by, Vanderbilt researchers will yield dramatic advances in academic discovery.”
That report was released in May, and out of it came recommendations to launch a new Data Science Institute at Vanderbilt, develop a professional master’s degree in the field, and add courses on the subject for undergraduates. Those recommendations are now being implemented, with the institute formally launching this fall as planning for the new master’s program gets underway.
“Data science forms a natural hub for collaboration across fields and schools,” Provost and Vice Chancellor for Academic Affairs Susan R. Wente said in a news release announcing the recommendations in the spring. “This initiative is designed to ensure our research and scholarship are competitive and innovative.”
REVEALING THE ESSENCE
Data science is a field that combines statistical methods, mathematics and computer technology to extract patterns and predictions from what seems like random numerical noise. “In research, there’s often a problem of recognition. The answer may be in the data, but you don’t recognize it,” says Padma Raghavan, Vanderbilt’s vice provost for research and professor of computer science and computer engineering.
Raghavan compares data science to the work of surrealist painter Octavio Ocampo, whose portrait of Don Quixote consists of multiple images such as windmills and donkeys that ultimately form a likeness of the adventurer’s face. “You have to shift your attention from one scale to another,” she says. “Data science can filter out the distracting information and reveal the essential.”
The field also lends itself to a wide range of academic disciplines. Areas like genetics and astrophysics are already using artificial intelligence and machine learning to explore massive data sets, fueling new discoveries about everything from cell mechanics to the search for extraterrestrial life. But less obvious examples are taking place on campus as well, such as analyzing satellite imagery to map ancient civilizations; identifying fraud signals in the text of regulatory disclosures issued by publicly traded companies; and modeling how networks of medieval religious scholars formed.
Perhaps most tantalizing of all is the prospect of sparking meaningful collaboration among different disciplines across campus through the language and methods of data science.
Andreas Berlind, associate professor of astrophysics, who co-chaired the Data Science Visions group and will co-lead the new institute, says he and other researchers got a taste of the interdisciplinary work to come earlier this year during a series of data-science workshops. Faculty from various Vanderbilt departments presented their work to data specialists from astrophysics, computer science, biostatistics and other technical fields, brainstorming about new methods and avenues for investigation. Berlind says one topic that seemed to offer a rich vein of exploration is a longitudinal data set with information about student, teacher and school performance, compiled by Peabody researchers through the Tennessee Education Research Alliance, a partnership with the Tennessee Department of Education. The data scientists in the room beamed at the prospect of applying AI and machine learning to such a rich trove of information.
“But the big obstacle always comes down to who is going to do the work!” Berlind exclaims. “That’s the challenge. You need personnel who are trained in data science and can spend enough time going deep into a specific domain. Interdisciplinary collaborations are not going to happen magically.”
That’s where the new Data Science Institute and the master’s degree program come in. Douglas Schmidt, associate provost for research development and technologies and Cornelius Vanderbilt Professor of Computer Science and Computer Engineering, says Vanderbilt has made great strides in incentivizing faculty to work across disciplines. As prime examples, he points to Vanderbilt’s Trans-Institutional Programs awards and to the university’s center for innovation, the Wond’ry, where the Data Science Institute initially will be housed.
“The university has established a precedent for incentivizing faculty to collaborate,” says Schmidt, who will co-lead the Data Science Institute alongside Berlind. “The programs we hope to establish around data science will take that collaboration to the next level.”
As Schmidt envisions it, Vanderbilt faculty from various schools and departments would propose collaborative research projects and then apply to have a data-science researcher from the new institute dedicated to the endeavor. On a smaller scale, Schmidt says data scientists housed within the institute would hold regular office hours, so that faculty and other researchers can explore ongoing questions in their work.
The institute also will become the epicenter for the data-science master’s students, and interested undergraduates, to learn about the field while gaining hands-on experience. Jeffrey Blume, associate professor of biostatistics and biomedical informatics, will lead the master’s program. Plans call for the first students to be admitted starting in 2019.
“We want to find ways for faculty, for researchers and for students to hang around,” Schmidt says. “We want to build a whole ecosystem around data science.”
THE PROMISE OF PERSONALIZED MEDICINE
One area at Vanderbilt that’s primed to benefit from data science is personalized medicine, says Yu Shyr, chair of Vanderbilt’s Department of Biostatistics and a specialist in applying data-science methodologies to cancer research.
Shyr, who co-chaired the Data Science Visions group with Berlind, says starting in February, the U.S. Food and Drug Administration began approving a handful of diagnostic tools that use artificial intelligence, with plenty more in the pipeline. For example, one application helps doctors pinpoint strokes found in CT scans—either discovering ones that may have been overlooked or confirming a correct reading of a scan while offering an even deeper level of understanding about the event.
“These applications are not going to replace doctors,” says Shyr, who also holds the Harold L. Moses Chair in Cancer Research. “They offer another tool, another piece of data that helps doctors treat patients according to their individual needs.”
Data science has already helped scientists hone their understanding of how genetics plays a role in an individual’s health. Now, Shyr says data science is being used to map patients’ bacterial profiles as well.
“In the future we will link these warehouses of ‘-omics’ data—genomics, microbiomics, radiomics, which is imaging data—with all the biomarkers from a person’s electronic health record,” Shyr says. “That interaction will really provide a solid foundation for precision medicine.”
While personalized medicine holds much promise, Shyr says data science is already playing an important role in preventative care. For example, patients who get an annual checkup at Vanderbilt University Medical Center may see a line on their reports telling them the likelihood of developing heart-related illnesses. That probability, he says, is calculated using inputs like cholesterol levels, blood pressure, age and body mass index. “Already, you can see how we apply that information to daily life.”
Shyr says another major area of health care that will benefit from the rise of data science is drug research and development. He cites Swiss pharmaceutical company Roche’s recent $1.9 billion acquisition of digital health startup Flatiron Health as an example. He explains that Roche scientists can use information from Flatiron’s expansive network of electronic health records from cancer patients to speed up clinical trials, as well as quickly abandon ineffective treatments. One way to do this, he says, is by replacing traditional control groups needed for pharmaceutical studies with information gleaned from Flatiron’s databases of clinical records, reducing the amount of time and money spent on trials.
“We are still in the very early stages of data science,” Shyr says. “Today we say ‘big data,’ but what does this really mean? Maybe in 10 years it will look like small data.”
The new Data Science Institute will play a major role in Vanderbilt’s health and medical research, Schmidt says. But many other areas stand to benefit from data science, such as transportation and several smart-city initiatives. In fact, Vanderbilt already has a partnership in place with the Nashville mayor’s office to help the city navigate its rapid expansion during the past decade.
Schmidt also sees wide data-science applications in the near term for economists and finance researchers from the Owen Graduate School of Management, pointing to potential partnerships with AllianceBernstein, the Wall Street investment firm moving its headquarters to Nashville. And, as mentioned, Peabody’s longitudinal data about Tennessee public schools offer immediate opportunities for the institute’s involvement.
“One of the most important things we will be doing is teaching people how to tap into and use data sets wherever they reside,” Schmidt says. “There are so many cool tools out there for data analytics, for AI, for machine learning—we want to train people how to use them effectively.”
Berlind says beyond using data science to fuel new discoveries, it’s important to house experts within Vanderbilt who study the field’s impact on society.
In April the Data Science Visions group hosted a forum at which experts debated the ethics of using data-driven algorithms in the criminal justice system. One example is a new system in Chicago that assigns scores to residents, assessing their likelihood of being a victim of violence. But critics have argued that the system is instead being used to target potential attackers before any crimes take place, reminiscent of the dystopian storyline in the film Minority Report.
Schmidt says it’s easy to misuse data since the numbers often have a ring of authority about them.
“We fool ourselves into thinking that data sets are objective,” he says. “But they always must be understood in context.”
Yet examining the risks and rewards of data science is part of a larger whole. Just as Vanderbilt invested broadly in neuroscience in the late 1990s as a field that tied together many disciplines on campus, university administrators now see the same promise in data science.
“Data science as an interdisciplinary field is only in its infancy, but we know that across all disciplines there will be more and more data and that they will be increasingly complex,” Raghavan says. “The Data Science Institute is really about taking away the tedium of dealing with data, making it easier to answer the exciting ‘what if’ questions that spur innovation.”
Arranged by Algorithm: Drew Silverstein’s Amper is redefining music composition through artificial intelligence
Computers may be able to perform a variety of tasks once reserved only for humans, but composing music—an inherently creative and emotive endeavor—would seem unlikely to be among them. That impression might change, though, if Drew Silverstein, BMus’11, has his way. Silverstein is the co-founder and CEO of Amper, a New York–based startup that uses artificial intelligence, rather than human composers, to create original music.
Formed in 2014, Amper fills a need for licensing music for commercial purposes. Finding production music—what’s heard in advertisements and online videos, for example—requires hunting through an online service’s massive catalog of recordings or hiring a composer to create a bespoke piece of music. As a result, production music can be either inexpensive with high search costs or expensive with low search costs.
“We used our background and expertise as composers to create a novel way to solve the problem,” says Silverstein, whose résumé includes composing, producing and songwriting for film, television and video games. Last year he was named to the Forbes “30 under 30” list and already has raised more than $9 million in early funding for Amper.
The product is remarkably fast and dynamic. A person can build a song from start to finish in less than a minute using the simple interface. The process begins with the choice of a genre such as hip-hop and an accompanying mood, like “cool,” “chill” or “reflective.” A cinematic song might be “inspirational,” “suspenseful,” “gloomy,” or a dozen or so other moods. The pro version of Amper goes even deeper, allowing the user to select instruments and their specific style, the tempo, and the song’s duration.
At the smallest level, Amper works like the technology that produces the kind of files purchased on iTunes. A digital audio file is encoded from millions of samples of an analog recording. Individually, each sample is too short for a human to recognize: An audio CD stores 44,100 samples per second for each of its two channels, which at 16 bits per sample comes to about 1.41 million bits per second, while a high-quality MP3 is compressed to 320,000 bits per second. But when laid end to end, the samples sound like seamless audio. Amper creates songs in a similar fashion using what Silverstein calls a “massive palette” of audio samples arranged according to the algorithms.
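For readers who want to check the arithmetic, the 1.41 million figure for an audio CD can be reproduced from the standard CD parameters. The sketch below is only an illustration: the sample rate, bit depth and channel count are the usual audio-CD values and are assumptions brought in here, not numbers supplied by Amper, and the function name is made up for this example.

```python
# Standard audio-CD parameters (assumed here; not taken from Amper).
CD_SAMPLE_RATE_HZ = 44_100     # samples per second, per channel
CD_BITS_PER_SAMPLE = 16        # bits used to store each sample
CD_CHANNELS = 2                # stereo

# Bit rate of a high-quality MP3, as quoted in the article.
MP3_HIGH_QUALITY_BPS = 320_000


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the bit rate, in bits per second, of uncompressed audio."""
    return sample_rate_hz * bits_per_sample * channels


cd_bps = pcm_bit_rate(CD_SAMPLE_RATE_HZ, CD_BITS_PER_SAMPLE, CD_CHANNELS)

# How many times smaller the MP3 stream is than the CD stream.
compression_ratio = cd_bps / MP3_HIGH_QUALITY_BPS

print(f"CD audio: {cd_bps:,} bits per second")
print(f"MP3:      {MP3_HIGH_QUALITY_BPS:,} bits per second")
print(f"The CD stream is about {compression_ratio:.2f} times larger")
```

The calculation shows that both figures are bit rates. A CD plays back 44,100 samples per second on each channel, and the MP3 gets to its lower rate by compressing that stream rather than by storing fewer raw samples.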
An Atlanta native, Silverstein graduated from the Blair School of Music with a degree in music theory and composition. Music is in his DNA, one might say. One of his sisters also studied music in college.
“There’s no better training for life than to be a music major or a student of music because of the emphasis on personal responsibility, practice and attention to detail,” he says.
Recognizing a need for business knowledge to complement his music experience, Silverstein earned an MBA from Columbia University in 2016. The knowledge gained has helped him navigate the complicated, and at times uneasy, relationship between artificial intelligence and the business world. He acknowledges that AI has the potential to change the role of some human composers and songwriters, but argues it’s also opening up new markets, making music production more accessible and more affordable for everyday people.
“We believe that anyone, by virtue of being a human being,” Silverstein says, “is internally creative.”
—GLENN PEOPLES, MBA’08