EECS Seminar: GPU Acceleration of Peta-Scale Semantic Search in the AI Era

Speaker: Wen-mei Hwu, Senior Distinguished Research Scientist and Senior Director of Research, NVIDIA
Abstract: Compute devices have traditionally relied on CPU OS services to bring data into memory in bulk before performing algorithmic computation on data-structure elements. This approach is well suited to small datasets that fit into physical memory, or to applications with known data access patterns that allow their datasets to be partitioned and processed in a pipelined fashion. However, trending applications such as graph and data analytics, semantic databases, semantic search engines and generative AI inference require data-dependent, sparse access to vast datasets. Traditional CPU OS services are unsuitable for these applications due to high synchronization overheads, I/O traffic amplification and/or low CPU software throughput. Recent Large Language Models attempt to circumvent this deficiency by training the massive data into their model parameters, so that it can be recalled during inference without being accessed through OS services. However, this workaround incurs a huge number of model parameters, long model training times and an inability of the models to keep up with data updates. I will present BaM, a new system architecture for accelerating compute-directed sparse access to massive datasets, and the BaM software stack that efficiently supports trending data-intensive applications on existing and upcoming GPUs. We envision that BaM will enable a new generation of AI models, search engines and databases to extract value from massive data much more efficiently.
Bio: Wen-mei Hwu is a Senior Distinguished Research Scientist at NVIDIA. He is also a professor emeritus and the Sanders-AMD Endowed Chair Emeritus of electrical and computer engineering at the University of Illinois at Urbana-Champaign. His research is in architecture, algorithms and infrastructure software for data-intensive and computational intelligence applications. He served as the Illinois director of the IBM-Illinois Center for Cognitive Computing Systems Research (c3sr.com) from 2016 to 2020. He was a PI of the NSF Blue Waters supercomputer project that pioneered the use of GPUs in large-scale scientific applications. He co-authored “Programming Massively Parallel Processors,” whose four editions to date have educated more than one hundred thousand students in parallel programming. For his contributions, he received the ACM-IEEE Eckert-Mauchly Award, the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the IEEE Computer Society Charles Babbage Award, the ISCA Influential Paper Award, the MICRO Test-of-Time Award, the IEEE Computer Society B. R. Rau Award, the CGO Test-of-Time Award, numerous best paper awards, numerous teaching awards and the Distinguished Alumni Award in CS of the University of California, Berkeley. He is a Fellow of IEEE and ACM.