NSF’s Harnessing the Data Revolution (HDR) Big Idea is a national-scale activity founded in 2016 to enable new modes of data-driven discovery that address fundamental questions at the frontiers of science and engineering. By engaging NSF's research community in the pursuit of fundamental research in data science and engineering, the ecosystem of HDR Institutes, Data Science Corps (DSC), and Transdisciplinary Research in Principles of Data Science (TRIPODS) strives to provide a cohesive, federated, national-scale approach to research data infrastructure and the development of a 21st-century data-capable workforce.
The 3rd annual conference of the NSF HDR Ecosystem will be held on the campus of the University of Illinois Urbana-Champaign from the 9th to the 12th of September 2024.
The main goals of the workshop are:
Build and strengthen productive, inclusive, and positive relationships within the HDR ecosystem and the broader data-intensive research community.
Provide a forum to share accomplishments, goals and plans of HDR ecosystem entities.
Assess current and planned cross-cutting activities that address shared challenges and seek new opportunities to sustain and grow the HDR ecosystem and advance data-intensive research.
Assess the landscape of institutes, projects and activities within data science, AI, industry and infrastructure to advance a vision of how the HDR ecosystem is an integral part of a broader coherent ecosystem of AI and data-intensive research.
The registration desk is in the Illinois Conference Center atrium, right in front of you when you enter through the main entrance via the parking lot.
Moderator: Tanya Berger-Wolf (The Ohio State University)
A panel of NSF program directors offering the NSF perspective on the history and future of the data revolution, particularly in the context of the AI revolution.
Panelists are drawn primarily from the cognizant program officers of NSF 21-519 HDR Institute and NSF 24-560 DSC:
Climate change, loss of biodiversity, and food, water, and energy security for the world's growing population are some of the greatest environmental challenges facing humanity. These challenges have traditionally been studied by science and engineering communities via process-guided models that are grounded in scientific theories. Motivated by the phenomenal success of Machine Learning (ML) in advancing areas such as computer vision and language modeling, there is growing excitement in the scientific communities to harness the power of machine learning to address these societal challenges. In particular, massive amounts of data about Earth and its environment are now continuously being generated by a large number of Earth-observing satellites, in-situ sensors, and physics-based models. These information-rich datasets, in conjunction with recent ML advances, offer huge potential for understanding how the Earth's climate and ecosystem have been changing and how they are being impacted by human actions, and for devising policies to manage them in a sustainable fashion. However, capturing this potential is contingent on a paradigm shift in data-intensive scientific discovery, since “black box” ML models often fail to generalize to scenarios not seen in the training data and produce results that are not consistent with scientific understanding of the phenomena.
This talk presents an overview of a new generation of machine learning algorithms, where scientific knowledge is deeply integrated in the design and training of machine learning models to accelerate scientific discovery. These knowledge-guided machine learning (KGML) techniques are fundamentally more powerful than standard machine learning approaches, and are particularly relevant for scientific and engineering problems that are traditionally addressed via process-guided (also called mechanistic or first-principles-based) models, but whose solutions are hampered by incomplete or inaccurate knowledge of physics or underlying processes. While this talk will illustrate the potential of the KGML paradigm in the context of environmental problems (e.g., ecology, hydrology, agronomy, climate science), the paradigm has the potential to greatly advance the pace of discovery in any discipline where mechanistic models are used.
The data revolution has been replaced by the AI revolution. It seems that every day, we hear about a new area of human endeavor that has been conquered by AI. There are AI lawyers, AI doctors, AI artists, poets, and mathematicians. AI will predict what jobs we get, what medicines we should be treated with, and even how we should be educated.
If that's true, then AI policy should consist of getting the government out of the way and letting innovation bloom. Indeed, that is what some advocate. But in fact, sound AI policy rests on the same principles that sound science rests on: openness, transparency, and evidence. In order to ensure that all of us can benefit from innovations in AI, the best ideas in AI policy emphasize openness in research, transparency and accountability in the claims made by those seeking to deploy it, and above all a requirement that claims -- bold claims at that -- of efficacy are backed up by evidence that can be independently evaluated.
In this talk, I'll discuss how AI policy is trying to harness the AI revolution so that all of us, including those who have been traditionally left behind by tech innovation, can lead lives enriched rather than controlled by technology.
At UIUC, there is strong ongoing collaboration between academia and industry, reflected in large industry-sponsored centers and institutes such as the IBM-Illinois Discovery Accelerator Institute (IIDAI) and the Center for Networked Intelligent Components and Environments (C-NICE), as well as in smaller and medium-sized partnerships like the AMD Center of Excellence and the Amazon-Illinois Center on AI for Interactive Conversational Experiences (AICE). The research activities in these centers are closely aligned with the strategic goals of industry partners, driving innovation, performance improvements, and market competitiveness across various high-tech sectors. Through close collaboration between UIUC students, faculty, and industry professionals, we aim to publish technical papers in top conferences and journals, while also transferring valuable knowledge back to industry. Additionally, many of these projects are open-source, with code freely available to benefit both the research community and industry. In this talk, Prof. Chen will highlight these topics, with a focus on AI workload acceleration.
A panel conversation on the future of AI in relation to technology, computing, education, and inclusion, including experts across interdisciplinary domains spanning academia and industry.
Panelists:
Abstract:
Effective science communication is crucial for bridging the gap between complex scientific concepts and diverse audiences. In this talk, I will explore strategies to convey scientific ideas clearly and engagingly, ensuring accessibility for individuals with varying levels of background knowledge. We will discuss the importance of storytelling in science communication, as well as techniques for simplifying technical jargon without compromising the integrity of the information. This talk aims to empower scientists, educators, and communicators to effectively share their work with broader, more diverse audiences, fostering a greater public understanding and appreciation of science.
Bio:
Sara Ayman Metwalli has a Ph.D. in Quantum Computing from Keio University, Japan, where she focused on developing and optimizing quantum algorithms and debugging tools for Noisy Intermediate-Scale Quantum (NISQ) devices. Sara has made significant contributions to quantum information science, particularly in the areas of quantum error correction and fault tolerance. Her work has been published in leading journals and presented at international conferences, highlighting her role as an emerging leader in the quantum computing field.
In addition to her research, Sara has a strong commitment to STEM education and outreach. She has extensive experience as an educator, teaching programming and quantum computing to a wide range of students, from K-12 to university graduates. Sara is passionate about increasing diversity in STEM and has actively worked to create inclusive educational environments that empower underrepresented groups to pursue careers in science and technology.
During the ideation expo at the previous HDR Ecosystem Conference in Colorado (Oct 2023), the creation of an HDR-wide educational and outreach materials repository was proposed, and a committee of cross-institutional members was formed to build it. The long-term vision for the repository is to promote and facilitate data fluency in domain-specific contexts for communities ranging from K-12 classrooms to the general public. In this talk, I will describe our efforts to design and implement a centralized repository to host educational and outreach materials collected from across the HDR institutes. I will also report on a recent workshop (with cross-institutional participation) on the development of data fluency learning modules targeting 6th-12th grade STEM classrooms, which we have included in the repository. Finally, I will describe our plans moving forward, with the eventual goal of public promotion and release.
This panel features representatives from five NSF-funded Data Science Corps (DSC) projects, each focused on harnessing data science to address real-world challenges while advancing education and workforce development. The panelists will share their experiences and insights on how their projects are making a significant impact in both academic and community settings.
Panelists: