2024 NSF HDR Ecosystem Conference Harvesting the Data Revolution

US/Central
Illinois Conference Center (University of Illinois at Urbana-Champaign)

111 St Marys Rd, Champaign, IL 61820
Mark Neubauer (Univ. Illinois at Urbana Champaign (US))
Description

2024 NSF HDR Ecosystem Conference

NSF’s Harnessing the Data Revolution (HDR) Big Idea is a national-scale activity founded in 2016 to enable new modes of data-driven discovery that address fundamental questions at the frontiers of science and engineering. By engaging NSF's research community in the pursuit of fundamental research in data science and engineering, the ecosystem of HDR Institutes, Data Science Corps (DSC), and Transdisciplinary Research in Principles of Data Science (TRIPODS) strives to provide a cohesive, federated, national-scale approach to research data infrastructure and to the development of a 21st-century data-capable workforce.

The 3rd annual conference of the NSF HDR Ecosystem will be held on the campus of the University of Illinois Urbana-Champaign from the 9th to the 12th of September 2024.

The main goals of the workshop are:

  • Build and strengthen productive, inclusive, and positive relationships within the HDR ecosystem and the broader data-intensive research community.

  • Provide a forum to share accomplishments, goals and plans of HDR ecosystem entities.

  • Assess current and planned cross-cutting activities that address shared challenges and seek new opportunities to sustain and grow the HDR ecosystem and advance data-intensive research.

  • Assess the landscape of institutes, projects and activities within data science, AI, industry and infrastructure to advance a vision of how the HDR ecosystem is an integral part of a broader coherent ecosystem of AI and data-intensive research.

  • Formally launch the HDR Machine Learning Challenge.
    • 1
      Registration Atrium (Illinois Conference Center)

      The registration desk is in the Illinois Conference Center atrium, right in front of you when you enter through the main entrance via the parking lot.

    • 2
      Welcome Reception Chancellor Ballroom (Illinois Conference Center)

    • 3
      Welcome and Introduction Heritage Hall (Illinois Conference Center)

      Speaker: Mark Neubauer (Univ. Illinois at Urbana Champaign (US))
    • 4
      Welcome from the Dean of the College of Engineering Heritage Hall (Illinois Conference Center)

      Speaker: Rashid Bashir (University of Illinois at Urbana-Champaign)
    • 5
      NSF Welcome and HDR Overview Heritage Hall (Illinois Conference Center)

      Speaker: Amy Walton (National Science Foundation)
    • 10:00
      Coffee Break Heritage Hall (Illinois Conference Center)

    • HDR Institute Overview, Activities and Accomplishments Heritage Hall (Illinois Conference Center)

    • 11
      NSF Panel Heritage Hall (Illinois Conference Center)

      Moderator: Tanya Berger-Wolf (The Ohio State University)

      A panel of NSF program directors offering the NSF perspective on the history and future of the data revolution, particularly in the context of the AI revolution.

      Panelists are primarily drawn from the Cognizant Program Officers of NSF 21-519 (HDR Institutes) and NSF 24-560 (DSC):

      • Amy Walton, Deputy Director for the Office of Advanced Cyberinfrastructure (OAC) at the National Science Foundation
      • Sylvia Spengler, Program Director, Division of Information and Intelligent Systems (IIS) within the Computer and Information Science and Engineering (CISE) Directorate
      • Chaitanya K. Baru, Senior Advisor, Directorate for Technology, Innovation and Partnerships (TIP)
      • Raleigh Martin, Program Director, Directorate for Geosciences (GEO), Division of Earth Sciences (EAR), Integrated Activities Section
      • Cheryl L. Eavey, Directorate for Social, Behavioral and Economic Sciences (SBE), Division of Social and Economic Sciences (SES), Methodology, Measurement, and Statistics (MMS)
    • 12:30
      Working Lunch: Networking Heritage Hall (Illinois Conference Center)

    • 12
      Keynote talk: Knowledge-Guided Machine Learning: A New Framework for Accelerating Scientific Discovery and Addressing Global Environmental Challenges Heritage Hall (Illinois Conference Center)

      Climate change, loss of biodiversity, and food, water, and energy security for the world's growing population are some of the greatest environmental challenges facing humanity. These challenges have traditionally been studied by science and engineering communities via process-guided models that are grounded in scientific theories. Motivated by the phenomenal success of Machine Learning (ML) in advancing areas such as computer vision and language modeling, there is growing excitement in the scientific communities to harness the power of machine learning to address these societal challenges. In particular, massive amounts of data about Earth and its environment are now continuously being generated by a large number of Earth-observing satellites, in-situ sensors, and physics-based models. These information-rich datasets, in conjunction with recent ML advances, offer huge potential for understanding how the Earth's climate and ecosystem have been changing and how they are being impacted by human actions, and for devising policies to manage them in a sustainable fashion. However, capturing this potential is contingent on a paradigm shift in data-intensive scientific discovery, since “black box” ML models often fail to generalize to scenarios not seen in the data used for training and produce results that are not consistent with scientific understanding of the phenomena.

      This talk presents an overview of a new generation of machine learning algorithms, where scientific knowledge is deeply integrated in the design and training of machine learning models to accelerate scientific discovery. These knowledge-guided machine learning (KGML) techniques are fundamentally more powerful than standard machine learning approaches, and are particularly relevant for scientific and engineering problems that are traditionally addressed via process-guided (also called mechanistic or first-principles-based) models, but whose solutions are hampered by incomplete or inaccurate knowledge of physics or underlying processes. While this talk will illustrate the potential of the KGML paradigm in the context of environmental problems (e.g., ecology, hydrology, agronomy, climate science), the paradigm has the potential to greatly advance the pace of discovery in any discipline where mechanistic models are used.

      Speaker: Vipin Kumar (University of Minnesota)
    • Unconference "Elevator" Pitches Heritage Hall (Illinois Conference Center)

    • 15:30
      Coffee Break Heritage Hall (Illinois Conference Center)

    • Transdisciplinary Research: Continuous ML and Human-in-the-Loop Decision Making Excellence (Illinois Conference Center)

      Convener: Sharad Sharma (University of North Texas)
    • Transdisciplinary Research: Knowledge-Guided ML / Physics-Informed Neural Networks Loyalty (Illinois Conference Center)

      Conveners: Jianwu Wang (University of Maryland, Baltimore County), Zhijian Liu (University of California San Diego)
    • Transdisciplinary Research: LLMs / Foundation Models for Research Alma Mater (Illinois Conference Center)

      Conveners: Jiawei Han (University of Illinois at Urbana-Champaign), Pan Li (Georgia Tech)
    • Transdisciplinary Research: Responsible AI / Ethical AI Lincoln (Illinois Conference Center)

      Conveners: Peter Darch (University of Illinois at Urbana-Champaign), Savannah Thais
    • Unconference Determined: Fair and Open Science Heritage Hall (Illinois Conference Center)

    • Lightning Talks for Posters Heritage Hall (Illinois Conference Center)

    • Poster Session Heritage Hall (Illinois Conference Center)

      • 13
        Imageomics: FAIR ML Products for Biological Knowledge Discovery
        Speaker: Elizabeth Campolongo
      • 14
        Incorporating phenotypic similarity into trait description embeddings
        Speaker: Soumyashree Kar
      • 15
        Latent Space Phenotyping for Measuring Complex Evolutionary Traits
        Speaker: Caleb Charpentier
      • 16
        Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution
        Speaker: Mridul Khurana
      • 17
        What Do You See in Common? Learning Hierarchical Prototypes over Tree-of-Life to Discover Evolutionary Traits
        Speaker: Harish Babu Manogaran
      • 18
        BioCLIP: A Vision Foundation Model for the Tree of Life
        Speaker: Sam Stevens
      • 19
        Education and Outreach in Imageomics: Engaging Communities to Advance Science
        Speaker: Diane Boghrat
      • 20
        Using Deep Learning to Quantify Phenotypic Similarities in Mimic Butterfly Species using Human, Bird, and Butterfly Acuities
        Speaker: Michelle Ramirez
      • 21
        Dynamic Network Classification
        Speaker: Namrata Banerji
      • 22
        Tulane Center for Community-Engaged Artificial Intelligence
        Speaker: Aron Culotta
      • 23
        Understanding of impact of training size on animal re-identification
        Speaker: Ekaterina Nepovinnykh
      • 24
        Practical Leadership for Team Science: Experiences from the Imageomics Institute
        Speaker: Diane Boghrat
      • 25
        VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
        Speaker: Anuj Karpatne
      • 26
        What Do You See in Common? Learning Hierarchical Prototypes over Tree-of-Life to Discover Evolutionary Traits
        Speaker: Anuj Karpatne
      • 27
        National Data Mine Network
        Speaker: Mark Daniel Ward
      • 28
        Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
        Speaker: Kazi Sajeed Mehrab
      • 29
        Low-cost Efficient Wireless Intelligent Sensors (LEWIS) for Engineering Education: Native American Knowledge for Data Science Education
        Speaker: Fernando Moreu
      • 30
        Facilitating Knowledge Sharing and Discovery: Search Functionality and API Design for the I-GUIDE
        Speaker: Yunfan Kang
      • 31
        CLV: A Novel Framework for Enhanced Anomaly Detection and Attribution in Multivariate Time Series Data
        Speaker: Tolulope Ale
      • 32
        Battling Misinformation through Interdisciplinary Collaboration
        Speaker: Zahra Khanjani
      • 33
        CMAD: Advancing Understanding of Anomalous Melt Events over the Antarctic Sea Ice
        Speaker: Maloy Kumar Devnath
      • 34
        HDR DSC: The Metropolitan Chicago Data-science Corps (MCDC)
        Speaker: Lizhen Shi
      • 35
        Neural Network Efficiency Evaluation on the AMD Versal AI Engine
        Speaker: Yilin Shen
      • 36
        BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos
        Speaker: Maksim Kholiavchenko
      • 37
        Assessing Annotation Accuracy in Ice Sheets Using Quantitative Metrics
        Speaker: Bayu Tama
      • 38
        Genotype to Phenotype Mapping via Deep Learning
        Speaker: David Carlyn
      • 39
        Variance Analysis of Brightness Temperature using High-resolution DYAMOND simulations and CRTM in Digital Twin Systems
        Speaker: Chhaya Kulkarni
      • 40
        Predicting Sea ice extent over Antarctica using Patch CNN
        Speaker: Sai Vikas Amaraneni
      • 41
        HDR DSC: Collaborative Research: Transforming Data Science Education through a Portable and Sustainable Anthropocentric Data Analytics for Community Enrichment (ADACE) Program
        Speaker: Yu Liang
      • 42
        Physics-Informed Sea Ice Thickness Prediction
        Speaker: Akila Sampath
      • 43
        Probabilistic Prediction of Material Stability: Integrating Convex Hulls into Active Learning
        Speaker: Andrew Novick
      • 44
        Cyberinfrastructure for Scientific Data Preservation and Image Similarity Search
        Speaker: Joshua Agar
    • 45
      Public Lecture: Making AI Safe, Effective, and Trustworthy 0027/1025 Auditorium (Campus Instructional Facility)

      1405 Springfield Ave, Urbana, IL

      The data revolution has been replaced by the AI revolution. It seems that every day, we hear about a new area of human endeavor that has been conquered by AI. There are AI lawyers, AI doctors, AI artists, poets, and mathematicians. AI will predict what jobs we get, what medicines we should be treated with, and even how we should be educated.

      If that's true, then AI policy should consist of getting the government out of the way and letting innovation bloom. Indeed, that is what some advocate. But in fact, sound AI policy rests on the same principles that sound science rests on: openness, transparency, and evidence. In order to ensure that all of us can benefit from innovations in AI, the best ideas in AI policy emphasize openness in research, transparency and accountability in the claims made by those seeking to deploy it, and above all a requirement that claims -- bold claims at that -- of efficacy are backed up by evidence that can be independently evaluated.

      In this talk, I'll discuss how AI policy is trying to harness the AI revolution so that all of us, including those who have been traditionally left behind by tech innovation, can lead lives enriched rather than controlled by technology.

      Speaker: Suresh Venkatasubramanian
    • AI & Data Infrastructure Heritage Hall (Illinois Conference Center)

      • 46
        NSF OAC and the National Artificial Intelligence Research Resource (NAIRR) Pilot
        Speaker: Katerina Antypas (National Science Foundation)
      • 47
        Frontier AI for Science Security and Technology (FASST) initiative
        Speaker: Franck Cappello (Argonne National Laboratory)
    • 10:15
      Coffee Break Heritage Hall (Illinois Conference Center)

    • 48
      Knowledge Transfer in AI Workload Acceleration: from Academia to Industry Heritage Hall (Illinois Conference Center)

      At UIUC, there is strong ongoing collaboration between academia and industry, reflected in large industry-sponsored centers and institutes such as the IBM-Illinois Discovery Accelerator Institute (IIDAI) and the Center for Networked Intelligent Components and Environments (C-NICE), as well as in smaller and medium-sized partnerships like the AMD Center of Excellence and the Amazon-Illinois Center on AI for Interactive Conversational Experiences (AICE). The research activities in these centers are closely aligned with the strategic goals of industry partners, driving innovation, performance improvements, and market competitiveness across various high-tech sectors. Through close collaboration between UIUC students, faculty, and industry professionals, we aim to publish technical papers in top conferences and journals, while also transferring valuable knowledge back to industry. Additionally, many of these projects are open-source, with code freely available to benefit both the research community and industry. In this talk, Prof. Chen will highlight these topics, with a focus on AI workload acceleration.

      Speaker: Deming Chen (University of Illinois at Urbana-Champaign)
    • Panel: The Role of AI, Computing and Data Infrastructure in Community Engagement Heritage Hall (Illinois Conference Center)

      A panel conversation on the future of AI in relation to technology, computing, education, and inclusion, including experts across interdisciplinary domains spanning academia and industry.

      Panelists:

      • George Percivall, GeoRoundtable: Spatial Web Foundation; IEEE Standards
      • John Fonner, Director of Special Programs, The Texas Advanced Computing Center
      • Ashley Page Atkins, Chief of Staff, San Diego Supercomputer Center
      • Brett Bode, Assistant Director, National Center for Supercomputing Applications
      • Nhan Tran, Director of Real-Time Processing Systems Division, Fermi National Accelerator Laboratory
      Convener: Anand Padmanabhan (University of Illinois at Urbana-Champaign)
    • Machine Learning Challenges Heritage Hall (Illinois Conference Center)

      • 49
        Machine Learning Challenges Overview
        Speaker: Philip Coleman Harris (Massachusetts Inst. of Technology (US))
      • 50
        Detecting Novel Astrophysical Phenomena with Gravitational Waves (A3D3)
        Speaker: Philip Coleman Harris (Massachusetts Inst. of Technology (US))
      • 51
        Detecting Anomalous Climate Phenomena (iHARP)
        Speaker: Subhankar Ghosh (University of Minnesota)
      • 52
        Anomaly Detection: Hybrid Butterflies (Imageomics)
        Speaker: Elizabeth Campolongo (The Ohio State University)
    • 12:30
      Lunch Heritage Hall (Illinois Conference Center)

    • 53
      Pre-tenure faculty & postdoc mentoring lunch Humanities (Illinois Conference Center)

    • 54
      Communicating Science: Effective Delivery To Any Audience Heritage Hall (Illinois Conference Center)

      Abstract:
      Effective science communication is crucial for bridging the gap between complex scientific concepts and diverse audiences. In this talk, I will explore strategies to convey scientific ideas clearly and engagingly, ensuring accessibility for individuals with varying levels of background knowledge. We will discuss the importance of storytelling in science communication, as well as techniques for simplifying technical jargon without compromising the integrity of the information. This talk aims to empower scientists, educators, and communicators to effectively share their work with broader, more diverse audiences, fostering a greater public understanding and appreciation of science.

      Bio:
      Sara Ayman Metwalli holds a Ph.D. in Quantum Computing from Keio University, Japan, where she focused on developing and optimizing quantum algorithms and debugging tools for Noisy Intermediate-Scale Quantum (NISQ) devices. Sara has made significant contributions to quantum information science, particularly in the areas of quantum error correction and fault-tolerance. Her work has been published in leading journals and presented at international conferences, highlighting her role as an emerging leader in the quantum computing field.

      In addition to her research, Sara has a strong commitment to STEM education and outreach. She has extensive experience as an educator, teaching programming and quantum computing to a wide range of students, from K-12 to university graduates. Sara is passionate about increasing diversity in STEM and has actively worked to create inclusive educational environments that empower underrepresented groups to pursue careers in science and technology.

      Speaker: Sara Metwalli (Argonne National Laboratory)
    • 55
      Promoting Educational and Outreach Activities throughout the HDR Institutes via a Centralized Repository Heritage Hall (Illinois Conference Center)

      During the ideation expo at the previous HDR Ecosystem Conference in Colorado (Oct 2023), the creation of an HDR-wide educational and outreach materials repository was proposed, and a committee of cross-institutional members was formed to build it. The long-term vision for the repository is to promote and facilitate data fluency in domain-specific contexts for communities ranging from K-12 classrooms to the general public. In this talk, I will describe our efforts to design and implement a centralized repository to host educational and outreach materials collected throughout the HDR institutes. I will also report on a recent workshop (with cross-institutional participation) on the development of data fluency learning modules targeting 6th-12th grade STEM classrooms, which we have included in the repository. Finally, I will describe our plans moving forward with the eventual goal of public promotion and release.

      Speaker: Alex Pak (Colorado School of Mines)
    • Panel: Workforce Development and Education from Data Science Corps Heritage Hall (Illinois Conference Center)

      This panel features representatives from five NSF-funded Data Science Corps (DSC) projects, each focused on harnessing data science to address real-world challenges while advancing education and workforce development. The panelists will share their experiences and insights on how their projects are making a significant impact in both academic and community settings.

      Panelists:

      • Francisco Iacobelli, Associate Professor, Department of Computer Science, Loyola University Chicago
      • Fernando Moreu, Associate Professor, Department of Civil, Construction and Environmental Engineering, University of New Mexico
      • Yu Liang, Associate Professor, Department of Computer Science and Engineering, University of Tennessee, Chattanooga
      • Amanda Phillips de Lucas, Director, Baltimore Neighborhood Indicators Alliance, Jacob France Institute
      • Mark Daniel Ward, Professor of Statistics and (by courtesy) Mathematics, Purdue University
      Convener: Eric Sokol (National Ecological Observatory Network (NEON), Battelle)
    • 15:30
      Coffee Break Heritage Hall (Illinois Conference Center)

    • AI & Data Infrastructure: Challenges and Opportunities for HDR Institutes and NAIRR Integration Lincoln (Illinois Conference Center)

    • HDR Student-Focused Programming Heritage Hall (Illinois Conference Center)

      Convener: Diane Boghrat (The Ohio State University)
    • Machine Learning Challenges: Future ML Alma Mater (Illinois Conference Center)

      Convener: Philip Coleman Harris (Massachusetts Inst. of Technology (US))
    • Unconference Determined: Model Validation, Interpretability, and Uncertainty Quantification Excellence (Illinois Conference Center)

    • Workforce Development: Interdisciplinary Careers Loyalty (Illinois Conference Center)

      Convener: Josephine Namayanja (UMBC)
    • National Petascale Computing Facility Tour National Petascale Computing Facility

      1725 S Oak St, Champaign, IL 61820
    • Conference Banquet Heritage Hall (Illinois Conference Center)

    • 56
      Breakout session summary Chancellor Ballroom (Illinois Conference Center)

    • 10:00
      Coffee Break Chancellor Ballroom (Illinois Conference Center)

    • 57
      White Paper Planning Chancellor Ballroom (Illinois Conference Center)

      Speaker: Mark Neubauer (Univ. Illinois at Urbana Champaign (US))
    • 58
      Conference Summary Chancellor Ballroom (Illinois Conference Center)

      Speaker: Philip Coleman Harris (Massachusetts Inst. of Technology (US))
    • 59
      Closeout & Advertisement of 4th HDR Ecosystem Conference Chancellor Ballroom (Illinois Conference Center)

      Speakers: Mark Neubauer (Univ. Illinois at Urbana Champaign (US)), Diane Boghrat (The Ohio State University)
    • 11:40
      Grab Boxed Lunches Chancellor Ballroom (Illinois Conference Center)

    • Report Writing: Breakout #1 Alma Mater (Illinois Conference Center)

    • Report Writing: Breakout #2 Lincoln (Illinois Conference Center)

    • Report Writing: Breakout #3 Loyalty (Illinois Conference Center)

    • Report Writing: Breakout #4 Excellence (Illinois Conference Center)

    • National Petascale Computing Facility Tour National Petascale Computing Facility

      1725 S Oak St, Champaign, IL 61820