The Foundations of Data Science Institute (FODSI) brings together a large and diverse team of researchers from UC Berkeley, MIT, Boston University, Harvard University, Northeastern University, Bryn Mawr College, and Howard University. Its aim is to lay the theoretical foundations for the field of data science across the full breadth of scientific issues that arise in the rich and complex processes by which data can be used to make decisions: modeling issues, inferential issues, computational issues, and societal issues. Research in the institute is organized around eight themes. Four of these themes focus on key challenges arising from strategic, sequential, combinatorial, and experimental interactions (Learning and Economics, Reinforcement Learning, Networks and Graphical Models, and Causal Inference). The other four represent opportunities for major impact across disciplinary boundaries: elucidating the algorithmic landscape of statistical problems (Computational Complexity of Statistical Estimation); addressing scalability in data science problems (Sketching, Sampling, and Sub-Linear Time Algorithms); exploiting statistical methodology in the service of algorithms (Machine Learning for Algorithms); and using breakthroughs in applied mathematics to address computational and inferential challenges (Geometry of Sampling and Optimization). Societal issues in data science feature throughout this set of themes. The institute aims to educate and mentor a diverse cohort of future leaders in data science and to broaden participation and diversity in the data science workforce. It hosts a range of public activities such as research workshops, collaborative research programs, and summer schools.
University of Washington, University of Wisconsin–Madison, University of California–Santa Cruz, University of Chicago
Data science is making an enormous impact on science and society, but its success is uncovering pressing new challenges that stand in the way of further progress. Outcomes and decisions arising from many machine learning processes are not robust to errors and corruption in the data; data science algorithms are yielding biased and unfair outcomes, even as concerns about data privacy continue to mount; and machine learning systems suited to dynamic, interactive environments are less well developed than the corresponding tools for static problems. Only by appealing to the foundations of data science can we understand and address challenges such as these.
Building on the work of three TRIPODS Phase I institutes, the Institute for Foundations of Data Science (IFDS) brings together researchers from the Universities of Washington, Wisconsin–Madison, California–Santa Cruz, and Chicago, with the goal of tackling these critical issues. IFDS organizes its research around four core themes: complexity, robustness, closed-loop data science, and ethics and algorithms. By making concerted progress on these fundamental fronts, IFDS aims to lower several of the barriers to a better understanding of data science methodology and to its improved effectiveness and wider relevance to application areas. In concert with its research agenda, IFDS engages the data science community through workshops, summer schools, and hackathons, and is committed to equity and inclusion through extensive plans for outreach to traditionally underrepresented groups.