We expect to continue to host two events each month. Going forward, we expect more events to be focused on Fellows' projects, issues, data collection strategies, and discussions.
How to think like a data reporter
Data journalists are the alley mechanics of data science. We haven't spent our lives as engineers, and we don't always know the theory. At the same time, we have a lot of practical experience that's highly relevant to day-to-day data science. This session will cover many tricks of data reporting, such as developing and interviewing sources, acquiring data via records requests and scraping, and finding hidden data sources. We'll discuss the technical environment we work in, the tools we use, and how our processes are designed to minimize technical debt. Finally, we'll talk about how we get our work in front of an audience and how we try to present it effectively.
[taken from The Marshall Project]
David Eads has been trying to make "good internet" for over twenty years. Eads is a co-founder of the Invisible Institute, where he also built The View From The Ground. He helped create FreeGeek Chicago, a community-based computer recycling organization. He has built editorial products as a reporter and editor at the Chicago Tribune, NPR Visuals, ProPublica Illinois, and The Chicago Reporter. Eads was a member of the team of independent journalists who won the 2019 Premio Gabo for general coverage for their reporting on mass graves in Mexico.
September 24, 2021
Hemanshu Kaul, Illinois Tech
Ethics and Equity in Mathematical Modeling
information available shortly
September 10, 2021
Panel discussion with Fellows, Faculty, and Advisors
The discussion will focus on data collection, surveys, and related diversity questions, with applications to public health and education. It is inspired by Fellows' initiatives, interests, and projects.
August 27, 2021
Fall 2021 Kick-off event
Introduction to new project proposals
Learn about the new class of Fellows, their projects, and their backgrounds!
March 8, 2021
An End-to-End Security and Privacy Framework for Big Data and Machine Learning
Recent cyberattacks have shown that the leakage or theft of big data may result in enormous monetary loss, damage to organizational reputation, and increased identity theft risks for individuals. Furthermore, in the age of big data, protecting the security and privacy of stored data is paramount for maintaining public trust and accountability and for getting the full value from the collected data. Therefore, we need to address security and privacy challenges ranging from allowing access to big data to building novel data analytics models using privacy-sensitive data. In this talk, we will provide an overview of our end-to-end solution framework that tries to address these challenges.
We start the talk by discussing the unique security and privacy challenges that arise due to big data and the recent systems designed to analyze big data. Later on, we discuss how to add an additional security layer for protecting big data using encryption techniques. In particular, we discuss our work on leveraging modern hardware-based trusted execution environments such as Intel SGX for secure encrypted data processing. We focus on how to provide a simple, secure, high-level, language-based framework that enables generic data analytics for non-security experts.
We also discuss our work on addressing the security and privacy issues with respect to the resulting data analytics and machine learning (ML) models. First, we discuss how these learned ML models could be attacked and how a game-theoretic solution concept could be used to learn more robust ML models that are resistant to various attacks. In addition, we discuss how to build more robust models for federated learning systems. Finally, we discuss why the perceived fragility of ML models against certain attacks is useful for enhancing individual privacy by showing how to look smarter to an ML model by modifying your social media profile.
Dr. Murat Kantarcioglu is a Professor in the Computer Science Department and Director of the Data Security and Privacy Lab at The University of Texas at Dallas (UTD). He received a PhD in Computer Science from Purdue University in 2005, where he received the Purdue CERIAS Diamond Award for academic excellence. He is also a visiting scholar at the Harvard Data Privacy Lab. Dr. Kantarcioglu's research focuses on the integration of cyber security, data science, and blockchains for creating technologies that can efficiently and securely process and share data.
His research has been supported by grants from agencies including NSF, AFOSR, ARO, ONR, NSA, and NIH. He has published over 170 peer-reviewed papers in top-tier venues such as ACM KDD, SIGMOD, ICDM, ICDE, PVLDB, NDSS, USENIX Security, and several IEEE/ACM Transactions, and has served as program co-chair for conferences such as IEEE ICDE, ACM SACMAT, IEEE Cloud, and ACM CODASPY. Some of his research has been covered by media outlets such as the Boston Globe, ABC News, PBS/KERA, and DFW Television, and has received multiple best paper awards. He is the recipient of various awards, including the NSF CAREER award, the AMIA (American Medical Informatics Association) 2014 Homer R Warner Award, and the IEEE ISI (Intelligence and Security Informatics) 2017 Technical Achievement Award, presented jointly by the IEEE SMC and IEEE ITS societies for his research in data security and privacy. He is also a fellow of AAAS and a distinguished scientist of ACM.
The road toward trustworthy AI
One of the latest and most relevant trends in Artificial Intelligence (AI) research and industry is the proliferation of ethical principles and guidelines for the design and assessment of AI systems. An example of these efforts is the European Ethics Guidelines for Trustworthy AI delivered in 2019 by a group of experts under the mandate of the European Commission. In this presentation I will outline the key requirements proposed by these guidelines and discuss some challenges underlying their implementation such as the development of meaningful interdisciplinary collaborations.
Teresa is a post-doc at the European Centre for Living Technology (ECLT), Ca’ Foscari University (Italy), working on the AI4EU project. Previously she worked on the ThinkBIG project at the University of Bristol (UK). Her research interests lie at the intersection of Philosophy and Artificial Intelligence. Currently she is interested in the social and ethical impacts of AI, in particular, on human decision-making and social regulation.
Teresa received her PhD in Computer Science from Ca’ Foscari University (Venice, Italy) under the supervision of Professor Marcello Pelillo. Her PhD thesis explored the philosophical foundation of machine learning and pattern recognition. Teresa is the co-editor of the forthcoming MIT Press book "Machines We Trust".
The human side of data science
"The data is the data" relieves us from considering where most of our data comes from: people. This phrase abstracts away the complexities of how data are collected, and the biases in the structures generating those data. Instead, the focus of data science education is often placed on the technical data science pipeline and its successes: data are ingested and cleaned, and then modeled and visualized for prediction and decision-making. These data science efforts intersect with so many parts of our lives—both directly and indirectly. Some of these points of intersection are more obvious: when we shop online, stream a TV show or movie, or look up directions in an app. Some are less obvious: advertising and marketing, epidemiology, climate change, and health. Because these data come from (and are about) people—people with plans, hopes, fears, and concerns—it’s critical for compassion, ethics, and social education to be a core component of the data science pipeline. In this talk I explore the foundations of data science, how it's being leveraged in my research field of neuroscience, and how we approach undergraduate data science education at UC San Diego.
Bradley Voytek is an Associate Professor in the Department of Cognitive Science, the Halıcıoğlu Data Science Institute, and the Neurosciences Graduate Program at UC San Diego. He is both an Alfred P. Sloan Neuroscience Research Fellow and a Kavli Fellow of the National Academies of Sciences, as well as a founding faculty member of the UC San Diego Halıcıoğlu Data Science Institute and the Undergraduate Data Science program, where he serves as Vice-Chair. After his PhD at UC Berkeley, he joined Uber as its first data scientist when it was a 10-person startup, where he helped build the company's data science strategy and team. His neuroscience research lab combines large-scale data science and machine learning to study how brain regions communicate with one another, and how that communication changes with development, aging, and disease. He is an advocate for promoting science to the public, and speaks extensively with students at all grade levels about the joys of scientific research and discovery. In addition to his academic publications, his outreach work has appeared in outlets ranging from Scientific American and NPR to the San Diego Comic-Con. His most important contribution to science, though, is his book with fellow neuroscientist Tim Verstynen, "Do Zombies Dream of Undead Sheep?", published by Princeton University Press.
May 24, 2021
Olga Isupova, Bath University
Isla Duporge, Oxford University
Using satellite imagery and deep learning to detect and count African elephants
Satellites allow large-scale surveys to be conducted in short time periods, with repeat surveys possible at intervals of <24 h. Very-high-resolution satellite imagery has been successfully used to detect and count a number of wildlife species in open, homogeneous landscapes and seascapes where target animals have a strong contrast with their environment. However, no research to date has detected animals in complex heterogeneous environments or detected elephants from space using very-high-resolution satellite imagery and deep learning. In this talk, we will discuss how we applied a deep learning model to automatically detect and count African elephants in a woodland savanna ecosystem in South Africa. We have shown that, with current state-of-the-art machine learning object detection models, we can achieve the same level of accuracy as human observers of satellite imagery, despite the fact that an adult elephant can occupy only 35 pixels in the image. The deep learning model can generalize to detect elephants in a different geographical location and from a lower-resolution satellite. Our study demonstrates the feasibility of applying state-of-the-art satellite remote sensing and deep learning technologies for detecting and counting African elephants in heterogeneous landscapes, and showcases high-resolution satellite imagery as a promising new wildlife surveying technique.