Speaker: Shiwei Zeng

Date: Mar 27, 11:45am–12:45pm

Abstract: Machine learning has become a powerful tool in the modern world. Over the past decades, the explosion of unverified data sources and the increasing interaction between humans and computers have raised concerns about whether machine learning algorithms are robust to data corruption or even adversarial attacks. On the other hand, in many real-world scenarios we can hardly gather enough data to train a good model, making data efficiency another important aspect of algorithm design.

In this talk, I will address these concerns by presenting machine learning algorithms that are provably robust to noise and come with efficiency guarantees in terms of computation, labels, queries, and the total number of samples. Several challenging learning settings are considered. In the crowdsourced learning setting, where the learner receives unlabeled instances and may query a pool of crowd workers, I will present algorithms that generalize from the noisy crowd even when the majority of workers are incorrect. These algorithms are also query- and label-efficient: they make a constant number of queries per unlabeled instance and request only a logarithmic number of labels in total. In the list-decodable learning setting, I will discuss the fundamental problem of mean estimation and present an attribute-efficient algorithm that recovers a list of hypotheses, at least one of which is close to the ground truth. For learning polynomial threshold functions under nasty noise, I will present an attribute-efficient algorithm that outputs a hypothesis with PAC guarantees under a dimension-independent noise rate. When the underlying models are sparse, an attribute-efficient algorithm achieves a sample complexity that depends polynomially on the sparsity parameter and only poly-logarithmically on the ambient dimension.
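The sparsity claim in the last sentence can be seen in a toy simulation. The sketch below is not the speaker's algorithm; it is a minimal Python illustration, assuming i.i.d. Gaussian noise and a simple hard-thresholding estimator, of why exploiting an s-sparse structure shrinks estimation error from roughly sqrt(d/n) (dense empirical mean) to roughly sqrt(s log d / n).

    import numpy as np

    def hard_threshold(v, s):
        """Keep the s largest-magnitude coordinates of v, zero out the rest."""
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-s:]
        out[idx] = v[idx]
        return out

    # Toy setup: estimate an s-sparse mean in ambient dimension d from n samples.
    rng = np.random.default_rng(0)
    d, s, n = 10_000, 5, 200
    mu = np.zeros(d)
    mu[:s] = 1.0                       # ground-truth s-sparse mean
    X = mu + rng.normal(size=(n, d))   # i.i.d. Gaussian samples centered at mu

    naive = X.mean(axis=0)                 # dense empirical mean
    sparse_est = hard_threshold(naive, s)  # project onto s-sparse vectors

    print("dense  error:", np.linalg.norm(naive - mu))
    print("sparse error:", np.linalg.norm(sparse_est - mu))

With d = 10,000 and n = 200, the dense estimate's error is about sqrt(d/n) ≈ 7, while the thresholded estimate's error is far smaller, reflecting the poly(s) · polylog(d) scaling mentioned in the abstract.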

Biographical Sketch: Shiwei Zeng is a Ph.D. candidate in the Department of Computer Science at Stevens Institute of Technology, working with Professor Jie Shen. She studies the theoretical foundations of machine learning; specifically, her work concerns the design and analysis of machine learning algorithms facing modern challenges such as malicious adversaries, data scarcity, and unreliable human annotations. She received the Early-career AMS-NSF-Simons-ICM Travel Grant in 2022 and the Stevens Excellence Doctoral Fellowship in 2023. She anticipates completing her doctoral studies in the spring of 2024.

Location and Zoom link: 307 Love, or https://fsu.zoom.us/j/3195217545