Cohere For AI Community Talks: Randall Balestriero
Video: http://youtube.com/watch?v=YRYNSlssDZU

The Fair Language Paradox

• Large Language Models (LLMs) are widely deployed in real-world applications, yet little is known about their training dynamics at the token level. Evaluation typically relies on aggregated training loss, measured at the batch or dataset level, which overlooks subtle per-token biases arising from (i) varying token-level dynamics and (ii) structural biases introduced by hyperparameters. While weight decay is commonly used to stabilize training, we reveal that it silently introduces strong biases that are easily measured through token-level metrics. This bias emerges across dataset and model sizes: as weight decay increases, low-frequency tokens are disproportionately disregarded by the model. The finding is concerning because these neglected low-frequency tokens together represent the vast majority of the token distribution in most languages. We conclude by proving that the current LLM pre-training strategy, a classification task over highly imbalanced classes, is misaligned with producing fair generative models. (A minimal per-token measurement sketch follows at the end of this description.)

• abs: https://arxiv.org/abs/2410.11985

• Dr. Randall Balestriero is an Assistant Professor at Brown University. He has been doing research in learnable signal processing since 2013, in particular on learnable parametrized wavelets, which were later extended to deep wavelet transforms. The latter has found many applications, e.g., in NASA's Mars rover for marsquake detection. In 2016, when joining Rice University for a PhD with Prof. Richard Baraniuk, he broadened his scope to explore deep networks from a theoretical perspective by employing affine spline operators. This led him to revisit and improve state-of-the-art methods, e.g., batch normalization and generative networks. In 2021, when joining Meta AI Research (FAIR) for a postdoc with Prof. Yann LeCun, he further enlarged his research interests, e.g., to include self-supervised learning and biases emerging from data augmentation and regularization, leading to many publications and conference tutorials. In 2023, he joined GQS, Citadel, to work on highly noisy and nonstationary financial time series and to provide AI solutions for prediction and representation learning. This industry exposure drives his research agenda of providing practical solutions from first principles, which he has pursued every day for the last 10 years.

• This session is brought to you by the Cohere For AI Open Science Community - a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. Thank you to our Community Leads for organizing and hosting this event.

• If you're interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.

• Join the Cohere For AI Open Science Community to see a full list of upcoming events: https://tinyurl.com/C4AICommunityApp
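For readers who want to probe this themselves, here is a minimal sketch of the kind of token-level measurement the abstract describes: compute the unreduced (per-token) cross-entropy of a causal LM and bucket it by empirical token frequency. The model name, corpus sample, and frequency threshold below are illustrative placeholders, not the paper's exact protocol.

```python
# Minimal sketch: bucket per-token loss by token frequency to probe
# whether low-frequency tokens are disproportionately neglected.
# Assumes a Hugging Face causal LM; the model, texts, and the "rare"
# threshold are illustrative, not the paper's exact setup.
import torch
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

texts = ["The cat sat on the mat.", "Quokkas thrive on Rottnest Island."]

# 1) Empirical token frequencies over the corpus sample.
freq = Counter()
for t in texts:
    freq.update(tokenizer.encode(t))

# 2) Unreduced cross-entropy gives one loss value per token.
loss_by_bucket = {"rare": [], "frequent": []}
with torch.no_grad():
    for t in texts:
        ids = tokenizer(t, return_tensors="pt").input_ids
        logits = model(ids).logits
        # Shift so position i predicts token i+1.
        shift_logits = logits[:, :-1, :]
        shift_labels = ids[:, 1:]
        losses = torch.nn.functional.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_labels.reshape(-1),
            reduction="none",
        )
        for tok, l in zip(shift_labels.reshape(-1).tolist(), losses.tolist()):
            bucket = "rare" if freq[tok] <= 1 else "frequent"
            loss_by_bucket[bucket].append(l)

for bucket, vals in loss_by_bucket.items():
    if vals:
        print(f"{bucket}: mean per-token loss = {sum(vals) / len(vals):.3f}")
```

Running this on checkpoints trained with different weight-decay values, and comparing the gap between the "rare" and "frequent" buckets, would be the natural way to surface the bias the talk discusses.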
