Dongyang Fan
I'm a PhD student in the Machine Learning and Optimization Lab at EPFL, supervised by Prof. Martin Jaggi.
My research interests are:
- Modular and Collaborative Machine Learning: Mixture of Experts, co-distillation, collaborator selection.
- Data Valuation and Data Markets: quantification of the impact of pre-training data on downstream performance and the fair compensation of content generators.
- Post-training of LLMs: reasoning, alignment and steering.
I am also open to branching out in my research. If you want to reach out, do not hesitate to drop me an email!
Email /
CV /
Google Scholar /
Twitter /
Github /
LinkedIn
Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs
Dongyang Fan,
Vinko Sabolčec,
Matin Ansaripour,
Ayush Kumar Tarun,
Martin Jaggi,
Antoine Bosselut,
Imanol Schlag
arXiv, 2025
arXiv
We introduce the Data Compliance Gap (DCG) to quantify the performance difference due to ethical data compliance.
We demonstrate that the DCG is close to 0% for general knowledge acquisition; however, a compliance gap exists for knowledge of veracity and for structural formats.
A noticeable DCG also exists for non-compliant medical-domain data.
From Fairness to Truthfulness: Rethinking Data Valuation Design
Dongyang Fan,
Tyler J. Rotello,
Sai Praneeth Karimireddy
ICLR Workshop Data Problems, 2025
arXiv
Data are important for LLMs, and different data come with different intrinsic costs.
Applying popular data valuation methods as pricing rules may encourage data sellers to misreport their true costs.
We revisit classical truthful methods from the game theory literature.
On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
Dongyang Fan*,
Bettina Messmer*,
Nikita Doikov,
Martin Jaggi
arXiv, 2024
code
/
arXiv
We propose CoMiGS, a novel framework for collaborative language modeling.
Our bi-level reformulation of the MoE objective efficiently achieves token-level personalization.
Our method enjoys convergence guarantees and demonstrates strong empirical performance.
Towards an empirical understanding of MoE design choices
Dongyang Fan*,
Bettina Messmer*,
Martin Jaggi
ICLR ME-FoMo Workshop, 2024
arXiv
We ablate the design choices of Mixture-of-Experts models.
The preferred design choices differ depending on whether routing happens at the token or the sequence level.
Weak topic specialization can stem from sequence-level routing.
Personalized Collaborative Fine-Tuning for On-Device Large Language Models
Nicolas Wagner,
Dongyang Fan,
Martin Jaggi
Conference on Language Modeling, 2024
code
/
arXiv
In language modeling, logits are more effective than gradients for collaborator selection.
A better understanding of statistical heterogeneity in language data is needed.
Ghost Noise for Regularizing Deep Neural Networks
Atli Kosson,
Dongyang Fan,
Martin Jaggi
AAAI, 2024
arXiv
We propose Ghost Noise, a novel regularizer that can improve the generalization of DNNs.
Ghost noise can be applied to noise-free layer-normalized networks.
Collaborative Learning via Prediction Consensus
Dongyang Fan,
Celestine Mendler-Dünner,
Martin Jaggi
NeurIPS, 2023
code
/
arXiv
/
poster
We propose a novel co-distillation method based on reaching a consensus on predictions.
The process of models iteratively determining labels on the target data can be modeled as a Markov process.
Miscellaneous
In general, I enjoy art and culture. I am also an outdoorsy person: I hike, ski, and sail.
I paint scenes from my hiking trips. For example...
The source code of this website is from here.