Dongyang Fan
I'm a PhD student at the Machine Learning and Optimization Lab at EPFL, supervised by Prof. Martin Jaggi.
My research interests are:
- Modular and Collaborative Machine Learning: Mixture of Experts, co-distillation, collaborator selection.
- Data Valuation and Data Markets: fairness and truthfulness of data valuation methods.
- Post-training of LLMs: reasoning, alignment and steering.
I am also happy to branch out in my research. If you would like to reach out, do not hesitate to drop me an email!
Email / CV / Google Scholar / Twitter / Github / LinkedIn
On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
Dongyang Fan*,
Bettina Messmer*,
Nikita Doikov,
Martin Jaggi
arXiv, 2024
code / arXiv
We propose a novel collaborative language modeling framework, CoMiGS.
Our bi-level reformulation of the MoE objective efficiently achieves token-level personalization.
Our method enjoys convergence guarantees and demonstrates strong empirical performance.
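For intuition, here is a toy sketch of token-level routing between a shared generalist expert and an on-device specialist. This is my own illustration, not the CoMiGS implementation; all names, shapes, and weights are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy dimensions (assumptions for illustration only).
seq_len, d_model = 8, 16

# Hypothetical expert weights: one shared generalist, one on-device specialist.
W_generalist = rng.normal(size=(d_model, d_model))
W_specialist = rng.normal(size=(d_model, d_model))
W_gate = rng.normal(size=(d_model, 2))  # gate over the 2 experts

hidden = rng.normal(size=(seq_len, d_model))  # token representations

# Token-level routing: each token gets its own mixture weights over the
# two experts, so generic tokens can lean on the generalist while
# user-specific tokens can lean on the specialist.
gate = softmax(hidden @ W_gate)                     # (seq_len, 2)
out = (gate[:, :1] * (hidden @ W_generalist)
       + gate[:, 1:] * (hidden @ W_specialist))     # (seq_len, d_model)

print(gate.round(2))  # per-token mixture weights
```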
Towards an empirical understanding of MoE design choices
Dongyang Fan*,
Bettina Messmer*,
Martin Jaggi
ICLR ME-FoMo Workshop, 2024
arXiv
We ablate the design choices of Mixture-of-Experts models.
Preferred design choices differ when routing at the token level versus the sequence level.
Weak topic specialization can stem from sequence-level routing.
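To illustrate the routing-granularity distinction, here is a self-contained toy example (not code from the paper; dimensions and weights are invented): token-level routing gives every token its own expert mixture, while sequence-level routing pools the sequence and applies one mixture to all tokens.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model, n_experts = 8, 16, 4
hidden = rng.normal(size=(seq_len, d_model))
W_gate = rng.normal(size=(d_model, n_experts))

# Token-level routing: every token picks its own expert distribution.
token_gates = softmax(hidden @ W_gate)              # (seq_len, n_experts)

# Sequence-level routing: one pooled representation decides for the whole
# sequence, so all tokens share the same expert mixture.
seq_gate = softmax(hidden.mean(axis=0) @ W_gate)    # (n_experts,)

print(token_gates.shape, seq_gate.shape)
```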
Personalized Collaborative Fine-Tuning for On-Device Large Language Models
Nicolas Wagner,
Dongyang Fan,
Martin Jaggi
COLM, 2024
code / arXiv
Logits are more effective than gradients for collaborator selection in language modeling.
A better understanding of statistical heterogeneity in language data is needed.
Ghost Noise for Regularizing Deep Neural Networks
Atli Kosson,
Dongyang Fan,
Martin Jaggi
AAAI, 2024
arXiv
We propose a novel regularizer, ghost noise, that can improve the generalization of DNNs.
Ghost noise can be applied to noise-free layer-normalized networks.
Collaborative Learning via Prediction Consensus
Dongyang Fan,
Celestine Mendler-Dünner,
Martin Jaggi
NeurIPS, 2023
code / arXiv / poster
We propose a novel co-distillation method based on consensus reaching.
The process of models iteratively determining labels on the target data can be modeled as a Markov process.
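To make the Markov-process view concrete, here is a minimal toy sketch under my own simplifying assumptions (fixed soft predictions, a hand-picked row-stochastic trust matrix, no re-training between rounds):

```python
import numpy as np

rng = np.random.default_rng(0)

n_models, n_samples, n_classes = 3, 5, 4

# Initial soft predictions of each model on shared (unlabeled) target data.
preds = rng.dirichlet(np.ones(n_classes), size=(n_models, n_samples))

# Hypothetical row-stochastic trust matrix: how much each model weighs the others.
T = np.array([[0.6, 0.2, 0.2],
              [0.3, 0.5, 0.2],
              [0.2, 0.2, 0.6]])

for step in range(50):
    # Each model's new soft labels are a trust-weighted average of all
    # models' current soft labels.
    preds = np.einsum("ij,jsc->isc", T, preds)

print(np.allclose(preds[0], preds[1]))  # True: the models agree on labels
```

Because the trust matrix is row-stochastic, each averaging round is exactly a Markov chain update, and under mild conditions (irreducible, aperiodic T) all models converge to the same consensus labels.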
Miscellaneous
In general, I enjoy art and culture. I am also an outdoorsy person: I hike, ski, and sail.
I paint scenes from my hiking trips. For example...
The source code of this website is from here.