Dongyang Fan

I'm a PhD student at the Machine Learning and Optimization Lab at EPFL, supervised by Prof. Martin Jaggi.

My research interests are:

  • Modular and Collaborative Machine Learning: Mixture of Experts, co-distillation, collaborator selection.
  • Data Valuation and Data Markets: fairness and truthfulness of data valuation methods.
  • Post-training of LLMs: reasoning, alignment, and steering.

I am also happy to branch out in my research. If you want to reach out, do not hesitate to drop me an email!

Email  /  CV  /  Google Scholar  /  Twitter  /  Github  /  LinkedIn


Research

On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
Dongyang Fan*, Bettina Messmer*, Nikita Doikov, Martin Jaggi
arXiv, 2024
code / arXiv

  • We propose CoMiGS, a novel collaborative language modeling framework
  • Our bi-level reformulation of MoE objectives efficiently achieves token-level personalization
  • Our method enjoys a convergence guarantee and demonstrates strong empirical performance

Towards an Empirical Understanding of MoE Design Choices
Dongyang Fan*, Bettina Messmer*, Martin Jaggi
ICLR ME-FoMo Workshop, 2024
arXiv

  • We ablate the design choices of Mixture-of-Experts models
  • Preferred design choices differ when routing at a token or sequence level
  • Weak topic specialization can stem from sequence-level routing

Personalized Collaborative Fine-Tuning for On-Device Large Language Models
Nicolas Wagner, Dongyang Fan, Martin Jaggi
COLM, 2024
code / arXiv

  • Logits are more effective than gradients for collaborator selection in language modeling
  • A better understanding of statistical heterogeneity in language data is needed.

Ghost Noise for Regularizing Deep Neural Networks
Atli Kosson, Dongyang Fan, Martin Jaggi
AAAI, 2024
arXiv

  • We propose Ghost Noise, a novel regularizer that can improve the generalization of DNNs
  • Ghost Noise can be applied to noise-free layer-normalized networks.

Collaborative Learning via Prediction Consensus
Dongyang Fan, Celestine Mendler-Dünner, Martin Jaggi
NeurIPS, 2023
code / arXiv / poster

  • We propose a novel co-distillation method based on reaching consensus
  • The process by which models iteratively determine labels on the target data can be modeled as a Markov process
Miscellaneous

In general I like arts and culture. I am also an outdoorsy person, and I enjoy hiking, skiing, and sailing.

I paint scenes from my hiking trips. For example...

Figure 1
Figure 2

The source code of this website is from here.