Dongyang Fan

I'm a PhD student in the Machine Learning and Optimization Lab at EPFL, supervised by Prof. Martin Jaggi.

My research interests are:

  • Modular and Collaborative Machine Learning: Mixture of Experts, co-distillation, collaborator selection.
  • Data Valuation and Data Markets: quantification of the impact of pre-training data on downstream performance and the fair compensation of content generators.
  • Post-training of LLMs: reasoning, alignment and steering.
I am also happy to branch out beyond these topics. If you want to reach out, do not hesitate to drop me an email!

Email  /  CV  /  Google Scholar  /  Twitter  /  Github  /  LinkedIn

profile photo

Research

Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs
Dongyang Fan, Vinko Sabolčec, Matin Ansaripour, Ayush Kumar Tarun, Martin Jaggi, Antoine Bosselut, Imanol Schlag
arXiv, 2025
arXiv

  • We introduce the concept of the Data Compliance Gap (DCG) to quantify the performance difference due to ethical compliance
  • We demonstrate that the DCG is close to 0% for general knowledge acquisition; however, a compliance gap exists in knowledge veracity and in structured formats.
  • A noticeable DCG also exists for non-compliant medical-domain data.

From Fairness to Truthfulness: Rethinking Data Valuation Design
Dongyang Fan, Tyler J. Rotello, Sai Praneeth Karimireddy
ICLR Workshop Data Problems, 2025
arXiv

  • Data are essential for LLMs, and different data come with different intrinsic costs
  • Applying popular data valuation methods as pricing rules may encourage data sellers to misreport their true costs
  • We revisit classical truthful mechanisms from the game theory literature

On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists
Dongyang Fan*, Bettina Messmer*, Nikita Doikov, Martin Jaggi
arXiv, 2024
code / arXiv

  • We propose CoMiGS, a novel collaborative language modeling framework
  • Our bi-level reformulation of the MoE objective efficiently achieves token-level personalization
  • Our method enjoys convergence guarantees and demonstrates strong empirical performance

Towards an empirical understanding of MoE design choices
Dongyang Fan*, Bettina Messmer*, Martin Jaggi
ICLR ME-FoMo Workshop, 2024
arXiv

  • We ablate the design choices of Mixture-of-Experts models
  • Preferred design choices differ when routing at a token or sequence level
  • Weak topic specialization can stem from sequence-level routing

Personalized Collaborative Fine-Tuning for On-Device Large Language Models
Nicolas Wagner, Dongyang Fan, Martin Jaggi
Conference on Language Modeling, 2024
code / arXiv

  • Logits are more effective than gradients for collaborator selection in language modeling
  • A better understanding of statistical heterogeneity in language data is needed.

Ghost Noise for Regularizing Deep Neural Networks
Atli Kosson, Dongyang Fan, Martin Jaggi
AAAI, 2024
arXiv

  • We propose Ghost Noise, a novel regularizer that can improve the generalization of DNNs
  • Ghost noise can be applied to noise-free layer-normalized networks.

Collaborative Learning via Prediction Consensus
Dongyang Fan, Celestine Mendler-Dünner, Martin Jaggi
NeurIPS, 2023
code / arXiv / poster

  • We propose a novel co-distillation method based on consensus reaching
  • The process of models iteratively determining labels on the target data can be modeled as a Markov process

Miscellaneous

In general, I enjoy the arts and culture. I am also an outdoorsy person; I enjoy hiking, skiing, and sailing.

I paint scenes from my hiking trips. For example...

[Two paintings from hiking trips]

The source code of this website is from here.