Reproducibility Study of "Learning Fair Graph Representations via Automated Data Augmentations"
Project Overview
This research is a reproducibility study of "Learning Fair Graph Representations via Automated Data Augmentations" by Ling et al. (2022). We test whether the paper's original claims, which concern node classification, hold under independent replication, and we evaluate the Graphair framework on link prediction across several real-world datasets.
Key Features
- Replication of the original experiments to assess the reproducibility of the paper's three main claims
- Extension of the Graphair framework from node classification to link prediction
- Cross-dataset evaluation with NBA, Pokec-n, Pokec-z, Citeseer, Cora, and PubMed datasets
- Implementation of dyadic-level fairness metrics for link prediction
- Comparative analysis with baseline models (FairAdj, FairDrop)
- Ablation studies on model components and hyperparameter sensitivity
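Dyadic-level fairness metrics compare how often links are predicted for different groups of edges rather than nodes. The snippet below is a minimal sketch, not the project's code. It assumes ΔDP is the max-minus-min rate of predicted links across dyadic groups and ΔEO is the same gap computed only over true links. `dyadic_fairness_gaps` is a hypothetical helper.

```python
import numpy as np

def dyadic_fairness_gaps(y_pred, y_true, groups):
    """Hypothetical helper: fairness gaps across dyadic (edge) groups.

    delta_dp: max - min over groups of P(y_pred = 1 | group)
    delta_eo: max - min over groups of P(y_pred = 1 | y_true = 1, group)
    """
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    groups = np.asarray(groups)
    dp_rates, eo_rates = [], []
    for g in np.unique(groups):
        in_g = groups == g
        # Demographic parity: rate of predicted links within this edge group.
        dp_rates.append(y_pred[in_g].mean())
        # Equal opportunity: rate of predicted links among true links only.
        pos = in_g & (y_true == 1)
        if pos.any():
            eo_rates.append(y_pred[pos].mean())
    delta_dp = max(dp_rates) - min(dp_rates)
    delta_eo = max(eo_rates) - min(eo_rates) if eo_rates else 0.0
    return float(delta_dp), float(delta_eo)

# Toy example: six candidate edges split into two dyadic groups.
dp, eo = dyadic_fairness_gaps(
    y_pred=[1, 1, 0, 0, 1, 0],
    y_true=[1, 0, 1, 1, 1, 0],
    groups=[0, 0, 0, 1, 1, 1],
)
print(f"delta_dp={dp:.3f}, delta_eo={eo:.3f}")  # delta_dp=0.333, delta_eo=0.000
```

Lower values mean link predictions are distributed more evenly across edge groups. With more than two groups, as in a subgroup-level evaluation, the max-minus-min gap reports the worst-case disparity.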
Technical Implementation
Implemented using the Graphair module of the DIG (Dive into Graphs) library, modified to support link prediction. Combined adversarial training for fairness, contrastive learning for informativeness, and reconstruction-based regularization. Adapted the framework to dyadic fairness evaluation in two ways: each edge is represented as the Hadamard product of its endpoint node embeddings, and edges are assigned to subgroup and mixed dyadic groups. Tuned hyperparameters with an extensive grid search on high-performance computing infrastructure with NVIDIA A100 GPUs.
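The edge-level adaptation described above can be sketched in NumPy. This is an illustrative example under stated assumptions, not the repository's implementation. The helper names are hypothetical. The sketch assumes the common definitions in which a mixed dyadic group separates intra-group edges (both endpoints share a sensitive value) from inter-group edges, while a subgroup dyadic group corresponds to an unordered pair of sensitive values.

```python
import numpy as np

def edge_features(z, edges):
    """Hypothetical helper: Hadamard product of endpoint embeddings, one row per edge."""
    src, dst = edges[:, 0], edges[:, 1]
    return z[src] * z[dst]

def mixed_dyadic_groups(s, edges):
    """Hypothetical helper: 0 = intra-group edge (same sensitive value), 1 = inter-group."""
    return (s[edges[:, 0]] != s[edges[:, 1]]).astype(int)

def subgroup_dyadic_groups(s, edges):
    """Hypothetical helper: one id per unordered pair of sensitive values.

    The pair is order-independent, so (0, 1) and (1, 0) get the same id.
    """
    lo = np.minimum(s[edges[:, 0]], s[edges[:, 1]])
    hi = np.maximum(s[edges[:, 0]], s[edges[:, 1]])
    pairs = [(int(a), int(b)) for a, b in zip(lo, hi)]
    # Map each distinct pair to a contiguous integer id in sorted order.
    ids = {p: i for i, p in enumerate(sorted(set(pairs)))}
    return np.array([ids[p] for p in pairs])

# Toy graph: 4 nodes with 2-d embeddings and a sensitive attribute with 3 values.
z = np.arange(8, dtype=float).reshape(4, 2)
s = np.array([0, 0, 1, 2])
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0]])

print(edge_features(z, edges))           # rows: [0, 3], [8, 15], [24, 35], [0, 7]
print(mixed_dyadic_groups(s, edges))     # [0 1 1 1]
print(subgroup_dyadic_groups(s, edges))  # [0 1 3 2]
```

In a full pipeline, the Hadamard features would feed a link classifier. The group labels let the fairness evaluation compare prediction rates per edge group. The Hadamard product is symmetric, so edge (u, v) and edge (v, u) get the same representation, which suits undirected graphs.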
Key Findings & Impact
Reproduced two of the three original claims and partially reproduced the third. The discrepancies are attributed to differences in experimental setup and in the number of training epochs. Extended Graphair to link prediction, where it achieved a better fairness-utility trade-off at the subgroup dyadic level than the baseline models. The results support Graphair's adaptability to downstream tasks beyond node classification and offer insight into the challenges of reproducing graph representation learning research.
Publication
Co-authored with Thijmen Nijdam, Juell Sprott, and Jurgen de Heus of the University of Amsterdam. Code and data are publicly available at: https://github.com/juellsprott/graphair-reproducibility