In this paper, we introduce a novel framework following an upstream-downstream paradigm to construct user and item (Pin) embeddings from diverse data sources, which are essential for Pinterest to deliver personalized Pins and ads effectively. Our upstream models are trained on extensive data sources featuring varied signals, utilizing complex architectures to capture intricate relationships between users and Pins on Pinterest. To ensure scalability of the upstream models, entity embeddings are learned and regularly refreshed rather than computed in real time, allowing for asynchronous interaction between the upstream and downstream models. These embeddings are then integrated as input features in numerous downstream tasks, including ad retrieval and ranking models for CTR and CVR predictions. We demonstrate that our framework achieves notable performance improvements in both offline and online settings across various downstream tasks. This framework has been deployed in Pinterest’s production ad ranking systems, resulting in significant gains in online metrics.
@inproceedings{liu2025decoupled,title={Decoupled Entity Representation Learning for Pinterest Ads Ranking},author={Liu, Jie and Li, Yinrui and Sun, Jiankai and Li, Kungang and Sun, Han and Wang, Sihan and Wu, Huasen and Gao, Siyuan and Soares, Paulo and Li, Nan and Liu, Zhifang and Li, Haoyang and Ji, Siping and Leng, Ling and Deshikachar, Prathibha},booktitle={ACM Conference on Recommender Systems (RecSys)},year={2025},}
2023
CIKM
Optimizing for Member Value in an Edge Building Marketplace
Ayan Acharya, Siyuan Gao, Ankan Saha, and 6 more authors
In ACM International Conference on Information and Knowledge Management (CIKM), 2023
Social networks are prosperous marketplaces where creators and consumers congregate to share and consume various content. In general, products that rank content for distribution (such as newsfeeds, stories, and notifications) and are related to edge recommendations (such as connect to members, follow celebrities or groups or hashtags) optimize the experience of active users. Typically, such users generate ample interaction data amenable to accurate model training and prediction. In contrast, we prioritize enhancing the experience of inactive members (IMs) who do not have a rich connection network. We formulate strategies for recommending superior edges to help members grow their connection network. Adapting the recommendations provides enormous value to the IMs and can significantly influence their future behaviour and engagement with the ecosystem. To that end, we propose a general and scalable multi-objective optimization (MOO) framework to provide more value to IMs as invitation recipients on LinkedIn, a professional network with over 900M members. To deal with the enormous scale, we formulate the problem as a massive constrained linear optimization involving billions of variables and millions of constraints and efficiently solve it using accelerated gradient descent, making this the largest deployment of LP-based recommender systems worldwide. Furthermore, the proposed MOO paradigm can solve the general problem of matching different types of entities in an m-sided marketplace. Finally, we discuss the challenges and benefits of implementing and ramping our method in production at scale at LinkedIn and report our findings about the core business metrics related to users’ engagement and network health.
@inproceedings{acharya2023optimizing,title={Optimizing for Member Value in an Edge Building Marketplace},author={Acharya, Ayan and Gao, Siyuan and Saha, Ankan and Ocejo, Borja and Basu, Kinjal and Selvaraj, Keerthi and Mazumder, Rahul and Gupta, Aman and Agrawal, Parag},booktitle={ACM International Conference on Information and Knowledge Management (CIKM)},year={2023},}
2021
NeuroImage
Smooth graph learning for functional connectivity estimation
Functional connectivity (FC) estimated from functional magnetic resonance imaging (fMRI) signals is important in understanding neural representation and information processing in cortical networks. However, due to the lack of a "ground truth" FC pattern, the reliability and robustness of FC estimates are usually examined in downstream FC analysis tasks, such as participant identification (also known as "fingerprinting"). In this paper, we propose to learn FC via a smooth graph learning framework. In particular, we treat each time frame of the fMRI time series as a graph signal on an underlying functional brain graph, and estimate the smooth graph functional connectivity (SGFC) by learning the weighted graph adjacency matrix based on a graph signal smoothness assumption. We demonstrate that our approach gives rise to a natural and sparse graph representation of FC from which reliable graph measures can be extracted. Reliability of SGFC is evaluated in the context of fingerprinting and compared to correlation FC (CFC). SGFC achieves higher fingerprinting accuracy across several different experiment settings; the improvement is even more significant when a shorter fMRI scanning length is used for FC estimation. In addition to being reliable, we also validate the cognitive relevance of SGFC by using it to predict fluid intelligence. Finally, in evaluating topological measures of the sparse graph, SGFC reveals a more small-world and modular structure compared to CFC. Together, our results suggest that the smooth graph learning framework produces a naturally sparse, reliable, and cognitively relevant representation of functional connectivity.
@article{gao2021smooth,title={Smooth graph learning for functional connectivity estimation},author={Gao, Siyuan and Xia, Xinyue and Scheinost, Dustin and Mishne, Gal},journal={NeuroImage},year={2021},}
HBM
Non-linear manifold learning in fMRI uncovers a low-dimensional space of brain dynamics
Large-scale brain dynamics are believed to lie in a latent, low-dimensional space. Typically, the embeddings of brain scans are derived independently from different cognitive tasks or resting-state data, ignoring a potentially large—and shared—portion of this space. Here, we establish that a shared, robust, and interpretable low-dimensional space of brain dynamics can be recovered from a rich repertoire of task-based functional magnetic resonance imaging (fMRI) data. This occurs when relying on nonlinear approaches as opposed to traditional linear methods. The embedding maintains proper temporal progression of the tasks, revealing brain states and the dynamics of network integration. We demonstrate that resting-state data embeds fully onto the same task embedding, indicating similar brain states are present in both task and resting-state data. Our findings suggest analysis of fMRI data from multiple cognitive tasks in a low-dimensional space is possible and desirable.
@article{gao2021nonlinear,title={Non-linear manifold learning in fMRI uncovers a low-dimensional space of brain dynamics},author={Gao, Siyuan and Mishne, Gal and Scheinost, Dustin},journal={Human Brain Mapping},year={2021},}
Brain Behav.
Using functional connectivity models to characterize relationships between working and episodic memory
@article{stark2021using,title={Using functional connectivity models to characterize relationships between working and episodic memory},author={Stark, Gigi F. and Avery, Emily W. and Rosenberg, Monica D. and Greene, Abigail S. and Gao, Siyuan and Scheinost, Dustin and Constable, R. Todd and Chun, Marvin M. and Yoo, Kwangsun},journal={Brain and Behavior},year={2021},}
Cereb. Cortex
Transdiagnostic, Connectome-Based Prediction of Memory Constructs Across Psychiatric Disorders
Daniel S. Barron, Siyuan Gao, Javid Dadashkarimi, and 7 more authors
@article{barron2021transdiagnostic,title={Transdiagnostic, Connectome-Based Prediction of Memory Constructs Across Psychiatric Disorders},author={Barron, Daniel S. and Gao, Siyuan and Dadashkarimi, Javid and Greene, Abigail S. and Spann, Marisa N. and Noble, Stephanie and Lake, Evelyn M. R. and Krystal, John H. and Constable, R. Todd and Scheinost, Dustin},journal={Cerebral Cortex},year={2021},}
Nat. Hum. Behav.
A hitchhiker’s guide to working with large, open-source neuroimaging datasets
Large datasets that enable researchers to perform investigations with unprecedented rigor are growing increasingly common in neuroimaging. Due to the simultaneous increasing popularity of open science, these state-of-the-art datasets are more accessible than ever to researchers around the world. While analysis of these samples has pushed the field forward, they pose a new set of challenges that might cause difficulties for novice users. Here we offer practical tips for working with large datasets from the end-user’s perspective. We cover all aspects of the data lifecycle: from what to consider when downloading and storing the data to tips on how to become acquainted with a dataset one did not collect and what to share when communicating results. This manuscript serves as a practical guide one can use when working with large neuroimaging datasets, thus dissolving barriers to scientific discovery.
@article{horien2021hitchhiker,title={A hitchhiker's guide to working with large, open-source neuroimaging datasets},author={Horien, Corey and Noble, Stephanie and Greene, Abigail S. and Lee, Kangjoo and Barron, Daniel S. and Gao, Siyuan and O'Connor, David and Salehi, Mehraveh and Dadashkarimi, Javid and Shen, Xilin and Lake, Evelyn M. R. and Constable, R. Todd and Scheinost, Dustin},journal={Nature Human Behaviour},year={2021},}
2020
MICCAI
Poincaré embedding reveals edge-based functional networks of the brain
@inproceedings{gao2020poincare,title={Poincar{\'e} embedding reveals edge-based functional networks of the brain},author={Gao, Siyuan and Mishne, Gal and Scheinost, Dustin},booktitle={International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI)},year={2020},}
AISTATS
Inference of Dynamic Graph Changes for Functional Connectome
Dingjue Ji, Junwei Lu, Yiliang Zhang, and 2 more authors
In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
@inproceedings{ji2020inference,title={Inference of Dynamic Graph Changes for Functional Connectome},author={Ji, Dingjue and Lu, Junwei and Zhang, Yiliang and Gao, Siyuan and Zhao, Hongyu},booktitle={International Conference on Artificial Intelligence and Statistics (AISTATS)},year={2020},}
Cell Rep.
How tasks change whole-brain functional organization to reveal brain-phenotype relationships
@article{greene2020tasks,title={How tasks change whole-brain functional organization to reveal brain-phenotype relationships},author={Greene, Abigail S. and Gao, Siyuan and Noble, Stephanie and Scheinost, Dustin and Constable, R. Todd},journal={Cell Reports},year={2020},}
JoCN
Distributed patterns of functional connectivity predict working memory performance in novel healthy and memory-impaired individuals
@article{avery2020distributed,title={Distributed patterns of functional connectivity predict working memory performance in novel healthy and memory-impaired individuals},author={Avery, Emily W. and Yoo, Kwangsun and Rosenberg, Monica D. and Greene, Abigail S. and Gao, Siyuan and Na, Duk L. and Scheinost, Dustin and Constable, R. Todd and Chun, Marvin M.},journal={Journal of Cognitive Neuroscience},year={2020},}
2019
CNI
A Mass Multivariate, Edge-wise Approach for Combining Multiple Connectomes to Improve the Detection of Group Differences
Javid Dadashkarimi, Siyuan Gao, Erin Yeagle, and 2 more authors
In 3rd Workshop on Connectomics in NeuroImaging (CNI), 2019
@inproceedings{dadashkarimi2019mass,title={A Mass Multivariate, Edge-wise Approach for Combining Multiple Connectomes to Improve the Detection of Group Differences},author={Dadashkarimi, Javid and Gao, Siyuan and Yeagle, Erin and Noble, Stephanie and Scheinost, Dustin},booktitle={3rd Workshop on Connectomics in NeuroImaging (CNI)},year={2019},}
MICCAI
Combining Multiple Behavioral Measures and Multiple Connectomes via Multiway Canonical Correlation Analysis
Siyuan Gao, Xilin Shen, Todd Constable, and 1 more author
In International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019
@inproceedings{gao2019combining_mcca,title={Combining Multiple Behavioral Measures and Multiple Connectomes via Multiway Canonical Correlation Analysis},author={Gao, Siyuan and Shen, Xilin and Constable, Todd and Scheinost, Dustin},booktitle={International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI)},year={2019},}
IPMI
A hierarchical manifold learning framework for high-dimensional brain imaging data
@inproceedings{gao2019hierarchical,title={A hierarchical manifold learning framework for high-dimensional brain imaging data},author={Gao, Siyuan and Mishne, Gal and Scheinost, Dustin},booktitle={International Conference on Information Processing in Medical Imaging (IPMI)},year={2019},}
NeuroImage
Combining Multiple Connectomes Improves Predictive Modeling of Phenotypic Measures
Resting-state and task-based functional connectivity matrices, or connectomes, are powerful predictors of individual differences in phenotypic measures. However, most of the current state-of-the-art algorithms only build predictive models based on a single connectome for each individual. This approach neglects the complementary information contained in connectomes from different sources and reduces prediction performance. In order to combine different task connectomes into a single predictive model in a principled way, we propose a novel prediction framework, termed multidimensional connectome-based predictive modeling. Two specific algorithms are developed and implemented under this framework. Using two large open-source datasets with multiple tasks (the Human Connectome Project and the Philadelphia Neurodevelopmental Cohort), we validate and compare our framework against performing connectome-based predictive modeling (CPM) on each task connectome independently, CPM on a general functional connectivity matrix created by averaging together all task connectomes for an individual, and CPM with a naive extension to multiple connectomes where each edge for each task is selected independently. Our framework exhibits superior performance in prediction compared with the other competing methods. We found that different tasks contribute differentially to the final predictive model, suggesting that the battery of tasks used in prediction is an important consideration.
@article{gao2019combining,title={Combining Multiple Connectomes Improves Predictive Modeling of Phenotypic Measures},author={Gao, Siyuan and Greene, Abigail and Constable, R. Todd and Scheinost, Dustin},journal={NeuroImage},year={2019},}
NeuroImage
Ten Simple Rules for Predictive Modeling of Individual Differences in Neuroimaging
@article{scheinost2019ten,title={Ten Simple Rules for Predictive Modeling of Individual Differences in Neuroimaging},author={Scheinost, Dustin and Noble, Stephanie and Horien, Corey and Greene, Abigail S. and Lake, Evelyn and Salehi, Mehraveh and Gao, Siyuan and Shen, Xilin and O'Connor, David and Barron, Daniel S. and Yip, Sarah W. and Rosenberg, Monica D. and Constable, R. Todd},journal={NeuroImage},year={2019},}
2018
CCN
Hierarchical nonlinear embedding reveals brain states and performance differences during working memory tasks
@inproceedings{gao2018hierarchical_ccn,title={Hierarchical nonlinear embedding reveals brain states and performance differences during working memory tasks},author={Gao, Siyuan and Mishne, Gal and Scheinost, Dustin},booktitle={Conference on Cognitive Computational Neuroscience (CCN)},year={2018},}
MICCAI
Combining Multiple Connectomes via Canonical Correlation Analysis Improves Predictive Models
@inproceedings{gao2018combining,title={Combining Multiple Connectomes via Canonical Correlation Analysis Improves Predictive Models},author={Gao, Siyuan and Greene, Abigail and Constable, R. Todd and Scheinost, Dustin},booktitle={International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI)},year={2018},}
OHBM
Task-induced brain state manipulation improves prediction of individual traits
@inproceedings{greene2018ohbm,title={Task-induced brain state manipulation improves prediction of individual traits},author={Greene, Abigail and Gao, Siyuan and Constable, R. Todd and Scheinost, Dustin},booktitle={Organization for Human Brain Mapping (OHBM) Annual Meeting, Singapore (poster)},year={2018}}
ISBI
Task Integration For Connectome-based Prediction Via Canonical Correlation Analysis
@inproceedings{gao2018task,title={Task Integration For Connectome-based Prediction Via Canonical Correlation Analysis},author={Gao, Siyuan and Greene, Abigail and Constable, R. Todd and Scheinost, Dustin},booktitle={IEEE International Symposium on Biomedical Imaging (ISBI)},year={2018},}
Nat. Commun.
Task-induced brain state manipulation improves prediction of individual traits
Recent work has begun to relate individual differences in brain functional organization to human behaviors and cognition, but the best brain state to reveal such relationships remains an open question. In two large, independent data sets, we here show that cognitive tasks amplify trait-relevant individual differences in patterns of functional connectivity, such that predictive models built from task fMRI data outperform models built from resting-state fMRI data. Further, certain tasks consistently yield better predictions of fluid intelligence than others, and the task that generates the best-performing models varies by sex. By considering task-induced brain state and sex, the best-performing model explains over 20% of the variance in fluid intelligence scores, as compared to <6% of variance explained by rest-based models. This suggests that identifying and inducing the right brain state in a given group can better reveal brain-behavior relationships, motivating a paradigm shift from rest- to task-based functional connectivity analyses.
@article{greene2018task,title={Task-induced brain state manipulation improves prediction of individual traits},author={Greene, Abigail and Gao, Siyuan and Constable, R. Todd and Scheinost, Dustin},journal={Nature Communications},year={2018},}
2017
SfN
Brain state perturbation improves connectome-based predictive modeling of related behaviors
@inproceedings{greene2017brain,title={Brain state perturbation improves connectome-based predictive modeling of related behaviors},author={Greene, Abigail and Gao, Siyuan and Constable, R. Todd and Scheinost, Dustin},booktitle={Society for Neuroscience (SfN)},year={2017}}
Flux
Connectome-based predictive modeling: the impact of brain state and sex in a developmental cohort
@inproceedings{greene2017connectome,title={Connectome-based predictive modeling: the impact of brain state and sex in a developmental cohort},author={Greene, Abigail and Gao, Siyuan and Constable, R. Todd and Scheinost, Dustin},booktitle={Flux Congress},year={2017}}
IEEE TVCG
RCLens: Interactive Rare Category Exploration and Identification
Hanfei Lin, Siyuan Gao, David Gotz, and 3 more authors
IEEE Transactions on Visualization and Computer Graphics, 2017
@article{lin2017rclens,title={RCLens: Interactive Rare Category Exploration and Identification},author={Lin, Hanfei and Gao, Siyuan and Gotz, David and Du, Fan and He, Jingrui and Cao, Nan},journal={IEEE Transactions on Visualization and Computer Graphics},year={2017},}
IEEE T-ITS
Adaptively Exploring Population Mobility Patterns in Flow Visualization
Fei Wang, Wei Chen, Ye Zhao, and 3 more authors
IEEE Transactions on Intelligent Transportation Systems, 2017
@article{wang2017adaptively,title={Adaptively Exploring Population Mobility Patterns in Flow Visualization},author={Wang, Fei and Chen, Wei and Zhao, Ye and Gu, Tianyu and Gao, Siyuan and Bao, Hujun},journal={IEEE Transactions on Intelligent Transportation Systems},year={2017},}