ICPP 2022 Accepted Papers
The following is a list of the contributed papers accepted for presentation as part of the main conference program to be held virtually.
Updated June 29, 2022
SHE: A Generic Framework for Data Stream Mining over Sliding Windows | Yuhan Wu, Zhuochen Fan, Qilong Shi, Yixin Zhang, Tong Yang, Cheng Chen, Zheng Zhong, Junnan Li, Ariel Shtul and Yaofeng Tu |
SMEGA2: Distributed Asynchronous Deep Neural Network Training With a Single Momentum Buffer | Refael Cohen, Ido Hakimi and Assaf Schuster |
EmbRace: Accelerating Communication for Efficient Training of Sparse Deep Neural Networks | Shengwei Li, Zhiquan Lai, Dongsheng Li, Yiming Zhang, Xiangyu Ye and Yabo Duan |
ADSTS: Automatic Distributed Storage Tuning System Using Deep Reinforcement Learning | Kai Lu, Jiguang Wan, Guokuan Li, Ruixiang Ma and Wei Zhao |
Aperiodic Local SGD: Beyond Local SGD | Hao Zhang, Tingting Wu, Siyao Cheng and Jie Liu |
Exploiting Parallelism of Disk Failure Recovery via Partial Stripe Repair for an Erasure-Coded High-Density Storage Server | Lin Wang, Yuchong Hu, Qian Du, Dan Feng, Ray Wu, Ingo He and Kevin Zhang |
Eco-FL: Adaptive Federated Learning with Efficient Edge Collaborative Pipeline Training | Shengyuan Ye, Liekang Zeng, Qiong Wu, Ke Luo, Qingze Fang and Xu Chen |
ParallelDualSPHysics: Supporting Efficient Parallel Fluid Simulations through MPI-enabled SPH method | Sifan Long, Xiao-Wei Guo, Chao Li, Xiaokang Fan, Kelvin Wong, Ran Zhao, Yi Liu, Sen Zhang and Canqun Yang |
UA-Sketch: An Accurate Approach to Detect Heavy Flow based on Uninterrupted Arrival | Jin Ye, Lin Li, Wenlu Zhang, GuiHao Chen, Yuanchao Shan, Yijun Li, Weihe Li and Jiawei Huang |
DeepCAT: A Cost-Efficient Online Configuration Auto-Tuning Approach for Big Data Frameworks | Hui Dou, Yilun Wang, Yiwen Zhang and Pengfei Chen |
Regularizing Sparse and Imbalanced Communications for Voxel-based Brain Simulations on Supercomputers | Yuhao Liu, Xin Du, Zhihui Lu, Qiang Duan, Jianfeng Feng, Minglong Wang and Jie Wu |
HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks | Zining Zhang, Bingsheng He and Zhenjie Zhang |
BULB: Lightweight and Automated Load Balancing for Fast Datacenter Networks | Yuan Liu, Wenxin Li, Wenyu Qu and Heng Qi |
Characterizing and Optimizing Transformer Inference on ARM Many-core Processor | Jiazhi Jiang, Jiangsu Du and Dan Huang |
DC4: Reconstructing Data-Credit-Coupled Congestion Control for Data Centers | Shan Huang, Dezun Dong, Lingbin Zeng, Zejia Zhou, Yukun Zhou and Xiangke Liao |
HSP: Hybrid Synchronous Parallelism for Fast Distributed Deep Learning | Yijun Li, Jiawei Huang, Zhaoyi Li, Shengwen Zhou, Wanchun Jiang and Jianxin Wang |
Learning Mean-Field Control for Delayed Information Load Balancing in Large Queuing Systems | Anam Tahir, Kai Cui and Heinz Koeppl |
Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis | Kangjin Wang, Cheng Wang, Tong Jia, Kingsum Chow, Yang Wen, Yaoyong Dou, Guoyao Xu, Chuanjia Hou, Jie Yao, Liping Zhang and Ying Li |
Adaptive and Efficient GPU Time Sharing for Hyperparameter Tuning in Cloud | Liu Liu, Zhijun Ding and Jian Yu |
TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs | Haonan Ji, Huimin Song, Shibo Lu, Zhou Jin, Guangming Tan and Weifeng Liu |
Boosting Cross-rack Multi-Stripe Repair in Heterogeneous Erasure-Coded Clusters | Hai Zhou and Dan Feng |
Spread: Decentralized Model Aggregation for Scalable Federated Learning | Chuang Hu, Huanghuang Liang, Xiaoming Han, Boan Liu, Dazhao Cheng and Dan Wang |
Automatically Generating High-performance Matrix Multiplication Kernels on the Latest Sunway Processor | Xiaohan Tao, Yu Zhu, Boyang Wang, Jinlong Xu, Jianmin Pang and Jie Zhao |
IATF: An Input-Aware Tuning Framework for Compact BLAS Based on ARMv8 CPUs | Cunyang Wei, Haipeng Jia, Yunquan Zhang, Liusha Xu and Ji Qi |
BWA-MEM-SCALE: Accelerating Genome Sequence Mapping on Commodity Servers | Changdae Kim, Kwangwon Koh, Taehoon Kim, Daegyu Han and Jiwon Seo |
Mlog: Multi-log Write Buffer upon Ultra-fast SSD RAID | Shucheng Wang, Qiang Cao, Ziyi Lu and Jie Yao |
Highly Parallel Linear Forest Extraction from a Weighted Graph on GPUs | Christoph Klein and Robert Strzodka |
Scheduling fork-join task graphs with communication delay and equal processing time | Huijun Wang and Oliver Sinnen |
EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers | Lijuan Jiang |
Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance | Shuang Ma, Si Wu, Cheng Li and Yinlong Xu |
GraphSD: A State and Dependency aware Out-of-Core Graph Processing System | Xianghao Xu, Hong Jiang, Fang Wang, Yongli Cheng and Peng Fang |
FAIR-BFL: Flexible and Incentive Redesign for Blockchain-based Federated Learning | Rongxin Xu, Shiva Pokhrel, Qiujun Lan and Gang Li |
Acuerdo: Fast Atomic Broadcast over RDMA | Joseph Izraelevitz, Gaukas Wang, Rhett Hanscom, Kayli Silvers, Tamara Silbergleit Lehman, Gregory Chockler and Alexey Gotsman |
Vectorizing SpMV by Exploiting Dynamic Regular Patterns | Xin You, Changxi Liu, Hailong Yang, Pengbo Wang, Zhongzhi Luan and Depei Qian |
ROWE-tree: A Read-Optimized and Write-Efficient B+-tree for Persistent Memory | Xiaomin Zou, Fang Wang, Tianjin Guan, Dan Feng and Nan Su |
SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems | Qinzhe Wu, Ashen Ekanayake, Ruihao Li, Jonathan Beard and Lizy John |
Transparent load balancing of MPI programs using OmpSs-2@cluster and DLB | Jimmy Aguilar Mena, Omar Ibrahim, Victor Lopez, Marta Garcia, Paul Carpenter, Eduard Ayguade and Jesus Labarta |
ElastiSim: A Batch-System Simulator for Malleable Workloads | Taylan Özden, Tim Beringer, Arya Mazaheri, Hamid Mohammadi Fard and Felix Wolf |
Penelope: Peer-to-peer Power Management | Tapan Srivastava, Huazhe Zhang and Henry Hoffmann |
Parallel Algorithms for Masked Sparse Matrix-Matrix Products | Srđan Milaković, Oguz Selvitopi, Israt Nisa, Zoran Budimlić and Aydin Buluç |
Online Scheduling of Moldable Task Graphs under Common Speedup Models | Anne Benoit, Lucas Perotin, Yves Robert and Hongyang Sun |
Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly | Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine Yelick and Aydin Buluç |
NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database | Liang Liu, Mingzhu Shen, Ruihao Gong, Fengwei Yu and Hailong Yang |
TCB: Accelerating Transformer Inference Services with Request Concatenation | Boqian Fu, Fahao Chen, Peng Li and Deze Zeng |
Mentha: Enabling Sparse-Packing Computation on Systolic Arrays | Minjin Tang, Mei Wen, Yasong Cao, Junzhong Shen, Jianchao Yang, Jiawei Fei, Yang Guo and Sheng Liu |
Exploiting CXL-based Memory for Distributed Deep Learning | Moiz Arif, Kevin Assogba, M. Mustafa Rafique and Sudharshan Vazhkudai |
Postmortem Graph Analysis on the Temporal Graphs | Md Maruf Hossain and Erik Saule |
Atos: A Task-Parallel GPU Scheduler for Graph Analytics | Yuxin Chen, Benjamin Brock, Serban Porumbescu, Aydin Buluç, Katherine Yelick and John Owens |
LDPP: A Learned Directory Placement Policy in Distributed File Systems | Yuanzhang Wang, Fengkui Yang, Ji Zhang, Ke Zhou, Chunhua Li, Chong Liu, Zhuo Cheng, Wei Fang and Jinhu Liu |
On the Parallelization of MCMC for Community Detection | Frank Wanye, Vitaliy Gleyzer, Edward Kao and Wu-chun Feng |
Semi-Online Multi-Machine with Restart Scheduling for Integrated Edge and Cloud Computing Systems | Liming Ge, Zizhao Wang, Wei Bao, Dong Yuan, Nguyen H. Tran, Bing B. Zhou and Albert Y. Zomaya |
Towards Fast Large-scale Graph Analysis via Two-dimensional Balanced Partitioning | Shuai Lin, Rui Wang, Yongkun Li, Yinlong Xu, John C.S. Lui, Fei Chen, Pengcheng Wang and Lei Han |
A Dynamic and Recoverable BMT scheme for Secure Non-Volatile Memory | Mengya Lei, Fang Wang, Dan Feng, Xiaoyu Shuai and Yuchao Cao |
An Online Learning Approach for Client Selection in Federated Edge Learning under Budget Constraint | Lina Su, Ruiting Zhou, Ne Wang, Guang Fang and Zongpeng Li |
Online Resource Optimization for Elastic Stream Processing with Regret Guarantee | Yang Liu, Huanle Xu and Wing Cheong Lau |
Themis: Fair Memory Subsystem Resource Sharing with Differentiated QoS in Public Clouds | Wenda Tang, Senbo Fu, Yutao Ke, Qian Peng and Feng Gao |
FedHiSyn: A Hierarchical Synchronous Federated Learning Framework for Resource and Data Heterogeneity | Guanghao Li, Yue Hu, Miao Zhang, Ji Liu, Quanjun Yin, Yong Peng and Dejing Dou |
Tensor-Accelerated Fourth-Order Epistasis Detection on GPUs | Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez and Leonel Sousa |
Accelerating Random Forest Classification on GPU and FPGA | Milan Shah, Reece Neff, Hancheng Wu, Marco Minutoli, Antonino Tumeo and Michela Becchi |
Simmer: Rate proportional scheduling to reduce packet drops in vGPU based NF chains | Avinash Kumar Chaurasia, Anshuj Garg, Bhaskaran Raman, Uday Kurkure, Hari Sivaraman, Lan Vu and Sairam Veeraswamy |
ParaGraph: An application-simulator interface and toolkit for hardware-software co-design | Mikhail Isaev, Nic McDonald, Jeff Young and Richard Vuduc |
Parallel Network Slicing for Multi-SP Services | Rongxin Han, Dezhi Chen, Song Guo, Xiaoyuan Fu, Jingyu Wang, Qi Qi and Jianxin Liao |
Enabling Latency-Sensitive DNN Inference via Joint Optimization of Model Surgery and Resource Allocation in Heterogeneous Edge | Zhaowu Huang, Fang Dong, Dian Shen, Huitian Wang, Xiaolin Guo and Shucun Fu |
FLOPs as a Discriminant for Dense Linear Algebra Algorithms | Francisco López Sánchez, Lars Karlsson and Paolo Bientinesi |
FedClassAvg: Local representation learning for personalized federated learning on heterogeneous neural networks | Jaehee Jang, Heoneok Ha, Dahuin Jung and Sungroh Yoon |
NCC: Neighbor-aware Congestion Control based on Reinforcement Learning for Datacenter Networks | Haoyu Wang, Kevin Zheng, Charles Reiss and Haiying Shen |
Dynamic Strategies for High Performance Training of Knowledge Graph Embeddings | Anwesh Panda and Sathish Vadhiyar |
Counting Induced 6-Cycles in Bipartite Graphs | Jason Niu, Jaroslaw Zola and Ahmet Erdem Sarıyüce |
A Data-aware Learned Index scheme for Efficient Writes | Chunhua Li, Zhou Zhang, Yuhan Liu, Li Liu, Ke Zhou and Ji Zhang |
Formulating Interference-aware Data Delivery Strategies in Edge Storage Systems | Xiaoyu Xia, Feifei Chen, Qiang He, Guangming Cui, John Grundy, Mohamed Abdelrazek and Fang Dong |
Energy-efficient Edge Server Management for Edge Computing: A Game-theoretical Approach | Guangming Cui, Qiang He, Xiaoyu Xia, Feifei Chen and Yun Yang |
Efficient Phase-Functioned Real-time Character Control in Mobile Games: A TVM Enabled Approach | Haidong Lan, Wenxi Zhu, Qian Qiu, Dou Wu, Honglin Zhu, Jingjing Zhao, Xinghui Fu, Minwen Deng and Jintao Meng |
DRAM Cache Management with Request Granularity for NAND-based SSDs | Haodong Lin, Zhibing Sha, Jun Li, Zhigang Cai, Balazs Gerofi, Jianwei Liao and Yuanquan Shi |
MG-GCN: Scalable Multi-GPU Full Batch GCN Training Framework | Muhammed Fatih Balin, Kaan Sancak and Umit V. Catalyurek |
Cache-Poll: Containing Pollution in Non-Inclusive Caches Through Cache Partitioning | Lucia Pons, Julio Sahuquillo, Salvador Petit and Julio Pons |
Analyzing Performance and Power-Efficiency Variations among NVIDIA GPUs | Kohei Yoshida, Rio Sageyama, Shinobu Miwa, Hayato Yamaki and Hiroki Honda |
FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning | Nang Hung Nguyen, Phi Le Nguyen, Thuy Dung Nguyen, Trung Thanh Nguyen, Duc Long Nguyen, Thanh Hung Nguyen, Huy Hieu Pham and Thao Nguyen Truong |
DSSA: Dual-Side Sparse Systolic Array Architecture for Accelerating Convolutional Neural Network Training | Zhengbo Chen, Qi Yu, Fang Zheng, Feng Guo and Zuoning Chen |
Tesseract: Parallelize the Tensor Parallelism Efficiently | Boxiang Wang, Qifan Xu, Zhengda Bian and Yang You |
Micro-Benchmarking MPI Partitioned Point-to-Point Communication | Yiltan Hassan Temucin, Ryan Grant and Ahmad Afsahi |
Automatic Differentiation of Parallel Loops with Formal Methods | Jan Hückelheim and Laurent Hascoët |
Lobster: Load Balance-Aware I/O for Distributed DNN Training | Jie Liu, Bogdan Nicolae and Dong Li |
From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus | Dian-Lun Lin, Haoxing Ren, Yanqing Zhang, Brucek Khailany and Tsung-Wei Huang |
Accelerating Parallel First-Principles Excited-State calculation by Low-Rank Approximation with K-Means Clustering | Qingcai Jiang |
Multi Resource Scheduling with Task Cloning in Heterogeneous Clusters | Huanle Xu, Yang Liu and Wing Cheong Lau |
Performance Modeling for Short-Term Cache Allocation | Christopher Stewart, Nathaniel Morris, Lydia Chen and Robert Birke |
BSCache: A Brisk Semantic Caching Scheme for Cloud-based Performance Monitoring Timeseries Systems | Kai Zhang, Zhiqi Wang and Zili Shao |
A single-tree algorithm to compute Euclidean minimum spanning tree on GPU | Andrey Prokopenko, Piyush Sao and Damien Lebrun-Grandié |