ICPP 2022 Accepted Papers

The following is a list of the contributed papers accepted for presentation as part of the main conference program to be held virtually.

Updated June 29, 2022
SHE: A Generic Framework for Data Stream Mining over Sliding Windows Yuhan Wu, Zhuochen Fan, Qilong Shi, Yixin Zhang, Tong Yang, Cheng Chen, Zheng Zhong, Junnan Li, Ariel Shtul and Yaofeng Tu
SMEGA2: Distributed Asynchronous Deep Neural Network Training With a Single Momentum Buffer Refael Cohen, Ido Hakimi and Assaf Schuster
EmbRace: Accelerating Communication for Efficient Training of Sparse Deep Neural Networks Shengwei Li, Zhiquan Lai, Dongsheng Li, Yiming Zhang, Xiangyu Ye and Yabo Duan
ADSTS: Automatic Distributed Storage Tuning System Using Deep Reinforcement Learning Kai Lu, Jiguang Wan, Guokuan Li, Ruixiang Ma and Wei Zhao
Aperiodic Local SGD: Beyond Local SGD Hao Zhang, Tingting Wu, Siyao Cheng and Jie Liu
Exploiting Parallelism of Disk Failure Recovery via Partial Stripe Repair for an Erasure-Coded High-Density Storage Server Lin Wang, Yuchong Hu, Qian Du, Dan Feng, Ray Wu, Ingo He and Kevin Zhang
Eco-FL: Adaptive Federated Learning with Efficient Edge Collaborative Pipeline Training Shengyuan Ye, Liekang Zeng, Qiong Wu, Ke Luo, Qingze Fang and Xu Chen
ParallelDualSPHysics: Supporting Efficient Parallel Fluid Simulations through MPI-enabled SPH method Sifan Long, Xiao-Wei Guo, Chao Li, Xiaokang Fan, Kelvin Wong, Ran Zhao, Yi Liu, Sen Zhang and Canqun Yang
UA-Sketch: An Accurate Approach to Detect Heavy Flow based on Uninterrupted Arrival Jin Ye, Lin Li, Wenlu Zhang, GuiHao Chen, Yuanchao Shan, Yijun Li, Weihe Li and Jiawei Huang
DeepCAT: A Cost-Efficient Online Configuration Auto-Tuning Approach for Big Data Frameworks Hui Dou, Yilun Wang, Yiwen Zhang and Pengfei Chen
Regularizing Sparse and Imbalanced Communications for Voxel-based Brain Simulations on Supercomputers Yuhao Liu, Xin Du, Zhihui Lu, Qiang Duan, Jianfeng Feng, Minglong Wang and Jie Wu
HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks Zining Zhang, Bingsheng He and Zhenjie Zhang
BULB: Lightweight and Automated Load Balancing for Fast Datacenter Networks Yuan Liu, Wenxin Li, Wenyu Qu and Heng Qi
Characterizing and Optimizing Transformer Inference on ARM Many-core Processor Jiazhi Jiang, Jiangsu Du and Dan Huang
DC4: Reconstructing Data-Credit-Coupled Congestion Control for Data Centers Shan Huang, Dezun Dong, Lingbin Zeng, Zejia Zhou, Yukun Zhou and Xiangke Liao
HSP: Hybrid Synchronous Parallelism for Fast Distributed Deep Learning Yijun Li, Jiawei Huang, Zhaoyi Li, Shengwen Zhou, Wanchun Jiang and Jianxin Wang
Learning Mean-Field Control for Delayed Information Load Balancing in Large Queuing Systems Anam Tahir, Kai Cui and Heinz Koeppl
Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis Kangjin Wang, Cheng Wang, Tong Jia, Kingsum Chow, Yang Wen, Yaoyong Dou, Guoyao Xu, Chuanjia Hou, Jie Yao, Liping Zhang and Ying Li
Adaptive and Efficient GPU Time Sharing for Hyperparameter Tuning in Cloud Liu Liu, Zhijun Ding and Jian Yu
TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs Haonan Ji, Huimin Song, Shibo Lu, Zhou Jin, Guangming Tan and Weifeng Liu
Boosting Cross-rack Multi-Stripe Repair in Heterogeneous Erasure-Coded Clusters Hai Zhou and Dan Feng
Spread: Decentralized Model Aggregation for Scalable Federated Learning Chuang Hu, Huanghuang Liang, Xiaoming Han, Boan Liu, Dazhao Cheng and Dan Wang
Automatically Generating High-performance Matrix Multiplication Kernels on the Latest Sunway Processor Xiaohan Tao, Yu Zhu, Boyang Wang, Jinlong Xu, Jianmin Pang and Jie Zhao
IATF: An Input-Aware Tuning Framework for Compact BLAS Based on ARMv8 CPUs Cunyang Wei, Haipeng Jia, Yunquan Zhang, Liusha Xu and Ji Qi
BWA-MEM-SCALE: Accelerating Genome Sequence Mapping on Commodity Servers Changdae Kim, Kwangwon Koh, Taehoon Kim, Daegyu Han and Jiwon Seo
Mlog: Multi-log Write Buffer upon Ultra-fast SSD RAID Shucheng Wang, Qiang Cao, Ziyi Lu and Jie Yao
Highly Parallel Linear Forest Extraction from a Weighted Graph on GPUs Christoph Klein and Robert Strzodka
Scheduling fork-join task graphs with communication delay and equal processing time Huijun Wang and Oliver Sinnen
EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers Lijuan Jiang
Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance Shuang Ma, Si Wu, Cheng Li and Yinlong Xu
GraphSD: A State and Dependency aware Out-of-Core Graph Processing System Xianghao Xu, Hong Jiang, Fang Wang, Yongli Cheng and Peng Fang
FAIR-BFL: Flexible and Incentive Redesign for Blockchain-based Federated Learning Rongxin Xu, Shiva Pokhrel, Qiujun Lan and Gang Li
Acuerdo: Fast Atomic Broadcast over RDMA Joseph Izraelevitz, Gaukas Wang, Rhett Hanscom, Kayli Silvers, Tamara Silbergleit Lehman, Gregory Chockler and Alexey Gotsman
Vectorizing SpMV by Exploiting Dynamic Regular Patterns Xin You, Changxi Liu, Hailong Yang, Pengbo Wang, Zhongzhi Luan and Depei Qian
ROWE-tree: A Read-Optimized and Write-Efficient B+-tree for Persistent Memory Xiaomin Zou, Fang Wang, Tianjin Guan, Dan Feng and Nan Su
SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems Qinzhe Wu, Ashen Ekanayake, Ruihao Li, Jonathan Beard and Lizy John
Transparent load balancing of MPI programs using OmpSs-2@cluster and DLB Jimmy Aguilar Mena, Omar Ibrahim, Victor Lopez, Marta Garcia, Paul Carpenter, Eduard Ayguade and Jesus Labarta
ElastiSim: A Batch-System Simulator for Malleable Workloads Taylan Özden, Tim Beringer, Arya Mazaheri, Hamid Mohammadi Fard and Felix Wolf
Penelope: Peer-to-peer Power Management Tapan Srivastava, Huazhe Zhang and Henry Hoffmann
Parallel Algorithms for Masked Sparse Matrix-Matrix Products Srđan Milaković, Oguz Selvitopi, Israt Nisa, Zoran Budimlić and Aydin Buluç
Online Scheduling of Moldable Task Graphs under Common Speedup Models Anne Benoit, Lucas Perotin, Yves Robert and Hongyang Sun
Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine Yelick and Aydin Buluç
NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database Liang Liu, Mingzhu Shen, Ruihao Gong, Fengwei Yu and Hailong Yang
TCB: Accelerating Transformer Inference Services with Request Concatenation Boqian Fu, Fahao Chen, Peng Li and Deze Zeng
Mentha: Enabling Sparse-Packing Computation on Systolic Arrays Minjin Tang, Mei Wen, Yasong Cao, Junzhong Shen, Jianchao Yang, Jiawei Fei, Yang Guo and Sheng Liu
Exploiting CXL-based Memory for Distributed Deep Learning Moiz Arif, Kevin Assogba, M. Mustafa Rafique and Sudharshan Vazhkudai
Postmortem Graph Analysis on the Temporal Graphs Md Maruf Hossain and Erik Saule
Atos: A Task-Parallel GPU Scheduler for Graph Analytics Yuxin Chen, Benjamin Brock, Serban Porumbescu, Aydin Buluç, Katherine Yelick and John Owens
LDPP: A Learned Directory Placement Policy in Distributed File Systems Yuanzhang Wang, Fengkui Yang, Ji Zhang, Ke Zhou, Chunhua Li, Chong Liu, Zhuo Cheng, Wei Fang and Jinhu Liu
On the Parallelization of MCMC for Community Detection Frank Wanye, Vitaliy Gleyzer, Edward Kao and Wu-chun Feng
Semi-Online Multi-Machine with Restart Scheduling for Integrated Edge and Cloud Computing Systems Liming Ge, Zizhao Wang, Wei Bao, Dong Yuan, Nguyen H. Tran, Bing B. Zhou and Albert Y. Zomaya
Towards Fast Large-scale Graph Analysis via Two-dimensional Balanced Partitioning Shuai Lin, Rui Wang, Yongkun Li, Yinlong Xu, John C.S. Lui, Fei Chen, Pengcheng Wang and Lei Han
A Dynamic and Recoverable BMT scheme for Secure Non-Volatile Memory Mengya Lei, Fang Wang, Dan Feng, Xiaoyu Shuai and Yuchao Cao
An Online Learning Approach for Client Selection in Federated Edge Learning under Budget Constraint Lina Su, Ruiting Zhou, Ne Wang, Guang Fang and Zongpeng Li
Online Resource Optimization for Elastic Stream Processing with Regret Guarantee Yang Liu, Huanle Xu and Wing Cheong Lau
Themis: Fair Memory Subsystem Resource Sharing with Differentiated QoS in Public Clouds Wenda Tang, Senbo Fu, Yutao Ke, Qian Peng and Feng Gao
FedHiSyn: A Hierarchical Synchronous Federated Learning Framework for Resource and Data Heterogeneity Guanghao Li, Yue Hu, Miao Zhang, Ji Liu, Quanjun Yin, Yong Peng and Dejing Dou
Tensor-Accelerated Fourth-Order Epistasis Detection on GPUs Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez and Leonel Sousa
Accelerating Random Forest Classification on GPU and FPGA Milan Shah, Reece Neff, Hancheng Wu, Marco Minutoli, Antonino Tumeo and Michela Becchi
Simmer: Rate proportional scheduling to reduce packet drops in vGPU based NF chains Avinash Kumar Chaurasia, Anshuj Garg, Bhaskaran Raman, Uday Kurkure, Hari Sivaraman, Lan Vu and Sairam Veeraswamy
ParaGraph: An application-simulator interface and toolkit for hardware-software co-design Mikhail Isaev, Nic McDonald, Jeff Young and Richard Vuduc
Parallel Network Slicing for Multi-SP Services Rongxin Han, Dezhi Chen, Song Guo, Xiaoyuan Fu, Jingyu Wang, Qi Qi and Jianxin Liao
Enabling Latency-Sensitive DNN Inference via Joint Optimization of Model Surgery and Resource Allocation in Heterogeneous Edge Zhaowu Huang, Fang Dong, Dian Shen, Huitian Wang, Xiaolin Guo and Shucun Fu
FLOPs as a Discriminant for Dense Linear Algebra Algorithms Francisco López Sánchez, Lars Karlsson and Paolo Bientinesi
FedClassAvg: Local representation learning for personalized federated learning on heterogeneous neural networks Jaehee Jang, Heoneok Ha, Dahuin Jung and Sungroh Yoon
NCC: Neighbor-aware Congestion Control based on Reinforcement Learning for Datacenter Networks Haoyu Wang, Kevin Zheng, Charles Reiss and Haiying Shen
Dynamic Strategies for High Performance Training of Knowledge Graph Embeddings Anwesh Panda and Sathish Vadhiyar
Counting Induced 6-Cycles in Bipartite Graphs Jason Niu, Jaroslaw Zola and Ahmet Erdem Sarıyüce
A Data-aware Learned Index scheme for Efficient Writes Chunhua Li, Zhou Zhang, Yuhan Liu, Li Liu, Ke Zhou and Ji Zhang
Formulating Interference-aware Data Delivery Strategies in Edge Storage Systems Xiaoyu Xia, Feifei Chen, Qiang He, Guangming Cui, John Grundy, Mohamed Abdelrazek and Fang Dong
Energy-efficient Edge Server Management for Edge Computing: A Game-theoretical Approach Guangming Cui, Qiang He, Xiaoyu Xia, Feifei Chen and Yun Yang
Efficient Phase-Functioned Real-time Character Control in Mobile Games: A TVM Enabled Approach Haidong Lan, Wenxi Zhu, Qian Qiu, Dou Wu, Honglin Zhu, Jingjing Zhao, Xinghui Fu, Minwen Deng and Jintao Meng
DRAM Cache Management with Request Granularity for NAND-based SSDs Haodong Lin, Zhibing Sha, Jun Li, Zhigang Cai, Balazs Gerofi, Jianwei Liao and Yuanquan Shi
MG-GCN: Scalable Multi-GPU Full Batch GCN Training Framework Muhammed Fatih Balin, Kaan Sancak and Umit V. Catalyurek
Cache-Poll: Containing Pollution in Non-Inclusive Caches Through Cache Partitioning Lucia Pons, Julio Sahuquillo, Salvador Petit and Julio Pons
Analyzing Performance and Power-Efficiency Variations among NVIDIA GPUs Kohei Yoshida, Rio Sageyama, Shinobu Miwa, Hayato Yamaki and Hiroki Honda
FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning Nang Hung Nguyen, Phi Le Nguyen, Thuy Dung Nguyen, Trung Thanh Nguyen, Duc Long Nguyen, Thanh Hung Nguyen, Huy Hieu Pham and Thao Nguyen Truong
DSSA: Dual-Side Sparse Systolic Array Architecture for Accelerating Convolutional Neural Network Training Zhengbo Chen, Qi Yu, Fang Zheng, Feng Guo and Zuoning Chen
Tesseract: Parallelize the Tensor Parallelism Efficiently Boxiang Wang, Qifan Xu, Zhengda Bian and Yang You
Micro-Benchmarking MPI Partitioned Point-to-Point Communication Yiltan Hassan Temucin, Ryan Grant and Ahmad Afsahi
Automatic Differentiation of Parallel Loops with Formal Methods Jan Hückelheim and Laurent Hascoët
Lobster: Load Balance-Aware I/O for Distributed DNN Training Jie Liu, Bogdan Nicolae and Dong Li
From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus Dian-Lun Lin, Haoxing Ren, Yanqing Zhang, Brucek Khailany and Tsung-Wei Huang
Accelerating Parallel First-Principles Excited-State calculation by Low-Rank Approximation with K-Means Clustering Qingcai Jiang
Multi Resource Scheduling with Task Cloning in Heterogeneous Clusters Huanle Xu, Yang Liu and Wing Cheong Lau
Performance Modeling for Short-Term Cache Allocation Christopher Stewart, Nathaniel Morris, Lydia Chen and Robert Birke
BSCache: A Brisk Semantic Caching Scheme for Cloud-based Performance Monitoring Timeseries Systems Kai Zhang, Zhiqi Wang and Zili Shao
A single-tree algorithm to compute Euclidean minimum spanning tree on GPU Andrey Prokopenko, Piyush Sao and Damien Lebrun-Grandié