ICPP 2022 Accepted Papers
The following is a list of the contributed papers accepted for presentation as part of the main conference program to be held virtually.
Updated June 29, 2022
| SHE: A Generic Framework for Data Stream Mining over Sliding Windows | Yuhan Wu, Zhuochen Fan, Qilong Shi, Yixin Zhang, Tong Yang, Cheng Chen, Zheng Zhong, Junnan Li, Ariel Shtul and Yaofeng Tu |
| SMEGA2: Distributed Asynchronous Deep Neural Network Training With a Single Momentum Buffer | Refael Cohen, Ido Hakimi and Assaf Schuster |
| EmbRace: Accelerating Communication for Efficient Training of Sparse Deep Neural Networks | Shengwei Li, Zhiquan Lai, Dongsheng Li, Yiming Zhang, Xiangyu Ye and Yabo Duan |
| ADSTS: Automatic Distributed Storage Tuning System Using Deep Reinforcement Learning | Kai Lu, Jiguang Wan, Guokuan Li, Ruixiang Ma and Wei Zhao |
| Aperiodic Local SGD: Beyond Local SGD | Hao Zhang, Tingting Wu, Siyao Cheng and Jie Liu |
| Exploiting Parallelism of Disk Failure Recovery via Partial Stripe Repair for an Erasure-Coded High-Density Storage Server | Lin Wang, Yuchong Hu, Qian Du, Dan Feng, Ray Wu, Ingo He and Kevin Zhang |
| Eco-FL: Adaptive Federated Learning with Efficient Edge Collaborative Pipeline Training | Shengyuan Ye, Liekang Zeng, Qiong Wu, Ke Luo, Qingze Fang and Xu Chen |
| ParallelDualSPHysics: Supporting Efficient Parallel Fluid Simulations through MPI-enabled SPH method | Sifan Long, Xiao-Wei Guo, Chao Li, Xiaokang Fan, Kelvin Wong, Ran Zhao, Yi Liu, Sen Zhang and Canqun Yang |
| UA-Sketch: An Accurate Approach to Detect Heavy Flow based on Uninterrupted Arrival | Jin Ye, Lin Li, Wenlu Zhang, GuiHao Chen, Yuanchao Shan, Yijun Li, Weihe Li and Jiawei Huang |
| DeepCAT: A Cost-Efficient Online Configuration Auto-Tuning Approach for Big Data Frameworks | Hui Dou, Yilun Wang, Yiwen Zhang and Pengfei Chen |
| Regularizing Sparse and Imbalanced Communications for Voxel-based Brain Simulations on Supercomputers | Yuhao Liu, Xin Du, Zhihui Lu, Qiang Duan, Jianfeng Feng, Minglong Wang and Jie Wu |
| HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks | Zining Zhang, Bingsheng He and Zhenjie Zhang |
| BULB: Lightweight and Automated Load Balancing for Fast Datacenter Networks | Yuan Liu, Wenxin Li, Wenyu Qu and Heng Qi |
| Characterizing and Optimizing Transformer Inference on ARM Many-core Processor | Jiazhi Jiang, Jiangsu Du and Dan Huang |
| DC4: Reconstructing Data-Credit-Coupled Congestion Control for Data Centers | Shan Huang, Dezun Dong, Lingbin Zeng, Zejia Zhou, Yukun Zhou and Xiangke Liao |
| HSP: Hybrid Synchronous Parallelism for Fast Distributed Deep Learning | Yijun Li, Jiawei Huang, Zhaoyi Li, Shengwen Zhou, Wanchun Jiang and Jianxin Wang |
| Learning Mean-Field Control for Delayed Information Load Balancing in Large Queuing Systems | Anam Tahir, Kai Cui and Heinz Koeppl |
| Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis | Kangjin Wang, Cheng Wang, Tong Jia, Kingsum Chow, Yang Wen, Yaoyong Dou, Guoyao Xu, Chuanjia Hou, Jie Yao, Liping Zhang and Ying Li |
| Adaptive and Efficient GPU Time Sharing for Hyperparameter Tuning in Cloud | Liu Liu, Zhijun Ding and Jian Yu |
| TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs | Haonan Ji, Huimin Song, Shibo Lu, Zhou Jin, Guangming Tan and Weifeng Liu |
| Boosting Cross-rack Multi-Stripe Repair in Heterogeneous Erasure-Coded Clusters | Hai Zhou and Dan Feng |
| Spread: Decentralized Model Aggregation for Scalable Federated Learning | Chuang Hu, Huanghuang Liang, Xiaoming Han, Boan Liu, Dazhao Cheng and Dan Wang |
| Automatically Generating High-performance Matrix Multiplication Kernels on the Latest Sunway Processor | Xiaohan Tao, Yu Zhu, Boyang Wang, Jinlong Xu, Jianmin Pang and Jie Zhao |
| IATF: An Input-Aware Tuning Framework for Compact BLAS Based on ARMv8 CPUs | Cunyang Wei, Haipeng Jia, Yunquan Zhang, Liusha Xu and Ji Qi |
| BWA-MEM-SCALE: Accelerating Genome Sequence Mapping on Commodity Servers | Changdae Kim, Kwangwon Koh, Taehoon Kim, Daegyu Han and Jiwon Seo |
| Mlog: Multi-log Write Buffer upon Ultra-fast SSD RAID | Shucheng Wang, Qiang Cao, Ziyi Lu and Jie Yao |
| Highly Parallel Linear Forest Extraction from a Weighted Graph on GPUs | Christoph Klein and Robert Strzodka |
| Scheduling fork-join task graphs with communication delay and equal processing time | Huijun Wang and Oliver Sinnen |
| EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers | Lijuan Jiang |
| Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance | Shuang Ma, Si Wu, Cheng Li and Yinlong Xu |
| GraphSD: A State and Dependency aware Out-of-Core Graph Processing System | Xianghao Xu, Hong Jiang, Fang Wang, Yongli Cheng and Peng Fang |
| FAIR-BFL: Flexible and Incentive Redesign for Blockchain-based Federated Learning | Rongxin Xu, Shiva Pokhrel, Qiujun Lan and Gang Li |
| Acuerdo: Fast Atomic Broadcast over RDMA | Joseph Izraelevitz, Gaukas Wang, Rhett Hanscom, Kayli Silvers, Tamara Silbergleit Lehman, Gregory Chockler and Alexey Gotsman |
| Vectorizing SpMV by Exploiting Dynamic Regular Patterns | Xin You, Changxi Liu, Hailong Yang, Pengbo Wang, Zhongzhi Luan and Depei Qian |
| ROWE-tree: A Read-Optimized and Write-Efficient B+-tree for Persistent Memory | Xiaomin Zou, Fang Wang, Tianjin Guan, Dan Feng and Nan Su |
| SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems | Qinzhe Wu, Ashen Ekanayake, Ruihao Li, Jonathan Beard and Lizy John |
| Transparent load balancing of MPI programs using OmpSs-2@cluster and DLB | Jimmy Aguilar Mena, Omar Ibrahim, Victor Lopez, Marta Garcia, Paul Carpenter, Eduard Ayguade and Jesus Labarta |
| ElastiSim: A Batch-System Simulator for Malleable Workloads | Taylan Özden, Tim Beringer, Arya Mazaheri, Hamid Mohammadi Fard and Felix Wolf |
| Penelope: Peer-to-peer Power Management | Tapan Srivastava, Huazhe Zhang and Henry Hoffmann |
| Parallel Algorithms for Masked Sparse Matrix-Matrix Products | Srđan Milaković, Oguz Selvitopi, Israt Nisa, Zoran Budimlić and Aydin Buluç |
| Online Scheduling of Moldable Task Graphs under Common Speedup Models | Anne Benoit, Lucas Perotin, Yves Robert and Hongyang Sun |
| Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly | Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine Yelick and Aydin Buluç |
| NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database | Liang Liu, Mingzhu Shen, Ruihao Gong, Fengwei Yu and Hailong Yang |
| TCB: Accelerating Transformer Inference Services with Request Concatenation | Boqian Fu, Fahao Chen, Peng Li and Deze Zeng |
| Mentha: Enabling Sparse-Packing Computation on Systolic Arrays | Minjin Tang, Mei Wen, Yasong Cao, Junzhong Shen, Jianchao Yang, Jiawei Fei, Yang Guo and Sheng Liu |
| Exploiting CXL-based Memory for Distributed Deep Learning | Moiz Arif, Kevin Assogba, M. Mustafa Rafique and Sudharshan Vazhkudai |
| Postmortem Graph Analysis on the Temporal Graphs | Md Maruf Hossain and Erik Saule |
| Atos: A Task-Parallel GPU Scheduler for Graph Analytics | Yuxin Chen, Benjamin Brock, Serban Porumbescu, Aydin Buluç, Katherine Yelick and John Owens |
| LDPP: A Learned Directory Placement Policy in Distributed File Systems | Yuanzhang Wang, Fengkui Yang, Ji Zhang, Ke Zhou, Chunhua Li, Chong Liu, Zhuo Cheng, Wei Fang and Jinhu Liu |
| On the Parallelization of MCMC for Community Detection | Frank Wanye, Vitaliy Gleyzer, Edward Kao and Wu-chun Feng |
| Semi-Online Multi-Machine with Restart Scheduling for Integrated Edge and Cloud Computing Systems | Liming Ge, Zizhao Wang, Wei Bao, Dong Yuan, Nguyen H. Tran, Bing B. Zhou and Albert Y. Zomaya |
| Towards Fast Large-scale Graph Analysis via Two-dimensional Balanced Partitioning | Shuai Lin, Rui Wang, Yongkun Li, Yinlong Xu, John C.S. Lui, Fei Chen, Pengcheng Wang and Lei Han |
| A Dynamic and Recoverable BMT scheme for Secure Non-Volatile Memory | Mengya Lei, Fang Wang, Dan Feng, Xiaoyu Shuai and Yuchao Cao |
| An Online Learning Approach for Client Selection in Federated Edge Learning under Budget Constraint | Lina Su, Ruiting Zhou, Ne Wang, Guang Fang and Zongpeng Li |
| Online Resource Optimization for Elastic Stream Processing with Regret Guarantee | Yang Liu, Huanle Xu and Wing Cheong Lau |
| Themis: Fair Memory Subsystem Resource Sharing with Differentiated QoS in Public Clouds | Wenda Tang, Senbo Fu, Yutao Ke, Qian Peng and Feng Gao |
| FedHiSyn: A Hierarchical Synchronous Federated Learning Framework for Resource and Data Heterogeneity | Guanghao Li, Yue Hu, Miao Zhang, Ji Liu, Quanjun Yin, Yong Peng and Dejing Dou |
| Tensor-Accelerated Fourth-Order Epistasis Detection on GPUs | Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez and Leonel Sousa |
| Accelerating Random Forest Classification on GPU and FPGA | Milan Shah, Reece Neff, Hancheng Wu, Marco Minutoli, Antonino Tumeo and Michela Becchi |
| Simmer: Rate proportional scheduling to reduce packet drops in vGPU based NF chains | Avinash Kumar Chaurasia, Anshuj Garg, Bhaskaran Raman, Uday Kurkure, Hari Sivaraman, Lan Vu and Sairam Veeraswamy |
| ParaGraph: An application-simulator interface and toolkit for hardware-software co-design | Mikhail Isaev, Nic McDonald, Jeff Young and Richard Vuduc |
| Parallel Network Slicing for Multi-SP Services | Rongxin Han, Dezhi Chen, Song Guo, Xiaoyuan Fu, Jingyu Wang, Qi Qi and Jianxin Liao |
| Enabling Latency-Sensitive DNN Inference via Joint Optimization of Model Surgery and Resource Allocation in Heterogeneous Edge | Zhaowu Huang, Fang Dong, Dian Shen, Huitian Wang, Xiaolin Guo and Shucun Fu |
| FLOPs as a Discriminant for Dense Linear Algebra Algorithms | Francisco López Sánchez, Lars Karlsson and Paolo Bientinesi |
| FedClassAvg: Local representation learning for personalized federated learning on heterogeneous neural networks | Jaehee Jang, Heoneok Ha, Dahuin Jung and Sungroh Yoon |
| NCC: Neighbor-aware Congestion Control based on Reinforcement Learning for Datacenter Networks | Haoyu Wang, Kevin Zheng, Charles Reiss and Haiying Shen |
| Dynamic Strategies for High Performance Training of Knowledge Graph Embeddings | Anwesh Panda and Sathish Vadhiyar |
| Counting Induced 6-Cycles in Bipartite Graphs | Jason Niu, Jaroslaw Zola and Ahmet Erdem Sarıyüce |
| A Data-aware Learned Index scheme for Efficient Writes | Chunhua Li, Zhou Zhang, Yuhan Liu, Li Liu, Ke Zhou and Ji Zhang |
| Formulating Interference-aware Data Delivery Strategies in Edge Storage Systems | Xiaoyu Xia, Feifei Chen, Qiang He, Guangming Cui, John Grundy, Mohamed Abdelrazek and Fang Dong |
| Energy-efficient Edge Server Management for Edge Computing: A Game-theoretical Approach | Guangming Cui, Qiang He, Xiaoyu Xia, Feifei Chen and Yun Yang |
| Efficient Phase-Functioned Real-time Character Control in Mobile Games: A TVM Enabled Approach | Haidong Lan, Wenxi Zhu, Qian Qiu, Dou Wu, Honglin Zhu, Jingjing Zhao, Xinghui Fu, Minwen Deng and Jintao Meng |
| DRAM Cache Management with Request Granularity for NAND-based SSDs | Haodong Lin, Zhibing Sha, Jun Li, Zhigang Cai, Balazs Gerofi, Jianwei Liao and Yuanquan Shi |
| MG-GCN: Scalable Multi-GPU Full Batch GCN Training Framework | Muhammed Fatih Balin, Kaan Sancak and Umit V. Catalyurek |
| Cache-Poll: Containing Pollution in Non-Inclusive Caches Through Cache Partitioning | Lucia Pons, Julio Sahuquillo, Salvador Petit and Julio Pons |
| Analyzing Performance and Power-Efficiency Variations among NVIDIA GPUs | Kohei Yoshida, Rio Sageyama, Shinobu Miwa, Hayato Yamaki and Hiroki Honda |
| FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning | Nang Hung Nguyen, Phi Le Nguyen, Thuy Dung Nguyen, Trung Thanh Nguyen, Duc Long Nguyen, Thanh Hung Nguyen, Huy Hieu Pham and Thao Nguyen Truong |
| DSSA: Dual-Side Sparse Systolic Array Architecture for Accelerating Convolutional Neural Network Training | Zhengbo Chen, Qi Yu, Fang Zheng, Feng Guo and Zuoning Chen |
| Tesseract: Parallelize the Tensor Parallelism Efficiently | Boxiang Wang, Qifan Xu, Zhengda Bian and Yang You |
| Micro-Benchmarking MPI Partitioned Point-to-Point Communication | Yiltan Hassan Temucin, Ryan Grant and Ahmad Afsahi |
| Automatic Differentiation of Parallel Loops with Formal Methods | Jan Hückelheim and Laurent Hascoët |
| Lobster: Load Balance-Aware I/O for Distributed DNN Training | Jie Liu, Bogdan Nicolae and Dong Li |
| From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus | Dian-Lun Lin, Haoxing Ren, Yanqing Zhang, Brucek Khailany and Tsung-Wei Huang |
| Accelerating Parallel First-Principles Excited-State calculation by Low-Rank Approximation with K-Means Clustering | Qingcai Jiang |
| Multi Resource Scheduling with Task Cloning in Heterogeneous Clusters | Huanle Xu, Yang Liu and Wing Cheong Lau |
| Performance Modeling for Short-Term Cache Allocation | Christopher Stewart, Nathaniel Morris, Lydia Chen and Robert Birke |
| BSCache: A Brisk Semantic Caching Scheme for Cloud-based Performance Monitoring Timeseries Systems | Kai Zhang, Zhiqi Wang and Zili Shao |
| A single-tree algorithm to compute Euclidean minimum spanning tree on GPU | Andrey Prokopenko, Piyush Sao and Damien Lebrun-Grandié |