数值计算与计算机应用 (Journal of Numerical Methods and Computer Applications), 2014, Vol. 35, Issue 4: 269-276
A NEW SPARSE MATRIX STORAGE FORMAT FOR IMPROVING SPMV PERFORMANCE BY SIMD
Liu Fangfang1, Yang Chao1,2
1. Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China;
2. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Abstract: Sparse matrix-vector multiplication (SpMV) is an important computational kernel in scientific and engineering applications. On modern processors with deep memory hierarchies, SpMV using the traditional CSR (Compressed Sparse Row) format often runs at far below 10% of peak floating-point performance. Because of its irregular memory access pattern, CSR-based SpMV is also hard to accelerate directly with the SIMD vector units of modern processors. To exploit SIMD, a new storage format for sparse matrices with locality characteristics, CSRL (Compressed Sparse Row with Local information), is proposed. The CSRL format is SIMD-friendly: it reduces the number of memory accesses and thereby improves SpMV performance. Experiments show that, compared with the implementation in the Intel MKL library (version 10.3), SpMV based on the CSRL format achieves an average performance improvement of 29.5% and a maximum of 89%.
Key words: Sparse matrix; SpMV; SIMD; Locality; CSRL
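The abstract's key idea, exploiting locality so that SpMV can replace irregular gathers of x with contiguous, SIMD-friendly loads, can be illustrated with a plain-Python sketch. The function names and the (row, start_col, segment) block layout below are illustrative assumptions for a CSRL-like format, not the paper's actual data structure:

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Baseline CSR SpMV: y = A @ x.

    Each product values[j] * x[col_idx[j]] is an indirect gather through
    col_idx; this irregularity is what hinders direct SIMD vectorization.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[j] * x[col_idx[j]]
        y[i] = acc
    return y


def csr_to_csrl(values, col_idx, row_ptr):
    """Group runs of consecutive columns in a row into (row, start_col, segment) blocks.

    Storing one starting column index plus a run of values (instead of one
    index per nonzero) shrinks the index array and lets a kernel read
    x[start_col : start_col + len(segment)] contiguously.
    """
    blocks = []
    n_rows = len(row_ptr) - 1
    for i in range(n_rows):
        j = row_ptr[i]
        while j < row_ptr[i + 1]:
            start = j
            # Extend the run while column indices stay consecutive.
            while j + 1 < row_ptr[i + 1] and col_idx[j + 1] == col_idx[j] + 1:
                j += 1
            j += 1
            blocks.append((i, col_idx[start], values[start:j]))
    return blocks


def spmv_csrl(blocks, n_rows, x):
    """SpMV over the block layout; each segment pairs with a contiguous x slice."""
    y = [0.0] * n_rows
    for row, start_col, seg in blocks:
        # A real kernel would process this contiguous slice with SIMD loads.
        y[row] += sum(v * xv for v, xv in zip(seg, x[start_col:start_col + len(seg)]))
    return y
```

For example, a 3x4 matrix whose first row holds nonzeros in columns 0-2 and whose third row holds nonzeros in columns 1-2 has 6 nonzeros but only 3 consecutive runs, so the block layout stores 3 index pairs instead of 6 column indices, which is the memory-access reduction the abstract refers to.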
Received: 2014-02-14
Funding: Supported by the National Natural Science Foundation of China (61170075, 91130023), the National 973 Program of China (2011CB309701), and the National 863 Program of China (2012AA010903).
Cite this article:
Liu Fangfang, Yang Chao. A NEW SPARSE MATRIX STORAGE FORMAT FOR IMPROVING SPMV PERFORMANCE BY SIMD[J]. Journal of Numerical Methods and Computer Applications, 2014, 35(4): 269-276.
 