
Exploring the thread-level parallelisms for the next generation geophysical fluid modelling framework Fluidity-ICOM
[2013.5.14 4:00pm, Z301]


 2013-5-13 

  Colloquia & Seminars 

  Speaker

      

   Prof. Guo Xiaohu, Scientific Computing Department, Science and Technology Facilities Council, Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire

  Title

  

  Exploring the thread-level parallelisms for the next generation geophysical fluid modelling framework Fluidity-ICOM             

 

  Time

  2013.5.14 4:00pm                         

  Venue

  Z301

  Abstract

   

The major challenges caused by the increasing scale and complexity of the current petascale and the future exascale systems are cross-cutting concerns of the whole software ecosystem. The trend for compute nodes is towards greater numbers of lower power cores, with a decreasing memory to core ratio. This is imposing a strong evolutionary pressure on numerical algorithms and software to efficiently utilise the available memory and network bandwidth.

Unstructured finite element codes have long been parallelised effectively using domain decomposition methods, implemented using libraries such as the Message Passing Interface (MPI). However, there are many algorithmic and implementation optimisation opportunities when threading is used for intra-node parallelisation on the latest multi-core/many-core platforms. The benefits include reduced memory requirements, cache sharing, fewer partitions and less MPI communication. While OpenMP is promoted as being easy to use and as allowing incremental parallelisation of codes, naïve implementations frequently yield poor performance. In practice, as with MPI, the same care and attention should be exercised over algorithm and hardware details when programming with OpenMP.

In this talk, we highlight our progress in implementing a hybrid OpenMP-MPI version of the unstructured finite element application Fluidity-ICOM. We demonstrate that utilising non-blocking algorithms and libraries is critical for the mixed-mode application to achieve better parallel performance than the pure MPI version. In the matrix assembly kernels, the OpenMP parallel algorithm utilises graph colouring to identify independent sets of elements that can be assembled simultaneously with no race conditions. TCMalloc is used here to tackle performance issues arising from the memory allocation of automatic arrays. The sparse linear systems defined by the various equations are solved using threaded PETSc, and HYPRE is utilised as a threaded preconditioner through the PETSc interface. With explicit communication overlap using task-based parallelism, a significant speedup over the pure-MPI mode and efficient strong scaling have been achieved for the PETSc sparse matrix-vector multiplication kernels. Since unstructured finite element codes are well known to be memory bound, particular attention has to be paid to ccNUMA architectures, where data locality is particularly important for achieving good intra-node scaling characteristics. With mixed-mode MPI/OpenMP, Fluidity-ICOM can now run jobs on well over 32K cores.
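
The graph colouring idea mentioned for the matrix assembly kernels can be illustrated with a minimal sketch. It is written in Python for readability and is not the Fluidity-ICOM implementation. The function names (`colour_elements`, `group_by_colour`) and the four-triangle mesh are hypothetical, and a simple greedy colouring is assumed. Two elements conflict when they share a mesh node, because both would add into the same matrix rows. Elements with the same colour share no node, so each colour group could be assembled by concurrent threads without race conditions.

```python
from collections import defaultdict


def colour_elements(elements):
    """Greedy colouring: elements that share a node receive different colours.

    `elements` is a list of tuples of node indices (hypothetical mesh format).
    Returns a list giving the colour index of each element.
    """
    # Map each node to the elements that touch it.
    node_to_elements = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elements[n].append(e)

    colours = [-1] * len(elements)  # -1 means "not yet coloured"
    for e, nodes in enumerate(elements):
        # Colours already taken by neighbouring (node-sharing) elements.
        used = {
            colours[nb]
            for n in nodes
            for nb in node_to_elements[n]
            if colours[nb] >= 0
        }
        c = 0
        while c in used:
            c += 1
        colours[e] = c
    return colours


def group_by_colour(colours):
    """Return one list of element indices per colour, in colour order."""
    groups = defaultdict(list)
    for e, c in enumerate(colours):
        groups[c].append(e)
    return [groups[c] for c in sorted(groups)]


# Hypothetical mesh: a strip of four triangles over nodes 0..5.
elements = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
colours = colour_elements(elements)
groups = group_by_colour(colours)

# Groups are processed one after another; the elements inside a group
# are independent, so a threaded loop over them needs no locking.
for colour, group in enumerate(groups):
    print(f"colour {colour}: elements {group}")
```

In a threaded code, the outer loop over colours would stay sequential while the loop over the elements of one colour is the part distributed across threads. Greedy colouring does not guarantee the minimum number of colours, but it is cheap to compute.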

  Affiliation

 

 

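Welcome to the National Center for Mathematics and Interdisciplinary Sciences
Address: No. 55 Zhongguancun East Road, Haidian District, Beijing  Postal code: 100190  Tel: 86-10-62613242  Fax: 86-10-62616840  Email: ncmis@amss.ac.cn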