A thread-block-wise computational framework for large-scale hierarchical continuum-discrete modeling of granular media

Shiwei Zhao, Jidong Zhao, Weijian Liang

Research output: Journal article publication > Journal article > Academic research > peer-review

17 Citations (Scopus)


This article presents a novel, scalable parallel computing framework for large-scale, multiscale simulations of granular media. Key to the framework is a thread-block-wise representative volume element (RVE) parallelism, inspired by the resemblance between a typical multiscale computational hierarchy and the hierarchical thread structure of graphics processing units (GPUs). To solve a hierarchical multiscale problem, all computation in an RVE is assigned to a single block of threads, so that the RVE runs entirely on the GPU and avoids frequent data exchange with the host CPU. The thread blocks can meanwhile run asynchronously, which implicitly guarantees the independence of inter-RVE computation that characterizes the hierarchical multiscale structure. The parallel computing algorithms are formulated and implemented in an in-house code, GoDEM, using GPU-specific techniques such as coalesced memory access, shared memory utilization, and unified memory. Benchmark and performance tests are conducted against an open-source CPU-based discrete element method (DEM) code under three typical loading conditions. The performance of GoDEM is examined with varying thread-block size, GPU register pressure, and number of RVEs. The results reveal that increasing GPU occupancy by decreasing register pressure significantly degrades performance rather than improving it. We further demonstrate that the proposed GPU parallelism framework may achieve a saturated speedup of approximately 350 over the single-CPU-core code. To demonstrate its application to multiscale modeling of granular media, the material point method (MPM) is coupled with DEM powered by the new framework to simulate a typical engineering-scale problem involving tens of millions of particles in total. The simulation shows that the proposed framework achieves a speedup of approximately 91 over a similar CPU program running on a cluster node with 44 parallel threads. The study offers a viable solution for future large-scale and multiscale modeling of granular media.
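The one-RVE-per-thread-block mapping described in the abstract can be illustrated with a small serial emulation. This is only a sketch under stated assumptions, not GoDEM code: `BLOCK_SIZE`, `rve_kernel`, and `launch` are hypothetical names, each "particle" is reduced to a single number, and the per-particle work is a plain sum standing in for the real contact computation. It shows two ideas from the text: a block index selects the RVE while a thread index strides over that RVE's particles, and blocks never read each other's data, so they may run in any order.

```python
# Toy serial emulation of a one-RVE-per-thread-block mapping.
# All names and sizes are illustrative assumptions, not taken from GoDEM.

BLOCK_SIZE = 4  # hypothetical number of threads per block


def rve_kernel(block_id, thread_id, rves, partial):
    """Work of one emulated thread: a block-stride loop over the
    particles of the RVE owned by block `block_id`."""
    particles = rves[block_id]
    acc = 0.0
    for i in range(thread_id, len(particles), BLOCK_SIZE):
        acc += particles[i]  # stand-in for a per-particle contribution
    # Each thread writes its own slot, emulating per-block scratch storage.
    partial[block_id][thread_id] = acc


def launch(rves, order=None):
    """Emulate a launch with one block per RVE.

    `order` lets the caller permute block execution to show that the
    result does not depend on which block runs first.
    """
    partial = [[0.0] * BLOCK_SIZE for _ in rves]
    block_ids = list(range(len(rves))) if order is None else order
    for block_id in block_ids:
        for thread_id in range(BLOCK_SIZE):
            rve_kernel(block_id, thread_id, rves, partial)
    # Block-level reduction: one result per RVE.
    return [sum(slots) for slots in partial]


if __name__ == "__main__":
    rves = [[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0]]
    print(launch(rves))                # [15.0, 30.0]
    print(launch(rves, order=[1, 0]))  # [15.0, 30.0]
```

On a real GPU the two loops in `launch` would be replaced by the hardware scheduler, with the per-block scratch list corresponding loosely to shared memory; the emulation only makes the indexing and the independence between RVEs explicit.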

Original language: English
Pages (from-to): 579-608
Number of pages: 30
Journal: International Journal for Numerical Methods in Engineering
Issue number: 2
Publication status: Published - 30 Jan 2021
Externally published: Yes


Keywords

  • continuum-discrete coupling
  • DEM
  • granular media
  • MPM
  • multiscale modeling
  • parallel computing

ASJC Scopus subject areas

  • Numerical Analysis
  • General Engineering
  • Applied Mathematics

