Research Topics

Accelerators for High Performance Computing

I am delighted to work with a great team in the “Accelerators for High Performance Computing” group at the Computer Sciences Department of BSC.

The work done on accelerators such as FPGAs and GPUs led to the creation of the Accelerators for High Performance Computing group at the Computer Sciences Department at BSC, managed by Nacho Navarro, co-PI (under Prof. Mateo Valero) of the NVIDIA CUDA Center of Excellence (CCOE). The main goals of our current work include:

  • Providing a common set of system abstractions to ease development on GPU-based systems across different system topologies. Studying NUMA effects in multi-GPU computations using the peer-memory access capability of Fermi/Kepler GPUs.
  • Extending the GPGPU-Sim simulator to perform a TLB design-space exploration in the context of GPUs. This enables the study of memory page access patterns across GPU cores with the aim of improving kernel performance (e.g., new thread-block scheduling policies).
  • Exploring novel ways to implement merge sort on single-node multi-GPU systems. Currently working on mechanisms and policies for scheduling multiprogrammed workloads on GPUs.
  • Implementing iterative solvers for sparse linear systems in CUDA, and comparing performance across different sparse matrix formats/representations for these solvers.

IMPACT Research Group at University of Illinois

Current Projects

AMGE and GMAC/CUDA for Multi-GPU Nodes

AMGE is a framework that eases the development of CUDA applications and tools while achieving performance similar to or better than hand-tuned code. The new features implemented in AMGE allow programmers to further fine-tune their code and remove some limitations of the original GMAC library. For example, memory objects can now be mapped across several devices without restrictions, and a host thread can launch kernels on any GPU in the system. Moreover, AMGE transparently takes advantage of new hardware features such as GPUDirect 2 peer-to-peer communication.

CUDA provides an easy-to-use and versatile development environment for GPU programming. The evolution of the Tesla and Kepler cards now allows several GPUs (typically 2 or 4) to be smoothly integrated in a single compute node. This overcomes the memory capacity limitation of a single GPU and, to a certain extent, avoids having to use the Message Passing Interface (MPI) to distribute applications among several compute nodes. Exploiting these configurations efficiently is hard for non-expert parallel programmers, such as physicists, biologists, or mathematicians. Multi-GPU programming is a hot topic because most of the problems researchers are trying to solve with CUDA require large GPU memory capacities, and the hardware is now ready for multi-GPU programming within a single node. This project will provide users with a virtual multi-GPU environment giving them access to a large number of GPUs from their local machines.
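The peer-to-peer path mentioned above (GPUDirect 2) can be sketched with the CUDA runtime API. This is a hedged, minimal example of how a direct GPU-to-GPU copy is set up, not AMGE's actual internals; it assumes two peer-capable GPUs in the node, and error handling is omitted for brevity.

```c
#include <cuda_runtime.h>
#include <stdio.h>

/* Sketch: copy a buffer directly between two GPUs using peer access
 * (GPUDirect 2), without staging through host memory. Assumes devices
 * 0 and 1 are peer-capable; real code must check every return value. */
int main(void)
{
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, 0, 1);
    if (!can_access) {
        printf("devices 0 and 1 are not peer-capable\n");
        return 0;
    }

    size_t bytes = 1 << 20;
    float *src, *dst;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  /* device 0 may access device 1 */
    cudaMalloc((void **)&src, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);  /* device 1 may access device 0 */
    cudaMalloc((void **)&dst, bytes);

    /* Direct device-to-device copy over the PCIe interconnect. */
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```

A framework like AMGE hides this boilerplate: the programmer sees one memory object, and the runtime decides when a peer copy (or a mapped peer access from a kernel) is the right mechanism.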

  • Researchers: Javier Cabezas

BSC/UPC GPU Center of Excellence

The Barcelona Supercomputing Center (BSC), associated with the Universitat Politecnica de Catalunya (UPC) in Barcelona, Spain, has been an NVIDIA CUDA Center of Excellence (CCOE) since 2011. Since 2005, BSC has been the National Supercomputing Facility in Spain and actively participates in the main HPC projects in Europe. Close to 400 researchers and staff from more than 30 different countries work at BSC. Our research is focused on three main fields: Computer Sciences, Life Sciences and Earth Sciences.

  • Director: Mateo Valero, BSC Director
  • Managing Director: Nacho Navarro

Master Program one-semester courses at UPC

The GCOE award has enhanced the educational program at the Universitat Politecnica de Catalunya. We are directly involved in teaching CUDA, OpenCL, and MPI/OpenMP variants such as OmpSs using GPUs in the following courses at the Computer Science School (FIB):

  • Parallelism (PAR) (Eduard Ayguade)
  • Graphic Cards and Accelerators (TGA) (Agustin Fernandez)
  • Parallel Programming Tools & Models (PPTM-MIRI) (Jesus Labarta)
  • Concurrence, Parallelism and Distributed Systems (CPDS-MIRI) (Eduard Ayguade)
  • Parallel Programming Models and Algorithms (34325) (Josep-Ramon Herrero)
  • Operating Systems for New Architectures (34328) (Nacho Navarro)
  • Execution Environments for Parallel Architectures (34329) (Marisa Gil, Xavier Martorell)
  • Applied CUDA Programming (SEM-26) (Isaac Gelado)


Yes, we do have open positions for outstanding students looking for a great PhD in Barcelona!