Workshop on Big Data & Deep Learning in HPC

** NEW ** BDL2020 will be held online (synchronous and/or asynchronous)

Organized within the IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2020)

The number of very large data repositories (big data) is increasing at a rapid pace. Analyzing such repositories with "traditional" sequential implementations of ML, or with emerging techniques such as deep learning, which model high-level abstractions in data by using multiple processing layers, requires expensive computational resources and long running times. Parallel and distributed computing are approaches that can make the analysis of very large repositories and the exploration of high-level representations feasible. Parallel or distributed execution of an ML/statistical system may: i) increase its speed; ii) learn hidden representations; iii) search a larger space and reach a better solution; or iv) increase the range of applications where it can be used (because it can process more data, for example). Parallel and distributed computing is therefore of high importance for extracting knowledge from massive amounts of data and learning hidden representations.

The workshop is concerned with the exchange of experience among academics, researchers and industry whose work in big data and deep learning requires high performance computing to achieve its goals. Participants will present recently developed algorithms/systems, ongoing work and applications that take advantage of such parallel or distributed environments.

List of Topics  

All novel data-intensive computing techniques, data storage and integration schemes, and algorithms for cutting-edge high performance computing architectures that target Big Data and Deep Learning are of interest to the workshop. Examples of topics include, but are not limited to:

  • parallel algorithms for data-intensive applications;

  • scalable data and text mining and information retrieval;

  • using Hadoop, MapReduce, Spark, Storm, Streaming to analyze Big Data;

  • energy-efficient data-intensive computing;

  • deep-learning with massive-scale datasets;

  • querying and visualization of large network datasets;

  • processing large-scale datasets on clusters of multicore and manycore processors, and accelerators;

  • heterogeneous computing for Big Data architectures;

  • Big Data in the Cloud;

  • processing and analyzing high-resolution images using high-performance computing;

  • using hybrid infrastructures for Big Data analysis;

  • new algorithms for parallel/distributed execution of ML systems;

  • applications of big data and deep learning to real-life problems.




Telecom ParisTech

Polytechnic Institute of Porto

University of Porto

INESC TEC - Laboratório Associado