The Workflows and Distributed Computing team at the Barcelona Supercomputing Center – Centro Nacional de Supercomputación (BSC-CNS), one of the research groups in our network, presents dislib 1.0.0 (Distributed Computing Library). This tool provides ready-to-use distributed algorithms, with a strong focus on machine learning and, more recently, on the distributed training of neural networks. Its main objective is to facilitate the execution of big data analytics workflows on distributed platforms such as clusters, clouds, and supercomputers. dislib is implemented on top of the PyCOMPSs programming model, the Python binding of COMPSs.

dislib is based on a distributed data structure, the ds-array, which allows for the parallel and distributed execution of machine learning methods. The library is implemented as a PyCOMPSs application, where methods are defined as tasks and run transparently in parallel. As a result, users can write simple Python scripts without having to manage the details of parallelization, through an interface closely aligned with scikit-learn. dislib provides methods for clustering, classification, regression, decomposition, model selection, neural network training, and data management.
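The idea behind a block-partitioned structure such as the ds-array can be illustrated with a small, purely conceptual sketch. The code below does not use dislib or PyCOMPSs and does not reflect their API. It relies only on the Python standard library, and every name in it (`split_into_blocks`, `partial_sums`, `column_means`) is hypothetical. It shows the general pattern: the data is split into blocks, one task runs per block, and the partial results are merged, so the caller never manages parallelism directly.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(rows, block_rows):
    """Partition a list of rows into consecutive blocks of at most block_rows rows."""
    return [rows[i:i + block_rows] for i in range(0, len(rows), block_rows)]


def partial_sums(block):
    """Task run once per block: per-column sums and the number of rows in the block."""
    n_cols = len(block[0])
    sums = [0.0] * n_cols
    for row in block:
        for j, value in enumerate(row):
            sums[j] += value
    return sums, len(block)


def column_means(rows, block_rows=2, workers=4):
    """Compute column means by running one task per block and merging the results."""
    blocks = split_into_blocks(rows, block_rows)
    # A thread pool stands in for a distributed runtime (an assumption made
    # for illustration only; a real runtime would schedule tasks across nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sums, blocks))
    n_cols = len(rows[0])
    total = [0.0] * n_cols
    count = 0
    for sums, n in partials:
        for j in range(n_cols):
            total[j] += sums[j]
        count += n
    return [t / count for t in total]


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
means = column_means(data)
print(means)
```

In this sketch, the five rows are split into three blocks and reduced to a single list of column means. A library built on a task-based model can apply the same pattern to far larger arrays and to full machine learning estimators.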

Since its inception, dislib has been applied in several real-world use cases, including astrophysics (DBSCAN with data from the GAIA mission), molecular dynamics workflows (Daura and PCA within the BioExcel CoE), and multiple applications in the eFlows4HPC project, such as urgent computing for natural hazards and neural networks. It has also been used in the AI-SPRINT project for personalized healthcare, specifically to detect atrial fibrillation with Random Forest models.

dislib 1.0.0 includes further refinements, updated examples, and a new user guide. The code is open source and available for download.

This is one of the technologies featured in the X4HPC Portfolio, available online.

________________________________________

The BSC’s Workflows and Distributed Computing group aims to provide tools and mechanisms that enable the transparent sharing, selection, and aggregation of a wide variety of geographically distributed computing resources. The team’s research builds on the group’s previous experience and extends it to other aspects of distributed computing that can benefit from it. The BSC team maintains a strong focus on scheduling models and on resource management and planning in distributed computing environments.
