3.5 years of ESCAPE at JIVE - what's in it for our community?
by Marjolein Verkouter, Des Small, Mark Kettenis, Aard Keimpema (JIVE, the Netherlands)
Over the past 3.5 years JIVE staff have developed several software components to integrate European VLBI Network (EVN) data, and the software to calibrate and analyse it, into the European Open Science Cloud (EOSC). All of this can be found at https://jupyterhub.jive.eu. This was made possible by JIVE being a partner in the H2020 European Science Cluster of Astronomy and Particle Physics ESFRI Research Infrastructures (ESCAPE) project, a collaboration of the largest European astronomy and particle physics ESFRI landmarks (CTA, KM3NeT, SKA) and pan-European research infrastructures such as CERN, ESO, and JIVE.
The EOSC is part of the European Commission's vision for open science using FAIR data standards in a distributed Europe-wide framework. What this means is that data from European Research Infrastructures must be Findable, Accessible, Interoperable and Reusable. Within the EOSC, the software to process that data must also be openly available and guaranteed to meet certain minimum quality standards. This ensures that science results obtained with it are valid, and that others can verify claims laid down in publications by repeating the analysis or, ultimately, improve on it or extract new and different science results.
To address these ambitious goals, the 15 M€ ESCAPE project consisted of seven work packages, of which JIVE participated in three:
- Open-source scientific Software and Service Registry (OSSR);
- Connecting ESFRI projects to EOSC through VO framework (CEVO);
- ESFRI Science Analysis Platform (ESAP).
The output of OSSR is a curated list of known-good open-source software from the ESFRI partners. During the project the software and metadata requirements, guidelines, and an on-boarding procedure were developed. An essential part of the OSSR work package was to allow partners to improve their analysis software and to work towards providing containerised images, documentation, and an introductory talk on the partner's OSSR contribution: a presentation explaining and demonstrating use of the software.
For JIVE, OSSR meant two things: the ability to improve several VLBI-related tasks in CASA (fringefit, importfitsidi, accor) and to onboard a JupyterCASA kernel with those updated tools into the OSSR. It is now possible for users to do Jupyter-notebook-based (VLBI) radio data processing using that kernel.
As the name suggests, the CEVO work package targets making ESFRI data findable through Virtual Observatory (VO) protocols. For the visibility data in the EVN Archive at JIVE this meant helping define the VO standard(s) to describe visibility data. ESCAPE funding allowed JIVE staff to participate in the International Virtual Observatory Alliance (IVOA)'s Radio Special Interest Group, working with experts from around the world to widen the ObsCore definition for this purpose.
To test and experiment with this proposed visibility data metadata extension, a test installation of an ObsTAP VO service was purchased and installed at JIVE. A script to extract and compute the necessary metadata from the FITS files in the EVN Archive was written and run over the entire EVN Archive. Now that the EVN ObsTAP service has been published in the VO, the entire EVN Archive can easily be queried using the VO's query language, ADQL, from Python or a web form, or using a VO-aware tool such as Aladin, and the results can easily be combined with queries across other VO-enabled archives. The VO-based query mechanism is at least an order of magnitude faster than fitsfinder.php, the VO precursor script currently in use on the EVN Archive. It is expected that fitsfinder.php will be replaced by a VO-query powered form. Included in the VO search results are DataLink entries to the raw FITS files (pointing to the current archive location) as well as the a-priori amplitude calibration (ANTAB format) and FLAG (UVFLG) tables.
To increase findability and citability of EVN data sets, JIVE has set up internal infrastructure to allow Digital Object Identifiers (DOIs) to be published for data sets in the EVN Archive. Pending final verification, these should become available in the near future for citing the use of EVN data in publications.
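As an illustration of what querying an ObsTAP service with ADQL from Python can look like, here is a minimal sketch using only the standard library. It builds an ADQL query against the standard `ivoa.obscore` table and prepares a synchronous TAP request. Several things are assumptions and not taken from the EVN service itself: the endpoint `https://example.org/tap` is a placeholder, the helper names are hypothetical, and matching the experiment code against the ObsCore `obs_id` column is a guess at how the archive maps its metadata.

```python
from urllib.parse import urlencode

# Placeholder only: the article does not state the TAP endpoint of the EVN service.
TAP_BASE_URL = "https://example.org/tap"


def adql_for_experiment(experiment_code: str, limit: int = 100) -> str:
    """Build an ADQL query selecting ObsCore rows for one experiment code.

    Assumption: the experiment code is stored in the ObsCore `obs_id` column.
    """
    # Double any single quote so the value stays inside the ADQL string literal.
    safe_code = experiment_code.replace("'", "''")
    return (
        f"SELECT TOP {int(limit)} obs_id, target_name, s_ra, s_dec, access_url "
        "FROM ivoa.obscore "
        f"WHERE obs_id = '{safe_code}'"
    )


def tap_sync_request(base_url: str, adql: str) -> tuple[str, bytes]:
    """Return the URL and form-encoded body for a synchronous TAP query."""
    body = urlencode({"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql})
    return base_url.rstrip("/") + "/sync", body.encode("ascii")


if __name__ == "__main__":
    query = adql_for_experiment("N22L2")
    url, body = tap_sync_request(TAP_BASE_URL, query)
    print(query)
    print(url)
```

The sketch stops short of sending the request; posting `body` to `url` (for instance with `urllib.request.urlopen`) would return the result table from a real service. A dedicated VO client library would normally be more convenient than hand-built requests.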
In and of themselves these developments are already an improvement for the user. The third work package, ESAP, integrates the work from the previous two. A science analysis platform "allows running known-good, containerised software from OSSR on data from multiple ESFRI RIs that is easily found, providing a record of the analysis and its results - provenance".
The ability to work with experts from high-energy and particle physics allowed JIVE to develop a deeply integrated platform which provides compute and storage for users near the EVN Archive: the Jupyterhub at JIVE. After login, the system spawns a Virtual Machine running the JupyterCASA kernel on the user's behalf. Within the system, notebooks can be created, copied, and eventually run to process data from e.g. the EVN Archive. Two plugins were developed to improve ease of use:
- a plugin for publishing version-controlled notebooks; it allows the user to experiment with several data reduction approaches, keeping track of branches, forks, additions and deletions in a per-user, private git repository. For citability purposes, the plugin allows a user to publish revision(s) of private notebooks in the public archive, after which they become citable;
- a VO-powered EVN Archive search plugin, allowing a quick search by experiment code (such as "N22L2") or a cone search around a position. For individual search results, a notebook with the best-known basic EVN calibration strategy at that time can be opened with a single click. The notebook is pre-configured to download the data, calibration, and flag tables using the DataLink(s) from the search results.
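To make the idea of a cone search concrete, the sketch below shows how such a search could be phrased in ADQL against an ObsCore table, using the standard `s_ra` and `s_dec` columns and the ADQL geometry functions `CONTAINS`, `POINT` and `CIRCLE`. This is not the plugin's actual code: the function name `adql_cone_search` is hypothetical, and the choice of returned columns is only an example.

```python
def adql_cone_search(ra_deg: float, dec_deg: float, radius_deg: float) -> str:
    """Build an ADQL cone search on ObsCore positions (all angles in degrees)."""
    ra, dec, radius = float(ra_deg), float(dec_deg), float(radius_deg)
    # Reject values that cannot describe a position or a search radius on the sky.
    if not 0.0 <= ra < 360.0:
        raise ValueError("right ascension must be in [0, 360) degrees")
    if not -90.0 <= dec <= 90.0:
        raise ValueError("declination must be in [-90, 90] degrees")
    if radius <= 0.0:
        raise ValueError("radius must be positive")
    # CONTAINS(...) returns 1 when the observation position lies inside the circle.
    return (
        "SELECT obs_id, target_name, s_ra, s_dec, access_url "
        "FROM ivoa.obscore "
        "WHERE 1 = CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {ra!r}, {dec!r}, {radius!r}))"
    )


if __name__ == "__main__":
    # Example position and radius chosen arbitrarily for illustration.
    print(adql_cone_search(187.7059, 12.3911, 0.1))
```

Validating the inputs before building the query keeps malformed coordinates from reaching the service, which would otherwise reject them with a less helpful error.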
JIVE's goal is to open up access to the Jupyterhub service to eduGAIN users. Before being allowed to federate with eduGAIN, the service first has to be connected to the production SURFconext (Dutch NREN Identity and Service Provider). This work has been completed; the JIVE Jupyterhub is now an official production service in SURFconext, which enables Dutch higher education institutions to give their users access to the Jupyterhub service. Work is in progress to set up the necessary, but quite different, infrastructure to allow users to log in to the Jupyterhub at JIVE using their home institute's credentials through eduGAIN.
It is our belief that by opening up an easy-to-use data processing environment, providing compute, network, and storage resources near the EVN Archive, the barrier to entry into (EVN) VLBI data reduction has been lowered significantly. Furthermore, improved citability of data and calibration/analysis workflows brings EVN data, its reduction, and publication of results more in line with 21st-century scientific practices.
ESCAPE - The European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement no. 824064.