Virtual Research Cruise

In this project, we will perform virtual ship-based “observations” in an Ocean simulation dataset and compare those with equivalent observations taken in the real Ocean.

The main challenges are:

  • handling different data structures

  • unit conversions

  • conversion between different data structures

  • comparative visualisation of the virtual and the real observations.

The real-ocean observations

We use a CTD (conductivity, temperature, depth) dataset from the AL540 cruise, published on the earth-system data repository Pangaea. Its DOI is 10.1594/PANGAEA.969014, which resolves to this URL: https://doi.pangaea.de/10.1594/PANGAEA.969014

To obtain the actual raw data, we could use the Python package pangaeapy, which downloads the data belonging to a given Pangaea DOI and converts them to a Pandas data frame. The notebook 02a_virtual_research_cruise_AL540_preparation.ipynb implements this step and then processes the data to make them a little easier to use.
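As a rough illustration of the download step only (not of the clean-up done in the preparation notebook), a minimal sketch could look like the following. The function name fetch_pangaea_table is made up for this example. It assumes that pangaeapy is installed and that its PanDataSet class exposes the table as the attribute data.

```python
AL540_DOI = "10.1594/PANGAEA.969014"


def fetch_pangaea_table(doi=AL540_DOI):
    """Download a Pangaea dataset and return its table as a Pandas data frame.

    Hypothetical helper for illustration; requires network access.
    """
    # Imported here so that pangaeapy is only needed when a download is requested.
    from pangaeapy.pandataset import PanDataSet

    dataset = PanDataSet(doi)  # fetches metadata and data for the given DOI
    return dataset.data        # the data table as a pandas.DataFrame
```

Calling fetch_pangaea_table() would then return the raw, unprocessed table, whose column names may differ from those in the prepared file.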

To use the prepared data, load the file:

"/home/jovyan/shared_materials/projects_data/02_virtual_research_cruise_AL540_CTD_cleaned.csv"

If you are interested in the application of pangaeapy, have a look at 02a_virtual_research_cruise_AL540_preparation.ipynb.
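Loading the prepared CSV file named above could look like the sketch below. The helper name load_ctd is an assumption made for this example, and no column names are assumed, so the file is read with the Pandas defaults.

```python
from pathlib import Path

import pandas as pd

CTD_PATH = Path(
    "/home/jovyan/shared_materials/projects_data/"
    "02_virtual_research_cruise_AL540_CTD_cleaned.csv"
)


def load_ctd(path=CTD_PATH):
    """Read the prepared AL540 CTD table into a Pandas data frame."""
    return pd.read_csv(path)


# Only attempt to read the file where it actually exists (e.g. on the course hub).
if CTD_PATH.exists():
    ctd = load_ctd()
    print(ctd.columns.tolist())  # which columns are there?
    print(ctd.head())            # first few rows
```

Depending on how the time column is stored, it may still need to be converted with pd.to_datetime before it can be compared with the time axis of the simulation.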

The Ocean simulation dataset

We use data representing the Baltic Sea which is distributed by the Copernicus Marine Service. All information about the data can be found here: https://doi.org/10.48670/moi-00013

To get access to this dataset, we will use the copernicusmarine toolbox. To access the daily data, we use the dataset_id "cmems_mod_bal_phy_my_P1D-m".

We can obtain an Xarray dataset with the data as follows:

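import copernicusmarine

# Log in with your Copernicus Marine credentials (see the registration note below).
copernicusmarine.login()

# Open the dataset as an Xarray dataset; data values are only transferred when needed.
data_set = copernicusmarine.open_dataset(
    dataset_id="...",  # the dataset_id given above
    service="arco-time-series",
)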

To be able to use the Copernicus data, you need an account with the Copernicus Marine Service. Registration takes a minute and can be done here: https://data.marine.copernicus.eu/register
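Once a dataset is open, a quick overview of its variables helps to find the ones holding temperature and salinity. The following is a small sketch under assumptions: describe_variables is a hypothetical helper, and the tiny dataset built here only stands in for the real one, whose variable names and attributes will differ.

```python
import numpy as np
import xarray as xr


def describe_variables(ds):
    """Map each data variable to its dimensions, long_name and units (if present)."""
    return {
        name: {
            "dims": var.dims,
            "long_name": var.attrs.get("long_name"),
            "units": var.attrs.get("units"),
        }
        for name, var in ds.data_vars.items()
    }


# Tiny synthetic stand-in dataset (names and attributes are made up).
demo = xr.Dataset(
    {
        "demo_var": (
            ("time", "depth"),
            np.zeros((2, 3)),
            {"long_name": "Demo quantity", "units": "1"},
        )
    },
    coords={"time": [0, 1], "depth": [1.0, 5.0, 10.0]},
)

for name, info in describe_variables(demo).items():
    print(name, info)
```

The same helper can be applied to data_set from above, or to the dataset returned by xr.open_dataset for a local netCDF file.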

Alternative: Pre-downloaded ocean simulation dataset

If you experience long timeouts when working with the remote Copernicus data directly, you can instead load a pre-downloaded netCDF file. It contains the data cropped to the part of the Baltic Sea covered by AL540 and to a time period around the cruise, and is available in:

"/home/jovyan/shared_materials/projects_data/02_virtual_research_cruise_cmems_mod_bal_phy_my_P1D-m_cropped.nc"

Tasks

  1. Inspect the data frame with the real-ocean observations from AL540. Find the columns for horizontal location (longitude and latitude), depth, time, temperature, and salinity. What does the “Event” column refer to?

  2. Visualize one temperature profile from the AL540 dataset.

  3. Inspect the Ocean simulation dataset. What are the variables containing temperature and salinity?

  4. Select a horizontal location from the Ocean simulation dataset, pick a time, and visualize a temperature profile at the selected location and time.

  5. Select one Event from the AL540 dataset and try to select the corresponding temperature profile from the Ocean simulation dataset programmatically (i.e., no manual entry of the location and time is allowed!).

  6. Now, create the virtual research cruise observations: Select temperature profiles for all Events in the AL540 dataset. (There are many ways to do this. One possibility is to build a workflow using techniques from the “Program Flow” section of the course. Another is based on converting a processed version of the AL540 data frame to Xarray and then using Xarray’s selection with data arrays.)

  7. Think about how to turn the profile data selected from the Ocean simulation data into a structure that makes it as easy as possible to relate the virtual observations to the real-ocean ones. You may also want to include salinity data.

  8. How can you best compare the virtual and the real observations?
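The second approach mentioned in task 6, selection with data arrays, can be sketched on synthetic data as follows. All names here are assumptions made for illustration: the variable and dimension names of the stand-in model, the station columns Latitude, Longitude and Time, and the Event labels. The real datasets will use different names and may need some processing first, such as reducing the CTD table to one row per Event.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the simulation output (all names are assumptions).
rng = np.random.default_rng(0)
model = xr.Dataset(
    {
        "temperature": (
            ("time", "depth", "latitude", "longitude"),
            rng.normal(10.0, 2.0, size=(5, 4, 5, 9)),
        )
    },
    coords={
        "time": pd.date_range("2020-07-01", periods=5, freq="D"),
        "depth": [1.0, 5.0, 10.0, 20.0],
        "latitude": np.linspace(54.0, 56.0, 5),
        "longitude": np.linspace(10.0, 14.0, 9),
    },
)

# Synthetic stand-in for a processed station table with one row per Event.
stations = pd.DataFrame(
    {
        "Event": ["ST001", "ST002", "ST003"],
        "Latitude": [54.4, 55.1, 55.9],
        "Longitude": [10.3, 12.2, 13.6],
        "Time": pd.to_datetime(
            ["2020-07-02 08:15", "2020-07-03 14:40", "2020-07-04 21:05"]
        ),
    }
).set_index("Event")

# Convert to Xarray: every column becomes a data array along the "Event" dimension.
stations_xr = stations.to_xarray()

# Because all three indexers share the "Event" dimension, Xarray selects one
# (latitude, longitude, time) triple per Event instead of the outer product.
virtual = model["temperature"].sel(
    latitude=stations_xr["Latitude"],
    longitude=stations_xr["Longitude"],
    time=stations_xr["Time"],
    method="nearest",
)

# Long-format table with one row per Event and depth level.
virtual_df = virtual.to_dataframe().reset_index()
print(virtual_df.head())
```

The result has one profile per Event, and the long-format table carries the matched model coordinates as columns. That is one possible starting point for relating the virtual observations to the real ones, for instance by Event and depth. Nearest-neighbour matching is only one choice; interpolating to the station positions would be another.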