Forming valid ERDDAP requests

Submitted by pwadmin on Thu, 04/05/2018 - 08:41
 

This example demonstrates requesting information about a dataset to help make valid data requests. Datasets vary in structure, and this example aims to give users tools for making requests from ERDDAP. We will first request information about the dataset, then find specific information about its time, latitude, longitude and altitude coverage, and use these to make a data request.

In this example we download a netCDF file, but the same workflow could be used for image requests or any other format that ERDDAP provides. This code is presented as an interactive Python notebook. You can access the notebook directly in our examples repository.
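For orientation, an ERDDAP griddap data request URL is the server's griddap endpoint, a dataset ID, a file-format suffix, and a query that names a variable followed by one bracketed `[(start):stride:(stop)]` selector per dimension. Here is a hand-built sketch of that anatomy, using the dataset and illustrative coordinate values that appear later in this example:

```python
# Sketch of ERDDAP griddap URL anatomy (values are illustrative)
base_url = 'https://polarwatch.noaa.gov/erddap/griddap/'
dataset_id = 'jplMURSST41'            # the dataset used below
variable = 'analysed_sst'

# One [(start):stride:(stop)] selector per dimension, in dataset order
time_q = '[(2018-04-03T09:00:00Z):1:(2018-04-03T09:00:00Z)]'
lat_q = '[(-89.99):1:(89.99)]'
lon_q = '[(-179.99):1:(180.0)]'

request_url = base_url + dataset_id + '.nc?' + variable + time_q + lat_q + lon_q
print(request_url)
```

The rest of this notebook builds exactly these pieces, but from the dataset's own metadata instead of hand-typed values.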

In [208]:
import urllib3
import json
import certifi
from netCDF4 import Dataset as netcdf_dataset
import math
In [209]:
# A function that fetches metadata from an ERDDAP server
# Provides details about the dataset that we can use to
# formulate our data request. We do this to determine the available
# extent (time, lat and lon) of the dataset which prevents us from making 
# unreasonable data requests that the server will reject

#Input: ERDDAP ID
#Output: Info about dataset for checking extents and forming data query

def getDatasetInfo(datasetId):

    http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())

    metadataUrl = 'https://polarwatch.noaa.gov/erddap/info/'+datasetId+'/index.json'

    try:
        response = http.request('GET', metadataUrl)
    except urllib3.exceptions.HTTPError as e:
        print('Failed to reach the ERDDAP server for: ' + metadataUrl)
        print('Reason: ', e)
        raise

    # load information about the dataset
    datasetMeta = json.loads(response.data.decode('utf-8'))

    return datasetMeta
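The parsing functions below all index into `datasetInfo['table']['rows']`, so it helps to see the shape of what the info service returns: a JSON object whose `table.rows` entries are five-element lists of row type, variable name, attribute name, data type, and value. This is an abbreviated, hypothetical sample of such a response (real responses contain many more rows):

```python
import json

# Abbreviated, hypothetical sample of ERDDAP's info/<datasetId>/index.json
sample = '''{
  "table": {
    "columnNames": ["Row Type", "Variable Name", "Attribute Name", "Data Type", "Value"],
    "rows": [
      ["dimension", "latitude", "", "double",
       "nValues=17999, evenlySpaced=true, averageSpacing=0.01"],
      ["attribute", "NC_GLOBAL", "time_coverage_end", "String", "2018-04-03T09:00:00Z"],
      ["variable", "analysed_sst", "", "double", ""]
    ]
  }
}'''

datasetMeta = json.loads(sample)
# Each row's first two elements identify what the row describes
for row in datasetMeta['table']['rows']:
    print(row[0], row[1])
```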
In [210]:
# A function that specifically pulls out dimension info from the 
# broader metadata info returned from the ERDDAP server

def getDimensions(datasetInfo):

    dimensions = []
    for x in datasetInfo['table']['rows']:
        
        # pull summary info from the dimension row
        if x[0] == 'dimension':
            dimension = {}
            dimension["name"] = x[1]
            dh = x[4].split(',')
            for dhi in dh:
                dhi_split = dhi.split('=')
                dhi_key = dhi_split[0].lstrip()
                dimension[dhi_key] = dhi_split[1]
            dimensions.append(dimension)
    
    # Get dimension details
    for x in datasetInfo['table']['rows']:
        for dimension in dimensions:
            if x[0] == 'attribute' and x[1] == dimension["name"]:
                dimensionFieldValue=x[4].split(',')
                if len(dimensionFieldValue) > 1:
                    dimension[x[2]] = dimensionFieldValue
                elif len(dimensionFieldValue) == 1:
                    dimension[x[2]] = dimensionFieldValue[0]
                else:
                    pass

    return dimensions
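The `Value` field of a dimension row packs several comma-separated `key=value` pairs into one string, which is what the loop above unpacks. A minimal standalone illustration of that parsing, on one hypothetical row:

```python
# Hypothetical 'dimension' row from an ERDDAP info response (illustrative values)
row = ['dimension', 'latitude', '', 'double',
       'nValues=17999, evenlySpaced=true, averageSpacing=0.01']

dimension = {'name': row[1]}
# Split the packed summary string into key/value pairs
for part in row[4].split(','):
    key, value = part.split('=')
    dimension[key.lstrip()] = value   # lstrip removes the space after each comma
print(dimension)
```

Note that every value stays a string; later code converts `nValues` and `averageSpacing` with `float()` as needed.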
In [211]:
# A function that specifically pulls out global attribute info from the 
# broader metadata info returned from the ERDDAP server
# input is ERDDAP dataset info as json object
# In this example we use this to get the time_coverage_end for our time query

def getGlobalAttribute(datasetInfo, attributeName):
    for x in datasetInfo['table']['rows']:
        if x[0] == 'attribute' and x[1] == 'NC_GLOBAL':
            if x[2] == attributeName:
                attributeValue = x[4]
    return attributeValue
In [212]:
# A function that specifically pulls out parameter info from the 
# broader metadata info returned from the ERDDAP server
# input is ERDDAP dataset info as json object
# output is a list of parameters (variables) for the dataset

def getParameters(datasetInfo):
    parameters = []
    for x in datasetInfo['table']['rows']:
        if x[0] == 'variable':
            parameters.append(x[1])
    return parameters
In [213]:
# Here are two different datasets from the PolarWatch ERDDAP
# You can uncomment the second option to see how the url requests for 
# each of these datasets are formed differently

datasetId = 'jplMURSST41'
#datasetId = 'erdTAssh1day_LonPM180'

# Get information about this dataset from ERDDAP
# Info is returned as an object that we can parse for relevant info
datasetInfo = getDatasetInfo(datasetId)
In [214]:
# Use the information returned by ERDDAP to piece together a valid time query
# Here we look for the latest available timestamp 
# and make the corresponding query string

latestTime = getGlobalAttribute(datasetInfo, 'time_coverage_end')
print(latestTime)

timeQuery = '[(%s):%s:(%s)]' % (latestTime,'1',latestTime)
print(timeQuery)
 
2018-04-03T09:00:00Z
[(2018-04-03T09:00:00Z):1:(2018-04-03T09:00:00Z)]
In [215]:
# Here we use the dimension info returned by ERDDAP to form
# the altitude, latitude and longitude queries

datasetDimensions = getDimensions(datasetInfo)

altQuery = 0

# Search through the dimension list for the ones we are interested in
for dimension in datasetDimensions:
    
    if dimension['standard_name'] == 'altitude':
        # create a default altitude; for satellite data we expect a single altitude, if one is specified at all
        defaultAlt = dimension['actual_range'][0].strip()
        altQuery = '[(%s):%s:(%s)]' % (defaultAlt,'1',defaultAlt)
    
    if dimension['standard_name'] == 'latitude':

        # To make sure we aren't requesting too large a file, we can check the number of
        # data points along this dimension and reduce our request if needed.
        # Here I reduce it to no more than 500 values
        if float(dimension['nValues']) > 500:
            spacing = math.floor(float(dimension['nValues'])/500)
        else:
            spacing = 1
            
        # Latitude values can be either increasing or decreasing depending on the dataset.
        # We can use the sign of averageSpacing to determine which way this dataset is set up
        # and use that to form our query string appropriately.
        # Here we query the full spatial extent of the data
            
        if float(dimension['averageSpacing']) >= 0:
            startLat = dimension['actual_range'][0].strip()
            endLat = dimension['actual_range'][1].strip()
            
        else:
            startLat = dimension['actual_range'][1].strip()
            endLat = dimension['actual_range'][0].strip()
        
        latQuery = '[(%s):%d:(%s)]' % (startLat, spacing, endLat)
            
    if dimension['standard_name'] == 'longitude':
        # To make sure we aren't requesting too large a file, we can check the number of
        # data points along this dimension and reduce our request if needed.
        # Here I reduce it to no more than 500 values
        if float(dimension['nValues']) > 500:
            spacing = math.floor(float(dimension['nValues'])/500)
        else:
            spacing = 1
        startLon = dimension['actual_range'][0].strip()
        endLon = dimension['actual_range'][1].strip()
        lonQuery = '[(%s):%s:(%s)]' % (startLon, spacing, endLon)

if altQuery != 0:
    print('This dataset has an altitude dimension')
    
print(latQuery)
print(lonQuery)
 
[(-89.99):35:(89.99)]
[(-179.99):72:(180.0)]
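The cell above makes two checks per spatial dimension: it picks a stride so that no more than 500 points come back along an axis, and it orders the range endpoints to match the dataset's storage direction. A minimal sketch of both checks, using hypothetical values matching the jplMURSST41 latitude dimension:

```python
import math

# 1) Choose a stride so the request returns at most 500 points along an axis
n_values = 17999                      # hypothetical nValues for latitude
stride = math.floor(n_values / 500) if n_values > 500 else 1
print(stride)                         # 35

# 2) Order the endpoints by the sign of averageSpacing
actual_range = ['-89.99', ' 89.99']   # as parsed from the info response
average_spacing = 0.01                # positive: values increase
if average_spacing >= 0:
    start, end = (v.strip() for v in actual_range)
else:
    end, start = (v.strip() for v in actual_range)
print('[(%s):%d:(%s)]' % (start, stride, end))
```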
In [216]:
# Get a variable list for this dataset
# For this demo we use the first variable in the list

datasetParameters = getParameters(datasetInfo)

param = datasetParameters[0]

print(param)
 
analysed_sst
In [ ]:
# Put all our query strings together and form our request url

if altQuery != 0:
    query = param + timeQuery + altQuery + latQuery + lonQuery
else:
    query = param + timeQuery + latQuery + lonQuery
    
base_url = 'https://polarwatch.noaa.gov/erddap/griddap/'+ datasetId +'.nc?'

requestUrl = base_url + query

print(requestUrl)
In [217]:
# Request the data from ERDDAP

# Filename, location to store the requested data
file = 'dataset.nc'

http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
r = http.request('GET', requestUrl, preload_content=False)

with open(file, 'wb') as out:
    while True:
        data = r.read(1024*1024)
        if not data:
            break
        out.write(data)

r.release_conn()

#netcdf data object
dataset = netcdf_dataset(file)

# Now we have a netCDF dataset that we can work with in Python

 
https://polarwatch.noaa.gov/erddap/griddap/jplMURSST41.nc?analysed_sst[(2018-04-03T09:00:00Z):1:(2018-04-03T09:00:00Z)][(-89.99):35:(89.99)][(-179.99):72:(180.0)]