Module pyl4c.lib.netcdf
Utilities for working with NetCDF files.
Functions
def nc_dim(variable)
-
def nc_dim(variable):
    '''
    Returns the number of dimensions of a netCDF variable.

    Parameters
    ----------
    variable : scipy.io.netcdf.netcdf_variable

    Returns
    -------
    int
    '''
    if hasattr(variable, 'ndim'):
        return variable.ndim
    if hasattr(variable, 'dimensions'):
        return len(variable.dimensions)
Returns the number of dimensions of a netCDF variable.
Parameters
variable : scipy.io.netcdf.netcdf_variable
Returns
int
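Because `nc_dim()` relies only on duck typing, its behavior can be sketched with a stand-in object instead of a real NetCDF file; `FakeVar` below is hypothetical and exists only for illustration:

```python
class FakeVar:
    """Hypothetical stand-in for scipy.io.netcdf.netcdf_variable."""
    def __init__(self, dimensions):
        self.dimensions = dimensions

def nc_dim(variable):
    # netCDF4 variables (and NumPy arrays) expose ndim directly...
    if hasattr(variable, 'ndim'):
        return variable.ndim
    # ...otherwise, count the named dimensions
    if hasattr(variable, 'dimensions'):
        return len(variable.dimensions)

print(nc_dim(FakeVar(('time', 'lat', 'lon'))))  # → 3
```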
def netcdf_array(nc, keys, cell_size=None, time_idx=0, x_offset=-180, scale_and_offset=True)
-
def netcdf_array(
        nc, keys, cell_size = None, time_idx = 0, x_offset = -180,
        scale_and_offset = True):
    '''
    Extracts an equirectangular NetCDF data array as a NumPy array, with
    spatial reference system (SRS) information. NOTE: Both `cell_size`
    elements should be positive numbers.

    Parameters
    ----------
    nc : netCDF4.Dataset
        The NetCDF file (opened)
    keys : tuple
        A tuple of (variable, x_coords, y_coords) where variable is the
        variable name, x_coords the name of the X-coordinate array variable,
        y_coords the name of the Y-coordinate array variable
    cell_size : tuple or None
        (Optional) Tuple of the cell size in both directions in degrees,
        e.g., (1, 1); will attempt to infer the cell size if not provided
    time_idx : int
        (Optional) Index of the "time" axis to extract as a raster
        (Default: 0)
    x_offset : int or float
        (Optional) Because of the absurd X-coordinates convention used in
        some NetCDF arrays, it may be necessary to "offset" the X
        coordinates; this is only used if it is detected that the maximum
        X coordinate value is >= 180 degrees E (Default: -180)
    scale_and_offset : bool
        (Optional) True to apply the scale and offset to the data, if the
        attributes "scale_factor" and "add_offset" are found (Default);
        False to do nothing

    Returns
    -------
    tuple
        Tuple of `(array, gt, wkt)`
    '''
    def infer_cell_size(x_coords, y_coords):
        assert x_coords.ndim == 1,\
            "Can't infer cell size from coordinate arrays with more than one axis"
        # NOTE: Rounding coordinate differences to 4 decimal places for comparison
        x_diff = np.round(np.array(x_coords)[1:] - np.array(x_coords)[0:-1], 4)
        y_diff = np.round(np.array(y_coords)[1:] - np.array(y_coords)[0:-1], 4)
        assert np.median(x_diff) == x_diff[0] and np.median(y_diff) == y_diff[0],\
            "Array cell size is not equal in both directions!"
        return (np.abs(np.median(x_diff)), np.abs(np.median(y_diff)))

    # Original (equirectangular) spatial reference system
    wkt0 = osr.SpatialReference()
    wkt0.ImportFromEPSG(4326)
    if cell_size is not None:
        x_res, y_res = cell_size
        assert x_res > 0 and y_res > 0, 'Argument cell_size must be greater than zero'
    else:
        x_res, y_res = infer_cell_size(
            np.array(nc.variables[keys[1]][:]),
            np.array(nc.variables[keys[2]][:]))
    x0 = np.array(nc.variables[keys[1]][:]).min()
    y0 = np.array(nc.variables[keys[2]][:]).max()
    # Geotransform params: (x_min, pixel_width, 0, y_max, 0, -pixel_height)
    gt0 = (x0, x_res, 0, y0, 0, -y_res)
    # NOTE: Because some NetCDF files have a "degrees east" convention with
    #   longitude increasing monotonically from 0 to 360, we must apply an
    #   offset to the X coordinates
    start_idx = None
    if np.array(nc.variables[keys[1]][:]).max() > 180:
        # Update affine transformation
        gt0 = (x0 + x_offset, x_res, 0, y0, 0, -y_res)
        # Transform to proper longitudes
        x_coords = (nc.variables[keys[1]][:] + x_offset)
        # Get X-coordinate origin, or index of east longitude closest to
        #   zero, unless all coordinates are in a single hemisphere
        test = np.logical_and(
            x_coords > 0, x_coords <= np.abs(x_coords).min())
        if np.any(test):
            idx = np.arange(0, x_coords.shape[0])
            start_idx = int(idx[x_coords == x_coords[test]])
    # Extract a NumPy array
    if hasattr(time_idx, 'index'):
        a, b = (min(time_idx), max(time_idx))
        arr = np.array(nc.variables[keys[0]][a:(b+1),...])
    else:
        # Make sure a single time slice has a band axis
        if nc_dim(nc.variables[keys[0]]) > 2:
            arr = np.array(nc.variables[keys[0]][time_idx,...])[np.newaxis,...]
        else:
            # Some netCDF variables may have no time/ band axis (i.e., static)
            arr = np.array(nc.variables[keys[0]][:])[np.newaxis,...]
    # If Y coordinates are in ascending order, flip the array
    lats = nc.variables[keys[2]][:].tolist()
    if lats.index(max(lats)) > lats.index(min(lats)):
        arr = np.flip(arr, axis = 1)
    # If necessary, re-sort data so west-most longitude is the first column
    if start_idx is not None:
        arr = np.concatenate((arr[...,start_idx:], arr[...,:start_idx]), axis = 2)
    scale = 1
    if scale_and_offset and hasattr(nc.variables[keys[0]], 'scale_factor'):
        scale = nc.variables[keys[0]].scale_factor
    offset = 0
    if scale_and_offset and hasattr(nc.variables[keys[0]], 'add_offset'):
        offset = nc.variables[keys[0]].add_offset
    return ((arr * scale) + offset, gt0, wkt0.ExportToWkt())
Extracts an equirectangular NetCDF data array as a NumPy array, with spatial reference system (SRS) information. NOTE: Both cell_size elements should be positive numbers.
Parameters
nc : netCDF4.Dataset
- The NetCDF file (opened)
keys : tuple
- A tuple of (variable, x_coords, y_coords) where variable is the variable name, x_coords the name of the X-coordinate array variable, y_coords the name of the Y-coordinate array variable
cell_size : tuple or None
- (Optional) Tuple of the cell size in both directions in degrees, e.g., (1, 1); will attempt to infer the cell size if not provided
time_idx : int
- (Optional) Index of the "time" axis to extract as a raster (Default: 0)
x_offset : int or float
- (Optional) Because of the absurd X-coordinates convention used in some NetCDF arrays, it may be necessary to "offset" the X coordinates; this is only used if it is detected that the maximum X coordinate value is >= 180 degrees E (Default: -180)
scale_and_offset : bool
- (Optional) True to apply the scale and offset to the data, if the attributes "scale_factor" and "add_offset" are found (Default); False to do nothing
Returns
tuple
- Tuple of (array, gt, wkt)
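The cell-size inference and geotransform construction that netcdf_array() performs can be sketched with synthetic coordinate arrays (the grid values below are made up, standing in for nc.variables[keys[1]][:] and nc.variables[keys[2]][:]; infer_cell_size mirrors the nested helper in the source):

```python
import numpy as np

# Synthetic cell-center coordinates for a global 1-degree grid
x_coords = np.arange(-179.5, 180, 1.0)  # longitudes, ascending
y_coords = np.arange(89.5, -90, -1.0)   # latitudes, descending

def infer_cell_size(x_coords, y_coords):
    # Round coordinate differences to 4 decimal places for comparison
    x_diff = np.round(np.diff(x_coords), 4)
    y_diff = np.round(np.diff(y_coords), 4)
    assert np.median(x_diff) == x_diff[0] and np.median(y_diff) == y_diff[0],\
        'Array cell size is not equal in both directions!'
    return (np.abs(np.median(x_diff)), np.abs(np.median(y_diff)))

x_res, y_res = infer_cell_size(x_coords, y_coords)
# GDAL-style geotransform: (x_min, pixel_width, 0, y_max, 0, -pixel_height)
gt0 = (x_coords.min(), x_res, 0, y_coords.max(), 0, -y_res)
assert (x_res, y_res) == (1.0, 1.0)
```

Note that, like the source, this uses the minimum X coordinate and maximum Y coordinate directly as the geotransform origin.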
def netcdf_raster(nc, keys, cell_size=None, time_idx=0, subset_bbox=None, in_nodata=None, out_nodata=-9999)
-
def netcdf_raster(
        nc, keys, cell_size = None, time_idx = 0, subset_bbox = None,
        in_nodata = None, out_nodata = -9999):
    '''
    Dumps a NetCDF variable to a GeoTIFF file. NOTE: Both `cell_size`
    elements should be positive numbers.

    Parameters
    ----------
    nc : netCDF4.Dataset
        The NetCDF file (opened)
    keys : tuple
        A tuple of (variable, x_coords, y_coords) where variable is the
        variable name, x_coords the name of the X-coordinate array variable,
        y_coords the name of the Y-coordinate array variable
    cell_size : tuple or None
        (Optional) Tuple of the cell size in both directions in degrees,
        e.g., (1, 1); will attempt to infer the cell size if not provided
    time_idx : int
        (Optional) Index of the "time" axis to extract as a raster
        (Default: 0)
    subset_bbox : tuple or None
        (Optional) Subset bounding box
    in_nodata : int or float or None
        (Optional) The NoData value to ignore in the input
    out_nodata : int or float or None
        (Optional) The NoData value to set in the output (Default: -9999)

    Returns
    -------
    gdal.Dataset
    '''
    arr, gt0, wkt0 = netcdf_array(nc, keys, cell_size, time_idx)
    xmin = ymin = 0
    if subset_bbox is not None:
        x_coords = nc.variables[keys[1]][:]
        y_coords = nc.variables[keys[2]][:]
        if np.array(nc.variables[keys[1]][:]).max() > 180:
            # Correct for weird NetCDF easting
            x_coords = nc.variables[keys[1]][:] - 180
        x_idx, y_idx = get_slice_idx_by_bbox(
            x_coords, y_coords, subset_bbox = subset_bbox)
        xmin, xmax = x_idx
        ymin, ymax = y_idx
        if arr.ndim == 2:
            arr = arr[ymin:ymax, xmin:xmax]
        else:
            arr = arr[:, ymin:ymax, xmin:xmax]
    # Optional: Mask user-specified NoData value
    if in_nodata is not None:
        arr = np.where(arr == in_nodata, np.nan, arr)
    # Mask any NoData/ fill value defined by the dataset
    if hasattr(nc.variables[keys[0]], 'missing_value'):
        arr = np.where(
            arr == nc.variables[keys[0]].missing_value, np.nan, arr)
    # Create a gdal.Dataset from the array
    if out_nodata is not None:
        return array_to_raster(
            np.where(np.isnan(arr), out_nodata, arr),
            gt0, str(wkt0), xmin, ymin)
    return array_to_raster(arr, gt0, str(wkt0), xmin, ymin)
Dumps a NetCDF variable to a GeoTIFF file. NOTE: Both cell_size elements should be positive numbers.
Parameters
nc : netCDF4.Dataset
- The NetCDF file (opened)
keys : tuple
- A tuple of (variable, x_coords, y_coords) where variable is the variable name, x_coords the name of the X-coordinate array variable, y_coords the name of the Y-coordinate array variable
cell_size : tuple or None
- (Optional) Tuple of the cell size in both directions in degrees, e.g., (1, 1); will attempt to infer the cell size if not provided
time_idx : int
- (Optional) Index of the "time" axis to extract as a raster (Default: 0)
subset_bbox : tuple or None
- (Optional) Subset bounding box
in_nodata : int or float or None
- (Optional) The NoData value to ignore in the input
out_nodata : int or float or None
- (Optional) The NoData value to set in the output (Default: -9999)
Returns
gdal.Dataset
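The NoData handling inside netcdf_raster() amounts to two np.where() passes, which can be sketched in isolation (the array values below are synthetic):

```python
import numpy as np

arr = np.array([[1.0, -9999.0], [2.5, -9999.0]])  # made-up data slice
in_nodata, out_nodata = -9999, -3.4e38

# First, mask the input NoData value to NaN (as netcdf_raster() does)...
masked = np.where(arr == in_nodata, np.nan, arr)
# ...then re-code NaN to the output NoData value before rasterizing
out = np.where(np.isnan(masked), out_nodata, masked)
```

Valid cells (1.0 and 2.5) pass through unchanged; only the two NoData cells are re-coded.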
def parse_time_units(netcdf_time_axis)
-
def parse_time_units(netcdf_time_axis):
    '''
    Returns the time interval size (e.g., "days") and the epoch (e.g.,
    "1970-01-01") from a given netCDF units declaration for the time axis
    (e.g., "days since 1970-01-01").

    Parameters
    ----------
    netcdf_time_axis : netCDF4._netCDF4.Variable
        Variable that describes the time axis

    Returns
    -------
    tuple
        Tuple of (str, datetime.datetime)
    '''
    units_string = netcdf_time_axis.units
    if isinstance(units_string, bytes):
        units_string = bytes(units_string).decode()
    interval, epoch_str = TIME_RX.match(units_string).groups()
    epoch_str = epoch_str.replace('-1-1', '-01-01')
    if epoch_str.rfind('00:00:00') > 0:
        epoch = datetime.datetime.strptime(epoch_str, '%Y-%m-%d %H:%M:%S')
    elif epoch_str.rfind('00:00') > 0:
        epoch = datetime.datetime.strptime(epoch_str, '%Y-%m-%d %H:%M')
    else:
        epoch = datetime.datetime.strptime(epoch_str, '%Y-%m-%d')
    return (interval, epoch)
Returns the time interval size (e.g., "days") and the epoch (e.g., "1970-01-01") from a given netCDF units declaration for the time axis (e.g., "days since 1970-01-01").
Parameters
netcdf_time_axis : netCDF4._netCDF4.Variable
- Variable that describes the time axis
Returns
tuple
- Tuple of (str, datetime.datetime)
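The parsing logic can be sketched on a plain units string (the real function takes the variable object and reads its units attribute). TIME_RX is not shown in this page, so the pattern below is an assumption that matches the documented "days since 1970-01-01" form:

```python
import re
import datetime

# Assumed pattern for TIME_RX: "<interval> since <epoch>"
TIME_RX = re.compile(r'^(\w+) since (.*)$')

def parse_time_units_string(units_string):
    # Simplified sketch: operates on the string, not the netCDF variable
    interval, epoch_str = TIME_RX.match(units_string).groups()
    epoch_str = epoch_str.replace('-1-1', '-01-01')
    if epoch_str.rfind('00:00:00') > 0:
        epoch = datetime.datetime.strptime(epoch_str, '%Y-%m-%d %H:%M:%S')
    elif epoch_str.rfind('00:00') > 0:
        epoch = datetime.datetime.strptime(epoch_str, '%Y-%m-%d %H:%M')
    else:
        epoch = datetime.datetime.strptime(epoch_str, '%Y-%m-%d')
    return (interval, epoch)

print(parse_time_units_string('days since 1970-01-01'))
# → ('days', datetime.datetime(1970, 1, 1, 0, 0))
```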
def spatial_average(nc, keys, reducer, t_labels=None, subset_bbox=None, nodata=-9999)
-
def spatial_average(
        nc, keys, reducer, t_labels = None, subset_bbox = None,
        nodata = -9999):
    '''
    Calculates a spatial average across a NetCDF time series variable.

    Parameters
    ----------
    nc : netCDF4.Dataset
        The NetCDF file (opened)
    keys : tuple
        A tuple of (variable, x_coords, y_coords) where variable is the
        variable name, x_coords the name of the X-coordinate array variable,
        y_coords the name of the Y-coordinate array variable
    reducer : function
        A vectorized function that returns a scalar for an input vector of
        values; should ignore np.nan values
    t_labels : tuple or list
        (Optional) A vector of time index labels
    subset_bbox : tuple
        (Optional) To specify a subset area for the spatial average (i.e.,
        instead of averaging over the entire NetCDF array domain), pass a
        bounding box argument
    nodata : int or float
        The NoData value to ignore (Default: -9999)

    Returns
    -------
    list
        Of the form `[(index, value), ...]` for each time step
    '''
    if subset_bbox is not None:
        x_coords = nc.variables[keys[1]][:]
        y_coords = nc.variables[keys[2]][:]
        if np.array(nc.variables[keys[1]][:]).max() > 180:
            # Correct for weird NetCDF easting
            x_coords = nc.variables[keys[1]][:] - 180
        x_idx, y_idx = get_slice_idx_by_bbox(
            x_coords, y_coords, subset_bbox = subset_bbox)
        xmin, xmax = x_idx
        ymin, ymax = y_idx
    scale = 1
    if hasattr(nc.variables[keys[0]], 'scale_factor'):
        scale = nc.variables[keys[0]].scale_factor
    offset = 0
    if hasattr(nc.variables[keys[0]], 'add_offset'):
        offset = nc.variables[keys[0]].add_offset
    num_epochs = nc.variables[keys[0]].shape[0]
    time_series = []
    for i, t in enumerate(range(0, num_epochs)):
        if subset_bbox is not None:
            arr = nc.variables[keys[0]][:][t, ymin:ymax, xmin:xmax]
        else:
            arr = nc.variables[keys[0]][:][t,...]
        index = t
        if t_labels is not None:
            index = t_labels[i]
        time_series.append((
            # Filter out NoData; use summary function that ignores NaNs
            index,
            reducer(
                # Multiply by scale, add offset
                np.where(arr == nodata, np.nan, (arr * scale) + offset))
        ))
    return time_series
Calculates a spatial average across a NetCDF time series variable.
Parameters
nc : netCDF4.Dataset
- The NetCDF file (opened)
keys : tuple
- A tuple of (variable, x_coords, y_coords) where variable is the variable name, x_coords the name of the X-coordinate array variable, y_coords the name of the Y-coordinate array variable
reducer : function
- A vectorized function that returns a scalar for an input vector of values; should ignore np.nan values
t_labels : tuple or list
- (Optional) A vector of time index labels
subset_bbox : tuple
- (Optional) To specify a subset area for the spatial average (i.e., instead of averaging over the entire NetCDF array domain), pass a bounding box argument
nodata : int or float
- The NoData value to ignore (Default: -9999)
Returns
list
- Of the form [(index, value), ...] for each time step
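One time step of spatial_average() can be sketched with a synthetic slice: the NoData value is masked to NaN, scale/offset is applied, and then a NaN-aware reducer (e.g., np.nanmean) collapses the slice to a scalar:

```python
import numpy as np

arr = np.array([[10.0, -9999.0], [30.0, 20.0]])  # one made-up time slice
nodata, scale, offset = -9999, 0.1, 0.0

# Mask NoData, apply scale/offset, then reduce; np.nanmean ignores the NaN
value = np.nanmean(np.where(arr == nodata, np.nan, (arr * scale) + offset))
print(value)  # → 2.0, i.e., mean of (1.0, 3.0, 2.0)
```

A reducer that does not ignore NaN (e.g., np.mean) would propagate NaN from every masked cell, which is why the docstring requires a NaN-aware function.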
def time_series(netcdf_time_axis, has_leap=None)
-
def time_series(netcdf_time_axis, has_leap = None):
    '''
    Constructs a series of datetime.datetime elements based on a netCDF
    time axis; assumes that time steps are evenly spaced (except in leap
    years) and ignores fractional time steps.

    NOTE: This is EXTREMELY difficult to get right because of the awful
    implementation of time axes in the typical netCDF dataset. Among other
    challenges... Apparently, netCDF time axes use the convention that if
    the epoch starts at midnight, the first day is the "zeroth" day; e.g.,
    CLM5.0 states its time axis is "days since 1700-01-01 00:00:00" but the
    first time step is 31.0; it seems highly unlikely that this monthly
    time series would start in February (Jan. 1 + 31 days == Feb. 1), so
    they must be using the *last day* of each month (e.g., Jan. 0 + 31 days
    == Jan. 31).

    Parameters
    ----------
    netcdf_time_axis : netCDF4._netCDF4.Variable
        Variable that describes the time axis
    has_leap : bool or None
        Set True to force recognition of leap years; if None, will attempt
        to automatically determine whether leap years should be recognized
        based on the field metadata (Default: None)

    Returns
    -------
    list
        `[*datetime.date|datetime.datetime]`
    '''
    steps = netcdf_time_axis[:]
    interval, epoch = parse_time_units(netcdf_time_axis)
    convention = getattr(netcdf_time_axis, 'calendar', '')
    if hasattr(convention, 'decode'):
        convention = convention.decode('utf-8')
    # Attempt to determine leap year recognition
    if has_leap is None:
        has_leap = convention in ('', 'standard', 'gregorian')
    first_step = steps[1] - steps[0]
    # Basically, we can tolerate different step sizes for intervals that
    #   can be passed to datetime.timedelta, which assumes leap years;
    #   in all other cases, we can't calculate time steps correctly
    if not has_leap:
        if not np.all(np.array(steps[1:]) - np.array(steps[:-1]) == first_step):
            assert interval in ('seconds', 'hours', 'days'),\
                'Time series is not evenly spaced and leap years do not explain this discrepancy'
    ldays = 1 if has_leap else 0
    ndays = 365
    if NDAYS_RX.match(convention) is not None:
        # Some datasets oddly have, e.g., "360_day" conventions
        ndays = int(NDAYS_RX.match(convention).groups()[0])
    # Because timedelta can't process intervals higher than "days"...
    if interval == 'years':
        # Simple years series starting January 1 if units are years
        years = np.arange(epoch.year, epoch.year + steps[-1] + 1)
        time_axis = [datetime.date(int(y), 1, 1) for y in years]
    elif interval == 'months':
        time_axis = []
        m_range = (np.arange(0, len(steps)) + epoch.month) % 12
        m_range = np.where(m_range == 0, 12, m_range)
        for t, t_value in enumerate(steps):
            m = m_range[t]
            y = epoch.year + np.floor(t_value / 12)
            time_axis.append(datetime.date(int(y), int(m), epoch.day))
    # But if time intervals are in "seconds," "hours," or "days"
    elif interval in ('seconds', 'hours', 'days') and has_leap:
        time_axis = [
            epoch + datetime.timedelta(**{interval: int(t)}) for t in steps
        ]
    # But timedelta assumes leap years are in use, so if the time series
    #   doesn't use leap years, we must construct it manually
    else:
        time_axis = []
        this_year = epoch.year
        mdays = np.cumsum(calendar.mdays)
        for t in steps:
            if interval == 'seconds':
                y = epoch.year + (t / (ndays * 24 * 60 * 60))
            if interval == 'hours':
                y = epoch.year + (t / (ndays * 24))
            if interval in ('seconds', 'hours'):
                m = bisect_left(mdays, ((y - epoch.year) * ndays) % ndays)
                d = (((y - epoch.year) * ndays) % ndays) - mdays[m - 1]
            if interval == 'days':
                y = epoch.year + (t / ndays)  # e.g., t / 365
                # Incidentally, the left "sorting index" is the month number
                m = bisect_left(mdays, t % ndays)
                m += 1 if m == 0 else 0
                d = (t % ndays) - mdays[m - 1]
            # And days are what is left over
            d += 1 if d == 0 else 0
            time_axis.append(datetime.datetime(int(y), int(m), int(d)))
    return time_axis
Constructs a series of datetime.datetime elements based on a netCDF time axis; assumes that time steps are evenly spaced (except in leap years) and ignores fractional time steps.
NOTE: This is EXTREMELY difficult to get right because of the awful implementation of time axes in the typical netCDF dataset. Among other challenges… Apparently, netCDF time axes use the convention that if the epoch starts at midnight, the first day is the "zeroth" day; e.g., CLM5.0 states its time axis is "days since 1700-01-01 00:00:00" but the first time step is 31.0; it seems highly unlikely that this monthly time series would start in February (Jan. 1 + 31 days == Feb. 1), so they must be using the last day of each month (e.g., Jan. 0 + 31 days == Jan. 31).
Parameters
netcdf_time_axis : netCDF4._netCDF4.Variable
- Variable that describes the time axis
has_leap : bool or None
- Set True to force recognition of leap years; if None, will attempt to automatically determine whether leap years should be recognized based on the field metadata (Default: None)
Returns
list
[*datetime.date|datetime.datetime]
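The simple, leap-year-aware branch of time_series() reduces to adding each step to the epoch with datetime.timedelta, which follows the proleptic Gregorian calendar. A sketch with synthetic "days since" offsets:

```python
import datetime

epoch = datetime.datetime(1970, 1, 1)
interval = 'days'        # one of 'seconds', 'hours', 'days'
steps = [0, 31, 59]      # made-up offsets from the time axis

# Mirror the source: timedelta(**{interval: int(t)}) handles all three units
time_axis = [epoch + datetime.timedelta(**{interval: int(t)}) for t in steps]
print([d.strftime('%Y-%m-%d') for d in time_axis])
# → ['1970-01-01', '1970-02-01', '1970-03-01']
```

For non-standard calendars (e.g., "noleap" or "360_day"), this branch is not used, since timedelta has no notion of alternate year lengths; that is what the manual month/day arithmetic in the final branch handles.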