pynetcf package¶
Submodules¶
pynetcf.base module¶
Base classes for reading and writing time series and images in NetCDF files using Climate Forecast Metadata Conventions (http://cfconventions.org/).
- class pynetcf.base.Dataset(filename, name=None, file_format='NETCDF4', mode='r', zlib=True, complevel=4, autoscale=True, automask=True)[source]¶
Bases:
object
NetCDF file wrapper class that makes some things easier
- Parameters:
filename (string) – Filename of the netCDF file. If the file already exists, it is opened read-only unless the append keyword is set. If the overwrite keyword is set, the file is overwritten.
name (string, optional) – will be written as a global attribute if the file is a new file
file_format (string, optional) – file format
mode (string, optional) –
Access mode. Default: “r”.
- “r” means read-only; no data can be modified.
- “w” means write; a new file is created, and an existing file with the same name is deleted.
- “a” and “r+” mean append (in analogy with serial files); an existing file is opened for reading and writing.
Appending s to modes w, r+ or a will enable unbuffered shared access to NETCDF3_CLASSIC or NETCDF3_64BIT formatted files. Unbuffered access may be useful even if you don't need shared access, since it may be faster for programs that don't access data sequentially. This option is ignored for NETCDF4 and NETCDF4_CLASSIC formatted files.
zlib (boolean, optional) – If set, netCDF compression will be used. Default: True
complevel (int, optional) – Compression level from 1 (low compression) to 9 (high compression). Default: 4
autoscale (bool, optional) – If disabled, data will not be automatically scaled when reading and writing.
automask (bool, optional) – If disabled, data will not be masked during reading. This means fill values will be returned instead of NaN.
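The effect of the autoscale and automask flags can be illustrated without touching a file. The following is a minimal sketch in plain Python, assuming the usual CF packing attributes (a scale factor, an offset and a fill value); the function name unpack and its arguments are hypothetical and not part of pynetcf:

```python
import math


def unpack(raw, scale_factor=1.0, add_offset=0.0, fill_value=None,
           autoscale=True, automask=True):
    """Decode stored values using CF-style packing attributes (illustration only)."""
    decoded = []
    for value in raw:
        if automask and fill_value is not None and value == fill_value:
            # Masking enabled: the fill value is reported as NaN.
            decoded.append(math.nan)
            continue
        if autoscale:
            # Scaling enabled: apply scale factor and offset.
            value = value * scale_factor + add_offset
        decoded.append(value)
    return decoded


raw = [120, -9999, 250]  # -9999 plays the role of the fill value
decoded = unpack(raw, scale_factor=0.5, add_offset=10.0, fill_value=-9999)
untouched = unpack(raw, scale_factor=0.5, add_offset=10.0, fill_value=-9999,
                   autoscale=False, automask=False)
```

With both flags disabled, the stored numbers come back unchanged, including the fill value.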
- append_var(name, data, **kwargs)[source]¶
Append data along the unlimited dimension(s) of a variable.
- Parameters:
name (string) – Name of variable to append to.
data (numpy.array) – Numpy array of correct dimension.
- Raises:
IOError – If appending to a variable without an unlimited dimension.
- read_var(name)[source]¶
Read a variable from the netCDF file.
- Parameters:
name (string) – name of the variable
- write_var(name, data=None, dim=None, attr={}, dtype=None, zlib=None, complevel=None, chunksizes=None, **kwargs)[source]¶
Create or overwrite values in a NetCDF variable. The data will be written to disk once flush or close is called.
- Parameters:
name (str) – Name of the NetCDF variable.
data (np.ndarray, optional) – Array containing the data. If not given, the variable will be left empty.
dim (tuple, optional) – A tuple containing the dimension names.
attr (dict, optional) – A dictionary containing the variable attributes.
dtype (data type, string or numpy.dtype, optional) – Data type of the variable. If not given, data.dtype will be used.
zlib (boolean, optional) – Explicit compression setting for this variable. If not given, the global attribute is used.
complevel (int, optional) – Explicit compression level for this variable. If not given, the global attribute is used.
chunksizes (tuple, optional) – Can be used to manually specify the HDF5 chunk sizes for each dimension of the variable.
pynetcf.image module¶
Image class definition for NetCDF files.
- class pynetcf.image.ArrayStack(filename, grid=None, times=None, mode='r', name='')[source]¶
Bases:
OrthoMultiTs
Class for writing stacks of arrays (1D) into netCDF. Array stacks are essentially netCDF files in the orthogonal multidimensional array representation.
pynetcf.point_data module¶
Classes for reading and writing point data in NetCDF files using Climate Forecast Metadata Conventions (http://cfconventions.org/).
- class pynetcf.point_data.GriddedPointData(*args, **kwargs)[source]¶
Bases:
GriddedBase
GriddedPointData class using GriddedBase class as parent and PointData as i/o class.
- class pynetcf.point_data.PointData(filename, mode='r', file_format='NETCDF4', zlib=True, complevel=4, n_obs=None, obs_dim='obs', add_dims=None, loc_id_var='location_id', time_units='days since 1900-01-01 00:00:00', time_var='time', lat_var='lat', lon_var='lon', alt_var='alt', **kwargs)[source]¶
Bases:
object
PointData class for reading and writing netCDF files following the CF conventions for point data.
- Parameters:
filename (str) – Filename of the netCDF file. If the file already exists, it is opened read-only unless the append keyword is set.
mode (str, optional) –
Access mode. Default: “r”.
- “r” means read-only; no data can be modified.
- “w” means write; a new file is created, and an existing file with the same name is deleted.
- “a” and “r+” mean append (in analogy with serial files); an existing file is opened for reading and writing.
Appending s to modes w, r+ or a will enable unbuffered shared access to NETCDF3_CLASSIC or NETCDF3_64BIT formatted files. Unbuffered access may be useful even if you don't need shared access, since it may be faster for programs that don't access data sequentially. This option is ignored for NETCDF4 and NETCDF4_CLASSIC formatted files.
zlib (boolean, optional) – If set netCDF compression will be used. Default True
complevel (int, optional) – Compression level used from 1(low compression) to 9(high compression). Default: 4
n_obs (int, optional) – Number of observations. If None, unlimited dimension will be used. Default: None
obs_dim (str, optional) – Observation dimension name. Default: “obs”
add_dims (dict, optional) – Additional dimensions. Default: None
loc_id_var (str, optional) – Location id variable name. Default: “location_id”
time_units (str, optional) – Time unit. Default: “days since 1900-01-01 00:00:00”
time_var (str, optional) – Time variable name. Default: “time”
lat_var (str, optional) – Latitude variable name. Default: “lat”
lon_var (str, optional) – Longitude variable name. Default: “lon”
alt_var (str, optional) – Altitude variable name. Default: “alt”
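The zlib and complevel parameters trade write time for file size. As a rough illustration only, assuming that netCDF's zlib option behaves like the deflate algorithm exposed by Python's standard zlib module, the sketch below compresses the same repetitive payload at three levels; it does not call pynetcf:

```python
import zlib

# Highly repetitive payload, a stand-in for smooth geophysical data.
payload = bytes(range(256)) * 400

low = zlib.compress(payload, 1)       # fastest, weakest compression
moderate = zlib.compress(payload, 4)  # same level as the complevel default
high = zlib.compress(payload, 9)      # slowest, strongest compression

# Compression is lossless: decompressing restores the original bytes.
restored = zlib.decompress(moderate)

sizes = {"raw": len(payload), "level 1": len(low),
         "level 4": len(moderate), "level 9": len(high)}
```

How much a higher level actually saves depends on the data, so it is worth measuring on a representative file before changing the default.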
- write(loc_id, data, lon=None, lat=None, alt=None, time=None, **kwargs)[source]¶
Write data for specified location ids.
- Parameters:
loc_id (numpy.ndarray) – Location id.
data (dict of numpy.ndarray or numpy.recarray) – Dictionary containing variable names as keys and data as items.
lon (numpy.ndarray, optional) – Longitude information. Default: None
lat (numpy.ndarray, optional) – Latitude information. Default: None
alt (numpy.ndarray, optional) – Altitude information. Default: None
time (numpy.ndarray, optional) – Time information. Default: None
pynetcf.time_series module¶
Abstract class providing an interface for reading and writing time series in NetCDF files using Climate Forecast Metadata Conventions (http://cfconventions.org/).
- class pynetcf.time_series.ContiguousRaggedTs(filename, n_loc=None, n_obs=None, obs_loc_lut='row_size', obs_dim_name='obs', **kwargs)[source]¶
Bases:
DatasetTs
Class that represents a Contiguous ragged array representation of time series according to NetCDF CF-conventions 1.6.
- Parameters:
filename (string) – Filename of the netCDF file. If the file already exists, it is opened read-only unless the append keyword is set. If the overwrite keyword is set, the file is overwritten.
n_loc (int, optional) – Number of locations that this netCDF file contains time series for. Only required for a new file.
n_obs (int, optional) – Total number of observations that will be saved into this netCDF file. Only required for a new file.
obs_loc_lut (string, optional) – Name of the variable in the netCDF file that contains the lookup between observations and locations.
loc_dim_name (string, optional) – Name of the location dimension.
obs_dim_name (string, optional) – Name of the observations dimension.
loc_ids_name (string, optional) – Name of the variable that stores the location ids.
loc_descr_name (string, optional) – Name of the variable that stores additional location information.
time_units (string, optional) – Units the time axis is given in. Default: “days since 1900-01-01 00:00:00”
time_var (string, optional) – Name of the time variable. Default: “time”
lat_var (string, optional) – Name of the latitude variable. Default: “lat”
lon_var (string, optional) – Name of the longitude variable. Default: “lon”
alt_var (string, optional) – Name of the altitude variable. Default: “alt”
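In the contiguous ragged array representation, all observations of one location are stored next to each other, and a per-location count (the lookup variable, “row_size” by default) says how many observations belong to each location. The following plain-Python sketch shows how such counts translate into slices of the observation axis; the helper location_slice is hypothetical and not part of pynetcf:

```python
from itertools import accumulate


def location_slice(row_size, loc_index):
    """Return the slice of the observation axis that belongs to one location."""
    ends = list(accumulate(row_size))          # running end positions
    start = ends[loc_index] - row_size[loc_index]
    return slice(start, ends[loc_index])


row_size = [3, 0, 2]                  # observations per location
obs = [1.0, 1.1, 1.2, 3.0, 3.1]       # all observations, stored back to back
series = [obs[location_slice(row_size, i)] for i in range(len(row_size))]
```

A location without observations simply yields an empty slice.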
- read_time(loc_id)[source]¶
Read the time stamps for the given location id. In this case it works like a normal time series variable.
- Returns:
time_var – Time variable.
- Return type:
np.float64
- write(loc_id, data, dates, loc_descr=None, lon=None, lat=None, alt=None, fill_values=None, attributes=None, dates_direct=False)[source]¶
Write time series data. If the location does not exist yet, it is also added to the file.
- Parameters:
loc_id (int) – Location id.
data (dict) – Dictionary with variable names as keys and numpy.ndarrays as values.
dates (numpy.array) – Array of datetime objects.
attributes (dict, optional) – Dictionary of attributes that should be added to the netCDF variables. Can also be a dict of dicts, keyed by variable name as in the data dict.
dates_direct (boolean) – If True, the dates are already converted into floating point numbers of the correct magnitude.
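The dates_direct flag implies that dates end up as floating point numbers expressed in the file's time units. A minimal sketch of that conversion using only the standard library, assuming the default unit string “days since 1900-01-01 00:00:00” and the standard Gregorian calendar; the helper to_days_since_epoch is hypothetical and not part of pynetcf:

```python
from datetime import datetime

# Reference date taken from the default time_units string.
EPOCH = datetime(1900, 1, 1)


def to_days_since_epoch(dates):
    """Convert datetime objects into fractional days since EPOCH."""
    return [(d - EPOCH).total_seconds() / 86400.0 for d in dates]


dates = [datetime(1900, 1, 2), datetime(1900, 1, 2, 12), datetime(2000, 1, 1)]
floats = to_days_since_epoch(dates)
```

Values prepared this way are the kind of input that dates_direct=True is meant for, provided they match the time units of the file.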
- class pynetcf.time_series.DatasetTs(filename, n_loc=None, loc_dim_name='locations', obs_dim_name='time', loc_ids_name='location_id', loc_descr_name='location_description', time_units='days since 1900-01-01 00:00:00', time_var='time', lat_var='lat', lon_var='lon', alt_var='alt', unlim_chunksize=None, read_bulk=False, read_dates=True, **kwargs)[source]¶
Abstract class to store common methods for NetCDF time series such as OrthoMulti-, ContiguousRaggedArray- and IndexedRaggedArray-representation. Implemented according to the NetCDF CF-conventions 1.6.
- Parameters:
filename (string) – Filename of the netCDF file. If the file already exists, it is opened read-only unless the append keyword is set. If the overwrite keyword is set, the file is overwritten.
n_loc (int, optional) – Number of locations that this netCDF file contains time series for. Only required for a new file.
loc_dim_name (string, optional) – Name of the location dimension.
obs_dim_name (string, optional) – Name of the observations dimension.
loc_ids_name (string, optional) – Name of the variable that stores the location ids.
loc_descr_name (string, optional) – Name of the variable that stores additional location information.
time_units (string, optional) – Units the time axis is given in. Default: “days since 1900-01-01 00:00:00”
time_var (string, optional) – Name of the time variable. Default: “time”
lat_var (string, optional) – Name of the latitude variable. Default: “lat”
lon_var (string, optional) – Name of the longitude variable. Default: “lon”
alt_var (string, optional) – Name of the altitude variable. Default: “alt”
unlim_chunksize (int, optional) – Chunk size to use along unlimited dimensions. Other chunk sizes will be calculated by the netCDF library.
read_bulk (boolean, optional) – If set to True, the data of all locations is read into memory, and subsequent calls to “read” read from the cache and not from disk. This makes reading complete files faster.
read_dates (boolean, optional) – If False, dates will not be read automatically but only on specific request. Useful for bulk reading because the netCDF num2date routine is currently very slow for big datasets.
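The read_bulk behaviour described above is a read-once cache. The toy class below sketches the idea in plain Python; CachedReader, its storage dictionary and the disk_reads counter are hypothetical stand-ins and do not reflect how pynetcf implements the option:

```python
class CachedReader:
    """Toy reader; `storage` stands in for the data held in a file on disk."""

    def __init__(self, storage, read_bulk=False):
        self._storage = storage
        self.read_bulk = read_bulk
        self._cache = None
        self.disk_reads = 0  # counts simulated disk accesses

    def read(self, loc_id):
        if self.read_bulk:
            if self._cache is None:
                # First call: load every location in a single pass.
                self.disk_reads += 1
                self._cache = dict(self._storage)
            return self._cache[loc_id]
        # No bulk mode: every call goes to "disk".
        self.disk_reads += 1
        return self._storage[loc_id]


storage = {10: [0.1, 0.2], 11: [0.3, 0.4]}
bulk = CachedReader(storage, read_bulk=True)
bulk_values = [bulk.read(10), bulk.read(11), bulk.read(10)]
single = CachedReader(storage, read_bulk=False)
single_values = [single.read(10), single.read(11), single.read(10)]
```

Both readers return the same values; only the number of simulated disk accesses differs, which is where the speed-up for reading complete files comes from.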
- extend_time(dates, direct=False)[source]¶
Extend the time dimension and variable by the given dates
- Parameters:
dates (numpy.array of datetime objects or floats) – Timestamps.
direct (boolean) – If True, the dates are already converted into floating point numbers of the correct magnitude.
- get_time_variable_overlap(dates)[source]¶
Figure out if a new date array has an overlap with the already existing time variable.
Return the index of the existing time variable where the new dates should be located.
At the moment this only handles cases where all dates are new or none are new.
- Parameters:
dates (list) – List of datetime objects
- Returns:
indexes – Array of indexes that overlap
- Return type:
- read_all(loc_id, dates_direct=False)[source]¶
Read the time series of all time series variables at a given location id.
- read_time(loc_id)[source]¶
Read the time stamps for the given location id. In this case the location id is irrelevant, since all locations have the same timestamps.
- abstract write(loc_id, data, dates, loc_descr=None, lon=None, lat=None, alt=None, fill_values=None, attributes=None, dates_direct=False)[source]¶
Write time series data. If the location does not exist yet, it is also added to the file. For this data format it is assumed that the same amount of data is added in each write/append cycle.
- Parameters:
loc_id (int) – Location id.
data (dict) – Dictionary with variable names as keys and numpy.ndarrays as values.
dates (numpy.ndarray) – Array of datetime objects.
attributes (dict, optional) – Dictionary of attributes that should be added to the netCDF variables. Can also be a dict of dicts, keyed by variable name as in the data dict.
dates_direct (boolean) – If True, the dates are already converted into floating point numbers of the correct magnitude.
- write_all(loc_ids, data, dates, loc_descrs=None, lons=None, lats=None, alts=None, fill_values=None, attributes=None, dates_direct=False)[source]¶
Write time series data in bulk. For this, the user has to provide a 2D array with dimensions (self.nloc, dates) that is filled with the time series of all grid points in the file.
- Parameters:
loc_ids (numpy.ndarray) – Location ids along the first axis of the data array.
data (dict) – Dictionary with variable names as keys and 2D numpy.arrays as values.
dates (numpy.ndarray) – Array of datetime objects with the same size as the second dimension of the data arrays.
attributes (dict, optional) – Dictionary of attributes that should be added to the netCDF variables. Can also be a dict of dicts, keyed by variable name as in the data dict.
dates_direct (boolean) – If True, the dates are already converted into floating point numbers of the correct magnitude.
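The layout expected by write_all, one row per location and one column per date, can be sketched as follows. The example uses nested lists instead of numpy arrays to stay dependency free; the helper stack_series, the variable name “soil_moisture” and the location ids are hypothetical:

```python
def stack_series(loc_ids, series_by_loc, n_dates):
    """Return one row per location id, in the order given by loc_ids."""
    rows = []
    for loc_id in loc_ids:
        row = list(series_by_loc[loc_id])
        if len(row) != n_dates:
            raise ValueError(
                f"location {loc_id}: expected {n_dates} values, got {len(row)}")
        rows.append(row)
    return rows


loc_ids = [101, 102]
dates = ["2020-01-01", "2020-01-02", "2020-01-03"]  # placeholders for datetime objects
series_by_loc = {101: [0.1, 0.2, 0.3], 102: [0.4, 0.5, 0.6]}

# First axis follows loc_ids, second axis follows dates.
data = {"soil_moisture": stack_series(loc_ids, series_by_loc, len(dates))}
shape = (len(data["soil_moisture"]), len(data["soil_moisture"][0]))
```

For an actual call the nested lists would be replaced by 2D numpy arrays of the same shape.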
- class pynetcf.time_series.GriddedNcContiguousRaggedTs(*args, **kwargs)[source]¶
Bases:
GriddedNcTs
- class pynetcf.time_series.GriddedNcIndexedRaggedTs(*args, **kwargs)[source]¶
Bases:
GriddedNcTs
- write_cell(cell, gpi, data, datefield)[source]¶
Write complete data set into cell file.
- Parameters:
cell (int) – Cell number.
gpi (numpy.ndarray) – Location ids.
data (dict or numpy record array) – dictionary with variable names as keys and numpy.arrays as values
datefield (string) – field in the data dict that contains dates in correct format
- class pynetcf.time_series.GriddedNcOrthoMultiTs(*args, **kwargs)[source]¶
Bases:
GriddedNcTs
- class pynetcf.time_series.IndexedRaggedTs(filename, n_loc=None, obs_loc_lut='locationIndex', **kwargs)[source]¶
Bases:
DatasetTs
Class that represents an indexed ragged array representation of time series according to NetCDF CF-conventions 1.6.
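In the indexed ragged array representation, observations of different locations may be interleaved, and a lookup variable (“locationIndex” by default) records for every observation which location it belongs to. A plain-Python sketch of reading one location from such a layout; the helper select_location is hypothetical and not part of pynetcf:

```python
def select_location(location_index, values, loc_pos):
    """Collect the observations whose lookup entry points at loc_pos."""
    return [v for idx, v in zip(location_index, values) if idx == loc_pos]


location_index = [0, 2, 0, 1, 2]      # one entry per observation
obs = [5.0, 7.0, 5.5, 6.0, 7.5]       # observations in the order they were written

first = select_location(location_index, obs, 0)
second = select_location(location_index, obs, 1)
third = select_location(location_index, obs, 2)
```

Because each observation carries its own location index, data can be appended in any order, at the cost of a scan or mask when reading a single location.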
- read_time(loc_id)[source]¶
Read the time stamps for the given location id. In this case it works like a normal time series variable.
- Returns:
time_var – Time variable.
- Return type:
np.float64
- write(loc_id, data, dates, loc_descr=None, lon=None, lat=None, alt=None, fill_values=None, attributes=None, dates_direct=False)[source]¶
Write time series data. If the location does not exist yet, it is also added to the file.
- Parameters:
loc_id (int or numpy.ndarray) – Location id. If it is an array, the location ids have to match the data in the data dictionary and in the dates array. In this way data for more than one point can be written into the file at once.
data (dict or numpy.recarray) – Dictionary with variable names as keys and numpy.arrays as values.
dates (numpy.array) – Array of datetime objects.
attributes (dict, optional) – Dictionary of attributes that should be added to the netCDF variables. Can also be a dict of dicts, keyed by variable name as in the data dict.
dates_direct (boolean) – If True, the dates are already converted into floating point numbers of the correct magnitude.
- class pynetcf.time_series.OrthoMultiTs(filename, n_loc=None, loc_dim_name='locations', obs_dim_name='time', loc_ids_name='location_id', loc_descr_name='location_description', time_units='days since 1900-01-01 00:00:00', time_var='time', lat_var='lat', lon_var='lon', alt_var='alt', unlim_chunksize=None, read_bulk=False, read_dates=True, **kwargs)[source]¶
Bases:
DatasetTs
Implementation of the Orthogonal multidimensional array representation of time series according to the NetCDF CF-conventions 1.6.
- Parameters:
filename (string) – Filename of the netCDF file. If the file already exists, it is opened read-only unless the append keyword is set. If the overwrite keyword is set, the file is overwritten.
n_loc (int, optional) – Number of locations that this netCDF file contains time series for. Only required for a new file.
loc_dim_name (string, optional) – Name of the location dimension.
obs_dim_name (string, optional) – Name of the observations dimension.
loc_ids_name (string, optional) – Name of the variable that stores the location ids.
loc_descr_name (string, optional) – Name of the variable that stores additional location information.
time_units (string, optional) – Units the time axis is given in. Default: “days since 1900-01-01 00:00:00”
time_var (string, optional) – Name of the time variable. Default: “time”
lat_var (string, optional) – Name of the latitude variable. Default: “lat”
lon_var (string, optional) – Name of the longitude variable. Default: “lon”
alt_var (string, optional) – Name of the altitude variable. Default: “alt”
unlim_chunksize (int, optional) – Chunk size to use along unlimited dimensions. Other chunk sizes will be calculated by the netCDF library.
read_bulk (boolean, optional) – If set to True, the data of all locations is read into memory, and subsequent calls to “read” read from the cache and not from disk. This makes reading complete files faster.
read_dates (boolean, optional) – If False, dates will not be read automatically but only on specific request. Useful for bulk reading because the netCDF num2date routine is currently very slow for big datasets.
- write(loc_id, data, dates, loc_descr=None, lon=None, lat=None, alt=None, fill_values=None, attributes=None, dates_direct=False)[source]¶
Write time series data. If the location does not exist yet, it is also added to the file. For this data format it is assumed that the same amount of data is added in each write/append cycle.
- Parameters:
loc_id (int) – Location id.
data (dict) – Dictionary with variable names as keys and numpy.ndarrays as values.
dates (numpy.ndarray) – Array of datetime objects.
attributes (dict, optional) – Dictionary of attributes that should be added to the netCDF variables. Can also be a dict of dicts, keyed by variable name as in the data dict.
dates_direct (boolean) – If True, the dates are already converted into floating point numbers of the correct magnitude.