GetNCEI, a Module to Retrieve NCDC's Climate Data¶

Author: Nafis Barizki

Contents¶

  1. GetNCEI Overview
  2. NCDC Climate Data Online (CDO) overview
  3. GetNCEI Explanation
  4. How-To GetNCEI: A Scenario
  5. GetNCEI Documentation
  6. References

1. GetNCEI Overview ¶

I developed a Python module named GetNCEI. It provides several functions that make it easier to dig into Climate Data Online (CDO), which is managed by the United States National Climatic Data Center (NCDC).

GetNCEI accesses the CDO web services through API calls made with the Requests library, and it helps narrow a request down to the data you actually want. It has functions to find specific datatypes (e.g. temperature, wind) or stations by keyword, and to find the stations nearest to a given coordinate point. The nearest stations can also be shown on a cartographic chart.

2. NCDC Climate Data Online (CDO) overview ¶

NCDC's Climate Data Online (CDO) offers web services that provide access to current data. This API is for developers looking to create their own scripts or programs that use the CDO database of weather and climate data. An access token is required to use the API, and each token will be limited to five requests per second and 10,000 requests per day.
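The rate limits above suggest spacing requests out on the client side. The following is a minimal sketch, not part of GetNCEI: the class name `RequestThrottle` and its parameters are hypothetical. It enforces a minimum interval between calls and counts calls against a daily budget.

```python
import time


class RequestThrottle:
    """Client-side limiter (hypothetical helper, not part of GetNCEI).

    Allows at most `per_second` calls per second and `per_day` calls in total.
    The daily counter is never reset; create a new instance for each day.
    """

    def __init__(self, per_second=5, per_day=10_000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self.per_day = per_day
        self.clock = clock
        self.sleep = sleep
        self.calls = 0
        self.last_call = None

    def wait(self):
        """Block until the next request is allowed, then record it."""
        if self.calls >= self.per_day:
            raise RuntimeError("daily request budget exhausted")
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
        self.calls += 1
```

Calling `throttle.wait()` before each request keeps consecutive requests at least 0.2 seconds apart with the default settings.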

CDO Data¶

CDO data can generally be described as follows:

  1. CDO data is grouped into datasets according to its datatypes, which describe the context of the data.
  2. Datacategories are general types of data used to group similar datatypes.
  3. CDO data comes from stations (for most datasets).
  4. Stations may be grouped into locations.
  5. Locations are grouped into location categories such as City, Country, etc.
  6. Recorded CDO data primarily contains these fields:
    • station: the origin of the data
    • datatype: the context of the data
    • value: the actual value of the data, corresponding to its datatype

About Endpoints¶

From the explanation above, CDO holds these kinds of data: datasets, datacategories, datatypes, locationcategories, locations, and stations.

CDO provides an endpoint for each kind of data, named after it. The endpoint is the last part of the URL used to access the online database: https://www.ncei.noaa.gov/cdo-web/api/v2/{endpoint}.

  1. datasets: gets the available datasets. The 'id' field of the returned data can be used as datasetid.
  2. datatypes: gets the available datatypes. As with the datasets endpoint, the returned 'id' field can be used as datatypeid.
  3. locationcategories: gets the available location categories. The returned 'id' field can be used as locationcategoryid.
  4. locations: gets the available locations. The returned 'id' field can be used as locationid.
  5. stations: gets the available stations. The returned 'id' field can be used as stationid.

The recorded (raw) data itself is different from the kinds of data listed above. This brings us to the sixth endpoint:

  6. data: fetches the actual data.

The id fields above (such as datasetid, datatypeid, stationid) can be used to narrow down the data we request. They are passed as 'Additional Parameters' to the data endpoint in the URL.

The id fields can also be passed as 'Additional Parameters' to each of the first five endpoints, to restrict the results to items that support that id. For example, passing datasetid=GHCND to the stations endpoint ensures that every station returned supports the GHCND dataset. See https://www.ncdc.noaa.gov/cdo-web/webservices/v2#gettingStarted to find out which ids can be used as filter parameters for which endpoints.
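To make the endpoint and parameter mechanics concrete, here is a minimal sketch of how such a request URL could be assembled with the standard library. It does not show GetNCEI's internals; `build_url` and `build_headers` are hypothetical helper names. The sketch assumes the web services read the access token from a `token` request header and accept repeated query keys for multiple ids.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ncei.noaa.gov/cdo-web/api/v2"


def build_url(endpoint, params=None):
    """Return the request URL for an endpoint plus optional query parameters.

    List values are expanded into repeated keys, e.g. two stationid entries.
    """
    url = f"{BASE_URL}/{endpoint}"
    if params:
        url += "?" + urlencode(params, doseq=True)
    return url


def build_headers(token):
    """Assumption: the access token is sent in a 'token' request header."""
    return {"token": token}


# Stations that support the GHCND dataset
print(build_url("stations", {"datasetid": "GHCND"}))
```

The resulting URL and headers can then be passed to an HTTP client such as Requests.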

The flowchart below shows how the endpoints work together to get the recorded climate data (red font: endpoint access):

Further explanation:

  • The datatypeids we pass should exist in the chosen datasetid, so we need to confirm this first; otherwise no data may be returned.
  • When a locationid is passed, the returned data covers all stations within that location.

3. GetNCEI Explanation ¶

Requirements¶

GetNCEI requires Python 3.9+ (recommended) with the following libraries installed:

  1. Requests
  2. Scipy
  3. Numpy
  4. Plotly

Functionalities¶

GetNCEI provides the following general functionality:

| GetNCEI Class Object | Function |
|---|---|
| FindDataIdNCEI | Helps users find datasetid-datatypeid pairs based on a specific keyword. |
| FindLocationInfoNCEI | Helps users find a locationid based on a specific keyword. Specifying a locationid in each request is highly recommended, especially when searching for stations, because it narrows the scope of the request instead of exploring every station in the database. At the time of writing, about 140,000 stations are available. Each request to the website is limited to 1,000 rows, so retrieving all stations takes well over a hundred requests, which is time-consuming. |
| FindStationInfoNCEI | Helps users find stations based on a specific keyword. As mentioned above, specifying a locationid with this function is highly recommended. |
| FindNearestStationNCEI | Helps users find the nearest stations to a coordinate given in decimal degrees. It can return the n nearest stations to the target coordinate and visualize the location of each station. |
| GetNCEI | Performs requests to the website to retrieve data. |
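The request count mentioned for the station list follows from simple arithmetic. The sketch below works it out and splits a result set into per-request row ranges. The function names are hypothetical, and how the ranges map onto the API's paging parameters is deliberately left open.

```python
import math


def pages_needed(total_rows, rows_per_request=1000):
    """Number of requests needed to retrieve total_rows rows."""
    return math.ceil(total_rows / rows_per_request)


def page_ranges(total_rows, rows_per_request=1000):
    """Yield (first_row, row_count) for each request, with rows counted from 0."""
    for first in range(0, total_rows, rows_per_request):
        yield first, min(rows_per_request, total_rows - first)


print(pages_needed(140_000))      # 140
print(list(page_ranges(2_500)))   # [(0, 1000), (1000, 1000), (2000, 500)]
```

At five requests per second, 140 requests need at least 28 seconds, which is why narrowing by locationid first is worthwhile.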

Class Object and its Methods¶

Class Object: FindDataIdNCEI

| Method | Description | Return | Note |
|---|---|---|---|
| get_matched_datasets() | Returns datasets that contain datatypes matching the keyword | list[dict] | dict['id'] can be used as datasetid |
| get_matched_datatypes() | Returns datatypes that match the keyword | list[dict] | dict['id'] can be used as datatypeid |
| get_id_pairs() | Returns datasetid-[datatypeids] pairs: the list of datatypeids that match the keyword, paired with the dataset in which those datatypes exist | dict[str: list] | Contains datasetid and datatypeid |

Class Object: FindLocationInfoNCEI

| Method | Description | Return | Note |
|---|---|---|---|
| get_location_info() | Returns info for the location whose name equals the keyword | dict | dict['id'] can be used as locationid |

Class Object: FindStationInfoNCEI

| Method | Description | Return | Note |
|---|---|---|---|
| get_station_info() | Returns the list of stations that contain the keyword | list[dict] | dict['id'] can be used as stationid. Passing locationid in the filter parameter is highly recommended |

Class Object: FindNearestStationNCEI

| Method | Description | Return | Note |
|---|---|---|---|
| get_nearest_station() | Returns the stations nearest to the target coordinate | list[dict] | dict['id'] can be used as stationid. Passing locationid in the filter parameter is highly recommended |
| show_location() | Shows the location of the nearest stations on a cartographic chart | plotly object | Each point also shows the distance to the target coordinate |
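A nearest-station search can be pictured with a short sketch based on great-circle (haversine) distance. This is an illustration only and not necessarily how GetNCEI computes distances. The helper names are hypothetical, and the station ids and coordinates below are made-up placeholders rather than real CDO records; only the `latitude` and `longitude` field names follow the stations endpoint.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(coord_a, coord_b):
    """Great-circle distance in km between two (lat, lon) points in decimal degrees."""
    lat1, lon1 = map(math.radians, coord_a)
    lat2, lon2 = map(math.radians, coord_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_stations(stations, target, n=1):
    """Return the n station dicts closest to target, nearest first."""
    return sorted(
        stations,
        key=lambda s: haversine_km(target, (s["latitude"], s["longitude"])),
    )[:n]


# Placeholder records (hypothetical ids and coordinates)
stations = [
    {"id": "EX:A", "latitude": -6.10, "longitude": 106.70},
    {"id": "EX:B", "latitude": -6.50, "longitude": 106.80},
    {"id": "EX:C", "latitude": -6.15, "longitude": 106.85},
]
print([s["id"] for s in nearest_stations(stations, (-6.12, 106.75), n=2)])
```

Sorting the full list is simple and adequate for a few thousand stations; a spatial index would be a natural refinement for larger sets.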

Class Object: GetNCEI

| Method | Description | Return | Note |
|---|---|---|---|
| get_datasets() | Returns available datasets based on the filter parameter | list[dict] | Uses the datasets endpoint |
| get_datacategories() | Returns available datacategories based on the filter parameter | list[dict] | Uses the datacategories endpoint |
| get_datatypes() | Returns available datatypes based on the filter parameter | list[dict] | Uses the datatypes endpoint |
| get_locationcategories() | Returns available locationcategories based on the filter parameter | list[dict] | Uses the locationcategories endpoint |
| get_locations() | Returns available locations based on the filter parameter | list[dict] | Uses the locations endpoint |
| get_stations() | Returns available stations based on the filter parameter | list[dict] | Uses the stations endpoint |
| get_data() | Fetches the data recorded by stations, narrowed by the filter parameter | list[dict] | Uses the data endpoint |

Note:

  • The output of each GetNCEI method is a list of dictionaries: [{field1: value1, field2: value2, ...}, ..., {field1: value1, ...}], which can be processed with various tools, such as pandas.DataFrame().
  • For the detailed documentation, proceed to Section 5, GetNCEI Documentation.

Class Object Flowchart¶

The flowchart below shows how each class object and its methods work (red: class object, blue: methods).

4. How-To GetNCEI: A Scenario ¶

This section walks through an example of how the GetNCEI module can be used.

Scenario¶


Suppose we want to get daily temperature summaries with the following specification:

  • The location of interest is the city of Jakarta, Indonesia.
  • The data is recorded by the 3 stations nearest to the coordinate point (-6.12, 106.75).
  • The data is recorded in 2021.

Solution to Scenario¶

How to approach:¶

We need to query the data endpoint of the NCEI website, restricting the request to data recorded by the nearest stations and to datatypes related to temperature. Objectives:

  1. Find the locationid of Jakarta, so that we can pass it as a parameter and narrow down the search for the nearest stations.
  2. Find the stationid of each nearest station using FindNearestStationNCEI.get_nearest_station().
  3. Find the available datasetid and datatypeid values related to temperature that are recorded by the 3 nearest stations.
  4. Request the data from the NCEI website using GetNCEI.get_data(), passing the primary parameters datasetid, startdate, and enddate, and a filter parameter that contains stationid and datatypeid.

Objective 1: Find the locationid¶

First, we want to narrow down the list of stations to those available in Jakarta, Indonesia. For that we need the locationid of Jakarta (as a city) or Indonesia (as a country). Both will work, but it is best to try the most specific target first.

Find the location info of Jakarta City

To run FindLocationInfoNCEI(), we need to know the locationcategoryid for cities. We can get it with GetNCEI.get_locationcategories().

In [ ]:
import getncei

token = '' #insert token

locationcategories = getncei.GetNCEI(token).get_locationcategories()
locationcategories[0]
Out[ ]:
{'name': 'City', 'id': 'CITY'}

Look at the id key. Based on the result above, we should use locationcategoryid = CITY to search for a city. Next, we want to get the location info of Jakarta.

In [ ]:
location = getncei.FindLocationInfoNCEI(token, 'Jakarta', 'CITY').get_location_info()
location
Out[ ]:
{'mindate': '1973-01-12',
 'maxdate': '2022-06-26',
 'name': 'Jakarta, ID',
 'datacoverage': 1,
 'id': 'CITY:ID000008'}

Look at the id key. Based on the result above, our locationid is 'CITY:ID000008'.

Objective 2: Find the stationid¶

Next, we want to know our stationid. First, we can list all stations available in Jakarta using GetNCEI.get_stations(), passing locationid='CITY:ID000008' as our filter:

In [ ]:
filter = dict(
    locationid='CITY:ID000008'
)

jakarta_stations = getncei.GetNCEI(token).get_stations(filter)

Since it returns a list of dictionaries, we can look at the first record to see the returned fields.

In [ ]:
jakarta_stations[0].keys()
Out[ ]:
dict_keys(['elevation', 'mindate', 'maxdate', 'latitude', 'name', 'datacoverage', 'id', 'elevationUnit', 'longitude'])

Let's inspect the name and id fields to make a clearer list of the available stations:

In [ ]:
{station['id']: station['name'] for station in jakarta_stations}
Out[ ]:
{'GHCND:ID000096745': 'JAKARTA OBSERVATORY, ID',
 'GHCND:IDM00096739': 'BUDIARTO, ID',
 'GHCND:IDM00096741': 'JAKARTA TANJUNG PRIOK',
 'GHCND:IDM00096749': 'SOEKARNO HATTA INTERNATIONAL, ID',
 'GHCND:IDM00096753': 'BOGOR DERMAGA, ID'}

Although we already have the stationids, we can alternatively query the 3 nearest stations as follows:

In [ ]:
filter = dict(
    locationid='CITY:ID000008'
)
coord = (-6.12, 106.75)

nearest_stations_jakarta = \
    getncei.FindNearestStationNCEI(token, coord, filter, 3)
In [ ]:
{station['id']: station['name'] for station in nearest_stations_jakarta.get_nearest_station()}
Out[ ]:
{'GHCND:IDM00096749': 'SOEKARNO HATTA INTERNATIONAL, ID',
 'GHCND:ID000096745': 'JAKARTA OBSERVATORY, ID',
 'GHCND:IDM00096741': 'JAKARTA TANJUNG PRIOK'}

It looks like we have our stations of interest. We can also view the station locations as shown below and hover over a marker to reveal additional information:

In [ ]:
nearest_stations_jakarta.show_location()

To conclude, our stationid for filter is: ['GHCND:IDM00096749', 'GHCND:ID000096745','GHCND:IDM00096741']

Objective 3: Find the datasetid and datatypeid¶

Next, we want to know which temperature data our stations record. To do this, we pass 'temperature' as the keyword and our stationids as a filter to the FindDataIdNCEI() object.

In [ ]:
station_id_list = ['GHCND:IDM00096749', 'GHCND:ID000096745','GHCND:IDM00096741']
filter = dict(
    stationid=station_id_list
)

dataid = getncei.FindDataIdNCEI(token, 'temperature', filter)

Let's look at the pair of datasetid-datatypeids that match with temperature:

In [ ]:
dataid.get_id_pairs()
Out[ ]:
{'GHCND': ['TAVG', 'TMAX', 'TMIN'],
 'GSOM': ['DT00',
  'DT32',
  'DX32',
  'DX70',
  'DX90',
  'DYNT',
  'DYXT',
  'EMNT',
  'EMXT',
  'TAVG',
  'TMAX',
  'TMIN']}

The available datasetids are 'GHCND' and 'GSOM'. To see their descriptions:

In [ ]:
dataid.get_matched_datasets()
Out[ ]:
[{'uid': 'gov.noaa.ncdc:C00861',
  'mindate': '1763-01-01',
  'maxdate': '2022-06-28',
  'name': 'Daily Summaries',
  'datacoverage': 1,
  'id': 'GHCND'},
 {'uid': 'gov.noaa.ncdc:C00946',
  'mindate': '1763-01-01',
  'maxdate': '2022-06-01',
  'name': 'Global Summary of the Month',
  'datacoverage': 1,
  'id': 'GSOM'}]

'GHCND' (Daily Summaries) is the daily data we want, and its maxdate shows that the available data extends to June 2022. We can dig further into the datatypes in 'GHCND' and the time range of the available data. These datatypes are the first 3 of the matched datatypes.

In [ ]:
dataid.get_matched_datatypes()[0:3]
Out[ ]:
[{'mindate': '1874-10-13',
  'maxdate': '2022-06-28',
  'name': 'Average Temperature.',
  'datacoverage': 1,
  'id': 'TAVG'},
 {'mindate': '1763-01-01',
  'maxdate': '2022-06-28',
  'name': 'Maximum temperature',
  'datacoverage': 1,
  'id': 'TMAX'},
 {'mindate': '1763-01-01',
  'maxdate': '2022-06-28',
  'name': 'Minimum temperature',
  'datacoverage': 1,
  'id': 'TMIN'}]

From the mindate and maxdate fields above, we can confirm that 2021 data should be available for our datatypes. To summarize, we will add the following datatype-related specification to our request:

  • datasetid = 'GHCND'
  • datatypeid = ['TAVG', 'TMAX', 'TMIN']
  • startdate = '2021-01-01'
  • enddate = '2021-12-31'

Objectives Checkpoint¶

Before actually fetching the raw data with GetNCEI.get_data(), we should confirm our filter parameters. From the steps above, we arrive at the following parameters:

for primary parameters:

  • datasetid = 'GHCND'
  • startdate = '2021-01-01'
  • enddate = '2021-12-31'
  • req_size = 'all'

for optional filter parameters:

  • stationid = ['GHCND:IDM00096749', 'GHCND:ID000096745','GHCND:IDM00096741']
  • datatypeid = ['TAVG', 'TMAX', 'TMIN']

Looks like we are ready to retrieve the data we want.

Objective 4: Fetch the data¶

Now we want to fetch our data using data endpoint:

In [ ]:
datasetid = 'GHCND'
startdate = '2021-01-01'
enddate = '2021-12-31'
filter = dict(
    datatypeid=['TAVG', 'TMAX', 'TMIN'],
    stationid=['GHCND:IDM00096749', 'GHCND:ID000096745','GHCND:IDM00096741']
)

temperature_data = \
    getncei.GetNCEI(token).get_data(
        datasetid=datasetid, 
        startdate=startdate,
        enddate=enddate,
        req_size='all',
        filter=filter
    )
In [ ]:
temperature_data[0:3]
Out[ ]:
[{'date': '2021-01-01T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:ID000096745',
  'attributes': 'H,,S,',
  'value': 271},
 {'date': '2021-01-01T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:ID000096745',
  'attributes': ',,S,',
  'value': 300},
 {'date': '2021-01-01T00:00:00',
  'datatype': 'TMIN',
  'station': 'GHCND:ID000096745',
  'attributes': ',,S,',
  'value': 250}]

We have received the narrowed-down temperature data, which should match our specification (this still needs to be checked). The data fields are date, datatype, station, attributes, and value. Detailed information on each dataset can be found in the NCEI documentation: https://www1.ncdc.noaa.gov/pub/data/cdo/documentation/.

Please check the values carefully, as they may need further interpretation. In our example above, the temperature value reaches 271, which would be abnormally high for a daily temperature in degrees, so we need additional information about the data (such as its units) to verify it.
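One likely explanation, stated here as an assumption to be checked against the NCEI documentation linked above: GHCN-Daily describes its temperature elements in tenths of degrees Celsius, so 271 would correspond to 27.1 °C. A minimal sketch of the conversion, using the three values shown in the output above; `to_celsius` is a hypothetical helper, not a GetNCEI function.

```python
# The first three records from the output above (only the relevant fields).
records = [
    {"datatype": "TAVG", "value": 271},
    {"datatype": "TMAX", "value": 300},
    {"datatype": "TMIN", "value": 250},
]

# Assumption: these datatypes are stored in tenths of degrees Celsius.
TENTHS_DATATYPES = {"TAVG", "TMAX", "TMIN"}


def to_celsius(record):
    """Return a copy of the record with its value scaled to degrees Celsius."""
    converted = dict(record)
    if record["datatype"] in TENTHS_DATATYPES:
        converted["value"] = record["value"] / 10
    return converted


print([to_celsius(r)["value"] for r in records])  # [27.1, 30.0, 25.0]
```

Under that assumption the values fall in a plausible range for Jakarta, which supports reading them as tenths of a degree.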

In the next section, we will process the data using a pandas DataFrame.

Processing the Data using pandas.DataFrame¶

Conveniently, the returned data consists of dictionaries whose keys are the data fields and whose values are the records. This kind of data can be processed with pandas.DataFrame.

In [ ]:
import pandas as pd

temperature_df = pd.DataFrame(temperature_data)
temperature_df.head(5)
Out[ ]:
                  date datatype            station attributes  value
0  2021-01-01T00:00:00     TAVG  GHCND:ID000096745      H,,S,    271
1  2021-01-01T00:00:00     TMAX  GHCND:ID000096745       ,,S,    300
2  2021-01-01T00:00:00     TMIN  GHCND:ID000096745       ,,S,    250
3  2021-01-01T00:00:00     TAVG  GHCND:IDM00096741      H,,S,    272
4  2021-01-01T00:00:00     TMAX  GHCND:IDM00096741       ,,S,    298

Let's see unique values for datatype and station columns:

In [ ]:
for column in temperature_df.columns[1:3]:
    print(f'unique "{column}": ',
          temperature_df[column].unique())
unique "datatype":  ['TAVG' 'TMAX' 'TMIN']
unique "station":  ['GHCND:ID000096745' 'GHCND:IDM00096741' 'GHCND:IDM00096749']

The datatypes and stations in the result match our filter.

Let's discover our date column:

In [ ]:
df = temperature_df
for station in df.station.unique():
    date_count = len(df[df.station == station])
    print(f'{station}. Unique "date" count = ', date_count)
GHCND:ID000096745. Unique "date" count =  989
GHCND:IDM00096741. Unique "date" count =  980
GHCND:IDM00096749. Unique "date" count =  846

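Note that the counts above are records (rows) per station, not unique dates. For a daily record, one year of 3 datatypes (TAVG, TMAX, TMIN) should contain about 1,095 records (365 × 3) per station. Every station falls short of that, so the data should be verified accordingly.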
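The gap can be quantified with a few lines of arithmetic. The sketch below uses the per-station counts printed above and assumes a complete year would hold one record per day per datatype.

```python
DAYS_IN_2021 = 365
DATATYPES = 3  # TAVG, TMAX, TMIN

# Record counts per station, copied from the output above.
record_counts = {
    "GHCND:ID000096745": 989,
    "GHCND:IDM00096741": 980,
    "GHCND:IDM00096749": 846,
}

expected = DAYS_IN_2021 * DATATYPES  # 1095 records for a complete year

for station, count in record_counts.items():
    missing = expected - count
    print(f"{station}: {count}/{expected} records "
          f"({count / expected:.1%}), {missing} missing")
```

This only measures how many records are absent. Finding which dates or datatypes are missing would require grouping the DataFrame by station, datatype, and date.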

Conclusion¶

The sections above show the process of retrieving data from the NCEI website and give a general idea of the data received.

For the full documentation of each GetNCEI method, proceed to the next section.

5. GetNCEI Documentation ¶

A. getncei.FindDataIdNCEI¶

Constructor¶

FindDataIdNCEI(token, keyword, [filter])

Finds the datatypes that contain the keyword and reports which datasets they exist in.

Parameters
----------
token (str): 
    Token to access web services, obtained from https://www.ncdc.noaa.gov/cdo-web/token.

keyword (str or list[str]):
    Specify the keyword to search in various available datatypes.

filter (dict[str, str | list[str]]), optional, default = {}: 
    Filters the datasets or datatypes to be retrieved, using a dict whose KEYS are the 'Additional Parameters' of the API request. Accepted {KEY: VALUE} pairs are explained below:
    KEYS:
    'locationid':
        VALUE (str or list[str]) -> Accepts a valid locationid or a list of locationids. The matched datatypes returned will be available for the location(s) specified. Example: {'locationid': ['FIPS:37', 'CITY:ID000008'], ...}.
    'stationid':
        VALUE (str or list[str]) -> Accepts a valid stationid or a list of stationids. The matched datatypes returned will be available for the station(s) specified. Example: {'stationid': ['GHCND:ID000096745', 'GHCND:IDM00096739'], ...}.
    'startdate':
        VALUE (str) -> Accepts a valid ISO-formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). The matched datasets returned will have data after the specified date. This parameter can be used independently of 'enddate'. Example: {'startdate': '1970-10-03', ...}.
    'enddate':
        VALUE (str) -> Accepts a valid ISO-formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). The matched datasets returned will have data before the specified date. This parameter can be used independently of 'startdate'. Example: {'enddate': '2012-09-10', ...}.
    Example:
        filter = {
            'stationid': 'GHCND:ID000096745'
            }
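A filter dict can be checked before it is passed in. The following is a minimal sketch under the rules documented above and is not GetNCEI's own validation: `check_filter` is a hypothetical helper, and it raises the built-in ValueError and TypeError rather than the module's exception classes.

```python
from datetime import datetime

# Keys documented above for the FindDataIdNCEI filter.
ACCEPTED_KEYS = {"locationid", "stationid", "startdate", "enddate"}
DATE_KEYS = {"startdate", "enddate"}


def check_filter(filter_dict):
    """Return filter_dict unchanged, or raise if a key or value is not acceptable."""
    for key, value in filter_dict.items():
        if key not in ACCEPTED_KEYS:
            raise ValueError(f"unsupported filter key: {key!r}")
        if key in DATE_KEYS:
            if not isinstance(value, str):
                raise TypeError(f"{key} must be a str")
            # Raises ValueError for strings that are not ISO dates or date times.
            datetime.fromisoformat(value)
        elif not isinstance(value, (str, list)):
            raise TypeError(f"{key} must be a str or a list of str")
    return filter_dict


check_filter({"stationid": "GHCND:ID000096745", "startdate": "1970-10-03"})
```

Checking locally avoids spending one of the limited API requests on a filter that cannot succeed.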

Methods¶

FindDataIdNCEI.get_matched_datasets()

Returns datasets that have any datatypes containing the specified keyword.

Parameters
----------
None

Returns
-------
list[dict]:
    List of datasets that have any datatypes matching the specified keyword



FindDataIdNCEI.get_matched_datatypes()

Returns datatypes that contain the specified keyword.

Parameters
----------
None

Returns
-------
list[dict]:
    List of datatypes that contain the specified keyword


FindDataIdNCEI.get_id_pairs()

Returns pairs of datasets-matched datatypes.

Parameters
----------
None

Returns
-------
dict[str: list[str]]:
    Dictionary of {matched_datasetid1: [matched_datatypeids], matched_datasetid2: [matched_datatypeids], ...}. The entries can be used as datasetid and datatypeid for the get_data() method or the other get_*() methods.

B. getncei.FindLocationInfoNCEI¶

Constructor¶

FindLocationInfoNCEI(token, target, locationcategoryid, [filter])

Finds a location with available data by searching for the target keyword. The matched location can be restricted with the filter parameter to verify that it supports the specified features.

Parameters
----------
token (str): 
    Token to access web services, obtained from https://www.ncdc.noaa.gov/cdo-web/token.
target (str):
    Specify the keyword to search in various available locations. Example: 'New York'.
locationcategoryid (str):
    The category that describes the scope of the target keyword. Example: 'CITY' is a suitable value if 'New York' is specified as the target.
filter (dict[str, str | list[str]]), optional, default = {}:
    Filters the locations to be retrieved, using a dict whose KEYS are the 'Additional Parameters' of the API request. Accepted {KEY: VALUE} pairs are explained below:
    KEYS:
    'datasetid':
        VALUE (str or list[str]) -> Accepts a valid datasetid or a list of datasetids. Locations returned will match with keyword and will be supported by dataset(s) specified. Example: {'datasetid': 'GHCND', ...}.
    'datacategoryid':
        VALUE (str or list[str]) -> Accepts a valid datacategoryid or a list of datacategoryids. Locations returned will match with keyword and will be associated with the data category(ies) specified. Example: {'datacategoryid': 'TEMP', ...}.
    'startdate':
        VALUE (str) -> Accepts a valid ISO-formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Locations returned will match the keyword and will have data after the specified date. This parameter can be used independently of 'enddate'. Example: {'startdate': '1970-10-03', ...}.
    'enddate':
        VALUE (str) -> Accepts a valid ISO-formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Locations returned will match the keyword and will have data before the specified date. This parameter can be used independently of 'startdate'. Example: {'enddate': '2012-09-10', ...}.
    Example:
        filter = {
            'datasetid': 'GHCND',
            'datacategoryid': 'TEMP'
            }    

Method¶

FindLocationInfoNCEI.get_location_info()

Gets the info of the location that matches the target keyword, as a dict.

Parameters
----------
None

Returns
-------
dict:
    Dictionary containing the info of the location that matches the target keyword.

C. getncei.FindStationInfoNCEI¶

Constructor¶

FindStationInfoNCEI(token, target, [filter])

Finds available stations that contain the specified target keyword.

Parameters
----------
token (str): 
    Token to access web services, obtained from https://www.ncdc.noaa.gov/cdo-web/token.
target (str):
    Specify the keyword to search for among the available stations. Example: 'Salt Lake' finds stations that contain this keyword in their description.
filter (dict):
    Filters the station data to be retrieved, using a dict whose KEYS are the 'Additional Parameters' of the API request. Accepted {KEY: VALUE} pairs are explained below:
    KEYS:
    'datasetid':
        VALUE (str or list[str]) -> Accepts a valid datasetid or a list of datasetids. Matched stations returned will be supported by dataset(s) specified. Example: {'datasetid': 'GHCND'}.
    'locationid': 
        VALUE (str or list[str]) -> Accepts a valid locationid or a list of locationids. Matched stations returned will contain data for the location(s) specified. Example: {'locationid': ['FIPS:37', 'CITY:ID000008'], ...}.
    'datacategoryid':
        VALUE (str or list[str]) -> Accepts a valid datacategoryid or a list of datacategoryids. Matched stations returned will be associated with the data category(ies) specified. Example: {'datacategoryid': 'TEMP'}.
    'datatypeid': 
        VALUE (str or list[str]) -> Accepts a valid datatypeid or a list of datatypeids. Matched stations returned will contain all of the available data type(s) specified. Example: {'datatypeid': ['TAVG', 'TMAX', 'TMIN'], ...}.
    'extent':
        VALUE (str) -> The desired geographical extent for search. Designed to take a parameter generated by Google Maps API V3 LatLngBounds.toUrlValue. Stations returned must be located within the extent specified. Example: {'extent': '47.5204,-122.2047,47.6139,-122.1065', ...}
    'startdate':
        VALUE (str) -> Accepts a valid ISO-formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Matched stations returned will have data after the specified date. This parameter can be used independently of 'enddate'. Example: {'startdate': '1970-10-03', ...}.
    'enddate':
        VALUE (str) -> Accepts a valid ISO-formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Matched stations returned will have data before the specified date. This parameter can be used independently of 'startdate'. Example: {'enddate': '2012-09-10', ...}.
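The 'extent' value described above follows the lat_lo,lng_lo,lat_hi,lng_hi layout of the documented example. As a rough sketch, such a string could be built around a centre coordinate as follows. `extent_around` is a hypothetical helper, and it ignores edge cases such as boxes crossing the poles or the 180° meridian.

```python
def extent_around(coord, half_height_deg, half_width_deg, precision=4):
    """Build 'lat_lo,lng_lo,lat_hi,lng_hi' for a box centred on coord = (lat, lon)."""
    lat, lon = coord
    corners = (
        lat - half_height_deg,  # southern edge
        lon - half_width_deg,   # western edge
        lat + half_height_deg,  # northern edge
        lon + half_width_deg,   # eastern edge
    )
    return ",".join(f"{value:.{precision}f}" for value in corners)


# A half-degree box around the scenario's target coordinate
print(extent_around((-6.12, 106.75), 0.25, 0.25))
# -6.3700,106.5000,-5.8700,107.0000
```

The result can be passed as {'extent': ...} in the filter dict.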


Method¶

FindStationInfoNCEI.get_station_info()

Returns all stations that contain the target keyword in their description.

Parameters
----------
None

Returns
-------
list[dict]:
    List of dictionaries of matched stations.

D. getncei.FindNearestStationNCEI¶

Constructor¶

FindNearestStationNCEI(token, coord, [filter, station_nos=1])

Finds the station(s) nearest to the specified coordinate.

Parameters
----------
token (str): 
    Token to access web services, obtained from https://www.ncdc.noaa.gov/cdo-web/token.
coord (tuple):
    Tuple of (lat, long) coordinates in decimal degrees. Latitude is positive in the northern hemisphere and negative in the southern hemisphere; longitude is negative in the western hemisphere and positive in the eastern hemisphere.
filter (dict), optional, default = {}:
    Filters the station data to be retrieved, using a dict whose KEYS are the 'Additional Parameters' of the API request. Accepted {KEY: VALUE} pairs are explained below:
    KEYS: 
    'datasetid':
        VALUE (str or list[str]) -> Accepts a valid datasetid or a list of datasetids. Nearest stations returned will be supported by dataset(s) specified. Example: {'datasetid': 'GHCND'}.
    'locationid': 
        VALUE (str or list[str]) -> Accepts a valid locationid or a list of locationids. Nearest stations returned will contain data for the location(s) specified. Example: {'locationid': ['FIPS:37', 'CITY:ID000008'], ...}.
    'datacategoryid':
        VALUE (str or list[str]) -> Accepts a valid datacategoryid or a list of datacategoryids. Nearest stations returned will be associated with the data category(ies) specified. Example: {'datacategoryid': 'TEMP'}.
    'datatypeid': 
        VALUE (str or list[str]) -> Accepts a valid datatypeid or a list of datatypeids. Nearest stations returned will contain all of the available data type(s) specified. Example: {'datatypeid': ['TAVG', 'TMAX', 'TMIN'], ...}.
    'startdate':
        VALUE (str) -> Accepts a valid ISO-formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Nearest stations returned will have data after the specified date. This parameter can be used independently of 'enddate'. Example: {'startdate': '1970-10-03', ...}.
    'enddate':
        VALUE (str) -> Accepts a valid ISO-formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Nearest stations returned will have data before the specified date. This parameter can be used independently of 'startdate'. Example: {'enddate': '2012-09-10', ...}.
station_nos (int), optional, default = 1:
    Number of nearest stations to return.

Methods¶

FindNearestStationNCEI.get_nearest_station()

Returns info for the station(s) nearest to the specified target coordinate.

Parameters
----------
None

Returns
-------
list[dict]:
    Info fields of the nearest station(s), one dictionary per station.

FindNearestStationNCEI.show_location()

Plots the nearest station(s) and the target coordinate on a cartographic chart.

Parameters
----------
None

Returns
-------
Plotly Figure object:
    Plot of the station(s) nearest to the target coordinate.

E. getncei.GetNCEI¶

Constructor¶

GetNCEI(token)

Gets data by requesting the endpoints of the NCEI API URL.

Parameters
----------
token (str): 
    Token to access web services, obtained from https://www.ncdc.noaa.gov/cdo-web/token.

Method¶

GetNCEI.get_datasets([filter, req_size])

Gets the available datasets (using the API request endpoint 'datasets'). All CDO data is held in datasets, so the containing dataset must be known before attempting to access its data.
The filter parameter specifies the criteria for the datasets returned, and the req_size parameter specifies the maximum number of rows returned.

Parameters
----------
filter (dict[str, str | list[str]]), optional, default = {}: 
    Filters the datasets to be retrieved, using a dict whose KEYS are the 'Additional Parameters' of the API request. Accepted {KEY: VALUE} pairs are explained below:
    KEYS:
    'datatypeid':
        VALUE (str or list[str]) -> Accepts a valid datatypeid or a list of datatypeids. Datasets returned will contain all of the data type(s) specified. Example: {'datatypeid': 'ACMH', ...}.
    'locationid': 
        VALUE (str or list[str]) -> Accepts a valid locationid or a list of locationids.  Datasets returned will contain data for the location(s) specified. Example: {'locationid': ['FIPS:37', 'CITY:ID000008'], ...}.
    'stationid': 
        VALUE (str or list[str]) -> Accepts a valid stationid or a list of stationids.  Datasets returned will contain data for the station(s) specified. Example: {'stationid': 'GHCND:ID000096745', ...}.                
    'startdate':
        VALUE (str) -> Accepts a valid ISO-formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Datasets returned will have data after the specified date. This parameter can be used independently of 'enddate'. Example: {'startdate': '1970-10-03', ...}.
    'enddate':
        VALUE (str) -> Accepts a valid ISO-formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Datasets returned will have data before the specified date. This parameter can be used independently of 'startdate'. Example: {'enddate': '2012-09-10', ...}.
    'sortfield':
        VALUE (str = one of 'id', 'name', 'mindate', 'maxdate', 'datacoverage') -> Sorts the results by the specified field. Example: {'sortfield': 'name', ...}.
    'sortorder':
        VALUE (str = 'asc' or 'desc') -> Specifies whether the sort is ascending or descending. Defaults to 'asc'. Example: {'sortorder': 'desc', ...}.
    Example:
        filter = {
            'stationid': 'GHCND:ID000096745'
            }

req_size (int), Optional, default = None:
    Maximum number of rows to retrieve. If not specified, all of the available datasets will be retrieved.

Returns
-------
list[dict]
    A list of dictionaries that contain dataset records, each with fields of the form {'field1': 'value1', 'field2': 'value2', ...}. The value of the 'id' field can be used as 'datasetid' when fetching data with the get_data() method or when filtering the other get_*() methods.

Raises
------
InputTypeError
    If the type of any parameter is not valid.
InputValueError
    If the value of req_size or of any filter key is not valid.
JSONDecodeError
    If there was an error requesting the API.
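
The filter dict described above has to end up as query parameters of the API request. The following is a minimal sketch of one way that could be done with the standard library; it is not GetNCEI's actual implementation. The helper name build_query is hypothetical, and mapping req_size to the web service's 'limit' parameter is an assumption.

```python
from urllib.parse import urlencode


def build_query(filters=None, req_size=None):
    """Hypothetical helper: turn a filter dict into a URL query string."""
    params = dict(filters or {})
    if req_size is not None:
        # Assumption: req_size is passed on as the service's 'limit' parameter.
        params["limit"] = req_size
    # doseq=True expands list values into repeated key=value pairs,
    # e.g. datatypeid=TMAX&datatypeid=TMIN.
    return urlencode(params, doseq=True)


query = build_query(
    {"datatypeid": ["TMAX", "TMIN"], "stationid": "GHCND:ID000096745"},
    req_size=5,
)
print(query)
# datatypeid=TMAX&datatypeid=TMIN&stationid=GHCND%3AID000096745&limit=5
```

A list value becomes one repeated parameter per item, which is why both a single str and a list[str] can share the same filter key.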

GetNCEI.get_datacategories([filter, req_size])

Get the available datacategories (using API request endpoint:'datacategories'). Data Categories represent groupings of data types.
The filter parameter specifies the criteria for the data categories returned, and the req_size parameter specifies the maximum number of rows returned.

Parameters
----------
filter (dict[str, str | list[str]]), optional, default = {}: 
    Filter the data categories to retrieve using a dict whose KEYS are the 'Additional Parameters' of the API request. The accepted {KEY: VALUE} pairs are explained below:
    KEYS:
    'datasetid':
        VALUE (str or list[str]) -> Accepts a valid datasetid or a list of datasetids. Data categories returned will be supported by dataset(s) specified. Example: 'GHCND'.
    'locationid': 
        VALUE (str or list[str]) -> Accepts a valid locationid or a list of locationids.  Data categories returned will be applicable for the location(s) specified. Example: {'locationid': ['FIPS:37', 'CITY:ID000008'], ...}.
    'stationid': 
        VALUE (str or list[str]) -> Accepts a valid stationid or a list of stationids.  Data categories returned will be applicable for the station(s) specified. Example: {'stationid': 'GHCND:ID000096745', ...}.                
    'startdate':
        VALUE (str) -> Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Data categories returned will have data after the specified date. Parameter can be used independently of 'enddate'. Example: {'startdate': '1970-10-03', ...}.
    'enddate':
        VALUE (str) -> Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Data categories returned will have data before the specified date. Parameter can be used independently of 'startdate'. Example: {'enddate': '2012-09-10', ...}.
    'sortfield': 
        VALUE (str = one of 'id', 'name', 'mindate', 'maxdate', 'datacoverage') -> Sort the results by the specified field. Example: {'sortfield': 'name', ...}.
    'sortorder':
        VALUE (str = 'asc' or 'desc') -> Specifies whether sort is ascending or descending. Defaults to 'asc'. Example: {'sortorder': 'desc', ...}.
    Example:
        filter = {
            'datasetid': 'GHCND',
            'stationid': 'GHCND:ID000096745'
            }

req_size (int), optional, default = None:
    Maximum number of rows to retrieve. If not specified, all of the available datacategories data will be retrieved.

Returns
-------
list[dict]
    A list of dictionaries containing datacategories data, each with fields of the form {'field1': 'value1', 'field2': 'value2', ...}. The value of the 'id' field can be used as 'datacategoryid' in a filter when fetching data with get_data() or another get_* method.

Raises
------
InputTypeError
    If the type of any parameter is not valid.
InputValueError
    If the value of req_size or of any filter key is not valid.
JSONDecodeError
    If there was an error requesting the API.
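
The 'startdate' and 'enddate' filter values must follow one of the two ISO layouts listed above. As a rough illustration of how such a value could be checked before sending a request, here is a small sketch; the function name is_valid_iso_date is hypothetical and is not part of GetNCEI.

```python
from datetime import datetime

# The two layouts named in the parameter description.
ISO_LAYOUTS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S")


def is_valid_iso_date(text):
    """Return True if text parses as YYYY-MM-DD or YYYY-MM-DDThh:mm:ss."""
    for layout in ISO_LAYOUTS:
        try:
            datetime.strptime(text, layout)
            return True
        except ValueError:
            continue  # try the next layout
    return False


print(is_valid_iso_date("1970-10-03"))           # True
print(is_valid_iso_date("2012-09-10T08:30:00"))  # True
print(is_valid_iso_date("03/10/1970"))           # False
```

Checking locally like this avoids spending one of the limited daily requests on a call that the service would reject anyway.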

GetNCEI.get_datatypes([filter, req_size])

Get the available datatypes (using API request endpoint: 'datatypes'). A data type describes the type of data and acts as a label. If it is 64°F outside right now, then the data type is Air Temperature and the data value is 64.
The filter parameter specifies the criteria for the data types returned, and the req_size parameter specifies the maximum number of rows returned.

Parameters
----------
filter (dict[str, str | list[str]]), optional, default = {}: 
    Filter the data types to retrieve using a dict whose KEYS are the 'Additional Parameters' of the API request. The accepted {KEY: VALUE} pairs are explained below:
    KEYS:
    'datasetid':
        VALUE (str or list[str]) -> Accepts a valid datasetid or a list of datasetids. Data types returned will be supported by dataset(s) specified. Example: 'GHCND'.
    'locationid': 
        VALUE (str or list[str]) -> Accepts a valid locationid or a list of locationids.  Data types returned will be applicable for the location(s) specified. Example: {'locationid': ['FIPS:37', 'CITY:ID000008'], ...}.
    'stationid': 
        VALUE (str or list[str]) -> Accepts a valid stationid or a list of stationids.  Data types returned will be applicable for the station(s) specified. Example: {'stationid': 'GHCND:ID000096745', ...}.                
    'datacategoryid':
        VALUE (str or list[str]) -> Accepts a valid datacategoryid or a list of datacategoryids.  Data types returned will be associated with the data category(ies) specified. Example: {'datacategoryid': 'TEMP'}.
    'startdate':
        VALUE (str) -> Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Data types returned will have data after the specified date. Parameter can be used independently of 'enddate'. Example: {'startdate': '1970-10-03', ...}.
    'enddate':
        VALUE (str) -> Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Data types returned will have data before the specified date. Parameter can be used independently of 'startdate'. Example: {'enddate': '2012-09-10', ...}.
    'sortfield': 
        VALUE (str = one of 'id', 'name', 'mindate', 'maxdate', 'datacoverage') -> Sort the results by the specified field. Example: {'sortfield': 'name', ...}.
    'sortorder':
        VALUE (str = 'asc' or 'desc') -> Specifies whether sort is ascending or descending. Defaults to 'asc'. Example: {'sortorder': 'desc', ...}.
    Example:
        filter = {
            'datasetid': 'GHCND',
            'datacategoryid': 'TEMP',
            'stationid': 'GHCND:ID000096745'
            }

req_size (int), optional, default = None:
    Maximum number of rows to retrieve. If not specified, all of the available datatypes data will be retrieved.

Returns
-------
list[dict]
    A list of dictionaries containing datatypes data, each with fields of the form {'field1': 'value1', 'field2': 'value2', ...}. The value of the 'id' field can be used as 'datatypeid' in a filter when fetching data with get_data() or another get_* method.

Raises
------
InputTypeError
    If the type of any parameter is not valid.
InputValueError
    If the value of req_size or of any filter key is not valid.
JSONDecodeError
    If there was an error requesting the API.
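
Because the return value is a plain list of dictionaries, the 'id' values can be picked out with ordinary Python before being reused as a 'datatypeid' filter. The sketch below shows one possible keyword search over such a list. The helper find_by_keyword is hypothetical, and the sample records are made up for illustration (real records carry more fields, such as 'mindate' and 'datacoverage').

```python
def find_by_keyword(records, keyword, field="name"):
    """Return the 'id' of every record whose `field` contains `keyword`."""
    keyword = keyword.lower()
    return [
        record["id"]
        for record in records
        if keyword in str(record.get(field, "")).lower()
    ]


# Made-up stand-in for a list returned by a get_datatypes() call.
sample = [
    {"id": "TMAX", "name": "Maximum temperature"},
    {"id": "TMIN", "name": "Minimum temperature"},
    {"id": "PRCP", "name": "Precipitation"},
]

temperature_ids = find_by_keyword(sample, "temperature")
print(temperature_ids)  # ['TMAX', 'TMIN']

# The ids can then be placed in a filter for a later request.
data_filter = {"datatypeid": temperature_ids}
```

The match is case-insensitive, so 'Temperature' and 'temperature' give the same result.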

GetNCEI.get_locationcategories([filter, req_size])

Get the available locationcategories (using API request endpoint:'locationcategories'). Location categories are groupings of locations under an applicable label.
The filter parameter specifies the criteria for the location categories returned, and the req_size parameter specifies the maximum number of rows returned.

Parameters
----------
filter (dict[str, str | list[str]]), optional, default = {}: 
    Filter the location categories to retrieve using a dict whose KEYS are the 'Additional Parameters' of the API request. The accepted {KEY: VALUE} pairs are explained below:
    KEYS:
    'datasetid':
        VALUE (str or list[str]) -> Accepts a valid datasetid or a list of datasetids. Location categories returned will be supported by dataset(s) specified. Example: 'GHCND'.
    'startdate':
        VALUE (str) -> Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Location categories returned will have data after the specified date. Example: {'startdate': '1970-10-03', ...}.
    'enddate':
        VALUE (str) -> Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Location categories returned will have data before the specified date. Parameter can be used independently of 'startdate'. Example: {'enddate': '2012-09-10', ...}.
    'sortfield': 
        VALUE (str = one of 'id', 'name', 'mindate', 'maxdate', 'datacoverage') -> Sort the results by the specified field. Example: {'sortfield': 'name', ...}.
    'sortorder':
        VALUE (str = 'asc' or 'desc') -> Specifies whether sort is ascending or descending. Defaults to 'asc'. Example: {'sortorder': 'desc', ...}.
    Example:
        filter = {
            'datasetid': 'GHCND',
            'startdate': '1970-10-03',
            'sortfield': 'name'
            }

req_size (int), optional, default = None:
    Maximum number of rows to retrieve. If not specified, all of the available locationcategories data will be retrieved.

Returns
-------
list[dict]
    A list of dictionaries containing locationcategories data, each with fields of the form {'field1': 'value1', 'field2': 'value2', ...}. The value of the 'id' field can be used as 'locationcategoryid' in a filter when fetching data with get_data() or another get_* method.

Raises
------
InputTypeError
    If the type of any parameter is not valid.
InputValueError
    If the value of req_size or of any filter key is not valid.
JSONDecodeError
    If there was an error requesting the API.
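
Each get_* method accepts its own set of filter keys, and this one accepts fewer than most. The sketch below illustrates how a filter could be checked against the keys listed above. It is only an approximation of the behaviour described under Raises: the InputValueError class here is a local stand-in, and check_filter is a hypothetical name.

```python
class InputValueError(ValueError):
    """Local stand-in for the module's InputValueError (the real one may differ)."""


# Keys and values taken from the parameter description above.
ACCEPTED_KEYS = {"datasetid", "startdate", "enddate", "sortfield", "sortorder"}
SORT_FIELDS = {"id", "name", "mindate", "maxdate", "datacoverage"}
SORT_ORDERS = {"asc", "desc"}


def check_filter(filters):
    """Raise InputValueError if the filter uses an unknown key or sort value."""
    unknown = set(filters) - ACCEPTED_KEYS
    if unknown:
        raise InputValueError(f"unsupported filter keys: {sorted(unknown)}")
    if "sortfield" in filters and filters["sortfield"] not in SORT_FIELDS:
        raise InputValueError(f"invalid sortfield: {filters['sortfield']!r}")
    if "sortorder" in filters and filters["sortorder"] not in SORT_ORDERS:
        raise InputValueError(f"invalid sortorder: {filters['sortorder']!r}")
    return True


check_filter({"datasetid": "GHCND", "startdate": "1970-10-03", "sortfield": "name"})

try:
    # 'stationid' is not listed for the locationcategories endpoint.
    check_filter({"stationid": "GHCND:ID000096745"})
except InputValueError as error:
    print(error)
```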

GetNCEI.get_locations([filter, req_size])

Get the available locations (using API request endpoint:'locations'). Locations can be a specific latitude/longitude point such as a station, or a label representing a bounding area such as a city.
The filter parameter specifies the criteria for the locations returned, and the req_size parameter specifies the maximum number of rows returned.

Parameters
----------
filter (dict[str, str | list[str]]), optional, default = {}: 
    Filter the locations to retrieve using a dict whose KEYS are the 'Additional Parameters' of the API request. The accepted {KEY: VALUE} pairs are explained below:
    KEYS:
    'datasetid':
        VALUE (str or list[str]) -> Accepts a valid datasetid or a list of datasetids. Locations returned will be supported by dataset(s) specified. Example: 'GHCND'.
    'locationcategoryid': 
        VALUE (str or list[str]) -> Accepts a valid locationcategoryid or a list of locationcategoryids. Locations returned will be in the location category(ies) specified. Example: {'locationcategoryid': 'CITY', ...}.
    'datacategoryid':
        VALUE (str or list[str]) -> Accepts a valid datacategoryid or a list of datacategoryids. Locations returned will be associated with the data category(ies) specified. Example: {'datacategoryid': 'TEMP', ...}.
    'startdate':
        VALUE (str) -> Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Locations returned will have data after the specified date. Parameter can be used independently of 'enddate'. Example: {'startdate': '1970-10-03', ...}.
    'enddate':
        VALUE (str) -> Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Locations returned will have data before the specified date. Parameter can be used independently of 'startdate'. Example: {'enddate': '2012-09-10', ...}.
    'sortfield': 
        VALUE (str = one of 'id', 'name', 'mindate', 'maxdate', 'datacoverage') -> Sort the results by the specified field. Example: {'sortfield': 'name', ...}.
    'sortorder':
        VALUE (str = 'asc' or 'desc') -> Specifies whether sort is ascending or descending. Defaults to 'asc'. Example: {'sortorder': 'desc', ...}.
    Example:
        filter = {
            'datasetid': 'GHCND',
            'locationcategoryid': 'CITY'
            }

req_size (int), optional, default = None:
    Maximum number of rows to retrieve. If not specified, all of the available locations data will be retrieved.

Returns
-------
list[dict]
    A list of dictionaries containing locations data, each with fields of the form {'field1': 'value1', 'field2': 'value2', ...}. The value of the 'id' field can be used as 'locationid' in a filter when fetching data with get_data() or another get_* method.

Raises
------
InputTypeError
    If the type of any parameter is not valid.
InputValueError
    If the value of req_size or of any filter key is not valid.
JSONDecodeError
    If there was an error requesting the API.
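
When req_size is larger than what one request can return, the rows have to be collected over several requests. The following sketch only shows the arithmetic of splitting a requested row count into pages. It assumes a per-request cap of 1000 rows and a 'limit'/'offset' style of paging; the helper page_plan is hypothetical, and how the service numbers its offsets should be checked against the CDO documentation.

```python
def page_plan(req_size, page_limit=1000, start=0):
    """Split req_size rows into (offset, limit) pairs of at most page_limit rows.

    `start` is the offset of the first row; whether the service counts
    from 0 or 1 is an assumption left to the caller.
    """
    if req_size < 1:
        raise ValueError("req_size must be a positive integer")
    pages = []
    fetched = 0
    while fetched < req_size:
        limit = min(page_limit, req_size - fetched)
        pages.append((start + fetched, limit))
        fetched += limit
    return pages


print(page_plan(2500))
# [(0, 1000), (1000, 1000), (2000, 500)]
```

Each pair corresponds to one request, so the length of the plan also shows how many of the daily requests a call would consume.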

GetNCEI.get_stations([filter, req_size])

Get the available stations (using API request endpoint: 'stations'). Stations are where the data comes from (for most datasets) and can be considered the smallest granule of location data. If the desired station is known, all of its data can quickly be viewed.
The filter parameter specifies the criteria for the stations returned, and the req_size parameter specifies the maximum number of rows returned.

Parameters
----------
filter (dict[str, str | list[str]]), optional, default = {}: 
    Filter the stations to retrieve using a dict whose KEYS are the 'Additional Parameters' of the API request. The accepted {KEY: VALUE} pairs are explained below:
    KEYS:
    'datasetid':
        VALUE (str or list[str]) -> Accepts a valid datasetid or a list of datasetids. Stations returned will be supported by dataset(s) specified. Example: 'GHCND'.
    'locationid': 
        VALUE (str or list[str]) -> Accepts a valid locationid or a list of locationids. Stations returned will contain data for the location(s) specified. Example: {'locationid': ['FIPS:37', 'CITY:ID000008'], ...}.
    'datacategoryid':
        VALUE (str or list[str]) -> Accepts a valid datacategoryid or a list of datacategoryids. Stations returned will be associated with the data category(ies) specified. Example: {'datacategoryid': 'TEMP'}.
    'datatypeid': 
        VALUE (str or list[str]) -> Accepts a valid datatypeid or a list of datatypeids. Stations returned will contain all of the available data type(s) specified. Example: {'datatypeid': ['TAVG', 'TMAX', 'TMIN'], ...}.
    'extent':
        VALUE (str) -> The desired geographical extent for search. Designed to take a parameter generated by Google Maps API V3 LatLngBounds.toUrlValue. Stations returned must be located within the extent specified. Example: {'extent': '47.5204,-122.2047,47.6139,-122.1065', ...}.
    'startdate':
        VALUE (str) -> Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Stations returned will have data after the specified date. Parameter can be used independently of 'enddate'. Example: {'startdate': '1970-10-03', ...}.
    'enddate':
        VALUE (str) -> Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Stations returned will have data before the specified date. Parameter can be used independently of 'startdate'. Example: {'enddate': '2012-09-10', ...}.
    'sortfield': 
        VALUE (str = one of 'id', 'name', 'mindate', 'maxdate', 'datacoverage') -> Sort the results by the specified field. Example: {'sortfield': 'name', ...}.
    'sortorder':
        VALUE (str = 'asc' or 'desc') -> Specifies whether sort is ascending or descending. Defaults to 'asc'. Example: {'sortorder': 'desc', ...}.
    Example:
        filter = {
            'datasetid': 'GHCND',
            'datatypeid': ['TMAX', 'TMIN'],
            'locationid': 'CITY:ID000008'
            }

req_size (int), optional, default = None:
    Maximum number of rows to retrieve. If not specified, all of the available stations data will be retrieved.

Returns
-------
list[dict]
    A list of dictionaries containing stations data, each with fields of the form {'field1': 'value1', 'field2': 'value2', ...}. The value of the 'id' field can be used as 'stationid' in a filter when fetching data with get_data() or another get_* method.

Raises
------
InputTypeError
    If the type of any parameter is not valid.
InputValueError
    If the value of req_size or of any filter key is not valid.
JSONDecodeError
    If there was an error requesting the API.
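
The 'extent' value is a comma-separated bounding box in the order south, west, north, east, as in the example '47.5204,-122.2047,47.6139,-122.1065'. A simple way to build such a string around a coordinate point might look like the sketch below. The helper extent_around is hypothetical, the box is a plain square in degrees rather than a true distance, and longitude wrap-around at ±180 is deliberately not handled.

```python
def extent_around(lat, lon, half_size_deg):
    """Build an 'extent' string for a square box centred on (lat, lon).

    Output order is south,west,north,east. Latitude is clamped to
    [-90, 90]; longitude is not wrapped (assumption: the point is not
    near the antimeridian).
    """
    south = max(lat - half_size_deg, -90.0)
    north = min(lat + half_size_deg, 90.0)
    west = lon - half_size_deg
    east = lon + half_size_deg
    return f"{south:.4f},{west:.4f},{north:.4f},{east:.4f}"


extent = extent_around(-6.2, 106.8, 0.5)
print(extent)  # -6.7000,106.3000,-5.7000,107.3000

# The string can then be used as a filter value.
station_filter = {"datasetid": "GHCND", "extent": extent}
```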

GetNCEI.get_data(datasetid, startdate, enddate, [req_size=1000, filter])

Fetch data (using API request endpoint: 'data') from a single dataset.
The filter parameter specifies the criteria for the data returned, and the req_size parameter specifies the maximum number of rows returned.

Parameters
----------
datasetid (str), required:
    ID of the dataset to retrieve data from. Data returned will be from the dataset specified. Example: 'GHCND'.
startdate (str), required:
    Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Data returned will be after the specified date. Annual and Monthly data will be limited to a ten year range while all other data will be limited to a one year range. Example: '1970-10-03'.
enddate (str), required:
    Accepts a valid ISO formatted date (YYYY-MM-DD) or date time (YYYY-MM-DDThh:mm:ss). Data returned will be before the specified date. Annual and Monthly data will be limited to a ten year range while all other data will be limited to a one year range. Example: '2012-09-10'.
req_size (int or str = 'all'), optional, default = 1000:
    Maximum number of rows to retrieve. req_size = 'all' retrieves all of the available data.
filter (dict[str, str | list[str]]), optional, default = {}: 
    Filter the data to retrieve using a dict whose KEYS are the 'Additional Parameters' of the API request. The accepted {KEY: VALUE} pairs are explained below:
    KEYS:
    'datatypeid': 
        VALUE (str or list[str]) -> Accepts a valid datatypeid or a list of datatypeids. Data returned will contain all of the available data type(s) specified. Example: {'datatypeid': ['TAVG', 'TMAX', 'TMIN'], ...}.
    'locationid': 
        VALUE (str or list[str]) -> Accepts a valid locationid or a list of locationids. Data returned will contain data for the available location(s) specified. Example: {'locationid': ['FIPS:37', 'CITY:ID000008'], ...}.
    'stationid': 
        VALUE (str or list[str]) -> Accepts a valid stationid or a list of stationids. Data returned will contain data for the available station(s) specified. Example: {'stationid': ['GHCND:ID000096745', 'GHCND:IDM00096739'], ...}.
    'units':
        VALUE (str = 'standard' or 'metric') -> Accepts the literal strings 'standard' or 'metric'. Data will be scaled and converted to the specified units. If units are not provided, no scaling or conversion will take place. Example: {'units': 'standard', ...}.
    'sortfield': 
        VALUE (str = one of 'date', 'datatype', 'station', 'attribute', 'value') -> Sort the results by the specified field. Example: {'sortfield': 'value', ...}.
    'sortorder':
        VALUE (str = 'asc' or 'desc') -> Specifies whether sort is ascending or descending. Defaults to 'asc'. Example: {'sortorder': 'desc', ...}.
    Example:
        filter = {
            'datatypeid': ['TMAX', 'TMIN'],
            'stationid': 'GHCND:ID000096745'
            }

Returns
-------
list[dict]
    A list of dictionaries, each containing data fields of the form {'field1': 'value1', 'field2': 'value2', ...}.

Raises
------
InputTypeError
    If the type of any parameter is not valid.
InputValueError
    If the value of req_size or of any filter key is not valid.
JSONDecodeError
    If there was an error requesting the API.
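
Since daily and other non-annual data are limited to a one year range per request, a longer period has to be covered by several get_data() calls. The sketch below shows one way to cut a period into consecutive windows. The helper split_date_range is hypothetical, and treating "one year" as at most 365 days is a conservative assumption, not a documented rule.

```python
from datetime import date, timedelta


def split_date_range(startdate, enddate, max_days=365):
    """Split [startdate, enddate] into consecutive windows of <= max_days days.

    Dates are 'YYYY-MM-DD' strings; both ends of each window are inclusive.
    """
    start = date.fromisoformat(startdate)
    end = date.fromisoformat(enddate)
    if start > end:
        raise ValueError("startdate must not be after enddate")
    windows = []
    while start <= end:
        window_end = min(start + timedelta(days=max_days - 1), end)
        windows.append((start.isoformat(), window_end.isoformat()))
        start = window_end + timedelta(days=1)
    return windows


print(split_date_range("2020-01-01", "2020-01-10", max_days=4))
# [('2020-01-01', '2020-01-04'), ('2020-01-05', '2020-01-08'), ('2020-01-09', '2020-01-10')]

# Each window would become the startdate/enddate pair of one get_data() call.
for window_start, window_end in split_date_range("2019-01-01", "2020-12-31"):
    print(window_start, window_end)
```

The windows do not overlap, so the results of the individual calls can simply be concatenated.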

6. References ¶

  1. NOAA Climate Data Online: Web Services Documentation. https://www.ncdc.noaa.gov/cdo-web/webservices/v2
  2. Molin, S. Hands-On Data Analysis with Pandas – Second Edition. pp. 127-138.