Weather

The weather module contains two classes: logsim.weather.MarkovChain and logsim.weather.WeatherData. The WeatherData class is designed to train, use and save a Markov model that generates random weather data. It takes settings such as the number of bins, the number of hours to generate and the number of samples to generate. When the WeatherData class is initialized, it looks in the cache for a previously trained model, matched on the input file name. When new data is present and no model is present in the cache, the WeatherData class trains a new model and saves it in the cache.

For the WeatherData class to work properly, the column names of the input data must match [date_time, Hm0, Tp] exactly: respectively the date and time, the significant wave height and the peak period. The date_time column must be in ISO 8601 format, for example 2003-01-03T11:00, and the data must be at hourly intervals.
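The input requirements above can be checked before a file is handed to WeatherData. The following is a minimal sketch using only the Python standard library. The names check_weather_csv and SAMPLE are hypothetical and not part of logsim, and the check itself is an illustration, not the validation that logsim performs.

```python
import csv
import io
from datetime import datetime, timedelta

REQUIRED_COLUMNS = ["date_time", "Hm0", "Tp"]

# Hypothetical three-hour input, used only for illustration.
SAMPLE = """date_time,Hm0,Tp
2003-01-03T11:00,1.2,6.5
2003-01-03T12:00,1.4,6.8
2003-01-03T13:00,1.1,6.1
"""


def check_weather_csv(text: str) -> int:
    """Validate column names and hourly spacing; return the number of rows."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    previous = None
    count = 0
    for row in reader:
        stamp = datetime.fromisoformat(row["date_time"])  # ISO 8601 timestamp
        if previous is not None and stamp - previous != timedelta(hours=1):
            raise ValueError(f"non-hourly step before {stamp.isoformat()}")
        float(row["Hm0"])  # raises ValueError if not numeric
        float(row["Tp"])
        previous = stamp
        count += 1
    return count


print(check_weather_csv(SAMPLE))  # 3
```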

The WeatherData class uses the MarkovChain class to fit a Markov model to the data and to store the transition matrix. The MarkovChain class also generates a random new state from the current state and the corresponding probabilities stored in the transition matrix, through its logsim.weather.MarkovChain.find_next_state() method.

class logsim.weather.MarkovChain(n: int)
find_next_state(current_row: int) -> tuple[int, int, tuple[int, int]]

Returns a random new state based on the current state and the corresponding probabilities stored in the transition matrix.

Parameters:

current_row – The row number of the transition matrix that corresponds to the current state of the weather (the row-to-state mappings are stored in self.possible_from_states)

Returns:

A tuple of the new state, the new row number and the new combination of states. The new state is the state number drawn for the next step. The new row is the row number of the transition matrix that corresponds to the new state of the weather. The new combination of states is a tuple that combines the new state and the previous state.
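The sampling step described above can be sketched as follows. This is an illustration under stated assumptions, not the logsim implementation: TinyMarkovChain is a hypothetical class, the chain is assumed to be second order (each row stands for a pair of previous and current state), and the order of the elements in the returned combination is assumed to be (current state, new state).

```python
import random


class TinyMarkovChain:
    """Hypothetical second-order chain over n states (not the logsim class)."""

    def __init__(self, n, transition_matrix, possible_from_states, seed=None):
        self.n = n
        # One row of probabilities per (previous, current) state pair.
        self.transition_matrix = transition_matrix
        # Maps a row number to its (previous, current) state pair.
        self.possible_from_states = possible_from_states
        self._row_of = {pair: row for row, pair in enumerate(possible_from_states)}
        self._rng = random.Random(seed)

    def find_next_state(self, current_row):
        probabilities = self.transition_matrix[current_row]
        # Draw the next state with the probabilities of the current row.
        new_state = self._rng.choices(range(self.n), weights=probabilities, k=1)[0]
        _, current_state = self.possible_from_states[current_row]
        new_combination = (current_state, new_state)
        new_row = self._row_of[new_combination]
        return new_state, new_row, new_combination


pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
matrix = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]
chain = TinyMarkovChain(2, matrix, pairs, seed=1)
# Row 0 is the pair (0, 0) and always moves to state 1.
print(chain.find_next_state(0))  # (1, 1, (0, 1))
```

Feeding the returned row number back into find_next_state() produces a sequence of states one hour at a time.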

fit(data: array) -> None

This function fits the Markov chain to the given data. The data is a NumPy array of integers that represent the states of the weather in the order in which they occur. Weights are described as optional and can be used to weight the data, although the signature above does not list a weights parameter.

Parameters:

data (np.array) – The data to fit the Markov Chain to.
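Fitting a Markov chain to a state sequence usually comes down to counting observed transitions and normalising each row into probabilities. The sketch below shows this for a second-order chain with plain Python lists. The function fit_second_order is hypothetical, and whether logsim estimates its transition matrix in exactly this way is an assumption.

```python
from collections import Counter


def fit_second_order(data, n):
    """Estimate (possible_from_states, transition_matrix) from a state sequence.

    data: sequence of integer states in 0..n-1, in order of occurrence.
    """
    # Count every observed (previous, current, next) triple.
    counts = Counter(zip(data, data[1:], data[2:]))
    # Only pairs that were followed by at least one state get a row.
    possible_from_states = sorted({(a, b) for a, b, _ in counts})
    matrix = []
    for prev, cur in possible_from_states:
        row = [counts[(prev, cur, nxt)] for nxt in range(n)]
        total = sum(row)  # always > 0 because the pair was observed
        matrix.append([c / total for c in row])
    return possible_from_states, matrix


states, matrix = fit_second_order([0, 1, 0, 1, 1, 0, 1, 0], n=2)
print(states)  # [(0, 1), (1, 0), (1, 1)]
```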

class logsim.weather.WeatherData(file_name: str, start_day: int, start_month: int, synthetic: bool = True, synthetic_data_samples: int | None = None, train_model: bool = False, bin_tuple: tuple[int, int] = (15, 15), markov_order: int = 2, timedelta_days: int = 15, sample_hours: int = 10000, scale_factor: float = 1.0, experiment_cache=None)
generate_synthetic_data() -> LazyFrame | None

Generate synthetic data based on the transition probabilities of a cached Markov chain model. The parameters set on the class determine the number of samples to generate and the length of each sample. The synthetic data is generated from the standardized model. After generation, the standardization is reverted based on the given start day and month. The generated data is cached and is retrieved from the cache when the same parameters are used again.

Returns:

A Polars LazyFrame containing the generated synthetic data samples or None if cached data is available.

get_sample(sample_no: int) -> (array, LazyFrame)

Get a sample from the synthetic data by filtering the LazyFrame on the sample number. Checks that the sample number is valid and that the synthetic data is available in the cache.

Parameters:

sample_no (int) – The sample number to retrieve

Returns:

Polars LazyFrame with the sample data

split_input_data()

Split training data into yearly samples
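Splitting into yearly samples can be pictured as grouping the hourly rows by the calendar year of their timestamp. The sketch below is only an illustration: split_by_year is a hypothetical helper, and the row layout (date_time, Hm0, Tp) and the use of calendar years as the split boundary are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def split_by_year(rows):
    """Group (date_time, hm0, tp) rows by the calendar year of date_time."""
    samples = defaultdict(list)
    for date_time, hm0, tp in rows:
        year = datetime.fromisoformat(date_time).year
        samples[year].append((date_time, hm0, tp))
    return dict(samples)


rows = [
    ("2003-12-31T22:00", 1.2, 6.5),
    ("2003-12-31T23:00", 1.3, 6.6),
    ("2004-01-01T00:00", 1.4, 6.7),
]
yearly = split_by_year(rows)
print(sorted(yearly))  # [2003, 2004]
```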

train_markov_model(bin_tuple: tuple[int, int] = (15, 15), markov_order: int = 2, timedelta_days: int = 15) -> None

Train a Markov model based on synthetic data. The trained Markov model (based on the MarkovChain class) is saved in the cache and is retrieved when generating synthetic data with generate_synthetic_data(). This allows faster generation of synthetic data, because the Markov model does not have to be trained every time. Specify the number of bins for Hm0 and Tp and the Markov order to train. The timedelta_days parameter sets the offset used for standardizing the weather data: the higher the value, the more days are used for standardization (by default 15 days back and forth). This function does not return anything, but saves the trained Markov model in the cache.

Parameters:
  • bin_tuple (tuple[int, int]) – Tuple with the number of bins for Hm0 and Tp

  • markov_order (int) – Order of the Markov model

  • timedelta_days (int) – Number of days to look back and forth to standardize hourly weather data
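To make the role of bin_tuple concrete, the sketch below discretises Hm0 and Tp values into bins and combines the two bin numbers into a single integer state, the kind of sequence a Markov chain can be fitted to. The helpers bin_edges and to_states are hypothetical. Equal-width bins and the encoding of the joint state as hm0_bin * n_tp_bins + tp_bin are assumptions, not necessarily what logsim does.

```python
from bisect import bisect_right


def bin_edges(values, n_bins):
    """Interior edges of n_bins equal-width bins spanning min..max of values."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    return [low + width * i for i in range(1, n_bins)]


def to_states(hm0, tp, bin_tuple=(15, 15)):
    """Map paired Hm0 and Tp values to joint integer states."""
    hm0_edges = bin_edges(hm0, bin_tuple[0])
    tp_edges = bin_edges(tp, bin_tuple[1])
    states = []
    for h, t in zip(hm0, tp):
        h_bin = bisect_right(hm0_edges, h)  # 0 .. bin_tuple[0] - 1
        t_bin = bisect_right(tp_edges, t)   # 0 .. bin_tuple[1] - 1
        states.append(h_bin * bin_tuple[1] + t_bin)
    return states


hm0 = [0.0, 1.0, 2.0, 3.0]
tp = [4.0, 8.0, 4.0, 8.0]
print(to_states(hm0, tp, bin_tuple=(2, 2)))  # [0, 1, 2, 3]
```

With the default bin_tuple of (15, 15) this encoding yields at most 225 distinct states.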

logsim.weather.get_max_array_val(np_weather, start_floor, end_floor, current_check_hour)
logsim.weather.get_next_window(sample: array, current_hour: float, window_size: int | float, hm0_limit: float, tp_limit: float) -> float

Given a specific hour and a window size, this function finds the first possible weather window. It first checks whether the window can start immediately. If that is not possible, the following hours are checked one by one until the next weather window is found.

Parameters:
  • sample – The numpy array with the weather data

  • current_hour – The current simulation hour

  • window_size – The size of the window that is required

  • hm0_limit – The limit for the Hm0 to determine the next possible window with

  • tp_limit – The limit for the Tp to determine the next possible window with

Returns:

The waiting time until the next possible window, in decimal hours. An exception is raised if no window is found.
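The search described above can be sketched as follows. This is an illustration under assumptions, not the logsim implementation: next_window_wait is a hypothetical function, the sample is taken to be a list of hourly (Hm0, Tp) pairs, an hour counts as workable when both values are strictly below their limits, and ValueError is chosen arbitrarily as the exception type.

```python
import math


def next_window_wait(sample, current_hour, window_size, hm0_limit, tp_limit):
    """Return the waiting time in decimal hours until a workable window starts.

    sample: list of (hm0, tp) tuples, one entry per hour.
    """

    def workable(start, end):
        first, last = math.floor(start), math.ceil(end)
        if last > len(sample):
            return False  # window would run past the available data
        return all(h < hm0_limit and t < tp_limit for h, t in sample[first:last])

    # First check whether the window can start right away.
    if workable(current_hour, current_hour + window_size):
        return 0.0

    # Otherwise try each following whole hour in turn.
    start = math.floor(current_hour) + 1
    while start + window_size <= len(sample):
        if workable(start, start + window_size):
            return float(start - current_hour)
        start += 1
    raise ValueError("no weather window found in sample")


sample = [(2.0, 8.0), (2.0, 8.0), (1.0, 5.0), (1.0, 5.0), (1.0, 5.0), (2.0, 8.0)]
print(next_window_wait(sample, 0.5, 2, hm0_limit=1.5, tp_limit=6.0))  # 1.5
```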

logsim.weather.get_next_window_2_limits(sample: array, current_hour: float, window_sizes: list[int | float], hm0_limits: list[float], tp_limits: list[float]) -> float

Given a specific hour and 2 window sizes, this function finds the first possible weather window. It first checks whether the window can start immediately. If that is not possible, the following hours are checked one by one until the next weather window is found.

Parameters:
  • sample – The numpy array with the weather data

  • current_hour – The current simulation hour

  • window_sizes – The sizes of the windows that are required

  • hm0_limits – The limits for the Hm0 to determine the next possible window with

  • tp_limits – The limits for the Tp to determine the next possible window with

Returns:

The waiting time until the next possible window, in decimal hours. An exception is raised if no window is found.

logsim.weather.get_next_window_3_limits(sample: array, current_hour: float, window_sizes: list[int | float], hm0_limits: list[float], tp_limits: list[float]) -> float

Given a specific hour and 3 window sizes, this function finds the first possible weather window. It first checks whether the window can start immediately. If that is not possible, the following hours are checked one by one until the next weather window is found.

Parameters:
  • sample – The numpy array with the weather data

  • current_hour – The current simulation hour

  • window_sizes – The sizes of the windows that are required

  • hm0_limits – The limits for the Hm0 to determine the next possible window with

  • tp_limits – The limits for the Tp to determine the next possible window with

Returns:

The waiting time until the next possible window, in decimal hours. An exception is raised if no window is found.
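The documentation does not state how the 2 or 3 windows relate to each other. One plausible reading, used in the sketch below, is that the windows follow each other back to back, each with its own Hm0 and Tp limit. That reading is an assumption, as are the hypothetical function name next_window_wait_multi, the list-of-tuples sample layout, the strict comparison against the limits and the ValueError exception type.

```python
import math


def next_window_wait_multi(sample, current_hour, window_sizes, hm0_limits, tp_limits):
    """Waiting time until consecutive windows, each with its own limits, all fit.

    sample: list of (hm0, tp) tuples, one entry per hour.
    """

    def hours_ok(start, end, hm0_limit, tp_limit):
        first, last = math.floor(start), math.ceil(end)
        if last > len(sample):
            return False  # window would run past the available data
        return all(h < hm0_limit and t < tp_limit for h, t in sample[first:last])

    def chain_ok(start):
        # Check the windows one after another, each starting where the last ended.
        for size, hm0_limit, tp_limit in zip(window_sizes, hm0_limits, tp_limits):
            if not hours_ok(start, start + size, hm0_limit, tp_limit):
                return False
            start += size
        return True

    if chain_ok(current_hour):
        return 0.0
    start = math.floor(current_hour) + 1
    while start + sum(window_sizes) <= len(sample):
        if chain_ok(start):
            return float(start - current_hour)
        start += 1
    raise ValueError("no weather window found in sample")


sample = [(2.0, 8.0), (1.0, 5.0), (1.4, 5.5), (1.4, 5.5), (2.0, 8.0)]
wait = next_window_wait_multi(sample, 0.0, [1, 2], [1.2, 1.5], [6.0, 6.0])
print(wait)  # 1.0
```

The same function covers both the 2-limit and 3-limit cases, since it loops over however many window sizes and limits are passed in.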