Weather
The weather module contains two classes: logsim.weather.MarkovChain and logsim.weather.WeatherData.
The WeatherData class is designed to train, use and save a Markov model that is able to generate random weather data.
The WeatherData class takes several settings, such as the number of bins, the number of hours to generate and the number of samples to generate.
When the WeatherData class is initialized, it looks in the cache for a previously trained model, matched on the input file name.
For the WeatherData class to work properly, the column names of the input data must match [date_time, Hm0, Tp] exactly. These are, respectively, the date and time (hourly), the significant wave height and the peak period.
The date_time column should be in the ISO8601 format, for example: 2003-01-03T11:00.
The data should be in hourly intervals.
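The input requirements above can be checked before handing a file to WeatherData. The following is a minimal sketch that assumes the data is stored as comma-separated text. The file format is not stated here, and check_weather_csv is a hypothetical helper that is not part of logsim.

```python
import csv
import io
from datetime import datetime, timedelta

REQUIRED_COLUMNS = ["date_time", "Hm0", "Tp"]


def check_weather_csv(text: str) -> int:
    """Validate column names and hourly spacing; return the number of data rows."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [name for name in REQUIRED_COLUMNS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    previous = None
    count = 0
    for row in reader:
        stamp = datetime.fromisoformat(row["date_time"])  # e.g. 2003-01-03T11:00
        float(row["Hm0"])  # raises ValueError if the value is not numeric
        float(row["Tp"])
        if previous is not None and stamp - previous != timedelta(hours=1):
            raise ValueError(f"non-hourly step before {row['date_time']}")
        previous = stamp
        count += 1
    return count


# Hypothetical three-hour excerpt in the expected layout.
example = """date_time,Hm0,Tp
2003-01-03T11:00,1.25,6.1
2003-01-03T12:00,1.31,6.3
2003-01-03T13:00,1.40,6.4
"""
n_rows = check_weather_csv(example)
```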
When new data is provided and no matching model is present in the cache, the WeatherData class trains a new model and saves it in the cache.
The WeatherData class uses the MarkovChain class to fit a Markov model to the data and to store the transition matrix.
The MarkovChain class is also used to generate a random new state based on the current state and the corresponding probabilities stored in the transition matrix.
This is done by using the logsim.weather.MarkovChain.find_next_state() method of the MarkovChain class.
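The idea of drawing a new state from the current row of a transition matrix can be sketched as follows. This is an illustration only and not the logsim implementation. The 3-state matrix and the sample_next_state helper are hypothetical, and the sketch uses a first-order chain in which each row corresponds to a single state.

```python
import numpy as np

# Hypothetical 3-state transition matrix: row i holds P(next state | current row i).
transition_matrix = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
])


def sample_next_state(matrix: np.ndarray, current_row: int, rng: np.random.Generator) -> int:
    """Draw the next state index using the probabilities in the current row."""
    probabilities = matrix[current_row]
    return int(rng.choice(len(probabilities), p=probabilities))


# Generate a short random walk through the states, starting in state 0.
rng = np.random.default_rng(seed=42)
state = 0
path = [state]
for _ in range(10):
    state = sample_next_state(transition_matrix, state, rng)
    path.append(state)
```

Repeating the draw once per hour yields a synthetic state sequence of any desired length.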
- class logsim.weather.MarkovChain(n: int)
- find_next_state(current_row: int) -> tuple[int, int, tuple[int, int]]
This helper function can be used to return a random new state based on the current state and the corresponding probabilities stored in the transition matrix.
- Parameters:
current_row – The row number of the transition matrix that corresponds to the current state of the weather (the row-to-state mappings are stored in self.possible_from_states)
- Returns:
The new state, the new row number and the new combination of states. The new state is the state number drawn for the next step. The new row is the row number of the transition matrix that corresponds to the new state of the weather. The new combination of states is a tuple combining the new state and the previous state.
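The relation between rows, states and state combinations is easiest to see for a second-order chain, where each row of the transition matrix stands for a pair of states. The sketch below is not the logsim code. The two-state matrix, the find_next helper and the (previous, new) ordering of the tuple are assumptions made for illustration; only the name possible_from_states is taken from the description above.

```python
import numpy as np

# Hypothetical second-order chain over 2 weather states.
# Row i of the matrix corresponds to the (previous, current) pair possible_from_states[i].
possible_from_states = [(0, 0), (0, 1), (1, 0), (1, 1)]
transition_matrix = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
    [0.7, 0.3],
    [0.2, 0.8],
])


def find_next(current_row: int, rng: np.random.Generator):
    """Return (new state, new row, new combination) for one step of the chain."""
    _, current_state = possible_from_states[current_row]
    probabilities = transition_matrix[current_row]
    new_state = int(rng.choice(len(probabilities), p=probabilities))
    # Assumed ordering: the state we came from first, the newly drawn state second.
    new_combination = (current_state, new_state)
    new_row = possible_from_states.index(new_combination)
    return new_state, new_row, new_combination


rng = np.random.default_rng(seed=1)
row = 0  # start from the pair (0, 0)
history = []
for _ in range(5):
    new_state, row, combination = find_next(row, rng)
    history.append((new_state, row, combination))
```

Feeding the returned row back in as current_row is what lets the chain be advanced step by step.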
- fit(data: array) -> None
This function fits the Markov Chain to the given data. The data is a numpy array of integers that represent the states of the weather in the given order. The weights are optional and can be used to weight the data.
- Parameters:
data (np.array) – The data to fit the Markov Chain to.
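Fitting a Markov chain to a sequence of integer states amounts to counting the observed transitions and normalizing each row. The following is a minimal first-order sketch under that assumption. fit_transition_matrix is a hypothetical helper, and MarkovChain.fit() may differ, for example in how it handles higher orders.

```python
import numpy as np


def fit_transition_matrix(states: np.ndarray, n_states: int) -> np.ndarray:
    """Estimate first-order transition probabilities by counting observed transitions."""
    counts = np.zeros((n_states, n_states), dtype=float)
    # Count every consecutive pair (from_state -> to_state) in the sequence.
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows never observed as a "from" state stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Hypothetical sequence of weather states in chronological order.
states = np.array([0, 0, 1, 2, 1, 0, 0, 1])
matrix = fit_transition_matrix(states, n_states=3)
```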
- class logsim.weather.WeatherData(file_name: str, start_day: int, start_month: int, synthetic: bool = True, synthetic_data_samples: int | None = None, train_model: bool = False, bin_tuple: tuple[int, int] = (15, 15), markov_order: int = 2, timedelta_days: int = 15, sample_hours: int = 10000, scale_factor: float = 1.0, experiment_cache=None)
- generate_synthetic_data() -> LazyFrame | None
Generate synthetic data based on the transition probabilities of a cached Markov Chain model. The set parameters of the class are used to generate the appropriate amount of samples with the appropriate size. The synthetic data is generated based on the standardized model. After generation the data standardization is reverted based on the given start day and month. The generated data is cached and is retrieved if using the same parameters.
- Returns:
A Polars LazyFrame containing the generated synthetic data samples or None if cached data is available.
- get_sample(sample_no: int) -> (array, LazyFrame)
Get a sample from the synthetic data by filtering the LazyFrame on the sample number. Tests if the sample number is valid and if the synthetic data is available in the cache.
- Parameters:
sample_no (int) – The sample number to retrieve
- Returns:
The sample data as a NumPy array, together with the Polars LazyFrame for that sample
- split_input_data()
Split training data into yearly samples
- train_markov_model(bin_tuple: tuple[int, int] = (15, 15), markov_order: int = 2, timedelta_days: int = 15) -> None
Train Markov model based on synthetic data. The trained Markov model (based on the MarkovChain class) is saved in the cache and is retrieved when generating synthetic data with the function generate_synthetic_data(). This allows for faster generation of synthetic data without having to train the Markov model every time. Specify the number of bins for Hm0 and Tp and the Markov order to train. The timedelta_days parameter influences the offset used for standardizing the weather data: the higher the value, the more days are used (by default 15 days back and forth). This function does not return anything, but saves the trained Markov model in the cache.
- Parameters:
bin_tuple (tuple[int, int]) – Tuple with the number of bins for Hm0 and Tp
markov_order (int) – Order of the Markov model
timedelta_days (int) – Number of days to look back and forth to standardize hourly weather data
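One way to read the bin_tuple parameter is that Hm0 and Tp are each discretized into a fixed number of bins and the two bin indices are combined into a single integer state. The sketch below illustrates that reading only. The equal-width bins, the to_joint_states helper and the index formula are assumptions, and the actual binning in train_markov_model() may be different.

```python
import numpy as np


def to_joint_states(hm0: np.ndarray, tp: np.ndarray, bin_tuple=(15, 15)) -> np.ndarray:
    """Map (Hm0, Tp) pairs to one integer state per hour using equal-width bins."""
    n_hm0, n_tp = bin_tuple
    # Interior bin edges spanning each variable's observed range (an assumption).
    hm0_edges = np.linspace(hm0.min(), hm0.max(), n_hm0 + 1)[1:-1]
    tp_edges = np.linspace(tp.min(), tp.max(), n_tp + 1)[1:-1]
    hm0_bin = np.digitize(hm0, hm0_edges)  # values in 0 .. n_hm0 - 1
    tp_bin = np.digitize(tp, tp_edges)     # values in 0 .. n_tp - 1
    # Combine both indices into a single state number in 0 .. n_hm0 * n_tp - 1.
    return hm0_bin * n_tp + tp_bin


# Hypothetical hourly values, binned on a coarse 4 x 4 grid.
hm0 = np.array([0.5, 1.0, 2.5, 4.0])
tp = np.array([4.0, 6.0, 8.0, 12.0])
joint_states = to_joint_states(hm0, tp, bin_tuple=(4, 4))
```

With the default (15, 15) this scheme would give up to 225 distinct states, so a finer grid needs more training data per state.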
- logsim.weather.get_max_array_val(np_weather, start_floor, end_floor, current_check_hour)
- logsim.weather.get_next_window(sample: array, current_hour: float, window_size: int | float, hm0_limit: float, tp_limit: float) -> float
Given a specific hour and a window size, this function returns the first possible weather window. It first checks whether the window can start immediately at the current hour. If that is not possible, the following hours are checked one by one until the next weather window is found.
- Parameters:
sample – The numpy array with the weather data
current_hour – The current simulation hour
window_size – The size of the window that is required
hm0_limit – The limit for the Hm0 to determine the next possible window with
tp_limit – The limit for the Tp to determine the next possible window with
- Returns:
The waiting time until the next possible window as decimal hour. An exception is raised if no window is found.
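The search described above can be sketched in a few lines. This is a simplified illustration and not the logsim implementation. It assumes that the sample holds one row per hour with Hm0 in column 0 and Tp in column 1, that a value equal to the limit is still workable, and that windows start on whole hours. next_window_wait is a hypothetical name.

```python
import numpy as np


def next_window_wait(sample, current_hour, window_size, hm0_limit, tp_limit):
    """Return the hours to wait until `window_size` consecutive workable hours begin."""
    # Assumed layout: one row per hour, column 0 = Hm0, column 1 = Tp.
    workable = (sample[:, 0] <= hm0_limit) & (sample[:, 1] <= tp_limit)
    n_hours = int(np.ceil(window_size))
    start = int(np.ceil(current_hour))
    # The first iteration is the "start immediately" case; later ones step hour by hour.
    for hour in range(start, len(sample) - n_hours + 1):
        if workable[hour:hour + n_hours].all():
            return float(hour - current_hour)
    raise ValueError("no weather window found in the sample")


# Hypothetical six-hour sample: Hm0 drops below 1.5 m from hour 2 to hour 4.
sample = np.column_stack([
    [2.0, 2.0, 1.0, 1.0, 1.0, 2.0],  # Hm0
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],  # Tp
])
wait = next_window_wait(sample, current_hour=0.0, window_size=3, hm0_limit=1.5, tp_limit=8.0)
```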
- logsim.weather.get_next_window_2_limits(sample: array, current_hour: float, window_sizes: list[int | float], hm0_limits: list[float], tp_limits: list[float]) -> float
Given a specific hour and 2 window sizes, this function returns the first possible weather window. It first checks whether the window can start immediately at the current hour. If that is not possible, the following hours are checked one by one until the next weather window is found.
- Parameters:
sample – The numpy array with the weather data
current_hour – The current simulation hour
window_sizes – The sizes of the windows that are required
hm0_limits – The limits for the Hm0 to determine the next possible window with
tp_limits – The limits for the Tp to determine the next possible window with
- Returns:
The waiting time until the next possible window as decimal hour. An exception is raised if no window is found.
- logsim.weather.get_next_window_3_limits(sample: array, current_hour: float, window_sizes: list[int | float], hm0_limits: list[float], tp_limits: list[float]) -> float
Given a specific hour and 3 window sizes, this function returns the first possible weather window. It first checks whether the window can start immediately at the current hour. If that is not possible, the following hours are checked one by one until the next weather window is found.
- Parameters:
sample – The numpy array with the weather data
current_hour – The current simulation hour
window_sizes – The sizes of the windows that are required
hm0_limits – The limits for the Hm0 to determine the next possible window with
tp_limits – The limits for the Tp to determine the next possible window with
- Returns:
The waiting time until the next possible window as decimal hour. An exception is raised if no window is found.
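How the multiple window sizes and limits interact is not spelled out above. One plausible reading, used in the sketch below, is that the windows are back-to-back phases of one operation, each with its own Hm0 and Tp limit. That interpretation, the data layout (one row per hour, Hm0 in column 0, Tp in column 1) and the next_multi_window_wait helper are all assumptions, not the logsim implementation.

```python
import numpy as np


def next_multi_window_wait(sample, current_hour, window_sizes, hm0_limits, tp_limits):
    """Return the wait until all back-to-back phases fit, each under its own limits."""
    sizes = [int(np.ceil(size)) for size in window_sizes]
    total = sum(sizes)
    start = int(np.ceil(current_hour))
    for hour in range(start, len(sample) - total + 1):
        offset = hour
        feasible = True
        # Check each phase in turn against its own pair of limits.
        for size, hm0_limit, tp_limit in zip(sizes, hm0_limits, tp_limits):
            phase = sample[offset:offset + size]
            if not ((phase[:, 0] <= hm0_limit) & (phase[:, 1] <= tp_limit)).all():
                feasible = False
                break
            offset += size
        if feasible:
            return float(hour - current_hour)
    raise ValueError("no weather window found in the sample")


# Hypothetical sample: a calm two-hour phase followed by a moderate two-hour phase.
sample = np.column_stack([
    [2.5, 1.0, 1.0, 1.8, 1.8, 3.0],  # Hm0
    [6.0, 6.0, 6.0, 6.0, 6.0, 6.0],  # Tp
])
wait = next_multi_window_wait(
    sample, current_hour=0.0, window_sizes=[2, 2], hm0_limits=[1.5, 2.0], tp_limits=[8.0, 8.0]
)
```

The same loop covers two or three windows, since it only iterates over the lists it is given.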