data

Data reading and wrangling functionality

Synthetic data

Except for the first and last lines, the code comes from Rubanova’s implementation (comments are mine)


source

make_periodic_dataset

 make_periodic_dataset (timepoints:int, extrap:bool, max_t:float, n:int,
                        noise_weight:float)
Type Details
timepoints int Number of time instants
extrap bool Whether extrapolation is performed
max_t float Maximum value of time instants
n int Number of examples
noise_weight float Standard deviation of the noise to be added
time, observations = make_periodic_dataset(timepoints=100, extrap=True, max_t=5.0, n=200, noise_weight=0.01)
time.shape, observations.shape
(torch.Size([101]), torch.Size([200, 101, 1]))
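
Since the generator itself lives in Rubanova’s code, the following is only a rough sketch of how such a periodic dataset could be produced. The sine-wave parametrisation, the use of timepoints + 1 instants (chosen so the shapes match the example above), and the omission of the extrap flag are all assumptions, not a copy of the actual implementation.

import torch

def periodic_dataset_sketch(timepoints: int, max_t: float, n: int, noise_weight: float):
    # One extra instant so that timepoints=100 yields 101 points, as in the example above (assumption)
    time = torch.linspace(0., max_t, timepoints + 1)
    # Random frequency and phase per example (hypothetical parametrisation)
    freq = torch.rand(n, 1) * 2. + 0.5
    phase = torch.rand(n, 1) * 2. * torch.pi
    clean = torch.sin(2. * torch.pi * freq * time + phase)   # [n, timepoints + 1]
    noisy = clean + noise_weight * torch.randn_like(clean)   # additive Gaussian noise
    return time, noisy.unsqueeze(-1)                         # observations: [n, timepoints + 1, 1]

t, obs = periodic_dataset_sketch(timepoints=100, max_t=5.0, n=200, noise_weight=0.01)
t.shape, obs.shape  # (torch.Size([101]), torch.Size([200, 101, 1]))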

PyTorch

A class defining a (somewhat complex) collate function for a PyTorch DataLoader


source

CollateFunction

 CollateFunction (time:torch.Tensor, n_points_to_subsample=None)

Initialize self. See help(type(self)) for accurate signature.

Type Default Details
time Tensor Time axis [time]
n_points_to_subsample NoneType None Number of points to be “subsampled”

Let us build an object for testing

collate_fn = CollateFunction(time, n_points_to_subsample=50)
collate_fn
Collate function expecting time series of length 101, with the second half to be predicted from the first.
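The actual logic lives in CollateFunction.__call__; the sketch below only illustrates the kind of processing the representation above describes (midpoint split, optional subsampling of the observed part, an all-ones mask). It is written under those assumptions and is not the library’s implementation.

import torch

def collate_sketch(batch, time, n_points_to_subsample=None):
    # batch: list of [time, 1] tensors handed over by the DataLoader
    data = torch.stack(batch)                      # [batch, time, 1]
    half = len(time) // 2                          # first half observed, second half to be predicted
    observed_time, observed_data = time[:half], data[:, :half]
    if n_points_to_subsample is not None:          # keep only a subset of the observed instants
        idx = torch.randperm(half)[:n_points_to_subsample].sort().values
        observed_time, observed_data = observed_time[idx], observed_data[:, idx]
    return {
        'observed_time': observed_time,
        'observed_data': observed_data,
        'to_predict_at_time': time[half:],
        'to_predict_data': data[:, half:],
        'observed_mask': torch.ones_like(observed_data),  # every kept observation is available
    }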

We also need a PyTorch DataLoader

dataloader = torch.utils.data.DataLoader(observations, batch_size = 10, shuffle=False, collate_fn=collate_fn)
dataloader
<torch.utils.data.dataloader.DataLoader>

How many batches does this DataLoader provide?

n_batches = len(dataloader)
n_batches
20
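
This matches the 200 examples being served in batches of 10; a quick check (assuming no examples are dropped):

import math
math.ceil(len(observations) / 10)  # 200 examples / batch size 10 -> 20 batches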

Let us get the first batch

batch_bundle = next(iter(dataloader))
type(batch_bundle)
dict

Notice that, as seen from the prototype of CollateFunction.__call__, the returned object is a dictionary. It contains the following fields:

print(batch_bundle.keys())
dict_keys(['observed_time', 'observed_data', 'to_predict_at_time', 'to_predict_data', 'observed_mask'])
  • observed_time and observed_data are the first part of a time series we want to learn, whereas
  • to_predict_at_time and to_predict_data are the second part of the same time series, which we aim to predict; finally,
  • observed_mask is True for every observation that is available (it only applies to the observed data)

If one thinks of this in terms of a given input, \(x\), and a related output, \(y\), to be predicted, the latter would be to_predict_data and the former would encompass the rest of the fields.
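
Purely as an illustration, a hypothetical model (not part of this library) mapping the observed part to predictions at the requested instants could consume a batch like this; the model interface below is an assumption.

# `model` is a hypothetical forecaster taking (observed_time, observed_data, mask, query_times)
# and returning predictions of shape [batch, len(query_times), 1] -- an assumed interface
def mse_on_batch(model, bundle):
    y_hat = model(bundle['observed_time'], bundle['observed_data'],
                  bundle['observed_mask'], bundle['to_predict_at_time'])
    y = bundle['to_predict_data']
    return ((y_hat - y) ** 2).mean()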

We can check the size of every component

for k, v in batch_bundle.items():
    print(f'Dimensions of {k}: {tuple(v.shape)}')
Dimensions of observed_time: (50,)
Dimensions of observed_data: (10, 50, 1)
Dimensions of to_predict_at_time: (51,)
Dimensions of to_predict_data: (10, 51, 1)
Dimensions of observed_mask: (10, 50, 1)

In this simple example, every observation is available

(batch_bundle['observed_mask'] == 1.).all()
tensor(True)
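
When data are missing, the mask can be used to restrict computations to the available observations. For instance, a masked mean over the observed data (here it coincides with the plain mean, since the mask is all ones):

mask = batch_bundle['observed_mask']
(batch_bundle['observed_data'] * mask).sum() / mask.sum()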

GPU support

If one wants to move this object to another device, this method does so for all the relevant internal state.


source

CollateFunction.to

 CollateFunction.to (device)
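
A possible usage, assuming the method moves the stored time axis (and any other internal tensors) in place, as the description above suggests:

device = 'cuda' if torch.cuda.is_available() else 'cpu'
collate_fn.to(device)  # assumed to move the internal time tensor to `device`
dataloader = torch.utils.data.DataLoader(observations.to(device), batch_size=10, shuffle=False, collate_fn=collate_fn)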