Commit

Add files via upload
jacksklar authored Jun 30, 2021
1 parent 42d8649 commit e8553ff
Showing 7 changed files with 1,328 additions and 2 deletions.
8 changes: 8 additions & 0 deletions .gitignore
@@ -0,0 +1,8 @@
/Datasets/CIFAR/
/Datasets/FashionMNIST/
/Datasets/MNIST/
/Datasets/Noise/
/Datasets/OFDM/
/experiment_results/
.idea
/IQGAN_project_archived/
78 changes: 76 additions & 2 deletions README.md
@@ -1,2 +1,76 @@
# opensource-repo
This repository is the recommended template repository for NIST opensource contributions.
# <u> **Software for Modeling OFDM Communication Signals with Generative Adversarial Networks** </u>

This repository contains Python code to generate results for experiments on generative modeling of radio frequency (RF) communication signals, specifically synthetic orthogonal frequency-division multiplexing (OFDM) signals. The code implements two novel generative adversarial network (GAN)
models, a 1D and a 2D convolutional model named **PSK-GAN** and **STFT-GAN**, respectively, as well as the **WaveGAN** architecture as
a baseline for comparison. All three GAN models are trained on synthetic datasets spanning a range of OFDM parameters and conditions, and
their performance is evaluated with simulated datasets.

## <u>Software Implementation</u>
The software enables automated testing of many model configurations across different datasets. Model creation and training are implemented
with the PyTorch library. This repository contains files for initializing experiment test runs (`main.py`), training GAN models (`gan_train.py`), loading target distributions (`data_loading.py`), and evaluating generated distributions (`gan_evaluation.py`). The `utils/` directory contains
supporting modules for target dataset creation and model evaluation. The `models/` directory contains modules that create the **PSK-GAN**, **STFT-GAN**,
and **WaveGAN** architectures.

Running `main.py` executes the default GAN configuration specified by the configuration dictionary in `./experiment_resources/training_specs_dict.py`.
Descriptions of the fields in `./experiment_resources/training_specs_dict.py` are located in
`./experiment_resources/configuration_dictionary_description.csv`. Additionally, a set of model configurations can be run in an automated fashion
by passing a configuration table (CSV file) as an argument to the main Python module (e.g., `main.py --configs path_to_config_table.csv`). Column labels
of a configuration table should correspond to the keys in the GAN configuration dictionary that are to be changed across runs; a minimal example is sketched below.
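
For illustration, a hypothetical two-run configuration table might look like the following. The column names shown here (`data_set`, `num_samples`, `nperseg`) are assumed from the dataset-loading code purely for the sake of example; consult `./experiment_resources/configuration_dictionary_description.csv` for the authoritative set of keys.
```
data_set,num_samples,nperseg
OFDM,4096,64
OFDM,4096,128
```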

The training and test target datasets used in this study were synthesized with the script
`scripts/target_data_synth.py` and are provided in a separate gzip file, `target_distributions.tar.gz`. To execute experiments, first unzip this file and place its contents in a subdirectory named `Data/`.

When the models are run, experimental results are saved in `experiment_results/`, which is divided into sub-folders corresponding to each experiment. Each experiment folder contains sub-folders with results from three trials of each test configuration. Each test-trial folder contains saved GAN models, training metadata, and evaluations of the generated distributions. Previously computed experimental results are provided in `experiment_results.tar.gz`.
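
The extraction step above can also be done programmatically; the following is a minimal sketch using only the Python standard library, assuming the archive sits in the repository root:
```python
import tarfile

# Extract the packaged target datasets into the Data/ subdirectory.
with tarfile.open("target_distributions.tar.gz", "r:gz") as archive:
    archive.extractall(path="Data/")
```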

## <u>Requirements</u>
We use a `conda` virtual environment to manage the project library dependencies.
Run the following commands to install requirements to a new conda environment:
```setup
conda create --name <env> --file ./experiment_resources/requirements.txt
conda activate <env>
pip install -r ./experiment_resources/pip_requirements.txt
```


## <u>Running Experiments</u>
This code executes three experiments: (1) a data complexity experiment, (2) a modulation order experiment, and (3) a fading channel experiment. To reproduce the results from each of the three experiments, run
```
main.py --configs ./experiment_resources/test_configs_complexity_PSKGAN.csv
main.py --configs ./experiment_resources/test_configs_complexity_WaveGAN.csv
main.py --configs ./experiment_resources/test_configs_complexity_STFTGAN.csv
main.py --configs ./experiment_resources/test_configs_modulation_STFTGAN.csv
main.py --configs ./experiment_resources/test_configs_channel_STFTGAN.csv
```
Aggregated plots across model runs are created using the script `./scripts/plotting_script.py`.

## <u>Implementation Notes</u>
Single-process multi-GPU training is performed with PyTorch's `DataParallel` method in order to increase training speed.
(Multi-process multi-GPU training with `DistributedDataParallel` is not compatible with the gradient penalty operation (`autograd.grad`)
and is not recommended when using the Wasserstein-GP loss.)
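
For reference, the single-process multi-GPU pattern amounts to the following minimal sketch; the network shown is a stand-in, not one of the repository's GAN models:
```python
import torch
from torch import nn

# Stand-in network; the actual GAN architectures live in the models/ directory.
model = nn.Sequential(nn.Conv1d(2, 16, kernel_size=3), nn.ReLU())

# DataParallel replicates the model across all visible GPUs within one process.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```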

## <u>Authors</u>
Jack Sklar (jack.sklar@nist.gov) and Adam Wunderlich (adam.wunderlich@nist.gov) \
Communications Technology Laboratory \
National Institute of Standards and Technology \
Boulder, Colorado

## <u>Acknowledgements</u>
The authors thank Ian Wilkins and Sumeet Batra for their contributions to an early version of this software.

## <u>Licensing Statement</u>
This software was developed by employees of the National Institute of Standards and Technology (NIST), an
agency of the Federal Government and is being made available as a public service. Pursuant to title 17 United
States Code Section 105, works of NIST employees are not subject to copyright protection in the United States.
This software may be subject to foreign copyright. Permission in the United States and in foreign countries,
to the extent that NIST may hold copyright, to use, copy, modify, create derivative works, and distribute this
software and its documentation without fee is hereby granted on a non-exclusive basis, provided that this
notice and disclaimer of warranty appears in all copies.

THE SOFTWARE IS PROVIDED 'AS IS' WITHOUT ANY WARRANTY OF ANY KIND, EITHER EXPRESSED, IMPLIED, OR STATUTORY,
INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY THAT THE SOFTWARE WILL CONFORM TO SPECIFICATIONS, ANY IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND FREEDOM FROM INFRINGEMENT, AND ANY WARRANTY
THAT THE DOCUMENTATION WILL CONFORM TO THE SOFTWARE, OR ANY WARRANTY THAT THE SOFTWARE WILL BE ERROR FREE. IN
NO EVENT SHALL NIST BE LIABLE FOR ANY DAMAGES, INCLUDING, BUT NOT LIMITED TO, DIRECT, INDIRECT, SPECIAL OR
CONSEQUENTIAL DAMAGES, ARISING OUT OF, RESULTING FROM, OR IN ANY WAY CONNECTED WITH THIS SOFTWARE, WHETHER OR NOT
BASED UPON WARRANTY, CONTRACT, TORT, OR OTHERWISE, WHETHER OR NOT INJURY WAS SUSTAINED BY PERSONS OR PROPERTY OR
OTHERWISE, AND WHETHER OR NOT LOSS WAS SUSTAINED FROM, OR AROSE OUT OF THE RESULTS OF, OR USE OF, THE SOFTWARE
OR SERVICES PROVIDED HEREUNDER.

274 changes: 274 additions & 0 deletions data_loading.py
@@ -0,0 +1,274 @@
"""
This module holds PyTorch-related data-loading methods and data-preprocessing
methods for the target data representation.
This file can also be imported as a module and contains the following functions:
* unpack_complex - unpack complex waveform to two-channel real-valued waveform
* pack_to_complex - pack two-channel real-valued waveform to complex waveform
* scale_dataset - scale target distribution to range [-1, 1]
* load_target_distribution - load target training distribution from files
* TargetDataset - wrapper for dataset for ease of loading into pytorch framework
* build_DataLoader - create PyTorch DataLoader object
* get_latent_vectors - load batch of latent vectors for input to generator
* stft_to_waveform - convert complex STFT representation to complex waveform
* waveform_to_stft - convert complex waveform to complex STFT
* pad_signal_to_power_of_2 - zero-pad waveform to next power of 2
* unpad_signal - remove zero-padding from waveform
"""

import h5py
import json
import torch
import numpy as np
from scipy import signal
from numpy.fft import fftshift
from sklearn import preprocessing
from scipy.stats import truncnorm
from torch.utils.data import Dataset


def unpack_complex(iq_data):
"""
Convert complex 2D matrix to 3D matrix with 2 channels for real and imaginary dimensions
:param iq_data: numpy complex matrix (2D)
:return: numpy floating point matrix (3D)
"""
iq_real = iq_data.real
iq_imaginary = iq_data.imag
iq_real = np.expand_dims(iq_real, axis=1) # Make dataset 3-dimensional to work with framework
iq_imaginary = np.expand_dims(iq_imaginary, axis=1) # Make dataset 3-dimensional to work with framework
unpacked_data = np.concatenate((iq_real, iq_imaginary), 1)
return unpacked_data


def pack_to_complex(iq_data):
"""
Convert 3D matrix with 2 channels for real and imaginary dimensions to 2D complex representation
:param iq_data: numpy floating point matrix (3D)
:return: numpy complex matrix (2D)
"""
num_dims = len(iq_data.shape)
if num_dims == 2:
complex_data = 1j * iq_data[:, 1] + iq_data[:, 0]
elif num_dims == 3:
complex_data = 1j * iq_data[:, 1, :] + iq_data[:, 0, :]
else:
complex_data = 1j * iq_data[:, 1, :, :] + iq_data[:, 0, :, :]
return complex_data
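

# Example (illustrative; shapes are assumed): unpack_complex and pack_to_complex
# are inverses for a batch of complex waveforms.
#   iq = np.random.randn(4, 256) + 1j * np.random.randn(4, 256)
#   two_channel = unpack_complex(iq)   # shape (4, 2, 256), real-valued
#   assert np.allclose(pack_to_complex(two_channel), iq)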


def scale_dataset(data, data_set, data_scaler):
"""
Scale target distribution's range to [-1, 1] with multiple scaling options
:param data: Target distribution
:param data_set: dataset name
:param data_scaler: data-scaler setting
:return: scaled target distribution
"""
if data_scaler == "activation_scaler":
return data, None

# Feature Based data scaling:
if data_scaler.find("feature") != -1:
print(f"feature Based Scaling: {data_scaler}")
data_shape = data.shape
data = data.reshape(data_shape[0], -1)
transformer = preprocessing.MaxAbsScaler() if data_scaler == "feature_max_abs" \
else preprocessing.MinMaxScaler(feature_range=(-1, 1))
transformer = transformer.fit(data)
data = transformer.transform(data)
data = data.reshape(data_shape)
return data, transformer

# Global Dataset scaling:
elif data_scaler.find("global") != -1:
transformer = None
with open(rf'./Datasets/{data_set}/scale_factors.json', 'r') as F:
channel_scale_factors = json.loads(F.read())
channel_max = channel_scale_factors["max"]
channel_min = channel_scale_factors["min"]
if data_scaler == "global_min_max":
feature_max, feature_min = 1, -1
data = (data - channel_min) / (channel_max - channel_min)
data = data * (feature_max - feature_min) + feature_min
else:
data = data / np.max(np.abs([channel_max, channel_min]))
return data, transformer
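

# Example (illustrative; "feature_min_max" is one of the scaling options above):
#   scaled, transformer = scale_dataset(data, "OFDM", "feature_min_max")
#   # transformer.inverse_transform undoes the scaling (on data flattened to 2D).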


def load_target_distribution(data_set, data_scaler, pad_signal, num_samples, stft, nperseg, fft_shift):
"""
Load in target distribution, scale data to [-1, 1], and unpack any labels from the data
:param fft_shift: Shift STFT to be zero-frequency centered
:param nperseg: STFT FFT window length
:param stft: Convert complex waveform to STFT
:param num_samples: Number of samples to load from the target distribution
:param pad_signal: Whether to zero-pad target distribution waveforms to the next power of 2
:param data_set: Name of dataset
:param data_scaler: Name of scaling function option
:return: data tensor, class-label tensor, input length, pad length, and scaling transformer
"""
d_type = complex
h5f = h5py.File(rf"./Datasets/{data_set}/train.h5", 'r')
real_dataset = h5f['train'][:]
print("Dataset_length: ", len(real_dataset))
h5f.close()
data = np.array(real_dataset[:, 1:]).astype(d_type)
class_labels = np.real(real_dataset[:, 0]).astype(int)

if int(num_samples) > 64:
data = data[:num_samples]
class_labels = class_labels[:num_samples]

input_length = len(data[0, :])
pad_length = None
if pad_signal and not stft:
# WaveGAN uses strides of 4 so waveforms are padded to be powers of 2
data, pad_length = pad_signal_to_power_of_2(data)
input_length = pad_length + input_length
if stft:
data, pad_length = pad_signal_to_power_of_2(data)
data, f, t = waveform_to_stft(data, 2, nperseg)
if fft_shift:
data = np.fft.fftshift(data, axes=(1,))

input_length = (nperseg, data.shape[-1])
data = data.reshape(data.shape[0], nperseg, -1)
data = data.view(complex)
data = unpack_complex(data).view(float) # Unpacking complex-representation to 2-channel representation

data = np.expand_dims(data, axis=1) if len(data.shape) < 3 else data
data, transformer = scale_dataset(data, data_set, data_scaler)
data = torch.from_numpy(data).float()
class_labels = torch.from_numpy(class_labels).float()
return data, class_labels, input_length, pad_length, transformer
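

# Example (illustrative; argument values are assumed): load an STFT representation
# of a dataset named "OFDM" with global min-max scaling.
#   data, labels, input_length, pad_length, tf = load_target_distribution(
#       "OFDM", "global_min_max", pad_signal=False, num_samples=4096,
#       stft=True, nperseg=64, fft_shift=True)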


class TargetDataset(Dataset):
"""
Wrapper for dataset that can be easily loaded and used for training through PyTorch's framework.
Pairs a training example with its label in the format (training example, label)
"""
def __init__(self, data_set, data_scaler, pad_signal, num_samples, stft=False, nperseg=0, fft_shift=False, **kwargs):
self.dataset, self.labels, self.input_length, self.pad_length, self.transformer = \
load_target_distribution(data_set, data_scaler, pad_signal, num_samples, stft, nperseg, fft_shift)

def __len__(self):
return self.dataset.shape[0]

def __getitem__(self, idx):
return self.dataset[idx], self.labels[idx]


def build_DataLoader(dataset_specs):
"""
Creates a new TargetDataset and sampler using dataset-specific settings from the
training-specs dictionary
:param dataset_specs: dictionary defining dataset-specific settings
:return: dataset and sampler
"""
dataset = TargetDataset(**dataset_specs)
sampler = None
return dataset, sampler


def get_latent_vectors(batch_size, latent_size, latent_type="gaussian", device="cuda:0"):
"""
Generate a batch of latent vectors for input to the generator
:param latent_type: "gaussian", "uniform", or truncated Gaussian (any other value)
:param batch_size: number of latent vectors per batch
:param latent_size: latent-space dimension
:param device: PyTorch device string
:return: latent-vector PyTorch tensor
"""
if latent_type == "gaussian":
z = torch.randn(batch_size, latent_size, 1, device=device)
elif latent_type == "uniform":
z = torch.from_numpy(np.random.uniform(low=-1.0, high=1.0, size=(batch_size, latent_size, 1))).float().to(device)
else:
truncate = 1.0
lower_trunc_val = -1 * truncate
z = [] # assume no correlation between multivariate dimensions
for dim in range(latent_size):
z.append(truncnorm.rvs(lower_trunc_val, truncate, size=batch_size))
z = np.transpose(z)
z = torch.from_numpy(z).unsqueeze(2).float().to(device)
return z
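

# Example (illustrative): draw 64 Gaussian latent vectors of dimension 100 on the CPU.
#   z = get_latent_vectors(64, 100, latent_type="gaussian", device="cpu")
#   assert z.shape == torch.Size([64, 100, 1])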


def stft_to_waveform(dataset, fs=2, nperseg=64):
"""
Transform Short-Time-Fourier-Transform (STFT) representation to complex waveform
:param dataset: STFT Dataset
:param fs: Sampling frequency (Hz)
:param nperseg: STFT window length (samples per segment)
:return: Complex waveform dataset
"""
waveform_dataset = []
print("Mapping STFT dataset to timeseries:", end=" ")
for i, spectrogram in enumerate(dataset):
if i % 10000 == 0:
print(i)
t, x = signal.istft(spectrogram, fs, nperseg=nperseg, noverlap=int(nperseg * 0.75), input_onesided=False)
waveform_dataset.append(x)
waveform_dataset = np.array(waveform_dataset, dtype=complex)
return waveform_dataset


def waveform_to_stft(dataset, fs=2, nperseg=64):
"""
Convert complex waveform representation to Short-Time Fourier Transform (STFT) representation
:param dataset: Complex waveform dataset
:param fs: sampling frequency (Hz)
:param nperseg: STFT window length (samples per segment)
:return: STFT Dataset
"""
stft_dataset = []
print("Mapping timeseries dataset to stft")
for i, x in enumerate(dataset):
if i % 10000 == 0:
print(i)
f, t, spectrogram = signal.stft(x, fs=fs, nperseg=nperseg, noverlap=int(nperseg * 0.75),
return_onesided=False, boundary="even")
stft_dataset.append(spectrogram)
stft_dataset = np.array(stft_dataset, dtype=complex)
return stft_dataset, f, t
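

# Example (illustrative): STFT round trip on a batch of complex waveforms.
#   wave = np.random.randn(8, 1024) + 1j * np.random.randn(8, 1024)
#   stft_data, f, t = waveform_to_stft(wave, fs=2, nperseg=64)
#   recovered = stft_to_waveform(stft_data, fs=2, nperseg=64)
#   # recovered matches wave to numerical precision (the 75% overlap satisfies COLA).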


def pad_signal_to_power_of_2(waveform_dataset):
"""
Zero-pad signal to the next power of 2 (pad values are set to 1e-8, i.e., near zero)
:param waveform_dataset: Target Distribution
:return: zero-padded target distribution, zero-padding length
"""
waveform_length = waveform_dataset.shape[-1]
d_type = complex
found = False
test_int = waveform_length
next_power_of_2 = None
while found is False:
if test_int & (test_int - 1) == 0:
found = True
next_power_of_2 = test_int
else:
test_int += 1
pad_length = next_power_of_2 - waveform_length
padding_array_1 = np.zeros((len(waveform_dataset), pad_length // 2)).astype(d_type)
padding_array_2 = np.zeros((len(waveform_dataset), pad_length // 2)).astype(d_type)
padding_array_1, padding_array_2 = padding_array_1 + 1e-8, padding_array_2 + 1e-8
waveform_dataset = np.hstack((padding_array_1, waveform_dataset, padding_array_2))
return waveform_dataset, pad_length
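

# Example (illustrative): length-100 waveforms are padded to length 128, giving
# pad_length == 28 (14 near-zero samples, each 1e-8, prepended and appended).
#   padded, pad_length = pad_signal_to_power_of_2(np.ones((4, 100), dtype=complex))
#   # padded.shape == (4, 128); note unpad_signal expects a 3D array.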


def unpad_signal(waveform_dataset, pad_length):
"""
Remove zero-padding of signal
:param waveform_dataset: zero-padded dataset
:param pad_length: length of zero-padding
:return: waveform dataset
"""
if pad_length > 0:
waveform_dataset = waveform_dataset[:, :, pad_length // 2: - pad_length // 2]
return waveform_dataset
else:
return waveform_dataset