Brian Lee, Logan Eisenbeiser
Virginia Tech National Security Institute, 2022
ITEA Article
Ever since their introduction in 2014 by Ian Goodfellow and colleagues,
generative adversarial networks (GANs) have been a popular subject of study
in the field of generative modeling. They have gained particular
popularity in the community because of the comparatively high realism
and originality of the samples they can generate.
Despite the exciting results GANs have achieved, the training process is
often challenging, fraught with known pitfalls that result in poor model
behavior. Some of these training failures arise from the adversarial
nature of the model's two networks, in which one network's performance improves
at the expense of the other's. Other training failures arise from computational
constraints: not only must training run long enough, but the training data
must also be sufficiently diverse and abundant to produce realistic results.
In this project, I worked with my supervisor, Logan Eisenbeiser, to study
how to train GANs capable of realistic image generation in less training time
and with less data by way of transfer learning.
Transfer learning is the practice of continuing the training of a pretrained
network on a new dataset, called the target domain.
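To make the idea concrete, here is a minimal sketch of the transfer learning pattern
in PyTorch. It fine-tunes a classifier rather than a GAN, purely to show the mechanics;
the random batch stands in for a small target-domain dataset.

```python
import torch
import torchvision

# Load a network pretrained on a large source domain (ImageNet here),
# then continue training it on target-domain data.
model = torchvision.models.resnet18(pretrained=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

# Dummy stand-in for a batch from a small target-domain dataset
# (8 RGB images at 224x224 with random labels).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 1000, (8,))

# One fine-tuning step: because the pretrained weights are the starting
# point, far fewer updates (and examples) are needed than training from scratch.
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```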
First, we needed a way to curate small datasets of images. To do so,
we created a data pipeline to automate the scraping and filtering of images.
For the scraping portion of the pipeline, we scraped images
from public Instagram posts by hashtag using the Python Instascrape package.
Images from Instagram emulate the kind of variability that exists in domains
limited in specificity or in the abundance of examples.
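A rough sketch of the scraping step is below. Instascrape's interface has changed
across versions and Instagram frequently changes its page structure (newer versions
may require session headers), so treat the calls here as illustrative rather than exact.

```python
from instascrape import Hashtag

# Scrape metadata for a hashtag page, then download images from its
# recent public posts into a raw dataset directory.
tag = Hashtag("https://www.instagram.com/explore/tags/beachsunsets/")
tag.scrape()

for i, post in enumerate(tag.get_recent_posts()):
    post.scrape()
    post.download(fp=f"raw_dataset/beachsunsets_{i}.png")
```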
After scraping and storing these images, the filter portion of the pipeline
sorts through the raw dataset and discards poor training examples, along with
images containing elements that are difficult for the GAN to train on. This entails
keeping only images of sufficient resolution and sharpness, and automatically rejecting
images that contain text or human faces or that are grayscale. What is left behind
is a small dataset ready for training.
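The sketch below illustrates these filters with OpenCV. The thresholds are
illustrative, not the exact values from our pipeline, and the text-rejection
step (e.g., via an OCR library) is omitted for brevity.

```python
import cv2
import numpy as np

# Illustrative thresholds; the pipeline's actual values were tuned separately.
MIN_SIDE = 256         # minimum width/height in pixels
MIN_SHARPNESS = 100.0  # variance of the Laplacian; lower means blurrier

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def keep_image(path: str) -> bool:
    img = cv2.imread(path)
    if img is None:
        return False  # unreadable file

    # Reject low-resolution images.
    h, w = img.shape[:2]
    if min(h, w) < MIN_SIDE:
        return False

    # Reject grayscale images (all color channels nearly identical).
    b, g, r = cv2.split(img)
    if np.allclose(b, g, atol=5) and np.allclose(g, r, atol=5):
        return False

    # Reject blurry images via the variance of the Laplacian.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    if cv2.Laplacian(gray, cv2.CV_64F).var() < MIN_SHARPNESS:
        return False

    # Reject images containing human faces (Haar cascade detector).
    if len(face_cascade.detectMultiScale(gray, 1.1, 5)) > 0:
        return False

    return True
```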
Training experiments were then run on Virginia Tech Advanced Research
Computing (ARC) GPU clusters. We used NVIDIA's StyleGAN3, a model whose
style-based architecture is known for fine-grained control over the appearance
of generated images. Using batch jobs of Python scripts, GANs were trained with
hyperparameters found through experimentation, along with a chosen pretrained
network, dataset, and training length.
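A single run looked roughly like the command below, wrapped here in Python.
The flags come from NVIDIA's public stylegan3 repository (train.py); the paths
and values are illustrative, and on ARC each command was submitted as a batch job.

```python
import subprocess

# Launch one StyleGAN3 training run (flags per the public stylegan3 train.py).
subprocess.run([
    "python", "train.py",
    "--outdir=training-runs",          # where snapshots and metrics are written
    "--cfg=stylegan3-t",               # StyleGAN3 configuration
    "--data=datasets/bettafish.zip",   # curated target-domain dataset
    "--gpus=1",
    "--batch=32",
    "--gamma=8.2",                     # R1 regularization weight
    "--kimg=1000",                     # training length (thousands of real images shown)
    "--snap=10",                       # snapshot interval
    "--resume=pretrained/source.pkl",  # transfer learning: start from pretrained weights
], check=True)
```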
The performance of these trained GANs, and the realism of their generated images,
is measured by the Fréchet Inception Distance (FID), a score that measures the
statistical distance between the distributions of real and generated images.
The model generates example images at snapshots during the training process,
allowing performance metrics to be assembled throughout training.
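Concretely, FID compares feature statistics from an Inception network: with μ_r, Σ_r
the mean and covariance of features extracted from real images, and μ_g, Σ_g those
from generated images, the standard definition is

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

A lower FID means the feature distribution of the generated images is closer
to that of the real dataset.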
As hypothesized, certain datasets resulted in better-quality generations
than others. In addition, the choice of pretrained network seemed to
yield different training results as well.
First, I chose images from "#beachsunsets" to serve as a control dataset because of
its relatively simple content and geometry. While some datasets performed worse
than this control, with higher FID scores, others performed better.
A dataset scraped from "#bettafish" saw roughly a 21% reduction in FID score compared
to the control dataset as training continued. Sample generations from the networks
seemed to confirm this improvement, as the generations appeared consistent and lifelike,
with the generated fish possessing anatomically correct features.