It seems like the Geometry Score is a necessary but not sufficient condition for the approximate distribution to be close to the real one. Meaning, a good (low) score doesn’t necessarily imply a good generator, but a bad (high) score implies that the generated data’s distribution is likely very different from the real data’s. It was exciting to see this sort of reasoning built around topological similarities between the distributions of real and generated data, and I’m hoping we’ll see similar works in the future.

The Toward Theoretical Understanding of Deep Learning tutorial highlights a method from the same authors’ paper Do GANs learn the distribution? Some Theory and Empirics, published at ICLR this year, which is worth bringing up again. This method relies on the birthday paradox, which relates the size of a population to the probability of seeing identical values (collisions) in a random sample. By sampling images, finding near duplicates with a heuristic, and using a human in the loop to verify the duplicates, the method can give an estimate of the generator’s diversity.
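The birthday-paradox estimate can be sketched with a quick simulation (a minimal illustration with made-up numbers, not the authors’ procedure): if a sample of size s contains a duplicate with probability about one half, the support size is roughly s², or more precisely s²/(2 ln 2).

```python
import random

def duplicate_probability(population_size, sample_size, trials=2000, seed=0):
    """Empirically estimate the chance that a sample drawn with
    replacement from a uniform population contains a duplicate --
    the birthday paradox."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.randrange(population_size) for _ in range(sample_size)]
        if len(set(sample)) < sample_size:  # at least one collision
            hits += 1
    return hits / trials

# For a population of 10,000, a sample of ~118 = sqrt(2 ln 2 * 10000)
# should show a duplicate about half the time.
p = duplicate_probability(population_size=10_000, sample_size=118)
```

Run in reverse, this is the diversity estimate: observe how often duplicates occur at a given sample size and invert to bound the support size of the generator.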

Is Generator Conditioning Causally Related to GAN Performance? expands upon previous work showing the importance of controlling the Jacobian singular values of deep neural networks. The authors focus on the *condition number* (CN), which is the ratio between the largest and smallest singular values. This choice is motivated partially by its theoretical link to instability (lower is more stable), and partially by the authors’ observation that it correlates well with both the IS and FID. Beyond studying the CN, they also study the entire spectrum of singular values (here, the ordered vector of the singular values). Two things seem particularly noteworthy:

- **Roughly half of their runs get a low CN,** and the other half a high CN, with values within each cluster lying close together. Thus, they’re able to identify “good” and “bad” training runs.
- **The authors study the singular value spectrum** of the average Jacobian of both GAN generators and VAE decoders (see figure below). They note that a) VAEs tend to have less variance in the singular values between runs and b) VAEs tend to have lower CNs. This is interesting because the CN is a quantity taken to reflect the stability of training, and when applied to a comparison of VAEs and GANs, it reflects the general experience that GANs are significantly less stable.
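The quantity in question is easy to compute for a toy model. The sketch below (my own illustration; the toy linear “generator” and finite-difference step are assumptions, not the paper’s code) estimates the generator Jacobian numerically and takes the ratio of its extreme singular values:

```python
import numpy as np

def condition_number(jacobian):
    """Ratio of largest to smallest singular value; large values
    indicate ill-conditioning."""
    s = np.linalg.svd(jacobian, compute_uv=False)  # sorted descending
    return s[0] / s[-1]

def numerical_jacobian(g, z, eps=1e-5):
    """Finite-difference Jacobian of g at latent point z."""
    z = np.asarray(z, dtype=float)
    g0 = np.asarray(g(z))
    J = np.zeros((g0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (np.asarray(g(z + dz)) - g0) / eps
    return J

# Toy "generator": a fixed linear map, whose Jacobian is the matrix itself,
# with singular values 3 and 1, so the condition number is 3.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
cn = condition_number(numerical_jacobian(lambda z: A @ z, np.zeros(2)))
```

In the paper this computation is done at many sampled latent points and averaged; the clustering into low-CN and high-CN runs is a property of that averaged spectrum.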

Finally, the authors suggest a method for constraining the range of the singular values to reduce the CN, which they call *Jacobian Clamping*. They remark that Spectral Normalization for Generative Adversarial Networks pursues a similar goal to Jacobian Clamping. This link is not fully explored, but it is yet another work showing improved results by studying and controlling the singular values of neural networks. I’d expect even more exciting work in this area over the coming year.
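The core idea of Jacobian Clamping can be sketched as follows (a simplified single-sample version; the function name, default clamp range, and quadratic penalty form are my own assumptions for illustration): estimate how much the generator stretches a small random latent perturbation, and penalize stretch factors outside a target interval.

```python
import numpy as np

def jacobian_clamping_penalty(g, z, lam_min=1.0, lam_max=20.0, eps=1e-2, seed=0):
    """Penalize the generator's local stretch factor when it leaves
    [lam_min, lam_max]. The stretch factor q approximates the norm of
    the Jacobian along one random latent direction."""
    rng = np.random.default_rng(seed)
    delta = rng.normal(size=z.shape)
    delta *= eps / np.linalg.norm(delta)  # small random perturbation
    q = np.linalg.norm(g(z + delta) - g(z)) / np.linalg.norm(delta)
    # Quadratic penalty only when q falls outside the clamp range.
    return max(q - lam_max, 0.0) ** 2 + min(q - lam_min, 0.0) ** 2

# Toy generator scaling every latent by 30: stretch factor q = 30,
# which exceeds lam_max = 20, giving a penalty of (30 - 20)^2 = 100.
pen = jacobian_clamping_penalty(lambda z: 30.0 * z, np.zeros(4))
```

In training, this penalty would be added to the generator loss so gradient descent pushes the singular values back into the target range, which is what makes it a softer cousin of spectral normalization’s hard rescaling.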