Less than you might think, considering the small range perspectives involved. Rendering to a stack of layers or a grid of offsets technically counts. It is more information than simply transmitting a flat frame… but update rate isn’t do-or-die, if the headset itself handles perspective.
Optimizing for bandwidth would probably look more like depth-peeled layers with very approximate depth values. Maybe rendering objects independently to lumpy reliefs. The illusion only has to work for a fraction of a second, from about where you’re standing.