(The same text appears as a front page article, www.getDPI.com; thank you Bob, Jack and Guy)
A (brief) introduction to video, for the stills photographer
Of course, this will be anything but brief! I say this because once I started thinking about the problem, it opened like Pandora’s box. Here are the essentials, to start a discussion.
In a recent interaction on GetDPI.com, I wrote the following bare bones of an article; Rawfa, mentioned below, is a member who made a lovely one-camera video (https://vimeo.com/81553093). I want to use the interaction on that thread as the framework for bringing out what I see as the two main differences between video and stills as story-telling mediums.
Some background: I was a national network director (ABC-TV, Australia) for five years; I made 350+ live programs, and half-a-dozen film-based documentaries (one of which, The Comeback, starred Arnold Schwarzenegger). Here, “film-based” means more than the recording medium: it is a whole approach. All the live TV work (350 programs) used what we now call multi-cam, as all video-based TV does. In studio-based setups, you have up to five synchronised cameras, and multiple sound sources, also ‘synced’; once live, the director simply calls for the required angle, and all angles are visible on a bank of monitors. The TV-based idiom is very different to the film-based approach. But one aspect that remains similar between the mediums is the audience’s expectation of continuity and placement of sound, and the implicit expectation that the sound should seem to come from the person in view on the camera being used. This is a critical dimension, and in the modern video-from-stills-camera era it is largely misunderstood, or not understood at all, except in Hollywood. More on this below.
Rawfa’s approach: single camera, multiple takes, on-camera sound recording: ‘going Hollywood’
The way Rawfa shot that lovely video (one camera and recording live sound via the on-camera Røde mic) makes a lot of work in post. One-camera shoots and multiple setups (to get all the angles you need) are traditional Hollywood, but are much more work than the method I use. And as many have noticed, recording sound from a microphone mounted on the camera has multiple drawbacks, even if you mount a decent mic.
And it’s clear that he was 1–3m away from the voice and the drum, and you can capture decent audio at those distances, but the on-camera mic means the perspective of the sound changes as the camera position changes, and on-camera mics are sensitive to handling and camera-operation sounds (like focussing, and iris changes). Humans are very sensitive to this, and find any mechanical sound intrusive; we will tolerate pretty much any visual chopping and changing (think of MTV clips) but we are disturbed by sound track changes, whether in perceived loudness, in perspective (stereo L-R effects), or in the quality of the sound (close-mic’d sound is totally different to room atmosphere plus voices, for example).
As Rawfa said:
“But in this particular case the guy called me for a photo shoot and I ended up making him a music video with LIVE sound…which was kind of hell, as each time I shot a different angle the music had a different duration (from 3 to 5 minutes). Editing was much harder than with the artist playing with a master track playing in the background.”
And he added:
“Regarding video, it would have been a smart move from Sony to have chosen a touch screen LCD that allowed you to choose focus by tapping.”
Let me use that comment as the jump-in point to explain my approach:
Multiple cameras recording simultaneously, separate sound recording, method depending on situation
Let’s use Rawfa’s shoot as an example. A relatively inexperienced performer cannot reproduce the same performance in subsequent takes; this explains the differences in take durations that Rawfa noted. A skilled performer can come close enough. But no two performances are the same, and if you want to shoot two or (in this case) five takes to get all the camera angles you want, you can enter editing Hell very quickly. Done this way, the post production editing process is difficult: there will be no accurate sync between the takes because the performer is actually doing a different performance each time, the perspective of the sound being recorded changes as the camera moves, and so on. In the modern small-camera era, there is a much better way, I believe.
I use multiple µ4/3rds cameras and separate sound recorder(s). I record sound and vision at the same time (sound on the recorder; vision on the cameras). And if the performer wants to do a number of takes, I record the sound of each take, plus the vision of each take. I will detail gear below, but the most important aspect of the way I shoot is not the cameras; it is that I use a number of small broadcast-standard recorders, each of which has different sound characteristics. There are two main approaches I use:
I get the sound into the recorder via lavaliere mics (those small ones you see newsreaders use, pinned on to lapels; these feed into a radio transmitter, hidden on the performer, or out of shot) or use the recorders’ microphones to record sound directly into the recorder. In the latter case, the recorder itself, with its built-in mics, will either be hidden out of shot, or (usually with much better sound) suspended over the performer, and within a metre (three feet) of the mouth of the performer. Getting the mic close to the mouth is critical: sound obeys the inverse square law. What does this mean in practice? It means that the intensity of the sound you want to record falls to one quarter (a drop of about 6 dB) if the distance between the source and the recorder doubles. And the unfortunate effect of this law is that, as the level of the voice at the mic diminishes with this distance change, the level of background sound (compared to the voice) increases; this is one form of the signal-to-noise problem. We usually do not want to hear the background sound in preference to, or competing with, the voice we are recording, so we try to get the mic as close as we can.
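To put rough numbers on the inverse square law, here is a minimal Python sketch. It assumes an idealised open space (no room reflections) and a background level that is the same everywhere; the function names and the 70 dB / 50 dB figures are made up for illustration, not measurements from any shoot.

```python
import math

def intensity_ratio(d_near_m, d_far_m):
    """Fraction of direct-sound intensity left when the mic moves
    from d_near_m to d_far_m (inverse square law, free field)."""
    return (d_near_m / d_far_m) ** 2

def level_change_db(d_near_m, d_far_m):
    """The same change expressed in decibels (negative means quieter)."""
    return 10.0 * math.log10(intensity_ratio(d_near_m, d_far_m))

def voice_over_background_db(voice_db_at_ref, background_db, ref_m, mic_m):
    """Voice level minus background level at the mic position.
    Assumption: the background level does not change with mic position."""
    return voice_db_at_ref + level_change_db(ref_m, mic_m) - background_db

if __name__ == "__main__":
    # Hypothetical numbers: voice measures 70 dB at 0.5 m, room noise is 50 dB.
    for mic_m in (0.5, 1.0, 2.0, 4.0):
        margin = voice_over_background_db(70.0, 50.0, 0.5, mic_m)
        print(f"mic at {mic_m} m: voice is {margin:.1f} dB above the background")
```

Under these assumptions, every doubling of the mic distance costs about 6 dB of the voice's margin over the background, which is why a recorder within a metre of the mouth beats one a few metres away.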
What kind of stereo recording?
An aside without getting too technical: one of my recorders, the $179 Zoom H2n, can make variable-width mid-side (MS) recordings; others use a conventional x-y microphone pattern; the latter is a standard approach to mic’ing stereo sound with one device. Modern recorders work very well. The significance of mid-side recording is that the apparent width of the sound stage being recorded can be changed; and in some noisy situations this can give better final sound. However we get the sound into the recorder, though, it will be higher quality than recording the sound into the camera itself, unless the camera can record clean sound at a minimum standard of 48 kHz and 16-bit (what pro. editing programs use). Note that lavaliere mics are all monaural (single audio track, with equal loudness in both channels when played back in post into a pair of studio monitors, or headphones); the perception from the viewer’s perspective is that the sound is coming from the centre of the picture. This can be adjusted in post if necessary, but usually I don’t. If, as I often do, I record stereo rather than mono, then as the sound changes from L–R in real time (person A speaks, then person B), the listener/viewer hears this change too. So, for example, in my exercise videos, the voice of the person on the left of screen comes from the L of picture (assuming a wide shot that shows the ‘action’ from the front) and a person in the middle will be heard as speaking from the centre of the picture, and so on.
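For the curious, the 'variable width' of mid-side recording comes from simple sum-and-difference arithmetic. The sketch below is the textbook decode, not the recorder's own firmware; it assumes the mid and side signals are available as two separate lists of samples, and the function name and `width` parameter are hypothetical. Real decoders may also scale the result to avoid clipping.

```python
def ms_to_lr(mid, side, width=1.0):
    """Decode mid-side samples to left/right stereo.

    left  = mid + width * side
    right = mid - width * side
    width = 0.0 collapses the picture to mono; larger values widen it.
    """
    left = [m + width * s for m, s in zip(mid, side)]
    right = [m - width * s for m, s in zip(mid, side)]
    return left, right

if __name__ == "__main__":
    mid = [1.0, 0.5]     # what the forward-facing capsule hears
    side = [0.25, -0.5]  # what the sideways (figure-of-eight) capsule hears
    print(ms_to_lr(mid, side, width=0.0))  # identical channels: mono
    print(ms_to_lr(mid, side, width=1.0))  # full stereo width
```

Turning `width` down in a noisy room reduces the contribution of the side signal, which is where much of the off-axis room sound lives.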
The Stage metaphor
Since the days of the ancient Greeks, performers have worked to the front of a stage (where the audience is viewing them or where the cameras are placed) or turned away from it: the point is that most performances include this invisible stage. All films work off the same metaphor, and this is the main reason we need to hear voices from where they actually originate. An important point to note is that, in a low-budget production, if the sound is recorded in mono and the ‘stage’ is created in an establishing “wide shot” (so we, the audience, can see where everyone is) then the brain interprets the voice of a person speaking at camera L as coming from the L, even though a monaural signal, by definition, comes from both speakers equally. This helps us, a lot, in practice.
Establishing the stage in video
So, back to my approach: I record vision from a number of cameras simultaneously. One camera records the whole scene; this allows the audience’s point of view (“POV”) to be established (and if the wheels fall off, it can be the fallback angle). Of the other two cameras, one records a closer-perspective ‘two shot’ (a framing that contains two whole people, or the heads and shoulders of two people) and the other might shoot a ‘close-up’ (say, the head, or the hands doing something that’s important to the story). The performer will know where his/her close-up camera is, and when making to-audience remarks, will direct his/her eyeline to it (in this example, eyeline means that, when making discursive remarks, as though speaking directly to the viewer, the performer looks at this camera directly). In a documentary setup, performers are asked to ignore the camera(s); this way, the fiction of the invisible observer can be created. There’s more, but you get the idea.
I often use a top shot (a camera suspended over the action) too, because my videos are educational, and people need to see what’s happening in order to imitate. And the sound is usually recorded on one recorder. Very occasionally I use a number of recorders; in that case, each recorder will be recording a single person ‘close up’; all are synced, and when I choose to go to a close-up when cutting the program, then that sound is selected too, so that intimacy is preserved (closer picture requires closer sound, for verisimilitude). Individual mic’ing is not necessary on most low-budget productions. And I define “low-budget” as what we used to call documentary-style shooting: small crew numbers. I usually do everything!
Focusing in the video world
The ‘tap to focus’ aspect mentioned above is one key advantage of the Panasonic range over all competitors: unless you want to ‘go Hollywood’, and use a follow-focus setup (and a second someone skilled to operate it), and have the performers hit their marks within fractions of an inch so they are in focus, you need to be able to focus on the fly. Panasonic’s tap-to-focus system works, and it’s an ‘eased’ movement of the focus point, too (starts slow, speeds up, then slows as it finds focus; this looks very natural). Using AF-C, continuous focus, never works as hoped for, in my experience, anyway. All the AF-C systems, when shooting video, constantly “hunt”: this means that the camera refocuses, and checks, and refocuses again, which is disturbing for the viewer. We use mostly wide angles on our cameras (and as we all know, µ4/3rds has a deeper zone of acceptable focus anyway, compared to FF or APS-C sensors). I use the 14, 17, 20, and 45mm lenses mainly. µ4/3rds shoots truly lovely video; 3D is perfectly possible, as is background separation. And what most people don’t know is that µ4/3rds is close to the ‘sensor size’ of 35mm motion-picture film (which is shot across the negative, not along it); see this video https://www.youtube.com/watch?v=K0shWr-oon4 for details. More on cameras below.
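The 'deeper zone of acceptable focus' can be illustrated with the standard hyperfocal distance formula, H = f²/(N·c) + f. This is only a sketch: the circle-of-confusion values (0.015 mm for µ4/3rds, 0.030 mm for full frame) are commonly quoted conventions assumed here, and the f/2.8 aperture is a hypothetical choice. It compares a 17mm lens on µ4/3rds with a 34mm lens on full frame, which give the same field of view.

```python
def hyperfocal_mm(focal_mm, f_number, coc_mm):
    """Hyperfocal distance in millimetres: H = f^2 / (N * c) + f.
    Focus here and everything from H/2 to infinity is acceptably sharp."""
    return focal_mm ** 2 / (f_number * coc_mm) + focal_mm

if __name__ == "__main__":
    # Assumed circle-of-confusion values; other sources use slightly different ones.
    mft = hyperfocal_mm(17.0, 2.8, 0.015)  # 17mm lens on micro four thirds
    ff = hyperfocal_mm(34.0, 2.8, 0.030)   # 34mm lens on full frame, same view
    print(f"micro 4/3: {mft / 1000:.1f} m, full frame: {ff / 1000:.1f} m")
```

With these assumptions the smaller format reaches 'everything sharp' at about half the focus distance for the same framing and f-number, which is part of why unmanned wide-angle cameras on tripods are forgiving.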
The Slate: tying it all together
Shooting multi-cam, I use an old-fashioned slate (clapper board) at the beginning of the recording process; this has many advantages. Assuming I am recording “live” (actual sound and vision being recorded simultaneously) I have no post problems at all: I bring the sound and all cameras’ vision into FCPX, and sync the cameras’ angles and the sound tracks on the slate. Then, while watching all cameras’ angles simultaneously, I simply decide which angle I want the audience to “see” at any time. All this is non-destructive, and all can be changed. Because sound and vision are already in sync, and the sound is the actual sound, I have no problems in post, and the result sounds real.
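Why the slate works can be shown in a few lines. This is a toy sketch of the idea, not how FCPX or any other editing program does it internally: it assumes each device's audio is a plain list of samples and that the clap is the loudest thing in the stretch being searched. The function names and the tiny sample lists are invented for illustration.

```python
def clap_time_s(samples, sample_rate_hz):
    """Time of the loudest sample, taken to be the clap of the slate.
    Assumption: nothing else in this stretch of audio is louder than the clap."""
    peak_index = max(range(len(samples)), key=lambda i: abs(samples[i]))
    return peak_index / sample_rate_hz

def sync_offsets_s(tracks, reference):
    """How far each track must slide earlier (in seconds) so that its
    clap lines up with the clap on the reference track.
    tracks maps a name to a (samples, sample_rate_hz) pair."""
    ref_samples, ref_rate = tracks[reference]
    ref_time = clap_time_s(ref_samples, ref_rate)
    return {
        name: clap_time_s(samples, rate) - ref_time
        for name, (samples, rate) in tracks.items()
    }

if __name__ == "__main__":
    # Toy data at 4 samples per second so the numbers are easy to follow.
    tracks = {
        "recorder": ([0.0, 0.0, 0.9, 0.1], 4),
        "wide_cam": ([0.0, 0.0, 0.0, 0.0, 0.8, 0.0], 4),
    }
    print(sync_offsets_s(tracks, reference="recorder"))
```

Here the wide camera started rolling half a second before the recorder, so its clap sits 0.5 s later in its own file; slide it 0.5 s earlier and every frame after the clap is in sync with the master sound.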
This is the merest intro to shooting video. The biggest learning curve for the stills pro learning video is not the angles, or the lighting; you have all that. It’s working out what the audience needs to see, to tell the story you want to tell, and how to get the best realistic sound (sound that is perceived as real in relation to the vision you are showing). Getting the sound ‘right’ is the key to good video, yet almost without exception, beginning directors focus on image quality. I have to say this again: an audience will tolerate any crappy quality, image-wise (think black and white, super grainy, choppy vision) but they demand sound as close to ‘being there’ as they can get.
As an aside, in Guy’s runway case, I would take an audio feed from the emcee’s audio, or record the live audio from the audience’s perspective (assuming the live audio is good; often it’s not); you can always get a feed from whoever is doing the sound for the show. I would record this audio with a recorder plugged in to the desk itself; this is what the audience is hearing, after all. And whatever else I record (if I do), this is the ‘safety sound’, meaning that if everything else fails, sound-wise, I have a program at the end of the process.
Post production (factor in 2–10 times the time it took you to shoot!)
The second biggest part of the learning curve for video is in the post production editing programs; I use FCPX. It is an amazing program (I have been using FCP since FCP2) but there was a big learning curve moving from FCP Studio (FCP7) to FCPX. Learning how to edit convincingly is the hardest part of the additional skill set for stills photographers moving to video, IMHO.
Briefly, and on the gear front: I use Panasonic GX-1, G6, GX-7, and an Oly EM-5. The latter is my ‘steadycam’: I attach a monopod, hold it loosely, use a relatively wide angle lens (usu. 34mm EFOV) and move like a ninja; the footage is excellent and cuts perfectly with the rest. I use the other cameras (usually two others, sometimes three) on fixed tripods.
Re. sound recorders: my go-to recorder is the Sony PCM-10; I mount it to a boom, and position it above the action, and record stereo. Sometimes I use recorders that look like USB sticks; I tape these inside a performer’s collar with Elastoplast; this stops clothing sound completely. These record relatively high quality mono .mp3 files; I convert these to 48 kHz/16-bit using a free program (Audacity), so FCPX can ingest them. If recording an event’s sound, you will need to take a ‘line feed’ (this is a higher level than a mic feed; all decent recorders can switch between these). Check the connector needs with the audio dude before production night!
I have one fluid head for any panning/tilting requirements; the other cameras sit on fixed tripods, and usually are unmanned.
Is 4K necessary?
No non-Hollywood director needs more than 1080p video, in my humble opinion. Forget 4K, unless you need extensive post production (reframing, colour, green-screen, etc.). In all my videos for my Vimeo on Demand programs, all we use is the low end of HD, 720p (the ‘p’ means progressive scan, so discrete whole images), shot at 30 frames a second (“fps”). I suggest you forget shooting at 24 fps: unless you obey the physics-determined panning speed specs (way slower than any beginning director ever uses!), horrible choppy video will result. 30fps looks pleasing, and renders movement well. And there is always 60fps, available as an option on many cameras, to render rapid movement nicely.
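One way to see why frame rate matters when panning is to work out how far the picture jumps between one frame and the next. The sketch below is plain arithmetic and does not claim any particular threshold for 'too choppy'; the 4-second pan and the function name are hypothetical, and 1280 pixels is simply the width of a 720p frame.

```python
def pan_step_px(frame_width_px, pan_duration_s, fps):
    """Horizontal jump, in pixels per frame, when a pan moves the scene
    one full frame-width across the picture in pan_duration_s seconds."""
    return frame_width_px / (pan_duration_s * fps)

if __name__ == "__main__":
    # Hypothetical pan: the scene crosses a 720p frame (1280 px wide) in 4 seconds.
    for fps in (24, 30, 60):
        step = pan_step_px(1280, 4.0, fps)
        print(f"{fps} fps: the image shifts {step:.1f} px between frames")
```

For the same pan, 24 fps produces the largest jump between consecutive frames and 60 fps the smallest; slowing the pan down shrinks the jump at any frame rate.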