Voice notes and Transcriva

At work, I often need to summarize long engineering reports and other documents sent to me by my clients. I could use paper and pen to take notes as I go through a document, but because that’s relatively slow I find myself being overly terse. Also, since I like having my notes in electronic form, I’d have to retype them. I could type my notes directly into the computer, but that’s usually clumsy because both the document and the keyboard have to be in front of me. So I’ve chosen to use a voice recorder and transcription software.

The voice recorder I use is an Olympus DS-2. It’s now listed as an “archived product” on the Olympus web site, which I guess is an interesting way of saying they don’t make it anymore.

The DS-2 is reasonably compact: about 4¾ inches long and 1½ inches wide. It runs on two AAA batteries and has jacks for an external microphone and earphones. There’s a Hold slider switch on the left side that, much like the Hold slider on the iPod, disables the other buttons and prevents the recorder from being accidentally activated while in your pocket or bag.

The DS-2 can be set to voice-activated recording, but I find that doesn’t work too well for me; the beginnings of my sentences tend to get clipped. Instead, I use the regular Record button on the front panel. It acts as a Pause button, too: pressing it once starts recording, pressing it again pauses, the next press records again, and so on until I push the Stop button. The advantage of recording this way is that it saves everything in a single file. Toggling between the Record and Stop buttons creates a separate file for each little snippet of audio, which is a nightmare to deal with later.

The recorder’s menu system is accessed through some multi-purpose buttons on the right side of the recorder. It’s not an especially good user interface, but I’m used to the functions I use the most and can generally slog through the others.

The DS-2 saves its audio files in “folders.” There are five available folders, with the clever and unchangeable names A, B, C, D, and E. I suppose if I did a lot of recording at once this would be a good way to separate notes on different projects. But my habit is to download the files to my computer almost immediately, so I’ve never used the folders.

I keep the DS-2’s cradle on my desk, connected to my computer through a USB cord. When I set the DS-2 in its cradle, it shows up as a USB drive on my Mac, and the audio files can be dragged to the computer. The cable can also be connected directly to the DS-2, so I don’t need to bring the cradle if I want to record and transcribe on the road.

The audio files are, unfortunately, in WMA format, which isn’t very Apple-friendly. But since I have the Perian QuickTime extender installed, the files can be listened to by any application that uses QuickTime.

I don’t use speech recognition software to turn my audio notes into text because:

  1. I don’t like having to speak all the punctuation—PERIOD, COMMA, NEW PARAGRAPH, etc.—as I dictate.
  2. My recorded notes need lots of editing because, as I mentioned in this post, words don’t flow out of me in anything close to final form.
  3. MacSpeech’s Dictate software doesn’t accept recorded audio; you have to talk into a microphone connected to the computer running Dictate.

Instead, I use Transcriva, a transcription program that plays the audio notes files to me as I type them up. To use it, I just drag the audio file into a new Transcriva window and start listening and typing. I recast my sentences and clean up the awkward phrases as I type.

Transcriva has keyboard shortcuts for Play/Pause, Skip Forward, and Skip Backward, and you can change the default shortcuts to your liking through its Preferences.

Thankfully for slower typists like me, Transcriva allows the playback speed to be changed; there’s a little rabbit-to-bunny slider in the lower left corner of the window. No resampling is done when the playback speed is changed, so the pitch changes, too. I tend to sound a little Darth Vaderish as I slow the audio down so my typing can keep up.
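For a rough sense of how far the pitch drops, here’s a small sketch. It assumes Transcriva simply plays the same samples at a different rate, which is what the pitch change suggests but isn’t something I’ve confirmed. Under that assumption every frequency is scaled by the speed factor, and the shift in semitones is 12 times the base-2 log of that factor. The function name is my own.

```python
import math

def pitch_shift_semitones(speed: float) -> float:
    """Pitch change, in semitones, when audio is played at `speed`
    times normal speed with no pitch correction.

    Assumes plain rate-changed playback, so every frequency is
    multiplied by `speed`. Negative results mean a lower pitch.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return 12 * math.log2(speed)
```

At half speed, `pitch_shift_semitones(0.5)` comes out to a full octave down, which is about where the Darth Vader effect kicks in.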

As you can see in the screenshot above, Transcriva puts a timestamp at the top of each transcription snippet. You can also use color-coding to distinguish different speakers, which would be useful for transcribing interviews.

The transcription text can be exported as either Rich Text or Plain Text. Since my goal is to get the transcribed notes into my project notes wiki, I type up the transcription in Markdown format and use the Plain Text exporter. Unfortunately, this is where two of Transcriva’s idiosyncrasies get in my way:

  1. Transcriva insists on sticking a .txt extension on the name of the exported file, even when I tell it to use .md (for Markdown) instead. So I have to fix the file name to get it to work with my project notes wiki.
  2. Transcriva saves the text file with the old-style CR line endings instead of the current Unix-style LF line endings. This is a hidden problem that can screw up the Markdown processing that the file later goes through. To change the file to the more orthodox line endings, I run it through a converter program that changes the CRs to LFs. (I could also open it in TextMate or BBEdit and resave it with the correct line endings. If you’re unfamiliar with the line endings issue, this Wikipedia article gives a pretty good rundown.)
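Both fixes can be handled in one step. What follows is a minimal sketch, not the converter program I mentioned; the function names are made up for illustration. It reads the exported file as bytes, turns CR line endings into LF (handling CRLF first, in case a file has those), and writes the result next to the original with an .md extension.

```python
from pathlib import Path

def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    # Replace CRLF first so each one becomes a single LF, not two.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def fix_export(txt_path: str) -> Path:
    """Write an LF-terminated .md copy of a plain text export.

    The original .txt file is left untouched; the new file sits
    beside it with the same base name.
    """
    src = Path(txt_path)
    dst = src.with_suffix(".md")
    dst.write_bytes(normalize_newlines(src.read_bytes()))
    return dst
```

Calling `fix_export("report-notes.txt")` (a hypothetical file name) would leave a `report-notes.md` file with LF line endings in the same folder. Working on bytes rather than text sidesteps any guessing about the file’s character encoding.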

Overall, I’ve found this to be a good way to make notes on long documents. With the recorder, I can make the notes quickly almost anywhere; I don’t need a computer or even a solid writing surface. And the later transcription lets me think about what I said and fix up the stumbles, redundancies, and badly constructed sentences.