I learnt an interesting thing last week, the day after I had been working on some DVD Studio Pro subtitles. I had used Excel and BBEdit (from Bare Bones Software) to do some find and replace operations in order to restructure a subtitle file that had been sent to a friend of mine. As it turned out, my familiarity with Excel was good enough to get the file into the correct format, but there were still some bits and pieces that needed tidying up manually. With several hundred lines of subtitles to work through, that was always going to be a challenge, so I stopped at that point. The file would import into DVD Studio Pro fine, and so I felt reasonably OK that the job was close enough to allow my friend to finish it off without too much trouble.
I mentioned the struggle I had with this to my colleague Alex Blanc, and he suggested I use ‘Grep’. Now, I have seen the checkbox in BBEdit’s ‘find and replace’ dialogue, and I had a vague idea that it worked with regular expressions, but I had not taken the time to read up on it or try it out. I was, in fact, completely in the dark.
Which is a reasonable place to be, for a lot of the time, when it comes to really geeky things!
Alex, however, set about showing me how to use Grep. It turns out to be a really flexible pattern-matching language (regular expressions, in other words) with a small instruction set that you can adapt and use to do very intricate find and replace routines. For example, in the entry about DVD Studio Pro subtitles (a couple earlier than this one) you’ll see that the file format was all wrong. There was a full stop towards the end of the timecode that should have been a colon, but I couldn’t simply use find and replace to change all full stops to colons because there were also full stops within the subtitles themselves. However, using Grep you can write a search that looks only for full stops nestling between digits, and you can be pretty explicit about the string of digits before and after the full stops that you want to replace.
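To make that idea concrete outside BBEdit, here is a minimal sketch in Python, whose built-in regular expressions use much the same notation. The sample line is made up for illustration and is not taken from the real subtitle file; the point is only that a full stop gets replaced when it sits between an hours:minutes:seconds group and a two-digit frame count, and is left alone everywhere else.

```python
import re

# Hypothetical sample line; the real subtitle file's layout may differ.
line = "00:01:15.10 00:01:18.05 Hello there. How are you?"

# Match a full stop only when it follows hh:mm:ss and precedes two digits.
# The parentheses capture the digits on either side so they can be reused.
timecode_stop = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{2})")

# \1 and \2 put the captured digits back, with a colon between them.
fixed = timecode_stop.sub(r"\1:\2", line)
print(fixed)
```

The full stops inside the subtitle text survive because they are not surrounded by the digit pattern the search insists on.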
When complete, the search field in BBEdit looks a bit odd, using strings like this:
^\s\d{1,3}:\s(\d{2}:\d{2}:\d{2})\.(\d{2})\s(\d{2}:\d{2}:\d{2})\.(\d{2})\s*\r\s*(.+)$
But when you understand what it is saying you can really get a lot out of it. Each piece of that string is a separate instruction. For example, ‘^’ anchors the match to the start of a line, ‘\s’ looks for a whitespace character, ‘\d’ looks for a digit, and the curly braces say how many digits to expect. Reading along the line, then, the search looks for a whitespace character and one to three digits followed by a colon, then another whitespace character, and then a distinct set of digits in the hours:minutes:seconds pattern (notice how the distinct set is held in parentheses), followed by a full stop and another two digits, also in parentheses... and so it goes on.
What that string is doing is describing the general format of the incorrect timecode from the subtitle file and setting up sections that we can rely on when replacing the content.
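As a rough sketch of how those parenthesised sections get reused, here is the same search string driven from Python rather than BBEdit. Several things here are assumptions for illustration: the sample text is invented, the line break is written as ‘\n’ where the BBEdit pattern has ‘\r’, and the replacement layout (timecodes and text separated by commas) is simply one plausible target, not necessarily the replace string Alex used.

```python
import re

# Invented sample in the "wrong" layout described above.
sample = (
    " 1: 00:00:01.00 00:00:03.12\n"
    "Hello there. How are you?\n"
    " 2: 00:00:04.00 00:00:06.05\n"
    "Fine, thanks.\n"
)

# The search string from the post, with \n standing in for BBEdit's \r.
# Groups 1-4 hold the two timecodes (split at the full stop); group 5 is the text.
pattern = re.compile(
    r"^\s\d{1,3}:\s(\d{2}:\d{2}:\d{2})\.(\d{2})"
    r"\s(\d{2}:\d{2}:\d{2})\.(\d{2})\s*\n\s*(.+)$",
    re.MULTILINE,  # let ^ and $ match at every line, not just the whole text
)

# Assumed output layout: rebuild each entry from the captured groups,
# swapping the full stop in each timecode for a colon.
result = pattern.sub(r"\1:\2 , \3:\4 , \5", sample)
print(result)
```

Each numbered group in the replacement refers back to one pair of parentheses in the search, which is what makes it safe to rearrange the pieces without touching the subtitle text itself.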
Now, Alex is particularly gifted with this stuff, in my opinion, and can see the way through a lot of the code like he was part of the Matrix. I, on the other hand, tend to need time to go through it all slowly after being shown it so that I can then begin to understand it in my own time. I think it’s an age thing…
Anyway, Alex had the Grep string written in less than ten minutes, having tried a couple of versions first to get it working. The end result was that, using only BBEdit, he had restructured the subtitle file perfectly, whereas mine still had issues that needed sorting out manually.
I’m not sure what message this gives me, really. I am now reading through the Grep manual that comes with BBEdit and it makes a lot of sense. It can do ‘If:Then’ type structures and you can nest commands quite deep. This gives it a lot of power and more flexibility than you can get by using Excel alone. Those of you reading this who are quietly chuckling away because you have been using Grep in Linux or Unix installations should share a bit of that knowledge… it really is a very useful tool, if slightly arcane to get to grips with!