Since cutscenes reuse a lot of similar elements, it might be a good idea to write a small bytecode interpreter that reads the cutscene description at runtime. For example, you could write {0x00,0x01,5,5}, where 0x00 means "fade screen from white" and 0x01 means "move camera to position", with the next two bytes being the x/y coordinates.
Of course, you'd need to program all these routines manually. I wrote such a system for the Gamebuino community RPG thing; the code can be found here.
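To make the idea concrete, here is a minimal sketch of what such an interpreter loop might look like in C. This is not the code from the linked project. The opcode values 0x00 and 0x01 follow the example above, but the names `CutsceneState`, `run_cutscene`, `fade_from_white` and `move_camera` are made up for illustration, and the routines only print and record what they would do.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical opcodes; the values match the example in the post. */
enum {
    OP_FADE_FROM_WHITE = 0x00, /* no operands */
    OP_MOVE_CAMERA     = 0x01  /* two operand bytes: x, y */
};

/* Hypothetical state that the cutscene routines act on. */
typedef struct {
    int     fades_done;
    uint8_t camera_x;
    uint8_t camera_y;
} CutsceneState;

/* Stand-ins for the routines you would have to write by hand. */
static void fade_from_white(CutsceneState *s)
{
    s->fades_done++;
    printf("fade screen from white\n");
}

static void move_camera(CutsceneState *s, uint8_t x, uint8_t y)
{
    s->camera_x = x;
    s->camera_y = y;
    printf("move camera to (%u, %u)\n", (unsigned)x, (unsigned)y);
}

/* Walks the bytecode from start to end, dispatching on each opcode.
 * Returns 0 on success, -1 on an unknown opcode or missing operands. */
int run_cutscene(const uint8_t *code, size_t len, CutsceneState *s)
{
    size_t pc = 0; /* index of the next byte to read */

    while (pc < len) {
        uint8_t op = code[pc++];

        switch (op) {
        case OP_FADE_FROM_WHITE:
            fade_from_white(s);
            break;
        case OP_MOVE_CAMERA:
            if (len - pc < 2) {
                return -1; /* script ends before both coordinates */
            }
            move_camera(s, code[pc], code[pc + 1]);
            pc += 2;
            break;
        default:
            return -1; /* opcode this interpreter does not know */
        }
    }
    return 0;
}
```

Passing the array {0x00,0x01,5,5} with length 4 to `run_cutscene` fades the screen first and then moves the camera to (5, 5). Adding a new cutscene element means adding one opcode, one routine, and one `case`. A real game would probably spread the script over several frames instead of running it in a single loop.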