Are you making it purely 2D in ortho mode? The basic idea behind video rendering is to extract each video frame and upload it as a texture to graphics card memory (btw, this transfer is very slow). I don't know if buffering 5 or so frames helps, but it may be possible to use any "idle time" during playback to load as many frames as time allows. Either way, a simple TMediaPlayer should greatly outperform a graphics-card-based video player.
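
To make the buffering idea a bit more concrete, here is a rough sketch in C of a small fixed-size frame queue that gets topped up during idle time. Everything in it is hypothetical (the Frame struct, the function names, the capacity of 5); a real player would store decoded pixel data in each slot and would do the actual decoding where the comment says so.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 5  /* "5 or so frames"; the exact number is a guess */

/* Hypothetical decoded frame. Only the index is kept here; a real
   player would also hold a pointer to the decoded pixels. */
typedef struct {
    int frame_index;
} Frame;

/* Fixed-size ring buffer of frames waiting to be displayed. */
typedef struct {
    Frame  slots[QUEUE_CAPACITY];
    size_t head;   /* slot of the oldest buffered frame */
    size_t count;  /* number of frames currently buffered */
} FrameQueue;

static void queue_init(FrameQueue *q) {
    q->head = 0;
    q->count = 0;
}

/* Returns false (and stores nothing) when the queue is full. */
static bool queue_push(FrameQueue *q, Frame f) {
    if (q->count == QUEUE_CAPACITY) return false;
    q->slots[(q->head + q->count) % QUEUE_CAPACITY] = f;
    q->count++;
    return true;
}

/* Returns false when there is no buffered frame to show. */
static bool queue_pop(FrameQueue *q, Frame *out) {
    if (q->count == 0) return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}

/* Call this during idle time: buffers up to `budget` more frames,
   stopping early when the queue is full. Returns how many were added. */
static int prefetch_frames(FrameQueue *q, int *next_index, int budget) {
    int added = 0;
    while (added < budget) {
        Frame f = { *next_index };  /* a real player would decode the frame here */
        if (!queue_push(q, f)) break;
        (*next_index)++;
        added++;
    }
    return added;
}
```

The render loop would call queue_pop() once per displayed frame and prefetch_frames() whenever it has time to spare. Whether this actually hides the upload cost depends on the video and the hardware.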

Drawing the frame itself in position, whether it's 2D or 3D, is a trivial task: use glBegin(GL_QUADS), then glTexCoord2f() and glVertex2f() for each corner, and finally glEnd().
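
A minimal sketch of that quad in C, under a few assumptions: an ortho projection with the origin at the top-left (for example glOrtho(0, width, height, 0, -1, 1)), a frame texture that was uploaded top row first, and the legacy fixed-function pipeline. The names QuadCorner, frame_quad and draw_frame are made up for illustration. The GL part is only compiled when USE_OPENGL is defined, so the corner calculation can be checked without a GL context.

```c
/* Define USE_OPENGL and link against OpenGL to build draw_frame().
   On Windows, <windows.h> has to be included before <GL/gl.h>. */
#ifdef USE_OPENGL
#include <GL/gl.h>
#endif

/* One corner of the quad: texture coordinate (s, t) and position (x, y). */
typedef struct {
    float s, t;
    float x, y;
} QuadCorner;

/* Fills the four corners of a frame drawn at (x, y) with size w x h.
   Order: top-left, top-right, bottom-right, bottom-left (y grows downward). */
static void frame_quad(float x, float y, float w, float h, QuadCorner out[4]) {
    out[0] = (QuadCorner){ 0.0f, 0.0f, x,     y     };
    out[1] = (QuadCorner){ 1.0f, 0.0f, x + w, y     };
    out[2] = (QuadCorner){ 1.0f, 1.0f, x + w, y + h };
    out[3] = (QuadCorner){ 0.0f, 1.0f, x,     y + h };
}

#ifdef USE_OPENGL
/* Draws the current frame texture as one textured quad. */
static void draw_frame(GLuint texture, float x, float y, float w, float h) {
    QuadCorner c[4];
    frame_quad(x, y, w, h, c);

    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texture);
    glBegin(GL_QUADS);
    for (int i = 0; i < 4; i++) {
        glTexCoord2f(c[i].s, c[i].t);
        glVertex2f(c[i].x, c[i].y);
    }
    glEnd();
}
#endif
```

If the picture comes out upside down, the projection or the row order of the frame data differs from what is assumed here, and swapping the t values (0 and 1) fixes it.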

I don't know the best way to load a 2D image; I wrote functions myself for each format, like BMP, JPG, PNG, and GIF. But there may be some public libraries available.
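
For a feel of what a hand-written loader involves, here is a rough sketch in C of the first step for BMP, the simplest of those formats: reading the header fields from a file that is already in memory. It is not the code from my own functions; the names BmpInfo and bmp_parse_header are made up, and it only handles files whose info header is at least 40 bytes (BITMAPINFOHEADER or newer). Palettes, compression and copying the pixel rows are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The header fields needed before the pixels can be read. */
typedef struct {
    uint32_t pixel_offset;    /* byte offset of the pixel data in the file */
    int32_t  width;
    int32_t  height;          /* negative means rows are stored top-down */
    uint16_t bits_per_pixel;
} BmpInfo;

/* BMP stores its numbers in little-endian order. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_u16_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parses the 14-byte file header and the start of the info header.
   Returns false if the data is too short or is not a supported BMP. */
static bool bmp_parse_header(const uint8_t *data, size_t size, BmpInfo *out) {
    if (size < 54) return false;                         /* 14 + 40 bytes */
    if (data[0] != 'B' || data[1] != 'M') return false;  /* file signature */
    if (read_u32_le(data + 14) < 40) return false;       /* older header types not handled */

    out->pixel_offset   = read_u32_le(data + 10);
    out->width          = (int32_t)read_u32_le(data + 18);
    out->height         = (int32_t)read_u32_le(data + 22);
    out->bits_per_pixel = read_u16_le(data + 28);
    return true;
}
```

After this, the pixel rows start at pixel_offset, and each row is padded to a multiple of 4 bytes. JPG, PNG and GIF need real decoders, which is where an existing library saves a lot of work.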