Big Blue Button: webcam, chat and presentation in one video

It is well known that BBB doesn’t support downloading a recording as a single file. The main reason is that a single video ruins the experience (no jumping to specific slides) and is not flexible (you can’t rearrange or resize content). The other reason is that this kind of content, together with its usability requirements, adds complexity, which is why the playback page exists.

While I personally really enjoy the playback page, I’m constantly bombarded with “can you send me the recording?” questions. I’m tired of having to say no or give lengthy explanations of why I can’t give users a single video file they can download or keep on portable storage. This motivated me to look at how BBB 2.0’s published recordings work, and this post is my solution to the problem. It only handles basic usage (no pan/zoom, no cursor and no deskshare), but it covers the most common use case in my experience and I hope it’ll be as useful to someone else as it has been to me.

import xml.etree.ElementTree as ET
import cairocffi
import pangocffi
import pangocairocffi
import os.path
import shlex
import subprocess
import sys

The shapes

In the published recording folder, BBB stores when and what slide was displayed in the shapes.svg file. This file has a bunch of <image> tags that describe when and what file should be displayed (attributes xlink:href, in and out). Here we’ll parse shapes.svg and generate an appropriate input file for FFmpeg’s concat demuxer, which we’ll use later to generate a video out of the presentation.

def parse_shapes(directory):
    tree = ET.parse(os.path.join(directory, "shapes.svg"))
    root = tree.getroot()
    
    shapes = []
    for x in root.findall('{http://www.w3.org/2000/svg}image'):
        shapes.append({
            'start': float(x.attrib["in"]),
            'end': float(x.attrib["out"]),
            'file': os.path.realpath(os.path.join(directory, x.attrib["{http://www.w3.org/1999/xlink}href"])),
        })
    return shapes
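
To make the parsing concrete, here is a minimal sketch that runs the same lookup against a hand-written fragment. The sample SVG is an assumption modeled on the three attributes the function reads; a real shapes.svg carries more attributes and elements than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-slide fragment, not copied from a real recording.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <image in="0.0" out="12.5" xlink:href="presentation/deck/slide-1.png"/>
  <image in="12.5" out="40.0" xlink:href="presentation/deck/slide-2.png"/>
</svg>"""

# ElementTree spells namespaced names as "{namespace-uri}localname".
SVG_NS = "{http://www.w3.org/2000/svg}"
XLINK_NS = "{http://www.w3.org/1999/xlink}"

root = ET.fromstring(SAMPLE_SVG)
slides = [
    (img.attrib[XLINK_NS + "href"], float(img.attrib["in"]), float(img.attrib["out"]))
    for img in root.findall(SVG_NS + "image")
]

for href, start, end in slides:
    print(f"{href}: shown for {end - start} seconds")
```

Note that in and out are plain (un-namespaced) attributes, while href lives in the xlink namespace, which is why the two are looked up differently.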

concat expects a file with the syntax below. The duration field is optional, but we have to use it or else each slide will only flash by for a frame or two.

file 'file1'
duration 2
file 'file2'
duration 15

def write_shape_inputs(directory, shapes):
    try:
        os.mkdir(os.path.join(directory, "temp"))
    except FileExistsError:
        pass
    
    with open(os.path.join(directory, "temp", "shapes.in"), "w") as f:
        for shape in shapes:
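            # Inside single quotes FFmpeg takes everything literally, so a quote
            # is written by closing the string, escaping it and reopening: '\''
            filename = shape["file"].replace("'", r"'\''")
            print(f"file '{filename}'", file=f)
            # Keep fractional seconds; truncating to int makes the slides drift early
            print(f"duration {shape['end'] - shape['start']:.3f}", file=f)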
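
Paths containing a single quote need some care in this file format. As far as I understand FFmpeg’s quoting rules, nothing can be escaped inside a single-quoted string, so the usual trick is to close the string, add an escaped quote and reopen it. The sketch below shows what one entry looks like under that assumption; concat_entry is a hypothetical helper name, not something used elsewhere in this post.

```python
def concat_entry(path, duration):
    """Return the two concat-demuxer lines for one image (sketch)."""
    # ' becomes '\'' : close the quoted string, escaped quote, reopen
    escaped = path.replace("'", "'\\''")
    return f"file '{escaped}'\nduration {duration:.3f}\n"


print(concat_entry("/tmp/it's a slide.png", 12.5), end="")
```

Formatting the duration with three decimals keeps the fractional seconds that shapes.svg provides.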

The chat problem

BBB stores the chat contents in slides_new.xml as a series of <chattimeline> tags containing information about who said what and when (attributes name, message and in). Judging by the target attribute of those tags, my guess is that this file is also used for other things, such as questions, but we don’t care about them here so we’ll only parse entries targeting chat.

def parse_chat(directory):
    tree = ET.parse(os.path.join(directory, "slides_new.xml"))
    root = tree.getroot()

    messages = []
    
    for x in root.findall("chattimeline"):
        if x.attrib["target"] == "chat":
            messages.append({
                'timestamp': int(x.attrib["in"]),
                'name': x.attrib["name"],
                'text': x.attrib["message"]
            })
    
    return messages
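
Here is a small sketch of the same filtering run on an inline sample. The root tag name and the non-chat target value are made up for illustration; only the attribute names (in, name, message, target) come from the function above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; root tag and the "other" target are assumptions.
SAMPLE_XML = """<popcorn>
  <chattimeline in="5" name="Alice" message="Hello!" target="chat"/>
  <chattimeline in="9" name="Bob" message="ignored" target="other"/>
  <chattimeline in="9" name="Bob" message="Hi &amp; welcome" target="chat"/>
</popcorn>"""

root = ET.fromstring(SAMPLE_XML)
messages = [
    (int(entry.attrib["in"]), entry.attrib["name"], entry.attrib["message"])
    for entry in root.findall("chattimeline")
    if entry.attrib["target"] == "chat"
]

for timestamp, name, text in messages:
    print(f"[{timestamp:>4}s] {name}: {text}")
```

ElementTree decodes XML entities while parsing, so the last message comes back as "Hi & welcome". That matters later if the text is fed to something that parses markup again.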

So far so good, but how do we display this text on the video? FFmpeg’s drawtext seemed like a good idea at first, but getting hundreds of messages to display at specific times proved challenging even using the timeline editing feature and I don’t want to get into how to get text scrolling.

I worked around the issue by rendering the messages to PNG files and then generating an input file for the concat demuxer, which turns them into a video we can use later.

def write_chat_frames(directory, messages):
    try:
        os.mkdir(os.path.join(directory, "temp"))
    except FileExistsError:
        pass
    
    surf = cairocffi.ImageSurface(cairocffi.FORMAT_ARGB32, 640, 540)
    context = cairocffi.Context(surf)
    layout = pangocairocffi.create_layout(context)
    layout.set_width(pangocffi.units_from_double(636)) # 2px before and after
    
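    def format_msg(msg):
        # Escape &, < and > so message text is not parsed as Pango markup
        from html import escape
        name = escape(msg['name'], quote=False)
        text = escape(msg['text'], quote=False)
        return f"<b>{name}</b>: {text}\r\n"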
    
    last_timestamp = 0
    markup = ""
    chat_frames = []
    for i in range(0, len(messages)):
        cur_timestamp = messages[i]['timestamp']
        if last_timestamp != cur_timestamp:
            context.rectangle(0, 0, 640, 540)
            context.set_source_rgb(1, 1, 1)
            context.fill()

            context.set_source_rgb(0, 0, 0)
            layout.set_markup(markup)

            height = pangocffi.units_to_double(layout.get_extents()[0].height)
            context.move_to(2, 540 - height)
            pangocairocffi.show_layout(context, layout)

            filename = os.path.realpath(os.path.join(directory, "temp", f"chat_{i}.png"))
            chat_frames.append((filename, cur_timestamp - last_timestamp))

            with open(filename, "wb") as image_file:
                surf.write_to_png(image_file)

        markup += format_msg(messages[i])

        last_timestamp = cur_timestamp
    
    return chat_frames
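
Because rendering and bookkeeping are interleaved in the loop above, the timing can be hard to follow. This sketch keeps only the timing part, with no drawing; chat_timeline is a hypothetical name and the timestamps are invented.

```python
def chat_timeline(timestamps):
    """Return (messages_visible, duration) pairs for sorted timestamps."""
    frames = []
    last = 0
    for i, current in enumerate(timestamps):
        if current != last:
            # Messages 0..i-1 were on screen from `last` until `current`.
            frames.append((i, current - last))
        last = current
    return frames


print(chat_timeline([5, 9, 9, 30]))
```

Each frame shows the chat as it was before the message that triggers it, and the durations add up to the timestamp of the last message. One consequence, as far as I can tell, is that no frame is ever emitted containing the final message (or messages sharing the final timestamp); appending one more frame after the loop would be the place to handle that if it matters to you.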

def write_chat_input(directory, chat_frames):
    try:
        os.mkdir(os.path.join(directory, "temp"))
    except FileExistsError:
        pass
    
    with open(os.path.join(directory, "temp", "chat.in"), "w") as f:
        for filename, duration in chat_frames:
            # A quote inside a single-quoted FFmpeg string is written as '\''
            filename = filename.replace("'", r"'\''")
            print(f"file '{filename}'", file=f)
            print(f"duration {duration}", file=f)

Wrapping it up

Time to put the code to work. All functions assume you’ll be passing a path to a folder generated by BBB’s publishing process. This means the webcam video will get encoded twice, but that’s a price I’m willing to pay since that video is tiny anyway. You’ll probably find those folders inside /var/bigbluebutton/published/presentation.

We’ll start by generating the chat frames and concat demuxer files for both chat and slides:

# directory where the published presentation is
PRESENTATION_DIR = "977d4a145cd70f8b013d723cedbab0ba4b8bba74-1585911377811"

# for education purposes, only encode the first 10 minutes. make it [] for the real thing
EXTRA_OPTS = ["-t", "600"]

print("Generating shapes.in...")
shapes = parse_shapes(PRESENTATION_DIR)
write_shape_inputs(PRESENTATION_DIR, shapes)
print(f"Processed {len(shapes)} frames")

print("Generating chat frames and chat.in...")
messages = parse_chat(PRESENTATION_DIR)
chat_frames = write_chat_frames(PRESENTATION_DIR, messages)
write_chat_input(PRESENTATION_DIR, chat_frames)
print(f"Processed {len(chat_frames)} frames")

Generating shapes.in...
Processed 41 frames
Generating chat frames and chat.in...
Processed 102 frames

Now what’s left to do is to convert all the pictures to videos. The chat frames are all ready to go but we’ll need to use the scale and pad FFmpeg filters during the encode in order to fit the slides in 1280x1080.
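
To see what scale and pad do to a slide, the arithmetic can be sketched in plain Python. This is my approximation of the filter expressions used in the command (never upscale, shrink while keeping the aspect ratio, then center); fit_and_pad is a hypothetical helper and FFmpeg’s own rounding may differ by a pixel.

```python
def fit_and_pad(iw, ih, box_w=1280, box_h=1080):
    """Approximate scale+pad: return (scaled_w, scaled_h, pad_x, pad_y)."""
    # min(1.0, ...) mirrors min(1280,iw) / min(1080,ih): never enlarge a slide.
    ratio = min(1.0, box_w / iw, box_h / ih)
    w, h = round(iw * ratio), round(ih * ratio)
    # (ow-iw)/2 and (oh-ih)/2 from the pad filter center the scaled slide.
    return w, h, (box_w - w) // 2, (box_h - h) // 2


print(fit_and_pad(1600, 1200))  # a 4:3 slide larger than the box
print(fit_and_pad(800, 600))    # a slide smaller than the box
```

A 1600x1200 slide shrinks to 1280x960 and gets 60 rows of padding above and below, while an 800x600 slide keeps its size and is simply centered.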

shapes_in = os.path.join(PRESENTATION_DIR, "temp", "shapes.in")
shapes_mp4 = os.path.join(PRESENTATION_DIR, "temp", "shapes.mp4")

print("Generating shapes.mp4... (this might take a while)")
with open(os.path.join(PRESENTATION_DIR, "temp", "shapes_mp4.log"), "w") as logfile:
    # Arguments are passed as a list (no shell), so paths must not be shell-quoted here
    cmd = ["ffmpeg", "-y", "-hide_banner",
           "-f", "concat", "-safe", "0", "-i", shapes_in,
           "-vf", "scale='min(1280,iw)':'min(1080,ih)':force_original_aspect_ratio=decrease,pad=1280:1080:(ow-iw)/2:(oh-ih)/2",
           "-preset", "veryfast", "-tune", "stillimage", "-pix_fmt", "yuv420p"]
    cmd = cmd + EXTRA_OPTS + [shapes_mp4]

    # Quote only for display, so the printed command can be pasted into a shell
    print(f"    {' '.join([shlex.quote(c) for c in cmd])}")
    proc = subprocess.run(cmd, stdout=logfile, stderr=logfile, universal_newlines=True)
print("Done")

Generating shapes.mp4... (this might take a while)
    ffmpeg -y -hide_banner -f concat -safe 0 -i 977d4a145cd70f8b013d723cedbab0ba4b8bba74-1585911377811/temp/shapes.in -vf 'scale='"'"'min(1280,iw)'"'"':'"'"'min(1080,ih)'"'"':force_original_aspect_ratio=decrease,pad=1280:1080:(ow-iw)/2:(oh-ih)/2' -preset veryfast -tune stillimage -pix_fmt yuv420p -t 600 977d4a145cd70f8b013d723cedbab0ba4b8bba74-1585911377811/temp/shapes.mp4
Done

chat_in = os.path.join(PRESENTATION_DIR, "temp", "chat.in")
chat_mp4 = os.path.join(PRESENTATION_DIR, "temp", "chat.mp4")

print("Generating chat.mp4... (this might take a while)")
with open(os.path.join(PRESENTATION_DIR, "temp", "chat_mp4.log"), "w") as logfile:
    # Arguments are passed as a list (no shell), so the paths are given unquoted
    cmd = ["ffmpeg", "-y", "-hide_banner", "-f", "concat", "-safe", "0", "-i", chat_in,
           "-preset", "veryfast", "-tune", "stillimage", "-pix_fmt", "yuv420p"]
    cmd = cmd + EXTRA_OPTS + [chat_mp4]
    
    print(f"    {' '.join([shlex.quote(c) for c in cmd])}")
    proc = subprocess.run(cmd, stdout=logfile, stderr=logfile, universal_newlines=True)
print("Done")

Generating chat.mp4... (this might take a while)
    ffmpeg -y -hide_banner -f concat -safe 0 -i 977d4a145cd70f8b013d723cedbab0ba4b8bba74-1585911377811/temp/chat.in -preset veryfast -tune stillimage -pix_fmt yuv420p -t 600 977d4a145cd70f8b013d723cedbab0ba4b8bba74-1585911377811/temp/chat.mp4
Done

BBB’s webcams.webm file has a resolution of 640x480 and the sole reason to pick 640x540 for the chat and 1280x1080 for the shapes is to be able to lay all three out in the following configuration while outputting a 1920x1080 video:

######################
#             # cams #
#             #      #
#  slides     ########
#             # chat #
#             #      #
#             #      #
######################

The command follows shortly, but first I want to explain the value I passed to -filter_complex. I split it into separate filter chains so it’s a little easier to read:

[0:v]pad=1920:1080:x=0:y=-1[padded];: Pick the video stream from input #0 (shapes.mp4) and pad it to 1920x1080, keeping the slides at the left edge (x=0). A negative y tells pad to center the input vertically, which changes nothing here because the slides are already 1080 pixels tall. Name the output [padded].
[padded][1:v]overlay=x=1280:y=480[overlaid];: Take [padded] and lay the video stream from input #1 (chat.mp4) over it at the coordinates (1280, 480). Name the output of this operation [overlaid].
[overlaid][2:v]overlay=x=1280[out]: Take [overlaid] and lay the video stream from input #2 (webcams.webm) over it at the coordinates (1280, 0). Name the output [out] (this will end up in the output file thanks to the -map argument).
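
The three chains and the layout they produce can be written down as data, which makes the geometry easy to check. This is only a sketch: the LAYOUT dictionary and its region names are my own notation, while the coordinates and sizes are the ones used in this post.

```python
# (x, y, width, height) of each region on the 1920x1080 canvas
LAYOUT = {
    "slides": (0, 0, 1280, 1080),
    "cams": (1280, 0, 640, 480),
    "chat": (1280, 480, 640, 540),
}

chains = [
    "[0:v]pad=1920:1080:x=0:y=-1[padded]",
    "[padded][1:v]overlay=x=1280:y=480[overlaid]",
    "[overlaid][2:v]overlay=x=1280[out]",
]
# Filter chains in a filtergraph are separated by semicolons.
filter_complex = ";".join(chains)

# Rows of the right-hand column that neither the cams nor the chat cover.
uncovered_rows = 1080 - (LAYOUT["cams"][3] + LAYOUT["chat"][3])

print(filter_complex)
print(f"right column rows left as padding: {uncovered_rows}")
```

One thing the arithmetic shows is that 480 + 540 only adds up to 1020, so the bottom 60 rows of the right-hand column stay as padding. A 640x600 chat surface would fill the column completely, if that bothers you.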

webcams_webm = os.path.join(PRESENTATION_DIR, "video", "webcams.webm")
conference_mp4 = os.path.basename(PRESENTATION_DIR) + "_output.mp4"

print(f"Generating {conference_mp4}... (this might take even longer than the previous two)")
with open(os.path.join(PRESENTATION_DIR, "temp", "output_mp4.log"), "wb") as logfile:
    # Arguments are passed as a list (no shell), so the paths are given unquoted
    cmd = ["ffmpeg", "-y", "-i", shapes_mp4, "-i", chat_mp4,
           "-i", webcams_webm,
           "-filter_complex", "[0:v]pad=1920:1080:x=0:y=-1[padded];[padded][1:v]overlay=x=1280:y=480[overlaid];[overlaid][2:v]overlay=x=1280[out]",
           "-map", "[out]", "-c:v", "libx264", "-preset", "slow", "-profile:v", "high", "-level", "4.0", "-movflags", "+faststart",
           "-crf", "18", "-bf", "2", "-coder", "1",
           "-map", "2:a", "-c:a", "aac", "-ac", "2", "-cpu-used", "0"]
    cmd = cmd + EXTRA_OPTS + [conference_mp4]
    
    print(f"    {' '.join([shlex.quote(c) for c in cmd])}")
    proc = subprocess.run(cmd, stdout=logfile, stderr=logfile, universal_newlines=True)

print("All done!")

Generating 977d4a145cd70f8b013d723cedbab0ba4b8bba74-1585911377811_output.mp4... (this might take even longer than the previous two)
    ffmpeg -y -i 977d4a145cd70f8b013d723cedbab0ba4b8bba74-1585911377811/temp/shapes.mp4 -i 977d4a145cd70f8b013d723cedbab0ba4b8bba74-1585911377811/temp/chat.mp4 -i 977d4a145cd70f8b013d723cedbab0ba4b8bba74-1585911377811/video/webcams.webm -filter_complex '[0:v]pad=1920:1080:x=0:y=-1[padded];[padded][1:v]overlay=x=1280:y=480[overlaid];[overlaid][2:v]overlay=x=1280[out]' -map '[out]' -c:v libx264 -preset slow -profile:v high -level 4.0 -movflags +faststart -crf 18 -bf 2 -coder 1 -map 2:a -c:a aac -ac 2 -cpu-used 0 -t 600 977d4a145cd70f8b013d723cedbab0ba4b8bba74-1585911377811_output.mp4
All done!

The output file is named after the published directory (<directory>_output.mp4) and stored in the current working directory. All other files we created are in the temp folder inside the published recording directory. It’s safe to delete this folder. If you ever need to debug an FFmpeg invocation, check the .log files inside it first.
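
The snippets in this post never look at FFmpeg’s exit status, so a failed encode only shows up when the next step can’t find its input. A small wrapper could surface that earlier. This is a sketch, and run_logged is a hypothetical helper rather than something the code above uses.

```python
import subprocess


def run_logged(cmd, log_path):
    """Run cmd with stdout and stderr sent to log_path; raise if it fails."""
    with open(log_path, "w") as logfile:
        proc = subprocess.run(cmd, stdout=logfile, stderr=subprocess.STDOUT)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {proc.returncode}, see {log_path}"
        )
    return proc

# Possible usage with the names from this post:
#   run_logged(cmd, os.path.join(PRESENTATION_DIR, "temp", "shapes_mp4.log"))
```

Raising right away means the error message points straight at the log file worth reading.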

If you want to turn this into a script, just copy and paste all the code snippets into a .py file and change PRESENTATION_DIR to something more appropriate (perhaps sys.argv[1]?).
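
If sys.argv[1] feels too bare, argparse gives you a usage message for free. The sketch below is one way to do it; the --preview option and the parse_args name are my own inventions, mapped onto the PRESENTATION_DIR and EXTRA_OPTS values used above.

```python
import argparse


def parse_args(argv=None):
    """Return (presentation_dir, extra_opts) from the command line."""
    parser = argparse.ArgumentParser(
        description="Render a published BBB recording into a single MP4")
    parser.add_argument("presentation_dir",
                        help="path to the published recording folder")
    parser.add_argument("--preview", type=int, metavar="SECONDS",
                        help="only encode the first SECONDS seconds")
    args = parser.parse_args(argv)
    # Same shape as EXTRA_OPTS: empty for a full encode, ["-t", "N"] otherwise
    extra_opts = ["-t", str(args.preview)] if args.preview else []
    return args.presentation_dir, extra_opts


if __name__ == "__main__":
    PRESENTATION_DIR, EXTRA_OPTS = parse_args()
    print(PRESENTATION_DIR, EXTRA_OPTS)
```

Passing argv explicitly, as in parse_args(["some-recording", "--preview", "600"]), makes the function easy to try from a notebook as well.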

I hope this post helped. Have a good worker’s day!