Chapter 3 - transcoding

TL;DR: show me the code and execution.

    $ make run_transcoding

We’ll skip some details, but don’t worry: the source code is available on GitHub.

In this chapter, we’re going to create a minimalist transcoder, written in C, that can convert videos encoded in H264 to H265 using the FFmpeg/libav libraries, specifically libavcodec, libavformat, and libavutil.

[Figure: media transcoding flow]

Just a quick recap: the AVFormatContext is the abstraction for the format of the media file, a.k.a. the container (e.g. MKV, MP4, WebM, TS). An AVStream represents each type of data in a given format (e.g. audio, video, subtitle, metadata). An AVPacket is a slice of compressed data obtained from an AVStream; it can be decoded by an AVCodec (e.g. av1, h264, vp9, hevc) to produce raw data called an AVFrame.

Transmuxing

Let’s start with the simple transmuxing operation and then build upon this code. The first step is to load the input file.

    // Allocate an AVFormatContext
    AVFormatContext *avfc = avformat_alloc_context();
    // Open an input stream and read the header.
    // Note that the function takes the address of the context pointer.
    avformat_open_input(&avfc, in_filename, NULL, NULL);
    // Read packets of a media file to get stream information.
    avformat_find_stream_info(avfc, NULL);

Now we’re going to set up the decoder. The AVFormatContext gives us access to all the AVStream components. For each one of them, we can get its AVCodec and create the corresponding AVCodecContext. Finally, we open the given codec so we can proceed to the decoding process.

The AVCodecContext holds data about media configuration such as bit rate, frame rate, sample rate, channels, height, and many others.

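    for (unsigned int i = 0; i < avfc->nb_streams; i++)
    {
        AVStream *avs = avfc->streams[i];
        // Find the decoder that matches the stream's codec
        const AVCodec *avc = avcodec_find_decoder(avs->codecpar->codec_id);
        // Allocate a codec context for that decoder
        AVCodecContext *avcc = avcodec_alloc_context3(avc);
        // Copy the stream parameters into the codec context
        avcodec_parameters_to_context(avcc, avs->codecpar);
        // Open the decoder
        avcodec_open2(avcc, avc, NULL);
    }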

We need to prepare the output media file for transmuxing as well. We first allocate memory for the output AVFormatContext, then we create each stream in the output format. In order to pack the stream properly, we copy the codec parameters from the decoder.

We set the flag AV_CODEC_FLAG_GLOBAL_HEADER, which tells the encoder that it can use global headers, and finally we open the output file for writing and persist the headers.

    avformat_alloc_output_context2(&encoder_avfc, NULL, NULL, out_filename);

    AVStream *avs = avformat_new_stream(encoder_avfc, NULL);
    avcodec_parameters_copy(avs->codecpar, decoder_avs->codecpar);

    if (encoder_avfc->oformat->flags & AVFMT_GLOBALHEADER)
        encoder_avfc->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;

    // Open the output file and write the container header
    AVDictionary *muxer_opts = NULL;
    avio_open(&encoder_avfc->pb, out_filename, AVIO_FLAG_WRITE);
    avformat_write_header(encoder_avfc, &muxer_opts);

We read the AVPackets from the input file, adjust their timestamps, and write each packet to the output file. Even though the function av_interleaved_write_frame says “write frame”, we are storing a packet. We finish the transmuxing process by writing the stream trailer to the file.

    AVPacket *input_packet = av_packet_alloc();

    while (av_read_frame(decoder_avfc, input_packet) >= 0)
    {
        // Convert the timestamps from the input to the output time base
        av_packet_rescale_ts(input_packet, decoder_video_avs->time_base, encoder_video_avs->time_base);
        // Despite its name, this writes the (still compressed) packet
        if (av_interleaved_write_frame(encoder_avfc, input_packet) < 0)
            break;
    }

    av_write_trailer(encoder_avfc);
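
To make the timestamp adjustment more concrete, here is a minimal sketch of the arithmetic behind rescaling a timestamp from one time base to another. It is not FFmpeg code: the `rational` type and `rescale_ts` function are hypothetical stand-ins so that the example builds without libav, and it assumes positive denominators and values small enough not to overflow 64 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for AVRational (not the FFmpeg type). */
typedef struct { int num; int den; } rational;

/* Rescale ts from time base src to time base dst:
 *   ts * (src.num / src.den) / (dst.num / dst.den)
 * rounded to the nearest integer, halves away from zero.
 * Assumes positive denominators and no 64-bit overflow. */
static int64_t rescale_ts(int64_t ts, rational src, rational dst) {
    int64_t num = (int64_t)src.num * dst.den;
    int64_t den = (int64_t)src.den * dst.num;
    int64_t scaled = ts * num;
    int64_t half = den / 2;
    if (scaled >= 0)
        return (scaled + half) / den;
    return -((-scaled + half) / den);
}
```

For instance, one second expressed as 90000 ticks of a 1/90000 time base becomes 1000 ticks of a 1/1000 time base, because the instant in time stays the same and only the unit changes.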

Transcoding

The previous section showed a simple transmuxer program. Now we’re going to add the capability to encode files; specifically, we’re going to enable it to transcode videos from h264 to h265.

After we prepare the decoder, but before we arrange the output media file, we’re going to set up the encoder.

    AVRational input_framerate = av_guess_frame_rate(decoder_avfc, decoder_video_avs, NULL);
    AVStream *video_avs = avformat_new_stream(encoder_avfc, NULL);

    const char *codec_name = "libx265";
    const char *codec_priv_key = "x265-params";
    // We're going to use internal options of x265:
    // they disable the scene change detection and fix the
    // GOP at 60 frames.
    const char *codec_priv_value = "keyint=60:min-keyint=60:scenecut=0";

    const AVCodec *video_avc = avcodec_find_encoder_by_name(codec_name);
    AVCodecContext *video_avcc = avcodec_alloc_context3(video_avc);
    // encoder codec params
    av_opt_set(video_avcc->priv_data, codec_priv_key, codec_priv_value, 0);
    video_avcc->height = decoder_video_avcc->height;
    video_avcc->width = decoder_video_avcc->width;
    video_avcc->pix_fmt = video_avc->pix_fmts[0];
    // rate control (the minimum rate must not exceed the maximum rate)
    video_avcc->bit_rate = 2 * 1000 * 1000;
    video_avcc->rc_buffer_size = 4 * 1000 * 1000;
    video_avcc->rc_max_rate = 2.5 * 1000 * 1000;
    video_avcc->rc_min_rate = 2 * 1000 * 1000;
    // time base: the inverse of the input frame rate
    video_avcc->time_base = av_inv_q(input_framerate);
    video_avs->time_base = video_avcc->time_base;

    avcodec_open2(video_avcc, video_avc, NULL);
    avcodec_parameters_from_context(video_avs->codecpar, video_avcc);
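
Choosing the inverse of the frame rate as the time base means that each frame lasts exactly one tick. The sketch below works through that arithmetic in plain C. The `rational` type and the three helper functions are hypothetical names introduced only for illustration (they mirror the idea of AVRational and av_inv_q but are not FFmpeg code), and positive numerators and denominators are assumed.

```c
#include <stdint.h>

/* Hypothetical stand-in for AVRational, so the sketch builds without FFmpeg. */
typedef struct { int num; int den; } rational;

/* Invert a rational: a frame rate (frames per second) becomes a
 * time base (seconds per tick). */
static rational inv_q(rational q) {
    rational r = { q.den, q.num };
    return r;
}

/* Convert a timestamp counted in time-base ticks to seconds. */
static double ticks_to_seconds(int64_t ticks, rational time_base) {
    return (double)ticks * time_base.num / time_base.den;
}

/* Ticks per frame: 1 / (frame_rate * time_base). Both products are
 * formed first and divided once, so nothing is truncated early. */
static int64_t frame_duration_ticks(rational time_base, rational frame_rate) {
    int64_t num = (int64_t)time_base.den * frame_rate.den;
    int64_t den = (int64_t)time_base.num * frame_rate.num;
    return num / den;
}
```

With a 30000/1001 (29.97 fps) input, the time base becomes 1001/30000 and one frame is one tick. Evaluating the same ratio step by step in integer arithmetic, as in 30000 / 1001 / 30000 * 1001, truncates to 0, which is why the sketch multiplies before it divides.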

We need to expand our decoding loop for the video stream transcoding:

    AVFrame *input_frame = av_frame_alloc();
    AVPacket *input_packet = av_packet_alloc();

    while (av_read_frame(decoder_avfc, input_packet) >= 0)
    {
        // Feed the compressed packet to the decoder
        int response = avcodec_send_packet(decoder_video_avcc, input_packet);
        while (response >= 0) {
            // Pull every frame the decoder has ready
            response = avcodec_receive_frame(decoder_video_avcc, input_frame);
            if (response == AVERROR(EAGAIN) || response == AVERROR_EOF) {
                break;
            } else if (response < 0) {
                return response;
            }
            // Re-encode the raw frame with the encoder context and write it out
            encode(encoder_avfc, decoder_video_avs, encoder_video_avs, encoder_video_avcc, input_frame, input_packet->stream_index);
            av_frame_unref(input_frame);
        }
        av_packet_unref(input_packet);
    }
    av_write_trailer(encoder_avfc);

    // used function (it must be declared before the loop above)
    int encode(AVFormatContext *avfc, AVStream *dec_video_avs, AVStream *enc_video_avs, AVCodecContext *video_avcc, AVFrame *input_frame, int index) {
        AVPacket *output_packet = av_packet_alloc();
        // Feed the raw frame to the encoder
        int response = avcodec_send_frame(video_avcc, input_frame);

        while (response >= 0) {
            // Pull every packet the encoder has ready
            response = avcodec_receive_packet(video_avcc, output_packet);
            if (response == AVERROR(EAGAIN) || response == AVERROR_EOF) {
                break;
            } else if (response < 0) {
                av_packet_free(&output_packet);
                return -1;
            }

            output_packet->stream_index = index;
            output_packet->duration = enc_video_avs->time_base.den / enc_video_avs->time_base.num / dec_video_avs->avg_frame_rate.num * dec_video_avs->avg_frame_rate.den;

            av_packet_rescale_ts(output_packet, dec_video_avs->time_base, enc_video_avs->time_base);
            response = av_interleaved_write_frame(avfc, output_packet);
        }
        av_packet_unref(output_packet);
        av_packet_free(&output_packet);
        return 0;
    }
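
The send/receive pattern used by both the decoder and the encoder can be easier to follow on a toy model. The sketch below is not libav code: `toy_codec`, `toy_send`, `toy_receive`, `toy_run`, and the `TOY_*` codes are all hypothetical, with `TOY_EAGAIN` and `TOY_EOF` playing the roles of AVERROR(EAGAIN) and AVERROR_EOF. The toy codec holds back two items, the way a real codec may buffer frames, which shows why output only appears after several inputs and why sending NULL at the end (the way libav codecs are put into draining mode) is needed to get the remaining items out.

```c
#include <stddef.h>

/* Hypothetical return codes standing in for AVERROR(EAGAIN) and AVERROR_EOF. */
enum { TOY_OK = 0, TOY_EAGAIN = -1, TOY_EOF = -2 };

#define TOY_DELAY 2 /* the toy codec holds back this many items */
#define TOY_CAP 8   /* capacity of the internal queue */

typedef struct {
    int queue[TOY_CAP];
    int count;    /* items currently buffered */
    int draining; /* set once NULL has been sent */
} toy_codec;

/* Send one input item, or NULL to signal the end of the stream. */
static int toy_send(toy_codec *c, const int *item) {
    if (item == NULL) { c->draining = 1; return TOY_OK; }
    if (c->draining) return TOY_EOF;
    if (c->count == TOY_CAP) return TOY_EAGAIN;
    c->queue[c->count++] = *item;
    return TOY_OK;
}

/* Receive the oldest buffered item if the codec is ready to release it. */
static int toy_receive(toy_codec *c, int *out) {
    int ready = c->draining ? (c->count > 0) : (c->count > TOY_DELAY);
    if (!ready) return c->draining ? TOY_EOF : TOY_EAGAIN;
    *out = c->queue[0];
    for (int i = 1; i < c->count; i++) c->queue[i - 1] = c->queue[i];
    c->count--;
    return TOY_OK;
}

/* Push n inputs through the codec; return how many outputs came out. */
static int toy_run(const int *inputs, int n, int flush) {
    toy_codec c = {0};
    int produced = 0, out;
    for (int i = 0; i < n; i++) {
        toy_send(&c, &inputs[i]);
        /* Drain everything that is ready before sending more. */
        while (toy_receive(&c, &out) == TOY_OK) produced++;
    }
    if (flush) {
        toy_send(&c, NULL); /* enter draining mode */
        while (toy_receive(&c, &out) == TOY_OK) produced++;
    }
    return produced;
}
```

Running five inputs without the final flush yields only three outputs; with the flush, all five come out. The simplified loops in this chapter skip that final step, so a fuller program would likely send NULL to the decoder and to the encoder before writing the trailer.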

We converted the media stream from h264 to h265 and, as expected, the h265 version of the media file is smaller than the h264 one. The program we created is also capable of the following:

    /*
     * H264 -> H265
     * Audio -> remuxed (untouched)
     * MP4 -> MP4
     */
    StreamingParams sp = {0};
    sp.copy_audio = 1;
    sp.copy_video = 0;
    sp.video_codec = "libx265";
    sp.codec_priv_key = "x265-params";
    sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0";

    /*
     * H264 -> H264 (fixed gop)
     * Audio -> remuxed (untouched)
     * MP4 -> MP4
     */
    StreamingParams sp = {0};
    sp.copy_audio = 1;
    sp.copy_video = 0;
    sp.video_codec = "libx264";
    sp.codec_priv_key = "x264-params";
    sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";

    /*
     * H264 -> H264 (fixed gop)
     * Audio -> remuxed (untouched)
     * MP4 -> fragmented MP4
     */
    StreamingParams sp = {0};
    sp.copy_audio = 1;
    sp.copy_video = 0;
    sp.video_codec = "libx264";
    sp.codec_priv_key = "x264-params";
    sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";
    sp.muxer_opt_key = "movflags";
    sp.muxer_opt_value = "frag_keyframe+empty_moov+default_base_moof";

    /*
     * H264 -> H264 (fixed gop)
     * Audio -> AAC
     * MP4 -> MPEG-TS
     */
    StreamingParams sp = {0};
    sp.copy_audio = 0;
    sp.copy_video = 0;
    sp.video_codec = "libx264";
    sp.codec_priv_key = "x264-params";
    sp.codec_priv_value = "keyint=60:min-keyint=60:scenecut=0:force-cfr=1";
    sp.audio_codec = "aac";
    sp.output_extension = ".ts";

    /* WIP :P -> it's not playing on VLC, the final bit rate is huge
     * H264 -> VP9
     * Audio -> Vorbis
     * MP4 -> WebM
     */
    //StreamingParams sp = {0};
    //sp.copy_audio = 0;
    //sp.copy_video = 0;
    //sp.video_codec = "libvpx-vp9";
    //sp.audio_codec = "libvorbis";
    //sp.output_extension = ".webm";
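
For readers wondering what sits behind these presets, here is a rough sketch of how such a parameter struct could be declared and consulted. The field names are taken from the presets above, but the field types, the three helper functions, and the ".mp4" fallback extension are assumptions made for illustration; the real definitions live in the tutorial's source code and may differ.

```c
#include <stdio.h>

/* Field names come from the presets; the types are assumptions. */
typedef struct {
    int copy_video;
    int copy_audio;
    const char *output_extension;
    const char *muxer_opt_key;
    const char *muxer_opt_value;
    const char *video_codec;
    const char *audio_codec;
    const char *codec_priv_key;
    const char *codec_priv_value;
} StreamingParams;

/* An encoder is only needed for streams that are not copied as-is. */
static int needs_video_encoder(const StreamingParams *sp) { return !sp->copy_video; }
static int needs_audio_encoder(const StreamingParams *sp) { return !sp->copy_audio; }

/* Build the output file name from a base name and the preset's extension.
 * Falling back to ".mp4" is a hypothetical default for this sketch.
 * Returns 0 on success, -1 if the name does not fit in the buffer. */
static int output_filename(const StreamingParams *sp, const char *base,
                           char *out, size_t out_size) {
    const char *ext = sp->output_extension ? sp->output_extension : ".mp4";
    int n = snprintf(out, out_size, "%s%s", base, ext);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With the first preset (copy_audio = 1, copy_video = 0), only the video stream needs an encoder, while the audio packets can go through the transmuxing path unchanged.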

Now, to be honest, this was harder than I thought it’d be. I had to dig into the FFmpeg command line source code and test it a lot, and I think I’m still missing something: I had to enforce force-cfr for the h264 to work, and I’m still seeing warning messages like “forced frame type (5) at 80 was changed to frame type (3)”.