Getting Started

This guide is intended to serve as a deep dive into video and audio encoding for amateurs/enthusiasts. The beginning parts of the guide assume that you have little previous experience with video encoding. The deeper parts of the guide build upon the earlier parts--advanced users may feel more comfortable skipping forward in the guide.

The guide can be navigated in any of the following ways:

  • The left and right arrows on the sides of the screen
  • The left arrow and right arrow keys on the keyboard
  • The collapsible table of contents

Video Formats

Before we can begin encoding, we have to have a basic understanding of what encoding is. Encoding is essentially compressing the video or audio so that it results in a smaller file. This is similar to what you might have done with .rar or .zip files, the difference being that video and audio encoders are highly specialized, allowing for far greater compression than a generic format could give.

Hundreds of video compression formats have appeared across the decades, with advancing technology enabling the creation of better and better formats. This guide will only cover the most popular formats that are still worthwhile for encoding today.

Lossy Encoding

"Lossy" encoding refers to a type of video compression where information is lost between the source and the encode. This loss may appear as artifacts such as banding, blocking, ringing, blurring, etc., and generally the smaller the output file is, the more artifacts it tends to have. This is the predominant type of compression used in video encoding, as it results in much smaller files, and the quality loss is generally considered acceptable. Modern encoders attempt to reduce the number of artifacts created at an equivalent bitrate, so that they can produce smaller files that look better.

H.264

H.264 has been around for almost 20 years now, and has remained the most popular video format throughout most of this period. Until the recent rise in popularity of HDR, it was the primary format used by streaming services and on Blu-ray discs, as well as by many digital video cameras, and it is still extremely common today. It is also extremely popular among hobbyist encoders due to the presence of x264, a highly advanced, fast, free and open-source encoder for the H.264 format.

H.265

H.265 is the successor to the H.264 format, and includes more advanced features for compression as well as support for HDR, or High Dynamic Range, which is a technology that allows increased contrast (and by extension visual appeal) of a video. H.265 also has a popular, advanced, and free open-source encoder, called x265, which has become more developed over the years and is now every bit as advanced as x264. H.265, for now, is the recommended format for use with any HDR videos. However, one major limitation of H.265 is that most browsers will not play back H.265 natively, making it difficult to use for internet streaming, although some providers such as Netflix have adapted their services to support H.265.

VP9

VP9 is a video format developed by Google, and is predominantly used by YouTube. It achieves similar compression to H.264, and is free of patent restrictions. It also has a free, open-source encoder, called vpxenc, although this encoder is not as mature as x264. As a result, vpxenc is still quite slow and does not necessarily achieve the full quality that the VP9 format is capable of.

AV1

AV1 is the successor to VP9, and is the joint work of many major companies. Like VP9, it intends to be patent-free, but it brings many compression improvements not just from its own design, but from other codecs which were in development at the time, such as Daala and Thor. AV1 currently has three major open-source encoders: aomenc, which is the reference encoder developed predominantly by Google; SVT-AV1, which is focused heavily on speed and multithreading; and rav1e, which was originally sponsored by Mozilla, and aims to be more focused on psychovisual features (i.e. more appealing to human eyes) than the other two encoders. AV1 playback is currently supported in all major browsers and by most phones and tablets, and is beginning to be supported by newer set-top devices such as the Fire TV Stick 4K Max. Although there is still plenty of work to be done for any AV1 encoder to reach its full potential, AV1 appears to be the future of video encoding.

Lossless Encoding

Unlike lossy encoding, lossless encoding is a type of encoding where the output is exactly identical to the input. One reason to do this is to apply filtering such as resizing, debanding, tonemapping, etc. to a video prior to lossy encoding. The filtered video could be encoded directly to lossy in one step, but sometimes adding a lossless intermediate step makes more sense, either from a workflow standpoint or from an encoding performance standpoint.

H.264

x264 is capable of encoding videos losslessly, and it is quite good at it. Generally, this is going to be one of the best options for lossless video encoding, as it is fast, compresses well, and has support for 10-bit encoding. Given that using a slower preset does not increase quality, but only reduces filesize, and that lossless files are often used only as intermediates or temporary files, it is common to use an x264 preset such as veryfast for lossless.
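As a sketch (assuming an ffmpeg build with libx264, and hypothetical input/output filenames), a lossless intermediate could be produced like this:

```shell
# -qp 0 puts libx264 into lossless mode; a slower preset only shrinks
# the file without changing quality, so veryfast is a common choice.
# -an drops audio, since the intermediate is usually only needed for video.
ffmpeg -i input.mkv -c:v libx264 -preset veryfast -qp 0 -an lossless_intermediate.mkv
```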

FFv1

FFv1 is a lossless codec included with the encoding tool FFmpeg. It is solely intended for lossless encoding, and can losslessly store up to 16-bit video. However, it does not compress as well as H.264 and tends to be slower at both encoding and decoding, particularly compared to a fast x264 preset.
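For comparison, a hedged sketch of an FFv1 encode with ffmpeg (filenames are placeholders):

```shell
# -level 3 selects FFV1 version 3, which supports multithreaded slices
# and higher bit depths.
ffmpeg -i input.mkv -c:v ffv1 -level 3 -an output.mkv
```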

Audio Formats

Similar to video, there are dozens of audio compression formats that have been used over the decades. Historically, the most widely known has been MP3, but this has hardly ever been used in video, and is now antiquated for music as well. The following are the current most widely used formats for audio in videos. (This guide is not intended to cover encoding music tracks, although much of the advice here will still apply similarly.)

Lossy Encoding

AAC

TODO

Vorbis

TODO

Opus

TODO

Lossless Encoding

The use of lossless encoding is a bit more common with audio, due to the fact that a losslessly encoded audio track is relatively small compared to a losslessly encoded video track. The distribution of lossless audio is still relatively rare, but it will occasionally be seen, whereas it is incredibly rare for lossless video to be distributed.

FLAC

TODO

Formats you may need to decode

TODO

PCM

TODO

DTS

TODO

Dolby Atmos

TODO

Containers

Necessary Tools

Compression

Example 1: x264 + AAC

Example 2: x265 + FLAC

Example 3: AV1 + Opus

Advanced: av1an

Color Management

Color Models

A color model is a method of representing colors in a video or image using data. This guide will be covering the most widely used color models.

RGB

RGB is probably the most well-known color model, and is primarily used in image encoding. RGB consists of three color channels, Red, Green, and Blue, which are then combined to determine the final color of each pixel. Typically, RGB is the final model that a monitor or TV will use to display images, although it is not commonly used for video encoding.

YUV

YUV, also known as YCbCr, is the most widely used color model for video encoding. It consists of three components: Y aka Luma, which represents luminance or brightness, and two chroma planes, which represent color. Generally a video player will have to convert a YUV video into RGB before it can be rendered, but there are significant compression benefits to using YUV over RGB for video.

The most notable reason to use YCbCr is an optimization called chroma subsampling. This means that the chroma components can be encoded at a lower resolution than the luma component, which results in a smaller output file. You can read more about chroma subsampling in Color formats.
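To make the savings concrete, here is a small sketch (plain Python, no particular library) computing the raw per-frame size of a 1080p video at each common subsampling mode:

```python
def raw_frame_bytes(width, height, bits=8, subsampling="4:2:0"):
    """Size in bytes of one uncompressed YUV frame."""
    bytes_per_sample = (bits + 7) // 8
    luma = width * height
    if subsampling == "4:2:0":      # chroma halved in both dimensions
        chroma = 2 * (width // 2) * (height // 2)
    elif subsampling == "4:2:2":    # chroma halved horizontally only
        chroma = 2 * (width // 2) * height
    else:                           # 4:4:4, full-resolution chroma
        chroma = 2 * width * height
    return (luma + chroma) * bytes_per_sample

print(raw_frame_bytes(1920, 1080, subsampling="4:2:0"))  # 3110400
print(raw_frame_bytes(1920, 1080, subsampling="4:4:4"))  # 6220800
```

4:2:0 halves the raw data before the encoder even starts, which is a large part of why it is the default for delivery.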

Color Formats

To represent color values, a format is agreed upon. Color formats are made up of three things: the order of the components, the bit depth, and whether the format is packed or planar. In some cases, endianness may also be important.

Component order

The order in which the components (which come from a color model) are arranged is simply represented by writing them out. For example, RGB for red first, then green, then blue, or BGR for blue, green, red.

Bit depth

The bit depth is how many bits are available to store each sample value. There are two main ways to specify the bit depth in a format name:

  • bits per component. Here, RGB888 reads as RGB color model, with 8 bits for the red component, 8 bits for the green component, and 8 bits for the blue component and RGB565 reads as RGB color model, with 5 bits for the red component, 6 bits for the green component, and 5 bits for the blue component.
  • bits per sample. Here, RGB24 reads as RGB color model, with 24 bits in total for the red, green, and blue components. This is ambiguous, because one does not know exactly how many bits are allocated to each component. RGB565, RGB556, and RGB655 (even though the latter ones do not make much sense as the eye is most sensitive to green light) all become RGB16.
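As a sketch of the bits-per-component notation, here is how an RGB565 sample could be packed into and unpacked from 16 bits:

```python
def pack_rgb565(r, g, b):
    """Pack 5-bit red, 6-bit green, and 5-bit blue into one 16-bit sample."""
    assert 0 <= r < 32 and 0 <= g < 64 and 0 <= b < 32
    return (r << 11) | (g << 5) | b

def unpack_rgb565(sample):
    return (sample >> 11) & 0x1F, (sample >> 5) & 0x3F, sample & 0x1F

white = pack_rgb565(31, 63, 31)  # all components at their maximum
print(hex(white))                # 0xffff
print(unpack_rgb565(white))      # (31, 63, 31)
```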

Packed vs planar

Components can be stored either packed, where all components are interleaved (here, RGB):

Sample number:   1   2   3   4   5
Data:          RGB RGB RGB RGB RGB

or stored separately for each component:

Sample number: 1 2 3 4 5
Data:          R R R R R...
Data:          G G G G G...
Data:          B B B B B...

In planar formats, many operations can be easier to implement, as it is possible to implement the algorithm once and then operate on all planes. On the other hand, packed formats are simpler and often used in hardware.1
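The layouts above can be sketched in a few lines of Python, splitting interleaved data into one plane per component:

```python
def packed_to_planar(samples, num_components=3):
    """Split interleaved samples (e.g. RGBRGB...) into per-component planes."""
    return [samples[i::num_components] for i in range(num_components)]

packed = ["R1", "G1", "B1", "R2", "G2", "B2", "R3", "G3", "B3"]
print(packed_to_planar(packed))
# [['R1', 'R2', 'R3'], ['G1', 'G2', 'G3'], ['B1', 'B2', 'B3']]
```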

Endianness

Different computer architectures store numbers differently. For more information, visit the Wikipedia article on endianness. There are two main ways to store numbers with more than 8 bits (here, a 4-byte example, where 1 is the least significant byte and 4 is the most significant):

  • Least significant byte first, little endian, 1234. This is what x86-family processors use.
  • Most significant byte first, big endian, 4321. This is what PowerPC-family processors use.

This can be important for color formats, as some computers might store it in their native endianness. VapourSynth doesn't seem to care about endianness, but FFmpeg does.

For example, RGB565 might store its two bytes in 12 or 21 order, and if they are read in the wrong order, it will produce garbage.
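A quick Python sketch, using the standard struct module, shows the byte swap and the garbage that results from reading with the wrong order:

```python
import struct

sample = 0xF800  # pure red in RGB565 (all five red bits set)

little = struct.pack("<H", sample)  # least significant byte first
big = struct.pack(">H", sample)     # most significant byte first
print(little.hex(), big.hex())      # 00f8 f800

# Reading little-endian bytes as big-endian gives a completely different value:
wrong = struct.unpack(">H", little)[0]
print(hex(wrong))  # 0xf8 -- no longer red
```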

Chroma subsampling

In Y'CbCr signals, there are three widely used variants of chroma subsampling:

  • 4:2:0 which has half the vertical and horizontal chroma resolution
  • 4:2:2 which has half the horizontal chroma resolution but full vertical resolution
  • 4:4:4 which has full chroma resolution (no subsampling)

4:2:2 is not particularly useful over the other options, so this guide will focus on 4:2:0 and 4:4:4.

4:2:0 is the most commonly used format for videos. Nearly every DVD, Blu-ray, YouTube video, etc. uses 4:2:0 subsampling. This is because, in the majority of cases, human eyes do not notice the reduction in chroma resolution. There is very little benefit to using 4:4:4 in the average case.

However, there are some exceptions. The most notable is screen recordings. Things like text overlays, video game UI overlays, etc. have very fine, color-dependent detail that can be destroyed by chroma subsampling and result in an aliased look to the video. Therefore, it is recommended to use 4:4:4 subsampling when recording your desktop, and 4:2:0 subsampling in most other cases.

Common formats

VS name    | FFmpeg name              | Meaning
GRAY8      | gray8                    | brightness only, 8 bits, packed
GRAY16     | gray16le, gray16be       | brightness only, 16 bits (the suffix specifies the endianness)
RGB888     | rgb24                    | red, green, blue, 8 bits per component
YUV420P8   | yuv420p                  | luma, chroma blue, chroma red, 8 bits per component, planar, 4:2:0 subsampling
YUV422P8   | yuv422p                  | luma, chroma blue, chroma red, 8 bits per component, planar, 4:2:2 subsampling
YUV444P8   | yuv444p                  | luma, chroma blue, chroma red, 8 bits per component, planar, no subsampling
YUV420P10  | yuv420p10le, yuv420p10be | luma, chroma blue, chroma red, 10 bits per component, planar, 4:2:0 subsampling
YUV422P10  | yuv422p10le, yuv422p10be | luma, chroma blue, chroma red, 10 bits per component, planar, 4:2:2 subsampling
YUV444P10  | yuv444p10le, yuv444p10be | luma, chroma blue, chroma red, 10 bits per component, planar, no subsampling

References

Color Range

Range is a concept that describes the valid values for a pixel. Typically, RGB will use full range and YUV will use limited range.

What does this mean?

For 8-bit video, full range indicates that all values between 0-255 may be used to represent a color value. On the other hand, limited range indicates that only values between 16-235, or 16-240 for chroma, are valid, and any values outside that range will be clamped to fit in that range. These ranges do expand appropriately for high bit depth videos as well.
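A small sketch of the full-to-limited mapping (plain Python, rounding to the nearest code value), including how the ranges scale at 10 bits:

```python
def full_to_limited(value, bits=8, chroma=False):
    """Map a full-range sample onto the limited (TV) range for its bit depth."""
    scale = 1 << (bits - 8)             # limited-range bounds scale with bit depth
    low = 16 * scale
    high = (240 if chroma else 235) * scale
    full_max = (1 << bits) - 1
    return low + round(value * (high - low) / full_max)

print(full_to_limited(0))                  # 16
print(full_to_limited(255))                # 235
print(full_to_limited(255, chroma=True))   # 240
print(full_to_limited(1023, bits=10))      # 940 (10-bit limited range is 64-940)
```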

Why is limited range a thing that exists? Essentially, it's due to historical reasons, but it's a convention that we are stuck with today. Even though full range may provide slightly better color accuracy, the difference is far less meaningful for high bit depth content, and even HDR Blu-rays use limited color range. Therefore, it is recommended to follow existing conventions.

Color Primaries

This section covers the first of three settings that are important for retaining accurate color when encoding videos, those settings being primaries, color matrix, and transfer characteristics.

Color primaries are used to indicate the correct coordinates for the red, blue, and green colors. There are historical reasons for why so many standards exist, and this guide will not go in depth into history lessons, but will explain what primaries are available and when to use each one.

Note that for primaries, matrices, and transfer, you can view the values that are set on a video using a tool like MediaInfo. If there are no values set, the renderer will need to guess which values to use. A safe default assumption for most modern videos is BT.709, although this may vary depending on source and resolution for the video. It is strongly preferred to set the correct values when encoding.

Each setting has at least one name and exactly one integer value representing it--most encoder software will accept one or more of the names, but some tooling such as Vapoursynth and MKVToolnix accepts the integer values instead.

1: BT.709

BT.709 is the standard used for modern high-definition video, and is a safe default assumption.

This color primary setting is used in the following standards:

  • Rec. ITU-R BT.709-6
  • Rec. ITU-R BT.1361-0 conventional colour gamut system and extended colour gamut system (historical)
  • IEC 61966-2-1 sRGB or sYCC
  • IEC 61966-2-4
  • Society of Motion Picture and Television Engineers (SMPTE) RP 177 (1993) Annex B

2: Unspecified

This value indicates that no color primary is set for the video, and the player must decide which value to use.

mpv will use the following heuristics in this case:

if matrix == "BT.2020" {
    "BT.2020"
} else if matrix == "BT.709" {
    "BT.709"
} else if width >= 1280 || height > 576 {
    "BT.709"
} else if height == 576 {
    "BT.470BG"
} else if height == 480 || height == 488 {
    "SMPTE 170M"
} else {
    "BT.709"
}

4: BT.470M

BT.470M is a standard that was used in analog television systems in the United States.

This color primary setting is used in the following standards:

  • Rec. ITU-R BT.470-6 System M (historical)
  • United States National Television System Committee 1953 Recommendation for transmission standards for color television
  • United States Federal Communications Commission (2003) Title 47 Code of Federal Regulations 73.682 (a) (20)

5: BT.470BG

BT.470BG is a standard that was used for European (PAL) television systems and DVDs.

This color primary setting is used in the following standards:

  • Rec. ITU-R BT.470-6 System B, G (historical)
  • Rec. ITU-R BT.601-7 625
  • Rec. ITU-R BT.1358-0 625 (historical)
  • Rec. ITU-R BT.1700-0 625 PAL and 625 SECAM

6: SMPTE 170M

SMPTE 170M is a standard that was used for NTSC television systems and DVDs.

This color primary setting is used in the following standards:

  • Rec. ITU-R BT.601-7 525
  • Rec. ITU-R BT.1358-1 525 or 625 (historical)
  • Rec. ITU-R BT.1700-0 NTSC
  • SMPTE ST 170 (2004)

7: SMPTE 240M

SMPTE 240M was an interim standard used during the early days of HDTV (1988-1998). Its primaries are equivalent to SMPTE 170M.

8: Film

This represents generic film using Illuminant C.

9: BT.2020

BT.2020 is a standard used for ultra-high-definition video, i.e. 4K and higher. It may be used with or without HDR, as HDR is defined by the transfer characteristics.

This color primary setting is used in the following standards:

  • Rec. ITU-R BT.2020-2
  • Rec. ITU-R BT.2100-2

10: SMPTE 428

SMPTE 428 is used for D-Cinema Distribution Masters, aka DCDM.

This color primary setting is used in the following standards:

  • SMPTE ST 428-1 (2019)
  • (CIE 1931 XYZ as in ISO 11664-1)

11: DCI-P3

DCI-P3 is a wide-gamut colorspace used alongside RGB. It is used internally by most HDR monitors on the market.

12: Display-P3

Display-P3 is a variant of DCI-P3 developed by Apple because they wanted to be different.

22: EBU Tech 3213

Nobody really knows what this is.

Matrix Coefficients

Matrix coefficients represent the multiplication matrix that is used when converting from YUV to RGB. The following values are available:

0: Identity

Specifies that the identity matrix should be used, i.e. this data is already in an RGB-compatible colorspace.

This matrix coefficient setting is used in the following standards:

  • GBR (often referred to as RGB)
  • YZX (often referred to as XYZ)
  • IEC 61966-2-1 sRGB
  • SMPTE ST 428-1 (2019)

1: BT.709

BT.709 is the standard used for modern high-definition video, and is a safe default assumption.

This matrix coefficient setting is used in the following standards:

  • Rec. ITU-R BT.709-6
  • Rec. ITU-R BT.1361-0 conventional colour gamut system and extended colour gamut system (historical)
  • IEC 61966-2-4 xvYCC709
  • SMPTE RP 177 (1993) Annex B
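As a hedged sketch of what these coefficients actually do, here is a full-range BT.709 YCbCr to RGB conversion derived from the luma weights Kr = 0.2126 and Kb = 0.0722 (limited range and rounding are ignored for simplicity):

```python
KR, KB = 0.2126, 0.0722   # BT.709 red and blue luma weights
KG = 1.0 - KR - KB

def yuv_to_rgb(y, cb, cr):
    """Full-range BT.709 YCbCr -> RGB, with Cb and Cr centered on zero."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b

# Zero chroma must produce a neutral gray: R = G = B = Y.
print(yuv_to_rgb(0.5, 0.0, 0.0))
```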

2: Unspecified

This value indicates that no color matrix is set for the video, and the player must decide which value to use.

mpv will use the following heuristics in this case:

if width >= 1280 || height > 576 {
    "BT.709"
} else {
    "SMPTE 170M"
}

4: BT.470M

BT.470M is a standard that was used in analog television systems in the United States.

5: BT.470BG

BT.470BG is a standard that was used for European (PAL) television systems and DVDs.

This matrix coefficient setting is used in the following standards:

  • Rec. ITU-R BT.470-6 System B, G (historical)
  • Rec. ITU-R BT.601-7 625
  • Rec. ITU-R BT.1358-0 625 (historical)
  • Rec. ITU-R BT.1700-0 625 PAL and 625 SECAM
  • IEC 61966-2-1 sYCC
  • IEC 61966-2-4 xvYCC601

6: SMPTE 170M

SMPTE 170M is a standard that was used for NTSC television systems and DVDs. Its matrix coefficients are equivalent to BT.470BG.

This matrix coefficient setting is used in the following standards:

  • Rec. ITU-R BT.601-7 525
  • Rec. ITU-R BT.1358-1 525 or 625 (historical)
  • Rec. ITU-R BT.1700-0 NTSC
  • SMPTE ST 170 (2004)

7: SMPTE 240M

SMPTE 240M was an interim standard used during the early days of HDTV (1988-1998).

8: YCgCo

The YCoCg color model, also known as the YCgCo color model, is the color space formed from a simple transformation of an associated RGB color space into a luma value and two chroma values called chrominance green and chrominance orange.

9: BT.2020 Non-Constant Luminance

BT.2020 is a standard used for ultra-high-definition video, i.e. 4K and higher. It may be used with or without HDR, as HDR is defined by the transfer characteristics. If you do not know if you want non-constant or constant luminance, you probably want non-constant.

This matrix coefficient setting is used in the following standards:

  • Rec. ITU-R BT.2020-2 (non-constant luminance)
  • Rec. ITU-R BT.2100-2 Y′CbCr

10: BT.2020 Constant Luminance

This is a variant of BT.2020 with constant luminance values, represented using the YcCbcCrc colorspace. You probably want the non-constant luminance variant instead, unless you know you want this one.

11: SMPTE 2085

SMPTE 2085 is a standard used with HDR signals in the XYZ colorspace. I've never actually seen it used in the wild.

12: Chromaticity-Derived Non-Constant Luminance

I'm not really sure when you would use this.

13: Chromaticity-Derived Constant Luminance

I'm not really sure when you would use this.

14: ICtCp

ICtCp is an alternative colorspace developed for use with HDR and wide gamut video, by Dolby because they love doing extra stuff like this. I've never actually seen it used in the wild.

Transfer Functions

Transfer functions, also known as transfer characteristics, represent the gamma function of a video--that is, how to convert from a gamma-compressed video to one that is in linear light. These are sometimes also called EOTF and OETF functions. The following values are available:

1: BT.1886

BT.1886 is the standard used for most modern, SDR video, and is a safe default assumption.

This transfer function is used in the following standards:

  • Rec. ITU-R BT.709-6
  • Rec. ITU-R BT.1361-0 conventional colour gamut system (historical)
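Ignoring BT.1886's black-level terms, its EOTF reduces to a pure 2.4 power law; a minimal sketch:

```python
def eotf(signal, gamma=2.4):
    """Simplified BT.1886 EOTF: gamma-encoded signal (0-1) -> linear light."""
    return signal ** gamma

def inverse_eotf(linear, gamma=2.4):
    """Linear light (0-1) -> gamma-encoded signal."""
    return linear ** (1.0 / gamma)

half = eotf(0.5)
print(half)  # about 0.19 -- half signal is far less than half the light
```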

2: Unspecified

This value indicates that no transfer function is set for the video, and the player must decide which value to use.

mpv will always assume BT.1886 in this case.

4: BT.470M

BT.470M is a standard that was used in analog television systems in the United States. This transfer represents a power function with a gamma of 2.2.

This transfer function is used in the following standards:

  • Rec. ITU-R BT.470-6 System M (historical)
  • United States National Television System Committee 1953 Recommendation for transmission standards for color television
  • United States Federal Communications Commission (2003) Title 47 Code of Federal Regulations 73.682 (a) (20)
  • Rec. ITU-R BT.1700-0 625 PAL and 625 SECAM

5: BT.470BG

BT.470BG is a standard that was used for European (PAL) television systems and DVDs. This transfer represents a power function with a gamma of 2.8.

6: SMPTE 170M

SMPTE 170M is a standard that was used for NTSC television systems and DVDs. Its transfer function is equivalent to BT.1886.

This transfer function is used in the following standards:

  • Rec. ITU-R BT.601-7 525 or 625
  • Rec. ITU-R BT.1358-1 525 or 625 (historical)
  • Rec. ITU-R BT.1700-0 NTSC
  • SMPTE ST 170 (2004)

7: SMPTE 240M

SMPTE 240M was an interim standard used during the early days of HDTV (1988-1998).

8: Linear

This value indicates that the content is already in linear light.

9: Logarithmic 100

Indicates a logarithmic transfer function with a 100:1 range.

10: Logarithmic 316

Indicates a logarithmic transfer function with a (100 * sqrt(10)):1 range.

11: XVYCC

Used in standard IEC 61966-2-4. I have no idea what this actually is.

12: BT.1361E

This was intended to be a standard for "future" television systems, but it never really came into use.

13: sRGB

Represents the sRGB colorspace.

This transfer function is used in the following standards:

  • IEC 61966-2-1 sRGB (with MatrixCoefficients equal to 0)
  • IEC 61966-2-1 sYCC (with MatrixCoefficients equal to 5)

14: BT.2020 10-bit

Typically used with ultra-high-definition 10-bit SDR video. Its transfer function is equivalent to BT.1886.

15: BT.2020 12-bit

Typically used with ultra-high-definition 12-bit SDR video. Its transfer function is equivalent to BT.1886.

16: PQ aka SMPTE 2084

PQ is the most widely used transfer function for HDR content. It allows for a wider range of luminance to be represented than conventional transfer functions.

This transfer function is used in the following standards:

  • SMPTE ST 2084 (2014) for 10-, 12-, 14- and 16-bit systems
  • Rec. ITU-R BT.2100-2 perceptual quantization (PQ) system
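The PQ EOTF itself is compact enough to sketch directly from the constants in SMPTE ST 2084; it maps a 0-1 signal to an absolute luminance of up to 10,000 nits:

```python
# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_eotf(signal):
    """PQ-encoded signal (0-1) -> absolute luminance in cd/m^2 (nits)."""
    p = signal ** (1 / M2)
    return 10000 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

print(pq_eotf(0.0))  # 0.0
print(pq_eotf(1.0))  # 10000.0 -- the full PQ range
```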

17: SMPTE 428

SMPTE 428 is used for D-Cinema Distribution Masters, aka DCDM.

18: HLG aka Hybrid Log-Gamma

HLG is an alternative transfer function for HDR content used by some televisions.

This transfer function is used in the following standards:

  • ARIB STD-B67 (2015)
  • Rec. ITU-R BT.2100-2 hybrid log-gamma (HLG) system

HDR

Video Filtering

Intro to Vapoursynth

Vapoursynth is a software tool for filtering videos. It is cross-platform, supports advanced scripting in Python, and has many community-supported plugins for advanced filtering.

Before we begin, let's go ahead and make sure you have Vapoursynth installed on your machine.

Windows

For Windows, you will need to make sure you first have Python 3.11 installed. You can download it from this page, using the "Windows installer (64-bit)" link.

Once you have Python installed, you can now download the latest version of the Vapoursynth 64-bit installer from the Vapoursynth releases pages. Setup should be fairly straightforward.

There is a utility called VSRepoGUI which can be used as a convenience method for installing many community plugins.

Linux

It is recommended to use your distribution's package manager for installing Vapoursynth. Some distributions do not provide a recent version of Vapoursynth. They are bad. The latest version in Debian's repo is 2 years old. They consider this a feature, even though it causes basically every plugin to be broken. Debian sucks. Just use Arch.

See the Vapoursynth documentation for further details.

Introduction to the FFmpeg command-line utilities

FFmpeg is a multimedia framework that has utilities for transcoding, transmuxing, and filtering audio and video. It provides the ffmpeg, ffprobe, and ffplay command-line utilities. It also features the libav* libraries, which allow you to use the functionality of FFmpeg without the programs.1

Obtaining FFmpeg

Unix-like

Package manager

The easiest way to obtain FFmpeg is through your package manager. On most package managers, the package is simply named ffmpeg, however, ffprobe and ffplay may have their own packages. Note that the packages may be outdated.

Compiling from source

To compile FFmpeg from source:

  • Grab the sources.
  • Run ./configure --help to see a list of features and libraries you can choose to build with.
  • Install all libraries you want to build FFmpeg with.
  • Run ./configure with all --enable- flags you want.
  • Run make.
  • Run make install.

Binary packages

These packages are not compiled by FFmpeg themselves. Be careful.

Windows

Binary Packages

These packages are not compiled by FFmpeg themselves. Be careful.

Using the ffmpeg program

ffmpeg is the primary command-line tool of FFmpeg. It takes zero or more bitstreams as inputs and produces zero or more bitstreams as outputs.

Concepts

Bitstream

A bitstream or bit stream is a media file, the kind that is played in a media player. It consists of a container wrapping one or more elementary streams.

Container

A container is a format for putting one or more elementary streams into one file, or bitstream.

Elementary stream

An elementary stream is an audio, video, or subtitle track. Basically, it's the compressed data you want to mux into the container.

Muxing

Putting elementary streams into a container.

Codec

A codec (coder/decoder) is the piece of code that actually encodes or decodes the data you put in. An encoder takes raw data as input and produces an elementary stream as output; a decoder does the reverse.

Filter

A filter is a piece of code you can apply to the data to change something about it, for instance sharpening, removing artifacts or shakiness, denoising, scaling, or overlaying.

Muxer/Demuxer

The pieces of code that mux or do the reverse, getting elementary streams from the container.

Bitstream filter

A bitstream filter is a filter that is applied directly to the bitstream, without decoding, in order to change something about the container or the stream's packaging, or to deliberately corrupt some packets.

Command-line arguments

ffmpeg's command-line arguments are positional. That is, it matters where you put options: each input and output has its own arguments. So, for example, ffmpeg -r 24 -i file1 file2 applies the -r 24 option to the input file1, interpreting the video as having that frame rate, while ffmpeg -i file1 -r 24 file2 applies the -r 24 option to the output file2. To get a list of options, refer to the FFmpeg documentation.

How do I...

Transcode a video

ffmpeg -i input -c:v video_codec -b:v video_bitrate -c:a audio_codec -b:a audio_bitrate output

Option             | Meaning
-c:v video_codec   | codec for the automatically selected video stream
-b:v video_bitrate | bitrate for the automatically selected video stream
-c:a audio_codec   | codec for the automatically selected audio stream
-b:a audio_bitrate | bitrate for the automatically selected audio stream

Transmux a video

ffmpeg -i input -c copy output

Option  | Meaning
-c copy | set the codec to copy (remux without re-encoding)

Filter a video

ffmpeg -i input -c:v video_codec -c:a audio_codec (...) -vf filter_name output

Option          | Meaning
-vf filter_name | set the video filter to filter_name

References

Source Filters

Source filters are the way to import a video into a Vapoursynth script.

This article will be expanded in the future, but for now we will cover the most common source filter, which handles the vast majority of use cases.

LSmashSource

LSmashSource is a source filter which supports the vast majority of video formats and is capable of frame-accurate decoding and seeking.

We will see an example of using LSmashSource below:

from vapoursynth import core

clip = core.lsmas.LWLibavSource(source="My Video.mkv")

clip.set_output(0)

This will load the video located at this filepath into the Vapoursynth script, and make it available in the clip variable for later use. The set_output call at the end tells Vapoursynth to export this clip to the program that is calling the script, e.g. vspipe or vapoursynth-preview.

Note that some video formats, such as VC1, may encounter issues when attempting to seek through them. For these formats, it is recommended to only encode them linearly, i.e. without seeking. This means when using tools such as av1an, it is recommended to transcode to a lossless intermediate first.

Bit Depths and Color Formats

A generic overview of colorimetry is presented in the Color Management section. This area goes into more detail on how colorimetry relates to video filtering, and how to modify colorimetry as part of your filter chain.

Bit Depth

Historically, the majority of videos have been encoded using 8 bits of data per pixel. However, that is changing, especially with HDR becoming more and more popular. Modern encoders, including x265 and all AV1 encoders, support encoding at higher bit depths, and 10-bit encoding is becoming more common and recommended. (Note that although x264 also supports 10-bit encoding, some players may not support playback of 10-bit H.264, so if set-top compatibility is important, 10-bit H.264 should be avoided. It will, however, play back just fine on PC media players.)

In addition, when filtering, we typically want to run our filters on a 16-bit video to ensure the highest quality filtering possible.

We can account for these needs using functions from vstools, a Python package for Vapoursynth.

from vapoursynth import core
import vstools

clip = core.lsmas.LWLibavSource(source="My Video.mkv")
clip = vstools.initialize_clip(clip)  # bring the clip to 16-bit for filtering
# We can do other filtering here
clip = vstools.finalize_clip(clip)  # dither back down to the output bit depth

Cropping and Resizing

Interlacing, Telecine, and Combing

Noise: Good or Bad?

Ringing & Haloing

Debanding & Dithering

Sharpening

Anti-aliasing

Chroma Problems

Tonemapping

AV1 In Depth

License

This work is made public under the Creative Commons BY-SA 4.0 license.

Markdown source code for this guide is available on GitHub.