5. RTP Media API

The RTP media API lets a web application send and receive MediaStreamTracks over a peer-to-peer connection. Tracks, when added to an RTCPeerConnection, result in signaling; when this signaling is forwarded to a remote peer, it causes corresponding tracks to be created on the remote side.

Note

There is not an exact 1:1 correspondence between tracks sent by one RTCPeerConnection and received by the other. For one, IDs of tracks sent have no mapping to the IDs of tracks received. Also, replaceTrack changes the track sent by an RTCRtpSender without creating a new track on the receiver side; the corresponding RTCRtpReceiver will only have a single track, potentially representing multiple sources of media stitched together. Both addTransceiver and replaceTrack can be used to cause the same track to be sent multiple times, which will be observed on the receiver side as multiple receivers each with its own separate track. Thus it’s more accurate to think of a 1:1 relationship between an RTCRtpSender on one side and an RTCRtpReceiver’s track on the other side, matching senders and receivers using the RTCRtpTransceiver’s mid if necessary.
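The mid-based matching described in this note can be illustrated with a minimal sketch. It does not use the browser API directly: TransceiverInfo, TrackPair and pairByMid are hypothetical names introduced here, and the sketch assumes the application has already collected each side's mid and track ID (for example over its own signaling channel).

```typescript
// Hypothetical snapshot of one RTCRtpTransceiver: its mid plus the ID of the
// track on its sender (local side) or on its receiver (remote side).
interface TransceiverInfo {
  mid: string | null; // null while not associated with a media description
  trackId: string | null;
}

interface TrackPair {
  sentTrackId: string | null;
  receivedTrackId: string | null;
}

// Pair local senders with remote receivers by mid, never by track ID.
function pairByMid(
  local: TransceiverInfo[],
  remote: TransceiverInfo[],
): Map<string, TrackPair> {
  const remoteByMid = new Map<string, TransceiverInfo>();
  for (const t of remote) {
    if (t.mid !== null) remoteByMid.set(t.mid, t);
  }
  const pairs = new Map<string, TrackPair>();
  for (const t of local) {
    if (t.mid === null) continue; // disassociated: nothing to match yet
    const peer = remoteByMid.get(t.mid);
    if (peer !== undefined) {
      pairs.set(t.mid, { sentTrackId: t.trackId, receivedTrackId: peer.trackId });
    }
  }
  return pairs;
}

// The same local track ("cam") is sent twice; the remote side sees two
// receivers, each with its own track and unrelated track IDs.
const localSide: TransceiverInfo[] = [
  { mid: "0", trackId: "cam" },
  { mid: "1", trackId: "cam" },
  { mid: null, trackId: "mic" },
];
const remoteSide: TransceiverInfo[] = [
  { mid: "1", trackId: "r-b" },
  { mid: "0", trackId: "r-a" },
];
const pairs = pairByMid(localSide, remoteSide);
```

Here the single sent track ID maps to two distinct received track IDs, which is why the track ID itself is not a usable key.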

When sending media, the sender may need to rescale or resample the media to meet various requirements including the envelope negotiated by SDP.

Following the rules in [RFC8829] (section 3.6.), the video MAY be downscaled in order to fit the SDP constraints. The media MUST NOT be upscaled to create fake data that did not occur in the input source, the media MUST NOT be cropped except as needed to satisfy constraints on pixel counts, and the aspect ratio MUST NOT be changed.

Note

The WebRTC Working Group is seeking implementation feedback on the need and timeline for a more complex handling of this situation. Some possible designs have been discussed in GitHub issue 1283.

When video is rescaled, for example for certain combinations of width or height and scaleResolutionDownBy values, the resulting width or height might not be an integer. In such situations the user agent MUST use the integer part of the result. What to transmit if the integer part of the scaled width or height is zero is implementation-specific.
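As a worked illustration of the integer-part rule, the following sketch computes the scaled dimensions for a given scaleResolutionDownBy value. The function name scaledSize and its return shape are assumptions made for this example; the check that the scale factor is at least 1.0 reflects the constraint this specification places on scaleResolutionDownBy.

```typescript
interface Size {
  width: number;
  height: number;
}

// Divide each dimension by scaleResolutionDownBy and keep only the integer
// part of the result. A result of 0 is possible; what is then transmitted
// is implementation-specific, so it is returned unchanged here.
function scaledSize(width: number, height: number, scaleResolutionDownBy: number): Size {
  if (!(scaleResolutionDownBy >= 1)) {
    throw new RangeError("scaleResolutionDownBy must be at least 1.0");
  }
  return {
    width: Math.trunc(width / scaleResolutionDownBy),
    height: Math.trunc(height / scaleResolutionDownBy),
  };
}

const third = scaledSize(1280, 720, 3); // 1280 / 3 = 426.66..., integer part 426
const tiny = scaledSize(2, 2, 4); // 2 / 4 = 0.5, integer part 0
```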

The actual encoding and transmission of MediaStreamTracks is managed through objects called RTCRtpSenders. Similarly, the reception and decoding of MediaStreamTracks is managed through objects called RTCRtpReceivers. Each RTCRtpSender is associated with at most one track, and each track to be received is associated with exactly one RTCRtpReceiver.

The encoding and transmission of each MediaStreamTrack SHOULD be made such that its characteristics (width, height and frameRate for video tracks; sampleSize, sampleRate and channelCount for audio tracks) are to a reasonable degree retained by the track created on the remote side. There are situations when this does not apply: for example, there may be resource constraints at either endpoint or in the network, or there may be RTCRtpSender settings applied that instruct the implementation to act differently.

An RTCPeerConnection object contains a set of RTCRtpTransceivers, representing the paired senders and receivers with some shared state. This set is initialized to the empty set when the RTCPeerConnection object is created. RTCRtpSenders and RTCRtpReceivers are always created at the same time as an RTCRtpTransceiver, which they will remain attached to for their lifetime. RTCRtpTransceivers are created implicitly when the application attaches a MediaStreamTrack to an RTCPeerConnection via the addTrack() method, or explicitly when the application uses the addTransceiver() method. They are also created when a remote description is applied that includes a new media description. Additionally, when a remote description is applied that indicates the remote endpoint has media to send, the relevant MediaStreamTrack and RTCRtpReceiver are surfaced to the application via the track event.

In order for an RTCRtpTransceiver to send and/or receive media with another endpoint, this must be negotiated with SDP such that both endpoints have an RTCRtpTransceiver object that is associated with the same media description.

When creating an offer, enough media descriptions will be generated to cover all transceivers on that end. When this offer is set as the local description, any disassociated transceivers get associated with media descriptions in the offer.

When an offer is set as the remote description, any media descriptions in it not yet associated with a transceiver get associated with a new or existing transceiver. In this case, only disassociated transceivers that were created via the addTrack() method may be associated. Disassociated transceivers created via the addTransceiver() method, however, won’t get associated even if media descriptions are available in the remote offer. Instead, new transceivers will be created and associated if there aren’t enough addTrack()-created transceivers. This sets addTrack()-created and addTransceiver()-created transceivers apart in a critical way that is not observable from inspecting their attributes.
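The association rule for remote offers can be sketched as a small model. This is a simplification and not the normative algorithm: ModelTransceiver, OfferedMedia and applyRemoteOffer are hypothetical names, the createdBy field is a bookkeeping device that has no counterpart among the observable attributes of a real transceiver, and the requirement that a reused transceiver have the same kind as the media description is an assumption taken from [RFC8829] rather than from the text above.

```typescript
type Kind = "audio" | "video";

interface ModelTransceiver {
  kind: Kind;
  createdBy: "addTrack" | "addTransceiver" | "remoteDescription";
  mid: string | null; // null means disassociated
}

interface OfferedMedia {
  mid: string;
  kind: Kind;
}

// Returns the transceiver set after a remote offer has been applied.
function applyRemoteOffer(
  transceivers: ModelTransceiver[],
  offer: OfferedMedia[],
): ModelTransceiver[] {
  const result = transceivers.map((t) => ({ ...t }));
  for (const m of offer) {
    // A media description that is already associated needs no action.
    if (result.some((t) => t.mid === m.mid)) continue;
    // Only disassociated, addTrack()-created transceivers may be reused.
    const reusable = result.find(
      (t) => t.mid === null && t.createdBy === "addTrack" && t.kind === m.kind,
    );
    if (reusable !== undefined) {
      reusable.mid = m.mid;
    } else {
      // Otherwise a new transceiver is created for this media description.
      result.push({ kind: m.kind, createdBy: "remoteDescription", mid: m.mid });
    }
  }
  return result;
}

// One transceiver of each origin, and a remote offer with two video sections.
const before: ModelTransceiver[] = [
  { kind: "video", createdBy: "addTransceiver", mid: null },
  { kind: "video", createdBy: "addTrack", mid: null },
];
const after = applyRemoteOffer(before, [
  { mid: "0", kind: "video" },
  { mid: "1", kind: "video" },
]);
```

In this run the addTrack()-created transceiver takes the first media description, a new transceiver is created for the second, and the addTransceiver()-created one stays disassociated.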

When creating an answer, only media descriptions that were present in the offer may be listed in the answer. As a consequence, any transceivers that were not associated when setting the remote offer remain disassociated after setting the local answer. This can be remedied by the answerer creating a follow-up offer, initiating another offer/answer exchange, or in the case of using addTrack()-created transceivers, making sure that enough media descriptions are offered in the initial exchange.
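A minimal sketch of how an answerer might detect this situation follows. AnswererTransceiver and needsFollowUpOffer are hypothetical names, and the check is a simplified stand-in for the negotiation logic a real user agent performs; in a browser, the negotiationneeded event serves a similar purpose.

```typescript
interface AnswererTransceiver {
  mid: string | null; // null means disassociated
  stopped: boolean;
}

// After the local answer is set, any live transceiver that is still
// disassociated can only be negotiated through a follow-up offer.
function needsFollowUpOffer(transceivers: AnswererTransceiver[]): boolean {
  return transceivers.some((t) => !t.stopped && t.mid === null);
}

// The offer carried a single media description ("0"), but the answerer
// holds a second transceiver that could not be associated with it.
const afterAnswer: AnswererTransceiver[] = [
  { mid: "0", stopped: false },
  { mid: null, stopped: false },
];
const followUpNeeded = needsFollowUpOffer(afterAnswer);

// Once the follow-up exchange has associated it, no further offer is needed.
const afterFollowUp: AnswererTransceiver[] = [
  { mid: "0", stopped: false },
  { mid: "1", stopped: false },
];
const stillNeeded = needsFollowUpOffer(afterFollowUp);
```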