Overview

At ByteDance, we are thrilled to serve billions of users worldwide with a suite of more than a dozen products. The unprecedented growth of our business, however, brings a significant challenge: cutting the cost of on-demand video streaming without sacrificing user experience, so that we can continue providing high-quality services for all users.

Multi-site Parallel Downloading (MPD) is one approach that we have found very effective, bringing down the overall streaming cost by up to 50%. Unlike conventional on-demand streaming solutions that rely on DASH streaming from dedicated CDN servers, MPD tries to take advantage of low-cost computing resources available on the Internet. Such computing resources (e.g., set-top boxes, smart home devices, and idle servers), called data nodes in our MPD system, usually offer adequate storage space, varying network bandwidth, limited computing capability, and unreliable availability. Even though a single data node cannot replace a dedicated CDN server and offer video streaming at the same quality and reliability, allowing the downloading program to connect to multiple data nodes simultaneously can potentially make up the performance gap. Therefore, tuning the performance of MPD transport can help offload more traffic from CDN servers to less expensive MPD nodes and significantly bring down the overall streaming cost.

In this grand challenge, we provide a platform for contestants to design an MPD transport algorithm and test its performance in a real-world network environment. For simplicity, we provide APIs for querying, connecting to, and downloading data from available data nodes. Contestants can therefore focus on the design and implementation of the core transport algorithm, which, in summary, determines which data chunk should be requested from which data node at what time. The goal is to maximize the video downloading speed while minimizing the overall bandwidth usage. We have built a testing platform with over 5000 data node instances for performance evaluation. All data node instances run on the same computing devices that are actively used in the commercial ByteDance MPD streaming system serving Chinese customers. All submissions will be evaluated and ranked based on the actual performance score of downloading video content on our MPD test platform.


Task Description

In this MMSys Grand Challenge, contestants are asked to design and develop the core MPD transport function to complete an app program (referred to below as the client program). You may have noticed that MPD is similar to P2P in many ways. However, since MPD aims to provide a low-cost alternative to CDN, it sets a much higher performance bar than conventional P2P in terms of both speed and latency. Keep in mind that the MPD downloading module supports an on-demand video streaming application, which means the subsequent video playback module expects the speed of sequential downloading (not just throughput) to be faster than the video bitrate. Meanwhile, the low-cost goal means MPD also needs to manage the overall bandwidth usage during streaming.
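To illustrate the difference between throughput and sequential downloading speed, here is a minimal Python sketch. It assumes a constant video bitrate and ignores the shorter last chunk. The function names and parameters are hypothetical and are not part of the provided framework.

```python
def contiguous_prefix(received):
    """Number of chunks available in order, starting from chunk 0.

    `received` is a set of chunk IDs that have already arrived.
    """
    n = 0
    while n in received:
        n += 1
    return n


def playable_seconds(received, chunk_size, bitrate_bps):
    """Seconds of video covered by the in-order prefix.

    Assumption: the video has a constant bitrate of `bitrate_bps` bits/s.
    """
    return contiguous_prefix(received) * chunk_size * 8 / bitrate_bps


# Four chunks have arrived, but chunk 3 is missing, so only the first
# three chunks are useful to the playback module right now.
received = {0, 1, 2, 5}
print(contiguous_prefix(received))
print(playable_seconds(received, chunk_size=1024, bitrate_bps=8192))
```

A gap early in the file stalls playback no matter how many later chunks have arrived, which is why a transport algorithm may want to prioritize the lowest missing chunk IDs.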

For a given downloading task, the client program connects to a set of data nodes (usually no more than 25) through UDP connections. All parties agree to divide the video file into small chunks of 1024 bytes (the last chunk of a file may contain fewer than 1024 bytes). To download a data chunk, the client needs to explicitly send a data request with the chunk ID to a connected data node, which passively responds by sending the requested data chunk back. The client program needs to decide which data node receives the request for each data chunk. Moreover, the data node does not check or guarantee the success of data chunk delivery. It is the client program's responsibility to detect possible data loss and request the missing data chunk from the same or a different data node.
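The request and retry logic described above can be sketched as follows. This is only an illustration under stated assumptions: the RequestTracker class, the round-robin node choice, and the fixed 0.5-second timeout are all hypothetical and do not reflect the framework's actual callbacks. Only the 1024-byte chunk size comes from the task description.

```python
import math

CHUNK_SIZE = 1024  # bytes, as stated in the task description


def chunk_count(file_size):
    """Number of chunks needed to cover a file of `file_size` bytes."""
    return math.ceil(file_size / CHUNK_SIZE)


def chunk_length(file_size, chunk_id):
    """Length in bytes of a chunk; only the last chunk may be short."""
    return min(CHUNK_SIZE, file_size - chunk_id * CHUNK_SIZE)


class RequestTracker:
    """Hypothetical helper: tracks requests and re-issues those that time out."""

    def __init__(self, file_size, nodes, timeout=0.5):
        self.nodes = list(nodes)   # assumed non-empty, unique node identifiers
        self.timeout = timeout     # assumed fixed timeout in seconds
        self.missing = set(range(chunk_count(file_size)))
        self.pending = {}          # chunk_id -> (node, time the request was sent)
        self._next_node = 0

    def _pick_node(self, avoid=None):
        # Round robin, skipping the node that just timed out when possible.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next_node % len(self.nodes)]
            self._next_node += 1
            if node != avoid or len(self.nodes) == 1:
                return node
        return None

    def next_requests(self, now, budget):
        """Return up to `budget` (chunk_id, node) pairs to send at time `now`."""
        expired = sorted(c for c, (_, sent) in self.pending.items()
                         if now - sent >= self.timeout)
        fresh = sorted(self.missing.difference(self.pending))
        requests = []
        # Timed-out chunks go first, then new chunks in ascending ID order.
        for chunk_id in expired + fresh:
            if len(requests) >= budget:
                break
            avoid = self.pending[chunk_id][0] if chunk_id in self.pending else None
            node = self._pick_node(avoid)
            self.pending[chunk_id] = (node, now)
            requests.append((chunk_id, node))
        return requests

    def on_chunk(self, chunk_id):
        """Record a delivered chunk; duplicates are ignored."""
        self.missing.discard(chunk_id)
        self.pending.pop(chunk_id, None)

    def done(self):
        return not self.missing


tracker = RequestTracker(file_size=2500, nodes=["a", "b"])
print(tracker.next_requests(now=0.0, budget=2))
tracker.on_chunk(0)                               # chunk 1 is never delivered
print(tracker.next_requests(now=1.0, budget=2))   # chunk 1 is retried on another node
```

A real algorithm would likely replace the round-robin choice with a per-node estimate of latency and loss, and adapt the timeout per node. Every retry of a chunk that later arrives anyway adds to the overall bandwidth usage.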

Basic Framework

We will provide a basic app framework to simplify the development. The app framework handles basic operations, such as querying the data nodes, connecting and authenticating with data nodes, sending requests to and downloading data from data nodes following certain protocols, and so on. It also keeps monitoring the arrival of downloaded data chunks. It automatically stops the task session when all data chunks are received or the session times out. The app framework interacts with the core transport function through a set of callbacks. Please refer to the documentation and sample code in the provided framework package for more details of API usage. Please note that you need to strictly follow the API requirements so that your code can be imported for platform testing.

Evaluation Metrics

For the performance evaluation, we propose a new metric, adjusted speed, which evaluates downloading speed, redundancy rate, and out-of-order delivery in one score. We define the adjusted speed for a given downloading task i as follows:


where

  • File size: in test case i, the size of the file to download. However, if the program fails to download the file within the video duration, this value is set to 0.

  • Downloading time: in test case i, the time it takes for the program to finish downloading the file. If the program fails to download the file within the video duration, this value is set to the video duration of the file to download.

  • Downloaded size: in test case i, the size of the overall data downloaded by the program.

  • Checkpoint time: in test case i, the time (in seconds) it takes to finish downloading all continuous chunks, counted from the beginning of the file, that are equivalent to a given number of seconds of video.

  • Video duration: in test case i, the video duration of the file to download.

In this metric, the downloading speed is the main factor, and two penalty factors account for redundancy and out-of-order delivery, respectively. More bandwidth used in downloading results in a higher redundancy penalty. The purpose of the out-of-order penalty is to set up checkpoints every 10 seconds. If the actual downloading time is larger than the video duration, which means the video playback module has to pause and wait for data, the pause time will be added to the penalty. Since on-demand video streaming usually has a pre-fetch buffer, we only check the downloading order every 10 seconds.
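The checkpoint idea can be illustrated with a short Python sketch. It is one plausible reading of the description, not the official scoring code. It assumes a constant bitrate (so the first 10k seconds of video map to a proportional share of the chunks), adds a final checkpoint at the end of the video, and treats the pause at a checkpoint as the amount by which its finish time exceeds its playback position. The function names are hypothetical, and the sketch does not compute the final score.

```python
import math

CHECK_INTERVAL = 10  # seconds between checkpoints, as stated in the text


def checkpoint_times(arrival, n_chunks, duration):
    """Return a list of (video_seconds, finish_time) pairs.

    arrival[c] is the time (in seconds since the task started) at which
    chunk c first arrived. Assumption: every chunk eventually arrived.
    """
    result = []
    mark = CHECK_INTERVAL
    while True:
        video_s = min(mark, duration)
        # Chunks needed to play up to `video_s`, assuming a constant bitrate.
        needed = math.ceil(n_chunks * video_s / duration)
        # The checkpoint is reached once the slowest of those chunks arrives.
        finish = max(arrival[c] for c in range(needed))
        result.append((video_s, finish))
        if mark >= duration:
            break
        mark += CHECK_INTERVAL
    return result


def pause_times(checkpoints):
    """Assumed pause per checkpoint: time spent waiting beyond the playback position."""
    return [max(0.0, finish - video_s) for video_s, finish in checkpoints]


# A 20-second video split into 4 chunks; chunk 1 arrives late.
arrival = [1.0, 12.0, 3.0, 15.0]
cps = checkpoint_times(arrival, n_chunks=4, duration=20)
print(cps)
print(pause_times(cps))
```

In this example the whole file finishes within the video duration, yet the late chunk 1 still causes a pause at the first checkpoint. This is the out-of-order behavior the penalty is meant to capture.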

Test Environment

We provide two test platforms for all participants.

  • The local test platform requires a Linux server. We will provide detailed instructions to set up a Mininet environment and run different data node instances. The downloading program also runs in Mininet and requests data from the data node instances. This local platform is mainly used for debugging and functionality validation. However, you may also modify the Mininet topology, add network restrictions, or change the number of data node instances to test algorithm performance in various settings. Note that all API calls and data chunk arrivals are automatically recorded in logs, which can be used to calculate the downloading speed and bandwidth usage. Script tools will also be provided to analyze the logs and present performance results.

  • The large-scale test platform uses the real data nodes of the commercial ByteDance MPD system deployed on the Internet. However, due to security and performance concerns, we are not ready to offer open access to this platform yet. Instead, you can submit your code and we will deploy and run the tests for you. Please follow the submission instructions to run tests on this platform. The final evaluation will also be held on this large-scale test platform.
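As a rough illustration of how the chunk-arrival logs from the local platform could be turned into downloading speed and bandwidth usage, consider the Python sketch below. The record layout (timestamp, chunk ID, byte count) and the summarize function are assumptions made for this example. The actual log format and the provided script tools may differ.

```python
def summarize(records, file_size, start_time=0.0):
    """Summarize a download from chunk-arrival records.

    records: iterable of (timestamp, chunk_id, n_bytes), one entry per
    chunk arrival, duplicates included (hypothetical log layout).
    """
    seen = set()
    total_bytes = 0    # everything received, including duplicate chunks
    useful_bytes = 0   # first copy of each chunk only
    finish = None
    for ts, chunk_id, n_bytes in sorted(records):
        total_bytes += n_bytes
        if chunk_id not in seen:
            seen.add(chunk_id)
            useful_bytes += n_bytes
            if finish is None and useful_bytes >= file_size:
                finish = ts  # the moment the file became complete
    if finish is not None and finish > start_time:
        speed = file_size / (finish - start_time)
    else:
        speed = 0.0
    return {
        "finish_time": finish,
        "speed_bytes_per_s": speed,
        "total_bytes": total_bytes,
        "redundancy": total_bytes / file_size,
    }


# Chunk 1 was delivered twice, so 1024 of the received bytes are redundant.
records = [(0.5, 0, 1024), (1.0, 1, 1024), (1.5, 1, 1024), (2.0, 2, 452)]
print(summarize(records, file_size=2500))
```

Here the redundancy value is simply the ratio of all received bytes to the file size. A value of 1.0 would mean that no chunk was downloaded more than once.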