Monday, October 02, 2017

Busy busy bee

How interesting that I'm more likely to post when super busy with a task that has a pending deadline. Anyway, I just wanted to share a neat tool for combining the segments of an HTTP streaming file into one. Last night I decided to see if the content of a certain music video streaming site could be downloaded for offline viewing. The website can be clunky, and streaming over the internet is sometimes less preferable to watching offline and reducing system load (ish).

First, some basics. (This won't be an exhaustive list, just what I've encountered in the past.)

When you stream media, it can reach your browser in one of several ways.

One method is to simply provide one big file and let the browser handle how to play it. Firefox has a built-in media player, and HTML5 has built-in media tags, so why not have the user click on a link that hides an MP4 file and let the browser play it how it wishes?

An alternative method is to build your own serving capability and web application around the media, as SoundCloud does; it appends all sorts of authentication and one-time-use tokens to the request that actually fetches a file, rather than just exposing a list of files. A corollary to this one is to encrypt the music and decrypt it on the fly, such as with RTSP, which adds some level of restriction against bypass but can still be scripted away.

A third method, possibly more in use for large video files, is to "break the overall stream into a sequence of small HTTP-based file downloads, each download loading one short chunk of an overall potentially unbounded transport stream" (Wikipedia, HTTP Live Streaming). This works well for live video, such as on YouTube, Facebook Live, Snapchat, etc., where there is no defined end and therefore no known file size to start with. It's also easier on the client, which only needs to load content chunk by chunk, and it can be used in situations where the user's network might be unreliable.

Before the stream begins, the browser downloads an m3u8 playlist file that lists all the pieces. The segments have a .ts extension, and if you look at just one, it'll be a few seconds or so of the overall piece.
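To make the playlist idea concrete, here is a rough Python sketch of pulling the segment list out of an m3u8 file. The playlist text, the example.com URL, and the function name list_segments are all made up for illustration; real playlists carry more tags (encryption keys, variant streams) that this ignores.

```python
from urllib.parse import urljoin

# Made-up playlist text for illustration only; not from any real site.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:9.6,
segment_000.ts
#EXTINF:9.6,
segment_001.ts
#EXTINF:4.2,
segment_002.ts
#EXT-X-ENDLIST
"""


def list_segments(playlist_text, base_url):
    """Return (duration_seconds, absolute_url) pairs in playback order."""
    segments = []
    duration = None
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#EXTINF:"):
            # Tag format: "#EXTINF:<duration>,<optional title>"
            duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif not line.startswith("#"):
            # Any non-tag line is a segment URI, possibly relative to the playlist URL.
            segments.append((duration, urljoin(base_url, line)))
            duration = None
    return segments


if __name__ == "__main__":
    # Hypothetical playlist location, used only to resolve the relative names.
    for duration, url in list_segments(SAMPLE_PLAYLIST, "https://example.com/video/index.m3u8"):
        print(duration, url)
```

Each #EXTINF tag gives the length of the segment on the following line, which is why a single .ts file only holds a few seconds.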

Now, on to the process. If you open an intercepting proxy such as Burp Suite or ZAP, load a webpage with streaming media in the associated browser, and watch the traffic, you'll see a series of requests being made. First comes the HTML page where the media, for example's sake a piece of music, sits. Next come some script and style files, which could be of some use but most likely aren't. Then a request will be made for something like "file_128kbps.mp3", and a big file will show up in the response, which is the entire mp3 itself. If you're loading a streaming movie instead, you'll see a request for an m3u8 file, with a response listing all of the pieces (507 of them in my case).

There is a tool on GitHub called HLS-fetch, which can be given either the page with the content or the m3u8 link itself; it downloads all the segments and stitches them into a single TS file. Plex can play that right off the bat, no transcoding.
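For a sense of what the stitching step amounts to, here is a minimal Python sketch. It is not how HLS-fetch itself works (I haven't checked its source); it just assumes the segments are already on disk, unencrypted, and that appending them byte for byte in playlist order gives a playable TS file. The function name stitch_segments and the file names are hypothetical.

```python
import shutil
from pathlib import Path


def stitch_segments(segment_paths, output_path):
    """Append each segment's bytes to output_path in the given order.

    Returns the total number of bytes written. Assumes unencrypted
    MPEG-TS segments that are already in playback order.
    """
    total = 0
    with open(output_path, "wb") as out:
        for path in segment_paths:
            with open(path, "rb") as seg:
                # Stream the copy so large segments are not held in memory at once.
                shutil.copyfileobj(seg, out)
            total += Path(path).stat().st_size
    return total
```

With a hypothetical folder of zero-padded names (segment_000.ts, segment_001.ts, ...), something like stitch_segments(sorted(Path("segments").glob("*.ts")), "video.ts") would do it. The sort only works because of the zero padding, so in practice it's safer to use the order from the playlist.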
