Download Analytics
The Podlove Publisher tracks download intents made by clients: it records that a download was started, but not whether it was completed. For brevity, this document speaks of “downloads”, but what is actually tracked are download intents. When you look at your data, keep in mind that the numbers displayed do not represent the actual number of listeners.
Anatomy of Tracking URLs
The Publisher creates “tracking URLs”. For example, if your media file is this:
media.example.com/podcast/pod001.m4a
then a tracking URL might look like this:
example.com/podlove/file/646/s/feed/c/m4a/pod001.m4a
Requests to tracking URLs are intercepted by the Publisher, analyzed, and finally redirected to the actual physical file. The URL consists of the following components.
example.com/podlove
This is your blog domain and a “podlove” URL prefix, so tracking URLs don’t interfere with your blog’s pages.
/file/646
This identifies the actual file to download.
/s/feed/c/m4a
These are indicators for the source (s) and context (c) of the download. This allows you to have separate analytics for downloads from feeds, the web player, etc.
/pod001.m4a
What looks like the file name is purely decorative. It makes the URL easier to read, and some command line clients use that part of the URL to auto-generate a filename. It is irrelevant for tracking.
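The components described above can be pulled apart with a small parser. This is a minimal sketch: the regular expression is inferred from the single example URL in this document, and `TrackingUrl` and `parse_tracking_url` are hypothetical names, not part of the Publisher.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the example URL above (an assumption); the
# Publisher's real routing rules may accept other shapes.
TRACKING_PATH = re.compile(
    r"/podlove/file/(?P<file_id>\d+)"
    r"/s/(?P<source>[^/]+)"
    r"/c/(?P<context>[^/]+)"
    r"/(?P<filename>[^/]+)$"
)


class TrackingUrl(NamedTuple):
    file_id: int
    source: str
    context: str
    filename: str  # decorative only, not used for tracking


def parse_tracking_url(url: str) -> Optional[TrackingUrl]:
    """Split a tracking URL into its components, or return None if it does not match."""
    match = TRACKING_PATH.search(url)
    if match is None:
        return None
    return TrackingUrl(
        file_id=int(match["file_id"]),
        source=match["source"],
        context=match["context"],
        filename=match["filename"],
    )


parsed = parse_tracking_url("example.com/podlove/file/646/s/feed/c/m4a/pod001.m4a")
print(parsed)
```

For the example URL this yields file ID 646, source `feed`, and context `m4a`; the trailing file name is captured but carries no tracking information.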
Tracking Data
Only real download requests are tracked. In more technical terms: HEAD requests are ignored. These are the analyzed and saved values:
UA (User Agent)
Based on the UA, facts about the client can be derived or guessed, such as:
- client name (Chrome, Firefox, iTunes, Instacast, …)
- operating system (Android, iOS, Mac, …)
- device (iPhone, Galaxy Nexus, …)
Furthermore, bots can be detected.
File ID
A reference to the downloaded file is kept. This allows tracking data to be grouped by episode.
Request ID
The request ID is an artificial compound ID to identify identical requests anonymously. It is a hash from the combined IP address and user agent string.
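As an illustration of the idea, the sketch below derives such an ID. The hash function (SHA-256) and the plain concatenation of the two values are assumptions made for this example; this document only states that the ID is a hash of the combined IP address and user agent string. The function name `request_id` is likewise hypothetical.

```python
import hashlib


def request_id(ip_address: str, user_agent: str) -> str:
    """Return an anonymous identifier for an (IP, user agent) pair.

    SHA-256 and simple concatenation are assumptions for illustration,
    not necessarily what the Publisher uses.
    """
    combined = (ip_address + user_agent).encode("utf-8")
    return hashlib.sha256(combined).hexdigest()


a = request_id("203.0.113.7", "ExamplePodcatcher/1.0")
b = request_id("203.0.113.7", "ExamplePodcatcher/1.0")
c = request_id("203.0.113.7", "OtherClient/2.0")
print(a == b, a == c)  # same inputs give the same ID, different inputs do not
```

Because only the hash is stored, two requests can be recognized as coming from the same client without keeping the IP address itself.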
Source & Context
Download source & context allow you to analyze tracking data by how and where the file was requested. By default, the sources are:
- download: a direct download via the website
- feed: an automatic download via RSS feeds
- webplayer: the episode was played using the player on the website
To further drill down, each source has multiple contexts:
- download
  - select-button: a direct download by clicking a download button
  - select-show: the user obtained the URL by revealing it on the website
- feed: the context is the asset (m4a, mp3, ogg, etc.)
- webplayer
  - episode: the player on the episode page
  - home: the player on the home page
  - website: the player on any other page
Geo Location
Based on the IP address and the GeoLite2 database by MaxMind, location data is saved.
Data Cleanup
Before tracking data is presented in the analytics area, it is cleaned up. Cleanup involves the following steps:
- HEAD requests are ignored.
- Requests for the first byte are ignored.
- Based on the User Agent analysis, bots are filtered out.
- Duplicate requests are filtered out. A request is considered a duplicate if it contains
  - the same File ID
  - and the same Request ID
  - and was made within the same hour (can be changed to day in settings)
- Pre-Release downloads are filtered out. They may happen if you test downloads before publishing the episode.
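The deduplication logic can be sketched as follows. The `Request` record and the `clean` function are hypothetical and only model three of the steps (HEAD requests, bots, duplicates); first-byte and pre-release filtering are left out. The sketch also assumes that "within the same hour" means the same clock hour, which may differ from how the Publisher actually windows requests.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Request:
    # Hypothetical record shape for this sketch, not the plugin's schema.
    method: str
    file_id: int
    request_id: str
    accessed_at: datetime
    is_bot: bool = False


def clean(requests: Iterable[Request], window: str = "hour") -> List[Request]:
    """Drop HEAD requests, bots, and duplicates within the same hour or day."""
    fmt = "%Y-%m-%d %H" if window == "hour" else "%Y-%m-%d"
    seen = set()
    kept = []
    for req in sorted(requests, key=lambda r: r.accessed_at):
        if req.method == "HEAD" or req.is_bot:
            continue
        # Assumption: "same hour" is read as the same clock hour (or calendar day).
        key = (req.file_id, req.request_id, req.accessed_at.strftime(fmt))
        if key in seen:
            continue
        seen.add(key)
        kept.append(req)
    return kept


requests = [
    Request("GET", 646, "abc", datetime(2024, 1, 1, 10, 5)),
    Request("GET", 646, "abc", datetime(2024, 1, 1, 10, 40)),  # duplicate, same hour
    Request("GET", 646, "abc", datetime(2024, 1, 1, 11, 15)),  # next hour
    Request("HEAD", 646, "def", datetime(2024, 1, 1, 10, 6)),  # ignored
    Request("GET", 646, "bot", datetime(2024, 1, 1, 10, 7), is_bot=True),  # filtered
]
print(len(clean(requests)), len(clean(requests, window="day")))
```

With the hourly window two of the sample requests count as downloads; with the daily window only one does, which shows why the choice of window changes the reported numbers.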
IAB Conformity
Advertisers often ask for download numbers according to the IAB Podcast Measurement Guidelines. The "Data Cleanup" section above explains how download numbers are processed, which follows the IAB guidelines.
The only change you may need to make is to set the deduplication window to a day instead of an hour, which is the Podlove Publisher default. You can do this in Podlove > Expert Settings > Tracking.
Database Structure
If you would like to access the raw analytics data and process it yourself, you need to know which table holds which data and what the columns mean.
wp_podlove_downloadintent
You probably want to use wp_podlove_downloadintentclean instead. This table contains every tracked request. Columns are nearly identical to wp_podlove_downloadintentclean.
wp_podlove_downloadintentclean
This table contains cleaned data from wp_podlove_downloadintent. See the "Data Cleanup" section for details on how data is cleaned/aggregated. Each row represents one download, so counting the rows in this table gives you the total number of downloads.
Column | Description |
---|---|
id | Unique, auto-incrementing integer id |
user_agent_id | reference to id in wp_podlove_useragent |
media_file_id | reference to id in wp_podlove_mediafile |
request_id | Artificial compound ID to identify identical requests anonymously. It is a hash from the combined IP address and user agent string. |
accessed_at | Date and time of the download |
source | One of "download", "feed", "webplayer". Specifies how the download was initiated. |
context | A more specific description for the source column. |
geo_area_id | reference to id in wp_podlove_geoarea |
lat | Location: latitude |
lng | Location: longitude |
httprange | HTTP "Range" header of the request, which may specify which bytes exactly were requested. |
hours_since_release | Number of hours between episode release and download. May be useful for aggregation. |
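As a sketch of how this table can be queried, the example below counts all downloads and groups them by source. To stay self-contained it uses an in-memory SQLite database as a stand-in with only a few of the columns and made-up rows; on a real site you would run the same SELECT statements against your WordPress database, where the `wp_` table prefix may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table with a subset of the documented columns and made-up rows.
conn.execute(
    "CREATE TABLE wp_podlove_downloadintentclean ("
    "id INTEGER PRIMARY KEY, media_file_id INTEGER, source TEXT, context TEXT)"
)
conn.executemany(
    "INSERT INTO wp_podlove_downloadintentclean (media_file_id, source, context) "
    "VALUES (?, ?, ?)",
    [
        (1, "feed", "m4a"),
        (1, "feed", "mp3"),
        (1, "webplayer", "episode"),
        (2, "download", "select-button"),
    ],
)

# Each row is one download, so the row count is the total.
total = conn.execute(
    "SELECT COUNT(*) FROM wp_podlove_downloadintentclean"
).fetchone()[0]

by_source = conn.execute(
    "SELECT source, COUNT(*) FROM wp_podlove_downloadintentclean "
    "GROUP BY source ORDER BY source"
).fetchall()

print(total)
print(by_source)
```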
wp_podlove_mediafile
This table holds metadata for each file, like the file size. It holds references to the episode and asset.
Column | Description |
---|---|
id | Unique, auto-incrementing integer id |
episode_id | reference to id in wp_podlove_episode |
episode_asset_id | reference to id in wp_podlove_episodeasset |
size | file size in bytes. -1 or NULL if unknown. |
etag | etag used in HTTP requests |
wp_podlove_episodeasset
This table holds metadata for assets. It is useful if you want to display asset names or need the reference to the file type.
Column | Description |
---|---|
id | Unique, auto-incrementing integer id |
title | Asset title |
identifier | Asset identifier for use in templates |
file_type_id | reference to id in wp_podlove_filetype |
suffix | Used for constructing download URLs |
downloadable | 1 if it appears in download dialogues, 0 otherwise. |
position | integer used for ordering |
wp_podlove_filetype
This table holds metadata for the file type associated with an asset. It is useful if you need to display the file extension or filter all downloads by type == audio.
Column | Description |
---|---|
id | Unique, auto-incrementing integer id |
name | Filetype name |
type | One of "audio", "video", "ebook", "image", "chapters", "metadata", "transcript" |
mime_type | Mime Type |
extension | File extension |
wp_podlove_episode
This table holds metadata for the episode. It is complemented by data in the WordPress-native table wp_posts, which you will need, for example, to access the episode title.
Column | Description |
---|---|
id | Unique, auto-incrementing integer id |
post_id | reference to id in wp_posts |
subtitle | Episode subtitle |
summary | Episode summary / description |
... |
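Putting the tables together, the following sketch counts audio downloads per episode title by following the references described above. It again uses an in-memory SQLite stand-in with only the columns needed for the joins and invented sample rows; the `post_title` column of wp_posts is standard WordPress, and the query itself is an illustration rather than something the Publisher ships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in schema: only the columns needed for the joins.
conn.executescript("""
CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
CREATE TABLE wp_podlove_episode (id INTEGER PRIMARY KEY, post_id INTEGER);
CREATE TABLE wp_podlove_filetype (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE wp_podlove_episodeasset (id INTEGER PRIMARY KEY, file_type_id INTEGER);
CREATE TABLE wp_podlove_mediafile (
    id INTEGER PRIMARY KEY, episode_id INTEGER, episode_asset_id INTEGER);
CREATE TABLE wp_podlove_downloadintentclean (
    id INTEGER PRIMARY KEY, media_file_id INTEGER);

INSERT INTO wp_posts VALUES (10, 'Episode One'), (11, 'Episode Two');
INSERT INTO wp_podlove_episode VALUES (1, 10), (2, 11);
INSERT INTO wp_podlove_filetype VALUES (1, 'audio'), (2, 'chapters');
INSERT INTO wp_podlove_episodeasset VALUES (1, 1), (2, 2);
INSERT INTO wp_podlove_mediafile VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
INSERT INTO wp_podlove_downloadintentclean (media_file_id) VALUES (1), (1), (2), (3);
""")

AUDIO_DOWNLOADS_PER_EPISODE = """
SELECT p.post_title, COUNT(*) AS downloads
FROM wp_podlove_downloadintentclean AS d
JOIN wp_podlove_mediafile AS m ON m.id = d.media_file_id
JOIN wp_podlove_episodeasset AS a ON a.id = m.episode_asset_id
JOIN wp_podlove_filetype AS f ON f.id = a.file_type_id
JOIN wp_podlove_episode AS e ON e.id = m.episode_id
JOIN wp_posts AS p ON p.ID = e.post_id
WHERE f.type = 'audio'
GROUP BY e.id, p.post_title
ORDER BY downloads DESC
"""

rows = conn.execute(AUDIO_DOWNLOADS_PER_EPISODE).fetchall()
print(rows)
```

The download of the chapters file is excluded by the `type = 'audio'` filter, so only requests for audio assets count toward each episode.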