Over Christmas, I decided to finally tackle learning Elixir. Since I learn new programming languages best by actually building something, I set out to implement a simple RSS reader/sync service, similar to Miniflux and FreshRSS.
Since I actually want to use this app, it needs to have a way to connect to my client apps (Reeder 5/Classic on iOS and macOS, as well as NewsFlash on Linux). Since the Google Reader API implementation is well-supported in most RSS clients, I set out to build an API compatible with the existing spec.
As it turns out, it’s hard: The API was never officially documented, and existing implementations rely on reverse-engineered specs. Even though robust open-source implementations exist along with some documentation, getting it working - especially with Reeder - was not easy. I had to do additional reverse engineering to get all the details right. The original API spec dates back to 2005, uses multiple response formats, and is not particularly well-structured compared to modern REST APIs.
Additionally, many reverse-engineered docs suggest implementations that might be compatible with the “original” Google Reader API spec but unfortunately do not work with many clients, which have evolved to support slight deviations of the spec. While this documentation is by no means complete, it should provide you with a functional baseline for the most essential features:
- Logging in
- Fetching subscriptions
- Fetching feed items
- Changing starred and read status
Below, I’ll do my best to document the most important parts of the API, including all the details needed to get things working.
You can find the almost “full” implementation (with some mock values for now) here: https://github.com/dstapp/feedproxy/blob/main/lib/feedproxy_web/controllers/greader_api_controller.ex
I don’t claim this implementation is complete or even bug-free, but it got me to a state where I can confidently use my RSS clients with it.
Concepts
In general, there are two entities: Feeds and Feed Items. A Feed is a subscription to a given RSS URL, while a Feed Item is a single news item inside the RSS feed.
URLs
Most of the URLs start with /reader/api/0. The 0 is a version parameter, but there was never anything other than 0, so you can hardcode that value.
Authentication
There are two types of tokens that the user receives: a long-lived “auth token” and a short-lived “session token.”
The long-lived token, which the client gets back from /accounts/ClientLogin, is sent in each subsequent request as an Authorization header:
Authorization: GoogleLogin auth={authtoken}
According to reverse-engineered docs, this token remains valid until the user “logs out.” Modern clients never do, so just provide a token that lets you verify the user’s identity.
The session token is provided via the T request parameter in scenarios where data is being modified (e.g., marking items as read or starring items).
Realistically, we’re usually not dealing with classified material here. This is why existing implementations like FreshRSS and Miniflux don’t even bother generating a short-lived token. In FreshRSS, they generate a SHA-1 hash based on the system salt, username, and respective password hash, making the token effectively permanent, with no expiration.
If any request cannot be authenticated, return a 401 Unauthorized HTTP response.
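To make the header format concrete, here is a minimal sketch of pulling the auth token out of the Authorization header value. The module and function names are hypothetical and not part of my implementation; looking the token up and answering with 401 on failure is left to the caller.

```elixir
defmodule GReaderAuth do
  # Hypothetical helper: extracts the token from a header value
  # such as "GoogleLogin auth=abc123".
  def extract_token("GoogleLogin auth=" <> rest) do
    case String.trim(rest) do
      "" -> :error
      token -> {:ok, token}
    end
  end

  # Any other scheme (or a missing header passed as nil) is rejected.
  def extract_token(_other), do: :error
end

IO.inspect(GReaderAuth.extract_token("GoogleLogin auth=abc123"))
# → {:ok, "abc123"}
IO.inspect(GReaderAuth.extract_token("Bearer abc123"))
# → :error
```

An `:error` result is the case where the controller would respond with 401 Unauthorized.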
Streams
A stream describes a “category” of feed items to be returned. When you request data for a specific stream, you get back a list of feed items. These are the existing streams:
- user/-/state/com.google/starred: Returns feed items that are marked with a star.
- user/-/state/com.google/read: One would expect this to return a list of read (not unread) items, but most clients rely on this to return everything, both read and unread items.
- user/-/state/com.google/reading-list: Returns all feed items, regardless of their read/starred state.
- user/-/state/com.google/kept-unread: I expose this, but in my case, it just returns the same as reading-list. I haven’t observed any client requesting it.
Depending on which stream is requested, you may need to adjust your database query to return only the relevant data.
Those strings are not only used as streams, but also as tags for feed items, to indicate they are starred or read, etc. But we’ll tackle that below.
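As a rough illustration of the stream semantics described above, the following sketch filters an in-memory list of items by stream ID. The module name and the item shape (maps with a `starred` flag) are assumptions made for the example; a real service would translate the same decisions into a database query.

```elixir
defmodule StreamFilter do
  # Hypothetical in-memory filter over items shaped like %{starred: boolean}.
  def items_for("user/-/state/com.google/starred", items) do
    Enum.filter(items, fn item -> item.starred end)
  end

  # read, reading-list and kept-unread all return everything here,
  # matching the behavior described in the list above.
  def items_for("user/-/state/com.google/" <> state, items)
      when state in ["read", "reading-list", "kept-unread"] do
    items
  end

  # Unknown streams yield nothing in this sketch.
  def items_for(_unknown, _items), do: []
end

items = [%{id: 1, starred: true}, %{id: 2, starred: false}]
IO.inspect(StreamFilter.items_for("user/-/state/com.google/starred", items))
# → [%{id: 1, starred: true}]
```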
Item IDs
According to the spec, there are multiple ways to represent Feed Item IDs. Clients may use different formats when requesting data, so your implementation must support all of them. When generating API responses, you must use specific representations for different endpoints. Internally, IDs should be stored as integers, as some clients cannot parse UUIDs or MongoDB-style IDs. These are the available representations:
- Long-form hex representation: Looks like tag:google.com,2005:reader/item/000000000000001F. It has a fixed prefix tag:google.com,2005:reader/item/ followed by a hexadecimal representation of your integer ID, left-padded with 0 to a length of 16 characters.
- Short-form hex representation: I didn’t find this documented anywhere, but at least Reeder sometimes sends the 16-digit padded hexadecimal version without the tag:google.com,2005:reader/item/ prefix, like 000000000000001F.
- Short-form decimal representation: The plain decimal value of the internal integer ID.
You’ll need a parser function that can handle any of these ID formats and return the decimal value. Here’s mine:
defp parse_reader_id(id) do
  case id do
    "tag:google.com,2005:reader/item/" <> hex_id ->
      # Long-form ID: fixed prefix followed by zero-padded hex
      parse_integer(hex_id, 16)

    hex_id when byte_size(hex_id) == 16 ->
      # Short-form hex ID (16 chars, zero-padded)
      parse_integer(hex_id, 16)

    raw_id ->
      # Plain decimal ID
      parse_integer(raw_id, 10)
  end
end

# Accept the value only if the whole string is a valid number in the given
# base, so inputs with trailing garbage are rejected instead of truncated.
defp parse_integer(value, base) do
  case Integer.parse(value, base) do
    {int_id, ""} -> {:ok, int_id}
    _ -> :error
  end
end
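The opposite direction, turning an internal integer ID into the hex representations, could look roughly like this. The module and function names are made up for illustration, and the sketch assumes non-negative IDs.

```elixir
defmodule ReaderId do
  @prefix "tag:google.com,2005:reader/item/"

  # Short form: uppercase hex, left-padded with zeros to 16 characters.
  def to_short_hex(id) when is_integer(id) and id >= 0 do
    id |> Integer.to_string(16) |> String.pad_leading(16, "0")
  end

  # Long form: fixed prefix followed by the short form.
  def to_long_form(id), do: @prefix <> to_short_hex(id)
end

IO.puts(ReaderId.to_short_hex(31))
# → 000000000000001F
IO.puts(ReaderId.to_long_form(31))
# → tag:google.com,2005:reader/item/000000000000001F
```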
Feed IDs
Feed IDs always follow the format feed/{id}. I recommend using integer IDs, though it might work with unique string values. However, for best compatibility with existing clients, decimal integers are the safest choice.
Continuation Tokens
A continuation token is essentially a pagination parameter. You can implement this using timestamps or item counts. For example, if you return 20 items per page, you send 20 items and return a continuation token of 20. On the next request, the client sends 20, so you start at an offset of 20, fetch another 20, and return 40 as the next continuation token.
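The offset-based variant can be sketched as follows. The module name, the fixed page size of 20 and the choice to return no token once the list is exhausted are assumptions for the example, not something the spec dictates.

```elixir
defmodule Pagination do
  @page_size 20

  # `continuation` is the raw `c` parameter: nil or a decimal string.
  # Returns {items_for_this_page, next_continuation_or_nil}.
  def page(items, continuation) do
    offset = parse_offset(continuation)
    chunk = Enum.slice(items, offset, @page_size)
    next = offset + length(chunk)

    # Assumption: only hand out a new token while more items remain.
    token = if next < length(items), do: Integer.to_string(next), else: nil
    {chunk, token}
  end

  defp parse_offset(nil), do: 0

  defp parse_offset(value) do
    case Integer.parse(value) do
      {offset, ""} when offset >= 0 -> offset
      _ -> 0
    end
  end
end

items = Enum.to_list(1..45)
{_first_page, token} = Pagination.page(items, nil)
IO.inspect(token)
# → "20"
```

With 45 items, the second call with "20" yields the token "40", and the third call returns the last five items and no token.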
Dates
Dates and times are generally exposed as UNIX timestamps, either in seconds, milliseconds, or microseconds. They should always represent UTC time. Be sure to convert timezones when fetching feeds.
Also, some fields carry them as integers while others carry them as strings. Make sure to get this right, as clients are very picky about it.
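A small sketch of the three timestamp flavors, using only the standard library. The helper names are hypothetical; which one applies to which field follows the example responses further down (integer seconds for published, string milliseconds for crawlTimeMsec, string microseconds for timestampUsec).

```elixir
defmodule ReaderTime do
  # Integer seconds, e.g. for "published".
  def seconds(%DateTime{} = dt), do: DateTime.to_unix(dt, :second)

  # Milliseconds as a string, e.g. for "crawlTimeMsec".
  def millis_string(%DateTime{} = dt) do
    dt |> DateTime.to_unix(:millisecond) |> Integer.to_string()
  end

  # Microseconds as a string, e.g. for "timestampUsec".
  def micros_string(%DateTime{} = dt) do
    dt |> DateTime.to_unix(:microsecond) |> Integer.to_string()
  end
end

dt = DateTime.from_unix!(1_700_000_000)
IO.inspect(ReaderTime.seconds(dt))
# → 1700000000
IO.inspect(ReaderTime.millis_string(dt))
# → "1700000000000"
IO.inspect(ReaderTime.micros_string(dt))
# → "1700000000000000"
```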
Output Format
Most endpoints support an output parameter (either as a query param or in the body) that determines the response format. It can be json or xml, but both of my clients always request json. For now, I haven’t even implemented XML.
Endpoints
Below I’ll outline all the details for the required endpoints, including a short description of the corresponding functionality. There are more endpoints, but as far as I can tell, they are not required: both clients that I tested worked fine without them, and existing server implementations don’t seem to implement them either.
I’ll put placeholder values in {} and, unless otherwise annotated, they are strings.
Authentication
By now, there are multiple ways to authenticate against Google Reader APIs. While some compatible implementations, e.g. Inoreader, seem to use OAuth, this is the basic way, which should work with any Google Reader API-compatible client.
/accounts/ClientLogin
Authenticates the user based on username and password, and returns an auth token.
Apparently you can omit the expires_in field, but I didn’t test that. authtoken is a string and can basically be whatever you need to authenticate and authorize the request.
FreshRSS uses {email}/ followed by a SHA-1 hash of system salt, email and hashed password.
Endpoint
POST /accounts/ClientLogin
Request
Clients send either application/json or application/x-www-form-urlencoded.
Payload:
Parameter | Required | Description |
---|---|---|
Email | yes | Username |
Passwd | yes | Password |
accountType | no | In my case always HOSTED_OR_GOOGLE |
service | no | In my case always reader |
client | no | Name of the client, e.g. Reeder |
output | no | Output format: json or xml |
In reality, anything but Email and Passwd can be ignored for authentication purposes.
Response
Expected response encoding: text/plain
Expected response:
SID={authtoken}
LSID=null
Auth={authtoken}
expires_in={expiry in seconds, e.g. 604800}
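As an illustration, a FreshRSS-style token and the plain-text response body could be produced like this. The module and function names are hypothetical, and the exact order in which salt, email and password hash are concatenated before hashing is an assumption, not a verified copy of what FreshRSS does.

```elixir
defmodule ClientLogin do
  # Hypothetical token scheme: "{email}/" plus a SHA-1 hex digest.
  # The concatenation order of the hash input is an assumption.
  def auth_token(email, password_hash, salt) do
    digest =
      :crypto.hash(:sha, salt <> email <> password_hash)
      |> Base.encode16(case: :lower)

    email <> "/" <> digest
  end

  # Builds the text/plain body in the shape shown above.
  def response_body(token, expires_in \\ 604_800) do
    Enum.join(
      ["SID=#{token}", "LSID=null", "Auth=#{token}", "expires_in=#{expires_in}"],
      "\n"
    )
  end
end

token = ClientLogin.auth_token("user@example.com", "hashed-password", "system-salt")
IO.puts(ClientLogin.response_body(token))
```

Because the token is derived from the stored password hash, it stays stable across logins and changes when the password does.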
/reader/api/0/token
Returns the short-lived session token for the auth token.
Endpoint
GET /reader/api/0/token
Request
Nothing except what’s defined in concepts (auth token).
Response
Expected response encoding: text/plain
Example response:
16eec0206a01dc0cc6f7a362d907bfd2a0b731a1ZZZZZZZZZZZZZZZZ
The token should be padded if necessary to always be exactly 57 characters long. For more details on what this actually is, see Concepts/Authentication above.
The response body contains the session token, but the request itself still carries the Authorization header, so you can authenticate and authorize it as usual.
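The padding itself is a one-liner. This sketch assumes Z as the padding character, as in the example response above; the module name is made up.

```elixir
defmodule SessionToken do
  @token_length 57

  # Right-pads the token with "Z" up to 57 characters.
  # Tokens that are already 57 characters or longer are returned unchanged.
  def pad(token), do: String.pad_trailing(token, @token_length, "Z")
end

padded = SessionToken.pad("16eec0206a01dc0cc6f7a362d907bfd2a0b731a1")
IO.inspect(String.length(padded))
# → 57
```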
/reader/api/0/user-info
Returns user info based on the auth token. I have not seen any of that used anywhere, but some clients request it.
Endpoint
GET /reader/api/0/user-info
Request
Query params:
Parameter | Required | Description |
---|---|---|
output | no | Output format: json or xml |
Response
Expected response encoding: As per output; I always use application/json.
Example response:
{
"userId": "1",
"userName": "demo",
"userProfileId": "1",
"userEmail": "user@example.com"
}
Subscriptions
Subscription endpoints return a list of subscriptions and tags that are set up in the system. My implementation does not support tagging, so I’m just returning the standard tags required by Google Reader API clients.
/reader/api/0/subscription/list
Returns a list of subscriptions.
Endpoint
GET /reader/api/0/subscription/list
Request
Query params:
Parameter | Required | Description |
---|---|---|
output | no | Output format: json or xml |
Response
Expected response encoding: As per the output query param; I always return application/json.
Example response:
{
"subscriptions": [
{
"id": "feed/1",
"title": "Test feed",
"categories": [], // String IDs of categories but can be an empty array
"url": "RSS URL",
"htmlUrl": "Website URL",
"iconUrl": "Favicon URL"
}
]
}
/reader/api/0/tag/list
Returns a list of tags. If your system does not support custom tags, at least return the Google Reader system ones as seen in the example response below.
I don’t implement tagging, but those tags (corresponding to the streams) are the ones that Google Reader API clients expect by default, so you can just return them as I do here.
Endpoint
GET /reader/api/0/tag/list
Request
Nothing special.
Response
Expected response encoding: application/json
Example response:
{
"tags": [
{"id": "user/-/state/com.google/starred"},
{"id": "user/-/state/com.google/read"},
{"id": "user/-/state/com.google/reading-list"},
{"id": "user/-/state/com.google/kept-unread"}
]
}
Content retrieval
Now this is where things get tricky: There are multiple ways to retrieve actual Feed Items from the API, and you can bet my two RSS clients (Reeder & NewsFlash) use different mechanisms.
NewsFlash uses the /reader/api/0/stream/contents/:streamId endpoint to fetch all data for a given stream directly, while Reeder first fetches a list of IDs for the given stream via /reader/api/0/stream/items/ids and then sends batch calls to /reader/api/0/stream/items/contents to load the actual contents for each ID.
/reader/api/0/stream/contents/:streamId & /reader/api/0/stream/items/contents
Returns the contents of a given stream, i.e. the actual Feed Items. There are multiple endpoints and request formats for compatibility with different clients.
Endpoint
GET /reader/api/0/stream/contents/:streamId and POST /reader/api/0/stream/contents/:streamId (it seems different clients use different methods to fetch it), as well as GET /reader/api/0/stream/items/contents
URL parameters:
- streamId: The ID of the stream as seen above, e.g. user/-/state/com.google/starred, but without URL encoding, so it’s just added as part of the URL (make sure your URL parser can handle that)
Request
Possible request params (either as query params or application/x-www-form-urlencoded in the POST body):
Parameter | Required | Description |
---|---|---|
i | no | One or more individual Feed Item IDs (in any of the three before-mentioned formats). Note that this is not an array or anything. Because of the urlencoded nature, if it’s one item that’s requested, it looks like i=000000000000001F , if there are multiple, you just get the same query param multiple times, like i=000000000000001D&i=000000000000001E&i=000000000000001F . My body parser could not handle that, so I had to get the raw request and split it manually |
xt | no | “Exclude target”, so a Stream ID, or tag, that a Feed Item must not have to be returned |
n | no | Amount of items to request |
ot | no | “Older than”, UNIX timestamp to get increasingly older items |
c | no | “Continuation token”, so starting from item x for pagination. See above for more info. |
output | no | Output format, could be “xml” or “json”. Defaults to json |
Feed items should be returned by publishing date descending, so starting from the newest one, for the query param ot to make sense.
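For the repeated i parameter, one way to avoid losing values is to decode the raw query string or body yourself. The sketch below relies on URI.query_decoder/1 from the standard library, which yields every key/value pair including duplicates; the module name is hypothetical, and reading the raw body from the connection is not shown.

```elixir
defmodule RepeatedParams do
  # Collects every value of `key` from a raw urlencoded string, in order.
  # Pairs with a different key are skipped by the pinned pattern.
  def values(raw, key) do
    for {^key, value} <- URI.query_decoder(raw), do: value
  end
end

raw = "i=000000000000001D&i=000000000000001E&i=000000000000001F&output=json"
IO.inspect(RepeatedParams.values(raw, "i"))
# → ["000000000000001D", "000000000000001E", "000000000000001F"]
```

Each collected value can then be passed through the ID parser from the Item IDs section.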
Response
Expected response format: Based on the output parameter. I always return application/json.
Example response:
{
"id": "user/-/state/com.google/reading-list", // requested stream
"updated": 12312412, // UNIX timestamp of now as an int
"items": [
{
"id": "tag:google.com,2005:reader/item/000000000000001F", // long form ID
"title": "Example post",
"published": 123123123, // UNIX timestamp as int
"crawlTimeMsec": "123123123", // UNIX timestamp crawl time in milliseconds as string
"timestampUsec": "123123123000", // UNIX timestamp crawl time in microseconds as string
"alternate": [{
"href": "https://example.com/post" // url to the post
}],
"canonical": [{
"href": "https://example.com/post" // same as alternate
}],
"summary": {
"content": "" // excerpt or post body
},
"categories": [ // categories, but at least the standard tags that apply for the given post
"user/-/state/com.google/reading-list",
"user/-/state/com.google/read",
"user/-/state/com.google/starred"
],
"origin": {
"streamId": "feed/3", // ID of the Feed
"title": "sample feed",
"htmlUrl": "https://example.com"
}
}
]
}
Please make sure to have the given timestamps as int or string respectively, as pointed out in the example response.
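Putting the pieces together, a single item from the response above could be assembled roughly like this before JSON encoding. The module name and the internal `item` and `feed` map shapes are assumptions for the example, and the JSON encoding step is left out.

```elixir
defmodule ItemView do
  @state "user/-/state/com.google/"

  # `item` and `feed` are hypothetical internal maps; see the usage below.
  def render(item, feed) do
    %{
      "id" => "tag:google.com,2005:reader/item/" <> hex16(item.id),
      "title" => item.title,
      # Seconds as an integer
      "published" => item.published_unix,
      # Milliseconds and microseconds as strings
      "crawlTimeMsec" => Integer.to_string(item.crawled_unix * 1_000),
      "timestampUsec" => Integer.to_string(item.crawled_unix * 1_000_000),
      "alternate" => [%{"href" => item.url}],
      "canonical" => [%{"href" => item.url}],
      "summary" => %{"content" => item.content},
      "categories" => categories(item),
      "origin" => %{
        "streamId" => "feed/#{feed.id}",
        "title" => feed.title,
        "htmlUrl" => feed.html_url
      }
    }
  end

  # Every item is in the reading list; read and starred are added as flags apply.
  defp categories(item) do
    [@state <> "reading-list"] ++
      if(item.read, do: [@state <> "read"], else: []) ++
      if(item.starred, do: [@state <> "starred"], else: [])
  end

  defp hex16(id), do: id |> Integer.to_string(16) |> String.pad_leading(16, "0")
end

item = %{
  id: 31, title: "Example post", url: "https://example.com/post", content: "",
  published_unix: 1_700_000_000, crawled_unix: 1_700_000_100,
  read: false, starred: true
}
feed = %{id: 3, title: "sample feed", html_url: "https://example.com"}
IO.inspect(ItemView.render(item, feed)["categories"])
# → ["user/-/state/com.google/reading-list", "user/-/state/com.google/starred"]
```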
/reader/api/0/stream/items/ids
Returns a list of item IDs for the given stream. Same logic as above as far as data fetching goes, but we only return the IDs as pure decimal values.
Endpoint
GET /reader/api/0/stream/items/ids
Request
Possible request params:
Parameter | Required | Description |
---|---|---|
s | yes | The ID of the stream as seen above, e.g. user/-/state/com.google/starred |
n | no | Number of items to request, up to 10000 |
ot | no | “Older than”, UNIX timestamp to get increasingly older items |
xt | no | “Exclude target”, so a Stream ID, or tag, that a Feed Item must not have to be returned |
it | no | “Include target”, inverse of xt |
r | no | If set to o, reverse the sort order (oldest first) |
output | no | Output format, could be “xml” or “json”. Defaults to json |
Response
Expected response encoding: According to output; for now I always use application/json.
Example response:
{
"itemRefs": [
{ "id": "1" },
{ "id": "2" },
{ "id": "3" }
]
}
/reader/api/0/unread-count
Returns the unread count for each of the feeds individually, including the timestamp of the newest item. If all=1, then you might have Feeds with a count=0 in your list.
Endpoint
GET /reader/api/0/unread-count
Request
Possible query params:
Parameter | Required | Description |
---|---|---|
all | no | Include feeds with no unread items if 1 is provided as a value |
output | no | Output format, could be “xml” or “json”. Defaults to json |
Response
Expected response encoding: According to output; I always use application/json.
Example response:
{
"max": 5, // integer representing the sum of all "count"s
"unreadcounts": [ // one item for each Feed
{
"id": "feed/1",
"count": 5, // int representing count of unread items
"newestItemTimestampUsec": "1231231230000" // UNIX timestamp in microseconds of the newest item as string or "0"
}
]
}
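A sketch of building that structure from per-feed numbers. The module name and the `feeds` input shape (maps with `id`, `unread` and `newest_usec`) are assumptions, and "max" follows the interpretation used in the example above, i.e. the sum of all counts.

```elixir
defmodule UnreadCount do
  # `include_empty?` corresponds to all=1 in the request.
  def render(feeds, include_empty?) do
    counts =
      for feed <- feeds, include_empty? or feed.unread > 0 do
        %{
          "id" => "feed/#{feed.id}",
          "count" => feed.unread,
          # Microseconds as a string, "0" when the feed has no items
          "newestItemTimestampUsec" => Integer.to_string(feed.newest_usec)
        }
      end

    %{"max" => feeds |> Enum.map(& &1.unread) |> Enum.sum(), "unreadcounts" => counts}
  end
end

feeds = [
  %{id: 1, unread: 5, newest_usec: 1_231_231_230_000},
  %{id: 2, unread: 0, newest_usec: 0}
]
IO.inspect(UnreadCount.render(feeds, false)["max"])
# → 5
```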
State manipulation
Changing read and starred states is generally done through the edit-tag endpoint, which supports single and batch items through the i parameter (see above). Additionally, there’s a mark-all-as-read endpoint.
/reader/api/0/edit-tag
Edits a tag given in the request, so marking as read/unread or starred/unstarred.
Endpoint
POST /reader/api/0/edit-tag
Request
Request encoding: application/x-www-form-urlencoded
Request params:
Parameter | Required | Description |
---|---|---|
i | no | One or more individual Feed Item IDs (in any of the three before-mentioned formats). Note that this is not an array or anything. Because of the urlencoded nature, if it’s one item that’s requested, it looks like i=000000000000001F , if there are multiple, you just get the same query param multiple times, like i=000000000000001D&i=000000000000001E&i=000000000000001F . My body parser could not handle that, so I had to get the raw request and split it manually |
a | no | Tag ID to add (url-encoded) |
r | no | Tag ID to remove (url-encoded) |
Possible Tag IDs:
user/-/state/com.google/read
user/-/state/com.google/starred
Response
Response content type: text/plain (Reeder needs that)
Response:
OK
This endpoint must return a plain-text “OK” for some clients to work, while others don’t care apparently.
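One way to think about edit-tag is as a translation from the a and r parameters into attribute changes on the affected items. The sketch below does only that translation; the module name and the `:read`/`:starred` keys are made up, and applying the changes to the items listed in i is left to the storage layer.

```elixir
defmodule EditTag do
  @read "user/-/state/com.google/read"
  @starred "user/-/state/com.google/starred"

  # Returns the attribute changes implied by the already URL-decoded
  # `a` (add) and `r` (remove) tag IDs; either may be nil.
  def changes(add, remove) do
    %{}
    |> put_change(add, true)
    |> put_change(remove, false)
  end

  defp put_change(acc, @read, value), do: Map.put(acc, :read, value)
  defp put_change(acc, @starred, value), do: Map.put(acc, :starred, value)
  # Unknown or missing tags change nothing in this sketch.
  defp put_change(acc, _other, _value), do: acc
end

IO.inspect(EditTag.changes("user/-/state/com.google/read", nil))
# → %{read: true}
IO.inspect(EditTag.changes(nil, "user/-/state/com.google/starred"))
# → %{starred: false}
```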
/reader/api/0/mark-all-as-read
Marks all items in a stream as read, starting from the given timestamp.
Endpoint
POST /reader/api/0/mark-all-as-read
Request
Request encoding: application/x-www-form-urlencoded
Request params:
Parameter | Required | Description |
---|---|---|
s | yes | Stream ID to update |
ts | yes | UNIX timestamp to start from |
Response
Response content type: text/plain (Reeder needs that)
Response:
OK
This endpoint must return a plain-text “OK” for some clients to work, while others don’t care apparently.