REST services and multiple representations of the same object with different fields

Let's say you have a Person object with several relatively small fields, such as first_name, last_name and age, and several large fields, such as life_story.

Most calls to retrieve Person objects do not require returning the life_story, so we would rather not return it on all calls to the Person endpoint. On the other hand, when POSTing a new Person, we would like to allow the client to include the life_story field.

One option would be to have a Person endpoint and a PersonDetailed endpoint, where all calls (GET/POST/PUT) to the Person do not handle the life_story field, and all calls to the PersonDetailed require all fields.

Finally, we could fudge it and let the POST and PUT methods on Person optionally accept the life_story, but not return it from GET calls to endpoints like

API/Person/?last_name_like=La

I'm not a fan of having GET, POST and PUT methods on the same endpoint return objects with different fields, but it does keep the API simpler.

I've been looking for examples of how people deal with issues like this, but have not found any. Can anyone point to an article or book that discusses issues like this?


As requested by @jaco0646

TL;DR

  • core user resource with embedded sub-resources like address, groups, posts or pm. (/api/v1/users/{user_uuid})
  • users will also contain an embedded resource called views which handles the currently registered views (/api/v1/users/{user_uuid}/views/{some_view})
  • A view is created using a POST request (e.g. from an HTML form) including the selected sub-resources
  • Each view contains the core user data and the data for the selected fields
  • If all views start with the core user data, a partial GET request may be used to download only the required data, though this may have its limits

Issues with current answers

Before I post my approach to filtering on certain properties, I want to give a quick insight into why I do not agree with the currently given answers by @jaco0646, @Yoram and @JoseMartinez (which are all rather the same IMO).

Caching of response content

HTTP tries to reduce network overhead by caching responses. A second lookup for the same resource should, in the best case, be served from the local cache instead of querying the server and downloading the result again. This is especially helpful if the resource data does not change often.

With certain Cache-Control headers and the If-Modified-Since request header, a client can influence whether cached content is used or whether the cache is refreshed by loading the current content and caching that response instead. However, GET requests with query parameters are often excluded from caching, which increases the overall network load. Replacing query parameters with matrix parameters may be a quick fix for this issue, but it has a rather bad smell.
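As a rough illustration of the conditional GET mentioned above, here is a minimal Python sketch of the decision a server might make when it sees an If-Modified-Since header. The function name conditional_get and the max-age value are assumptions made for the example, not part of any framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, body, if_modified_since=None):
    """Return (status, headers, body) for a GET, honouring If-Modified-Since.

    last_modified must be a timezone-aware datetime in UTC.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Cache-Control": "max-age=3600",  # assumed value, for illustration only
    }
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparsable date is simply ignored
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                # The cached copy is still current: no body is sent.
                return 304, headers, b""
    return 200, headers, body
```

A client that stored the Last-Modified value from a first 200 response can send it back as If-Modified-Since and, as long as the resource is unchanged, receives an empty 304 instead of the full body.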

Partial GET request and use-cases

As mentioned by jaco in his post, HTTP defines, besides the standard GET and the conditional GET request, a partial GET request which allows a client to request only a part of a resource instead of the full resource.

While this may sound great to start with, a partial GET request has, at least in HTTP/1.1, the limitation that it only works on bytes.

The only range unit defined by HTTP/1.1 is "bytes".

The Range header may list multiple byte ranges, so that the response includes multiple segments:

GET /someResource HTTP/1.1
Host: some-host.com
Range: bytes=500-700,1200-

The partial request asks to download only bytes 500 through 700 (inclusive) and everything from byte 1200 to the end.

Usually a partial GET request is used to resume a broken download or to buffer a running stream, as the bytes already downloaded are known exactly. But how do you specify the byte ranges of each filter field in advance? Without a-priori knowledge I don't think this will work.
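To make the byte-only limitation concrete, here is a minimal Python sketch of how a server might resolve a Range header against a stored representation. The function slice_ranges and the sample document are illustrative assumptions, and validation (for example a 416 response for unsatisfiable ranges) is left out.

```python
from typing import List


def slice_ranges(content: bytes, range_header: str) -> List[bytes]:
    """Resolve a 'bytes=...' Range header against content (simplified sketch)."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("HTTP/1.1 only defines the 'bytes' range unit")
    parts = []
    for item in spec.split(","):
        first, _, last = item.strip().partition("-")
        if first == "":
            # Suffix range "-N": the last N bytes.
            count = int(last)
            parts.append(content[-count:] if count else b"")
        elif last == "":
            # Open-ended range "N-": from byte N to the end.
            parts.append(content[int(first):])
        else:
            # Closed range "N-M": both ends are inclusive.
            parts.append(content[int(first):int(last) + 1])
    return parts


document = b'{"firstName": "Maria", "lastName": "Sample"}'
# A client that wants only lastName has to know these offsets up front.
start = document.index(b'"lastName"')
end = len(document) - 2  # last byte before the closing brace
field = slice_ranges(document, f"bytes={start}-{end}")[0]
```

The offsets can only be computed here because the whole document is already at hand, which is exactly the a-priori knowledge a real client lacks.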

URL size limitation

If many fields are available for filtering, using a GET request with query or matrix parameters may cause issues in certain browsers, as some of them limit URLs to roughly 2000 characters.

While this may not affect the OP's issue, another user who requires exhaustive filtering properties may run into it.

Resources and sub-resources

ReST focuses on resources and the methods the HTTP protocol offers to interact with them.

A user resource, for example, has certain "core" data like the user name, an id and maybe other domain-specific things. But it also has additional data like the address, ... which may be part of the user resource as well.

Instead of mingling every property into a single entity, ReSTful applications try to have plenty of resources. In the sample above, user and address are just two of them, and there are surely many more. When you start working on a ReSTful design, it might not be clear whether certain data should be part of a resource or refactored into its own resource. A rule of thumb here: if you need certain data in at least two different resources, refactor it into its own resource and embed it within those resources.

Dividing larger resources into a hierarchy makes it easy to update sub-resources when they change, like an address change of a user (update in the pure HTTP sense of replacing what is currently available at resource X with the new content). Having one big resource that handles all data requires sending the whole entity body (if used properly) to the server instead of only the change.

Entity formats

Plenty of "ReSTful" services exchange data in application/xml or application/json format. However, neither conveys much semantics. They just lay out the syntax rules, which might be validated on the client side, but they do not give any hint about the actual content. Therefore a client also needs a-priori knowledge of how to process data received in one of these formats.

If JSON is the representation format of your choice, I'd use JSON HAL (application/hal+json) instead, as it defines core data, links and embedded content, which is quite useful for the presented scenario IMO.

Proposed solution

The proposed approach has a core user resource which embeds certain sub-resources like address, groups, posts or pm. It will also contain an embedded resource called views which handles the currently registered views, either for a single user or for users in general. A view is created by sending a POST request (e.g. from an HTML form) that names the selected sub-resources to include within the response.

The core resource is a user resource, which might be available at /api/v1/users/{user_uuid} and by default only includes the user core data and links to the other resources:

{
    "firstName": "Maria",
    "lastName": "Sample",
    ...
    "_links": {
        "self": {
            "href": "/api/users/1234-5678-9123-4567"
        },
        "addresses": [
            { "href": "/api/users/1234-5678-9123-4567/addresses/abc1" }
        ],
        "groups": [
            { "href": "/api/users/1234-5678-9123-4567/groups" }
        ],
        "posts": [
            { "href": "/api/users/1234-5678-9123-4567/posts" }
        ],
        ...
        "views": [
            { "href": "/api/users/1234-5678-9123-4567/views/view-a" },
            { "href": "/api/users/1234-5678-9123-4567/views/view-b" }
        ]
    }
}

Any sub-resource is available via the user resource URI: /api/v1/users/1234-5678-9123-4567/{sub_resource}, where sub_resource may be one of the following: addresses, groups, posts, ...

The actual sub-resource for an address may, for example, look like this

{
    "street": "Sample Street",
    "city": "Some City",
    "zipCode": "12345",
    "country": "Neverland",
    ...
    "_links": {
        "self": {
            "href": "/api/v1/users/1234-5678-9123-4567/addresses/abc1"
        },
        "googleMaps": {
            "href": "http://maps.google.com/?ll=39.774769,-74.86084"
        }
    }
}

while the user has two posts like these

{
    "id": 1,
    "date": "2016-02-21T14:06:20.345Z",
    "text": "Lorem ipsum ...",
    "_links": {
        "self": {
            "href": "/api/users/1234-5678-9123-4567/posts/1"
        }
    }
}

{
    "id": 2,
    "date": "2016-02-21T14:34:50.891Z",
    "text": "Lorem ipsum ...",
    "_links": {
        "self": {
            "href": "/api/users/1234-5678-9123-4567/posts/2"
        }
    }
}

A view (/api/users/1234-5678-9123-4567/views/view-a) which contains addresses and posts may look like this:

{
    "firstName": "Maria",
    "lastName": "Sample",
    ...
    "_links": {
        "self": {
            "href": "/api/users/1234-5678-9123-4567"
        },
        "addresses": [
            { "href": "/api/users/1234-5678-9123-4567/addresses/abc1" }
        ],
        "groups": [
            { "href": "/api/users/1234-5678-9123-4567/groups" }
        ],
        "posts": [
            { "href": "/api/users/1234-5678-9123-4567/posts" }
        ],
        ...
        "views": [
            { "href": "/api/users/1234-5678-9123-4567/views/view-a" },
            { "href": "/api/users/1234-5678-9123-4567/views/view-b" }
        ]
    },
    "_embedded": {
        "addresses": [
            {
                "street": "Sample Street",
                "city": "Some City",
                "zipCode": "12345",
                "country": "Neverland",
                ...
                "_links": {
                    "self": {
                        "href": "/api/v1/users/1234-5678-9123-4567/addresses/abc1"
                    },
                    "googleMaps": {
                        "href": "http://maps.google.com/?ll=39.774769,-74.86084"
                    }
                }
            }
        ],
        "posts": [
            {
                "id": 1,
                "date": "2016-02-21T14:06:20.345Z",
                "text": "Lorem ipsum ...",
                "_links": {
                    "self": {
                        "href": "/api/users/1234-5678-9123-4567/posts/1"
                    }
                }
            },
            {
                "id": 2,
                "date": "2016-02-21T14:34:50.891Z",
                "text": "Lorem ipsum ...",
                "_links": {
                    "self": {
                        "href": "/api/users/1234-5678-9123-4567/posts/2"
                    }
                }
            }
        ]
    }
}
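How a server might assemble such a view from the core data and the selected sub-resources can be sketched in Python as follows. The function build_view and the shortened sample data are assumptions made for illustration; only the _links and _embedded keys come from HAL.

```python
import json


def build_view(core, links, sub_resources, selected):
    """Combine core user data, links and the selected sub-resources.

    core, links and sub_resources are plain dicts; selected is the list of
    sub-resource names registered for the view (hypothetical structure).
    """
    view = dict(core)
    view["_links"] = links
    embedded = {
        name: sub_resources[name] for name in selected if name in sub_resources
    }
    if embedded:
        view["_embedded"] = embedded
    return view


base = "/api/v1/users/1234-5678-9123-4567"
core = {"firstName": "Maria", "lastName": "Sample"}
links = {"self": {"href": base}, "posts": [{"href": base + "/posts"}]}
sub_resources = {
    "addresses": [{"street": "Sample Street", "city": "Some City"}],
    "posts": [{"id": 1, "text": "Lorem ipsum ..."}],
}

view_a = build_view(core, links, sub_resources, ["addresses", "posts"])
view_b = build_view(core, links, sub_resources, ["posts"])
print(json.dumps(view_b, indent=4))
```

Both views start with the same core data and links and differ only in what ends up under _embedded.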

Another view (e.g. /api/users/1234-5678-9123-4567/views/view-b) may include only the posts made by the selected user:

{
    "firstName": "Maria",
    "lastName": "Sample",
    ...
    "_links": {
        "self": {
            "href": "/api/users/1234-5678-9123-4567"
        },
        "addresses": [
            { "href": "/api/users/1234-5678-9123-4567/addresses/abc1" }
        ],
        "groups": [
            { "href": "/api/users/1234-5678-9123-4567/groups" }
        ],
        "posts": [
            { "href": "/api/users/1234-5678-9123-4567/posts" }
        ],
        ...
        "views": [
            { "href": "/api/users/1234-5678-9123-4567/views/view-a" },
            { "href": "/api/users/1234-5678-9123-4567/views/view-b" }
        ]
    },
    "_embedded": {
        "posts": [
            {
                "id": 1,
                "date": "2016-02-21T14:06:20.345Z",
                "text": "Lorem ipsum ...",
                "_links": {
                    "self": {
                        "href": "/api/users/1234-5678-9123-4567/posts/1"
                    }
                }
            },
            {
                "id": 2,
                "date": "2016-02-21T14:34:50.891Z",
                "text": "Lorem ipsum ...",
                "_links": {
                    "self": {
                        "href": "/api/users/1234-5678-9123-4567/posts/2"
                    }
                }
            }
        ]
    }
}

On invoking /api/users/1234-5678-9123-4567/views you may show a list of currently available views and also an HTML form (or some custom UI) with checkboxes for each available field you want to include or exclude. When the form data is sent to the server, it checks whether a view already exists for the given properties (if so, 409 Conflict) and otherwise creates a new view which might be reused later. You might also name the views and include certain selected properties within the views segment of the _links section.

Instead of specifying a view per user, you can also create a general view once for all users and reuse it at will.

As the views have no query parameters, the whole response is cacheable. As you create a view using a POST request (if idempotency is an issue, use an empty POST request followed by a PUT request), you are good to go for an almost unlimited number of parameters. This HAL-like dialect uses its own logic for views. It might therefore also be a good idea to create your own content type like: application/vnd+users.views+hal+json
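The view-creation step could be sketched in Python like this: an in-memory registry that answers a POST with 201 Created for a new field selection and 409 Conflict when the same selection is already registered. The class ViewRegistry, the 400 response for an empty selection, and the URI layout are assumptions made for the example.

```python
class ViewRegistry:
    """In-memory stand-in for the views sub-resource of one user."""

    def __init__(self, user_id):
        self._base = f"/api/v1/users/{user_id}/views"
        self._views = {}  # frozenset of selected fields -> view name

    def create(self, name, fields):
        """Handle a POST; return (status, location of the view or None)."""
        key = frozenset(fields)
        if not key:
            return 400, None  # nothing selected, nothing to create
        if key in self._views:
            # Same selection already registered: point to the existing view.
            return 409, f"{self._base}/{self._views[key]}"
        if name in self._views.values():
            # The name is taken by a view with a different selection.
            return 409, f"{self._base}/{name}"
        self._views[key] = name
        return 201, f"{self._base}/{name}"

    def list_views(self):
        """Return the hrefs to show under "views" in the _links section."""
        return [{"href": f"{self._base}/{n}"} for n in self._views.values()]


registry = ViewRegistry("1234-5678-9123-4567")
created = registry.create("view-a", ["addresses", "posts"])
duplicate = registry.create("view-c", ["posts", "addresses"])
```

Keying the registry on the set of fields, not on their order, is a design choice that makes two forms with the same checkboxes ticked map to one cacheable URI.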

Concerning partial GET requests:

As the core user data is the same for every view, it might be possible to take the length of the core data (minus the closing bracket and any whitespace characters after the second-to-last bracket) as the offset and issue a partial GET request to the server. It should respond with only the embedded data (and the final closing bracket), though I'm not sure whether current browsers are actually able to update the current data accordingly, especially if certain bytes of the known content need to be removed, like the final bracket of the core user data.
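The byte arithmetic behind this idea can be tried out with a small Python sketch. It assumes the server serializes every view deterministically with the core data first and _embedded last, which is an assumption of the example and not something HTTP or HAL guarantees; the helper names and sample data are made up as well.

```python
import json

core = {
    "firstName": "Maria",
    "lastName": "Sample",
    "_links": {"self": {"href": "/api/v1/users/1234-5678-9123-4567"}},
}
# The view repeats the core data and appends _embedded as the last key.
view = dict(core, _embedded={"posts": [{"id": 1, "text": "Lorem ipsum ..."}]})

core_bytes = json.dumps(core).encode("utf-8")
view_bytes = json.dumps(view).encode("utf-8")


def range_for_tail(cached_core: bytes) -> str:
    """Range header value that skips the cached core minus its closing brace."""
    return f"bytes={len(cached_core) - 1}-"


def merge(cached_core: bytes, tail: bytes) -> bytes:
    """Drop the core's final brace and append the partial response body."""
    return cached_core[:-1] + tail


header = range_for_tail(core_bytes)
offset = int(header[len("bytes="):-1])
tail = view_bytes[offset:]  # what a 206 Partial Content body would carry
rebuilt = merge(core_bytes, tail)
```

The client has to cut the final brace off its cached copy before appending the tail, which is the step the paragraph above doubts browsers would do on their own.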