Web Activities

by Dima Voytenko, July 14th, 2018

This is a bit of a back-to-basics topic. It’s a common task for a web page to open another page (a popup) that does some action and returns its result back to the caller page. This could be a same-origin or a cross-origin popup. The typical use cases include location picker, contact picker, payment form, social sign-in/SSO, and so on.

Introduction

The Android SDK has a simple and clean API for this: startActivityForResult. For instance, see Place Picker. The Web, however, has no such clear API. Back in the IE days, there was a non-standard showModalDialog API that was poorly designed and implemented, and has since been deprecated in most browsers. So, if you were to try to support such a use case on the web today, you’d have to cover the following issues:

A popup can be opened using window.open(url, target) API.
Once the popup is open, the two pages can communicate via custom window messaging (postMessage): the caller uses the window reference returned by window.open, and the popup uses window.opener. The popup must be opened without noopener to ensure that window.opener is available.
You’d also have to contend with a possible set of XSS attacks via window.opener. Obviously, you’d trust the page you open itself, but it’s possible that the popup site could be compromised, which would expose the caller site to attacks as well.
Mobile browsers unload pages in the background. Thus, the caller page might get unloaded and be unable to receive messages from the popup.
Not all environments support multiple windows and popups. For instance, some Web Views or home-screen web apps, instead of opening a popup with the specified target, redirect the caller page as if target=_top. In this case, window.opener would not be available and there would be no way to send messages to it. Typically, this is solved with redirects, again using a custom protocol that passes data via redirect URLs.
Some browsers allow opening a popup, but do not support window.opener.
Open a couple of pages and a couple of popups from these pages, and the UX quickly becomes overwhelming: which popup belongs to which page? A good implementation would shade the caller page and, when the shaded page is clicked, try to bring focus back to the popup. That’s a lot of work for something very basic and expected.
What if the user closes the caller window?
Data passed via window messaging can be secured with origin checks. Redirects are not secure at all. Obviously SSL covers many data leak vectors, but in the end, the user may still copy and paste the URL and send it via email. Some environments, such as Web Views, can also observe redirect URLs.
In the redirect case it’s also impossible to return to the “previous” history state. This precludes some uses of history.state and related APIs.

These are just some of the important issues a Web developer has to contend with. Different browsers introduce additional issues. All of this is hard to get right: the solutions are often brittle and the resulting UX is poor.
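To make the messaging and origin-control points above concrete, here is a minimal sketch of a caller-side message handler. The function name createResultHandler and the {type, payload} message shape are assumptions made for this example; they are not part of any existing API. The handler is written as a plain function so that it can be exercised with fake events outside a browser.

```javascript
// Sketch only: `createResultHandler` and the {type, payload} message shape
// are hypothetical, chosen for illustration.
function createResultHandler(expectedOrigin, popup, onResult) {
  return function handleMessage(event) {
    // Ignore messages from any origin other than the popup's.
    if (event.origin !== expectedOrigin) {
      return false;
    }
    // Ignore messages from windows other than the popup we opened.
    if (event.source !== popup) {
      return false;
    }
    // Ignore unrelated messages that happen to come from the same window.
    const data = event.data;
    if (!data || data.type !== 'result') {
      return false;
    }
    onResult(data.payload);
    return true;
  };
}

// In a browser this could be wired up roughly as follows (not executed here):
//   const popup = window.open('https://popup.com/pick', '_blank');
//   window.addEventListener(
//       'message',
//       createResultHandler('https://popup.com', popup, console.log));
```

Even this small handler needs three separate checks, and it still does nothing about unloading, redirects, or a popup that the user closes.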

Surely, the Web platform could do better.

Popups vs iframes

Popups will be mentioned a lot in this article, so I’d like to immediately address the question: why not just use iframes in place of popups? With iframes, the parent page has more control over the positioning and sizing of the embedded content, and iframes are free of most of the problems described above. Most notably, there’s no redirect fallback or unloading, which are probably the hardest problems to solve. In fact, iframes are used in similar use cases on the web already. The use of popups has gone down quite a bit: partially because of the issues listed above, and partially because iframes provide a better UX and simpler implementation.

However, for many use cases, especially security-sensitive cases (such as SSO and payments), use of iframes is limited due to:

Click/input-jacking. A very common attack vector where an attacker positions content on top of an iframe in order to mislead the user into activating a sensitive action or entering a sensitive piece of data where they do not expect to. For instance, one attack could layer a “Get for free now” UI on top of an iframe’s “Buy” button, so that the user would purchase something they didn’t intend to. Another attack could position an input control over an iframe’s credit card input, and thus steal the credit card number from the iframe.
Iframe restrictions when working with cookies and local storage. Popup windows, as top-level browsing context, get unpartitioned cookies and storage. Thus a social sign-in form does not need to constantly re-prompt user to enter the password. Iframes, however, are getting more and more restricted in this area in the post-ITP world.

Some proposals in the past considered creating a special subclass of a fullscreen mode for iframes to provide guarantees against click/input-jacking. However, given iframe cookie partitioning this would not be sufficient. On top of that, guaranteeing a non-clickjackable mode for an iframe requires participation of both the client and the server, since the server defines framing restrictions in the form of X-Frame-Options and CSP. The server stack would need strict guarantees from the client before lifting these restrictions, which would be a major backward compatibility problem.

Given these restrictions, we will mainly focus on popups as the solution space.

Other alternatives to popups

The Web Platform could continue down the path of dedicated APIs for the key use cases that call for popups. Just as was done with the Payment Request UI, the Web Platform can define APIs for Social Sign-in, Calendar sharing, and so on. This would be a big improvement for such use cases.

However, the design and implementation of such APIs takes time. The migration requires polyfills. And ultimately the Web Platform can’t support all such use cases. Thus I welcome new Web APIs, but the generic solution for “get a result” popups is still very much needed regardless.

Android Activities API

Before we start looking at different approaches to address “get a result” popups, let’s take a look at how Android APIs arrange such use cases.

Android provides the startActivityForResult API. It comprises three components: startActivityForResult, onActivityResult, and setResult. For instance, the Place Picker API can be used like this:

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
  if (resultCode != RESULT_OK) {
    return;
  }
  if (requestCode == PICK_PLACE_REQUEST) {
    // Process result.
  }
}

private void pickPlace() {
  startActivityForResult(
      new PlacePicker.IntentBuilder().build(this),
      PICK_PLACE_REQUEST);
}

The Place Picker activity itself has to call the setResult method:

private void clickHandler() {
  Intent intent = new Intent();
  intent.putExtra("place", selectedPlaceData);
  setResult(RESULT_OK, intent);
  // Close the activity so that the result is delivered to the caller.
  finish();
}

The fact that this API explicitly exists is a good sign that Android takes these use cases seriously. But there are some important properties to consider:

The API codifies a standard way to return and receive the activity result. There’s no need for a custom messaging protocol, which would be overkill when we simply need to forward a bag of data from the called activity to the caller.
This API naturally still works even if Android decides to unload the caller activity. The onActivityResult handler will be called when the caller activity is restarted.
The presentation of the opened activity window is separate from the mechanics of the data exchange. Android would typically show the new activity full screen, completely obscuring the caller. However, it could also partially obscure the caller, and so on.

Supporting activities on the Web

There are several approaches that the Web Platform could consider to make coding of such “get a result” popups easier and less brittle. There are two major directions here that would support each other: contextual popups and an API for the result exchange.

Contextual popups

A contextual popup would be a specialized version of the window.open API (e.g. window.openInContext, or maybe a new window.open option): a, so to say, “picture-in-picture” mode for popups. Such popups would open in the context of the caller page. This API would be designed to avoid most of the pitfalls listed in the introduction. In particular, there would be no need for the redirect fallback and the caller (context) page would never be unloaded. It would still rely on custom window messaging to communicate results back to the caller.

To illustrate this option, consider the current state of popups on the Web. The UX is invariably poor in desktop and mobile browsers: a separate tab or window is opened. If user switches to another tab it’s pretty hard to find the popup.

One model for this could be Payment Request UI. Both on desktop and on mobile this UX ensures:

It’s obvious that the payment UI is shown in the context of the caller page.
There’s still a clear origin attribution.
The contents of the popup cannot be occluded by the page itself or by another popup, because it’s a modal. So there’s an inherent click/input-jacking protection.
On desktop the user can switch tabs. The popup will stay docked in the tab and will be there when the user comes back. The user does not need to keep track of both windows.

It’d be great if the popup could be opened as a top-level browsing context, but still be structurally and visually presented in the context of the caller page.

One important nuance: such popups cannot open nested popups — there’s no good way to make this a good UX. But this would also give browsers better performance parameters by slowing the proliferation of tabs.

The additional big promise: contextual popups could completely eliminate the redirect fallback, which could in turn also simplify many APIs. As you will see elsewhere in this document, the redirect fallback (either due to single-window browsers, or page unloading) is the major complication for “get a result” pattern.

Final word on this topic: Web Views. Web Views today serve a huge portion of the mobile traffic (by some estimates approaching 50%), and they significantly complicate this pattern. A Web View is by default a single-window browser. Supporting a multi-window mode is a very complicated task involving memory tradeoffs. Ideally, contextual popups would be implemented in modern Web Views out of the box. They don’t have to be enabled by default, but the implementation itself is important to reduce solution space fragmentation between browsers and Web Views.

API for the result exchange

When a popup can actually be opened, window messaging can be used to return a result. The caller and the popup have to agree on a custom messaging protocol to do so. That is overkill for something as simple as returning a single result, and it’s full of security gotchas. But it works.

However, messaging is not always available. Some cases where messaging is not available:

In redirect mode the caller’s context is replaced by the popup. The window.opener field is not available and messaging is not possible. Notice that the redirect mode is sometimes a UX choice, but sometimes forced by the environment: a single-window browser, a popup blocker, etc.
The caller window could be unloaded on a mobile device making messaging impossible.

In the redirect mode, the only way to return data back to the caller is to redirect back to it with the result in the URL, e.g. in the URL fragment. When the result is sensitive, it has to be encrypted to avoid undesired leaks. Supporting the redirect mode correctly is an arduous task and any solution would be very brittle.

A couple of approaches to consider here: provide a safe way to return data in the redirect mode, or adopt a new API that supports both popups and redirects similar to Android Activity API.

We’d like to focus on the Activity API for Web, but first, a few words about redirect mode.

An additional API to return data in the redirect mode

We could provide a special API aimed specifically at the redirect mode to return data back safely and securely. For instance, we could extend Web History API. The popup would push data into the history stack before redirecting back to the caller:

history.pushResult(resultData, "https://caller.com");
window.location.replace("https://caller.com/continue");

The caller will be able to read the data from the history stack as soon as it starts up:

var result = history.popResult("https://popup.com");
if (result) {
  // Result is already available.
  // ...
}

Notice that both push and pop are restricted to specific origins of the caller and the popup to ensure that data is exchanged securely.
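The origin restriction can be illustrated with a small in-memory model. This is only a sketch of the intended semantics: no browser implements history.pushResult or history.popResult, and the ResultStack class is hypothetical. In a real implementation the browser would know the calling page's origin; in this model it is passed explicitly as fromOrigin and readerOrigin.

```javascript
// Hypothetical in-memory model of the proposed history.pushResult /
// history.popResult pair. No browser implements these methods.
class ResultStack {
  constructor() {
    this.entries = [];
  }

  // Called on behalf of the popup: `fromOrigin` is the popup's own origin and
  // `forOrigin` is the only origin allowed to read the data.
  pushResult(fromOrigin, data, forOrigin) {
    this.entries.push({fromOrigin, forOrigin, data});
  }

  // Called on behalf of the caller: `readerOrigin` is the caller's own origin
  // and `expectedOrigin` is the popup origin it accepts data from.
  popResult(readerOrigin, expectedOrigin) {
    const index = this.entries.findIndex(
        (e) => e.forOrigin === readerOrigin &&
            e.fromOrigin === expectedOrigin);
    if (index === -1) {
      return null;
    }
    // Remove the entry so that the result is delivered at most once.
    return this.entries.splice(index, 1)[0].data;
  }
}
```

A result pushed by https://popup.com for https://caller.com is invisible to any other reader, and to a caller that expected a different popup origin.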

Proposal — Android-like Activities API for Web

Adopting the Android-like Activities API on the Web is a more radical solution. But it could solve both the redirect and popup modes very naturally. It could also play well with the contextual popups idea for an improved UX.

API

Following the Android API, we could introduce a similar API on the Web that includes window.openForResult(), window.onResult(), and window.setResult(). The Place Picker caller from above would look like this on the Web:

// Anticipate that the result might arrive at some point, even
// if openForResult has not been called in the instance of this
// page.
window.onResult(
    'pickPlace',
    'https://maps.google.com',
    (response) => {
      if (response.ok) {
        // Process result.
      }
    });

// Call openForResult.
button.onclick = () => {
  window.openForResult(
      'pickPlace',
      'https://maps.google.com/pickplace',
      target);
};

The popup page would return the result like this:

window.setResult(ok, payload);

Let’s look at this API in more detail.

window.openForResult

window.openForResult(
    requestId: string,
    url: string,
    target: string,
    opt_options: string): void

This method is similar to window.open, however there are key differences as well. The arguments are:

requestId — the string ID that will later be available in the window.onResult callback.
url — the popup URL.
target — the window target: this either makes it into a popup or a redirect.
opt_options — additional options, similar to those of window.open.

Unlike window.open, the window.openForResult API needs neither the popup window reference nor window.opener. While these could still be provided, removing them would reduce the surface for XSS exploits.

window.onResult

window.onResult(
    requestId: string,
    resultOrigin: string,
    callback: function(Response)): void

This method is used to register a callback for the requestId that will later be passed to window.openForResult.

It’s very tempting to do away with this method and simply make window.openForResult return a promise. However, it’s not so simple because of redirects and page unloading. If the caller page has been redirected or unloaded, the result would be kept by the browser in some ephemeral storage (such as history stack) and as soon as the callback is registered using window.onResult it would be immediately called with the result. This feature would also come in handy for the redirect polyfill in one of the sections below. On the other hand, the contextual popups, if they ever became a reality, could help simplify this API.

Another nuance: the resultOrigin parameter is generally not necessary, since the origin is clear from the url in the window.openForResult call. However, it’s a good additional protection and would also be helpful for the polyfill.

window.setResult

window.setResult(
    ok: boolean,
    payload: Object): void

This method will be called by the popup once the result is available, and it will close the popup to return back to the caller.

The arguments are:

ok — the completion signal: true indicates success, false indicates cancellation or failure.
payload — the data payload of the action taken in the popup or, in case of error, the reason for the failure.

Polyfill

Polyfilling this API is challenging, but possible.

window.openForResult polyfill

The API call is:

window.openForResult(requestId, url, target, opt_options)

The polyfill will execute the following steps:

1. Set up a window messaging listener.
2. Call popup = window.open(url, target, opt_options). Add the #ACTIVITY={requestId, returnUrl} fragment to the url proactively, in case the browser environment silently opens the window as _top (see step 7).
3. If window.open failed or an invalid popup object is returned, go to step 7.
4. Start keep-alive polling to check for popup.closed. We need it to produce a “canceled” signal in case the popup is closed by the user.
5. When the “result” message arrives, pass the requestId and response{ok, payload} to the result processing code.
6. Send a “result-acknowledge” message back to the popup. The end.
7. If the window.open call failed, redirect the current page to the url with an added fragment #ACTIVITY={requestId, returnUrl}. The caller page execution is aborted, but we expect to return via redirect once the popup page has completed. The end.
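The seven steps above can be sketched as a single function. This is an illustration under stated assumptions, not the actual polyfill: the function name openForResultPolyfill, the {type, ok, payload} message shape, the URI-encoded JSON fragment, and the pollIntervalMs parameter are all hypothetical. The window is passed in as win so that the logic can be exercised with a fake object outside a browser.

```javascript
// Sketch of the seven steps above. `win` is a window-like object. The message
// shapes, the fragment encoding, and `pollIntervalMs` are assumptions.
function openForResultPolyfill(
    win, requestId, url, target, options, onResponse, pollIntervalMs = 500) {
  const returnUrl = win.location.href.split('#')[0];
  // Assumes `url` does not already carry a fragment of its own.
  const popupUrl = url + '#ACTIVITY=' +
      encodeURIComponent(JSON.stringify({requestId, returnUrl}));
  const resultOrigin = new URL(url).origin;
  let timer = null;
  let done = false;

  function finish(response) {
    if (done) {
      return;
    }
    done = true;
    win.clearInterval(timer);
    win.removeEventListener('message', onMessage);
    onResponse(requestId, response);
  }

  function onMessage(event) {
    const data = event.data;
    if (event.origin !== resultOrigin || !data || data.type !== 'result') {
      return;
    }
    // Step 5: pass the response to the result processing code.
    finish({resultOrigin, ok: data.ok, payload: data.payload});
    // Step 6: acknowledge, so that the popup can close itself.
    event.source.postMessage({type: 'result-acknowledge'}, resultOrigin);
  }

  // Step 1: set up the messaging listener.
  win.addEventListener('message', onMessage);

  // Step 2: open the popup with the fragment added proactively.
  let popup = null;
  try {
    popup = win.open(popupUrl, target, options);
  } catch (e) {
    popup = null;
  }

  // Steps 3 and 7: no usable popup, so redirect the current page instead.
  if (!popup) {
    win.removeEventListener('message', onMessage);
    win.location.assign(popupUrl);
    return 'redirect';
  }

  // Step 4: a popup that closes without a result means "canceled".
  timer = win.setInterval(() => {
    if (popup.closed) {
      finish({resultOrigin, ok: false, payload: 'canceled'});
    }
  }, pollIntervalMs);
  return 'popup';
}
```

The returned string is only a convenience for this sketch. A real polyfill would also have to detect the case where the environment silently navigates the caller as _top, which is why the fragment is added before window.open is called.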

If window.open failed and the polyfill fell back to the redirect (step 7), the polyfill executes the following steps once the popup page redirects back to the caller:

Decode #ACTIVITY={…} fragment parameter which will contain structure {requestId, ok, payload}.
Check that document.referrer and the popup url are from the same origin. If they are not, fail.
Pass the requestId and response{resultOrigin, ok, payload} to the result processing code. The end.

After all of these operations, the #ACTIVITY is erased from the fragment.
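The redirect-return steps above could look roughly like this. The helper name readActivityResult is hypothetical, and the sketch assumes that the fragment is URI-encoded JSON and that the caller page still knows the popup URL it originally requested (passed here as expectedUrl).

```javascript
// Hypothetical helper. The fragment is assumed to be URI-encoded JSON that
// follows "#ACTIVITY=".
function readActivityResult(href, referrer, expectedUrl) {
  const marker = '#ACTIVITY=';
  const index = href.indexOf(marker);
  if (index === -1) {
    // This page load is not a return from an activity.
    return null;
  }
  // Decode the {requestId, ok, payload} structure.
  const {requestId, ok, payload} =
      JSON.parse(decodeURIComponent(href.slice(index + marker.length)));
  // The page that redirected here must share the popup's origin.
  const resultOrigin = new URL(expectedUrl).origin;
  if (!referrer || new URL(referrer).origin !== resultOrigin) {
    throw new Error('Result rejected: unexpected referrer origin');
  }
  // Return the response for the result processing code, plus the URL with
  // the #ACTIVITY fragment erased.
  return {
    requestId,
    response: {resultOrigin, ok, payload},
    cleanedHref: href.slice(0, index),
  };
}

// In a browser this could be used roughly as follows (not executed here):
//   const result = readActivityResult(
//       location.href, document.referrer, 'https://popup.com/pick');
//   if (result) history.replaceState(null, '', result.cleanedHref);
```

One caveat for this check: document.referrer can be empty, for example under a no-referrer policy, in which case this sketch rejects the result.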

window.onResult polyfill

The API call is:

window.onResult(requestId, resultOrigin, function(response) {})

The polyfill execution depends on whether the response arrives before or after the window.onResult() is called.

If the response arrives before window.onResult is called, the polyfill would simply store the response for the requestId in memory or in the history stack. Once window.onResult is called and the corresponding response is available, do the following steps:

Check that resultOrigin argument is the same as response.resultOrigin.
Call the callback(response{ok, payload}).
Remove the response from memory.
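The store-then-deliver behavior described above can be sketched with two maps. The createResultRegistry helper is hypothetical, plain in-memory storage stands in for “memory or history stack”, and silently dropping a response whose origin does not match is an assumption of this sketch.

```javascript
// Hypothetical helper: in-memory storage stands in for the history stack.
function createResultRegistry() {
  const pending = new Map();    // requestId -> response that arrived early
  const callbacks = new Map();  // requestId -> {resultOrigin, callback}

  function tryDeliver(requestId) {
    const response = pending.get(requestId);
    const entry = callbacks.get(requestId);
    if (!response || !entry) {
      return;
    }
    // Remove the response from memory so that it is delivered only once.
    pending.delete(requestId);
    // Check that the registered origin matches the response origin.
    // Assumption: a mismatching response is dropped silently.
    if (entry.resultOrigin !== response.resultOrigin) {
      return;
    }
    entry.callback({ok: response.ok, payload: response.payload});
  }

  return {
    // Called by the result processing code when a response arrives.
    deliver(requestId, response) {
      pending.set(requestId, response);
      tryDeliver(requestId);
    },
    // The window.onResult polyfill itself.
    onResult(requestId, resultOrigin, callback) {
      callbacks.set(requestId, {resultOrigin, callback});
      tryDeliver(requestId);
    },
  };
}
```

Because both deliver and onResult call tryDeliver, the order in which the response and the callback registration arrive does not matter.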

window.setResult polyfill

The API call is:

window.setResult(ok, payload);

The polyfill in the popup page considers two modes: popup or redirect. If the window.opener is available, assume the popup mode and execute these steps:

Set up message listener and wait for “result-acknowledge” message.
Send “result” message with {ok, payload} via window.opener.postMessage.
Once “result-acknowledge” message arrives, close self. The end.
If “result-acknowledge” does not arrive within the timeout, assume the caller page has been unloaded and attempt to recover using the redirect mode. Proceed with the redirect steps.

If window.opener is not available or messaging failed, assume redirect mode and execute the following steps:

Parse #ACTIVITY={requestId, returnUrl}.
Check that returnUrl and document.referrer have the same origin. If not, fail.
If window.opener is not available, redirect to the returnUrl with added fragment #ACTIVITY={requestId, ok, payload}. The end.
If window.opener is available (i.e. messaging failed), open the same URL with target _blank, i.e. window.open('returnUrl#ACTIVITY={…}', '_blank'). The end.
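Both step lists above can be combined into one sketch. The function name setResultPolyfill, the message shapes, the URI-encoded JSON fragment, and the ackTimeoutMs parameter are assumptions made for this example. Note one deliberate difference from the steps: the sketch runs the returnUrl versus document.referrer check before either mode, which is slightly stricter. The window is passed in as win so that the logic can be tried with a fake object.

```javascript
// Sketch of both step lists above. `win` is a window-like object; the message
// shapes, the fragment encoding, and `ackTimeoutMs` are assumptions.
function setResultPolyfill(win, ok, payload, ackTimeoutMs = 3000) {
  // Parse #ACTIVITY={requestId, returnUrl} from the popup's own URL.
  const marker = '#ACTIVITY=';
  const href = win.location.href;
  const index = href.indexOf(marker);
  if (index === -1) {
    throw new Error('Missing #ACTIVITY fragment');
  }
  const {requestId, returnUrl} =
      JSON.parse(decodeURIComponent(href.slice(index + marker.length)));

  // Check that returnUrl and document.referrer have the same origin.
  const returnOrigin = new URL(returnUrl).origin;
  const referrer = win.document.referrer;
  if (!referrer || new URL(referrer).origin !== returnOrigin) {
    throw new Error('returnUrl does not match the referrer origin');
  }

  const resultUrl = returnUrl.split('#')[0] + marker +
      encodeURIComponent(JSON.stringify({requestId, ok, payload}));

  // Redirect mode: there is no opener to send a message to.
  if (!win.opener) {
    win.location.replace(resultUrl);
    return;
  }

  // Popup mode: wait for "result-acknowledge", then close self.
  let settled = false;
  const timer = win.setTimeout(() => {
    if (settled) {
      return;
    }
    settled = true;
    // No acknowledgement: assume the caller was unloaded and reopen it.
    win.open(resultUrl, '_blank');
  }, ackTimeoutMs);

  win.addEventListener('message', (event) => {
    const data = event.data;
    if (settled || event.origin !== returnOrigin || !data ||
        data.type !== 'result-acknowledge') {
      return;
    }
    settled = true;
    win.clearTimeout(timer);
    win.close();
  });

  win.opener.postMessage({type: 'result', ok, payload}, returnOrigin);
}
```

The listener is registered before the “result” message is sent, so an acknowledgement that arrives quickly cannot be missed.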

Polyfill, redirects and data sensitivity

Redirects really complicate the mechanics of data exchange. Passing sensitive data as the payload in the redirect URL might be problematic in some use cases. If that’s considered to be a problem, the popup and the caller page must agree not to send data in plain text and instead use some form of encryption. For instance:

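window.onResult(
    'sensitive-request',
    'https://popup.com',
    (response) => {
      if (response.ok) {
        fetch('https://popup.com/decrypt.json', {
          method: 'POST',
          // Send the popup-origin session cookie along with the request.
          credentials: 'include',
          headers: {'Content-Type': 'application/json'},
          body: JSON.stringify({encrypted: response.payload.encrypted}),
        }).then((decryptResponse) => {
          // Process the payload as the actual popup response.
        });
      }
    });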

This way the sensitive data is never passed in plain text in the redirect URLs, and the decryption service can check the origin of the cross-origin request. The https://popup.com/decrypt.json endpoint would decrypt and return the payload, relying on the Origin header of the CORS request to ensure origin-to-origin security. Critically, this request must also be tied to an internal session identifier to prevent session fixation attacks (SFA):

// Assumes an Express-style app with body and cookie parsing enabled.
// decrypt() and getSessionId() are application-specific helpers.
app.post('/decrypt.json', (req, res) => {
  // "decrypted" is a structure:
  // {forOrigin: string, sessionId: string, data: Object}
  const decrypted = decrypt(req.body['encrypted']);

  // The CORS origin must match the intended origin:
  if (decrypted.forOrigin !== req.headers.origin) {
    res.sendStatus(403);
    return;
  }

  // The CORS cookie should correspond to the intended sessionId:
  if (decrypted.sessionId !== getSessionId(req.cookies)) {
    res.sendStatus(403);
    return;
  }

  // All good: send back the data.
  res.send(decrypted.data);
});

Web Activities implementation

While not exactly the same API, the web-activities project on GitHub implements an API with a very similar shape.

Other considerations

Native integration and custom protocols

Currently, window.open could, in theory, be forwarded to a matching native activity on Android. However, there’s no way to return the result back to the caller page. Let’s imagine how this could be made possible.

If the browser implements the proposed API, it would be straightforward to extend it to native activities. But how would the openForResult API allow native invocation? Android already allows intent filters to intercept URLs. In addition, we could extend openForResult to accept a list of alternative URLs, including intent URLs and custom protocols. For instance:

window.openForResult(
    requestId,
    [
      // First try an intent URL.
      'intent://get-place/#Intent;scheme=a;package=com.a;end',
      // Then, try a custom protocol.
      'web+location://get-place',
      // Finally, fallback to web URL.
      'https://maps.google.com/placepicker',
    ]);

Note that in this case window.onResult would also accept an array of origins for the resultOrigin argument.

If the origin is important for sensitive communication, the caller page may require a “strict origin mode”, in which case the browser/Android platform can require origin verification similar to Android’s Digital Asset Links protocol. E.g.:

window.openForResult(requestId, url, {origin: 'strict'});

Native support is not polyfillable at this time. If multiple URLs are specified, the polyfill would simply use the last (fallback) URL in the array.
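The fallback rule is simple enough to show directly. The helper name pickPolyfillUrl is hypothetical; it accepts either a single URL or an array of alternatives, as in the example above.

```javascript
// Hypothetical helper: the polyfill cannot invoke native activities, so it
// always uses the last (web fallback) URL.
function pickPolyfillUrl(urlOrUrls) {
  const urls = Array.isArray(urlOrUrls) ? urlOrUrls : [urlOrUrls];
  if (urls.length === 0) {
    throw new Error('At least one URL is required');
  }
  return urls[urls.length - 1];
}
```

This relies on callers listing the plain web URL last, as the example above does.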

Summary

Implementing a high-quality popup-for-result pattern on the Web today is too complicated. This is exacerbated by the fact that many such cases have critical security and privacy needs. The Web Platform could improve this pattern in the form of contextual popups, and/or by following the existing startActivityForResult protocol from the Android SDK. Either or both could significantly improve development experience, stability, safety, cross-browser support, and UX.