Recently, I worked on a project based on Playwright for Java. One of Playwright's characteristics is its single-threaded nature: a Playwright instance must be used by one thread at a time. For performance's sake, we need to overcome this limit.
To get to the point: in the following demo, we'll expose an HTTP endpoint that screenshots a web page given its URL. We'll build various solutions, benchmarking them when it's useful to do so. All the solutions will use the following interfaces, and they differ only in their Browser implementation.
public interface Browser extends AutoCloseable {
    Screenshot screenshot(URL url);
}

public interface Screenshot {
    byte[] asByteArray();
    String mimeType();
}
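The Png class used by the implementations below is a trivial Screenshot carrier. A minimal sketch of what it might look like (the record form and field name are assumptions, since the article does not show the class):

```java
public class PngSketch {

    // Repeated from the article so this sketch is self-contained.
    interface Screenshot {
        byte[] asByteArray();
        String mimeType();
    }

    // A minimal PNG-backed Screenshot. Sketch only: the article's
    // actual Png class is not shown, so this shape is an assumption.
    record Png(byte[] bytes) implements Screenshot {

        @Override
        public byte[] asByteArray() {
            return this.bytes;
        }

        @Override
        public String mimeType() {
            return "image/png";
        }
    }

    public static void main(String[] args) {
        Screenshot screenshot = new Png(new byte[] {(byte) 0x89, 'P', 'N', 'G'});
        System.out.println(screenshot.mimeType()); // image/png
    }
}
```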
The first, unsafe (i.e., not thread-safe) solution could be the following one.
public final class UnsafePlaywright implements Browser {

    private final Playwright playwright;
    private final Page page;

    public UnsafePlaywright() {
        this(Playwright::chromium);
    }

    public UnsafePlaywright(final Function<Playwright, BrowserType> browserTypeFn) {
        this(
            browserTypeFn,
            Playwright.create()
        );
    }

    // this constructor pre-instantiates the Playwright objects in order to avoid delay
    UnsafePlaywright(final Function<Playwright, BrowserType> browserTypeFn, final Playwright playwright) {
        this(
            playwright,
            browserTypeFn.apply(playwright).launch().newContext().newPage()
        );
    }

    UnsafePlaywright(
        final Playwright playwright,
        final Page page
    ) {
        this.playwright = playwright;
        this.page = page;
    }

    @Override
    public Screenshot screenshot(final URL url) {
        this.page.navigate(url.toString());
        return new Png(this.page.screenshot());
    }

    @Override
    public void close() throws Exception {
        this.playwright.close();
    }
}
By itself, this solution is only usable in a single-threaded context. So, the next step is to build a thread-safe one.
The following solution implements the Browser interface while respecting the single-threaded nature of Playwright. It composes the UnsafePlaywright with the general-purpose LockBasedBrowser.
public final class LockBasedPlaywright implements Browser {

    private final Browser origin;

    public LockBasedPlaywright() {
        this(Playwright::chromium);
    }

    public LockBasedPlaywright(final Function<Playwright, BrowserType> browserTypeFn) {
        this(
            new LockBasedBrowser(
                new UnsafePlaywright(browserTypeFn)
            )
        );
    }

    LockBasedPlaywright(final Browser origin) {
        this.origin = origin;
    }

    @Override
    public Screenshot screenshot(final URL url) {
        return this.origin.screenshot(url);
    }

    @Override
    public void close() throws Exception {
        this.origin.close();
    }
}
final class LockBasedBrowser implements Browser {

    private final Browser origin;
    private final Lock lock;

    LockBasedBrowser(final Browser origin) {
        this(
            origin,
            new ReentrantLock()
        );
    }

    LockBasedBrowser(final Browser origin, final Lock lock) {
        this.origin = origin;
        this.lock = lock;
    }

    @Override
    public Screenshot screenshot(final URL url) {
        this.lock.lock();
        try {
            return this.origin.screenshot(url);
        } finally {
            this.lock.unlock();
        }
    }

    @Override
    public void close() throws Exception {
        this.origin.close();
    }
}
We can use LockBasedBrowser to implement another lock-based Browser implementation on top of any unsafe one. For example, Selenium (a library similar to Playwright) is also not thread-safe by default.
Furthermore, the pattern used to compose objects is the alias pattern.
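As a sketch of that reuse, here is the same locking decorator applied to a hypothetical unsafe browser. The stub is an illustration, not a real Selenium binding: it records whether two threads ever overlap inside screenshot(), which the lock prevents.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockDecoratorSketch {

    // Simplified interface for the sketch (String instead of Screenshot).
    interface Browser {
        String screenshot(String url);
    }

    // Stands in for any non-thread-safe browser (Playwright, Selenium, ...).
    // It counts how many times two threads overlapped inside screenshot().
    static final class UnsafeStub implements Browser {
        final AtomicBoolean inUse = new AtomicBoolean();
        final AtomicInteger overlaps = new AtomicInteger();

        @Override
        public String screenshot(final String url) {
            if (!this.inUse.compareAndSet(false, true)) {
                this.overlaps.incrementAndGet(); // another thread is already inside
            }
            try {
                Thread.sleep(5); // simulate browser work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            this.inUse.set(false);
            return "png:" + url;
        }
    }

    // The general-purpose lock decorator, identical in spirit to LockBasedBrowser.
    static final class LockBased implements Browser {
        private final Browser origin;
        private final Lock lock = new ReentrantLock();

        LockBased(final Browser origin) {
            this.origin = origin;
        }

        @Override
        public String screenshot(final String url) {
            this.lock.lock();
            try {
                return this.origin.screenshot(url);
            } finally {
                this.lock.unlock();
            }
        }
    }

    // Fires `threads` concurrent calls and reports how many overlaps the stub saw.
    static int overlapsUnder(final Browser browser, final UnsafeStub stub, final int threads) throws InterruptedException {
        var done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                browser.screenshot("http://localhost:8080");
                done.countDown();
            }).start();
        }
        done.await();
        return stub.overlaps.get();
    }

    public static void main(String[] args) throws InterruptedException {
        var stub = new UnsafeStub();
        // With the decorator, calls are serialized: no overlap is observed.
        System.out.println(overlapsUnder(new LockBased(stub), stub, 8)); // 0
    }
}
```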
With this implementation, all the screenshot requests will be serialized because of the lock, which will impact performance.
We'll run three benchmarks with siege.
$ siege -c 25 -r 10 -b -H 'Accept:image/png' 'http://localhost:8080/screenshot?url=http%3A%2F%2Flocalhost%3A8080'
Transactions: 250 hits
Availability: 100.00 %
Elapsed time: 28.61 secs
Data transferred: 67.19 MB
Response time: 2.72 secs
Transaction rate: 8.74 trans/sec
Throughput: 2.35 MB/sec
Concurrency: 23.80
Successful transactions: 250
Failed transactions: 0
Longest transaction: 2.94
Shortest transaction: 0.20
$ siege -c 50 -r 10 -b -H 'Accept:image/png' 'http://localhost:8080/screenshot?url=http%3A%2F%2Flocalhost%3A8080'
Transactions: 500 hits
Availability: 100.00 %
Elapsed time: 57.03 secs
Data transferred: 134.37 MB
Response time: 5.42 secs
Transaction rate: 8.77 trans/sec
Throughput: 2.36 MB/sec
Concurrency: 47.50
Successful transactions: 500
Failed transactions: 0
Longest transaction: 5.83
Shortest transaction: 0.18
$ siege -c 100 -r 10 -b -H 'Accept:image/png' 'http://localhost:8080/screenshot?url=http%3A%2F%2Flocalhost%3A8080'
Transactions: 1000 hits
Availability: 100.00 %
Elapsed time: 115.19 secs
Data transferred: 268.74 MB
Response time: 10.95 secs
Transaction rate: 8.68 trans/sec
Throughput: 2.33 MB/sec
Concurrency: 95.08
Successful transactions: 1000
Failed transactions: 0
Longest transaction: 11.67
Shortest transaction: 0.18
The siege -c parameter sets the number of concurrent users, while the -r parameter sets how many requests each user makes.
The throughput is pretty stable because that's as much as the single instance can handle. As a consequence, increasing the number of concurrent users increases the response time. This implementation behaves as we expected.
The next solution tries to overcome the single-thread limit in a naive way: on each request, it creates a new browser instance. This implementation composes UnsafePlaywright with the general-purpose OnDemandBrowser.
public final class OnDemandPlaywright implements Browser {

    private final Browser origin;

    public OnDemandPlaywright() {
        this(Playwright::chromium);
    }

    public OnDemandPlaywright(final Function<Playwright, BrowserType> browserTypeFn) {
        this(
            new OnDemandBrowser(
                () -> new UnsafePlaywright(browserTypeFn)
            )
        );
    }

    OnDemandPlaywright(final Browser origin) {
        this.origin = origin;
    }

    @Override
    public Screenshot screenshot(final URL url) {
        return this.origin.screenshot(url);
    }

    @Override
    public void close() throws Exception {
        this.origin.close();
    }
}
final class OnDemandBrowser implements Browser {

    private final Supplier<Browser> browserSupplier;

    OnDemandBrowser(final Supplier<Browser> browserSupplier) {
        this.browserSupplier = browserSupplier;
    }

    @Override
    public Screenshot screenshot(final URL url) {
        return this.browserSupplier.get().screenshot(url);
    }

    @Override
    public void close() throws Exception {
        // the per-request browsers are never tracked, so there is nothing to close here
    }
}
With this implementation, all the screenshot requests will be parallelized. On the surface, this seems fine. In fact, this approach is the worst one: each request creates a new UnsafePlaywright instance, and therefore new browser objects. This means the process will run out of memory in case of a request spike. Furthermore, each request is delayed by the UnsafePlaywright instantiation process.
The following benchmark exposes the weaknesses of this approach, which show up even with few requests.
$ siege -c 25 -r 10 -b -H 'Accept:image/png' 'http://localhost:8080/screenshot?url=http%3A%2F%2Flocalhost%3A8080'
Transactions: 250 hits
Availability: 100.00 %
Elapsed time: 68.59 secs
Data transferred: 67.29 MB
Response time: 6.56 secs
Transaction rate: 3.64 trans/sec
Throughput: 0.98 MB/sec
Concurrency: 23.91
Successful transactions: 250
Failed transactions: 0
Longest transaction: 9.28
Shortest transaction: 3.94
The previous two approaches have both pros and cons.

| | Pros | Cons |
|---|---|---|
| Lock-based solution | Pre-instantiated Playwright objects reused across requests; predictable resource consumption | Serialized requests handling |
| On-demand solution | Parallelized requests handling | Delayed requests due to the Playwright instantiation process; unpredictable resource consumption |
Well, it seems they compensate for each other. This suggests another approach that mitigates the weaknesses and combines the strengths. Indeed, a set of pre-instantiated Browser objects consumes a predictable amount of memory and, at the same time, can parallelize a fixed number of requests.
These ideas translate to a pool of Browser objects.
In the following implementation, we'll compose the general-purpose PoolBrowser with LockBasedPlaywright. We use the latter because the Browser objects in the pool are shared among requests, so we need thread safety.
public final class PoolPlaywright implements Browser {

    private final Browser origin;

    public PoolPlaywright() {
        this(8);
    }

    public PoolPlaywright(final Integer size) {
        this(
            size,
            Playwright::chromium
        );
    }

    public PoolPlaywright(final Integer size, final Function<Playwright, BrowserType> browserTypeFn) {
        this(
            () -> new LockBasedPlaywright(browserTypeFn),
            size
        );
    }

    PoolPlaywright(final Supplier<Browser> browserSupplier, final Integer size) {
        this(
            new PoolBrowser(
                browserSupplier,
                size
            )
        );
    }

    PoolPlaywright(final Browser origin) {
        this.origin = origin;
    }

    @Override
    public Screenshot screenshot(final URL url) {
        return this.origin.screenshot(url);
    }

    @Override
    public void close() throws Exception {
        this.origin.close();
    }
}
final class PoolBrowser implements Browser {

    private final List<Browser> pool;
    private final AtomicInteger last;

    PoolBrowser(final Supplier<Browser> browserSupplier, final Integer size) {
        this(
            x -> browserSupplier.get(),
            size
        );
    }

    PoolBrowser(final IntFunction<Browser> browserFn, final Integer size) {
        this(
            IntStream.range(0, size)
                .mapToObj(browserFn)
                .toList(),
            new AtomicInteger(-1)
        );
    }

    PoolBrowser(final List<Browser> pool, final AtomicInteger last) {
        this.pool = pool;
        this.last = last;
    }

    @Override
    public Screenshot screenshot(final URL url) {
        return this.browser().screenshot(url);
    }

    // round-robin selection of the next browser in the pool
    private Browser browser() {
        var max = this.pool.size();
        return this.pool.get(
            this.last.accumulateAndGet(0, (left, right) -> left + 1 < max ? left + 1 : 0)
        );
    }

    @Override
    public void close() throws Exception {
        for (var browser : this.pool) {
            browser.close();
        }
    }
}
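The round-robin index in browser() relies on AtomicInteger.accumulateAndGet, which atomically applies the function to the current value (the second argument, 0 here, is unused). A small standalone check of the cycling behavior:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinDemo {

    // Produces the sequence of pool indices for `calls` invocations,
    // using the same accumulator function as PoolBrowser.browser().
    static List<Integer> indices(final int poolSize, final int calls) {
        var last = new AtomicInteger(-1);
        var result = new ArrayList<Integer>();
        for (int i = 0; i < calls; i++) {
            result.add(last.accumulateAndGet(0, (left, right) -> left + 1 < poolSize ? left + 1 : 0));
        }
        return result;
    }

    public static void main(String[] args) {
        // With a pool of two, the indices alternate: 0, 1, 0, 1, 0
        System.out.println(indices(2, 5)); // [0, 1, 0, 1, 0]
    }
}
```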
This solution seems to resolve all our issues, but it doesn't. Indeed, in case of a request spike, we cannot predict how long a request will wait before being processed. That's because each request is handled by a single LockBasedBrowser instance, and each instance serializes the requests it processes. So, in case of a request spike, there will be too much contention on each instance: it's like having an unbounded queue of requests per Browser instance. To resolve this issue, we can introduce a bound. This way, we guarantee we process, at most, a fixed number of requests, which is useful because we can better predict how long a screenshot will take. Overall, we are improving the software's predictability and quality.
We have many ways to implement the bounded behavior. One of them is to use a Semaphore to limit access to a single LockBasedBrowser instance.
public final class SemaphorePlaywright implements Browser {

    private final Browser origin;

    public SemaphorePlaywright() {
        this(Playwright::chromium);
    }

    public SemaphorePlaywright(final Function<Playwright, BrowserType> browserTypeFn) {
        this(
            browserTypeFn,
            32
        );
    }

    public SemaphorePlaywright(final Function<Playwright, BrowserType> browserTypeFn, final Integer maxRequests) {
        this(
            new SemaphoreBrowser(
                new LockBasedPlaywright(browserTypeFn),
                maxRequests
            )
        );
    }

    SemaphorePlaywright(final Browser origin) {
        this.origin = origin;
    }

    @Override
    public Screenshot screenshot(final URL url) {
        return this.origin.screenshot(url);
    }

    @Override
    public void close() throws Exception {
        this.origin.close();
    }
}
final class SemaphoreBrowser implements Browser {

    private final Browser origin;
    private final Semaphore semaphore;

    SemaphoreBrowser(final Browser origin, final Integer maxRequests) {
        this(
            origin,
            new Semaphore(maxRequests)
        );
    }

    SemaphoreBrowser(final Browser origin, final Semaphore semaphore) {
        this.origin = origin;
        this.semaphore = semaphore;
    }

    @Override
    public Screenshot screenshot(final URL url) {
        if (this.semaphore.tryAcquire()) {
            try {
                return this.origin.screenshot(url);
            } finally {
                this.semaphore.release();
            }
        }
        throw new IllegalStateException("Unable to screenshot. Maximum number of requests reached");
    }

    @Override
    public void close() throws Exception {
        this.origin.close();
    }
}
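The key property of tryAcquire is that it fails fast instead of queuing, which is what turns the unbounded per-browser wait into a bound. A standalone illustration of the limit:

```java
import java.util.concurrent.Semaphore;

public class SemaphoreBoundDemo {

    public static void main(String[] args) {
        var semaphore = new Semaphore(2); // at most two in-flight requests

        // Two permits acquired: the bound is reached.
        boolean first = semaphore.tryAcquire();  // true
        boolean second = semaphore.tryAcquire(); // true

        // The third caller is rejected immediately instead of waiting.
        boolean third = semaphore.tryAcquire();  // false

        // Once a request completes and releases its permit, new callers succeed again.
        semaphore.release();
        boolean fourth = semaphore.tryAcquire(); // true

        System.out.println(first + " " + second + " " + third + " " + fourth); // true true false true
    }
}
```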
Finally, we need to compose the last two Browser implementations.
public final class PoolSemaphorePlaywright implements Browser {

    private final Browser origin;

    public PoolSemaphorePlaywright() {
        this(
            8,
            32
        );
    }

    public PoolSemaphorePlaywright(final Integer poolSize, final Integer maxRequestsPerBrowser) {
        this(
            poolSize,
            maxRequestsPerBrowser,
            Playwright::chromium
        );
    }

    public PoolSemaphorePlaywright(
        final Integer poolSize,
        final Integer maxRequestsPerBrowser,
        final Function<Playwright, BrowserType> browserTypeFn
    ) {
        this(
            new PoolPlaywright(
                () -> new SemaphorePlaywright(browserTypeFn, maxRequestsPerBrowser),
                poolSize
            )
        );
    }

    PoolSemaphorePlaywright(final Browser origin) {
        this.origin = origin;
    }

    @Override
    public Screenshot screenshot(final URL url) {
        return this.origin.screenshot(url);
    }

    @Override
    public void close() throws Exception {
        this.origin.close();
    }
}
In this implementation, each browser in the pool has its own semaphore. We could also compose the other way around: a pool guarded by a single semaphore.
The following three benchmarks are about the pool-based approach. The pool size is two.
$ siege -c 25 -r 10 -b -H 'Accept:image/png' 'http://localhost:8080/screenshot?url=http%3A%2F%2Flocalhost%3A8080'
Transactions: 250 hits
Availability: 100.00 %
Elapsed time: 14.73 secs
Data transferred: 67.19 MB
Response time: 1.40 secs
Transaction rate: 16.97 trans/sec
Throughput: 4.56 MB/sec
Concurrency: 23.81
Successful transactions: 250
Failed transactions: 0
Longest transaction: 1.60
Shortest transaction: 0.19
$ siege -c 50 -r 10 -b -H 'Accept:image/png' 'http://localhost:8080/screenshot?url=http%3A%2F%2Flocalhost%3A8080'
Transactions: 500 hits
Availability: 100.00 %
Elapsed time: 29.38 secs
Data transferred: 134.37 MB
Response time: 2.80 secs
Transaction rate: 17.02 trans/sec
Throughput: 4.57 MB/sec
Concurrency: 47.58
Successful transactions: 500
Failed transactions: 0
Longest transaction: 3.01
Shortest transaction: 0.17
$ siege -c 100 -r 10 -b -H 'Accept:image/png' 'http://localhost:8080/screenshot?url=http%3A%2F%2Flocalhost%3A8080'
Transactions: 1000 hits
Availability: 100.00 %
Elapsed time: 58.90 secs
Data transferred: 268.74 MB
Response time: 5.60 secs
Transaction rate: 16.98 trans/sec
Throughput: 4.56 MB/sec
Concurrency: 95.08
Successful transactions: 1000
Failed transactions: 0
Longest transaction: 6.01
Shortest transaction: 0.18
Its behavior is pretty stable, though with too many requests the response time starts to degrade. It's important to note that we have doubled the throughput compared to the lock-based solution. This implementation worked as expected.
At this point, we can benchmark the pool-and-semaphore approach. The pool size is two, and the per-browser request bound is thirty-two. This means we can handle sixty-four requests in parallel; after reaching this limit, the service returns an unsuccessful response.
$ siege -c 25 -r 10 -b -H 'Accept:image/png' 'http://localhost:8080/screenshot?url=http%3A%2F%2Flocalhost%3A8080'
Transactions: 250 hits
Availability: 100.00 %
Elapsed time: 15.43 secs
Data transferred: 67.19 MB
Response time: 1.46 secs
Transaction rate: 16.20 trans/sec
Throughput: 4.35 MB/sec
Concurrency: 23.71
Successful transactions: 250
Failed transactions: 0
Longest transaction: 2.09
Shortest transaction: 0.60
$ siege -c 50 -r 10 -b -H 'Accept:image/png' 'http://localhost:8080/screenshot?url=http%3A%2F%2Flocalhost%3A8080'
Transactions: 500 hits
Availability: 100.00 %
Elapsed time: 29.33 secs
Data transferred: 134.37 MB
Response time: 2.79 secs
Transaction rate: 17.05 trans/sec
Throughput: 4.58 MB/sec
Concurrency: 47.52
Successful transactions: 500
Failed transactions: 0
Longest transaction: 3.03
Shortest transaction: 0.20
$ siege -c 100 -r 10 -b -H 'Accept:image/png' 'http://localhost:8080/screenshot?url=http%3A%2F%2Flocalhost%3A8080'
Transactions: 626 hits
Availability: 62.60 %
Elapsed time: 37.17 secs
Data transferred: 168.92 MB
Response time: 3.57 secs
Transaction rate: 16.84 trans/sec
Throughput: 4.54 MB/sec
Concurrency: 60.14
Successful transactions: 626
Failed transactions: 374
Longest transaction: 3.97
Shortest transaction: 0.00
As expected, this solution didn't improve performance but predictability. Indeed, the throughput is comparable to the previous solution's, and at the same time the response time didn't degrade too much during the spike. Furthermore, the concurrency value in the last benchmark reflects the maximum number of parallelizable requests. The price to pay for greater predictability is a lower availability value (i.e., fewer successful responses).
Another option to implement the last solution is the event loop pattern: each request is enqueued, and each instance in the pool dequeues a request and fulfills it.
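A hedged sketch of that queue-based alternative, with worker threads standing in for pooled Browser instances (all names are illustrative, and String replaces Screenshot for brevity):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;

public class QueueBasedSketch {

    // A screenshot request paired with the future that will carry its result.
    record Request(String url, CompletableFuture<String> result) { }

    // The handler side: enqueue the request, or fail fast when the queue is full.
    static CompletableFuture<String> submit(final BlockingQueue<Request> queue, final String url) {
        var future = new CompletableFuture<String>();
        if (!queue.offer(new Request(url, future))) {
            future.completeExceptionally(
                new IllegalStateException("Unable to screenshot. Maximum number of requests reached")
            );
        }
        return future;
    }

    public static void main(String[] args) throws Exception {
        // Bounded queue: the equivalent of the semaphore's limit.
        BlockingQueue<Request> queue = new ArrayBlockingQueue<>(32);

        // Each worker thread stands in for one pooled Browser instance.
        for (int i = 0; i < 2; i++) {
            var worker = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        var request = queue.take();
                        // A real worker would call browser.screenshot(url) here.
                        request.result().complete("png:" + request.url());
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        System.out.println(submit(queue, "http://localhost:8080").get()); // png:http://localhost:8080
    }
}
```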
We can consider ourselves satisfied: we overcame the single-thread limit while reaching a good level of performance, and we implemented and compared many solutions.
For simplicity's sake, we omitted a few features required in a production-ready system: timeouts, asynchronous requests, exception handling, and a self-healing Browser implementation. Indeed, Playwright objects sometimes fail, and the objective of a self-healing implementation is to restore them.
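A self-healing decorator could, for example, recreate its origin when a screenshot fails. A hedged sketch against a simplified interface (the retry-once policy and all names are assumptions):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SelfHealingSketch {

    // Simplified interface for the sketch (String instead of Screenshot).
    interface Browser extends AutoCloseable {
        String screenshot(String url);
    }

    // Replaces its origin with a freshly created one and retries once on failure.
    static final class SelfHealingBrowser implements Browser {
        private final Supplier<Browser> supplier;
        private Browser origin;

        SelfHealingBrowser(final Supplier<Browser> supplier) {
            this.supplier = supplier;
            this.origin = supplier.get();
        }

        @Override
        public synchronized String screenshot(final String url) {
            try {
                return this.origin.screenshot(url);
            } catch (RuntimeException e) {
                // The origin is assumed broken: dispose of it, rebuild, retry once.
                try { this.origin.close(); } catch (Exception ignored) { }
                this.origin = this.supplier.get();
                return this.origin.screenshot(url);
            }
        }

        @Override
        public synchronized void close() throws Exception {
            this.origin.close();
        }
    }

    // A browser that is permanently broken or healthy, to exercise the healing path.
    static final class FlakyBrowser implements Browser {
        private final boolean broken;

        FlakyBrowser(final boolean broken) {
            this.broken = broken;
        }

        @Override
        public String screenshot(final String url) {
            if (this.broken) {
                throw new RuntimeException("browser crashed");
            }
            return "png:" + url;
        }

        @Override
        public void close() { }
    }

    public static void main(String[] args) throws Exception {
        // The first browser created is broken; its replacement is healthy.
        var created = new AtomicInteger();
        try (Browser browser = new SelfHealingBrowser(() -> new FlakyBrowser(created.getAndIncrement() == 0))) {
            System.out.println(browser.screenshot("http://localhost:8080")); // png:http://localhost:8080
        }
    }
}
```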
To conclude, OOP has a bad reputation in terms of performance. It's a common rule of thumb, but it's not true: with proper objects, we can achieve great performance without penalizing code elegance and readability. This is not at all obvious, and it's a great gift of OOP.