Use Feature Toggle to Safely Release Updates On Your Server

Developers often build technical solutions from whatever is at hand to get the functionality up and running, especially when the goal is to test a product hypothesis and it doesn't make much sense to spend a lot of time and resources on it. Once the hypothesis is confirmed, an even more interesting time begins - the search for tradeoffs: you need to release features quickly while adapting to a growing number of users.

It is very handy to be able to change the behavior of a service at any time without restarting the servers, to turn off broken functionality, or to run A/B tests. Developers usually use Feature Toggles for this. If your startup is not a giant but a small group of enthusiasts, you will have to solve this problem with the tools you already have. In this article, I will show a simple way to implement a Feature Toggle using a shared SQL database.

Description of a typical system

No matter how much effort is put into the development of fault-tolerant self-sustainable systems, manual interventions are inevitable:

fixing data inconsistencies,
triggering tasks,
recovering data after an outage,
backfilling or migrations,
displaying business operations and related data.

If a problem is easy to fix, a developer can adjust the data directly in the database or write a bash/python/whatever script. We usually avoid accessing the data directly, but circumstances can force you to. There is a more reliable way, though: an application for internal use, the Administration Panel.

Benefits of having an Admin Panel:

authorized access;
logging and auditing all the actions done by a user;
its code is usually reviewed and tested;
it is accessible not only to developers, but also to tech support.

Often the admin panel is designed to access the database of the business-logic servers directly. This approach has some disadvantages, but it is a good tradeoff: it provides an acceptable level of security with a simple implementation. So, most probably, you can use the main database to store settings, including feature toggles, and read them from your servers.

Get your hands dirty with code

Simple toggle

The simplest use case is a two-position toggle: the feature is either enabled for everyone or disabled for everyone.

private final FeatureToggleService featureToggleService;

public void businessLogic() {
   if (featureToggleService.isFeatureEnabled(IMPORTANT_FEATURE)) {
       doImportantFeatureStuff();
   } else {
       log.info("Feature disabled {}", IMPORTANT_FEATURE);
   }
}

If I implemented this in memory, I'd check whether a key is present in a hash set: if it is, the feature is enabled; otherwise it is disabled. In an SQL database, a table containing the names of the enabled features is all we need. We can query it by name, and this is how it could look:

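import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;
import org.springframework.jdbc.core.JdbcOperations;

public class FeatureToggleService {

   private static final String STATE_QUERY = "SELECT id FROM features WHERE name = ?";

   private final JdbcOperations jdbcOperations;

   // Each answer is cached for 60 seconds, so every server queries
   // the database at most once a minute per feature.
   private final LoadingCache<String, Boolean> toggleStateCache = CacheBuilder.newBuilder()
           .expireAfterWrite(60, TimeUnit.SECONDS)
           .build(new CacheLoader<String, Boolean>() {
               @Override
               public Boolean load(String featureName) {
                   // A row with this name means the feature is enabled.
                   return !jdbcOperations.queryForList(STATE_QUERY, String.class, featureName)
                           .isEmpty();
               }
           });

   public FeatureToggleService(JdbcOperations jdbcOperations) {
       this.jdbcOperations = jdbcOperations;
   }

   public boolean isFeatureEnabled(String featureName) {
       return toggleStateCache.getUnchecked(featureName);
   }
}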

In the example above, I used a cache from Google Guava. It’s a simple local cache, but in the vast majority of cases it will probably meet your requirements. Features are toggled infrequently, so there is no point in querying the database often. I chose 60 seconds as an example, and it is a reasonable tradeoff: not too many requests, and not too long a wait for a switch to take effect.
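To make the expiry tradeoff concrete, here is a minimal sketch of the same idea without Guava. The class name ExpiringToggleCache, the injectable clock and the loader parameter are my own assumptions for illustration, not part of any library; the loader stands in for the database query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

/** Hypothetical stand-in for the Guava cache: remembers each answer for ttlMillis. */
final class ExpiringToggleCache {

    /** A cached answer together with the time it was loaded. */
    private static final class Entry {
        final boolean enabled;
        final long loadedAtMillis;

        Entry(boolean enabled, long loadedAtMillis) {
            this.enabled = enabled;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Predicate<String> loader; // assumption: wraps the database lookup
    private final LongSupplier clock;       // injectable, so expiry can be tested
    private final long ttlMillis;

    ExpiringToggleCache(Predicate<String> loader, LongSupplier clock, long ttlMillis) {
        this.loader = loader;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    boolean isFeatureEnabled(String featureName) {
        long now = clock.getAsLong();
        // Reuse the cached entry while it is younger than the TTL, otherwise reload it.
        Entry entry = entries.compute(featureName, (name, cached) ->
                cached != null && now - cached.loadedAtMillis < ttlMillis
                        ? cached
                        : new Entry(loader.test(name), now));
        return entry.enabled;
    }
}
```

With a TTL of 60 000 ms, a toggle switched right after a lookup keeps returning the old answer until the entry is 60 seconds old. That is the longest a single server can lag behind the database.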

User specific toggle

A slightly more complicated case. You want to test functionality in production, but you don't want it to be available to all users. Instead, you want to enable the feature only for a set of users who are considered to be testers.

private final FeatureToggleService featureToggleService;

public void businessLogic(LoggedUser loggedUser) {
   if (featureToggleService.isFeatureEnabled(IMPORTANT_FEATURE, loggedUser.id())) {
       doImportantFeatureStuff();
   } else {
       log.info("Feature disabled {}", IMPORTANT_FEATURE);
   }
}

A third state of the feature has appeared: testing. So the states are:

Enabled for all
Disabled for all
Enabled for testers

If the feature is enabled for testing, you should check whether the current user is a tester, in other words, whether the user is on the list of people who should have access to the feature.

The features table will require a second column that stores the state of the feature (it could be a number or a meaningful name). You will also need to store the list of testers somewhere, for example in another table in the database.

And the implementation could be:

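import static com.google.common.base.Suppliers.memoizeWithExpiration;

import com.google.common.base.Supplier;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import org.springframework.jdbc.core.JdbcOperations;

public class FeatureToggleService {

   private static final String STATE_QUERY =
           "SELECT enabled_for FROM features WHERE name = ?";
   private static final String TESTERS_QUERY = "SELECT id FROM test_users";

   private final JdbcOperations jdbcOperations;

   // The tester list is reloaded at most once every 60 seconds.
   private final Supplier<Set<Long>> testUsersSupplier =
           memoizeWithExpiration(this::extractTesters, 60, TimeUnit.SECONDS);

   private final LoadingCache<String, String> toggleStateCache = CacheBuilder.newBuilder()
           .expireAfterWrite(60, TimeUnit.SECONDS)
           .build(new CacheLoader<String, String>() {
               @Override
               public String load(String featureName) {
                   // A missing row means the feature is disabled for everyone.
                   return jdbcOperations.queryForList(STATE_QUERY, String.class, featureName)
                           .stream().findFirst().orElse("NONE");
               }
           });

   public FeatureToggleService(JdbcOperations jdbcOperations) {
       this.jdbcOperations = jdbcOperations;
   }

   public boolean isFeatureEnabled(String featureName, long userId) {
       String state = toggleStateCache.getUnchecked(featureName);
       return "ALL".equals(state) ||
               ("TEST".equals(state) && testUsersSupplier.get().contains(userId));
   }

   private Set<Long> extractTesters() {
       return Set.copyOf(jdbcOperations.queryForList(TESTERS_QUERY, Long.class));
   }
}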

First we fetch the state of the feature by its name (either from the database or from the cache). If it’s ALL, the feature is enabled for everyone. If it’s TEST, the feature is enabled only for testers, so we fetch the list of testers (again, from the database or from the cache) and check whether the current user is on it. In any other case, the feature is considered disabled for everyone.
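The same decision can also be written with an enum instead of string comparisons. This is only a sketch: the enum name EnabledFor and its methods fromColumn and allows are hypothetical names of my own, and the constants simply mirror the ALL, TEST and NONE values used above.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical enum mirroring the three states stored in the enabled_for column. */
enum EnabledFor {
    ALL, TEST, NONE;

    /** Maps a column value to a state; null or unknown values fall back to NONE. */
    static EnabledFor fromColumn(String value) {
        if (value == null) {
            return NONE;
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException unknownState) {
            return NONE;
        }
    }

    /** Decides whether the given user should see the feature in this state. */
    boolean allows(long userId, Set<Long> testers) {
        switch (this) {
            case ALL:
                return true;
            case TEST:
                return testers.contains(userId);
            default:
                return false;
        }
    }
}
```

Falling back to NONE means that a typo in the table can only disable a feature, never enable it by accident.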

What could go wrong with the implementations above

There is one non-obvious problem with a local cache that you should be aware of:

09:45:27 you enable the feature for all
09:45:47 request 1 arrives at server 1, and the feature behaves as enabled
09:45:51 request 2 arrives at server 2, and the feature still behaves as disabled

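Why is that? The local cache on each server knows nothing about the other servers and expires on its own schedule, so for up to 60 seconds after a change different servers may give different answers. In most cases this is not a problem at all. If it is, you should do one of the following: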

continue using a local cache, but evict it on a notification (using a pub/sub system or broadcast messages);
use a distributed cache (Ehcache or similar) and evict it on each change;
use an in-memory data store or coordination service (Redis, ZooKeeper or similar) to store the features and provide low-cost access to them;
use a paid service that provides this functionality (there are a number of them; no ads or preferences here);
come up with your own brilliant idea, there could be plenty of them :)
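As an illustration of the first option, here is a rough sketch of a local cache that is evicted by a notification. Everything in it is an assumption made for the example: ToggleChangeBus is an in-process stand-in for a real pub/sub system, NotifiedToggleCache is a hypothetical name, and a plain Set plays the role of the shared features table.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** In-process stand-in for a pub/sub channel; a real one would cross the network. */
final class ToggleChangeBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> onFeatureChanged) {
        subscribers.add(onFeatureChanged);
    }

    void publish(String featureName) {
        subscribers.forEach(subscriber -> subscriber.accept(featureName));
    }
}

/** One instance per server: caches answers locally and drops them when notified. */
final class NotifiedToggleCache {
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();
    private final Set<String> sharedTable; // assumption: stands in for the features table

    NotifiedToggleCache(Set<String> sharedTable, ToggleChangeBus bus) {
        this.sharedTable = sharedTable;
        // Evict the cached answer as soon as a change is announced.
        bus.subscribe(featureName -> cache.remove(featureName));
    }

    boolean isFeatureEnabled(String featureName) {
        // Load from the shared table only when nothing is cached for this feature.
        return cache.computeIfAbsent(featureName, sharedTable::contains);
    }
}
```

Whoever changes a toggle (for example, the admin panel) publishes the feature name after updating the table, and every server reloads that feature on its next request. Notifications can get lost in a real system, so it is probably wise to keep an expiration time on the cache as a fallback.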

Conclusion

Many developers think it’s cool to use a fancy framework, but you need to assess whether it's really justified. It's often easier to write a simple module that satisfies the specific needs of your application and to develop it your own way. But you need to feel where the boundary is: writing your own HashMap from scratch does not make any sense.