Nowadays, almost every project collects analytics data, which makes it possible to understand users and their needs. One of the everyday tasks in this area is counting unique visits to web pages.

Let's imagine that a popular media resource is being developed. The site receives roughly 500 million unique visitors per day, and the task is to count the visits to each page in Redis, with writes and reads that are as fast as possible and the ability to obtain combined statistics for several pages. Each unique visit is identified by an IP address.

Redis Sets

At first glance, you can use the built-in Sets data structure in Redis. A Set is a collection of unique values with useful operations such as counting intersections.

127.0.0.1:6379> sadd page:1 113.145.236.211 159.54.101.236 207.47.30.26
(integer) 3
127.0.0.1:6379> sadd page:2 113.145.236.211 36.186.119.48
(integer) 2
127.0.0.1:6379> sinter page:1 page:2
1) "113.145.236.211"

The Sets data structure seems to be an excellent solution for this case, but it is not. Redis Sets are a good fit only for small or medium amounts of data. Given 500 million visits per day, the resource is under a high load: storing every IP address in Sets would require a lot of RAM, and Redis would spend a huge amount of time processing millions of items.

Redis HyperLogLog

Fortunately, Redis has the HyperLogLog data structure, which can count a huge number of unique events while using a constant amount of memory (at most 12 KB per key). HyperLogLog is a probabilistic structure, which means that on a large data set the estimated number of elements has a standard error of about 0.81%.

Data writing

To write data to a HyperLogLog, use the command:

pfadd key [element [element ...]]

127.0.0.1:6379> pfadd page:1 158.58.0.86 148.240.139.178 74.81.90.212 33.244.76.56 23.83.156.65
(integer) 1
127.0.0.1:6379> pfadd page:2 41.64.240.230 243.171.182.196 74.81.90.212 33.244.76.56 23.83.156.65
(integer) 1
127.0.0.1:6379> pfadd page:3 158.58.0.86 148.240.139.178 74.81.90.212 225.109.160.131 85.83.185.103
(integer) 1

If the new values change the estimated cardinality, 1 is returned. If you insert only values that have already been counted, 0 is returned:

127.0.0.1:6379> pfadd page:1 158.58.0.86
(integer) 0

Data reading

To get the number of unique visitors, use the command:

pfcount key [key ...]

127.0.0.1:6379> pfcount page:1
(integer) 5
127.0.0.1:6379> pfcount page:2
(integer) 5
127.0.0.1:6379> pfcount page:3
(integer) 5

You can calculate the number of unique visitors across several pages with the command:

pfmerge destkey sourcekey [sourcekey ...]

127.0.0.1:6379> pfmerge pages page:1 page:2 page:3
OK
127.0.0.1:6379> pfcount pages
(integer) 9

The pfmerge command merges several HyperLogLog keys into a single one and stores the result in the destination key, pages.

Conclusion

- Redis Sets can count unique events, but they do not scale to large amounts of data.
- Redis HyperLogLog is a probabilistic data structure that stores and reads a large number of unique events efficiently.
- To add data to a HyperLogLog, use the pfadd command.
- The pfcount command calculates the HyperLogLog cardinality.
- Multiple HyperLogLog structures can be merged into a single one with pfmerge.
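
Example: counting visits from application code

For completeness, here is a minimal sketch of the same workflow from application code. It assumes the redis-py client (pip install redis) and a Redis instance on localhost; the key names and the record_visit / unique_visits helpers are illustrative, not part of Redis itself.

# Minimal sketch of per-page unique-visit counting with HyperLogLog.
# Assumes redis-py and a local Redis instance; helper and key names are illustrative.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def record_visit(page_id: str, ip: str) -> None:
    # PFADD updates the per-page HyperLogLog; memory stays bounded
    # no matter how many unique IPs are added.
    r.pfadd(f"page:{page_id}", ip)

def unique_visits(page_id: str) -> int:
    # PFCOUNT returns the approximate number of unique IPs
    # (standard error of about 0.81%).
    return r.pfcount(f"page:{page_id}")

def unique_visits_across(*page_ids: str) -> int:
    # PFMERGE stores the union of several HyperLogLogs in a destination key,
    # which can then be counted like any other HyperLogLog.
    dest = "pages:" + ":".join(page_ids)
    r.pfmerge(dest, *[f"page:{p}" for p in page_ids])
    return r.pfcount(dest)

if __name__ == "__main__":
    record_visit("1", "158.58.0.86")
    record_visit("1", "148.240.139.178")
    record_visit("2", "41.64.240.230")
    print(unique_visits("1"))              # -> 2
    print(unique_visits_across("1", "2"))  # -> 3

The same pattern scales to hundreds of millions of visits per day, because each page key stays at a constant size regardless of how many IP addresses are recorded.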