Gorilla: efficient encoding of time series
In order to collect, store, and process measurements from its infrastructure, Facebook has designed a system called Gorillai. Among its innovations is efficient, lossless data encoding.
- A predictable world
All computer compression techniques are based on the idea that the data to be encoded is not completely random: it has a structure that can be exploited to provide an efficient representation. When measuring the temperature of a processor or the RAM usage of a virtual machine, this data changes continuously, but often in a predictable way. For example, temperature only changes within the limits allowed by physics and control systems.
In Gorilla, each sequence of measurements (called a time series) is encoded separately from the others. Dates and values are also encoded separately. In both cases, encoding is based on the last one or two recorded values, which limits the complexity of the encoder. This approach is appropriate for encoding measurements on the fly.
- Date encoding
Facebook engineers noticed that the vast majority (96.4%) of data arrives at perfectly regular intervals. When time shifts occur, they are usually minimal.
For example, a measurement may be taken every 60 seconds, with occasional delays or advances of one second. As a result, rather than encoding entire dates, delta-of-delta dates are recorded.

The first date in a time series is encoded using 14 bits. Subsequently, for a date tn, we calculate d1= tn – tn-1 and d2 = tn-1- tn-2. The encoded value is then the difference between d1 and d2. In other words, only the difference between the actual variation and that predicted by the two previous values is encoded. The encoding itself uses a variant of zig-zag ii, which requires fewer bits for small values than for large ones.
- Value encoding
The values themselves tend to change relatively slowly. In addition, some of the values are, in fact, integers. In both cases, this means that most of the bits representing the values do not change from one measurement to the next. Furthermore, these unchanged bits are located at the beginning and end of the binary representation of the value, while the changed bits (known as significant bits) are located in the middle of the representation.

Unchanged values are encoded with a single ‘0’ bit. For the others, we determine whether the range of significant bits of the current value is included in that of the previous value. If so, the value is encoded with ‘10’, followed by the significant bits. Finally, if the range is not encompassed by the previous one, Gorilla encodes ‘11’, followed by the number of zeros preceding the range (on 5 bits), the size of the range (on 6 bits), and finally the significant bits.
- Effectiveness and adoption of encoding
Facebook reports an average encoding of 1.37 bytes per measurement, which is a twelvefold reduction in storage volume compared to naive encoding. In addition to reducing operational costs, this compression significantly improves performance during playback.
The generic nature of the algorithm makes it a good candidate for implementation in a software library. Variants of the algorithm have also been implemented in the Prometheus and VictoriaMetrics databases, which specialize in time series.
- Conclusion
Gorilla’s encoding algorithm enabled Facebook engineers to meet the needs of both the social network’s telemetry system and the analysts who use it. The simple and effective approach proved interesting enough to be adopted in other time series management systems. The refinements made since the algorithm was published demonstrate the interest it has generated.
Source : Mathieu Goeminne, https://mgoeminne.com/2025/10/04/gorilla-un-encodage-efficace-des-series-temporelles/