From JSON to FlatBuffers: Enhancing Performance in Data Serialization

FlatBuffers outperforms JSON and Protobuf in speed and memory efficiency, making it ideal for resource-constrained devices and high-performance applications.

Ilia Ivankin

Jun. 28, 24 · Review


A client approached us with a three-month timeline for launching an MVP to be tested by real users. Our task was to develop a relatively straightforward backend for a mobile application. From the outset, the client provided detailed requirements, specifications, and integration modules. The primary goal was to collect data from the mobile application, review it, and send it to the specified integrations. Essentially, our role was to be a validating proxy service that recorded events.

What’s the usual challenge we face? It’s either cranking out a quick microservice or a combo of services that’ll catch requests from the app. Most of the time, our clients are rocking top-notch gear and flagship devices.

But What if Our Case Is

A feeble AWS cluster that needs to squeeze in over ten logic services plus monitoring.
Our target devices are niche Android gadgets, often tablets, with no more than 4 GB of RAM.
We’re frequently shooting snapshots from the app to the backend.
We must validate a chunk of data before pushing it further down the business flow.

So, let's start with a simple docs example which we should process:

    JSON
   
 

   {
  "docs": {
    "name": "name_for_documents",
    "department": {
      "code": "uuid_code",
      "time": 123123123,
      "employee": {
        "name": "Ivan",
        "surname": "Polich",
        "code": "uuidv4"
      }
    },
    "price": {
      "categoryA": "1.0",
      "categoryB": "2.0",
      "categoryC": "3.0"
    },
    "owner": {
      "uuid": "uuid",
      "secret": "dsfdwr32fd0fdspsod"
    },
    "data": {
      "transaction": {
        "type": "CODE",
        "uuid": "df23erd0sfods0fw",
        "pointCode": "01"
      }
    },
    "delivery": {
      "company": "TTC",
      "address": {
        "code": "01",
        "country": "uk",
        "street": "Main avenue",
        "apartment": "1A"
      }
    },
    "goods": [
      {
        "name": "toaster v12",
        "amount": 15,
        "code": "12312reds12313e1"
      }
    ]
  }
}
  

For instance, we have a compact service with just two methods:

Save docs and validate department code, delivery company, and address.
Find all with limit/offset pagination.
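To make the two methods concrete, here is a minimal sketch using only the standard library. The `Document` struct maps just the fields that get validated, and the names `decode`, `validate`, and `page`, along with the "field must be non-empty" rules, are assumptions for illustration rather than the project's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Document maps only the fields the save method validates;
// encoding/json ignores the remaining keys of the payload.
type Document struct {
	Docs struct {
		Name       string `json:"name"`
		Department struct {
			Code string `json:"code"`
		} `json:"department"`
		Delivery struct {
			Company string `json:"company"`
			Address struct {
				Code    string `json:"code"`
				Country string `json:"country"`
				Street  string `json:"street"`
			} `json:"address"`
		} `json:"delivery"`
	} `json:"docs"`
}

// decode parses a raw JSON payload into a Document.
func decode(raw []byte) (*Document, error) {
	var d Document
	if err := json.Unmarshal(raw, &d); err != nil {
		return nil, err
	}
	return &d, nil
}

// validate applies hypothetical rules: each checked field must be present.
func validate(d *Document) error {
	switch {
	case d.Docs.Department.Code == "":
		return errors.New("department code is required")
	case d.Docs.Delivery.Company == "":
		return errors.New("delivery company is required")
	case d.Docs.Delivery.Address.Code == "" || d.Docs.Delivery.Address.Street == "":
		return errors.New("delivery address is incomplete")
	}
	return nil
}

// page returns the limit/offset window of all, clamped to the available data.
func page(all []Document, limit, offset int) []Document {
	if offset < 0 || limit <= 0 || offset >= len(all) {
		return nil
	}
	end := offset + limit
	if end > len(all) {
		end = len(all)
	}
	return all[offset:end]
}

func main() {
	raw := []byte(`{"docs":{"name":"name_for_documents","department":{"code":"uuid_code"},
		"delivery":{"company":"TTC","address":{"code":"01","country":"uk","street":"Main avenue"}}}}`)
	d, err := decode(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("valid:", validate(d) == nil)
	fmt.Println("page size:", len(page([]Document{*d, *d, *d}, 2, 2)))
}
```

In a real service the find method would push limit and offset down to the storage layer instead of slicing in memory; the slice version only shows the clamping logic.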

MVP V1: REST and JSON

I decided to create a new service with gin and nothing else. A good reference for this setup is the “Golang RESTful API” example.

    Go
   
 

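   // Imports omitted: log, net/http, gin, ginprometheus, and the project's own service and middle packages.
const (
 post = "/report"
 get  = "/reports"
 TTL  = 5
)

func main() {
 router := gin.Default()
 p := ginprometheus.NewPrometheus("gin")
 p.Use(router)

 sv := service.NewReportService()
 gw := middle.NewHttpGateway(*sv)

 router.POST(post, gw.Save)
 router.GET(get, gw.Find)

 srv := &http.Server{
  Addr:    "localhost:8080",
  Handler: router,
 }

 // Start serving; without this call srv is unused and the program does not compile.
 if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
  log.Fatal(err)
 }
}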
  

Then I started running benchmarks.

    Go
   
 

   // BenchmarkCreateAndMarshal-10       168706       7045 ns/op
func BenchmarkCreateAndMarshal(b *testing.B) {
 for i := 0; i < b.N; i++ {
  doc := createDoc()
  _ = doc.Docs.Name // for tests

  bt, err := json.Marshal(doc)
  if err != nil {
   log.Fatal("parse error")
  }

  parsedDoc := new(m.Document)
  if json.Unmarshal(bt, parsedDoc) != nil {
   log.Fatal("parse error")
  }
  _ = parsedDoc.Docs.Name
 }
}
  

This benchmark, `BenchmarkCreateAndMarshal`, measures a full round trip: creating a document, marshaling it to JSON, and unmarshaling it back.

BenchmarkCreateAndMarshal-10: This is the benchmark name as printed by the Go testing tool; the -10 suffix is the GOMAXPROCS value used for the run.
168706: This is the number of iterations that were executed during the test.
7045 ns/op: This is the average time taken for one iteration in nanoseconds. Here, ns/op stands for nanoseconds per operation.

Thus, the result indicates that BenchmarkCreateAndMarshal averaged approximately 7045 nanoseconds per operation over 168706 iterations.

This is where we began our journey, and it brings us to the first key point on our path. Was it enough to launch? Yes! Would it hold up for long? No. From here, a new branch of our exploration opens up. Why add memory when we can make some processes more efficient? Yes, we're talking about serialization, and so the second chapter begins, one that significantly speeds up our processing.

MVP v2: gRPC and Protobuf

Protocol Buffers require the deserialization of data before it can be used, meaning that data must be unpacked into objects before access. This requires additional time and memory to create objects. Protocol Buffers also support many languages, including C++, Java, Python, Go, Ruby, Objective-C, C#, and Dart, though the level of support varies by language.

Protocol Buffers support schema evolution: new fields can be added under new field numbers, and older readers skip the fields they do not recognize. Note that proto2 distinguishes optional and required fields, while proto3 drops required. This makes Protocol Buffers convenient for long-term projects with changing requirements. The format is compact and binary, although every field carries a small tag, which slightly increases data size. Performance is good for both writing and reading data, but deserialization adds overhead.

gRPC provides more efficient and compact binary communication than text-based JSON over HTTP/1.1.

Type: Oriented towards transferring binary data and structured messages.
Protocol: Supports long-lived connections and full-duplex (bidirectional streaming) communication.
Data Format: Protocol Buffers (protobuf) — a binary data serialization format.
Transport: Uses HTTP/2 as the transport protocol.

Here's an easy example. Let’s write a file example.proto:

    ProtoBuf
   
   syntax = "proto3";

// proto3 has no "required" label; fields are implicitly optional.
message Person {
 string name = 1;
 int32 id = 2;
 optional string email = 3;
}

Each field will be represented as a tagged element when this object is serialized into binary format. In this case, the tags are the numbers 1, 2, and 3. After serialization, the binary data stream might look something like this (in a simplified form):

    Plain Text
   
   0A 08 4A 6F 68 6E 20 44 6F 65 
10 7B 
1A 10 6A 6F 68 6E 40 65 78 61 6D 70 6C 65 2E 63 6F 6D

0A is the key for field 1 (the name field): the field number shifted left by three bits, combined with wire type 2 (length-delimited). It is followed by the field’s length, 08.
4A 6F 68 6E 20 44 6F 65 represents the ASCII codes for the string “John Doe.”
10 is the key for field 2 (the id field) with wire type 0 (varint), followed by the value 123 (7B) in variable-length encoding.
1A is the key for field 3 (the email field) with wire type 2, followed by the string length 16 (10) and the ASCII codes for the string “john@example.com.”
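The byte layout can be reproduced by hand with the standard library, which is a handy way to check it. The sketch below is not how generated Protobuf code works internally; the helper names (`appendKey`, `appendString`, `appendInt32`, `encodePerson`, `hexDump`) are made up for this illustration. It relies on two facts of the wire format: every field starts with a key equal to `(field number << 3) | wire type`, and integers use the same base-128 varint encoding that `encoding/binary` implements.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	wireVarint = 0 // varint-encoded scalars such as int32
	wireBytes  = 2 // length-delimited values such as strings
)

// appendVarint appends v in base-128 varint encoding.
func appendVarint(buf []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// appendKey appends the field key: (field number << 3) | wire type.
func appendKey(buf []byte, field, wireType uint64) []byte {
	return appendVarint(buf, field<<3|wireType)
}

// appendString appends a length-delimited string field.
func appendString(buf []byte, field uint64, s string) []byte {
	buf = appendKey(buf, field, wireBytes)
	buf = appendVarint(buf, uint64(len(s)))
	return append(buf, s...)
}

// appendInt32 appends a varint field; negative values are sign-extended to 64 bits.
func appendInt32(buf []byte, field uint64, v int32) []byte {
	buf = appendKey(buf, field, wireVarint)
	return appendVarint(buf, uint64(v))
}

// encodePerson encodes the Person message from example.proto by hand.
func encodePerson(name string, id int32, email string) []byte {
	var buf []byte
	buf = appendString(buf, 1, name)
	buf = appendInt32(buf, 2, id)
	buf = appendString(buf, 3, email)
	return buf
}

// hexDump formats bytes as space-separated uppercase hex.
func hexDump(b []byte) string {
	return fmt.Sprintf("% X", b)
}

func main() {
	fmt.Println(hexDump(encodePerson("John Doe", 123, "john@example.com")))
}
```

For these values the key bytes come out as 0A, 10, and 1A, and the two length bytes as 08 and 10 (the email is 16 characters long).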

Now, let's write some tests:

    Go
   
 

   // BenchmarkCreateAndMarshal-10       651063       1827 ns/op
func BenchmarkCreateAndMarshal(b *testing.B) {
 for i := 0; i < b.N; i++ {
  doc := CreateDoc()
  _ = doc.GetName()
  r, e := proto.Marshal(&doc)
  if e != nil {
   log.Fatal("problem with marshal")
  }

  nd := new(docs.Document)
  if proto.Unmarshal(r, nd) != nil {
   log.Fatal("problem with unmarshal")
  }
  _ = nd.GetName()
 }
}
  

This benchmark, also named BenchmarkCreateAndMarshal, measures the same round trip with Protobuf: creating a document, marshaling it, and unmarshaling it back. The results show that, on average, one round trip takes 1827 nanoseconds over 651063 iterations.

MVP v3: FlatBuffers

FlatBuffers is an efficient data serialization library developed by Google that allows objects to be serialized into a compact binary format and allows very fast data access without the need for deserialization. The main features and operation of FlatBuffers include the following aspects:

Speed: Fast access to serialized data without the need for prior deserialization.
Memory: Minimal storage overhead due to compact binary format.

Zero-copy access is the key. FlatBuffers allows direct access to serialized data without a separate deserialization step, which makes reads very fast. Since data does not need to be copied or unpacked into objects, this also reduces memory usage at read time. The trade-off is that the buffer carries offsets and vtables in addition to the field values, so the encoded size is not always smaller than Protobuf's.
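The idea behind zero-copy access can be shown with a toy format. The layout below is an assumption made up for this sketch and is deliberately not the real FlatBuffers format (which uses a root offset and vtables); it only demonstrates reading fields straight out of the byte slice, with `Name` returning a sub-slice of the original buffer instead of a copy.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Toy layout (NOT the real FlatBuffers format), integers little-endian:
//   bytes 0..3      id
//   bytes 4..7      name length n
//   bytes 8..8+n-1  name bytes
//   next 4 bytes    age

// Person is a view over an encoded record; no fields are unpacked up front.
type Person []byte

// encode writes a record in the toy layout.
func encode(id uint32, name string, age uint32) []byte {
	buf := make([]byte, 12+len(name))
	binary.LittleEndian.PutUint32(buf[0:], id)
	binary.LittleEndian.PutUint32(buf[4:], uint32(len(name)))
	copy(buf[8:], name)
	binary.LittleEndian.PutUint32(buf[8+len(name):], age)
	return buf
}

// ID reads the id directly from its fixed offset.
func (p Person) ID() uint32 {
	return binary.LittleEndian.Uint32(p[0:])
}

// Name returns a sub-slice that shares memory with the buffer (no copy).
func (p Person) Name() []byte {
	n := binary.LittleEndian.Uint32(p[4:])
	return p[8 : 8+n]
}

// Age reads the age, whose offset depends on the name length.
func (p Person) Age() uint32 {
	n := binary.LittleEndian.Uint32(p[4:])
	return binary.LittleEndian.Uint32(p[8+n:])
}

func main() {
	p := Person(encode(123, "John Doe", 30))
	fmt.Println(p.ID(), string(p.Name()), p.Age())
}
```

Reading `ID` touches four bytes and allocates nothing, no matter how large the rest of the record is. That is the property the benchmarks below benefit from.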

The FlatBuffers schema language looks similar to Protobuf's. Let's try to create a schema (person.fbs):

    Plain Text
   
   // person.fbs
namespace Example;

table Person {
  id: int;
  name: string;
  age: int;
}

root_type Person;

Let’s sketch the serialized bytes in hexadecimal for a Person with id 123, name “John Doe,” and age 30. This is a simplified illustration: a real FlatBuffers buffer also stores a root offset and a vtable, and its string length prefix does not count the null terminator.

    Plain Text
   
 

   // Serialized bytes (hexadecimal representation)
// (assuming little-endian byte order, simplified layout)
19 00 00 00                   // Data size (including these 4 bytes)
7B 00 00 00                   // ID (123 in little-endian byte order)
09 00 00 00                   // Name string length (including null-terminator)
4A 6F 68 6E 20 44 6F 65 00    // Name ("John Doe" in ASCII, including null-terminator)
1E 00 00 00                   // Age (30 in little-endian byte order)
  

In this example:

The first 4 bytes represent the data size, including the size field itself. In this case, the size is 25 bytes (0x19).
The following 4 bytes represent the id (123 in little-endian byte order).
Following that, 4 bytes represent the length of the name string (9 bytes).
The subsequent 9 bytes represent the name string “John Doe,” including the null-terminator.
The last 4 bytes represent the age (30, or 0x1E, in little-endian byte order).

Now for the benchmark:

    Go
   
 

   // BenchmarkCreateAndMarshalBuilderPool-10      1681384        711.2 ns/op
func BenchmarkCreateAndMarshalBuilderPool(b *testing.B) {
 builderPool := builder.NewBuilderPool(100)

 for i := 0; i < b.N; i++ {
  currentBuilder := builderPool.Get()

  buf := BuildDocs(currentBuilder)
  doc := sample.GetRootAsDocument(buf, 0)
  _ = doc.Name()

  sb := doc.Table().Bytes
  cd := sample.GetRootAsDocument(sb, 0)
  _ = cd.Name()

  builderPool.Put(currentBuilder)
 }
}
  

Since we’re in the “do-it-yourself optimization” mode, I decided to whip up a small pool of builders that I clear after use. This way, we can recycle them without repeatedly allocating memory. It’s a bit like having a toolkit that we tidy up after each use — it keeps things tidy and efficient. Why waste resources on creating new builders when we can repurpose the ones we’ve got?
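The pool itself is not shown in the article, so here is a minimal sketch of the idea. It is an assumption about the general shape, not the author's `builder.NewBuilderPool`: `bytes.Buffer` stands in for the FlatBuffers builder so the example needs only the standard library, and the names `BufferPool` and `NewBufferPool` are hypothetical. A real pool would clear the builder in `Put` in the same way.

```go
package main

import (
	"bytes"
	"fmt"
)

// BufferPool is a bounded pool of reusable buffers.
// bytes.Buffer stands in for a FlatBuffers builder here.
type BufferPool struct {
	free chan *bytes.Buffer
}

// NewBufferPool creates a pool that keeps at most size idle buffers.
func NewBufferPool(size int) *BufferPool {
	return &BufferPool{free: make(chan *bytes.Buffer, size)}
}

// Get returns an idle buffer, or allocates a new one if none is available.
func (p *BufferPool) Get() *bytes.Buffer {
	select {
	case b := <-p.free:
		return b
	default:
		return new(bytes.Buffer)
	}
}

// Put clears the buffer and returns it to the pool.
// If the pool is already full, the buffer is dropped and left to the GC.
func (p *BufferPool) Put(b *bytes.Buffer) {
	b.Reset()
	select {
	case p.free <- b:
	default:
	}
}

func main() {
	pool := NewBufferPool(100)

	b := pool.Get()
	b.WriteString("serialized document")
	fmt.Println("bytes written:", b.Len())
	pool.Put(b)

	reused := pool.Get()
	fmt.Println("same buffer reused:", reused == b, "length after reset:", reused.Len())
}
```

The channel keeps the pool bounded and safe for concurrent use. `sync.Pool` from the standard library is a common alternative when a fixed capacity is not needed.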

Time To Check Results

Now, let’s dive into the results of our tests, and here’s what we see:

protocol	iterations	time per operation
json	168706	7045 ns/op
proto	651063	1827 ns/op
flat	1681384	711.2 ns/op

Flat is the speed monster here, leaving the others in the dust: it is roughly 10 times faster than JSON and about 2.6 times faster than Protobuf. The numbers don’t lie, and it seems like our DIY optimization is paying off big time!
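The ratios between the three measurements are easy to check. This small sketch takes the ns/op figures from the table and derives the speedup factors and a rough single-goroutine throughput; the function names `speedup` and `opsPerSecond` are just for illustration.

```go
package main

import "fmt"

// speedup reports how many times faster the second measurement is, both in ns/op.
func speedup(slowNs, fastNs float64) float64 {
	return slowNs / fastNs
}

// opsPerSecond converts ns/op into operations per second for a single goroutine.
func opsPerSecond(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

func main() {
	const jsonNs, protoNs, flatNs = 7045.0, 1827.0, 711.2

	fmt.Printf("flat vs json:  %.1fx\n", speedup(jsonNs, flatNs))
	fmt.Printf("flat vs proto: %.1fx\n", speedup(protoNs, flatNs))
	fmt.Printf("proto vs json: %.1fx\n", speedup(jsonNs, protoNs))
	fmt.Printf("flat throughput: %.0f ops/s\n", opsPerSecond(flatNs))
}
```

Keep in mind that these are microbenchmark numbers for one document shape on one machine; the ratios will shift with payload size and hardware.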

Conclusion

Use FlatBuffers if you need to save memory on the device and can wait a bit for processing on the server. It stands out among others, demonstrating significantly lower execution time — around 711.2 nanoseconds per operation in the same stress test.

If we need to save memory and use HTTP/2, we can use Protobuf. It demonstrates high efficiency, surpassing JSON, with an execution time of about 1827 nanoseconds per operation in the same test.

JSON: In stress tests of the save method with a load of 1000 requests per second, JSON demonstrates stable results, with an execution time of approximately 7045 nanoseconds per operation. It's slow but still good enough for common use cases.

These results emphasize that FlatBuffers provides a significant performance advantage over JSON and Protobuf. Despite its steeper learning curve and more complex usage, its real efficiency underscores that investments in performance optimization can pay off in the long run.

Code samples and tests can be found here.


Published at DZone with permission of Ilia Ivankin. See the original article here.
