Getting the different components in your systems talking nicely to one another relies on a rather mundane but crucial detail: a good data structure in the message payloads. This article will pick out some of the best advice we have for getting your Apache Kafka data payloads well designed from the very beginning of your project.

Use all the features of Apache Kafka records

The events that we stream with Kafka can support headers as well as keys and the main body of the payload. The most scalable systems use all of these features appropriately. Use the header for metadata about the payload, such as OpenTelemetry trace IDs. It can also be useful to duplicate some fields from the payload itself in the headers, if they are used for routing or filtering the data. In secure systems, intermediate components may not have access to the whole payload, so putting the data in the header can expose just the appropriate fields to them. Also consider that, for larger payloads, the overhead of deserializing can be non-trivial; being able to access just a couple of fields while keeping the system moving can help performance, too.
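As a sketch of what this looks like with the confluent-kafka Python client (the topic name, payload fields, and header names here are illustrative assumptions):

    import json
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})

    payload = {"machine": "M-42", "reading": 21.5, "unit": "celsius"}

    producer.produce(
        "factory-events",
        value=json.dumps(payload).encode("utf-8"),
        # Metadata, plus any payload fields used for routing or filtering,
        # duplicated into headers so intermediate components can read them
        # without deserializing the whole payload.
        headers=[
            ("trace_id", b"4bf92f3577b34da6a3ce929d0e0e4736"),
            ("event_type", b"sensor_reading"),
        ],
    )
    producer.flush()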
The keys in Apache Kafka typically get more attention than the headers, but we should still make sure we are using them as a force for good. When a producer sends data to Kafka, it specifies which topic the data should be sent to, and the key usually defines which partition is used. If the key isn't set, then the data will be spread evenly across the partitions using a round-robin approach; for a lot of unrelated events in a stream, this makes good use of your resources. If the key you're using doesn't vary much, though, your events can get bunched into a small number of partitions rather than spread out. When this happens, try adding more fields to the key to give more granular partition routing. Keep in mind that the contents of each partition will be processed in order, so it still makes sense to keep logical groupings of data.

For example, consider a collection of imaginary factories where all the machines can send events. Mostly they send sensor_reading events, but they can also send alarm events, which are like a paper jam in a printer but on a factory scale! Using a key that identifies only the factory will give us a LOT of data on one partition, as sketched below.
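Continuing the producer sketch above, keys along these lines show the difference (the field names are illustrative assumptions):

    import json

    # Keying on the factory alone funnels every machine's events into the
    # same partition: a LOT of data in one place.
    coarse_key = json.dumps({"factory": "FC001"}).encode("utf-8")

    # Adding the machine gives more granular partition routing, while still
    # keeping each machine's events together, and in order, on one partition.
    granular_key = json.dumps({"factory": "FC001", "machine": "M-42"}).encode("utf-8")

    producer.produce(
        "factory-events",
        key=granular_key,
        value=json.dumps({"event_type": "sensor_reading", "reading": 21.5}).encode("utf-8"),
    )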
For describing the structure of the payload itself, there are other alternatives, notably Protocol Buffers, known as ProtoBuf. It achieves similar goals by generating code to use in your own application, making it available on fewer tech stacks. If it's available for yours, it's worth a look.
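For instance, given a hypothetical sensor_reading.proto compiled with protoc (the module, message, and field names are assumptions for illustration), the generated Python code could be used with the producer from the earlier sketch:

    # Module generated by protoc from a hypothetical sensor_reading.proto.
    from sensor_reading_pb2 import SensorReading

    reading = SensorReading(factory="FC001", machine="M-42", value=21.5)

    # The generated class handles serialization to a compact binary format.
    producer.produce("factory-events", value=reading.SerializeToString())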
Timestamps deserve some thought, too. Kafka itself will add a publish timestamp to each message's metadata. However, it can also be useful to include your own timestamps in some situations, such as when the data is gathered at a different time to when it is published, or when a retry implementation is needed. Also, since Apache Kafka allows additional consumers to reprocess records later, a timestamp can give a handy insight into progress through an existing data set. If I could make rules, I'd make rules about timestamp formats! The only acceptable format is ISO 8601, including timezone information: I should not have to know where on the planet, or on which day of the year, this payload was created.
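A sketch of adding such a timestamp in Python (the collected_at field name is an illustrative assumption):

    from datetime import datetime, timezone

    payload = {
        "machine": "M-42",
        "reading": 21.5,
        # ISO 8601 with timezone, e.g. "2024-05-01T12:34:56.789012+00:00":
        # no guessing where on the planet, or on which day of the year,
        # this payload was created.
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }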
With the size limitations on the payloads supported by Apache Kafka, it's important to only include fields that can justify their own inclusion. When the consumers of the data are known, it's easier to plan for their context and likely use cases. When they're not, that's a more difficult assignment, but the tips shared here will hopefully set you on the road to success.

If you found this post useful, how about one of these resources to read next?