Use CDC For Streaming Data to Your Data Warehouse

Streaming data from your database into your data warehouse goes through a process called ETL or ELT. ETL stands for Extract, Transform, Load: you take the data from your primary database, extract it, perform some transformations on it (aggregations or joins), and then put the results into your data warehouse for the purposes of analytics queries. ELT is a more common concept these days: instead of transforming before you load, you load the raw data into your data warehouse and do those aggregations and joins later. CDC can make this process more efficient.

Traditional ETL is based on batch loading of data. You would achieve this either with a nightly job, where one big query extracts all the data from your database to refresh your data warehouse, or by polling your database on some periodic cadence, for instance every half hour or hour, and loading just the new data into your data warehouse. Either way, there are three big downsides to this process:

1. Periodic spikes in load: These large queries impact latency and, ultimately, the user experience, which is why many companies schedule them during low-traffic periods.
2. Network provisioning: Sending all that data puts a lot of strain on your network. Because the bytes you send come in big spikes, you have to provision your network to handle peak traffic during those batch transfers.
3. Delayed business decisions: Business decisions based on the data are delayed by your polling frequency. If you only update your data every night, you can't query what happened yesterday until the next day.

Using change data capture to stream data from your primary database to your data warehouse solves these three problems:

1. CDC does not require you to execute high-load queries on a periodic basis, so you avoid spiky load behavior. While changefeeds are not free, they are cheaper, and their cost is spread evenly throughout the day.
2. Because the data is sent continuously and in much smaller batches, you don't need to provision as much network capacity, and you can save money on network costs.
3. Because you're continuously streaming data from your database to your data warehouse, the data in your warehouse is up to date. That allows you to create real-time insights, giving you a leg up on your competitors because you're making business decisions on fresher data.

Use CDC For Event-Driven Architectures

In event-driven architectures, one of the hardest things to accomplish is safely and consistently delivering data across service boundaries. Typically, an individual service needs to commit changes both to its own local database and to a messaging queue, so that any messages or pieces of data that need to be sent to another service can be delivered. What happens if the message gets sent to the other services but doesn't actually commit in your database? What happens if your message commits to your database but never reaches the messaging queue?
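The batch-polling flavor of ETL described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the `orders` table, the `updated_at` watermark column, and the SQLite databases standing in for the primary database and the warehouse are all hypothetical.

```python
import sqlite3

def sync_new_rows(primary, warehouse, last_sync):
    # Periodic batch ETL: on each polling interval, pull every row that
    # changed since the last sync and load it into the warehouse table.
    # Table and column names here are illustrative assumptions.
    rows = primary.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (last_sync,),
    ).fetchall()
    warehouse.executemany(
        "INSERT OR REPLACE INTO orders_analytics (id, amount, updated_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    warehouse.commit()
    # Advance the watermark to the newest change we saw.
    return max((r[2] for r in rows), default=last_sync)
```

Note that each run re-scans for changed rows in one big query; run nightly against a large table, this is exactly the load spike and network burst described above.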
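By contrast, the CDC approach applies a continuous stream of small row-change events as they arrive. The sketch below is a toy model under stated assumptions: change events are plain dicts with hypothetical `op`, `key`, and `row` fields, and the warehouse table is a dict; a real changefeed would deliver events through a queue or sink rather than an in-memory list.

```python
def apply_change(warehouse_table, event):
    # Apply a single change event to an in-memory "warehouse" table (a dict).
    # Inserts and updates upsert the row; deletes remove it.
    if event["op"] in ("insert", "update"):
        warehouse_table[event["key"]] = event["row"]
    elif event["op"] == "delete":
        warehouse_table.pop(event["key"], None)
    return warehouse_table

def consume(warehouse_table, event_stream):
    # In a real changefeed consumer this loop would block on the message
    # queue; here we just drain an iterable of events.
    for event in event_stream:
        apply_change(warehouse_table, event)
    return warehouse_table
```

Because each event is tiny and events flow continuously, there is no periodic bulk query against the primary database and no burst of bytes to provision for.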
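The closing questions describe the classic dual-write problem: the database commit and the queue publish are two separate, non-atomic steps, so a failure between them leaves the two systems inconsistent. A minimal sketch, with the service's database and queue modeled as plain lists and the failure injected by a hypothetical `queue_is_up` flag:

```python
class PublishFailed(Exception):
    pass

def handle_order(db, queue, order, queue_is_up=True):
    # Step 1: commit the change to the service's local database.
    db.append(order)
    # Step 2: publish to the messaging queue. This can fail independently
    # of step 1, leaving the database and the queue out of sync.
    if not queue_is_up:
        raise PublishFailed("queue unreachable after DB commit")
    queue.append(order)
```

Swapping the two steps doesn't help; then a crash after the publish sends a message for a change that never committed. CDC sidesteps the dual write entirely: the service commits only to its database, and the changefeed delivers the committed change to the queue.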