- When and How to scale Self Hosted IR for Data Factory?
- Some best practices you should consider for Self Hosted IR
- Disaster Recovery Solutions for Self Hosted IR
Scaling of Self Hosted IR
Identifying Self Hosted IR bottlenecks
How do we know whether a self-hosted integration runtime has a bottleneck? How do we know when to scale the nodes in the runtime environment? Four metrics can help you measure the performance of your self-hosted IRs.
- Average Integration runtime CPU utilization – Average CPU utilization of the nodes in the IR. Utilization that continuously exceeds the 80% mark indicates a serious CPU bottleneck.
- Average Integration runtime available memory – Average memory available on the nodes. A low value indicates memory pressure.
- Average Integration runtime queue length – Average number of pipeline or activity runs waiting in the queue. A high count indicates IR resource congestion or a bottleneck.
- Average Integration runtime queue duration – Average amount of time a job waits in the queue before it gets scheduled. High values indicate congestion.
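As a rough illustration of how these four metrics could be evaluated together, here is a minimal Python sketch. It assumes you have already collected metric samples by some means. The `IrSample` class, the `find_bottlenecks` function, and every threshold except the 80% CPU mark are hypothetical placeholders, not part of Data Factory.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IrSample:
    """One observation of the four self-hosted IR metrics (hypothetical container)."""
    cpu_percent: float          # average CPU utilization across nodes
    available_memory_mb: float  # average available memory on the nodes
    queue_length: float         # average number of queued pipeline/activity runs
    queue_duration_s: float     # average time a job waits in the queue


# Only the 80% CPU mark comes from the text above. The other limits are
# assumed placeholders and should be tuned for your environment.
CPU_LIMIT = 80.0
MIN_MEMORY_MB = 1024.0
MAX_QUEUE_LENGTH = 5.0
MAX_QUEUE_DURATION_S = 60.0


def find_bottlenecks(samples: list[IrSample]) -> list[str]:
    """Return the names of the metrics that point to a bottleneck."""
    if not samples:
        return []
    findings = []
    # "Continuously" is modelled here as every sample in the window exceeding the mark.
    if all(s.cpu_percent > CPU_LIMIT for s in samples):
        findings.append("cpu")
    if mean(s.available_memory_mb for s in samples) < MIN_MEMORY_MB:
        findings.append("memory")
    if mean(s.queue_length for s in samples) > MAX_QUEUE_LENGTH:
        findings.append("queue_length")
    if mean(s.queue_duration_s for s in samples) > MAX_QUEUE_DURATION_S:
        findings.append("queue_duration")
    return findings
```

Treating CPU as a bottleneck only when every sample exceeds the mark is one possible reading of "continuously". A real monitor might use a longer window or a percentile instead.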
When to Scale Out?
When processor usage continuously hovers above the 80% mark, available memory on the self-hosted IR is low, and average queue length and duration are high, add a new node to spread the load across machines. Adding a node to the gateway also helps if activities fail because they time out or because the self-hosted IR node is offline.
When to Scale Job Capacity for Nodes?
When the processor and available RAM aren’t well utilized, but the execution of concurrent jobs reaches a node’s limits, scale up by increasing the number of concurrent jobs that a node can run. You might also want to scale up job capacity when activities time out because the self-hosted IR is overloaded. As shown in the following image, you can increase the maximum capacity for a node:
When to Scale Up?
You can add at most four nodes to a self-hosted IR for high availability and scalability. Once you can’t add more nodes, consider increasing the CPU and RAM of the existing nodes.
- For Azure VMs – Change the node to a larger VM size.
- For On Prem Nodes – Move to a larger machine with more RAM and CPU.
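The three scaling decisions above can be summarized as a small decision function. This is only a sketch of the rules as described in this section. The function name, its boolean inputs, and the returned labels are hypothetical, and combining the scale-out conditions with a strict "and" is a literal reading of the text rather than an official rule.

```python
MAX_NODES = 4  # node limit for a self-hosted IR, as stated in the text


def recommend_action(cpu_high: bool, memory_low: bool, queue_congested: bool,
                     at_job_limit: bool, node_count: int) -> str:
    """Map the observed symptoms to one of the scaling options (hypothetical labels)."""
    resources_strained = cpu_high and memory_low and queue_congested
    if resources_strained:
        # Prefer adding a node; once the node limit is reached, use bigger machines.
        if node_count < MAX_NODES:
            return "scale_out"
        return "scale_up"
    # CPU and RAM are underused, but the concurrent-job limit is the constraint.
    if at_job_limit and not cpu_high and not memory_low:
        return "increase_job_capacity"
    return "no_action"
```

For example, `recommend_action(True, True, True, False, 4)` returns `"scale_up"` because no further node can be added.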
Best Practices for Self Hosted IR
- Configure a power plan on the host machine for the self-hosted integration runtime so that the machine doesn’t hibernate. If the host machine hibernates, the self-hosted integration runtime goes offline.
- Ensure that Auto Update is enabled for the integration runtime and that all nodes are up to date. A node that is not updated to the latest version goes offline and is not used for scheduling.
- You can use a single self-hosted integration runtime for multiple on-premises data sources. You can also share it with another data factory within the same Azure Active Directory (Azure AD) tenant.
- It is recommended that you install the self-hosted integration runtime on a machine that differs from the one that hosts the on-premises data source. When the self-hosted integration runtime and data source are on different machines, the self-hosted integration runtime doesn’t compete with the data source for resources.
- If the nodes in a self-hosted IR are underutilized, or if you want to make optimal use of resources, consider reusing the existing self-hosted integration runtime infrastructure. This reuse lets you create a linked self-hosted integration runtime in a different data factory by referencing an existing shared self-hosted IR.
Disaster Recovery (Or High Availability)
There is currently no out-of-the-box disaster recovery feature for the integration runtime. If the service stops because of an error, you have to restart it manually. Ideally, set up multiple nodes for the integration runtime. This avoids a single point of failure and provides higher throughput, because all nodes are active. Refer to the illustration below for more detail on the setup.
You can associate up to four nodes with a self-hosted integration runtime, spread across on-premises and Azure. To maximize availability for Azure nodes, configure them with either Availability Sets or Availability Zones (if the region supports them). To maximize availability for on-premises nodes, create them on separate racks or hardware. This setup helps ensure continuity and also offers scalability, improved performance, and higher throughput during data movement between on-premises and cloud data stores. For more details, refer to the link here, which points to the section on High Availability and Scalability and covers setting up multiple nodes (up to four).
Note: Before you add another node for high availability and scalability, ensure that the Remote access to intranet option is enabled on the first node. To do so, select Microsoft Integration Runtime Configuration Manager > Settings > Remote access to intranet. You also need to ensure that your network setup allows on-premises data sources to be accessible from the Azure VNet and vice versa.
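The node-placement guidance in this section can be expressed as a simple planning check. The sketch below is purely illustrative: the `Node` class, its fields, and `ha_warnings` are hypothetical names, and "failure domain" is used here as an assumed stand-in for an availability zone, an availability set fault domain, or a physical rack.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A planned self-hosted IR node (hypothetical model for this sketch)."""
    name: str
    location: str        # "azure" or "on-premises"
    failure_domain: str  # zone or fault domain for Azure, rack/hardware id on-premises


def ha_warnings(nodes: list[Node]) -> list[str]:
    """Return warnings for a node layout that weakens high availability."""
    warnings = []
    if len(nodes) < 2:
        warnings.append("single point of failure: fewer than 2 nodes")
    if len(nodes) > 4:
        warnings.append("more than 4 nodes is not supported")
    seen: dict[tuple[str, str], str] = {}
    for node in nodes:
        key = (node.location, node.failure_domain)
        if key in seen:
            # Two nodes in the same zone or rack can fail together.
            warnings.append(f"{node.name} shares {node.failure_domain} with {seen[key]}")
        else:
            seen[key] = node.name
    return warnings
```

A layout with one Azure node per zone and one on-premises node per rack produces no warnings under these assumptions.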