Performance Tuning for Informatica PowerCenter
What Affects Performance?
These factors can make a substantial performance difference:
• Database Tuning – Sources, Targets, Repositories, Query Tuning, Statistics Collection, Indexes, other techniques
• Network – Capacity, Utilization
• Disk – Speed, Space, Tuning, Storage Type, Mount Type
• Server – Database, Informatica – RAM, CPU, Configuration
• PowerCenter add-on features – Grid, Partitioning
Informatica Platform Architecture
[Architecture diagram: Informatica Analyst and Informatica Developer clients with the Analyst Service, Model Repository Service, and Data Integration Service (ISP, Profile Service, Mapping Service, SQL Service, Data Object Cache, Profile Warehouse, Staging Data); PowerCenter clients (Workflow Manager, Mapping Designer) with the Integration Service, Repository Service, and Repository; Metadata Manager with the Metadata Manager Service and MM Warehouse.]
Comparison Across Tools
Most techniques are the same except:

PowerCenter
• Intended for larger data volumes
• Use the Source Qualifier JOIN, SORT, and FILTER features instead of additional Joiner, Sorter, or Filter transformations in the mapping (Why: the join, sort, and filter conditions are included in the Source Qualifier's generated database query – see the SQL sketch after this comparison)
• Cache Lookup transformations (Why: prepares for managing larger amounts of data from the source)
• Memory settings in the Session

Data Services
• Intended for smaller data volumes
• Use additional Joiner, Sorter, or Filter transformations in the mapping instead of placing them in the Read transformation (Why: uses native Pushdown Optimization and avoids blocking pushdown on subsequent transformations)
• Do not cache Lookup transformations (Why: minimizes I/O to the few rows of required data from the source per request; select only what is used for those rows)
• Memory settings in the DIS
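A minimal sketch of the kind of query the Source Qualifier can generate when its filter and sorted-ports properties are used; the ORDERS table and its columns are hypothetical, not from the original material:

SELECT ORDERS.ORDER_ID,
       ORDERS.CUSTOMER_ID,
       ORDERS.ORDER_DATE,
       ORDERS.STATUS
FROM   ORDERS
WHERE  ORDERS.STATUS = 'SHIPPED'      -- Source Qualifier filter condition, applied in the database
ORDER BY ORDERS.CUSTOMER_ID           -- Source Qualifier sorted ports, sorted in the database

The same logic built with separate Filter and Sorter transformations would instead pull every row across the network and do the filtering and sorting on the Integration Service.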
Key Techniques
• Adjusting Memory for DTM Buffer (PC and IDS)
• Prevent Blockage of Pushdown (IDS, and PC if you have PDO)
• Lookup Caching (PC)
• Cache and Optimization (IDS)
• Avoid Unnecessary I/O
• Avoid Duplicate Work
Anatomy of a Session
[Diagram: within the Integration Service or Data Integration Service, the Data Transformation Manager (DTM) moves source data through the READER, TRANSFORMER, and WRITER to the target, using the DTM Buffer and transformation caches.]
PC Session Memory Settings
• Generally set the DTM Buffer Size to Auto
• For large jobs, set it as large as needed, defining it with an all-numeric value (2000000000 for 2 GB)
• Set the Buffer Block Size to fit 100 rows or more
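As a rough sizing illustration of the 100-row guideline (the 2,000-byte row width is a hypothetical figure, not from the original material):

  Buffer Block Size  >=  100 rows × 2,000 bytes per row  =  200,000 bytes

so a block size of at least 200000 would hold 100 rows of that width.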
Maximum Memory Settings
• The system uses the smaller of the two maximum memory settings
• Applies to the DTM Buffer Size, Buffer Block Size, and transformation caches set to Auto
• Small changes can have a huge impact on performance, since they provide more DTM Buffer space
Throughput and Busy Percentage in Log
• View Throughput values in the PowerCenter Workflow Monitor
  • Values less than 5000 rows/sec are slow and need tuning, unless the table size is small too
• Review the log file Busy Percentage to evaluate performance/bottleneck status
  • Source, target, and transformation Busy Percentage
  • Even distribution – all 100% – is ideal
  • One section busy while the others are not indicates a bottleneck
• Other memory efficiency messages appear in the log – review them and adjust memory accordingly
Lookups
• Minimize Lookup calls with Unconnected Lookups when a conditional lookup is required
• Eliminate records from the Lookup
  • Leverage the Lookup source filter property
  • Use a SQL Override to include only the rows in the source needed for insert/update checks on the target or for referential integrity checks (see the sketch after this list)
• For large Lookups, consider joining the rows instead
  • Uncached lookups require queries of the source for each lookup row
  • Cached lookups query the source once for a complete set of data
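A minimal sketch of a Lookup SQL Override that restricts the lookup to only the rows needed for an existence check on the target; the CUSTOMER_DIM table and its columns are hypothetical, not from the original material:

SELECT CUSTOMER_DIM.CUSTOMER_ID AS CUSTOMER_ID,
       CUSTOMER_DIM.CUSTOMER_KEY AS CUSTOMER_KEY
FROM   CUSTOMER_DIM
WHERE  CUSTOMER_DIM.ACTIVE_FLAG = 'Y'   -- the same condition could instead be entered in the Lookup source filter property

Fewer rows in the lookup mean a smaller cache (or fewer uncached queries) and less data pulled from the source.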
Lookup Cache
• Reuse Cache within a Mapping
  • Structure the lookups identically – same ports, same lookup condition – and the cache will be reused automatically
  • Include additional ports needed by any of the lookups on the table, even if not used in all of them, to enable reuse
• Reuse Cache Between Mappings – Named Cache
  • Build the cache the first time and use it across several mappings
  • In the first session, indicate to rebuild the cache if it exists
• Enable concurrent caches
  • Add pipelines to build the caches for all lookups concurrently at the start of the mapping, instead of the default of building each cache sequentially when the lookup processes its first row
• Increase the index and data cache sizes to hold all data in memory without paging to disk
Determine Potential Cache Size

Table                   Rows
Items (Source)          5000
Manufacturer (Lookup)   200
DIM_Items (Lookup)      100000

                      Example 1 – Lookup Manufacturer    Example 2 – Lookup DIM_Items
                      Cached        Uncached             Cached        Uncached
Build Cache           200           0                    100000        0
Read Source Records   5000          5000                 5000          5000
Execute Lookup        0             5000                 0             5000
Total Disk Reads      5200          10000                105000        10000
In this example, caching is best for the small lookup table (Manufacturer); for the large one (DIM_Items), uncached is best.
Reduce I/O and Reduce Number of Rows
• Reduce amount of data moving across the network - Filter data in the database
• Reduce the amount of data selected from the source
  • Reduce the number of ports selected in sources and lookups
• Select Distinct in the Source Qualifier – filter rows (see the SQL sketch after this list)
• Reduce writes to the log file
  • Set the Trace Level lower (e.g. to WARN)
  • Fix Transformation Errors
  • Fix Predicate syntax to avoid messages (view in IDS logs)
• Single Reads
• Read source once and write to multiple targets
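A minimal sketch (hypothetical ORDERS table and columns, not from the original material) of reducing selected data at the source: only the ports used downstream are selected, DISTINCT removes duplicate rows, and the filter runs in the database rather than in the mapping:

SELECT DISTINCT
       ORDERS.CUSTOMER_ID,
       ORDERS.ORDER_DATE
FROM   ORDERS
WHERE  ORDERS.ORDER_DATE >= DATE '2024-01-01'   -- filter in the database instead of a Filter transformation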
Reduce I/O and Reduce Number of Rows
• Remove unnecessary ports and links to unnecessary ports
• Reduce the number of transformations - keep the Source and Target anchors
• Tune Database Queries and SQL statements – see log file for SQL
• Reduce Rows in Pipeline
  • Move transformations that decrease rows upstream in the mapping (Filter, Joiner, Aggregator, Rank)
  • Move transformations that increase rows downstream (Normalizer, Joiner, SQL, Java, Lookup with Return All Rows selected)
Avoid Duplicate Work
• Place common logic upstream
  • Perform functions needed in each data stream prior to the split to reduce duplicating logic or function calls
• Calculate once, use many times
  • Avoid calculating or testing the same value over and over
  • Calculate it once in an Expression transformation, and set a true/false flag
  • Within an Expression, use variables to calculate a value used several times
Additional Informatica Resources
• More resources available at:
  • http://my.informatica.com
  • http://velocity.informatica.com/
  • http://marketplace.informatica.com
• PowerCenter and Data Services Help
• Product manuals
  • PowerCenter and Data Services Performance Tuning Guides
  • PowerCenter and Informatica Developer Transformation Guides
  • PowerCenter and Informatica Guides
• Developer Community
• Informatica Professional Services
Additional Performance Improvement Tips/Techniques
Efficient Data Formats
• Flat File vs. Database Table - Flat file located on the server loads faster than database table located on the server
• Fixed-width files are faster than delimited - Delimited files require extra parsing
• UNICODE vs. ASCII data movement mode - Unicode data requires 2 bytes per character; ASCII requires 1 byte per character
• Disable High Precision – applies to the “decimal” data type
• Use native drivers instead of ODBC drivers
Transformation Tips
• Avoid Stored Procedures, External Procedures, and Advanced External Procedures – slow and have overhead
• Use sorted input for Aggregators, Unions, and Joiners
• Connect only the ports that will be used downstream in Lookup and other transformations
• Use the where clause in the Lookup Filter instead of a Lookup SQL Override
• Sort in the Source Qualifier (PC)
• Join in the Source Qualifier (PC) – see the SQL sketch after this list
• Pushdown Sort (IDS) with a Sorter transformation
• Pushdown Join (IDS) with a Joiner transformation
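A minimal sketch (hypothetical ORDERS and CUSTOMERS tables, not from the original material) of the effective query when the join is done in the Source Qualifier – via the user-defined join property or a SQL override – so the database joins the rows instead of a Joiner transformation in the mapping:

SELECT ORDERS.ORDER_ID,
       ORDERS.ORDER_DATE,
       CUSTOMERS.CUSTOMER_NAME
FROM   ORDERS, CUSTOMERS
WHERE  ORDERS.CUSTOMER_ID = CUSTOMERS.CUSTOMER_ID   -- join condition pushed to the database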
Mapping Tips
• Simplify nested functions
  • Instead of: IIF(condition1, result1, IIF(condition2, result2, IIF( … )))
  • Try: DECODE(TRUE, condition1, result1, …, conditionN, resultN)
  • DECODE processing stops as soon as a condition is matched; the entire nested IIF statement is evaluated even if an earlier condition succeeds
• Functions are more expensive than operators
  • Use || instead of CONCAT()
Mapping Tips
• Avoid Data Type Conversions
  • Avoid implicit conversions, especially repeated ones
  • Avoid repeated conversions by explicitly converting and managing the data type as it moves through the mapping
• Trim spaces before comparing CHAR and VARCHAR ports
• Use Integer values for Filter and Lookup conditions, even if data type conversion is required, instead of Char or Date
• Minimize the number of targets per mapping
  • More targets = poorer performance
• Reduce the use of function calls
  • SUM(Col A) + SUM(Col B) can be SUM(Col A + Col B)
Mapping Tips
• Use Integers instead of Strings or Dates, and avoid complex expressions, in:
  • Comparisons in Lookups, Filters, and Routers
  • Group By in Aggregators
• XML Reader / Mid-Stream XML Parser
  • Remove unprojected groups
  • Don’t leave port size lengths as infinite
• Minimize use of Update Strategy transformations
Mapping Tips
• Avoid unnecessary sorting of already sorted input
• Replace multiple Filter transformations with a Router transformation – reduces the number of transformations in the mapping
• Use Filters or Routers to drop rejected rows from an Update Strategy transformation
• Select the appropriate Master source in a Joiner
  • Source with fewer rows (unsorted Joiner)
  • Source with fewer duplicate keys (sorted Joiner)
• Normal Joins are faster
• Join in the database
Mapping Tips
• Reusable Sequence Generator – Number of Cached Values property
  • Setting it too low causes frequent repository access, which impacts performance – set it to 1000 or similar
  • Unused values in a block are lost, leading to gaps in the sequence
• Change unused lookup ports to be neither input nor output – reduces the generated SQL result set
• Use a SQL override with an ORDER BY clause to override the default ORDER BY for lookups with many output ports (see the sketch below)
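A minimal sketch of such an override (hypothetical ITEM_DIM table and columns, not from the original material); ordering on only the condition port replaces the default ORDER BY on every lookup port, and the trailing "--" is the comment marker commonly used to suppress the ORDER BY the Integration Service would otherwise append:

SELECT ITEM_DIM.ITEM_ID AS ITEM_ID,
       ITEM_DIM.ITEM_NAME AS ITEM_NAME,
       ITEM_DIM.ITEM_PRICE AS ITEM_PRICE
FROM   ITEM_DIM
ORDER BY ITEM_DIM.ITEM_ID --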
Mapping Tips
• Minimize unconnected lookup execution in IIF expression - Set up the condition so unconnected lookup is in the THEN clause
• Use "Return any matching value" in Lookup whenever possible - Creates index based on key ports rather than all lookup ports
• Place Lookup conditions in the following order:
  • Equal to (=) (numeric ports first, then string or date)
  • Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
  • Not equal to (!=)
Session/Runtime Settings
• Increase the Target Commit Interval – reduces database overhead
• Use pre- and post-session commands to drop and rebuild indexes and constraints – improves load time when loading many rows; balance against index/constraint rebuild time (see the sketch below)
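A minimal sketch (Oracle-style syntax; the table and index names are hypothetical, not from the original material) of SQL that could be issued from the pre- and post-session commands or SQL properties:

-- Pre-session: drop the index before the bulk load
DROP INDEX IDX_SALES_FACT_DATE;

-- Post-session: rebuild the index after the load completes
CREATE INDEX IDX_SALES_FACT_DATE ON SALES_FACT (SALE_DATE);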