Performance Tuning for Informatica PowerCenter
What Affects Performance?
These factors can make a substantial performance difference:
• Database Tuning – Sources, Targets, Repositories, Query Tuning, Statistics Collection, Indexes, other techniques
• Network – Capacity, Utilization
• Disk – Speed, Space, Tuning, Storage Type, Mount Type
• Server – Database, Informatica – RAM, CPU, Configuration
• PowerCenter add-on features – Grid, Partitioning
Informatica Platform Architecture
[Architecture diagram: Informatica Analyst and Informatica Developer clients with the Analyst Service, Model Repository Service, and Data Integration Service (ISP, Profile Service, Mapping Service, SQL Service, Data Object Cache, Profile Warehouse, Staging Data); PowerCenter clients (Workflow Manager, Mapping Designer) with the Integration Service, Repository Service, and Repository; Metadata Manager with the Metadata Manager Service and MM Warehouse.]
Comparison Across Tools
Most techniques are the same except:

PowerCenter
• Intended for larger data volumes
• Use the Source Qualifier JOIN, SORT, and FILTER features instead of additional Joiner, Sorter, or Filter transformations in the mapping (Why: the join, sort, and filter conditions are included in the Source Qualifier's generated database query – see the SQL sketch after this comparison)
• Cache Lookup transformations (Why: prepares for managing larger amounts of data from the source)
• Memory settings in the Session

Data Services
• Intended for smaller data volumes
• Use additional Joiner, Sorter, or Filter transformations in the mapping instead of placing them in the Read transformation (Why: uses native Pushdown Optimization and avoids blocking pushdown on subsequent transformations)
• Do not cache Lookup transformations (Why: minimizes I/O to the few rows of required data from the source per request; select only what is used for those rows)
• Memory settings in the DIS
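A minimal sketch of the kind of query the Source Qualifier can generate when its filter and sorted-ports properties are used; the ORDERS table and its columns are hypothetical, not from the original material:

SELECT ORDERS.ORDER_ID,
       ORDERS.CUSTOMER_ID,
       ORDERS.ORDER_DATE,
       ORDERS.STATUS
FROM   ORDERS
WHERE  ORDERS.STATUS = 'SHIPPED'      -- Source Qualifier filter condition, applied in the database
ORDER BY ORDERS.CUSTOMER_ID           -- Source Qualifier sorted ports, sorted in the database

The same logic built with separate Filter and Sorter transformations would instead pull every row across the network and do the filtering and sorting on the Integration Service.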
Key Techniques
• Adjusting Memory for DTM Buffer (PC and IDS)
• Prevent Blockage of Pushdown (IDS, and PC if you have PDO)
• Lookup Caching (PC)
• Cache and Optimization (IDS)
• Avoid Unnecessary I/O
• Avoid Duplicate Work
Anatomy of a Session
[Diagram: within the Integration Service or Data Integration Service, the Data Transformation Manager (DTM) moves source data through the READER, TRANSFORMER, and WRITER to the target, using the DTM Buffer and transformation caches.]
PC Session Memory Settings
• Generally set the DTM Buffer Size to Auto
• For large jobs, set it as large as needed, defining it with an all-numeric value (2000000000 for 2 GB)
• Set the Buffer Block Size to fit 100 rows or more
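As a rough sizing illustration of the 100-row guideline (the 2,000-byte row width is a hypothetical figure, not from the original material):

  Buffer Block Size  >=  100 rows × 2,000 bytes per row  =  200,000 bytes

so a block size of at least 200000 would hold 100 rows of that width.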
Maximum Memory Settings
• The system uses the smaller of the two maximum memory settings
• Applies to the DTM Buffer Size, Buffer Block Size, and transformation caches set to Auto
• Small changes can have a huge impact on performance, since they provide more DTM Buffer space
Throughput and Busy Percentage in Log
• View Throughput values in the PowerCenter Workflow Monitor
  • Values less than 5000 rows/sec are slow and need tuning, unless the table size is small too
• Review the log file Busy Percentage to evaluate performance/bottleneck status
  • Source, target, and transformation Busy Percentage
  • Even distribution – all 100% – is ideal
  • One section busy while the others are not indicates a bottleneck
• Other memory efficiency messages appear in the log – review them and adjust memory accordingly
Lookups
• Minimize Lookup calls with Unconnected Lookups when a conditional lookup is required
• Eliminate records from the Lookup
  • Leverage the Lookup source filter property
  • Use a SQL Override to include only the rows in the source needed for insert/update checks on the target or for referential integrity checks (see the sketch after this list)
• For large Lookups, consider joining the rows instead
  • Uncached lookups require queries of the source for each lookup row
  • Cached lookups query the source once for a complete set of data
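A minimal sketch of a Lookup SQL Override that restricts the lookup to only the rows needed for an existence check on the target; the CUSTOMER_DIM table and its columns are hypothetical, not from the original material:

SELECT CUSTOMER_DIM.CUSTOMER_ID AS CUSTOMER_ID,
       CUSTOMER_DIM.CUSTOMER_KEY AS CUSTOMER_KEY
FROM   CUSTOMER_DIM
WHERE  CUSTOMER_DIM.ACTIVE_FLAG = 'Y'   -- the same condition could instead be entered in the Lookup source filter property

Fewer rows in the lookup mean a smaller cache (or fewer uncached queries) and less data pulled from the source.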
Lookup Cache
• Reuse Cache within a Mapping
  • Structure the lookups identically – same ports, same lookup condition – and the cache will be reused automatically
  • Include additional ports needed by any of the lookups on the table, even if not used in all of them, to enable reuse
• Reuse Cache Between Mappings – Named Cache
  • Build the cache the first time and use it across several mappings
  • In the first session, indicate to rebuild the cache if it exists
• Enable concurrent caches
  • Add pipelines to build the caches for all lookups concurrently at the start of the mapping, instead of the default of building each cache sequentially when the lookup processes its first row
• Increase the index and data cache sizes to hold all data in memory without paging to disk
Determine Potential Cache Size

Table                   Rows
Items (Source)          5000
Manufacturer (Lookup)   200
DIM_Items (Lookup)      100000

                      Example 1 – Lookup Manufacturer    Example 2 – Lookup DIM_Items
                      Cached        Uncached             Cached        Uncached
Build Cache           200           0                    100000        0
Read Source Records   5000          5000                 5000          5000
Execute Lookup        0             5000                 0             5000
Total Disk Reads      5200          10000                105000        10000
In this example, caching is best for the small lookup table (Manufacturer); for the large one (DIM_Items), uncached is best.
Reduce I/O and Reduce Number of Rows
• Reduce amount of data moving across the network - Filter data in the database
• Reduce the amount of data selected from the source
  • Reduce the number of ports selected in sources and lookups
• Select Distinct in the Source Qualifier – filter rows (see the SQL sketch after this list)
• Reduce writes to the log file
  • Set the Trace Level lower (e.g. to WARN)
  • Fix Transformation Errors
  • Fix Predicate syntax to avoid messages (view in IDS logs)
• Single Reads
• Read source once and write to multiple targets
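A minimal sketch (hypothetical ORDERS table and columns, not from the original material) of reducing selected data at the source: only the ports used downstream are selected, DISTINCT removes duplicate rows, and the filter runs in the database rather than in the mapping:

SELECT DISTINCT
       ORDERS.CUSTOMER_ID,
       ORDERS.ORDER_DATE
FROM   ORDERS
WHERE  ORDERS.ORDER_DATE >= DATE '2024-01-01'   -- filter in the database instead of a Filter transformation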
Reduce I/O and Reduce Number of Rows
• Remove unnecessary ports and links to unnecessary ports
• Reduce the number of transformations - keep the Source and Target anchors
• Tune Database Queries and SQL statements – see log file for SQL
• Reduce Rows in Pipeline
  • Move transformations that decrease rows upstream in the mapping (Filter, Joiner, Aggregator, Rank)
  • Move transformations that increase rows downstream (Normalizer, Joiner, SQL, Java, Lookup with Return All Rows selected)
Avoid Duplicate Work
• Place common logic upstream
  • Perform functions needed in each data stream prior to the split to reduce duplicating logic or function calls
• Calculate once, use many times
  • Avoid calculating or testing the same value over and over
  • Calculate it once in an Expression transformation, and set a true/false flag
  • Within an Expression, use variables to calculate a value used several times
Additional Informatica Resources
• More resources available at:
  • http://my.informatica.com
  • http://velocity.informatica.com/
  • http://marketplace.informatica.com
• PowerCenter and Data Services Help
• Product manuals
  • PowerCenter and Data Services Performance Tuning Guides
  • PowerCenter and Informatica Developer Transformation Guides
  • PowerCenter and Informatica Guides
• Developer Community
• Informatica Professional Services
Additional Performance Improvement Tips/Techniques
Efficient Data Formats
• Flat File vs. Database Table - Flat file located on the server loads faster than database table located on the server
• Fixed-width files are faster than delimited - Delimited files require extra parsing
• UNICODE vs. ASCII data movement mode - Unicode data requires 2 bytes per character; ASCII requires 1 byte per character
• Disable High Precision – applies to the “decimal” data type
• Use native drivers instead of ODBC drivers
Transformation Tips
• Avoid Stored Procedures, External Procedures, and Advanced External Procedures – slow and have overhead
• Use sorted input for Aggregators, Unions, and Joiners
• Connect only the ports that will be used downstream in Lookup and other transformations
• Use the where clause in the Lookup Filter instead of a Lookup SQL Override
• Sort in the Source Qualifier (PC)
• Join in the Source Qualifier (PC) – see the SQL sketch after this list
• Pushdown Sort (IDS) with a Sorter transformation
• Pushdown Join (IDS) with a Joiner transformation
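A minimal sketch (hypothetical ORDERS and CUSTOMERS tables, not from the original material) of the effective query when the join is done in the Source Qualifier – via the user-defined join property or a SQL override – so the database joins the rows instead of a Joiner transformation in the mapping:

SELECT ORDERS.ORDER_ID,
       ORDERS.ORDER_DATE,
       CUSTOMERS.CUSTOMER_NAME
FROM   ORDERS, CUSTOMERS
WHERE  ORDERS.CUSTOMER_ID = CUSTOMERS.CUSTOMER_ID   -- join condition pushed to the database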
Mapping Tips
• Simplify nested functions
  • Instead of: IIF(condition1, result1, IIF(condition2, result2, IIF( … )))
  • Try: DECODE(TRUE, condition1, result1, …, conditionN, resultN)
  • DECODE processing stops as soon as a condition is matched; the entire nested IIF statement is evaluated even if an earlier condition succeeds
• Functions are more expensive than operators
  • Use || instead of CONCAT()
Mapping Tips
• Avoid Data Type Conversions
  • Avoid implicit conversions, especially repeated ones
  • Avoid repeated conversions by explicitly converting and managing the data type as it moves through the mapping
• Trim spaces before comparing CHAR and VARCHAR ports
• Use Integer values for Filter and Lookup conditions, even if data type conversion is required, instead of Char or Date
• Minimize the number of targets per mapping
  • More targets = poorer performance
• Reduce the use of function calls
  • SUM(Col A) + SUM(Col B) can be SUM(Col A + Col B)
Mapping Tips
• Use Integers instead of Strings or Dates, and avoid complex expressions, in:
  • Comparisons in Lookups, Filters, and Routers
  • Group By in Aggregators
• XML Reader / Mid-Stream XML Parser
  • Remove unprojected groups
  • Don’t leave port size lengths as infinite
• Minimize use of Update Strategy transformations
Mapping Tips
• Avoid unnecessary sorting of already sorted input
• Replace multiple Filter transformations with a Router transformation – reduces the number of transformations in the mapping
• Use Filters or Routers to drop rejected rows from an Update Strategy transformation
• Select the appropriate Master source in a Joiner
  • Source with fewer rows (unsorted Joiner)
  • Source with fewer duplicate keys (sorted Joiner)
• Normal Joins are faster
• Join in the database
Mapping Tips
• Reusable Sequence Generator – Number of Cached Values property
  • Setting it too low causes frequent repository access, which impacts performance – set it to 1000 or similar
  • Unused values in a block are lost, leading to gaps in the sequence
• Change unused lookup ports to be neither input nor output – reduces the generated SQL result set
• Use a SQL override with an ORDER BY clause to override the default ORDER BY for lookups with many output ports (see the sketch below)
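A minimal sketch of such an override (hypothetical ITEM_DIM table and columns, not from the original material); ordering on only the condition port replaces the default ORDER BY on every lookup port, and the trailing "--" is the comment marker commonly used to suppress the ORDER BY the Integration Service would otherwise append:

SELECT ITEM_DIM.ITEM_ID AS ITEM_ID,
       ITEM_DIM.ITEM_NAME AS ITEM_NAME,
       ITEM_DIM.ITEM_PRICE AS ITEM_PRICE
FROM   ITEM_DIM
ORDER BY ITEM_DIM.ITEM_ID --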
Mapping Tips
• Minimize unconnected lookup execution in IIF expression - Set up the condition so unconnected lookup is in the THEN clause
• Use "Return any matching value" in Lookup whenever possible - Creates index based on key ports rather than all lookup ports
• Place Lookup conditions in the following order:
  • Equal to (=) (numeric ports first, then string or date)
  • Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
  • Not equal to (!=)
Session/Runtime Settings
• Increase the Target Commit Interval – reduces database overhead
• Use pre- and post-session commands to drop and rebuild indexes and constraints – improves load time when loading many rows; balance against index/constraint rebuild time (see the sketch below)
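A minimal sketch (Oracle-style syntax; the table and index names are hypothetical, not from the original material) of SQL that could be issued from the pre- and post-session commands or SQL properties:

-- Pre-session: drop the index before the bulk load
DROP INDEX IDX_SALES_FACT_DATE;

-- Post-session: rebuild the index after the load completes
CREATE INDEX IDX_SALES_FACT_DATE ON SALES_FACT (SALE_DATE);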