Netezza Tips & Tricks
Agenda • • • •
Optimizing Tables/Distribution of Data Optimizing Queries/Co-location Using Nexus with Netezza Query History/Query Statistics
2
Optimizing Tables/Distribution of Data • What is Distribution? – Twin Fin 6 has 46 blades on which data is distributed. – Optimal distribution has same amount of data on each of the 46 blades – Distribution key is used in query plans to enable co-located queries
3
Optimizing Tables/Distribution of Data • Non-Optimal Distribution Practices – Using “… Distribute on RANDOM” • Will always result in some data movement to tables. • OK for small dimension tables • Non-optimal for large fact tables
4
Optimizing Tables/Distribution of Data • Non-Optimal Distribution Practices – Using “… Distribute on (multiple,columns)” • condition MUST include all of the columns specified in the distribution key, or else the data will need to be redistributed. • condition MAY include additional columns that are not part of the distribution key without penalty. • Usually best to specify a single column distribution key that is most commonly used in conditions. This allows you to use either a single or multicolumn condition. 5
Optimizing Tables/Distribution of Data • Using different data types – Bad practice – Must re-cast data type – Could cause redistribution
6
Netezza SQL Optimization • Co-located Queries – Relationship to distribution – Examples • GC_HEADER, TLD_FACT – Both distributed on DW_GC_HEADER
• SubQueries vs Temp tables – In-flight skew – Creating temp table and forcing distribution can create co-location – Example 7
Using Nexus with Netezza • • • •
Viewing DDL Viewing active queries View most recent query history Saving query results to local database
8
Using Query History Table • Viewing query history • Gathering query run statistics – Queue time – Actual run time – Total run time
• Can be used to gather most questions about “load on the box” at any point in time. 9
Using QueryHistory Table SELECT (extract( hour from age(qh_tend,qh_tsubmit)) * 3600 ) + (extract( minute from age(qh_tend,qh_tsubmit)) * 60) + extract( second from age(qh_tend,qh_tsubmit)) as runsecond, extract(month from qh_tsubmit) as month, extract(day from qh_tsubmit) as day, extract(hour from qh_tsubmit) as hour, extract(minute from qh_tsubmit) as minute, qh_sessionid,qh_planid,qh_clientid,qh_cliipaddr,qh_database,qh_,qh_tsubmit,qh_tstart, qh_tend,qh_priority,qh_pritxt,qh_estcost from dba..nz_query_history where ( qh_tend between '06/12/2010 08:30:00' and '06/12/2010 11:30:00' -- and qh_ = 'OMRSOS' -- and runsecond > 30 ) -- Queries that started before our range and ended after the range. These also ran in the window -- that we are interested in. or ( qh_tsubmit < '06/12/2010 08:30:00' and qh_tend > '06/12/2010 11:30:00' ) order by qh_tsubmit;
10
Netezza Tips & Tricks Questions and Answers