This document is written for people who monitor performance and tune the components in a BI environment. To maximize Oracle® Business Intelligence Enterprise Edition (OBIEE) performance, you need to monitor, analyze, and tune all of the Fusion Middleware and BI components. This guide describes the tools that you can use to monitor performance and the techniques for optimizing the performance of OBIEE components. To analyze the performance of an OBIEE installation in an organized way, we take a three-pronged approach: test all the components, analyze their performance, and then alter settings to fine-tune the system. This approach involves the following steps:
- Creating a test scenario – a predefined, repeatable and variable workload model
- Executing and measuring – running the tests and collecting the data
- Analysis – analyzing the test results, making any configuration changes, and repeating the cycle to fine-tune the system
Defining a performance test
The first step in assessing the performance of OBIEE is defining the test we are going to use. For instance, when troubleshooting a performance problem in Production, this might be a single report; for a new system it will be more complex. The process for defining it is the same, though. The performance test should be a validation, or possibly a determination, of some or all of the following, which may or may not be formally documented in a project as 'Non-Functional Requirements' (NFRs):
- Report response times
- User concurrency levels
- Hardware utilization
There are three qualities which help define our tests better:
1) Repeatability. You must be able to rerun the test, so that you can
   a) Prove bugs/performance problems have been fixed
   b) Re-use the test as a baseline for future measurements
2) Accuracy. This can be difficult with a stack as complex as the OBIEE one, but you must do all you can to avoid conflation and misinterpretation of data.
3) Ease of Analysis. If the result of a test is unsuccessful, you will have to start analyzing the issue. If the test is unnecessarily big or complex, it will be very difficult and time-consuming to pare it back to find the root cause.
Define a User
The first step in defining a good test is to define extremely precisely and clearly what it is we are testing. When we consider how to model an OBIEE user in a performance test, we have the following challenges:
Navigation paths
In a BI system, once a user has run their first report, what will they run next? Maybe a different report, but quite likely it will be navigation within the existing report (e.g. drill down, prompt changes, and so on).
Ad hoc usage
Whilst some reporting will be fixed reports, much of the business value from BI comes in the analysis and exploration of data in an ad hoc manner to discover trends and give competitive advantage. These ad hoc queries cannot, by definition, be known in advance, so the performance test definition must find a way to address this.
Think about how many types of user are present in the system, and how to characterize and group them. There will be different sets of data they access, and varying degrees of interactivity with which they use the tool. Some will be using prebuilt reports with fixed filters, others may be doing extensive ad hoc analyses, and somewhere in the middle will be those using pre-built reports but with a fair amount of interactivity through functionality such as prompts and selection steps.
Define a Report
For each of the user types that we define, we need to precisely define the workload for them. This will be made up of the execution of one or more dashboards/reports. As well as the distinction to make between individual reports (Analyses/Answers) and Dashboards, we also need to consider:
Filters – How do we model a report which has six possible filters? We can't just call it 'report A', because each filter permutation could cause different SQL to be generated by the BI reporting tool, and consequently vastly different execution plan(s) on the database where the query runs.
Aggregate navigation – One of OBIEE's many strengths is its aggregate navigation capability, enabling the user to drill down into data whilst in the background the tool switches between pre-defined aggregate tables on the database to pick the fastest one to use. For performance testing we need to make sure that within a report each possible aggregate is considered separately.
Prompts – Any report or dashboard that uses prompts needs to have those prompt values stated in the test definition. The execution of a query on the database can vary considerably depending on the predicates supplied, and it is in part from the report/dashboard prompts that these predicates come.
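To make this concrete, below is a minimal sketch of the Logical SQL that the same analysis might generate for two different prompt permutations; the subject area, column names, and prompt values are purely hypothetical. Each permutation pushes different predicates into the physical SQL, so each should be listed as a separate case in the test definition:
-- Permutation 1: single month
SELECT "Time"."Month", "Facts"."Revenue"
FROM "Sales"
WHERE "Time"."Month" = '2016 / 01'
-- Permutation 2: full year, one region
SELECT "Time"."Month", "Facts"."Revenue"
FROM "Sales"
WHERE "Time"."Year" = '2016' AND "Geography"."Region" = 'EMEA'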
Workload
Having precisely defined the reports which are going to be run, we can sequence them together to represent different types of user. Clearly this is going to be a generalization, an estimation, of how a particular group of users will tend to use the reports. Unlike an OLTP transaction, where the steps are much more predefined and free from possible deviations, users of OBIEE have the flexibility to run any number of reports, drill within existing reports, and so on. Some user types may just run one report and exit; they don't have to have a complex sequence. We should model several different instances of each user group in order to increase the test coverage and the realism of different reports running concurrently. Be aware that the more user types and tests there are, the longer the test will take to build, and the more complex any troubleshooting could be.
Concurrency
Once we have designed a set of individual workloads, we can bring these together into a larger test to assess the performance of the system as a whole under load. The aim of load testing is to assess what happens to the response time of reports, and to hardware utilization, as the number of concurrent users on the system increases. There are three key things to define in a load test:
- Workload of each user
- Different types of user
- Number of concurrent users
The first two of these are discussed above. When it comes to concurrency, we need to be very careful how we derive our figure, because it has a big impact on the scale of the tests that we run and the discussion of the outcomes. First, be precise about which definition of concurrency is being used, since there are several. These include:
- Concurrent sessions logged into OBIEE, viewing dashboards, formatting reports, and so on.
- Concurrently executing queries at the BI Server level, which could be a single user running one report or one dashboard.
- Concurrently executing queries on the database, which could be a single user in OBIEE running a single report or dashboard that requires more than one SQL request to get its data.
Each of these places its own particular load on the system, but they can easily differ by orders of magnitude. Typically a concurrency figure will be derived from the total number of users; from this, a reckoning is made of how many users will be using the system at any one time. So, unless it is qualified otherwise, we need to work with the first definition – concurrent sessions logged into OBIEE. If you have an existing system, then it is easy to determine your actual concurrency through either Usage Tracking or the OBI DMS metrics. If your concurrency figure is based on users logged into OBIEE, not necessarily running reports, then the metric Oracle BI PS Sessions: Sessions Logged In should be appropriate. For the number of queries (Logical SQL) concurrently running on the BI Server, use Usage Tracking or the corresponding DMS metric. The number of concurrent queries on the database can be obtained either from the database monitoring tool or through the Oracle BI DB Connection Pool DMS metrics.
To estimate concurrency from Usage Tracking, you can use the following query. It gives an approximate number of concurrent users, but this is based on individual queries run on the BI Server, so it is not the same as users simply logged into OBIEE, nor the same as dashboard refreshes (since one dashboard could be many queries):
-- Usage Tracking concurrency estimate
-- Per minute granularity
SELECT TO_CHAR(start_ts, 'YYYY-MM-DD HH24:MI') AS start_ts_min,
       COUNT(DISTINCT user_name) AS distinct_users
FROM   s_nq_acct
GROUP  BY TO_CHAR(start_ts, 'YYYY-MM-DD HH24:MI')
ORDER  BY 1 DESC
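If a single headline figure is needed, the per-minute counts from the query above can be rolled up to give the peak observed concurrency. A minimal sketch against the same S_NQ_ACCT table:
-- Peak concurrent users per minute (approximate, query-based)
SELECT MAX(distinct_users) AS peak_concurrent_users
FROM (SELECT TO_CHAR(start_ts, 'YYYY-MM-DD HH24:MI') AS start_ts_min,
             COUNT(DISTINCT user_name) AS distinct_users
      FROM   s_nq_acct
      GROUP  BY TO_CHAR(start_ts, 'YYYY-MM-DD HH24:MI'))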
Tools & Designs for OBI performance testing
Traditional load testing tools such as JMeter, Oracle Application Testing Suite (OATS), and HP LoadRunner can be used to simulate a user clicking on a report in a web browser and to record how long the report takes to run. They can hold multiple report definitions and run many users concurrently. This is usually sufficient for a comprehensive performance test.
The BI Server uses ODBC or JDBC for communication with clients (such as Presentation Services). We can therefore use a client to send Logical SQL requests directly to it and record the response time. Because Presentation Services also sends Logical SQL to the BI Server, this is a sound approach: from the point of view of the BI Server and the database, the behavior is exactly the same whether the Logical SQL comes from Presentation Services or another client. nqcmd is the ODBC client provided with OBIEE. We can wrap it in a series of shell scripts to create a complete test, or build a test using JMeter with the OBIEE JDBC driver for a more flexible and scalable test rig. What this method does not test is Presentation Services and upstream – the additional processing of result sets, the return of the data to the web browser, and the rendering of it. But if we are targeting the testing at just the RPD and database, then this is a very good way to avoid unnecessary complexity.
Having designed and built the tests, we now move on to the real nitty-gritty – how we run them and collect data. The data that we collect is crucial to getting comprehensible test results and, as a consequence, to drawing valid test conclusions. There are several broad elements to the data collected for a test:
- Response times
- System behavior
- Test details
The last one is very important, because without it we just have some numbers. If someone wants to reproduce the test, or if we want to rerun it to check a result or test a change, we need to be able to run it exactly as it was done originally. A set of response times in isolation is interesting, but unless it can be traced back to exactly how the numbers were obtained – so that we can (a) confirm or challenge their validity and (b) rerun the test – then they are just numbers on a piece of paper.
The test should be done on as 'clean' an environment as possible. The more contaminating factors there are, the less confidence we can have in the test results, to the point of them becoming worthless. Work with a fixed code version:
- Database, including:
  o DDL
  o Object statistics
- BI Server Repository (RPD)
- Dashboard and report definitions (Webcat)
For example, if a change was made to the RPD but in a different Business Model from the one being tested, then it may not matter. If, however, an underlying fact table has been partitioned, then this could change your results so drastically that you should discard the first set of results and retest.
Make sure the data in the tables from which reporting is done is both unchanging and representative of Production. If you are going live with 10M rows of data, then it would be wise to do your utmost to run the performance test against 10M rows of data. Different types of reports might behave differently, and this is where judgment comes in. For example, a weekly report based on a fact table partitioned by week might perform roughly the same whether all partitions or just one are loaded. However, the same fact table as the source for a historical query going back months, or a query cutting across partitions, is going to need more data in it to be representative. The configuration of the software should be constant. This covers obvious configuration such as BI Server caching, but also things like version numbers and patch levels of the OBIEE stack. Consider taking a snapshot of all the main configuration files (NQSConfig.INI, instanceconfig.xml, etc.) to store alongside your test data. It is advisable to turn off BI Server caching for the initial tests, and then re-enable it if required as a properly tested optimisation step.
Measure
Before executing a performance test, we need to work out what data is to be collected and how to collect it. The data to collect for the test itself includes:
- Response times, at the lowest grain possible/practical. Response times should be two-dimensional: transaction name, plus time offset from the beginning of the test.
- Number of test users running over time (i.e. offset from the start of the test).
- Environment details – a diagram of the top-to-bottom stack, software versions, code levels, and data volumes; anything that is going to be relevant in assessing the validity of the test results, or in rerunning the test in the future.
- System metrics – if response times and user numbers are the eye-catching numbers in a test, system metrics are the oft-missed but vital numbers that give real substance to a test and make it useful. If response times are bad, we need to know why. If they're good, we need to know how good. Both these things come from the system metrics.
- Query metrics – depending on the level at which the testing is being done, collecting metrics for individual query executions can also be vital for aiding performance analysis. Consider this more of a second-round, drill-down layer of metrics rather than one to always collect in a large test, since it can be a large volume of data.
Here is a checklist to work through for executing the test:
- Clear down the logs of any component that is going to be analyzed (e.g. nqquery.log, sawlog.log)
- Record any configuration/environment changes from the original baseline
- Record the test start time
- Restart the OBIEE stack (i.e. WLS, OPMN)
- Start OS metric collection
- Start OBIEE metric collection (if used)
- Run the test
- Monitor for errors and excessively bad response times
- Record the test end time
- Record any errors observed during the test
- Copy all logs, metric outputs, etc. to a named/timestamped folder
Analyzing Results
Analyzing the data breaks down into several stages, and is often an iterative process:
1. Analyze the net response time. Is it as fast as it needs to be, at the required level of concurrency?
2. If the response time is too slow ("too slow" being defined by us or the users, in advance of the test), then diagnose to determine why. This is another phase of analysis, breaking down the net response time into its constituent parts and analyzing system and OBI metrics for signs of a bottleneck. The output of this phase will be a hypothesis as to the cause of the performance problem.
3. Based on the diagnosis of the issue, apply one change to the system to improve it, that is, to resolve the performance issue. Having made one change (and one change only), repeat the original test and the analysis cycle to determine the impact of the tuning.
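Where Usage Tracking is enabled, the net response times recorded by the BI Server give a convenient starting point for step 1. The following is a minimal sketch, assuming the standard S_NQ_ACCT columns (TOTAL_TIME_SEC, SAW_DASHBOARD, NUM_DB_QUERY) and an illustrative one-hour test window:
-- Response time summary per dashboard for the test window
SELECT saw_dashboard,
       COUNT(*)                      AS query_count,
       ROUND(AVG(total_time_sec), 1) AS avg_secs,
       MAX(total_time_sec)           AS max_secs,
       ROUND(AVG(num_db_query), 1)   AS avg_db_queries
FROM   s_nq_acct
WHERE  start_ts BETWEEN TO_DATE('2016-01-01 09:00', 'YYYY-MM-DD HH24:MI')
                    AND TO_DATE('2016-01-01 10:00', 'YYYY-MM-DD HH24:MI')
GROUP  BY saw_dashboard
ORDER  BY avg_secs DESC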
Diagnosing poor OBIEE performance
Get to the root of the problem
So, the test results showed that the dashboard(s) run too slowly. Now we need to work out why there is a performance problem. Attempting to fix a performance problem without first understanding exactly what the problem is, is a bad idea. In determining the root cause we will learn more about OBIEE, and this better understanding means we will be less likely to make performance mistakes in the future. We will also become better at performance diagnostics, making live problems in Production, as well as future performance tests, easier and faster to resolve.
Performance vs Capacity
Performance issues can be local to a report, or global to a system implementation and exacerbated by a particular report or set of reports – or both. If an individual dashboard doesn't perform with a single user running it, there is clearly a performance problem in the design (of the dashboard, RPD, or physical data model design or implementation). However, if an individual dashboard runs fine with a single user, but performance gets worse and worse the more users that run it concurrently, this would indicate a capacity problem in the configuration or physical capacity of the system. An easy way to find out is this: before launching into multi-user concurrency tests, test the dashboard with a single user. Is the response time acceptable? If not, then we have a performance problem, and we have eliminated concurrency from the equation entirely. If the response time is acceptable, then move on to the concurrency tests.
If a concurrency test is performed and we are trying to identify whether the issue is performance or capacity, then look at what happens to the response time compared to the number of users running. If the response time is constant throughout, this indicates a performance problem; if it increases as more users are added, this shows a capacity (which can include configuration) problem.
Generally good design principles to observe in OBIEE
- Ensure that the RPD has a correct dimensional (star) model in the Logical (BMM) layer.
- Push the dimensional star schema down into the database; don't build against a normalized data model if performance is a key requirement.
- Create aggregate tables to support common queries. Use Usage Tracking to spot dimension levels that would benefit, or the Summary Advisor on Exalytics. The Aggregate Persistence Wizard can do the initial hard work of plumbing the metadata into the RPD, but the aggregate then needs to be maintained. Associated aggregated dimension tables also need to be created; these are particularly useful for good performance of prompts, where a distinct list of values at a level in a dimension is returned.
- Make sure that aggregates get used when appropriate. Check that the SQL OBIEE is generating isn't using a less efficient source table.
- If users really want a data dump, consider doing that for them outside of the tool, for example with SQL*Plus. To still make use of the metadata model in the RPD, but without causing big problems in Presentation Services, use an ODBC or JDBC call into the BI Server directly to get the data dump out. Using this method, we could hook Excel directly into the BI Server.
- The fastest query is one that never runs – challenge dashboard and report designs. Analyze the user's workflow to see if the reports that are built can support, and make more efficient, what the user does.
- Generally, we should avoid building the RPD against database views, as they can hinder OBIEE's SQL generation, resulting in sub-optimal queries. Database views can also hide inefficient or unnecessary joins and logic. Put the underlying tables into the Physical layer of the RPD instead and let OBIEE work with them.
- Minimize the work being done by the BI Server. When using federation to combine data across databases this is unavoidable, but generally it is to be avoided within the same database. Wherever possible, all work should be pushed down to the database. Check how much data the BI Server pulls back from the database compared with the number of rows returned to the user (a low percentage returned is bad) – see the sketch after this list. Monitor the BI Server's temp directory: if it is filling up with large files, the BI Server is having to crunch lots of data.
- Check how many database queries one report triggers (higher is generally less efficient).
- Size hardware correctly for the BI Server and Presentation Services (based on past experience and/or Oracle's documentation).
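Several of the checks in this list can be quantified from Usage Tracking. A minimal sketch, assuming the standard S_NQ_ACCT columns ROW_COUNT (rows returned to the user), CUM_NUM_DB_ROW (rows fetched from the database) and NUM_DB_QUERY (physical queries per logical query); the 10% threshold is purely illustrative:
-- Logical queries where the BI Server fetched far more rows from the
-- database than it returned to the user (a sign of work not pushed down)
SELECT saw_src_path,
       row_count,
       cum_num_db_row,
       num_db_query,
       ROUND(100 * row_count / NULLIF(cum_num_db_row, 0), 1) AS pct_rows_returned
FROM   s_nq_acct
WHERE  cum_num_db_row > 0
  AND  row_count < 0.1 * cum_num_db_row
ORDER  BY cum_num_db_row DESC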
Optimising OBIEE further
The above section outlines principles that we should always aim to follow, or else have a real reason why not. There are some other techniques that can be worth investigating when looking to optimize the performance of OBIEE further, discussed below.
BI Server Caching
Caching should not be the sole means of ensuring good performance. That is not to say caching is bad: caching is good, when its use is thought through and evaluated carefully. Caching has an overhead in terms of management, so we need to manage the cache to make sure that stale data is not served to end users. A stale result might come back fast, but it will be wrong. Caching can improve performance for several reasons:
- It is typically faster to return the results of a query already run and stored in the cache than it is to re-execute the necessary Physical SQL against the source database(s).
- By not executing the Physical SQL on the database(s), we reduce both the load on the database and the amount of network traffic, freeing up these resources for non-cached queries.
- The cache stores the results of a Logical SQL query sent to the BI Server, not the data that the database returns. If the BI Server is doing lots of work – for example, federating and aggregating lots of data across sources – then by caching the post-processed result, all of this work can be avoided on subsequent cache hits.
Pre-seeding the cache is a good idea, so that all users benefit from the cache, not just those who come along after the first user has run a report that gets stored in the cache. There are a couple of ways to pre-seed the cache:
1. A BI Delivers Agent with the Destination set to System Services: Oracle BI Server Cache. This agent could optionally be set to run straight after your ETL batch has loaded the data.
2. Directly from an ETL tool via ODBC/JDBC using the SASeedQuery statement.
The flip-side of pre-seeding the cache is purging it, and there are two sensible ways to do this:
1. Event Polling Table
2. ODBC/JDBC command to the BI Server, triggered by the completion of a data load (ETL)
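For the second purge option, the BI Server exposes purge procedures that can be called over ODBC/JDBC (for example via nqcmd) once the data load completes. A minimal sketch – the database, catalog, schema and table names below are purely illustrative:
-- Purge the entire BI Server cache
Call SAPurgeAllCache();
-- Or purge only the cache entries that reference a specific physical table
Call SAPurgeCacheByTable('MyDB', 'MyCatalog', 'MySchema', 'FACT_SALES');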