Saturday, September 17, 2011

Performance Tuning of ATG application

Before you start troubleshooting an ATG application issue, do the following:
-Get the problem definition.
-Gather all available information about it: the affected transactions; whether it appears under load or with a single page request; whether it occurs at certain times or all day; whether it affects one managed server or all servers; when it started; and what changed before that time (an applied patch, a new driver, etc.).
-Get all the tools and methods you may need ready to use.
-Get the application code in hand.
-Start building your investigation plan.
-The plan can include trial elements, information-gathering elements, possible fixes and a permanent fix/conclusion.
-If the issue is in the production environment, try to replicate it in another environment, so you can try fixes without any business impact.

Here is a guide for your investigation and for setting up the production environment of an ATG application, built from my experience plus the ATG documentation:

1) ATG Application side Recommendations:

-Enabling liveconfig Settings:
When you’re ready to deploy your Nucleus-based application in a production environment, enable the settings in the liveconfig configuration layer. This layer overrides many of the default configuration settings with values that are more appropriate for a deployed site. For example, the liveconfig configuration layer improves performance by reducing error checking and detection of modified properties files.

To enable liveconfig, you can use the -liveconfig argument for runAssembler,
or add the following line to the WEB-INF/ATG-INF/dynamo.env file in the atg_bootstrap.war module of your EAR file:
atg.dynamo.liveconfig=on

a) Disabling Checking for Changed Properties Files:
This property controls whether or how often ATG rereads .properties files or .java files the next time that an instance of a given (non-global) component is created.
The default is 1000. This feature is useful during development, but we recommend disabling it once a site goes live for better performance. The value -1 disables the reloading of properties and .java files altogether.

b) Disable Performance Monitor:
The Performance Monitor (/atg/dynamo/service/PerformanceMonitor) can be used to gather statistics about the performance of specific operations in ATG components.
You can disable the Performance Monitor by setting its mode property to 0:
mode=0
The Performance Monitor is disabled in the liveconfig configuration layer.

c) Adjusting the pageCheckSeconds Property:
ATG’s Page Processor compiles JHTML pages into .java files (JSP compilation is handled by your application server). The page processor, located at /atg/dynamo/servlet/pagecompile/PageProcessor, checks for new Java Server Pages that need to be compiled. You can improve performance by increasing the Page Processor’s pageCheckSeconds property. The page compile servlet uses this property value to determine whether to check for new Java Server Pages that need to be recompiled. If a request occurs within this time interval (measured in seconds) for the same page, ATG will not check the date on the file. This improves performance in serving pages.
A value of 0 causes ATG to check for new pages on each request. The default value is 1. The liveconfig value is 60.

-Fine-Tuning JDK Performance with HotSpot
Refer to the Oracle HotSpot performance tuning documentation for more details.

-Configuring for Repositories:
a) Enable Caching:
Set the cache sizes according to your data size; an example of this calculation:



b) Setting Cache Modes:
Select the proper cache mode for each item descriptor.
Remember that if you use locked mode caching, you must also enable the lock manager components.

c) Populating Caches on Startup:
You can pre-populate caches in a SQL Repository by using tags in a repository definition file.
The benefit of starting with warm caches may come at the cost of slower startup times.

d) Configuring Repository Database Verification for Quicker Restarts:
By default, each SQL Repository component verifies each of the tables in its database on startup with a simple SQL query. These verification queries can slow the ATG startup routine.
To speed up restarts, you may wish to set the updateSchemaInfoCache property to true in your atg.adapter.gsa.GSARepository components, such as /atg/dynamo/service/jdbc/ProfileAdapterRepository.

e) Configure proper caching timings:
**item-cache-timeout:
This attribute defines how long (in milliseconds) a repository item can exist in the item cache without having been accessed before it needs to be reloaded from the database. Effectively, there is a "last touched" timestamp associated with each item cache entry; if the time since that item was last touched is greater than the item-cache-timeout setting, then its properties are loaded from the database instead of the in-memory cache, which is then updated with the values from the database.

**item-expire-timeout:
This attribute defines how long (in milliseconds) a repository item can remain in the item cache before it needs to be reloaded from the database, no matter how often it is accessed. Effectively, there is a "time loaded" timestamp associated with each item cache entry; if the time since that item was cached is greater than the item-expire-timeout setting, then its properties are loaded from the database instead of the in-memory cache, which is then updated with the values from the database.

**query-expire-timeout:
This attribute is the same as item-expire-timeout, but for query cache entries. It defines how long (in milliseconds) a query can remain in the query cache before it needs to be reloaded from the database. Effectively, there is a "time loaded" timestamp associated with each query cache entry; if the time since that entry was cached is greater than the query-expire-timeout setting, then its results are loaded from the database instead of the in-memory cache, which is then updated with the values from the database.

Note that the query-expire-timeout attribute only applies when you have query caching enabled.
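
The difference between the idle timeout and the absolute timeout is easier to see side by side. Below is a minimal Java sketch of the idea, not ATG's implementation: the class CacheEntrySketch and its methods are hypothetical, the clock is passed in explicitly, and treating a timeout of 0 or less as "never" is an assumption of this sketch.

```java
/** Hypothetical model of one cache entry; not ATG's actual implementation. */
final class CacheEntrySketch {
    private final long timeLoaded;   // when the entry was read from the database
    private long lastTouched;        // when the entry was last accessed

    CacheEntrySketch(long nowMillis) {
        this.timeLoaded = nowMillis;
        this.lastTouched = nowMillis;
    }

    /** Records an access; this resets only the idle ("last touched") clock. */
    void touch(long nowMillis) {
        lastTouched = nowMillis;
    }

    /**
     * Returns true when the entry should be reloaded from the database.
     * A timeout of 0 or less is treated as "never times out" (assumption of this sketch).
     */
    boolean isStale(long nowMillis, long itemCacheTimeout, long itemExpireTimeout) {
        boolean idleTooLong = itemCacheTimeout > 0 && nowMillis - lastTouched > itemCacheTimeout;
        boolean tooOld = itemExpireTimeout > 0 && nowMillis - timeLoaded > itemExpireTimeout;
        return idleTooLong || tooOld;
    }

    public static void main(String[] args) {
        CacheEntrySketch entry = new CacheEntrySketch(0);
        entry.touch(900);
        // At t=1100 the entry has been idle for 200 ms and was loaded 1100 ms ago.
        System.out.println(entry.isStale(1100, 1000, 0));     // only the idle timeout is set
        System.out.println(entry.isStale(1100, 1000, 1000));  // the expire timeout is set too
    }
}
```

A frequently accessed item never trips the idle timeout, so if the underlying data can change outside the application you would normally want the expire timeout as well.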

-Setting Logging Levels :
If you want to disable logging entirely, or specify different logging levels, you can do that in the GLOBAL.properties file. For example:
loggingError=true
loggingWarning=true
loggingInfo=true
loggingDebug=false

**Your application code must follow the standard practice of checking the logging level before logging a message.
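
Here is a small sketch of that check. ATG components inherit isLoggingDebug() and logDebug() from GenericService; to keep the example runnable without the ATG jars, the class below is a stand-in that only mimics those two methods, and describeOrder() is a made-up expensive call.

```java
/** Stand-in for an ATG component; real components inherit these methods from GenericService. */
final class LoggingGuardSketch {
    private boolean loggingDebug = false;
    int expensiveCalls = 0; // counts how often the costly message was built

    boolean isLoggingDebug() {
        return loggingDebug;
    }

    void setLoggingDebug(boolean loggingDebug) {
        this.loggingDebug = loggingDebug;
    }

    void logDebug(String message) {
        if (loggingDebug) {
            System.out.println("DEBUG " + message);
        }
    }

    /** Hypothetical costly call, e.g. serializing a large order. */
    String describeOrder() {
        expensiveCalls++;
        return "order with many items";
    }

    void processOrder() {
        // Guard first: the message is built only when debug logging is on.
        if (isLoggingDebug()) {
            logDebug("Processing " + describeOrder());
        }
    }

    public static void main(String[] args) {
        LoggingGuardSketch component = new LoggingGuardSketch();
        component.processOrder();            // debug off: nothing built, nothing logged
        component.setLoggingDebug(true);
        component.processOrder();            // debug on: message built and logged
        System.out.println("expensive calls: " + component.expensiveCalls);
    }
}
```

Without the guard, the string concatenation and the expensive call would run on every request even though the message is thrown away.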

-Limiting Initial Services for Quicker Restarts
This is configured using initialServices property of the /atg/dynamo/Initial component.

-Disabling Document and Component Indexing
The ACC creates and maintains indexes of documents and components. For sites with large numbers of documents or components, indexing can take time and CPU resources. Once your site is deployed and relatively stable, you may want to limit or eliminate the indexing of documents or components.
The document and component indexes are maintained incrementally once built, and are rebuilt completely once a day at 1 a.m. by default. An index is rebuilt at startup only if it does not exist at all.
You can selectively exclude portions of the document tree from indexing by adding absolute pathname prefixes to the excludeDirectories property of the /atg/devtools/DocumentIndex component.
The same is true for component indexing, but the component is /atg/devtools/ComponentIndex instead. To improve performance on a live site, you can turn off all document and component indexing by setting the enabled property of the DocumentIndex and ComponentIndex components to false.

-Compress Content:
Compress pages (remove whitespace) and static content (pictures, JS, CSS) in your web server to speed up download time in the browser.
Compressing HTML/JavaScript/CSS/XML/JSON content can significantly reduce response times. GZIP reduces the size of responses by between 50% and 80%, depending on the type of content. Only turn on GZIP compression for text/html, text/plain, text/json, text/css, and application/x-javascript mime types.
To verify that gzip is being used, “Accept-Encoding: gzip,deflate” should show up in the request header and “Content-Encoding: gzip” should show up in the response header.
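
To get a feeling for the savings before touching the web server, you can compress a sample of your own markup with the JDK's GZIPOutputStream. This is only a rough sketch: the class name and the generated markup are made up, and the ratio you see depends entirely on your real content.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipSizeSketch {
    /** Compresses the given bytes with GZIP and returns the compressed bytes. */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read the buffer only afterwards.
        try (GZIPOutputStream gzipStream = new GZIPOutputStream(buffer)) {
            gzipStream.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up, repetitive markup standing in for a real page.
        StringBuilder page = new StringBuilder("<html><body>");
        for (int i = 0; i < 200; i++) {
            page.append("<div class=\"product\"><span>Item ").append(i).append("</span></div>");
        }
        page.append("</body></html>");
        byte[] original = page.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(original);
        System.out.println(original.length + " bytes -> " + compressed.length + " bytes");
    }
}
```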

-Restructure your pages as follows:
*Move .css to the top of all pages.
*Move script files to the bottom of all pages whenever possible.
*Move all embedded scripts into external files.

-Ajax Cache:
Cache your Ajax responses whenever the results are cacheable, to speed up the user experience.

-Pre-Compiling JSPs:
Pre-compiling might slow deployment/server start-up, but it speeds up the first request for each page.

-Session Stickiness:
It’s important that session stickiness is working properly. Not having it working could result in sessions being continually restored after each request.

-Keep ATG Patched:
When possible, the latest version of ATG should be used. As cumulative patches are released, they should be applied.

-HTTP Connection Reuse:
Using the “Keep-Alive” header allows browsers to reuse the same TCP connection for multiple request/response pairs. Not re-establishing TCP connections for each HTTP request/response helps reduce network traffic, reduce the load on the SSL accelerator, and improve performance, because connections are not set up and torn down for each HTTP request/response.
A good Keep-Alive value for CSC is 300, which is 5 minutes.
Both Internet Explorer 6 and 7 forcibly terminate TCP connections after one minute, regardless of what Keep-Alive is set to. See http://support.microsoft.com/kb/813827 for a workaround.

2) Server\JVM\Operating system Configurations:
-Set up the JDBC connection pool with a number of connections matching the expected concurrent users of the site.
-Set the JTA timeout to 120 seconds whenever possible.
-Increase the operating system limit on concurrently open files.
-Compress the web application output.
-Configure the stuck thread max time to a proper value, to catch where threads usually get stuck.

3) Using Performance measure tools:

A) ATG built-in Features:

1) Performance monitor:
*Adding PerformanceMonitor Methods to your Code
To enable the Performance Monitor to monitor a section of your Java code:
1. Import the atg.service.perfmonitor.* package.
2. Declare an opName parameter to label the section of the code. This parameter is displayed in the Performance Monitor page under the Operation heading.
3. (Optional) Declare a parameter name if you want to gather data on individual executions of an operation.
4. Call the startOperation method at the beginning of the operation whose performance you want to be able to measure.
5. Call the endOperation method at the end of the operation whose performance you want to be able to measure.
6. Optionally, call the cancelOperation method if an exception occurs. This causes the results of the current execution to be ignored.

These methods can be nested with different or the same opNames.
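Example:
// at the top of the class file
import atg.service.perfmonitor.PerformanceMonitor;

// illustrative label values
String opName = "render jsp";
String parameter = "foo.jsp";
boolean exception = false;
PerformanceMonitor.startOperation(opName, parameter);
try {
    // ... code to actually render foo.jsp
} catch (Exception e) {
    // ignore the results of this execution
    PerformanceMonitor.cancelOperation(opName, parameter);
    exception = true;
} finally {
    if (!exception) {
        PerformanceMonitor.endOperation(opName, parameter);
    }
}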

*Performance Monitor Modes:
You can set the Performance Monitor’s operating mode by setting the mode property of the component at /atg/dynamo/service/PerformanceMonitor:
disabled = 0 (default)
normal = 1
time = 2
memory = 3
Use mode 2 (time) to get accumulated results. Also try to enable it only after warming up the site, to exclude extreme readings caused by loading caches, etc.

*View the Results:
You can view the information collected by the Performance Monitor on the Performance Monitor’s page of the Dynamo Administration UI at:
http://hostname:port/dyn/admin/atg/dynamo/admin/en/performance-monitor.jhtml

2) Using the VMSystem Component :
The ATG component located at /VMSystem provides a way for you to access the Java memory manager.
You can monitor the status of the Virtual Machine and call methods on it. An interface to the VMSystem component is included in the Dynamo Administration UI at:
http://hostname:port/dyn/admin/nucleus/VMSystem/
From this page, you can conduct the following VM Operations:
• Perform garbage collection
• Run finalizations
• Show memory information
• List system properties
• List thread groups
• List threads
• Stop the VM

3) Sampler:
When testing your site, it is useful to automatically sample performance to understand throughput as a function of load. ATG includes a Sampler component at /atg/dynamo/service/Sampler.
Starting the Sampler: you can start the Sampler component by opening it in the ACC and clicking the Start button.
You can also start the Sampler component from the Dynamo Administration UI by requesting this URL:
http://hostname:port/dyn/admin/nucleus/atg/dynamo/service/Sampler
The first time you request this page, ATG instantiates the Sampler component, which begins recording statistics.
You can configure ATG to start the Sampler whenever ATG starts by adding the Sampler to the initialServices property of the /atg/dynamo/service/Initial component.

The Sampler outputs information to the file /home/logs/samples.log. For each system variable that it samples, it records the following information in the log file:
• the current value
• the difference between the current value and the value recorded the last minute
• the rate of change of the value
You can adjust values recorded by the Sampler, but the default set is comprehensive in monitoring ATG request handling performance.

B) Log Files:

* Access Logs:
You may enable access logs on your application server or web server to confirm that the time is really consumed inside your application and not in network traffic (download time).

* Application Logs:
Application logs can point to system calls, external system timeouts, DB issues, exceptions and so on; a lot of useful information can be found in them.
You might see a “server not responding” message or an OutOfMemory error.

C) Thread Dump:
Thread dumps can be useful to see where threads are waiting. If there are too many threads waiting, your site’s performance may be impaired by thread context switching. You might see throughput decrease as load increases if your server is spending too much time context-switching between requests. Check the percentage of System CPU time consumed by your JVM. If this is more than 10% to 20%, this is potentially a problem. Thread context switching also depends in part on how your JVM schedules threads with different priorities.

Thread dumps can be taken from the Admin console or the command line.
Thread dumps can also point easily to deadlocks, infinite loops and other issues caused by buggy code.
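
If you want the same information from inside the JVM (for example from a diagnostic page), the standard java.lang.management API can produce a dump and report deadlocks. The sketch below is one possible way to do it; the class name is hypothetical, and the output format is simplified compared to a real thread dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadDumpSketch {
    /** Builds a simplified text dump of all live threads in this JVM. */
    static String dumpThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        // false, false: skip locked monitor/synchronizer details to keep the call cheap
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            dump.append('"').append(info.getThreadName()).append("\" ")
                .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                dump.append("    at ").append(frame).append('\n');
            }
        }
        return dump.toString();
    }

    /** Returns the number of threads currently deadlocked (0 if none). */
    static int countDeadlockedThreads() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.print(dumpThreads());
        System.out.println("Deadlocked threads: " + countDeadlockedThreads());
    }
}
```

Taking two or three dumps a few seconds apart and comparing them usually tells you more than a single dump: threads that sit in the same frame every time are the ones to look at.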

D) Garbage Collection:
Check the JVM parameters that affect garbage collection, including the GC policy, and try to set the max and min heap sizes to the same value.
Check the garbage collection logs, especially when you see spikes or abnormal behavior; the most important events are the full garbage collection runs.

*Phases that stop the threads with the parallel garbage collection algorithm:
tail -f <gc-log-file> | grep 'Full'
*Phases that stop the threads with the Concurrent Mark Sweep algorithm:
tail -f <gc-log-file> | grep -E '(CMS-initial-mark)|(Rescan )|(concurrent mode failure)|(Trying a full collection)|(promotion failed)|(full)|(Full)'

If excessive pausing is noticed, one of several things could be wrong:
- JVM arguments may need to be tuned
- There may be a memory leak
- Repository/droplet/atg.service.cache.Cache caches may be over-utilized
- Load balancing may not be working properly, which would result in more sessions than normal hitting one instance.

While thread pauses are a normal part of garbage collection, excessive pauses must be minimized.
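
When you cannot get at the GC log, the JVM also exposes cumulative collector statistics through java.lang.management. The following sketch (hypothetical class name) sums them up into a rough "share of uptime spent in GC" figure. Treat it as an approximation: for concurrent collectors the reported time may include phases that do not pause application threads.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStatsSketch {
    /** Total time in milliseconds reported by all collectors since JVM start. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    /** Approximate share of JVM uptime spent in garbage collection, between 0.0 and 1.0. */
    static double gcTimeFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : Math.min(1.0, (double) totalGcMillis() / uptime);
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("GC share of uptime: " + gcTimeFraction());
    }
}
```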

E) Cron Jobs:
Check the cron jobs running on the server and try to schedule them only during the servers' off-peak hours.

F) Profiling Tools:
-Netbeans Profiler
-JProfiler
-Eclipse TPTP
Any other profiling tool that gives you detailed information about the time consumed in each operation, memory tracing, etc.

G) Load testing Tools:
*) URLHammer (ATG load tool):
To run the URLHammer program:
1. Set your CLASSPATH to include the directory /DAS/lib/classes.jar.
2. Run the following command:
java atg.core.net.URLHammer [arguments]
For example:
java atg.core.net.URLHammer http://examplehost:8840/ 5 10 -cookies
This creates five different threads, each of which represents a separate session that requests the specified URL 10 times (50 requests total).
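
To make the "threads x requests" arithmetic concrete, here is a tiny Java sketch of the same pattern. It is not URLHammer and does not use its code: the class name is hypothetical and the request is passed in as a plain Runnable (a real driver would issue an HTTP GET there).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class LoadDriverSketch {
    /** Runs the request requestsPerThread times on each of the threads; returns completed runs. */
    static int run(int threads, int requestsPerThread, Runnable request) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    request.run();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for all workers; one minute is an arbitrary limit for this sketch.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Placeholder request that does nothing.
        int total = run(5, 10, () -> { });
        System.out.println("requests completed: " + total);
    }
}
```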

You can also run a script, either by writing it in the script format yourself or by using RecordingServlet to create it.

*) Apache JMeter (open source)
*) HP Load Runner (commercial)

H) Client Side Performance Tools:
-Firebug (FF plugin)
-Fiddler (Standalone or plugin)
-DynaTrace (Standalone or plugin)
The most important thing is to identify whether certain resources (especially ones outside your domain) are taking much of the time.
Invalid configuration can also be detected this way, such as a page pointing to another environment.
JS performance might also be a reason for bad end-user performance.

I) Operating system performance Tools:
Monitoring system utilization: use a program like top (on Solaris), the Windows Performance Monitor, or a more sophisticated tool to keep track of information like:
• CPU utilization
• paging activity
• disk I/O utilization
• network I/O utilization
*If you are getting an "unable to create new native thread" OutOfMemoryError, it usually means that native system memory is low, as each native thread reserves around 1 MB of memory for its stack (the exact size depends on the platform and the -Xss setting).
*You can detect a file descriptor leak in two different ways:
• You may notice a lot of IOExceptions with the message “Too many open files.”
• During load testing, you periodically run a profiling script, such as lsof (on UNIX), and you notice that the list of file descriptors grows continually.
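
Besides lsof, on UNIX-like systems the JVM itself can report how many file descriptors it has open, which is handy for logging the number periodically during a load test. The sketch below relies on the JDK-specific com.sun.management.UnixOperatingSystemMXBean interface, so it returns -1 on platforms or JVMs that do not provide it; the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class OpenFilesSketch {
    /** Returns this JVM's open file descriptor count, or -1 where it is not exposed. */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("open file descriptors: " + openFileDescriptors());
    }
}
```

A count that keeps climbing between samples under steady load is the same symptom as the growing lsof list described above.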

J) DB performance tuning:
- Take DB snapshots and analyze them.
- Take the most expensive SQL statements and get execution plans for them; you may find a missing index.
- You may need a DBA to analyze possible DB issues.
- Enable JDBC logging, retrieve the SQL queries and try to optimize them outside the application (the debug level needs to be set to 15; you can also retrieve these queries from DB monitoring tools).

K) Adjusting the FileCache Size :
ATG’s servlet pipeline includes servlets that are used for JHTML pages, and which use a FileCache component to store files that ATG has read from disk, so that subsequent accesses for those files can be delivered directly from memory instead of being read from disk. Using the FileCache component improves performance by reducing disk accesses. For maximum performance, you want the FileCache to be large enough to hold all the files that ATG serves frequently.
Set the totalSize property of this component at:
/atg/dynamo/servlet/pipeline/FileCache
to an appropriate value, measured in bytes, such as the following:
# size in bytes (2 million bytes)
totalSize=2000000

One approach in sizing the FileCache is to batch compile the entire document root and set the file cache to the resulting size. Make sure, however, that you account for the size of your FileCache when you set the size of your JVM. You can preload the FileCache by creating a script that accesses every page on your site and running the script on startup.
You can view statistics on how the file cache is used, as well as the contents of the FileCache, in the Dynamo Administration page at hostname:port/dyn/admin/nucleus/atg/dynamo/servlet/pipeline/FileCache.

L) Code optimization:
You may scan the code in the high-transaction scenarios, with special care for the pipelines and component scopes, to identify possible performance issues.
You may use tools for code optimization.
Some findings may need changes to component scope (global scope performs best because the component is loaded only once, while a request-scoped component is initialized on each request, so you need to minimize the use of request scope).
Avoid resolving components from code; instead, use ATG properties files to inject them. Where you do not have that ability, as with derived properties, you may use a static method to get a reference to them (if they are global components).
Follow the code standards and best practices.
One possible reason for bad application performance is not following the logging best practice of checking the log level before logging a message.

M) Use Cache Droplet in your JSP pages:
The Cache droplet caches content that changes infrequently; it is especially useful if the content involves a lot of processing or DB interactions (component /atg/dynamo/droplet/Cache).
**Required Input Parameters
-key
Lets you have more than one view of content based on a value that uniquely defines the view of the content. For example, if content is displayed one way for members and another for non-members, you can pass in the value of the member trait as the key parameter.
**Optional Input Parameters
-hasNoURLs
Determines how cached URLs are rendered for future requests. By setting hasNoURLs to false, you specify that subsequent requests for the cached content cause URLs to be rewritten on the fly, assuming URL rewriting is enabled. A setting of true for hasNoURLs causes URLs to be saved and rendered exactly as they are currently (without session or request IDs), regardless of whether URL rewriting is enabled.

-cacheCheckSeconds
The interval after content is cached until the cache is regenerated. If omitted, the interval is set from the defaultCacheCheckSeconds property in the Cache servlet bean’s properties file.

**Open Parameters
-output
The code enclosed by the output open parameter is cached.

**Clearing the cache:
You can determine how often data is flushed for a given Cache instance on a JSP or for all instances of Cache. To remove cached content associated with a particular instance of Cache, set the cacheCheckSeconds input parameter in the Cache instance to the frequency at which the associated data should be expired. If you omit this parameter, the Cache.defaultCacheCheckSeconds property is used (the default value is 60 seconds).
The Cache.purgeCacheSeconds property determines how often content cached by any Cache servlet bean is flushed. The default is 21600 seconds (6 hours). Cache purging also occurs when a JSP is removed or recompiled.

N) Hardware limited capacity:
This applies to development environments, where you may consider moving an application off the box, such as the DB, Merch, etc.
You can also shut down Merch if you do not need it up and running all the time.
Another trick, strictly for dev boxes, is decreasing the session timeout to 5 minutes or decreasing the memory reserved per thread.


O) ATG recommended Check List:
The following checklist can help you identify the most common sources of performance problems:
• Have you properly configured memory for your Java Virtual Machines? Have you set your -Xms and -Xmx arguments the same? Do all ATG heap sizes fall within the limits of physical memory?
• Has one or more servers stopped responding? There could be a number of causes, including a Java deadlock.
• Are you seeing many IOExceptions with the message “Too many open files”? You may have a file descriptor leak.
• At maximum throughput, look at the CPU utilization, database CPU utilization, I/O activity, and paging activity.
• If CPU utilization is low, then you may have an I/O or database bottleneck.
• If CPU utilization is high, then the bottleneck is most likely in the application code. Use a performance profiling tool to try to locate bottlenecks in the code. Review your code to make sure it uses good Java programming practices.
• If paging is occurring, adjust the memory allocated to your Java Virtual Machines.
• Look at the I/O and CPU utilization of the database. If utilization is high, database activity is probably slowing down the application.
• Are you receiving page compilation errors? You may not have enough swap space for page compilation.
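
For the first item of the checklist, a running JVM can tell you which arguments it was started with. Below is a rough sketch (hypothetical class name) that checks whether -Xms and -Xmx were both given with the same value. The comparison is purely textual, so 2048m and 2g would be reported as different even though they are equal.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class HeapArgsSketch {
    /** Returns the value of the last argument starting with the prefix, or null if absent. */
    static String lastValue(List<String> jvmArgs, String prefix) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    /** True when both -Xms and -Xmx are present and textually equal (e.g. 2g and 2g). */
    static boolean heapSizesMatch(List<String> jvmArgs) {
        String min = lastValue(jvmArgs, "-Xms");
        String max = lastValue(jvmArgs, "-Xmx");
        return min != null && min.equalsIgnoreCase(max);
    }

    public static void main(String[] args) {
        // Arguments this JVM was actually started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("-Xms equals -Xmx: " + heapSizesMatch(jvmArgs));
    }
}
```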


Reference: ATG Platform documentation set, Version 9.1 - 7/31/09, and others.
For more information, refer to the book Java EE 7 Performance Tuning and Optimization, published by Packt Publishing: http://www.packtpub.com/java-ee-7-performance-tuning-and-optimization/book

1 comment:

  1. We are seeing the following warnings in our logs:
    10.10.13.133 /ATG/jboss/jboss-as/server/prod-ps-ps1/log/server.log 2013-02-07 09:49:20,524 WARN [nucleusNamespace.atg.dynamo.servlet.pagecompile.DAFDropletEventServlet] (ajp-0.0.0.0-8109-110) Added 29057 form elements to form shop/giftList/viewGiftList.jsp.addItemsToCart last element has name /project/commerce/giftList/GiftListFormHandler.giftItemValueObject.quantity.27025374. Form elements are registered permanently so if this table keeps growing, it is a memory leak. Recode your page to use consistent names for form elements.

    Could u give some pointers
