Why we need to collect and benchmark gateway profiles
Any piece of software running on an operating system uses a certain amount of that system's resources. As its state, configuration and load change, so does its resource footprint. In addition to the OS resources, the software may expose a number of metrics that show its internal state and any application-specific resources it uses and needs. For example, a JVM uses the memory and CPU of the OS, while it also exposes dozens of metrics that show the internal state of the virtual machine. Applications that run low on, or run out of, resources will ultimately fail or act unpredictably, which the end user will perceive as a defect.
Using resources is normal for software: it should have access to what it needs, and we should be able to understand what that normal usage looks like. This is the purpose of the operational profile system detailed on this page.
A step in the right direction is better tooling and support for establishing what the normal profile of a specific instance of an ITRS component looks like. With that in place we can:
- Start collecting data for specific software components, in given configurations and given environments.
- Identify the effect on the software of any change to its binaries, configuration, load and so on.
- Identify whether the environment the software is running in is fit for purpose (i.e. does it have enough resources, or is it resource starved?)
The solution provided here is a template for building the Standard Operating Profile. Guidelines exist for understanding the content, together with recommendations for specific conditions (including alerts when known issues are likely to occur, or have occurred). While this may not solve every recommendation challenge, it should help identify the health of a component, start building a real-world profile of our software components' resource usage under specific conditions, and show anyone using the template the effect of any change to the software, whether it comes from internal or external factors.
Getting the gateway operational Profile
The following steps allow you to measure the operating profile of a gateway.
Prerequisites
- Gateway and Netprobe versions must be GA3.6.0 or later
- The ability to add samplers to at least one physical and one virtual Netprobe on the component host (for example, the server that the gateway is running on)
- Working database logging for the target gateway
- The ability to add an include file and modify the database logging on the target gateway
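The version prerequisite can be checked mechanically once you have the version strings to hand. The following is a minimal sketch, assuming the version is reported in the same `GA3.6.0` style used above; how you obtain the string from a Gateway or Netprobe is not covered here, and the helper names `parse_version` and `meets_minimum` are hypothetical.

```python
import re

# Minimum version taken from the prerequisite "GA3.6.0 or later".
MINIMUM = (3, 6, 0)


def parse_version(text):
    """Extract (major, minor, patch) from a string such as 'GA3.6.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM):
    """Return True when the version in `text` is at or above `minimum`."""
    # Tuples compare element by element, so (3, 10, 0) > (3, 6, 0).
    return parse_version(text) >= minimum


# Illustrative version strings only.
for candidate in ("GA3.6.0", "GA3.5.12"):
    print(candidate, meets_minimum(candidate))
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "3.10.0" sorts before "3.6.0" alphabetically.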
Setup files
You will need the following file/s
Step-by-step installation instructions
1. Open the target gateway in the GSE.
2. Install the Standard Operating Profile include file.
   To create an include file (doesn't require host access):
   - In the gateway settings Include section, right-click and select "New Include"
   - Name it "operating_profile.xml" and give it a priority value higher than 1 (e.g. 999) to avoid conflict with the main file
   - Click Load include file. If prompted to create a new one, select Yes
   - In XML mode, copy and paste the contents of "operating_profile.xml" (see link above) into the new include
   If you have access to the host command line:
   - Copy the include XML to the host, in your desired path
   - In the gateway settings, enter that file name and path in a new include file
3. Edit the Database Logging connection details.
   - Go to the Database Logging section and tick the "Enabled" checkbox to enable Database Logging
   - Fill in the appropriate database connection details
4. Important: Replace the probe (ChangeMe) used by the entity "Host Hardware Profile" with a physical probe running on the gateway's host. You only need to do this once per host; see step 6 for multiple gateway setups.
5. Edit the Gateway Process details (optional, only needed if the gateway process results in a count other than 1). The gateway name is included in the gateway process search string.
   - Go to the Standard Operating Environment (in the operating_profile.xml include file, "Environments" section)
   - Replace the "gateway_process" variable with the target gateway process string
6. For multiple gateways running on a host, enable Gateway Sharing in the Imported data section by editing the hostname and port of the target gateway host where a physical probe is attached.
7. Save your changes.
8. Verify that the Standard Operating Profile entities appear with all the included data views (see below).
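The Gateway Process step above only applies when the process search matches a count other than 1, which typically happens when several gateways share a host. The sketch below illustrates why a more specific search string fixes this. It assumes plain substring matching, which may not be how the process sampler actually matches, and the command lines are made-up, simplified strings rather than real gateway command lines.

```python
def count_matches(command_lines, search_string):
    """Count command lines that contain the search string (substring match)."""
    return sum(1 for line in command_lines if search_string in line)


# Hypothetical, simplified process list for a host running two gateways.
running = [
    "gateway PROD_GW",
    "gateway UAT_GW",
    "netprobe 7036",
]

# A generic string matches both gateways, so the count is not 1.
print(count_matches(running, "gateway"))

# Including the gateway name narrows the match to the target gateway.
print(count_matches(running, "gateway PROD_GW"))
```

If the narrowed string still matches more than one process, the "gateway_process" variable needs a value that is unique to the target gateway on that host.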
How the content appears in the gateway
Expected Data views
There should be two Managed Entities, "Standard Operating Profile" and "Host Hardware Profile", with virtual and physical probes attached respectively. See below for the expected data views.
Figure 1.1 Expected View for Standard Operating Profile Managed Entity with Virtual Probe
Figure 1.2 Expected View for Standard Operating Profile Managed Entity with Physical Probe
Definition of Terms
| Spec | Unit Value | Originating Sampler | Description |
| --- | --- | --- | --- |
| conflationTime | conflation time / total (processing) time | orbStats | A measure of the time spent waiting for conflation to complete. Conflation means that the gateway deals with a backlog in its data queues by discarding out-of-date cell updates and processing and publishing only the latest cell values. Conflation works best when it prevents stale data from building up rather than clearing large backlogs (not only does it have fewer backlogged messages to process, but it also minimises the number of updates conflated away). |
| cpuUtilisation | percent utilisation of the host | hardwareProfile | A measure of the total CPU utilisation of the host. |
| dbLogging | dbLogging time / total (processing) time | gatewayComponents | Ratio of CPU time spent on dbLogging against the total CPU processing time; time units vary per platform. |
| directory | directory time / total (processing) time | gatewayComponents | Ratio of CPU time spent on directory-related operations against the total CPU processing time; time units vary per platform. Includes constructing and modifying the state tree, among other tasks. |
| freeSpacePct | sum of free disk space / total disk space | diskProfile | Percentage of total free disk space on all mounted partitions in the host. Some partitions are excluded (see the operating_profile.xml "diskProfile" sampler for the list of excluded partitions). |
| maxDataAge | headline max data age in milliseconds | probeData | The maximum age of backlogged updates (as displayed by the probeData plugin). Normalised to milliseconds. |
| memoryAvailablePct | memoryAvailable / totalPhysicalMemory | hardwareProfile | Total memory of the host available to all applications. |
| messagesQueued | sum of messages queued in the gateway | connectionStats | Total count of messages queued across all connections to the gateway, where each connection is a unique IP address and port. |
| percentCPU | gateway CPU usage percentage of total | gatewayProcess | Percentage of total CPU used by the gateway, as reported by the process sampler. |
| percentMemory | gateway memory usage percentage of total | gatewayProcess | Percentage of total memory used by the gateway, as reported by the process sampler. |
| probeManagement | probeManagement time / total (processing) time | gatewayComponents | Ratio of CPU time spent on probe management against the total CPU processing time; time units vary per platform. Includes establishing and maintaining communication with Netprobes. |
| queueMem | sum of Mem (in KB) | connectionStats | Total memory used in processing all the messages in the queue, across the unique connections of the gateway. |
| roles | roles time / total (processing) time | gatewayComponents | Ratio of CPU time spent on roles against the total CPU processing time; time units vary per platform. Includes the time spent on Hot-Standby functionality. |
| rules | rules time / total (processing) time | gatewayComponents | Ratio of CPU time spent on rules against the total CPU processing time; time units vary per platform. |
| schema | schema time / total (processing) time | gatewayComponents | Ratio of CPU time spent on schema validation and changes against the total CPU processing time; time units vary per platform. |
| swapUsed | swapUsed percentage of the host | hardwareProfile | Percentage of total swap memory used by the OS. |
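Most of the values defined above are simple ratios. As a rough illustration of the arithmetic only, the sketch below computes a component time ratio and a free-space percentage. It is not taken from operating_profile.xml: the function names, the input shapes and the choice to scale free space to 0-100 are all assumptions, and the figures are made up.

```python
def time_ratio(component_time, total_time):
    """Share of total processing time spent in one component (0.0 to 1.0)."""
    if total_time <= 0:
        return 0.0
    return component_time / total_time


def free_space_pct(partitions, excluded=()):
    """Free space as a percentage of capacity, summed over kept partitions.

    `partitions` maps a mount point to a (free, total) pair in one unit.
    Mount points listed in `excluded` are ignored.
    """
    kept = [sizes for mount, sizes in partitions.items() if mount not in excluded]
    total = sum(capacity for _, capacity in kept)
    if total == 0:
        return 0.0
    free = sum(available for available, _ in kept)
    return 100.0 * free / total


# Made-up figures, for illustration only.
disks = {"/": (20, 100), "/data": (60, 100), "/boot": (1, 1)}
print(time_ratio(30.0, 200.0))
print(free_space_pct(disks, excluded=("/boot",)))
```

Because the free and total figures are summed before dividing, a large, nearly full partition weighs more heavily in the result than a small, empty one, which matches the "sum of free disk space / total disk space" definition.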
Viewing the data via a dashboard
When the operational profile data is displayed in the data views, it shows only the values at that point in time. The primary motivation and benefit, however, is to see the profile of the gateway over time, from busy trading days to calmer weekends. Assuming you have not changed the names of the managed entities, samplers and data views, you can use the following steps to get that historical data quickly and easily.
1. Download the dashboard ADB file and, in the Active Console, use the File -> Import function (selecting either the default or the target dashboard dockable of your choice).
2. ENSURE YOU ARE CONNECTED ONLY to the gateway you want the stats for. You may need to disconnect your other gateways while performing step 3, though you can reconnect afterwards once you are looking at the data.
3. On the imported dashboard, right-click and select 'Repopulate all charts with historic data --> Week' (or Day etc., based on your requirement). This populates all the charts with a common time period of data, an example of which is shown below.
The scrollbars at the top of the charts can be manipulated to adjust the time period that you can see the data for (so you can zoom in on specific events).
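If you also want numbers to accompany the charts, the logged values can be reduced to a small summary per metric. The following is a minimal sketch, assuming you have already exported the samples for one metric as (timestamp, value) pairs; the export itself and the database schema are not covered here, and the function names are hypothetical.

```python
import statistics
from datetime import datetime


def summarise(values):
    """Summarise numeric samples as min, median, 95th percentile and max."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(values)
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(ordered, n=20, method="inclusive")[-1]
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": p95,
        "max": ordered[-1],
    }


def split_weekdays(samples):
    """Split (timestamp, value) pairs into weekday and weekend value lists."""
    weekday, weekend = [], []
    for stamp, value in samples:
        # datetime.weekday() returns 5 for Saturday and 6 for Sunday.
        (weekend if stamp.weekday() >= 5 else weekday).append(value)
    return weekday, weekend
```

Summarising weekdays and weekends separately keeps the quiet periods from diluting the busy-day figures, so each summary describes one kind of "normal".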