Oracle Intelligent Agent Users Guide
Release 8.1.5

A67825-01


3
Troubleshooting

This chapter covers generic troubleshooting strategies in the event your Intelligent Agent does not function properly. The following topics are discussed:

Troubleshooting the Intelligent Agent

Under most circumstances, the Intelligent Agent itself requires very little in the way of configuration. In order to function properly, however, the Agent must be able to communicate with the managing host and managed services. If you are familiar with Oracle and your operating system, using the following abbreviated checklists will likely solve problems that can interfere with Agent operation.

Quick Checks

The following checklists cover the areas most likely to affect Agent operation. They are divided according to the two most common platforms on which the Agent runs: Windows NT and UNIX. The checklists are abbreviated and assume knowledge of Oracle, the operating system, and related communication protocols. Specific troubleshooting procedures are covered in detail later in this chapter.

Quick Checks for the Windows NT Agent

If you are running an Agent on a Windows NT system, use the following checklist.

  1. Make sure the agent service is up by checking the OracleAgent service in your control panel. If the agent did not start up, use any of the hints listed below.

  2. Check for messages written to the NT Event Viewer (under Administrative Tools) since this is where the NT agent writes any problems associated with startup.

  3. Check if snmp_ro.ora, snmp_rw.ora, and services.ora are created by the agent on startup. snmp_ro.ora and snmp_rw.ora are in the ORACLE_HOME\NET80\admin directory, and services.ora is in the ORACLE_HOME\NET80\agent directory.

    Compare the services listed with the services which are available on the machine. Please refer to Appendix A, "Configuration Files" for valid sample files.

    If services are missing, check the following files for inconsistency or corruption:

    • listener.ora

    • tnsnames.ora

  4. Check that you do not have a system path set to external drives.

    The agent is a service and runs by default as SYSTEM. It also needs DLLs from the ORACLE_HOME/BIN directory. If you need mapped drives in your path, you MUST NOT set them in the SYSTEM path.

    To set your own path:

    1. Move mapped drive paths out of SYSTEM path variables and into your own.

    2. Reboot to "unset" the system path.

  5. Check if you have TCP/IP installed. TCP/IP is a requirement.

  6. If you still do not know why the agent did not start, trace the agent.

    1. Set the following variables in snmp_rw.ora:

      nmi.trace_level=admin (or 16 if you want maximum information)

      nmi.trace_directory=<any directory in which the Oracle user has write privileges>

    2. Restart the agent.

    3. Check the log files located in the ORACLE_HOME/NET80/LOG directory.

      NMI.LOG should show general agent problems.

      NMICONFIG.LOG should show problems with auto-discovery.

  7. Ensure that the DNS Host entry is set to the node name in the listener.ora and tnsnames.ora files.

    1. Click Start -> Settings -> Control Panel -> Network -> Protocols -> TCP/IP Properties.

    2. Check the DNS Host entry. For example, make sure that the entry is not set to the name of the previous engineer.

  8. Turn on tracing for the daemon.

    1. Open net80/admin/sqlnet.ora and add the lines daemon.trace_level=13 and daemon.trace_directory=e:\orant\net80\trace.

    2. Close the console to stop the daemon.

    3. Open the console to restart the daemon in trace mode.

    4. Submit a job and view the daemon.trc file for daemon and console problems.

Quick Checks for UNIX Agents

If you are running an Agent on a UNIX system, use the following checklist.

  1. Make sure agent listener is working. Enter the command:

    lsnrctl dbsnmp_status

    If your agent is running, you should see something similar to the following:

    LSNRCTL for Solaris: Version 8.1.3.0.0 - Production on 04-NOV-98 18:44:15
    
    (c) Copyright 1997 Oracle Corporation.  All rights reserved.
    
    The db subagent is already running.
    
    
  2. Check the ORACLE_HOME/NETWORK/log/dbsnmp*.log file for errors on UNIX.

  3. Check that the Oracle user has write permissions to ORACLE_HOME/AGENT/LOG as well as ORACLE_HOME/NETWORK/AGENT.

  4. Check snmp_ro.ora, snmp_rw.ora, and services.ora for the entries created by the agent. snmp_ro.ora and snmp_rw.ora are in the ORACLE_HOME/NETWORK/ADMIN directory, and services.ora is in the ORACLE_HOME/NETWORK/AGENT directory.

    Compare the services listed with the services which are available on the machine. Please refer to Appendix A, "Configuration Files" for valid sample files.

    If services are missing, check the following files for inconsistency or corruption:

    • listener.ora

    • tnsnames.ora

    • oratab

  5. If you still do not know why the agent did not start, trace the agent by setting the following variables in snmp_rw.ora:

    • nmi.trace_level=admin (or 16 if you want more information)

    • nmi.trace_directory=<any directory which the Oracle user can write to>

    • nmi.trace_file=agent

  6. If you have upgraded the database software and one of your machines is having problems with the generated snmp_ro.ora, snmp_rw.ora or services.ora file, follow the instructions below:

    1. Run catsnmp.sql under the INTERNAL or SYS account (NOT the dbsnmp account). Normally the catsnmp.sql script is run from catalog.sql upon database creation but since this is an upgrade, you may not have run this script yet. If the necessary scripts have not been run, the dbsnmp account is not created.

    2. If you have more than one SID or older SIDs referenced in the oratab, run catsnmp.sql against each of the databases.

    3. The snmp_ro.ora file is a read-only file, which means that all changes to the file are overwritten each time the agent is started. You can make changes (if needed) to the snmp_rw.ora file.

    If you are trying to do backups, you must run backupts.sql with the dbsnmp/dbsnmp account.
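
The tracing parameters named in the checklists above can be written into snmp_rw.ora by hand. As a minimal sketch only, the following Python helper shows one way to set them without duplicating an existing entry. The helper name and the trace directory are hypothetical, and the sketch assumes each parameter sits on a single name=value line.

```python
def set_trace_parameters(text, params):
    """Return snmp_rw.ora text with every name in params set exactly once.

    Hypothetical helper: assumes one 'name = value' pair per line and
    lowercase parameter names in params.
    """
    remaining = dict(params)
    out = []
    for line in text.splitlines():
        name = line.split("=", 1)[0].strip().lower()
        if "=" in line and name in remaining:
            # Replace the existing setting instead of adding a duplicate.
            out.append(f"{name} = {remaining.pop(name)}")
        else:
            out.append(line)
    # Append any parameter that was not already present in the file.
    for name, value in remaining.items():
        out.append(f"{name} = {value}")
    return "\n".join(out) + "\n"


# Example values; the directory is an assumption and must be writable
# by the Oracle user.
TRACE_SETTINGS = {
    "nmi.trace_level": "admin",
    "nmi.trace_directory": "/tmp/agent_trace",
}
```

Restart the agent after changing the file so that the new trace settings take effect.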


Warning:

Please do not modify the Tcl scripts (job and events scripts written in Tool Command Language) that come with the agent. If you want to submit a job different from the ones that are predefined with the agent, use the TCL Job where you are allowed to pass in arbitrary scripts and have the agent execute them.  


Questions and Answers

If your Agent is still not functioning correctly after you have gone through the troubleshooting checklists, use the following section to cover other areas of Agent operation that are less probable causes of Agent operating problems. In addition, many of the steps in the checklists are covered in greater detail for those users who may be less familiar with Oracle and/or the operating system on which the Agent is running. The following questions are covered in this section:


Note:

You do not need to remove all ".q" files from the $ORACLE_HOME/network/agent directory in order to debug the Agent. Although this approach was recommended in the past, troubleshooting more recent versions of the Intelligent Agent no longer requires this action. There are exceptions to this rule, which will be pointed out later in the chapter.  


Is TCP/IP configured and running correctly?

One of the most common problems that prevents the Agent from starting is TCP/IP configuration. To check whether your TCP/IP setup is configured correctly, issue the following commands at the command line:


Note:

To determine the hostname of a Windows NT system, type "hostname" at a command prompt.  


Correcting TCP/IP configuration problems

  1. (Windows NT) Edit the WINNT\system32\drivers\etc\hosts and lmhosts files.

    If these files have never been used, only sample files will exist in the directory. Either rename or copy the .sam files to just the file name with no extension.

    (UNIX) Log in as root and edit the /etc/hosts file.

  2. Verify that the IP address and host information for each system are correct.

    Example: (Windows NT)

    (Replace the information in brackets with the actual host information for that system.)

    HOSTS file: 
            <122.111.111.111>   <hostname>
    
    LMHOSTS file: 
            <122.111.111.111>   <netbios name or hostname>  #PRE
    


Note:

You can also verify this information through the Windows NT Control Panel -> Network property sheet.  


  3. Delete the $ORACLE_HOME\network\agent\*.q and services.ora files.


Note:

The *.q files contain information about current jobs and events. Do not delete these files without first removing all jobs and events registered against this Agent.  


  4. Delete the $ORACLE_HOME\network\admin\snmp_ro.ora and $ORACLE_HOME\network\admin\snmp_rw.ora files.

  5. Restart the agent.
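
The verification of IP address and host information described in the steps above can be sketched in Python. This is an illustration only: the function names are hypothetical, and the sketch assumes the usual hosts-file layout of an IP address followed by one or more names, with `#` starting a comment (which also covers the #PRE tag in lmhosts).

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments such as #PRE
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first address seen for a name, as a resolver would.
            table.setdefault(name.lower(), ip)
    return table


def host_entry_matches(text, hostname, expected_ip):
    """Return True if hostname is listed with exactly expected_ip."""
    return parse_hosts(text).get(hostname.lower()) == expected_ip
```

A check such as `host_entry_matches(open("/etc/hosts").read(), "myhost", "122.111.111.111")` then reports whether the entry for a given system is the one you expect; the hostname and address here are placeholders.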

Do the DNS Name and the Computer Name Match? (Windows NT)

Before Release 8.0.4 of the Agent, the NT Agent required the DNS Hostname and the Computer Name to be identical. These parameters can be checked/changed from the following Windows NT Control Panel property sheets.

To verify the computer name:

To verify the DNS Name:

Are the Net8 configuration files correct?

In addition to proper network configuration, which allows nodes in your network to communicate, components of your Oracle environment must also be able to communicate with each other. Net8 provides the session and data communication medium between client machines and Oracle servers, or between Oracle servers. For this reason, proper Net8 configuration is a prerequisite for Agent communication. This section covers the most common problems that can occur when Agent communication fails.

Net8 configuration files are found in $ORACLE_HOME/Net80/admin or $TNS_ADMIN (Windows NT), or in $ORACLE_HOME/network/admin (UNIX).

Primary configuration files are:

See Appendix A, "Configuration Files" for information and examples of the above files.

TNS_ADMIN variable usage during Agent Discovery

(UNIX)

All versions of the Unix discovery script allow the use of the TNS_ADMIN variable to locate input files (listener.ora and tnsnames.ora). Only post-7.3.3 versions of the Agent correctly write the output files (snmp_ro.ora and snmp_rw.ora) into TNS_ADMIN, if set.

(Windows NT)

Beginning with version 8.0.5, the discovery script also reads the TNS_ADMIN value from the NT Registry.

The Agent also uses the TNS alias information found in the listener.ora file. The Agent does so even within an Oracle Names environment. This behavior is intentional, since an Oracle Names server may be temporarily unavailable and the Agent needs to be able to resolve names at all times. Check the following to make sure the local translation of the TNS alias takes place:

  1. Verify that the listener.ora file has the following for each instance:

    • Two IPC entries

    • One TCP entry

    Do not activate the listener on port 1748, since the agent listens on this port. (This is the reason you can use TNSPING against the agent; TNSPING cannot differentiate between a listener and an agent.)

    The agent requires IPC entries and TNS alias definitions on the server, in addition to alias definitions from the console, to perform alias translations. Correct IPC entries and TNS alias definitions are essential for correct Agent/Console (V1) or Agent/Management Server (V2) communications.

  2. Ensure that the DNS Host entry is set to the node name in the listener.ora and tnsnames.ora files.

    1. From the Windows NT menu bar, click Start -> Settings -> Control Panel

    2. Double-click on the Network icon

    3. Click on the Protocols tab

    4. Select TCP/IP Protocol and click Properties.

    5. Check the DNS Host entry.


Note:

When using the 7.3.3 Oracle Intelligent Agent on a Windows NT system that has 2 NIC cards, create only one service descriptor in the tnsnames.ora containing the IP address of only one of the NIC cards. Do not create separate service descriptors for each NIC card and do not put both IP addresses in the address_list of the single service descriptor.  


Is Net8 functioning properly?

If your Net8 configuration is correct and you are still unable to contact the Agent, the next step is to determine whether services in your Net8 network can be reached. You can use the TNSPING utility on each database you want to access by entering the following at the command prompt:

tnsping <network service name>

If you can connect successfully from a client to a server (or from a server to a server) using TNSPING, the command will return an estimate of the round trip time (in milliseconds) it takes to reach the Net8 service. This indicates Net8 is functioning properly.

Next, add the following alias (agent debug entry) to the Console's tnsnames.ora file:

        agent_<sid>.world= 
           (DESCRIPTION = 
               (ADDRESS_LIST = 
                   (ADDRESS = 
                       (COMMUNITY =TCP.world) 
                       (PROTOCOL = TCP) 
                       (Host = <your-agent-hostname>) 
                       (Port = 1748) 
                   ) 
               ) 
           )

Then ping the agent from the OEM console using:

tnsping agent_<sid>

or

tnsping80 agent_<sid> 

If the TNSPING command does not work, add the above alias to the agent machine's tnsnames.ora file and try using TNSPING from the machine on which the agent resides. Every agent must be TNSPING-able using this alias.
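
If you maintain several agents, the debug alias shown above can be generated rather than typed. The following Python sketch is an illustration only: the function name is hypothetical, and the layout simply mirrors the entry above with the SID and hostname filled in.

```python
def agent_debug_alias(sid, host, port=1748):
    """Build the tnsnames.ora debug entry for one agent.

    Port 1748 is the agent port named in this chapter; sid and host are
    supplied by the caller.
    """
    return (
        f"agent_{sid}.world =\n"
        "  (DESCRIPTION =\n"
        "    (ADDRESS_LIST =\n"
        "      (ADDRESS =\n"
        "        (COMMUNITY = TCP.world)\n"
        "        (PROTOCOL = TCP)\n"
        f"        (Host = {host})\n"
        f"        (Port = {port})\n"
        "      )\n"
        "    )\n"
        "  )\n"
    )
```

Append the returned text to the tnsnames.ora file and then run tnsping against the new alias as described above.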

Did the Agent startup successfully?

Check whether the Agent process is running:

UNIX Agents

From a command prompt type:

lsnrctl dbsnmp_status
     

The status returned should read:

The db subagent is already running
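
The UNIX status check can also be scripted. The following Python sketch is an illustration only: it assumes lsnrctl is on the PATH of the user running it and that the status message matches the text shown above. The function names are hypothetical.

```python
import subprocess

# Message reported by `lsnrctl dbsnmp_status` when the agent is up,
# as quoted in this chapter.
RUNNING_MESSAGE = "The db subagent is already running"


def agent_is_running(status_output):
    """Return True if the lsnrctl output reports a running agent."""
    return RUNNING_MESSAGE in status_output


def check_agent():
    """Run `lsnrctl dbsnmp_status` and interpret its output."""
    result = subprocess.run(
        ["lsnrctl", "dbsnmp_status"], capture_output=True, text=True
    )
    return agent_is_running(result.stdout)
```
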

Windows NT Agents

  1. From the Start menu, select Settings-->Control Panel

  2. Double-click on Services

  3. Verify that the OracleAgent service has been started.

If the agent did not start up, use any of the hints listed in the following table:

Table 3-1 Troubleshooting an Agent that Will Not Start

UNIX

  • Check the $ORACLE_HOME/network/log/dbsnmp*.log file for errors.

  • Check the $ORACLE_HOME/network/log/nmiconf.log file for errors.

  • Check that the Oracle user has write permissions to the following directories: $ORACLE_HOME/agent/log and $ORACLE_HOME/network/agent.

  • Check snmp_ro.ora, snmp_rw.ora, and services.ora for the entries created by the agent. The snmp_ro.ora and snmp_rw.ora files are located in the $ORACLE_HOME/network/admin directory, and services.ora is in the $ORACLE_HOME/network/agent directory.

  • Compare the services listed with the services which are available on the machine. See Appendix A for valid sample files. If services are missing, check the following files for inconsistency or corruption: listener.ora, tnsnames.ora, and oratab.

  • Check if you have TCP/IP installed. TCP/IP is a requirement. See Is TCP/IP configured and running correctly?

  • If you still do not know why the agent did not start, turn on tracing. (See Tracing the Intelligent Agent.)

Windows NT

  • Check for messages written to the NT Event Viewer (under Administrative Tools) since this is where the NT agent writes any problems associated with startup.

  • Check the $ORACLE_HOME/network/log/nmiconf.log file for errors.

  • Check the properties of the Agent Service to verify the OS account used by the agent (default is 'System'). Check that the Agent user has write permissions to the following directories: $ORACLE_HOME/agent/log and $ORACLE_HOME/Net8/agent.

  • Check if snmp_ro.ora, snmp_rw.ora, and services.ora are created by the agent on startup. The snmp_ro.ora and snmp_rw.ora files are located in the $ORACLE_HOME\network\admin directory, and services.ora is located in the $ORACLE_HOME\network\agent directory.

  • Compare the services listed with the services which are available on the machine. See Appendix A for valid sample files. If services are missing, check the following files for inconsistency or corruption: listener.ora and tnsnames.ora.

  • Check if you have TCP/IP installed. TCP/IP is a requirement. See Is TCP/IP configured and running correctly?

  • Check that you DO NOT have a system path variable containing external drives. The agent is a service and runs by default as SYSTEM. It also needs DLLs from the $ORACLE_HOME/bin directory. If you need external mapped drives in your path, you MUST NOT set them in the SYSTEM path. To set your own path:

    1. Move external mapped drive paths out of the system path variable and into your own.

    2. Reboot to "unset" the system path.

  • If you still do not know why the agent did not start, turn on tracing. For more information on setting up Agent tracing, see "Tracing the Agent".

Did the Agent connect to ALL instances on its node?

To test whether an Agent can connect to the database(s) it monitors on a given node, try connecting to each database with the following connect string:

dbsnmp/dbsnmp@address_list 

You must perform this test on the node where the Agent resides.


Note:

Agents prior to 7.3.3 maintain two permanent connections to their local databases. Post-7.3.3 Agents maintain only one permanent connection.  


(UNIX) Is the agent running with the correct permissions?

To verify whether the Agent has the correct user permissions, see "Installing the Oracle Intelligent Agent on UNIX" .

(Windows NT) Does the OS user exist and does it have the correct permissions?

An OS user needs to be specified for the node and must have the following permissions:

Are you still using a 7.3.3 or earlier agent?

Proper operation of the Oracle Enterprise Manager Job and Event systems requires you run version 7.3.4 or later of the Intelligent Agent. Running a 7.3.3 or earlier version of the Agent will limit available Job and Event system functionality.


Important:

It is highly recommended that you upgrade 7.3.3 or earlier agents to 7.3.4 or later versions.  


Are there errors?

(Windows NT) Check the NT EVENT VIEWER -> APPLICATIONS -> LOG for any errors starting the DBSNMP process.

(Windows NT and UNIX) Check the $ORACLE_HOME/network/log/nmiconf.log file for discovery errors.

Why doesn't the Agent send status notifications back to the Enterprise Manager Console even though the jobs have run?

Most likely the job does actually run, but the agent is unable to contact the console to send back notifications. Verify that hostname resolution can occur. Verify that the IP and hostname of the Windows NT machine running the console is in the /etc/hosts file on the Unix box or the hostname can be resolved via DNS/NIS. Retry the job.

To test the TCP/IP resolution, perform the following tests from a command prompt:

ping <hostname> 
ping <ipaddress>

If the server is running telnet or ftp services (UNIX):

telnet <hostname> 
ftp <hostname>

Since PING uses IP and not TCP, it is a good way of determining if the problem is in the packet routing.

To determine if the problem is actually with TCP, use the telnet or ftp utilities.
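
The same two questions (can the name be resolved, and can a TCP connection be opened) can be asked from a script. The following Python sketch is an illustration only and uses the standard socket module. It does not send ICMP packets as PING does; it checks name resolution and then attempts a TCP connection, which corresponds to the telnet or ftp test. The function names are hypothetical.

```python
import socket


def resolves(hostname):
    """Return True if the hostname can be resolved to an IPv4 address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        return False


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

For example, `tcp_reachable("<hostname>", 23)` corresponds to the telnet test, assuming the server runs a telnet service on the standard port.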

Alternatively, you can perform the following:

  1. Be sure the name and IP address of the OEM console machine are in the /etc/hosts file on the Sun server. Otherwise the agent cannot return messages to the console, because it cannot resolve the name of the machine to an IP address.

  2. Open the Oracle Enterprise Manager's Daemon Manager and, under CONFIGURATION PARAMETERS, set the LISTENING ADDRESS parameter to contain the full IP address of the console machine. This forces the agent to use the IP address in order to contact the console machine.

The default listening address (TNS format) is:

LISTENING ADDRESS = (ADDRESS=(PROTOCOL= TCP)(Host=machine_name)(Port=7770)) 

If a job stays in the scheduled status, delete it using the DEL key (repeatedly, if necessary), then restart the job. Sometimes it takes several submissions before the job starts. A delay of up to a minute before a job starts is common, especially the first time an older agent (7.3.2) tries to synchronize with the OEM console.

Intelligent Agent Startup Problems and Solutions

The following section covers specific problems, situations, and errors that may be encountered while trying to start the Intelligent Agent.

Generic Agent

Multiple Listeners

The Intelligent Agent currently does not support multiple listeners for a single database on one machine. The services.ora file contains the information that is used to communicate discovery information from the agent to the daemon, and due to a limitation in service discovery there can be only one listener on the machine per database you wish to monitor. If two listeners are listening for the same database, the agent returns errors or refuses the discovery.

Agent consuming too much memory

Prior to versions 7.3.4, 8.0.3.1.1 on NT, and 8.0.3 on Solaris, the agent had a memory leak, causing it to use more and more memory. This leak occurs if an alias/service is specified which the agent cannot contact (for example, the database is down or the listener is not started). Each time the agent tries to contact this service, the memory associated with this request is not freed. It also loses handles with the events dbprobe and listenerupdown.

Upgrade the Agent. If this is not possible, make sure only running services are monitored. Verify the memory usage of the Agent. If it becomes too high, stop and start the Agent.


Warning:
  • Do not modify the supplied TCL scripts.

The Tcl (Tool Command Language) scripts supplied with the Intelligent Agent are used with the Oracle Enterprise Manager Job and Events system. If you want to submit a job different than the ones that are predefined with the Agent, use the TCL Job where you are allowed to pass in arbitrary scripts and have the agent run them.  


Not all services are discovered.

Check the services.ora file to determine which services have been discovered.

All the services the Agent finds on a machine must be defined in the relevant SQL*Net/Net8 configuration files. If the service(s) are not defined, service discovery will fail and, in the worst case, the agent will hang or return errors.

Windows NT: Beginning with version 8.0.4, the agent searches for service names that begin with 'OracleService' or 'OracleService<SID>'. Every entry beginning with 'OracleService' is considered to be a database running on this machine. Every SID encountered by the Agent must be defined in the relevant SQL*Net/Net8 files.

UNIX: The oratab file is used to determine which SIDs are present. For 7.3.3 Agents and earlier, discovery fails if it encounters a SID that is not accurate (like in a Developer 2000 environment). To work around this problem, the environment variable $ORATAB can be used to access an alternate oratab file which contains only the databases you wish the agent to see.

For the remaining databases, check the oratab file, and the SQL*Net/Net8 files to see if these files exist and that all definitions are present. Make sure that all of the databases are listed in the listener.ora file. For more information, see "Are the Net8 configuration files correct?" and "Is Net8 functioning properly?" .
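
The $ORATAB workaround described above requires an alternate oratab file that lists only the databases the agent should see. The following Python sketch is an illustration only. It assumes the common oratab layout of SID:ORACLE_HOME:Y|N with `#` comment lines, and the function names and sample paths are hypothetical.

```python
def oratab_sids(text):
    """Return the SIDs listed in oratab text, in file order."""
    sids = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sid = line.split(":", 1)[0].strip()
        if sid:
            sids.append(sid)
    return sids


def filter_oratab(text, wanted):
    """Return oratab text keeping comments and only the wanted SIDs."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        sid = stripped.split(":", 1)[0].strip()
        if not stripped or stripped.startswith("#") or sid in wanted:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

Write the output of `filter_oratab` to a separate file and point $ORATAB at that file before starting the agent.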

The agent doesn't start correctly anymore

If the agent already started previously, and now refuses to start correctly, it may be that something has changed in the environment. Usually, a good thing to try is to let the agent completely rediscover all its services again.

Delete the files snmp_ro.ora, snmp_rw.ora, and services.ora and restart the agent. If that does not fix the problem, remove those files and also delete the $ORACLE_HOME/network/agent/*.q files.

Warning: This deletes all of your jobs and events. Make sure to delete these from the Console first.

'Invalid service name' or 'File operation error' while registering a job or event.

This error is usually seen when the services on the console and the services discovered by the Agent are out of sync. For example, if you have an event registered against TESTDB and someone changes the name of the database to PRODDB, the Agent and Console are out of sync.

To fix this, start by removing all job and event registrations from this service and dropping the node where the services exist from the console. Rediscover the node from the console using the auto-discovery wizard.

NOTE: With 7.3.2, aliases are case sensitive.

If you have an NT Agent, please refer to 'Invalid service name' while registering a job or event.

The DBSNMP.EXE utilizes 100% of CPU on Windows NT Enterprise Edition 4.0.

The problem may occur when the NT Service OracleServiceORCL is not running or not there (because the customer creates his own database, with a SID other than ORCL) and the Oracle Performance Utility (not the performance pack but the Oracle8 performance utility) is installed.

If you do not have a database on your machine with SID = ORCL you will experience this problem, as the Oracle Performance Utility has hard coded a BEQ connect descriptor referencing the ORCL SID, in the NT registry during installations. The location of the connect descriptor is (HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Oracle80\Performance). Check the value of Username, Password and Hostname.

The solution is to deinstall the Oracle8 Performance utility.

Oracle 8 database switches Redo-Log every minute when the version 8.x Agent is running

If you have jobs or events scheduled at a very short interval (30 seconds), this causes activity on your system. For example, if you have the event 'USER BLOCKS' registered, the agent checks concurrent/waiting locks by building temporary tables, deleting the tables, and building them again for every check.

Solution: Lengthen the interval so that the checks run less often, or disable logging on the underlying table.

DBSNMP ERROR: Multiple Listeners found

The fact that only one listener is running does not mean that other listeners do not exist! The agent finds all listeners that are configured (via listener.ora files), not just the ones running. Also, the agent does not restrict its search to the directory pointed to by $TNS_ADMIN. It checks all possible locations for listener.ora files.


Note:

For UNIX machines, if more than one listener appears to be configured to serve the same SID (located in the oratab file), the agent picks the first one it finds, but also displays the message so the user is aware of this.  


If the listener picked up by the agent is not what you intended, correct the problem and restart the agent.

No snmp_ro.ora and snmp_rw.ora are generated.

This error can occur if the Agent cannot write to $ORACLE_HOME\network\admin. Refer to the $ORACLE_HOME\network\log\nmiconf.log for errors. For more information on Agent startup problems, see "Did the Agent startup successfully?".

'Failed to authenticate user' error when running a job

In order for the agent to execute jobs on a managed node, the following conditions must be met:

'Login denied', 'Invalid username/password' messages in trace files

This usually only happens if you have a database prior to 7.3.3 on the machine. From V7.3.3 onwards, a script called CATSNMP.SQL is included in the CATALOG.SQL dictionary script. This script is responsible for creating the DBSNMP user the agent needs to connect. Older databases did not have this script yet.

Verify if the user 'DBSNMP' exists. If not, run the catsnmp.sql script.

'Transport read error' or 'Transport write error' messages

This indicates a problem with the TCP/IP layer. The most obvious cause for this is that the IP address and the hostname do not reference the same physical machine.

Verify that TCP/IP is configured and running correctly. (See Is TCP/IP configured and running correctly?)

'Oralogin failed in orlon'

You may receive this error while executing a TCL script using the oratcl verb oralogon through the Software Developer's Kit. "Oralogin failed in orlon" means that the connect string is either wrong or for some reason, the account used cannot logon to the database.

'Listener not found for SID' when starting Agent

When attempting to start the Intelligent Agent, the following error occurs: Listener Not Found for SID. The SID listed is always the last SID in the oratab file. The listener.ora and tnsnames.ora files contain valid TNS descriptors for the SIDs. The oratab file does not have invalid SIDs and all SIDs have a dbsnmp account. This has been fixed for Agent versions 7.3.4 and later.

The 7.3.3 nmiconf.tcl script parses the listener.ora file looking for uppercase: ADDRESS, SID_LIST_, SID_DESCRIPTION and SID_NAME. Change the parameters listed above to uppercase, and discovery works.
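
Changing the four keywords to uppercase by hand is error prone in a large listener.ora. The following Python sketch is an illustration only: the function name is hypothetical, and it uppercases a keyword only where it starts a line or directly follows an opening parenthesis, so that values such as hostnames are left alone. Back up the file before applying any scripted change.

```python
import re

# The four keywords that, per the text above, the 7.3.3 discovery script
# expects in uppercase. A keyword is matched only at the start of a line
# or right after "(" (optionally followed by whitespace).
_KEYWORD = re.compile(
    r"(?P<lead>^|\()(?P<ws>\s*)"
    r"(?P<kw>address|sid_list_|sid_description|sid_name)",
    re.IGNORECASE | re.MULTILINE,
)


def uppercase_keywords(listener_ora_text):
    """Return listener.ora text with the four keywords in uppercase."""
    return _KEYWORD.sub(
        lambda m: m.group("lead") + m.group("ws") + m.group("kw").upper(),
        listener_ora_text,
    )
```

Because SID_LIST_ is a prefix, only the prefix is changed; the listener name that follows it keeps its original case.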

'ORACLE_HOME does not exist' when starting the Agent

This message comes from the discovery script, nmiconf.tcl. Make sure you have the $ORACLE_HOME environment variable set to the ORACLE_HOME of the Agent and restart the agent.

The agent is only finding one database on a certain node

If you have more than one database on a single node, then you need to make sure that each instance has a unique GLOBAL_DBNAME in the listener.ora. You may have to define this manually in the listener.ora.
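
A quick way to spot the problem is to look for GLOBAL_DBNAME values that occur more than once in listener.ora. The following Python sketch is an illustration only: the function name is hypothetical, and it assumes each value is written as GLOBAL_DBNAME = name inside parentheses.

```python
import re
from collections import Counter


def duplicate_global_dbnames(listener_ora_text):
    """Return the GLOBAL_DBNAME values that appear more than once."""
    names = re.findall(
        r"GLOBAL_DBNAME\s*=\s*([^\s()]+)",
        listener_ora_text,
        flags=re.IGNORECASE,
    )
    # Compare case-insensitively so PROD.world and prod.world collide.
    counts = Counter(name.lower() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty result means every instance in the file has its own GLOBAL_DBNAME; instances with no GLOBAL_DBNAME at all are not reported by this check.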

NT Agent

'Failed to connect to agent' error.
Jobs that remain in submitted status

There are in fact two hostname definitions on NT. One is the NetBios name, used for NT's internal Named Pipes protocol, which is always installed. The other is the TCP/IP hostname, which is configurable only when you install TCP/IP on NT.

To find the NT NetBios hostname:

To find the TCP/IP hostname:

On an NT server, you can 'ping' the two names, even if they are configured differently. Other clients, however, can only 'ping' real TCP/IP hostnames. If the Agent is using local IPC connections, it uses Named Pipes and therefore the NetBios name, while all external connections use the TCP/IP name.

A mismatch in these names leads to 'unable to contact agent', or forever pending jobs in the console. Therefore, make sure that the NetBios and the TCP/IP hostname are identical.

'Invalid service name' while registering a job or event.

If you have an 8.0.4 Agent, you may experience this problem. If you have a default domain other than ".world", the agent tries to append ".world" to the database name during discovery. For example, if your default domain is nl.oracle.com and you define your GLOBAL_DBNAME = database.nl.oracle.com, the agent defines the database name to be database.nl.oracle.com.world. This problem only occurs when the Agent and Console reside on the same machine (they share the same configuration files).

The workaround is to append ".world" to all services that do not currently have a specified domain.
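
The workaround amounts to a simple rule: a service name that already carries a domain is left alone, and a name without one gets ".world". The following Python sketch is an illustration only, with a hypothetical function name, and treats any name containing a dot as already having a domain.

```python
def with_default_domain(service_name, domain="world"):
    """Append the domain to service names that have no domain yet."""
    if "." in service_name:
        return service_name  # already qualified, leave unchanged
    return f"{service_name}.{domain}"
```
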

Agent finds no services after discovery

This problem has been fixed for Agent versions 7.3.4 and higher. For Agent versions 7.3.3 and lower, the following workaround can be used.

Check the listener.ora file, and make sure that no $ORACLE_HOME parameter is specified in the SID_LIST section. Specifying an $ORACLE_HOME in the SID_LIST section prevents the Agent from finding the requisite files for service discovery.

Agent crashes with Dr. Watson error

The problem may occur when the NT Service OracleServiceORCL is not running or not there (you may have created a database, with a SID other than ORCL) and the Oracle8 Performance Utility is installed.

If you do not have a database on your machine with SID = ORCL you will experience this problem, as the Oracle8 Performance Utility has hard coded a BEQ connect descriptor referencing the ORCL SID, in the NT registry during installations. The location of the connect descriptor is (HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Oracle80\Performance). Check the value of Username, Password and Hostname.

The solution is to deinstall the Oracle8 Performance utility.

If this does not fix the problem, try setting the variable automatic_ipc = off in the sqlnet.ora file, comment out all IPC addresses in the listener.ora file, and restart the agent.

Dr Watson : access violation in the application PLUS80.EXE

If you get this error while trying to run a SQL*Plus job, you have encountered a SQL*Plus bug. If you start SQL*Plus on the command line and connect using the full connect string instead of the alias defined in the tnsnames.ora file, you see the same crash of SQL*Plus. The cause is a connect string whose size exceeds the allocated connect string buffer. For more information, see ORA-12163: 'TNS:connect descriptor is too long'.

Define a connect string for your database in your tnsnames.ora file that is less than 255 characters. Then re-start your agent, and re-discover this node from the console.

The OracleAgent service hung on starting.

This may occur if there are externally mapped drives on the system path variable. For more information, see "Did the Agent startup successfully?".

You may also be encountering a problem that is specific to Intelligent Agent 7.3.3 combined with a different version of the Oracle database (for example, 7.3.2). You can identify this by checking the Windows NT Control Panel -> Services dialog. The Oracle Agent shows a status of started, but when you highlight it to shut it down, the Stop button becomes disabled, along with the Start, Pause, and Continue buttons.

You cannot install IA 7.3.3 with a 7.3.2 database on Windows NT because the two products use different versions of the required support files (RSFs), 7.3.3 and 7.3.2, and these cannot be installed together.

Please upgrade your database to 7.3.3 with SQL*Net 2.3.3. If you cannot do this, remove OEM 1.3.5 and IA 7.3.3, and make sure you have the SQL*Net 2.3.2 client, server, and adapters installed. Then install IA 8.0.3 and use OEM 1.4. IA 8.0.3 uses the 8.0.3 RSFs and Net8, so there is no conflict with Oracle 7.3.x.

Receive the error failed -> 'output from job lost' while running job.

The Windows NT user that you created for the agent (see Agent Configuration, Configuration Guide) needs read/write permissions to the $ORACLE_HOME\net80\agent directory (and to the TEMP directory, for some applications) and read permissions to the SYSTEM32 directory.

Verify that the NT user has these permissions.

Agent gives fatal NT error and service is uncontrollable

This happens if Oracle7 and Oracle8 are installed in the same ORACLE_HOME on NT. This is NOT a supported configuration. The agent tries to start, reports a time-out error saying that it failed to start, and then gets into a 'limbo' state in which you can no longer stop or start the Agent service.

First start the Agent, then start the Oracle8 services.

Agent appears to have stopped working after I run a job frequently

If you run a scheduled job every minute and receive the error "Event ID 2009, Number of sessions exceeded 2048" on the NT console, verify that there are 2048 users logged in by viewing Programs -> Administrative Tools -> Windows NT Diagnostics and choosing the Network tab. The Intelligent Agent is not releasing sessions when a scheduled job finishes running.

The workaround is to stop and start the Agent every couple of hours, or to upgrade to the 8.0.4 agent.

OS and ORA errors after upgrading to OEM 1.5

This occurs when you install OEM 1.5 on an NT machine with an 8.0.3 database. The required support files are upgraded from 8.0.3 to 8.0.4 during the install.

If you then try to start the agent, you will receive an error stack. If you reboot the NT machine, further problems appear: no database, and an OEM console but no Agent.

This behavior is documented: on NT, only the first two numbers of a version are supported, so installing two versions that differ in the third digit does not work. The OEM 1.5 version number, however, does not make it obvious that you are also installing the 8.0.4 RSFs.

Unix Agent

Discovery fails with no services at all

First check that all of the SQL*Net files are present and correctly defined. You can then debug discovery by editing your oratab file so that it contains only a valid SID with a listener running. After you get this working, you can add the remaining entries back to the oratab file one at a time to see which entry is causing the problem.

Check the $ORACLE_HOME/network/log/nmiconf.log files for errors.

Auto-discovery will not work 'Failed to get service information for node <nodename>'

This is a well-known bug in 7.3.3 on Solaris and has already been fixed in 7.3.4 and 8.0.3. You get this error if there are non-database entries in the oratab file, for example, entries for client-side installs.

You should also have received the following message when starting the agent:

                No listener found for SID <sid> 

There are two workarounds for this:

  1. Comment out these entries in the oratab when starting the agent.

  2. Make a copy of the oratab file and remove these entries. Then, set the $ORATAB environment variable of the user who starts the agent to this file.
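The second workaround can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the sample entries, and the SID to remove are hypothetical, and the sketch assumes the usual oratab layout of one SID:ORACLE_HOME:flag entry per line with # introducing a comment. You still have to decide which entries are non-database entries, write the result to a new file, and point $ORATAB at that file yourself.

```python
# Hypothetical oratab contents; CLIENT stands in for a non-database entry.
SAMPLE_ORATAB = """\
# sample oratab
ORCL:/u01/app/oracle/product/7.3.3:Y
CLIENT:/u02/app/oracle/client:N
"""


def filter_oratab(oratab_text, drop_sids):
    """Return oratab text without the entries whose SID is in drop_sids."""
    drop = set(drop_sids)
    kept = []
    for line in oratab_text.splitlines():
        stripped = line.strip()
        # Comments and blank lines are copied through unchanged.
        if stripped and not stripped.startswith("#"):
            sid = stripped.split(":", 1)[0]
            if sid in drop:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    print(filter_oratab(SAMPLE_ORATAB, ["CLIENT"]), end="")
```

Working on a copy leaves the original oratab file untouched for other Oracle tools that read it.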

When doing lsnrctl dbsnmp_start you get: NO LISTENER FOUND FOR <XXX>

This is a problem with the Intelligent Agent 7.3.3. Your services.ora file is empty, and in the snmp_ro.ora file the SNMP.VISIBLESERVICES parameter is empty.

The "cannot find listener for XXXX" usually comes from a SID in the oratab file that does not have a listing in the listener.ora file. This means that the listener is not listening on this specific SID. Unless this is a SID that you are trying to connect the agent to, this is a warning and should not stop the discovery process for the other SIDs.

Please check that case sensitivity is respected in the oratab and listener.ora files for all SIDs, and make sure that the oratab file does not have SIDs with a * or any other prefix. For example, a SID starting with a * that was used for another Oracle product has been seen to stop discovery.
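The comparison between oratab and listener.ora can be sketched in Python as shown below. This is an illustrative helper only: the function name and the sample file contents are hypothetical, and the sketch assumes that listener.ora declares each SID with a (SID_NAME = ...) entry and that oratab uses one SID:ORACLE_HOME:flag entry per line.

```python
import re

# Hypothetical file contents, used only to demonstrate the comparison.
SAMPLE_ORATAB = "ORCL:/u01:Y\n*:/u02:N\norcl2:/u03:N\nTEST:/u04:N\n"
SAMPLE_LISTENER_ORA = "(SID_DESC=(SID_NAME=ORCL))(SID_DESC=(SID_NAME = ORCL2))"


def check_oratab_sids(oratab_text, listener_ora_text):
    """Return (sid, reason) pairs for oratab SIDs that may disturb discovery."""
    listener_sids = set(
        re.findall(r"\(\s*SID_NAME\s*=\s*([^()\s]+)\s*\)", listener_ora_text, re.IGNORECASE)
    )
    by_lowercase = {sid.lower(): sid for sid in listener_sids}
    problems = []
    for line in oratab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sid = line.split(":", 1)[0]
        if not sid[:1].isalnum():
            problems.append((sid, "non-alphanumeric prefix"))
        elif sid in listener_sids:
            continue  # exact, case-sensitive match
        elif sid.lower() in by_lowercase:
            problems.append(
                (sid, "case differs from listener.ora entry " + by_lowercase[sid.lower()])
            )
        else:
            problems.append((sid, "not listed in listener.ora"))
    return problems


if __name__ == "__main__":
    for sid, reason in check_oratab_sids(SAMPLE_ORATAB, SAMPLE_LISTENER_ORA):
        print(sid, "->", reason)
```

A SID reported as "not listed in listener.ora" is only a warning unless it is a SID that you want the agent to manage.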

Intelligent Agent Error Messages and Resolutions

Generic Agent

ORA-12163: 'TNS:connect descriptor is too long'

Copy the snmp.address.<host_name> parameter from your $ORACLE_HOME\network\admin\snmp_ro.ora file. Paste this parameter and its address into your $ORACLE_HOME\network\admin\snmp_rw.ora file. In snmp_rw.ora, reduce the size of this connect string by removing the address entries for IPC. (NMP and SPX may also be removed.)

Shut down and restart the agent. See the examples below.


Note:

The parameter snmp.address is no longer found in snmp_ro.ora starting with the 7.3.4/8.0.3 Agents. Therefore, you will have to use this example to add a new variable to your snmp_rw.ora.


EXAMPLES:

Entry to be copied out of snmp_ro.ora:

snmp.address.ORCL_MACHINE-PC = (DESCRIPTION=(ADDRESS_LIST=
    (ADDRESS=(PROTOCOL=IPC)(KEY=oracle.world))
    (ADDRESS=(PROTOCOL=IPC)(KEY=ORCL))
    (ADDRESS=(COMMUNITY=TCP.world)(Host=machine-pc)(PROTOCOL=TCP)(Port=1521))
    (ADDRESS=(COMMUNITY=TCP.world)(Host=machine-pc)(PROTOCOL=TCP)(Port=1526)))
  (CONNECT_DATA=(SID=ORCL)(SERVER=DEDICATED)))

Modified entry in snmp_rw.ora:

snmp.address.ORCL_machine-PC = (DESCRIPTION=(ADDRESS_LIST=
    (ADDRESS=(COMMUNITY=TCP.world)(Host=machine-pc)(PROTOCOL=TCP)(Port=1521))
    (ADDRESS=(COMMUNITY=TCP.world)(Host=machine-pc)(PROTOCOL=TCP)(Port=1526)))
  (CONNECT_DATA=(SID=ORCL)(SERVER=DEDICATED)))
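The edit shown in the two entries above can also be sketched programmatically. The following Python snippet is only an illustration under stated assumptions: the function name is hypothetical, and the regular expression assumes that each ADDRESS entry contains only simple (KEY=value) pairs, as in the example above. It removes every ADDRESS entry whose PROTOCOL is in a given list and leaves the rest of the descriptor untouched.

```python
import re

# The descriptor from the snmp_ro.ora example above, written on one line.
ORIGINAL = (
    "(DESCRIPTION=(ADDRESS_LIST="
    "(ADDRESS=(PROTOCOL=IPC)(KEY=oracle.world))"
    "(ADDRESS=(PROTOCOL=IPC)(KEY=ORCL))"
    "(ADDRESS=(COMMUNITY=TCP.world)(Host=machine-pc)(PROTOCOL=TCP)(Port=1521))"
    "(ADDRESS=(COMMUNITY=TCP.world)(Host=machine-pc)(PROTOCOL=TCP)(Port=1526)))"
    "(CONNECT_DATA=(SID=ORCL)(SERVER=DEDICATED)))"
)

# One ADDRESS entry made of flat (KEY=value) pairs; ADDRESS_LIST does not match.
ADDRESS_RE = re.compile(r"\(\s*ADDRESS\s*=\s*(?:\([^()]*\)\s*)+\)", re.IGNORECASE)
PROTOCOL_RE = re.compile(r"\(\s*PROTOCOL\s*=\s*(\w+)\s*\)", re.IGNORECASE)


def drop_protocols(descriptor, protocols=("IPC",)):
    """Return the descriptor without ADDRESS entries for the given protocols."""
    unwanted = {name.upper() for name in protocols}

    def keep_or_drop(match):
        entry = match.group(0)
        protocol = PROTOCOL_RE.search(entry)
        if protocol and protocol.group(1).upper() in unwanted:
            return ""
        return entry

    return ADDRESS_RE.sub(keep_or_drop, descriptor)


if __name__ == "__main__":
    shortened = drop_protocols(ORIGINAL)
    print(len(ORIGINAL), "->", len(shortened), "characters")
    print(shortened)
```

Passing protocols=("IPC", "NMP", "SPX") would also remove the NMP and SPX entries mentioned earlier, if any are present.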

TNS-12542: 'TNS:address already in use'

This is actually a Net8 Listener error.

The following is documented in the 8.0.3.0.0 Intel NT release notes for the Net8 Listener. When a client connects to an Oracle8 server in dedicated server mode, WINSOCK2 Shared Sockets feature is used so that the client connection is routed from the listener to the database server. This feature improves the connection time, because the client does not need to close the socket connection with the listener and establish a new connection with the database server.

With the use of Shared Sockets, threads also use the same port as the listener. If you shut down the listener and try to start it up again for the same port, the listener does not start up if the port is in use due to any open connections with the database. Ensure that no client is connected to the database before starting up the listener. Note that if you are using a listener with a different port number you are able to start it up.


Warning:

Do not bring down the listener when any clients are connected to the database. If you need to listen for a new database, modify the listener.ora configuration file, and issue the reload command from the Listener Control Utility LSNRCTL80.  


See Oracle Networking Products Getting Started for Windows Platforms for more information about the listener.ora file and the LSNRCTL80 utility. Oracle Corporation attempted to overcome the restriction by using the WINSOCK2 option to allow the re-use of a port, but the option does not work reliably. Oracle Corporation is currently working with Microsoft Corporation to resolve this issue.

For additional information about the reload command, see the Net8 Administrator's Guide.

VOC-04816 'Invalid Destination'

While submitting a job, validation fails with "failed to find address for Agent_node", followed by the VOC-04816 'Invalid Destination' error. This might also be caused by an invalid address in the tnsnames.ora file located on the console.

Upgrade your agent to 7.3.3 or later.

Verify that your SQL*Net configuration files are correct.

NT Agent

Any NT Operating System Error when starting the agent

If you see an OS error when starting the agent, check to see if it is actually an agent error as described in snmimsg.mc. Due to one of the Windows APIs not working as documented, the agent fails to print out the real cause of the error.

Use the Event Viewer in the Administrative Tools group of Windows NT. You should find the true cause of the problem documented there. The source for the agent errors is listed under the service name "dbsnmp". Highlight the most recent dbsnmp entry in the list and double-click the event to see the actual results.

To debug the agent after you have received an OS error, follow these steps:

UNIX Agent

NMS-004 When starting agent

If snmpd is running on the unix box, set the following in the server's /etc/snmpd.conf:

smux 0.0 "" <ipaddress of server> 

If the NMS-004 error remains, turn on logging in the snmpd.conf file with the following parameter:

logging   file=/usr/tmp/snmpd.log   enabled 

The log file gives more information about what could be happening. For example, a space between the double quotes in the smux line can cause the application to misinterpret the space as a password. The double quotes by themselves, with nothing between them, mean no password.

NMS-0308 : 'Failed to listen on address : another agent may be running'.

There are two possible causes for this error:

  1. If two agents are installed on a machine, in two different ORACLE_HOME directories, you see this message when you try to start the second agent. This is because both agents try to listen on the same default port, 1748.

Run only one agent on a machine.

  2. Port 1748, where the agent listens, is being used by another process, or has not been released by dead processes that were formerly using it (unfortunately a common problem on Sun systems).

To confirm that the port is being used by another process:

  1. Use this command in UNIX, where 1748 is the port number:

    netstat -a | grep 1748

If any line of the output ends in "LISTENING", then the port is in use.
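As an alternative to reading the netstat output, the same question can be asked by trying to bind to the port. The Python sketch below is an assumption-laden illustration rather than an Oracle-supplied tool: the function name is hypothetical, it must be run on the machine where the agent runs, and it only reports whether the bind attempt fails, not which process holds the port.

```python
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if a TCP socket cannot be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # The bind failed, typically because another socket holds the port.
            return True
    # The probe socket is closed on leaving the with block, freeing the port.
    return False


if __name__ == "__main__":
    # 1748 is the default agent port named in the text above.
    print("port 1748 in use:", port_in_use(1748))
```

Because the probe socket never listens and is closed immediately, running the check does not itself keep the port occupied.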

  • If the following is true :

  • Then do this.

    --> get process numbers

    LSNRCTL> dbsnmp_start

    If it still fails to start the agent, go through steps again, but before re-starting the AGENT, do this.

    LSNRCTL> dbsnmp_start

    This definitely re-starts the agent, but you removed all of the job and events queues it was using in the past!

    If all else fails, re-booting the machine should definitely free up the port.

    You may also have to relink the agent to clear this problem. Please see the Oracle Enterprise Manager Configuration Guide and README for more information.

    NMS-001 while starting the Agent

    This message indicates that the SNMP master agent (the process on UNIX that controls the SNMP protocol) could not be contacted. By default the agent listens and works over SQL*Net, but the agent can also work over SNMP on UNIX systems. The SNMP interface is not yet built into the NT implementation.

    This message can safely be ignored unless you are trying to communicate with a Master Agent.

    NMS-205 while starting the Agent

    The 'dbsnmp' user could not be located.

    Run the catsnmp.sql script for that database with either the SYS or INTERNAL accounts.

    NNL-018 while starting the Agent.

    The agent tries to contact the Names Server but cannot reach it. This can happen if a Names Server is installed but not used.

    Add the following line to the snmp_rw.ora file:

    NMI.REGISTER_WITH_NAMES = FALSE
    

    NMS-351 while starting the Agent

    This happens if there are mismatches between the IDs in the '*.q' files in the $ORACLE_HOME/network/agent directory. Delete all the '*.q' files in the $ORACLE_HOME/network/agent directory, rebuild your repository, and restart the agent.

    Tracing the Agent

    Beginning with 7.3.3, the agent reads information from the snmp_ro.ora and snmp_rw.ora files in the $ORACLE_HOME\network\admin directory.


    Note: These files only exist after you have started the agent the first time. If you want to trace the agent the first time it is started, you can manually create a new file called snmp_rw.ora and add the trace parameters to this file. Otherwise, start the agent and then modify the snmp_rw.ora file to add the trace information and restart the agent.  


    Example of modifications of the snmp_rw.ora file:

    NMI.TRACE_LEVEL = (OFF | USER | ADMIN | 16 )
    

    The NMI.TRACE_LEVEL settings mirror those used for SQL*Net.

    Optional:

    NMI.TRACE_FILE = agent        Default=dbsnmp.trc 
    NMI.TRACE_DIRECTORY = C:\TEMP  Default=$ORACLE_HOME/network/trace
    

    (Any existing directory where the agent has write permissions)

    The log file, $ORACLE_HOME/network/log/nmi.log, is written by the agent on every startup, even if tracing is not turned on. It contains the name and version of the agent and the name and location of the agent's configuration files. If tracing is turned on, it also contains problems encountered with the database and listener connections.

    The log file, $ORACLE_HOME/network/log/nmiconf.log, is created on the first startup of the agent and appended to on every startup after that. Auto-discovery is done by the Tcl script nmiconf.tcl (hence the log file name). This file is written to only during startup.

    $ORACLE_HOME/agentbin/ORATCLSH is a special-purpose Tcl shell that supports all standard Tcl verbs (supported in TCL75.dll) plus a large subset (not all) of the ORATCL verbs supported by the OEM Agent. ORATCLSH is not a general-purpose utility and may only be used in combination with the OEM Agent, because it depends on files and data structures maintained by the OEM Agent.

    There is no documentation of ORATCLSH and it has never been part of the supported feature set of the OEM Agent. It is provided strictly as a debugging tool to help Oracle customers and developers in developing OEM job and event scripts. The executable ORATCLSH is provided for debugging your TCL scripts. Before executing ORATCLSH, set the environment variable TCL_LIBRARY to point to $ORACLE_HOME/network/agent/tcl, the location of the init.tcl file.

    Tracing TCL

    You may also turn Tcl tracing on by setting the environment variable ORATCL_DEBUG and turning tracing on in the snmp_rw.ora file. ORATCL_DEBUG must be set to the $ORACLE_HOME/network/trace directory. You must shut down and restart the agent for these settings to take effect. Tcl tracing creates a file, oratcl.trc, in the above location. Every time an event runs, an entry is added to the oratcl.trc file.




    Copyright © 1999 Oracle Corporation.

    All Rights Reserved.
