SmartMIB Professional Solution:

Network Element’s Fault Management


(SM-Pro-Fault)

(SM-ProUpd-Fault)

(SM-ProPlus-Fault)

Solution Introduction & Details Overview:

 

This solution presentation focuses on Fault Management. It builds on the capabilities, present in the operating systems of router network devices, for gathering details and statistics about a device’s resources and events. These capabilities are essential in managing telecom, service provider, and small to large enterprise networks.

 

It introduces a SOSL-based management script that builds a management process for monitoring Faults, using the SNMP protocol as the interface to retrieve (poll) Fault-related data and statistics.
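As a rough illustration of one polling cycle, the sketch below is written in Python rather than SOSL and is not the SmartMIB script itself. The snmp_get callable, the labels, and the choice of counters are assumptions made for the example; the two OIDs are ifInErrors and ifOutErrors for interface index 1 in the standard IF-MIB. A real deployment would back snmp_get with an actual SNMP library.

```python
from typing import Callable, Dict

# Illustrative selection of fault-related objects (IF-MIB, interface index 1).
FAULT_OIDS = {
    "ifInErrors.1": "1.3.6.1.2.1.2.2.1.14.1",
    "ifOutErrors.1": "1.3.6.1.2.1.2.2.1.20.1",
}


def poll_once(
    host: str,
    snmp_get: Callable[[str, str], int],
    oids: Dict[str, str] = FAULT_OIDS,
) -> Dict[str, int]:
    """Fetch every configured OID once from one device.

    snmp_get(host, oid) is a hypothetical callable that performs the
    SNMP GET and returns an integer. Objects that cannot be read are
    left out of the sample instead of aborting the whole cycle.
    """
    sample: Dict[str, int] = {}
    for label, oid in oids.items():
        try:
            sample[label] = snmp_get(host, oid)
        except OSError:
            # Timeout or unreachable agent: skip this object.
            continue
    return sample
```

Passing the getter in as a parameter keeps the polling logic independent of any particular SNMP implementation and makes it easy to exercise with canned values.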

 

The goal of Fault Management is to detect, isolate, report, and (whenever possible) automatically repair network problems as or before they occur. A successful Fault Management process therefore has to be proactive, collecting and analysing data simultaneously and in real time.

 

The first step, detection, can be thought of as an online process that gives an indication of malfunctioning. Real-time detection mechanisms are usually implemented within the network protocols and devices. For example, protocol errors can be detected by finite-state machines. The second step consists of fault localization and identification. Fault localization is typically achieved through algorithms that compute a possible set of faults, while fault identification is done by testing the hypothetically faulty component(s). The last step, repair, is achieved by taking corrective actions. This step may require equipment replacement, a change of system configuration, or the removal of software bugs.
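The idea of detecting protocol errors with a finite-state machine can be sketched as follows. The states, the event names, and the transition table describe an invented toy protocol, not any protocol named in this document: an event that has no transition from the current state is reported as a protocol error.

```python
# Allowed transitions of a hypothetical connection-oriented protocol.
TRANSITIONS = {
    ("IDLE", "open"): "CONNECTING",
    ("CONNECTING", "ack"): "ESTABLISHED",
    ("ESTABLISHED", "data"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "IDLE",
}


def detect_protocol_errors(events, start="IDLE"):
    """Replay a sequence of events through the state machine.

    Returns a list of (position, state, event) tuples, one for each
    event that is not allowed in the state where it arrived. After an
    error the machine stays in its current state.
    """
    state = start
    errors = []
    for position, event in enumerate(events):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            errors.append((position, state, event))
            continue
        state = next_state
    return errors
```

For instance, replaying open, data, ack flags the data event, because data is not allowed while the machine is still in the CONNECTING state.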

 

To achieve the goal of Fault Management, thresholds need to be set so that alarms and events are triggered whenever they are exceeded. Events are then collected, tracked, and correlated to determine the probable Faults, which in turn need to be reported and tracked.
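A minimal Python sketch of this threshold-and-correlation flow is shown below. The metric names, the limits, and the rule of grouping events by device are assumptions chosen for illustration; a production process would use thresholds and correlation rules suited to the managed network.

```python
from collections import defaultdict

# Hypothetical thresholds; a value above the limit raises an event.
THRESHOLDS = {
    "temperature_c": 70.0,
    "crc_errors_per_min": 100.0,
}


def check_thresholds(device, sample, thresholds=THRESHOLDS):
    """Return one event for every metric in the sample that exceeds its limit."""
    events = []
    for metric, limit in thresholds.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            events.append(
                {"device": device, "metric": metric, "value": value, "limit": limit}
            )
    return events


def correlate(events):
    """Group events by device.

    Several alarms from the same device are then reported as a single
    candidate fault instead of as unrelated alarms.
    """
    by_device = defaultdict(list)
    for event in events:
        by_device[event["device"]].append(event["metric"])
    return {device: sorted(metrics) for device, metrics in by_device.items()}
```

Grouping by device is only the simplest possible correlation rule; grouping by time window or by network topology would follow the same pattern.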

General Fault Management Process Data Characteristics

The following table describes data that can be applied to the Fault management process.

Data | Description

Buffer memory related data | Information that reports on the managed device’s buffer memory allocation and creation failures.

Environmental monitoring related data | Information that reports on the states of the fans, power supplies, temperature, and voltage.

Ethernet-like interface statistics related data | Information that reports on Ethernet-like interface statistics such as: the number of times the carrier sense condition is lost, the number of frame transmissions delayed because the medium was unavailable, the number of incoming frames that fail to be received due to an internal MAC sublayer error, and the number of outgoing frames that fail to be transmitted due to an internal MAC sublayer error.

Interface related error data | Information that reports on the number of carrier transitions, collisions, CRC errors, and resets seen on a particular interface.
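Error statistics of this kind are normally exposed as cumulative counters, so a monitoring process works with the difference between two polls rather than with the raw value. The sketch below assumes 32-bit SNMP counters (Counter32), which wrap to zero after 2^32 - 1; the function names are illustrative. It treats any decrease as a single wrap, so a counter reset caused by a device reboot would need to be detected separately.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Increase of a cumulative counter between two polls.

    Modular arithmetic handles the case where the counter wrapped
    around once between the two samples.
    """
    return (current - previous) % modulus


def error_rate(previous, current, interval_seconds):
    """Average number of errors per second over one polling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current) / interval_seconds
```

A rate computed this way, rather than the raw counter value, is the natural quantity to compare against an alarm threshold.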

 

