It seems as though this topic has tormented my mind for quite some time now. It is an excellent topic for R&D and something that can keep any system admin troubled. I believe “effective” would have been an ideal prefix to the post’s title, since what I have seen to date is nothing close to effective.
When it comes to network monitoring and management, the real reason you never have anything in your showcase to talk about is the lack of interoperable protocols for retrieving information from the various devices on your network. As a consequence, the software that monitors your network requires a considerable amount of maintenance overhead. At the recent 2008 Cyprus state fair I was at the CableNet stand, where I sat down and talked to someone who does a lot of work with HP OpenView. His argument regarding OpenView’s lack of automation on a network was “it’s software - it requires maintenance”. I find this argument invalid for today’s networks, which can grow to monstrous sizes. Especially in carrier and service provider grade networks, you really want to avoid having a person or a whole team managing the monitoring of your network. That is prehistoric and unnecessary.
One good example is topology discovery at Layer 2 and 3 (not to mention 4). Traditionally people use CDP to discover Cisco devices and their neighboring Cisco devices. The problem is that it will never fit perfectly, since you are bound to have another vendor’s equipment that does not support Cisco’s proprietary CDP, though Cisco has licensed some vendors such as MikroTik to use CDP in their hardware. In response to this, LLDP (Link Layer Discovery Protocol) is a recently implemented protocol that acts much like CDP does. It also has extensions to announce VLAN information in your L2 network, plus more information that could benefit network monitoring, topology discovery, change tracking, etc.
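To make the idea concrete, here is a minimal sketch of what a monitoring system could do once every device reports its neighbors in the same vendor-neutral form, whether the data came from LLDP or CDP. The device names, port names and the record layout are made up for illustration; how the records are collected is left out entirely.

```python
from collections import defaultdict

# Hypothetical neighbor records as a collector might gather them per device:
# (local_device, local_port, remote_device, remote_port).
NEIGHBOR_RECORDS = [
    ("core-sw1", "Gi0/1", "edge-rtr1", "ether2"),
    ("edge-rtr1", "ether2", "core-sw1", "Gi0/1"),  # same link, seen from the other end
    ("core-sw1", "Gi0/2", "access-sw7", "port24"),
]


def build_links(records):
    """Collapse per-device neighbor reports into a set of unique L2 links."""
    links = set()
    for local_dev, local_port, remote_dev, remote_port in records:
        # Sort the two endpoints so A->B and B->A reports give the same key.
        link = tuple(sorted([(local_dev, local_port), (remote_dev, remote_port)]))
        links.add(link)
    return links


def adjacency(links):
    """Turn the link set into a device -> neighboring devices map."""
    adj = defaultdict(set)
    for (dev_a, _port_a), (dev_b, _port_b) in links:
        adj[dev_a].add(dev_b)
        adj[dev_b].add(dev_a)
    return dict(adj)


LINKS = build_links(NEIGHBOR_RECORDS)
ADJ = adjacency(LINKS)
```

The point of the sketch is that the topology logic itself is trivial; the hard part is getting every vendor to fill in those four fields.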
Another example is SNMP. SNMP is a protocol used to retrieve and deliver information relevant to a network device’s operation. This can include statistical information such as interface traffic, errors, types of traffic, accounting and more. It can also include the operational status of interfaces, processes and more. SNMP is widely used today to retrieve such information. The problem is that every vendor has its own MIBs (Management Information Bases) that identify the OIDs returned from an SNMP query. Your software therefore has to be maintained, either by manually loading the SNMP MIBs for each respective device or by installing free or commercial modules into your monitoring software so that it can identify the data retrieved. This adds more overhead and reminds me of a guy sitting in a datacenter all day making sure each packet has the correct window size and adjusting it manually.
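A rough sketch of where the maintenance comes from: OIDs under the standard MIB-2 subtree (1.3.6.1.2.1) mean the same thing on every device, while anything under the enterprises subtree (1.3.6.1.4.1) only makes sense once you have loaded that vendor’s MIB. The small lookup tables below are illustrative and far from complete, and the function names are my own; no real SNMP library is used here.

```python
ENTERPRISE_PREFIX = (1, 3, 6, 1, 4, 1)  # vendor-specific subtree

# A few standard objects from MIB-II / IF-MIB (same meaning on every vendor).
STANDARD_OBJECTS = {
    (1, 3, 6, 1, 2, 1, 1, 1): "sysDescr",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 8): "ifOperStatus",
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 10): "ifInOctets",
}

# A few enterprise numbers; anything below them needs the vendor's own MIB.
ENTERPRISES = {9: "Cisco", 11: "HP", 14988: "MikroTik"}


def parse_oid(text):
    """Convert a dotted OID string into a tuple of integers."""
    return tuple(int(part) for part in text.strip(".").split("."))


def classify(oid_text):
    """Say whether an OID is a known standard object or vendor-specific."""
    oid = parse_oid(oid_text)
    for prefix, name in STANDARD_OBJECTS.items():
        # Compare whole sub-identifiers, so ...1.10 never matches ...1.100.
        if oid[: len(prefix)] == prefix:
            return ("standard", name)
    if oid[:6] == ENTERPRISE_PREFIX and len(oid) > 6:
        return ("vendor", ENTERPRISES.get(oid[6], "unknown vendor"))
    return ("unknown", None)
```

For example, `classify("1.3.6.1.2.1.2.2.1.10.3")` is recognized as `ifInOctets` without any vendor knowledge, while an OID under `1.3.6.1.4.1.9` only tells you “this is Cisco, go find the MIB”.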
It just sounds crazy that your network cannot evolve with your monitoring system and vice versa. I don’t expect a level of AI in the whole thing, however; I simply expect a standard (or standards) to be set to get rid of this problem.
One more example of the “clever” monitoring systems out there would be retrieving the routing table via SNMP and comparing it to the other devices discovered to see who is in whose network. This is done out-of-the-box with Zenoss, which is an enterprise grade (as they say) monitoring system. If you have two entirely different locations that both use 192.168.1.0/24 behind NAT, Zenoss is not clever enough to know that NAT is involved and that each network is therefore a separate instance. As a result, Zenoss puts all these networks in one cloud, interconnecting them graphically for a visual yet wrong discovery. This is only a sample of the problems that arise with effective network management (nothing effective yet).
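The overlapping 192.168.1.0/24 case can be sketched in a few lines. This is not how Zenoss works internally; it is only an illustration of why keying networks by prefix alone merges unrelated sites, and how adding a site label (an assumption: something the operator or a discovery step would have to supply) keeps them apart. Device and site names are invented.

```python
import ipaddress
from collections import defaultdict

# Hypothetical discovery results: (device, site label, connected prefix).
DISCOVERED = [
    ("branch-a-rtr", "branch-a", "192.168.1.0/24"),
    ("branch-a-sw", "branch-a", "192.168.1.0/24"),
    ("branch-b-rtr", "branch-b", "192.168.1.0/24"),
]


def group_by_prefix(rows):
    """Naive grouping: same prefix means same network."""
    groups = defaultdict(set)
    for device, _site, prefix in rows:
        groups[ipaddress.ip_network(prefix)].add(device)
    return dict(groups)


def group_by_site_and_prefix(rows):
    """Private prefixes are only unique within a site, so scope them by site."""
    groups = defaultdict(set)
    for device, site, prefix in rows:
        net = ipaddress.ip_network(prefix)
        key = (site, net) if net.is_private else (None, net)
        groups[key].add(device)
    return dict(groups)
```

With the sample data, `group_by_prefix` yields one network containing all three devices (the wrong “cloud”), while `group_by_site_and_prefix` yields two separate networks.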
Monitoring Systems
When I was in Munich for a few days I had the chance to take a look at Allied Telesis’s NMS, which aids you in the discovery, monitoring and provisioning of their product range. I was somewhat impressed by what it did, despite the ugly Solaris 9 CDE looking UI it had. It could effectively help an admin find things, get things done and so forth. The downfall of course is that it only supports Allied Telesis devices, so it has limited deployment. A saying can be born from something like this and we could roll out “If I had a monitoring system for every time I put one of these in… we would be called monitoring-r-us-all-day”.
During my time at various ISP/NSPs I have probably tried all of them: Nagios, Cacti, Netdisco, Zabbix, Zenoss and whatnot. They all work with their own architecture and their own code, and everything about how that architecture is produced is very self-devised. Come to think of it, some people avoid using proper network architecture such as the Cisco models, and this is probably why nothing ever fits perfectly in any of them. But what makes you think that the actual monitoring systems comply with standard network architecture anyway?
At the end of the day network administrators avoid following a by-the-book approach down to the detail because, as we all know, not every guideline will ever fit right in. This is why a monitoring system should follow the administrator and not the other way around. It is the software that has to maintain itself, simply because people are maintaining the network. Topology changes and other actions should not require effort of any kind. This saves time, money and hassle in the case that a proper notification was not made for even the slightest change in a network. It can also help a monitoring system make suggestions and run simulations of how the network will function prior to changes, and so forth.
Monitoring systems fail to do any of the above. They are based around the idea of using ICMP to make sure a machine is alive and SNMP to retrieve statistical and operational information. This seems ever so wrong to me, as it does not fit every environment, lacks L2 information beyond a certain point and doesn’t feel like a clever thing to do. Some people also use syslog to monitor from a central location, which is very useful.
I used to master the design and deployment of Nagios. In fact I loved using it for many years and it did what it did perfectly. If you think you require maintenance to do things nowadays, Nagios four years ago was something of a full time job (and I think it still is - I don’t use it anymore). Then I tried a few others which I found immature, sloppy and overall unusable. Then I tried modifying Cacti to fit large networks with additional functionality such as operational notifications, syslogging and more. It failed to serve this greater purpose, and this is not a sad event since Cacti was meant for something a little less demanding than the full monitoring of an NSP network.
Zenoss is the winner (for now)
I finally reached my verdict. Zenoss is currently the most ideal tool I’ve seen for the job. Do not misunderstand me though. Zenoss is far from meeting the standard I am asking for. It does not handle what it does in an intelligent manner, just with a few clever bits and pieces, but I can assure you it’s the best there is today. I do however wish them the best of luck, because they have managed to beat the best of what is available to you today in a fraction of the time that others have worked on their own projects. This leads us to believe that perhaps one day it will be able to suit all networks and all administrators with minimal administrative overhead. It also centralizes system logging (syslog), which is a logging standard widely used today. This aids in pinpointing problems on the fly.
The real problems
The real problem isn’t the software. The first real problem lies with the protocols available and the information that is retrievable from each network device regardless of vendor. Cross-vendor protocols such as SNMP have to become a little more compatible across vendors. Unfortunately SNMP has become such a mess that it just won’t cut it anymore for the science fiction I am dreaming of. Telnet/SSH access to devices is one way of retrieving information, but it still isn’t very helpful if you have a variety of devices connected to your network. CDP is an excellent approach to learning more about your network, but it is still mostly unavailable on anything but a Cisco device.
The next big problem that has to be addressed is the non-compliance of administrators with network architecture standards. Standards such as AIN and SONA are not followed by the book, and some administrators end up with a big mess that only they know how to navigate, since they set it up. The slang term “complicated network” is not something you want to hear coming out of your network administrator today.
In parallel to network architectures not being followed by administrators in their networks, network architecture design plans have to be implemented at a software level to give your monitoring system the ability to know what the standard is for a network design. This helps the software know whether what you are doing is right or wrong and what approach the network takes. There are fewer network architectures out there than there are hardware vendors. It is much easier to implement the fishing rod than to actually have to distinguish between each fish.
Possible solutions
As far as protocols are concerned, things are changing for the better. Since we are all only a step away from newer technologies such as IPv6, it is only natural that replacement and new protocols must come into place to handle the retrieval of information, and this time it has to be something that will allow you to go cross-platform. It saddens me to see that many good protocols are not taking off as they could, but it is completely understandable from a commercial point of view why they do not.
1) Protocols to be standardized and enforced in all network aware products
- LLDP - Link Layer Discovery Protocol - a layer 2 protocol that shares information regarding a network aware device which is directly connected to the device in question.
- LLDP-MED - Link Layer Discovery Protocol - Media Endpoint Discovery is an enhancement to LLDP that enables additional functionality and features such as inventory-enabled features, location discovery for emergency services and others.
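Part of what makes LLDP attractive is how simple it is on the wire: an LLDP frame is a sequence of TLVs, each with a 2-byte header holding a 7-bit type and a 9-bit length, ending with an End TLV of type 0. Below is a minimal sketch of a parser for that structure. The sample frame is built by hand with made-up values (MAC address, port name, system name), and the helper names are my own; capturing real frames off the network is not covered.

```python
import struct

TLV_NAMES = {0: "End", 1: "Chassis ID", 2: "Port ID", 3: "TTL", 5: "System Name"}


def make_tlv(tlv_type, value):
    """Build one TLV: 7-bit type and 9-bit length packed into 2 bytes."""
    header = (tlv_type << 9) | len(value)
    return struct.pack("!H", header) + value


def parse_lldpdu(payload):
    """Split an LLDPDU payload into (type, value) pairs, stopping at End."""
    tlvs = []
    offset = 0
    while offset + 2 <= len(payload):
        (header,) = struct.unpack_from("!H", payload, offset)
        tlv_type = header >> 9
        length = header & 0x1FF
        value = payload[offset + 2 : offset + 2 + length]
        offset += 2 + length
        if tlv_type == 0:  # End of LLDPDU
            break
        tlvs.append((tlv_type, value))
    return tlvs


# Hand-built sample frame with invented values.
SAMPLE = (
    make_tlv(1, b"\x04" + bytes.fromhex("001122334455"))  # chassis subtype 4: MAC address
    + make_tlv(2, b"\x05" + b"eth0")  # port subtype 5: interface name
    + make_tlv(3, struct.pack("!H", 120))  # TTL in seconds
    + make_tlv(5, b"sw1")  # system name
    + make_tlv(0, b"")
)
PARSED = parse_lldpdu(SAMPLE)
```

Every vendor that implements LLDP emits the same mandatory Chassis ID, Port ID and TTL TLVs, which is exactly the kind of common ground SNMP MIBs never gave us.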
2) Embedded Network Architecture logic in software
The point I am trying to make here is that, at the application level, it would be nice to see proper monitoring systems with embedded network architecture logic, in order to add features no other monitoring system offers. With the addition of standard network architecture logic in a monitoring system, some of the “clever” benefits you can get are:
- Effects of making a change to your network at L2 & L3 prior to making the change (Simulation)
- Suggestions on how to optimize the network
- Proper standardization of your network architecture
- Easier addition of new network nodes/devices
- Hierarchies and dependencies
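As a small illustration of the simulation and dependency items in the list above, here is a sketch that answers “what would I lose sight of if this device went away?” using nothing more than a topology graph and a breadth-first search from the monitoring station. The topology and node names are invented, and a real system would have to build the graph from discovered data and handle links as well as nodes.

```python
from collections import deque

# Hypothetical topology: device -> directly connected devices.
TOPOLOGY = {
    "nms": {"core1"},
    "core1": {"nms", "dist1", "dist2"},
    "dist1": {"core1", "access1"},
    "dist2": {"core1", "access1", "access2"},
    "access1": {"dist1", "dist2"},
    "access2": {"dist2"},
}


def reachable(topology, start, removed=frozenset()):
    """Breadth-first search from start, ignoring any removed devices."""
    if start in removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in topology.get(node, ()):
            if peer not in seen and peer not in removed:
                seen.add(peer)
                queue.append(peer)
    return seen


def impact_of_removing(topology, start, node):
    """Devices that become unreachable from start if node is removed."""
    before = reachable(topology, start)
    after = reachable(topology, start, removed={node})
    return before - after - {node}
```

With this sample graph, removing `dist1` costs nothing because `access1` is dual-homed, removing `dist2` isolates `access2`, and removing `core1` isolates everything behind it. That is the hierarchy and dependency information a monitoring system could derive on its own.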
At the time of this article, all monitoring systems known to man are simply based on a structure that suits the network administrator. There is no standardization and no logic that helps architecture. LLDP and its extensions, in combination with better standardization of SNMP, can change all of this. While researching information for this article I stumbled across an article regarding LLDP on Cisco.com which I would like to share with you.
http://www.cisco.com/en/US/technologies/tk652/tk701/technologies_white_paper0900aecd804cd46d.html
The article refers to LLDP-MED, its feature set and how it can help the network administrator do things that were limited to Cisco devices in the past.
To conclude my thoughts, I wish I had more programming skills to look into this further, but unfortunately they are limited (at least for the time being). Your comments and thoughts on network monitoring are appreciated and welcomed.
Hi there, quite an interesting article, thanks for that. But it seems that you have mentioned only Zenoss. Take a look at Nagios at http://www.nagios.org or Dotcom Monitor at http://www.dotcom-monitor.com The first is open source software and has a lot of features, but you have to possess tech skills to set it up. The second one is a free and paid online service, very powerful.
I don’t think that you will ever get any complex network which can be monitored automatically without tweaking. The only way to achieve that is by using standardized network designs, which won’t happen IMO.
Also I want to say that auto-linking of various words to tags is quite distracting.
Thank you for your feedback.
Peter, I have had a lot of experience with Nagios and I have also successfully deployed scripts in the past that allow you to ensure proper architecture with Nagios based on real network architecture (thank god for CDP, SNMP and all the rest).
Unfortunately the problem is in the software. Nagios is time consuming to configure and its structure does not allow you to define many of today’s standardized networking practices. As for Dotcom Monitor, well, you cannot deploy a monitoring system for an NSP grade network using an online service now, can you?
Hazard, first off please allow me to apologize on behalf of the WP plugin that tags the site. As for complex network topology mapping and monitoring, I believe it can be done by bringing the “wrongs” from the software, in a corrected form, into the lower layers in the form of a standard protocol, much like LLDP can do for you today but with a little bit of “art” in it. I am only referring to the same kind of automation that exists in many functions in IP.
A project that deals with such automated features is ANA (Autonomic Network Architecture) which I have looked into recently and seems to promise quite a few automated features to some approaches on IP. The same thing can be applicable to a lower layer protocol that would assist your network in keeping its socks up (including the administrator) through the use of intelligent network monitoring and mapping.
First of all thanks for using Zenoss, it sounds like you’re quite the demanding sysadmin, I hope we can continue to evolve to meet your needs. While Zenoss doesn’t provide every feature you mention, our goal is to provide a flexible, easy-to-administer platform that suits the needs of most admins and can be extended to provide whatever may be missing. So thanks for the feedback and please stay active in the Community and let us know what we can do to continue to be your monitoring platform of choice.
Thanks,
Matt Ray
Zenoss Community Manager
mray@zenoss.com
Did you have a look at NAV - Network Administration Visualized?
Have a look at http://metanav.uninett.no/
regards