The objective of this project is to develop a system for monitoring and classifying massive Internet traffic, able to measure and identify in real time the applications carried by each data connection, with high accuracy and very low computational cost, by relying on a set of machine learning algorithms for prediction. The main characteristics of the system are the following:
- High accuracy. The system classifies applications using machine learning algorithms applied to sampled information. It adapts to the characteristics of the traffic in an initial training phase, with the possibility of later retraining rounds in which accuracy is evaluated.
- Very low computational cost. Normal operation of the system requires no capture or inspection of packet contents (Deep Packet Inspection, DPI).
- High scalability. The system can process massive traffic volumes on high-capacity links (> 10 Gb/s), because it is fed with NetFlow flows sampled at a configurable rate (1/100, 1/1000, ...). Sampling also reduces the computational load imposed on the routers.
- Real-time monitoring. The system can process traffic at intervals as short as 1 minute, allowing the detection of anomalies and of transient congestion in the network or in specific applications.
- Distributed monitoring. The system can process NetFlow flows generated at various points in the network.
- Low impact on network configuration. Since NetFlow is already deployed and widely used in the networks of operators and large companies, the impact on network configuration is minimal. Because it is a standard, it is also independent of the network equipment vendor.
- Confidentiality. Because the system works with NetFlow and machine learning techniques rather than DPI-type techniques, it does not need to access the contents of the traffic, i.e., it does not process the "user data" circulating on the network. The system thereby guarantees the confidentiality of the information and respects the principle of secrecy of correspondence.
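As an illustration of how these characteristics fit together, the following is a minimal sketch in Python, not the project's actual implementation. It assumes a hypothetical FlowRecord type holding a few NetFlow header fields, uses a simple nearest-centroid classifier as a stand-in for the machine learning algorithms (which the project description does not name), and scales sampled byte counts by the sampling interval (100 for a 1/100 rate) to estimate per-application volume in 1-minute bins. The feature choice and all names are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist


@dataclass
class FlowRecord:
    # Hypothetical subset of NetFlow header fields; no payload is stored.
    start: float    # flow start time, in seconds
    protocol: int   # IP protocol number (6 = TCP, 17 = UDP)
    dst_port: int   # destination port
    packets: int    # sampled packet count (assumed >= 1)
    octets: int     # sampled byte count


def features(flow):
    """Map a record to a small numeric vector (illustrative feature choice)."""
    return (
        1.0 if flow.protocol == 6 else 0.0,      # TCP or not
        1.0 if flow.dst_port < 1024 else 0.0,    # well-known port or not
        flow.octets / flow.packets / 1500.0,     # mean packet size, roughly 0..1
    )


def train(labelled):
    """Compute one centroid (mean feature vector) per application label."""
    grouped = defaultdict(list)
    for flow, label in labelled:
        grouped[label].append(features(flow))
    return {
        label: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for label, vecs in grouped.items()
    }


def predict(centroids, flow):
    """Return the label whose centroid is closest to the flow's features."""
    vec = features(flow)
    return min(centroids, key=lambda label: dist(centroids[label], vec))


def bytes_per_minute(centroids, flows, sampling_interval):
    """Estimate bytes per (minute, application) from sampled flows.

    With 1-in-N sampling, each sampled byte stands for about N real bytes,
    so counters are multiplied by sampling_interval (N).
    """
    totals = defaultdict(int)
    for flow in flows:
        minute = int(flow.start // 60)
        totals[(minute, predict(centroids, flow))] += flow.octets * sampling_interval
    return dict(totals)


# Toy labelled flows for the initial training phase (made-up values).
training = [
    (FlowRecord(0.0, 6, 443, 10, 12000), "web"),
    (FlowRecord(0.0, 6, 80, 8, 9600), "web"),
    (FlowRecord(0.0, 17, 53, 2, 160), "dns"),
    (FlowRecord(0.0, 17, 53, 1, 90), "dns"),
]
centroids = train(training)

# Toy sampled flows, as if exported at a 1/100 sampling rate.
observed = [
    FlowRecord(5.0, 6, 443, 5, 5500),
    FlowRecord(30.0, 6, 8080, 4, 4000),
    FlowRecord(70.0, 17, 53, 3, 210),
]
estimate = bytes_per_minute(centroids, observed, sampling_interval=100)
```

The sketch reads only header-level counters, which is what keeps the cost low and the payload untouched. A real deployment would replace the toy classifier and features with models trained and periodically re-evaluated on labelled traffic.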
Once developed, the system will undergo a proof of concept, for which it will be deployed at a large exchange point carrying real traffic, provided by the Centre de Supercomputació de Catalunya.
Finally, we will study whether the system could be exploited commercially, since information on the classification of Internet traffic is, due to its constant growth and evolution, of great interest to various types of customers, such as telecommunications operators, institutions that manage academic networks, and large enterprises.