Click here to close now.

Welcome!

Java Authors: Liz McMillan, Carmen Gonzalez, Tim Hinds, Pat Romanski, Lori MacVittie

Related Topics: Java, SOA & WOA

Java: Article

Why Rule-Based Log Correlation Is Almost a Good Idea... (Part 5)

Performance tolls - why you can't correlate 100% of your logs

Performance Tolls - Why you cannot correlate 100% of your logs...?
Compounding the combinatory explosion in the number of static-based correlation rules, it is impossible to correlate 100% of all your logs, it is just too expensive and not practical. Read on...

A correlation engine works really hard, even when dealing with a limited set of scenarios:

  • Each scenario requires lots of rules and exceptions, and most of these rules need to be interpreted further as dozen, if not hundred of simple checks and tests. For example, you may want to flag loops with a simple rule such as "IP Origin" = "IP Destination". If you have 1 000 logs this means that for each log you need to do 1 000 tests. Imagine having a million logs, a trillion logs, which is not uncommon on a medium sized infrastructure over a couple days.
  • Each scenario requires state information to be kept and managed for hours, or even days. For example, you may want to be alerted if A happens then within 1 day B happens and then within 1 day C happens. This means that lots of state information needs to be kept over a 2-day span, the engine is constantly monitoring for A to happen, then as soon as A happens the engine starts the clock and monitors for B, and if B happens within that day then a new countdown starts to look for C, all the meanwhile also constantly monitoring for A so as to start a new A then B then C condition... If A happens a lot, and B also happens frequently after A then the engine will need to store lots of A then B state information while monitoring for all the required C.

This requires:

  • Extremely powerful servers needed to run these rules and their corresponding "if then else" tests and checks when there are lots of logs.
  • Vast amounts of temporary memory needed to keep state information in memory and speed up processing while avoiding swapping to disks.

So the correlation engine needs to be fed very carefully, don't give it more than what it can chew or it will essentially run out of resources and die.

Knowing which logs need to be part of scope is an important part of tuning a correlation engine.

No, you cannot ask your correlation solution to manage all of your logs. It's not designed for that. Managing only the most critical ones is already a daunting task for it.

Correlation load for one simple correlation rule over one hour
As an example, let's have a closer look at Attack Scenario 1 - Identity Theft that we elaborated on, and put a threshold of 1 hour for flagging Identity Theft.

Assumptions - at that time, we have:

o 5 logins/sec average from local logins

o 5 logins/sec average from VPN logins

o 1 000 events/sec average total infrastructure

o Logs kept for a total of 1 week - for reporting etc.

Total data space of 1000*3600*24*7 = 604 800 000 events

For each local login event - which is 5 times per second

o Look in the VPN login events - for the past 3600 seconds - and check if that same person logged in through VPN

Total data space of 3600*5 = 18 000 events VPN logins

Total of 18 000 * 5 = 90 000 checks per second

Size of that 1 hour data space in which to perform the 90 000 checks

o 1000 events/sec * 3600 = 3 600 000 events

So, for this one correlation rule:

  • Among the 604 million records total for the past week
  • Among the 3.6 million records for the past hour
  • The Correlation Engine needs to perform:

Reads

90 000 database reads and checks per second

Writes

While at the same time doing 1000 record writes and inserts per second

  • And at the same time, continue collecting, parsing and normalizing, reporting, alerting, "signing of logs", housekeeping and allowing users to log in and use the tool etc etc...

Correlation load for one complex global rule over one day
Imagine now a complex correlation rule that requires the engine to look 100 times per second, and to do this over the full 1-day sliding window.

The assumptions are then:

  • Full data space

Same 1 week = 604 800 000 events

  • One-day sliding window data space

1000 * 3600 * 24 = 86 400 000 events

  • For each correlation rule, we are doing

Number of tests = 100 times per second, look into each record in the 1-day sliding window data space

100 * 86 400 000 = 8 640 000 000 tests per second

That's 8 billion reads per second!!!
Sure you can use tricks and shortcuts to avoid doing all the 8 billion checks, but that gives an idea of the searching power required... for this 1 scenario!!!

Imagine having to enrich this 1 correlation rule with geolocalization information, or somehow putting a dynamic dimension to it.

Imagine having 100's or 1000's of correlation rules, what would be the impact on number of database reads and load?

This is just not practical, and you cannot always solve this problem by throwing more hardware at it.

Did you know that APT attacks can last weeks and months? Stay tuned for what this means for static rule based correlation...

More Stories By Gorka Sadowski

Gorka is a natural born entrepreneur with a deep understanding of Technology, IT Security and how these create value in the Marketplace. He is today offering innovative European startups the opportunity to benefit from the Silicon Valley ecosystem accelerators. Gorka spent the last 20 years initiating, building and growing businesses that provide technology solutions to the Industry. From General Manager Spain, Italy and Portugal for LogLogic, defining Next Generation Log Management and Security Forensics, to Director Unisys France, bringing Cloud Security service offerings to the market, from Director of Emerging Technologies at NetScreen, defining Next Generation Firewall, to Director of Performance Engineering at INS, removing WAN and Internet bottlenecks, Gorka has always been involved in innovative Technology and IT Security solutions, creating successful Business Units within established Groups and helping launch breakthrough startups such as KOLA Kids OnLine America, a social network for safe computing for children, SourceFire, a leading network security solution provider, or Ibixis, a boutique European business accelerator.

@ThingsExpo Stories
Wearable devices have come of age. The primary applications of wearables so far have been "the Quantified Self" or the tracking of one's fitness and health status. We propose the evolution of wearables into social and emotional communication devices. Our BE(tm) sensor uses light to visualize the skin conductance response. Our sensors are very inexpensive and can be massively distributed to audiences or groups of any size, in order to gauge reactions to performances, video, or any kind of presentation. In her session at @ThingsExpo, Jocelyn Scheirer, CEO & Founder of Bionolux, will discuss ho...
Roberto Medrano, Executive Vice President at SOA Software, had reached 30,000 page views on his home page - http://RobertoMedrano.SYS-CON.com/ - on the SYS-CON family of online magazines, which includes Cloud Computing Journal, Internet of Things Journal, Big Data Journal, and SOA World Magazine. He is a recognized executive in the information technology fields of SOA, internet security, governance, and compliance. He has extensive experience with both start-ups and large companies, having been involved at the beginning of four IT industries: EDA, Open Systems, Computer Security and now SOA.
When it comes to the Internet of Things, hooking up will get you only so far. If you want customers to commit, you need to go beyond simply connecting products. You need to use the devices themselves to transform how you engage with every customer and how you manage the entire product lifecycle. In his session at @ThingsExpo, Sean Lorenz, Technical Product Manager for Xively at LogMeIn, will show how “product relationship management” can help you leverage your connected devices and the data they generate about customer usage and product performance to deliver extremely compelling and reliabl...
SYS-CON Events announced today that GENBAND, a leading developer of real time communications software solutions, has been named “Silver Sponsor” of SYS-CON's WebRTC Summit, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. The GENBAND team will be on hand to demonstrate their newest product, Kandy. Kandy is a communications Platform-as-a-Service (PaaS) that enables companies to seamlessly integrate more human communications into their Web and mobile applications - creating more engaging experiences for their customers and boosting collaboration and productiv...
From telemedicine to smart cars, digital homes and industrial monitoring, the explosive growth of IoT has created exciting new business opportunities for real time calls and messaging. In his session at @ThingsExpo, Ivelin Ivanov, CEO and Co-Founder of Telestax, shared some of the new revenue sources that IoT created for Restcomm – the open source telephony platform from Telestax. Ivelin Ivanov is a technology entrepreneur who founded Mobicents, an Open Source VoIP Platform, to help create, deploy, and manage applications integrating voice, video and data. He is the co-founder of TeleStax, a...
The industrial software market has treated data with the mentality of “collect everything now, worry about how to use it later.” We now find ourselves buried in data, with the pervasive connectivity of the (Industrial) Internet of Things only piling on more numbers. There’s too much data and not enough information. In his session at @ThingsExpo, Bob Gates, Global Marketing Director, GE’s Intelligent Platforms business, to discuss how realizing the power of IoT, software developers are now focused on understanding how industrial data can create intelligence for industrial operations. Imagine ...
The 3rd International @ThingsExpo, co-located with the 16th International Cloud Expo - to be held June 9-11, 2015, at the Javits Center in New York City, NY - is now accepting submissions to demo smart cars on the Expo Floor. Smart car sponsorship benefits include general brand exposure and increasing engagement with the developer ecosystem.
Operational Hadoop and the Lambda Architecture for Streaming Data Apache Hadoop is emerging as a distributed platform for handling large and fast incoming streams of data. Predictive maintenance, supply chain optimization, and Internet-of-Things analysis are examples where Hadoop provides the scalable storage, processing, and analytics platform to gain meaningful insights from granular data that is typically only valuable from a large-scale, aggregate view. One architecture useful for capturing and analyzing streaming data is the Lambda Architecture, representing a model of how to analyze rea...
SYS-CON Events announced today that Vitria Technology, Inc. will exhibit at SYS-CON’s @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Vitria will showcase the company’s new IoT Analytics Platform through live demonstrations at booth #330. Vitria’s IoT Analytics Platform, fully integrated and powered by an operational intelligence engine, enables customers to rapidly build and operationalize advanced analytics to deliver timely business outcomes for use cases across the industrial, enterprise, and consumer segments.
When it comes to the Internet of Things, hooking up will get you only so far. If you want customers to commit, you need to go beyond simply connecting products. You need to use the devices themselves to transform how you engage with every customer and how you manage the entire product lifecycle. In his session at @ThingsExpo, Sean Lorenz, Technical Product Manager for Xively at LogMeIn, will show how “product relationship management” can help you leverage your connected devices and the data they generate about customer usage and product performance to deliver extremely compelling and reliabl...
SYS-CON Events announced today that SoftLayer, an IBM company, has been named “Gold Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place June 9-11, 2015 at the Javits Center in New York City, NY, and the 17th International Cloud Expo®, which will take place November 3–5, 2015 at the Santa Clara Convention Center in Santa Clara, CA. SoftLayer operates a global cloud infrastructure platform built for Internet scale. With a global footprint of data centers and network points of presence, SoftLayer provides infrastructure as a service to leading-edge customers ranging from ...
The explosion of connected devices / sensors is creating an ever-expanding set of new and valuable data. In parallel the emerging capability of Big Data technologies to store, access, analyze, and react to this data is producing changes in business models under the umbrella of the Internet of Things (IoT). In particular within the Insurance industry, IoT appears positioned to enable deep changes by altering relationships between insurers, distributors, and the insured. In his session at @ThingsExpo, Michael Sick, a Senior Manager and Big Data Architect within Ernst and Young's Financial Servi...
SYS-CON Events announced today that Open Data Centers (ODC), a carrier-neutral colocation provider, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place June 9-11, 2015, at the Javits Center in New York City, NY. Open Data Centers is a carrier-neutral data center operator in New Jersey and New York City offering alternative connectivity options for carriers, service providers and enterprise customers.
The IoT market is projected to be $1.9 trillion tidal wave that’s bigger than the combined market for smartphones, tablets and PCs. While IoT is widely discussed, what not being talked about are the monetization opportunities that are created from ubiquitous connectivity and the ensuing avalanche of data. While we cannot foresee every service that the IoT will enable, we should future-proof operations by preparing to monetize them with extremely agile systems.
There’s Big Data, then there’s really Big Data from the Internet of Things. IoT is evolving to include many data possibilities like new types of event, log and network data. The volumes are enormous, generating tens of billions of logs per day, which raise data challenges. Early IoT deployments are relying heavily on both the cloud and managed service providers to navigate these challenges. Learn about IoT, Big Data and deployments processing massive data volumes from wearables, utilities and other machines.
SYS-CON Events announced today that CodeFutures, a leading supplier of database performance tools, has been named a “Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. CodeFutures is an independent software vendor focused on providing tools that deliver database performance tools that increase productivity during database development and increase database performance and scalability during production.
The explosion of connected devices / sensors is creating an ever-expanding set of new and valuable data. In parallel the emerging capability of Big Data technologies to store, access, analyze, and react to this data is producing changes in business models under the umbrella of the Internet of Things (IoT). In particular within the Insurance industry, IoT appears positioned to enable deep changes by altering relationships between insurers, distributors, and the insured. In his session at @ThingsExpo, Michael Sick, a Senior Manager and Big Data Architect within Ernst and Young's Financial Servi...
“In the past year we've seen a lot of stabilization of WebRTC. You can now use it in production with a far greater degree of certainty. A lot of the real developments in the past year have been in things like the data channel, which will enable a whole new type of application," explained Peter Dunkley, Technical Director at Acision, in this SYS-CON.tv interview at @ThingsExpo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
SYS-CON Events announced today that Intelligent Systems Services will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Established in 1994, Intelligent Systems Services Inc. is located near Washington, DC, with representatives and partners nationwide. ISS’s well-established track record is based on the continuous pursuit of excellence in designing, implementing and supporting nationwide clients’ mission-critical systems. ISS has completed many successful projects in Healthcare, Commercial, Manufacturing, ...
PubNub on Monday has announced that it is partnering with IBM to bring its sophisticated real-time data streaming and messaging capabilities to Bluemix, IBM’s cloud development platform. “Today’s app and connected devices require an always-on connection, but building a secure, scalable solution from the ground up is time consuming, resource intensive, and error-prone,” said Todd Greene, CEO of PubNub. “PubNub enables web, mobile and IoT developers building apps on IBM Bluemix to quickly add scalable realtime functionality with minimal effort and cost.”