Big Lake = Big Value

“Getting value out of your data lake”

For the first time in the security industry, we are seeing security operations teams and data analytics teams working together. This positive development illustrates that security data has value to everyone and can be shared throughout a company.

It is important to take control of your data destiny, because whoever holds your data determines both what it costs you and how much control you keep. If you give your security or operational data to a vendor who shows you impressive results but controls the data, you will depend on that vendor for data retention (for years) and for analytics, which is not ideal.

In our previous post “Data Transformers to the Rescue”, we discussed how to move and transform data in the era of big data analytics and AI. Today, we will provide you with actionable ideas on how to extract real value from your data lake, particularly in security. As we mentioned in our earlier post “Will Your SIEM Survive?”, it is crucial for the SIEM to leverage the data lake and serve as the primary means for logging more data sources to enhance visibility for threat detection, hunting, and forensics.

| Detection | Description | ML Type | Difficulty | Examples |
|---|---|---|---|---|
| Network/Firewall | Summarizing data for ingestion into Sentinel/SIEM | None | 2/10 | Hourly summaries of firewall and endpoint network/traffic logs. 95%+ efficiency of original data ingestion size. Src/dst and bytes in/out |
| Anomalous network traffic | Baselining and finding deviations in FW traffic | Unsupervised Anomaly Detection | 3/10 | Data Exfil: anomalous outbound bytes for an IP |
| Anomalous office activity | Baselining and finding deviations in office activity. Most useful if it can be correlated with non-MS user logs. | Unsupervised Anomaly Detection | 4/10 | Data Exfil: unusual number of files downloaded for a user + anomalous network traffic |
| Correlation from Human Feedback | Identifying logs related to confirmed incidents | Unsupervised Correlation | 6/10 | Incident classified as TP, logs sent back to data lake, data lake identifies logs that are similar and brings them up for analysis |
| Copilot | Prompting GPT to tell story from data | LLM | 8/10 | Prompt: What suspicious behavior did USER perform on Tuesday? Answer: 1. Run KQL to find all of the user's activity 2. Feed results into GPT 3. GPT returns suspicious rows |

The above table offers possibilities that arise from improved logging, which enables better utilization of the rapidly expanding power of machine learning, anomaly detection, and big data analytics.
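As a rough illustration of the first row of the table (hourly summaries with src/dst and bytes in/out), here is a minimal Python sketch. The record layout and field names (`time`, `src`, `dst`, `bytes_in`, `bytes_out`) are assumptions made up for this example, not the schema of any particular firewall or of Sentinel; in practice this kind of rollup would more likely run inside your lake or pipeline tooling.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw firewall records; the field names are assumptions for illustration.
RAW_LOGS = [
    {"time": "2024-01-09T10:02:11", "src": "10.0.0.5", "dst": "203.0.113.7", "bytes_in": 1200, "bytes_out": 300},
    {"time": "2024-01-09T10:15:40", "src": "10.0.0.5", "dst": "203.0.113.7", "bytes_in": 800, "bytes_out": 200},
    {"time": "2024-01-09T10:47:03", "src": "10.0.0.5", "dst": "198.51.100.9", "bytes_in": 100, "bytes_out": 50},
    {"time": "2024-01-09T11:05:00", "src": "10.0.0.5", "dst": "203.0.113.7", "bytes_in": 500, "bytes_out": 100},
]


def summarize_hourly(records):
    """Collapse raw records into one row per (hour, src, dst) with byte totals."""
    totals = defaultdict(lambda: {"bytes_in": 0, "bytes_out": 0, "events": 0})
    for rec in records:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(rec["time"]).replace(minute=0, second=0, microsecond=0)
        key = (hour.isoformat(), rec["src"], rec["dst"])
        totals[key]["bytes_in"] += rec["bytes_in"]
        totals[key]["bytes_out"] += rec["bytes_out"]
        totals[key]["events"] += 1
    # Emit rows in a stable order so downstream comparisons are repeatable.
    return [
        {"hour": hour, "src": src, "dst": dst, **agg}
        for (hour, src, dst), agg in sorted(totals.items())
    ]


if __name__ == "__main__":
    for row in summarize_hourly(RAW_LOGS):
        print(row)
```

Only the summary rows would be sent to the SIEM, while the raw records stay in the lake for hunting and forensics. How much volume this saves depends entirely on how repetitive your traffic is.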

We hope you see that this represents a monumental shift from detecting indicators of compromise (IOCs) to detecting behaviors. This is an important departure from how security has traditionally operated.

Data models can move faster than we can discover IOCs, or better yet, models can create IOCs in near real-time and feed them to our endpoint and network tools.
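To make the idea of a model producing indicators a little more concrete, the sketch below baselines hourly outbound bytes per IP and flags large deviations as candidate indicators. It uses a simple median and median-absolute-deviation score as a stand-in for whatever anomaly detection model you would actually deploy; the function name, the data shapes, the threshold of 5, and the minimum history length are all assumptions chosen for illustration.

```python
from statistics import median


def flag_anomalous_ips(history, current, threshold=5.0, min_history=5):
    """Return candidate indicators for IPs whose outbound bytes far exceed their baseline.

    history: dict mapping IP -> list of past hourly outbound byte totals (assumed shape)
    current: dict mapping IP -> outbound bytes observed in the latest hour
    """
    indicators = []
    for ip, observed in current.items():
        baseline = history.get(ip, [])
        if len(baseline) < min_history:
            continue  # not enough history to form a baseline
        med = median(baseline)
        mad = median([abs(x - med) for x in baseline])
        scale = max(mad, 1.0)  # guard against division by zero on a flat baseline
        score = (observed - med) / scale
        if score > threshold:  # only unusually HIGH outbound volume is of interest here
            indicators.append(
                {"ip": ip, "observed": observed, "baseline_median": med, "score": round(score, 1)}
            )
    # Highest-scoring deviations first.
    return sorted(indicators, key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data.
    history = {
        "10.0.0.5": [1000, 1100, 900, 1050, 950, 1000],
        "10.0.0.8": [5000, 5200, 4800, 5100, 4900, 5000],
    }
    current = {"10.0.0.5": 50000, "10.0.0.8": 5150, "10.0.0.9": 99999}
    for hit in flag_anomalous_ips(history, current):
        print(hit)
```

The flagged IPs are only candidates. Before feeding them to endpoint or network tools, you would want analyst review or additional corroborating signals, since a single statistical deviation is not proof of exfiltration.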

  • Security and data teams need to collaborate.
  • Log data should be shared to avoid duplication and reduce costs.
  • It is important to retain control over your data.
  • Long-term data retention should be cost-efficient.
  • Data ingestion routing is critical for control and is key to placing data in the security tool, the lake, or both.
  • We need to log more data sources, especially applications and SaaS logs.
  • Big data analytics will transform our security defense and remediation capabilities.
  • Behavioral detection will create Indicators of Compromise (IOCs) and Indicators of Attack (IOAs), reversing the traditional approach to threat detection and allowing for faster detections.
  • AI will drive new methods for identifying bad/malicious behaviors, from external actors to risky insiders.
  • Microsoft Fabric will ease the implementation and management of big data analytics and data lakes. Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place.
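The point in the list above about ingestion routing can be sketched in a few lines of Python. This is only an illustration of the decision itself: the source names, the routing table, and the "lake by default" rule are hypothetical choices for this example, and a real deployment would express the same logic in whatever data pipeline or collection tooling you use.

```python
# Hypothetical routing table: log source type -> destinations.
# High-signal data goes to the SIEM; bulky raw data goes to the lake only.
ROUTES = {
    "edr_alert": {"siem", "lake"},
    "firewall_summary": {"siem"},
    "firewall_raw": {"lake"},
    "saas_audit": {"lake"},
}

# Assumption: anything unrecognized is kept in the lake rather than dropped.
DEFAULT_DESTINATIONS = {"lake"}


def route(record):
    """Return the set of destinations for a single log record."""
    return ROUTES.get(record.get("source"), DEFAULT_DESTINATIONS)


def dispatch(records):
    """Group records by destination according to the routing table."""
    outputs = {"siem": [], "lake": []}
    for rec in records:
        for dest in sorted(route(rec)):
            outputs[dest].append(rec)
    return outputs


if __name__ == "__main__":
    sample = [
        {"source": "edr_alert", "msg": "suspicious process"},
        {"source": "firewall_raw", "msg": "allow tcp/443"},
        {"source": "unknown_app", "msg": "login ok"},
    ]
    result = dispatch(sample)
    print(len(result["siem"]), "records to SIEM;", len(result["lake"]), "records to lake")
```

Keeping the routing rules in one place that you own is what gives you control over cost: moving a source from the SIEM to the lake becomes a one-line change rather than a vendor negotiation.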

We hope you are feeling inspired and excited about the future of enterprise logging and security engineering. This is an evolution of capabilities that requires redesigning your data ingestion methods and data management solutions. We encourage you to get started now, as we are confident that you will not regret it. Enterprises that have already adopted these capabilities have seen tremendous value in terms of cost savings and improved efficiency in their security operations.

Be sure to follow this series “Azure Security Data Lake”. There is a lot more to come as we explore the future of security AI and big data analytics.

If you’re interested in automating your security operations, follow our series “Lets Automate Your SOC”.