Implementing Automated File Verification in AWS S3 with SentinelOne and Lambda
Introduction
In today’s security-conscious environment, verifying uploaded files for threats is critical. Organizations that accept user uploads—whether documents, media, software, or data—face significant risk from malware, ransomware, and other file-based threats. Manual review of every uploaded file is impractical at scale. This article presents a production-ready architecture that leverages Amazon S3, AWS Lambda, and SentinelOne to automatically scan files upon upload and move verified files to a secure destination bucket.
The solution presented here demonstrates several important architectural principles. First, it is event-driven rather than batch-oriented, meaning files are scanned immediately upon upload rather than waiting for a scheduled job to run. Second, it implements a fail-secure posture, meaning files are assumed dangerous until proven safe, and any processing errors result in files being treated as potentially infected. Third, it is serverless, requiring no virtual machines to manage or patches to apply. Fourth, it scales from handling a few files per day to millions per day without architecture changes, because AWS adjusts Lambda concurrency automatically. Finally, it is cost-effective, charging only for actual usage with no minimum costs or idle infrastructure charges.
The architecture integrates with SentinelOne, a leading cloud-native endpoint protection platform that provides advanced threat detection capabilities beyond traditional antivirus. By leveraging SentinelOne’s deep analysis capabilities, you get protection against known malware, zero-day exploits, ransomware, and suspicious behavior patterns. The system creates an audit trail of all files processed, their threat status, and when and how they were processed, supporting compliance requirements in highly regulated industries.
Architecture Overview
┌─────────────────────────────────────────────┐
│              User / Application             │
└──────────────────────┬──────────────────────┘
                       │ uploads file
                       ▼
        ┌─────────────────────────────┐
        │      Source S3 Bucket       │
        │    (quarantine-uploads)     │
        └──────────────┬──────────────┘
                       │ ObjectCreated event
                       ▼
        ┌─────────────────────────────┐
        │    S3 Event Notification    │
        │    (SQS/SNS/EventBridge)    │
        └──────────────┬──────────────┘
                       │ triggers
                       ▼
        ┌─────────────────────────────┐
        │       Lambda Function       │
        │      (File Processor)       │
        │  - Download file            │
        │  - Call SentinelOne API     │
        │  - Verify threat status     │
        │  - Move to destination      │
        └──────────────┬──────────────┘
                       │
            ┌──────────┴──────────┐
            ▼                     ▼
    ┌────────────────┐    ┌────────────────┐
    │  SentinelOne   │    │ Verified Files │
    │ Deep Visibility│    │     Bucket     │
    │      API       │    │  (clean-files) │
    └───────┬────────┘    └────────────────┘
            │
            └─── Threat found? Move to quarantine
                 or delete, based on policy
Key Components
1. Source S3 Bucket (Quarantine)
The source S3 bucket serves as the initial entry point for all uploaded files. This bucket acts as a quarantine zone where files are held temporarily until they can be verified for safety. When users or applications upload files to this bucket, S3 automatically generates an ObjectCreated event. This event is the trigger that starts the entire verification pipeline. The bucket should be configured with appropriate access controls to prevent unauthorized downloads, ensuring that only the Lambda function and authorized administrators can access files during the quarantine period. By keeping all uploads in this dedicated bucket, you maintain a clear separation of concerns and can easily monitor and audit all incoming files.
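As an illustration of such access controls, a deny-by-default bucket policy can block downloads by any principal other than the scanner role (defined in the Terraform section below) and an administrator role. This is a sketch, not a drop-in policy: the administrator role ARN is a placeholder you would replace with your own.

```hcl
# Sketch: deny GetObject on quarantined files for everyone except the
# Lambda scanner role and a (placeholder) security-admin role.
resource "aws_s3_bucket_policy" "quarantine_lockdown" {
  bucket = aws_s3_bucket.quarantine_uploads.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyReadExceptScanner"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:GetObject"
      Resource  = "${aws_s3_bucket.quarantine_uploads.arn}/*"
      Condition = {
        StringNotLike = {
          "aws:PrincipalArn" = [
            aws_iam_role.file_scanner_role.arn,
            "arn:aws:iam::111122223333:role/security-admins" # placeholder
          ]
        }
      }
    }]
  })
}
```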
2. Destination S3 Bucket (Clean Files)
Once a file passes the security verification with SentinelOne and is determined to be clean, it needs to be moved to a location where downstream applications can safely access it. The clean files bucket serves as the trusted repository for verified, safe files. This bucket is typically accessed by business applications, reports generation systems, or other services that need to work with known-safe files. You can configure this bucket with appropriate retention policies, backup settings, and access controls. Since files in this bucket have been verified, they can be accessed by a broader set of applications without additional security concerns. The clean files bucket can also be configured with higher availability and durability requirements since these are production files that your business relies on.
3. Lambda Function
The Lambda function is the orchestrator of the entire verification workflow. It serves as the intelligent intermediary between S3 events and the SentinelOne scanning service. When triggered by an S3 event, the Lambda function performs several critical tasks: first, it downloads the file from the source bucket into its execution environment; second, it retrieves the necessary SentinelOne API credentials from AWS Secrets Manager to maintain security; third, it communicates with the SentinelOne Deep Visibility API to scan the file; and finally, based on the scan results, it decides where to move the file. The function is configured with sufficient timeout (300 seconds) and memory (512 MB) to handle large files and API communication latency. It also implements comprehensive error handling, logging, and metadata tagging to ensure that every file’s verification status can be tracked and audited.
4. SentinelOne Deep Visibility API
SentinelOne’s Deep Visibility API provides advanced threat analysis capabilities that go far beyond simple signature-based detection. The API accepts file uploads and performs comprehensive analysis including behavioral analysis, heuristic detection, and comparison against global threat intelligence. The API returns a detailed verdict including threat level classifications (CLEAN, SUSPICIOUS, INFECTED) and specific threat identifications when malware or suspicious code is detected. By integrating with SentinelOne, you gain access to enterprise-grade threat intelligence that is continuously updated as new threats are discovered. The API is highly reliable and performant, making it suitable for processing files in real-time as they are uploaded.
5. EventBridge/S3 Events
S3 bucket events provide the glue that connects your storage infrastructure to the verification pipeline. When a file is uploaded (ObjectCreated event), S3 generates a notification containing metadata about the file including the bucket name, object key, and timestamp. This event is crucial because it automatically triggers the Lambda function without requiring any manual intervention or additional infrastructure. The event notification is lightweight and includes just enough information for Lambda to retrieve and process the file. This event-driven architecture eliminates the need for continuous polling or scheduled jobs, making the system more efficient and responsive. Files are scanned immediately upon upload, providing real-time protection against threats.
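To make the trigger concrete, here is a trimmed-down ObjectCreated notification (most fields omitted, bucket and key are illustrative) and how a handler extracts the bucket and key. Note that S3 URL-encodes object keys in notifications, so keys containing spaces or special characters must be decoded with `unquote_plus`.

```python
from urllib.parse import unquote_plus

# A trimmed-down S3 ObjectCreated notification (many fields omitted).
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "file-quarantine-123456-us-east-1"},
                "object": {"key": "uploads/quarterly+report.pdf", "size": 48213},
            },
        }
    ]
}

def extract_objects(event):
    """Return (bucket, key) pairs from an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded: '+' and %XX escapes must be decoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs

print(extract_objects(sample_event))
# → [('file-quarantine-123456-us-east-1', 'uploads/quarterly report.pdf')]
```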
Terraform Infrastructure
Complete Terraform Configuration
The Terraform configuration provided below represents a production-ready infrastructure deployment. This configuration follows AWS best practices and implements security controls at every layer. The infrastructure is modular and reusable, allowing you to deploy the same configuration across multiple AWS accounts or regions by simply changing the input variables. Before deploying, you’ll need to customize the variables to match your specific SentinelOne setup and AWS environment.
The configuration creates three distinct S3 buckets: the quarantine bucket for incoming files, the verified bucket for clean files, and the infected bucket for files containing threats. Each bucket is individually secured with encryption, versioning, and access controls. The Lambda function is granted minimal permissions required to perform its duties, following the principle of least privilege. All sensitive information, including SentinelOne API credentials, is stored in AWS Secrets Manager, which provides encryption and audit logging.
# variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "sentinelone_api_url" {
  description = "SentinelOne Deep Visibility API URL"
  type        = string
  sensitive   = true
}

variable "sentinelone_api_key" {
  description = "SentinelOne API key"
  type        = string
  sensitive   = true
}

# main.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}
# Source S3 Bucket - Quarantine/Staging
resource "aws_s3_bucket" "quarantine_uploads" {
  bucket = "file-quarantine-${data.aws_caller_identity.current.account_id}-${var.aws_region}"
}

resource "aws_s3_bucket_versioning" "quarantine_uploads" {
  bucket = aws_s3_bucket.quarantine_uploads.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "quarantine_uploads" {
  bucket = aws_s3_bucket.quarantine_uploads.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "quarantine_uploads" {
  bucket                  = aws_s3_bucket.quarantine_uploads.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Destination S3 Bucket - Verified Files
resource "aws_s3_bucket" "verified_files" {
  bucket = "file-verified-${data.aws_caller_identity.current.account_id}-${var.aws_region}"
}

resource "aws_s3_bucket_versioning" "verified_files" {
  bucket = aws_s3_bucket.verified_files.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "verified_files" {
  bucket = aws_s3_bucket.verified_files.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "verified_files" {
  bucket                  = aws_s3_bucket.verified_files.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Quarantine Bucket - For Infected Files
resource "aws_s3_bucket" "infected_files" {
  bucket = "file-infected-${data.aws_caller_identity.current.account_id}-${var.aws_region}"
}

# Encrypt the infected-files bucket too, so all three buckets carry the
# same baseline controls described above.
resource "aws_s3_bucket_server_side_encryption_configuration" "infected_files" {
  bucket = aws_s3_bucket.infected_files.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "infected_files" {
  bucket                  = aws_s3_bucket.infected_files.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
# CloudWatch Log Group for Lambda
resource "aws_cloudwatch_log_group" "file_scanner_logs" {
  name              = "/aws/lambda/file-verification-processor"
  retention_in_days = 30

  tags = {
    Environment = var.environment
    Name        = "file-scanner-logs"
  }
}

# IAM Role for Lambda
resource "aws_iam_role" "file_scanner_role" {
  name = "file-verification-processor-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "lambda.amazonaws.com"
        }
      }
    ]
  })
}

# IAM Policy for Lambda - S3 Access
resource "aws_iam_role_policy" "lambda_s3_policy" {
  name = "lambda-s3-policy"
  role = aws_iam_role.file_scanner_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:GetObjectVersion",
          "s3:ListBucket"
        ]
        Resource = [
          aws_s3_bucket.quarantine_uploads.arn,
          "${aws_s3_bucket.quarantine_uploads.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "s3:PutObject",
          "s3:PutObjectAcl"
        ]
        Resource = [
          "${aws_s3_bucket.verified_files.arn}/*",
          "${aws_s3_bucket.infected_files.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "s3:DeleteObject"
        ]
        Resource = [
          "${aws_s3_bucket.quarantine_uploads.arn}/*"
        ]
      }
    ]
  })
}

# IAM Policy for Lambda - CloudWatch Logs
resource "aws_iam_role_policy" "lambda_logs_policy" {
  name = "lambda-logs-policy"
  role = aws_iam_role.file_scanner_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "arn:aws:logs:*:*:*"
      }
    ]
  })
}

# IAM Policy for Lambda - Secrets Manager
resource "aws_iam_role_policy" "lambda_secrets_policy" {
  name = "lambda-secrets-policy"
  role = aws_iam_role.file_scanner_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "secretsmanager:GetSecretValue"
        ]
        Resource = aws_secretsmanager_secret.sentinelone_credentials.arn
      }
    ]
  })
}

# Store SentinelOne credentials in Secrets Manager
resource "aws_secretsmanager_secret" "sentinelone_credentials" {
  name_prefix = "sentinelone-api-"
  description = "SentinelOne API credentials"
}

resource "aws_secretsmanager_secret_version" "sentinelone_credentials" {
  secret_id = aws_secretsmanager_secret.sentinelone_credentials.id
  secret_string = jsonencode({
    api_url = var.sentinelone_api_url
    api_key = var.sentinelone_api_key
  })
}
# Lambda Function
resource "aws_lambda_function" "file_processor" {
  filename = "lambda_function.zip"
  # Redeploy the function whenever the package contents change.
  source_code_hash = filebase64sha256("lambda_function.zip")
  function_name    = "file-verification-processor"
  role             = aws_iam_role.file_scanner_role.arn
  handler          = "index.lambda_handler"
  runtime          = "python3.11"
  timeout          = 300
  memory_size      = 512

  environment {
    variables = {
      VERIFIED_BUCKET = aws_s3_bucket.verified_files.id
      INFECTED_BUCKET = aws_s3_bucket.infected_files.id
      SECRETS_NAME    = aws_secretsmanager_secret.sentinelone_credentials.name
    }
  }

  depends_on = [
    aws_iam_role_policy.lambda_s3_policy,
    aws_iam_role_policy.lambda_logs_policy,
    aws_iam_role_policy.lambda_secrets_policy
  ]

  tags = {
    Environment = var.environment
    Name        = "file-verification-processor"
  }
}

# S3 Event Notification to Lambda
resource "aws_s3_bucket_notification" "quarantine_notification" {
  bucket = aws_s3_bucket.quarantine_uploads.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.file_processor.arn
    events              = ["s3:ObjectCreated:*"]
  }

  depends_on = [aws_lambda_permission.allow_s3]
}

# Lambda Permission for S3 Invocation
resource "aws_lambda_permission" "allow_s3" {
  statement_id  = "AllowExecutionFromS3Bucket"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.file_processor.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.quarantine_uploads.arn
}

# CloudWatch Log Group for scan metrics
resource "aws_cloudwatch_log_group" "scan_metrics" {
  name              = "/aws/lambda/file-scan-metrics"
  retention_in_days = 30

  tags = {
    Environment = var.environment
  }
}

# Data source for AWS account ID
data "aws_caller_identity" "current" {}

# Outputs
output "source_bucket_name" {
  value       = aws_s3_bucket.quarantine_uploads.id
  description = "Name of the source (quarantine) S3 bucket"
}

output "verified_bucket_name" {
  value       = aws_s3_bucket.verified_files.id
  description = "Name of the verified files S3 bucket"
}

output "infected_bucket_name" {
  value       = aws_s3_bucket.infected_files.id
  description = "Name of the infected files S3 bucket"
}

output "lambda_function_name" {
  value       = aws_lambda_function.file_processor.function_name
  description = "Name of the Lambda function"
}
Lambda Function Implementation
The Lambda function provided below is the core of the file verification workflow. This Python implementation handles all the complexity of downloading files, communicating with SentinelOne, and routing files to their appropriate destinations. The function is designed to be robust and production-ready, with comprehensive error handling that ensures failures are logged and tracked appropriately.
The function uses a dedicated SentinelOneVerifier class to encapsulate all API communication logic. This separation of concerns makes the code more maintainable and testable. The verifier class handles authentication, request formatting, response parsing, and error management. By isolating the SentinelOne integration in its own class, you can easily update the API integration logic without touching the main Lambda handler.
The Lambda handler iterates over the records in each S3 event; S3 typically delivers one record per invocation, so when multiple files are uploaded simultaneously they are processed in parallel by separate Lambda instances. Each file goes through the complete verification workflow: download from the source bucket, scan with SentinelOne, selection of a destination based on the threat status, and finally metadata tagging for future reference.
# index.py - Lambda handler for file verification
import json
import boto3
import logging
import os
import hashlib

import requests
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')
secrets_client = boto3.client('secretsmanager')

logger = logging.getLogger()
logger.setLevel(logging.INFO)

VERIFIED_BUCKET = os.environ.get('VERIFIED_BUCKET')
INFECTED_BUCKET = os.environ.get('INFECTED_BUCKET')
SECRETS_NAME = os.environ.get('SECRETS_NAME')
class SentinelOneVerifier:
    """Handles SentinelOne API integration"""

    def __init__(self, api_url, api_key):
        self.api_url = api_url
        self.api_key = api_key
        # No Content-Type header here: requests sets the correct
        # multipart/form-data boundary itself when `files=` is used.
        self.headers = {
            'Authorization': f'ApiToken {api_key}'
        }

    def scan_file(self, file_path, file_name, file_content):
        """
        Scan file using SentinelOne Deep Visibility API

        Args:
            file_path: S3 path of file
            file_name: Name of the file
            file_content: File content bytes

        Returns:
            dict: {
                'clean': bool,
                'threat_level': str,
                'threats': list,
                'scan_id': str
            }
        """
        try:
            # Calculate file hash for the audit log
            file_hash = hashlib.sha256(file_content).hexdigest()
            logger.info(f"SHA-256 of {file_name}: {file_hash}")

            # Prepare file for API
            files = {
                'file': (file_name, file_content)
            }

            # Send to SentinelOne for analysis
            endpoint = f"{self.api_url}/api/v2/deep-visibility/threat-analysis/files"
            response = requests.post(
                endpoint,
                files=files,
                headers=self.headers,
                timeout=60
            )

            if response.status_code != 200:
                logger.error(f"SentinelOne API error: {response.status_code} - {response.text}")
                # Fail secure: treat as infected if API fails
                return {
                    'clean': False,
                    'threat_level': 'UNKNOWN',
                    'threats': ['API_COMMUNICATION_ERROR'],
                    'scan_id': None,
                    'error': True
                }

            # Parse SentinelOne response
            result = response.json()
            return {
                'clean': result.get('status') == 'CLEAN',
                'threat_level': result.get('threat_level', 'UNKNOWN'),
                'threats': result.get('detections', []),
                'scan_id': result.get('scan_id'),
                'error': False
            }
        except Exception as e:
            logger.error(f"Error scanning file with SentinelOne: {str(e)}")
            # Fail secure
            return {
                'clean': False,
                'threat_level': 'ERROR',
                'threats': [str(e)],
                'scan_id': None,
                'error': True
            }
def get_sentinelone_credentials():
    """Retrieve SentinelOne credentials from Secrets Manager"""
    try:
        response = secrets_client.get_secret_value(SecretId=SECRETS_NAME)
        secret = json.loads(response['SecretString'])
        return secret['api_url'], secret['api_key']
    except ClientError as e:
        logger.error(f"Error retrieving secrets: {str(e)}")
        raise

def move_file(source_bucket, source_key, destination_bucket, destination_key):
    """Move file from source to destination bucket"""
    try:
        # Copy to destination
        s3_client.copy_object(
            CopySource={'Bucket': source_bucket, 'Key': source_key},
            Bucket=destination_bucket,
            Key=destination_key,
            MetadataDirective='COPY'
        )
        # Delete from source
        s3_client.delete_object(
            Bucket=source_bucket,
            Key=source_key
        )
        logger.info(f"File moved from s3://{source_bucket}/{source_key} to s3://{destination_bucket}/{destination_key}")
        return True
    except ClientError as e:
        logger.error(f"Error moving file: {str(e)}")
        return False

def download_file(bucket, key):
    """Download file from S3"""
    try:
        response = s3_client.get_object(Bucket=bucket, Key=key)
        return response['Body'].read()
    except ClientError as e:
        logger.error(f"Error downloading file: {str(e)}")
        return None
def add_scan_metadata(bucket, key, scan_result):
    """Add scan metadata to file object tags"""
    try:
        tag_set = {
            'TagSet': [
                {'Key': 'Scanned', 'Value': 'true'},
                {'Key': 'ScanResult', 'Value': 'CLEAN' if scan_result['clean'] else 'INFECTED'},
                {'Key': 'ThreatLevel', 'Value': scan_result['threat_level']},
                # scan_id may be present but None (e.g. after an API error),
                # and tag values must be strings.
                {'Key': 'ScanID', 'Value': scan_result.get('scan_id') or 'NONE'}
            ]
        }
        s3_client.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging=tag_set
        )
    except ClientError as e:
        logger.error(f"Error adding metadata tags: {str(e)}")
# S3 event notifications URL-encode object keys (e.g. spaces arrive as '+'),
# so keys must be decoded before use.
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    """
    Main Lambda handler for S3 file verification

    Event structure:
    {
        'Records': [
            {
                's3': {
                    'bucket': {'name': 'bucket-name'},
                    'object': {'key': 'file-key'}
                }
            }
        ]
    }
    """
    logger.info(f"Processing event: {json.dumps(event)}")
    try:
        # Get SentinelOne credentials
        api_url, api_key = get_sentinelone_credentials()
        verifier = SentinelOneVerifier(api_url, api_key)

        # Process each S3 record
        results = []
        for record in event.get('Records', []):
            try:
                source_bucket = record['s3']['bucket']['name']
                source_key = unquote_plus(record['s3']['object']['key'])
                logger.info(f"Processing file: s3://{source_bucket}/{source_key}")

                # Download file (empty files are valid, so test against None)
                file_content = download_file(source_bucket, source_key)
                if file_content is None:
                    logger.error(f"Failed to download file: {source_key}")
                    results.append({
                        'file': source_key,
                        'status': 'ERROR',
                        'message': 'Failed to download file'
                    })
                    continue

                # Get file name
                file_name = source_key.split('/')[-1]

                # Scan with SentinelOne
                logger.info(f"Scanning file with SentinelOne: {file_name}")
                scan_result = verifier.scan_file(source_key, file_name, file_content)

                # Determine destination
                if scan_result['clean']:
                    destination_bucket = VERIFIED_BUCKET
                    destination_folder = 'clean-files'
                    status = 'CLEAN'
                else:
                    destination_bucket = INFECTED_BUCKET
                    destination_folder = 'infected-files'
                    status = 'INFECTED'

                # Create destination key
                destination_key = f"{destination_folder}/{os.path.basename(source_key)}"

                # Move file to appropriate bucket
                if move_file(source_bucket, source_key, destination_bucket, destination_key):
                    # Add scan metadata
                    add_scan_metadata(destination_bucket, destination_key, scan_result)
                    logger.info(f"File verification complete - Status: {status}")
                    results.append({
                        'file': source_key,
                        'status': status,
                        'destination': f"s3://{destination_bucket}/{destination_key}",
                        'threat_level': scan_result['threat_level'],
                        'threats': scan_result['threats']
                    })
                else:
                    logger.error(f"Failed to move file: {source_key}")
                    results.append({
                        'file': source_key,
                        'status': 'ERROR',
                        'message': 'Failed to move file to destination'
                    })
            except Exception as e:
                logger.error(f"Error processing record: {str(e)}", exc_info=True)
                results.append({
                    'file': record.get('s3', {}).get('object', {}).get('key', 'UNKNOWN'),
                    'status': 'ERROR',
                    'message': str(e)
                })

        return {
            'statusCode': 200,
            'body': json.dumps({
                'message': 'File verification completed',
                'results': results
            })
        }
    except Exception as e:
        logger.error(f"Fatal error in Lambda handler: {str(e)}", exc_info=True)
        return {
            'statusCode': 500,
            'body': json.dumps({
                'error': 'File verification failed',
                'message': str(e)
            })
        }
Deployment Steps
Deploying this infrastructure requires careful preparation and validation. The process begins with assembling the Lambda deployment package, then uses Terraform to provision all AWS resources, and finally verifies that everything is configured correctly.
1. Prepare Lambda Deployment Package
Before Terraform can deploy your Lambda function, it needs to locate the function code packaged as a ZIP file. The Lambda function depends on external Python libraries (boto3 for AWS services and requests for HTTP communication), so these dependencies must be included in the deployment package. The process involves creating a temporary directory, installing all Python dependencies into that directory using pip, copying your Lambda function code, and finally creating a ZIP archive containing everything.
When you create the ZIP file, the structure is important. The handler specified in the Terraform configuration points to index.lambda_handler, which means Python must be able to find a module called index with a function called lambda_handler. This is typically at the root of the ZIP file. All dependencies must also be at the root level so Python can import them when the Lambda function executes.
# Create deployment directory
mkdir lambda-deployment
cd lambda-deployment
# Create requirements.txt
cat > requirements.txt <<EOF
boto3==1.26.137
requests==2.31.0
EOF
# Install dependencies
pip install -r requirements.txt -t ./
# Copy Lambda function
cp ../index.py ./
# Create ZIP file
zip -r ../lambda_function.zip .
cd ..
2. Deploy Infrastructure
Terraform is an infrastructure-as-code tool that manages the creation and configuration of AWS resources. The Terraform workflow has distinct phases: initialization, planning, and application. During initialization, Terraform downloads the AWS provider plugin and prepares your working directory. During planning, Terraform compares your configuration against existing AWS resources and creates an execution plan showing exactly what will be created, modified, or destroyed. This plan can be reviewed before applying, which is a critical safety feature preventing accidental infrastructure changes. During application, Terraform executes the plan, communicating with AWS APIs to create and configure all the resources you’ve defined.
Before applying the configuration, you must customize the variables to match your environment. Specifically, you need to provide your SentinelOne API endpoint and API key, which Terraform will use when creating the Secrets Manager secret that stores these credentials.
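One way to supply these values without typing them on the command line is a terraform.tfvars file, which Terraform loads automatically. The values below are placeholders; keep this file out of version control, since it contains the API key.

```hcl
# terraform.tfvars -- do not commit; contains credentials
sentinelone_api_url = "https://your-tenant.sentinelone.net" # placeholder
sentinelone_api_key = "REPLACE_WITH_API_TOKEN"              # placeholder
```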
# Initialize Terraform
terraform init
# Plan deployment
terraform plan -out=tfplan
# Apply configuration
terraform apply tfplan
The terraform init command downloads the AWS provider and prepares your workspace. The terraform plan command analyzes your configuration and outputs what will happen—reviewing this plan before applying is critical to prevent accidents. The terraform apply command actually creates the resources.
3. Verify Deployment
After deployment completes, you should verify that all resources were created correctly. AWS CLI commands can confirm the state of key resources. Checking that the S3 buckets exist, the Lambda function is configured correctly, and the CloudWatch log groups are available ensures everything is ready for use.
# List S3 buckets
aws s3 ls
# Verify Lambda function
aws lambda get-function --function-name file-verification-processor
# Check CloudWatch logs
aws logs describe-log-groups
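Beyond checking that resources exist, an end-to-end smoke test confirms the pipeline actually routes threats. A common approach is to upload the harmless, industry-standard EICAR test string, which most scanners (including, one would expect, SentinelOne) flag as malicious. The sketch below assumes the destination key layout used by the Lambda handler (infected-files/<name>); bucket names are supplied by the caller.

```python
import time

# The standard 68-byte EICAR test string, split in two so that this source
# file is not itself flagged by antivirus tooling.
EICAR = (
    r"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR"
    r"-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"
)

def smoke_test(quarantine_bucket, infected_bucket, timeout=120):
    """Upload the EICAR file to the quarantine bucket and poll until the
    pipeline routes it into the infected bucket (or the timeout expires)."""
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    s3.put_object(Bucket=quarantine_bucket,
                  Key="uploads/eicar-smoke-test.txt",
                  Body=EICAR.encode())
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            s3.head_object(Bucket=infected_bucket,
                           Key="infected-files/eicar-smoke-test.txt")
            return True  # routed to the infected bucket as expected
        except ClientError:
            time.sleep(5)
    return False
```

A return value of False means either the pipeline failed or the scanner did not flag the file; check the Lambda logs in CloudWatch to distinguish the two.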
Workflow Walkthrough
The complete workflow begins the moment a user or application uploads a file to the quarantine bucket, and it ends with the file safely stored in one of two destination buckets depending on the threat assessment. Understanding each step in this workflow helps you troubleshoot issues and optimize performance.
Step 1: File Upload
The user or application initiates the process by uploading a file to the quarantine bucket. This can be done via the AWS Console, AWS CLI, AWS SDKs, or through any HTTP-based tool that supports S3. The file can be of any type and size (within S3 limits). Upon upload, a unique object key is assigned to the file based on the path provided. For example:
aws s3 cp malware.exe s3://file-quarantine-123456-us-east-1/uploads/
Once the upload completes, the file is immediately available in the bucket, and S3 begins processing the ObjectCreated event.
Step 2: S3 Event Triggered
S3 automatically detects the file creation and generates an ObjectCreated event. This event contains important metadata including the bucket name, complete object key (path), file size, timestamp, and other S3-specific information. The event is formatted as a JSON message that conforms to the S3 event notification schema. Unlike polling-based approaches that check the bucket at intervals, this event is generated almost instantaneously, typically within a few hundred milliseconds of the file upload completing. This means files are detected and begin processing with minimal delay.
Step 3: Lambda Invocation
S3 is configured to automatically invoke the Lambda function whenever the ObjectCreated event occurs. AWS handles this invocation automatically based on the S3 bucket notification configuration. The event JSON, containing all the file metadata, is passed to the Lambda function as the first argument. The Lambda runtime automatically scales to handle multiple files being uploaded concurrently. If multiple files are uploaded at the same time, multiple Lambda instances are spawned, each processing a different file independently. This concurrent processing capability ensures that the system doesn’t become a bottleneck even under heavy upload load.
Step 4: Threat Analysis
Inside the Lambda function, the SentinelOne verifier is initialized with the API credentials retrieved from Secrets Manager. The previously downloaded file is then sent to the SentinelOne Deep Visibility API for analysis. SentinelOne performs sophisticated threat analysis including signature matching, behavioral analysis, heuristic detection, and comparison against its massive global threat intelligence database. The analysis examines the file’s structure, embedded code, suspicious patterns, and known malware signatures. This process typically takes between 5 and 30 seconds depending on file size and SentinelOne’s current load. The API returns a comprehensive verdict including whether the file is clean, the threat level if threats are detected, and specific information about detected threats or malicious behaviors.
Step 5: File Movement
After receiving the SentinelOne verdict, Lambda determines the appropriate destination bucket. If SentinelOne identifies the file as clean, it is copied to the verified files bucket where legitimate business processes can access it. If SentinelOne detects threats or cannot complete the analysis, the file is moved to the infected files bucket as a precaution. The file is copied from the quarantine bucket to the appropriate destination bucket using S3 copy operations, which are highly efficient and don’t require re-uploading the entire file. Once the copy completes successfully, the original file is deleted from the quarantine bucket, ensuring that the quarantine bucket remains a temporary holding area rather than long-term storage.
Step 6: Metadata Tagging
Finally, Lambda adds S3 object tags to the file in its destination bucket. These tags include information about when the scan occurred, what the scan result was, the threat level assessment, and the SentinelOne scan ID. These tags are searchable and can be used for reporting, auditing, and automation. For example, you could use S3 Inventory reports to find all files tagged as INFECTED that were scanned more than 30 days ago. The tags travel with the file for as long as it remains in S3, providing a durable audit trail; note that tags are not immutable, so restrict s3:PutObjectTagging permission to the Lambda role and administrators.
Here’s a practical example of the tagging payload:
aws s3api put-object-tagging \
--bucket file-verified-123456-us-east-1 \
--key clean-files/document.pdf \
--tagging 'TagSet=[{Key=Scanned,Value=true},{Key=ScanResult,Value=CLEAN},{Key=ThreatLevel,Value=SAFE},{Key=ScanID,Value=abc123xyz}]'
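For smaller buckets, the same query can be done directly with boto3 rather than S3 Inventory. Since tags are not returned by ListObjects, each object's tags must be fetched individually, so this sketch is only practical at modest scale; the prefix and age threshold are illustrative.

```python
from datetime import datetime, timedelta, timezone

def is_stale_infected(tags, scan_time, max_age_days=30):
    """Pure helper: True when a tag set marks a file INFECTED and the
    scan happened more than max_age_days ago."""
    tag_map = {t["Key"]: t["Value"] for t in tags}
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    return tag_map.get("ScanResult") == "INFECTED" and scan_time < cutoff

def find_stale_infected(bucket, prefix="infected-files/", max_age_days=30):
    """List object keys whose tags mark them INFECTED and whose
    last-modified timestamp is older than max_age_days."""
    import boto3
    s3 = boto3.client("s3")
    stale = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            tags = s3.get_object_tagging(Bucket=bucket, Key=obj["Key"])["TagSet"]
            if is_stale_infected(tags, obj["LastModified"], max_age_days):
                stale.append(obj["Key"])
    return stale
```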
Monitoring and Alerting
Implementing comprehensive monitoring is critical for maintaining the health and security of your file verification pipeline. Without proper monitoring, infected files could slip through, or system failures could go unnoticed, allowing uploads to accumulate in the quarantine bucket. The monitoring strategy should include both operational metrics (is the system working?) and business metrics (how many files are being processed?).
CloudWatch Metrics
CloudWatch Metrics allow you to track quantitative measurements about your file verification pipeline. By publishing custom metrics from your Lambda function, you can gain visibility into how the system is performing. Metrics such as the number of files scanned, number of files detected as clean, number of files detected as infected, and average scan time help you understand system trends and identify anomalies.
The following function demonstrates how to publish metrics from your Lambda function:
def put_custom_metric(metric_name, value):
    """Put custom metric to CloudWatch"""
    cloudwatch = boto3.client('cloudwatch')
    cloudwatch.put_metric_data(
        Namespace='FileVerification',
        MetricData=[
            {
                'MetricName': metric_name,
                'Value': value,
                'Unit': 'Count'
            }
        ]
    )
CloudWatch Alarm for Failed Scans
CloudWatch Alarms enable proactive monitoring by automatically detecting when your system deviates from expected behavior. By creating an alarm that triggers when infected files are detected, you can immediately notify your security team. This alarm monitors the custom metric “FilesInfected” and triggers when the count exceeds zero in a five-minute window. The alarm sends a notification to an SNS topic, which can be configured to email security team members, send Slack notifications, or create tickets in incident management systems.
The following Terraform configuration creates a practical alarm that responds to threats:
# In Terraform
resource "aws_cloudwatch_metric_alarm" "failed_scans" {
  alarm_name          = "file-verification-failed-scans"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "FilesInfected"
  namespace           = "FileVerification"
  period              = 300
  statistic           = "Sum"
  threshold           = 1
  alarm_description   = "Alert when infected files are detected"
  alarm_actions       = [aws_sns_topic.alerts.arn]
  treat_missing_data  = "notBreaching" # no data points means no infections, not an alarm
}
Best Practices
1. Encrypt Sensitive Data
Encryption is fundamental to protecting sensitive information throughout your system. AWS KMS (Key Management Service) should be used to encrypt S3 buckets rather than relying on the default AES-256 encryption. KMS encryption gives you fine-grained control over who can decrypt your data and provides detailed audit logs of all decryption requests. API keys for SentinelOne are extremely sensitive and must never be stored in code or environment variables. Instead, always store credentials in AWS Secrets Manager, which provides encryption, automatic rotation capabilities, and comprehensive audit logging. When your Lambda function needs to access credentials, it retrieves them from Secrets Manager, which logs this access for security auditing purposes. Even in the unlikely event that a Lambda execution environment is compromised, the credentials are not directly accessible because they must be retrieved from Secrets Manager each time they’re needed.
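The retrieval pattern can be sketched as follows. This assumes a hypothetical secret named sentinelone/api-token whose JSON payload has an api_token field; adjust both to match your setup. The token is cached per execution environment so warm invocations avoid a Secrets Manager round trip:

```python
import json

_cache = {}

def parse_api_token(secret_string):
    """Extract the API token from the secret's JSON payload (field name assumed)."""
    return json.loads(secret_string)["api_token"]

def get_sentinelone_token(secret_id="sentinelone/api-token"):  # hypothetical secret name
    """Fetch the SentinelOne token from Secrets Manager, caching per container."""
    if secret_id not in _cache:
        import boto3  # deferred so parse_api_token has no AWS dependency
        resp = boto3.client("secretsmanager").get_secret_value(SecretId=secret_id)
        _cache[secret_id] = parse_api_token(resp["SecretString"])
    return _cache[secret_id]
```

Every cold start still produces a GetSecretValue call, so the Secrets Manager audit log records which execution environments accessed the credential and when.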
2. Implement Retry Logic
Network failures and temporary service unavailability are inevitable in distributed systems. Implementing intelligent retry logic makes your system more resilient to transient failures. The tenacity library provides Python decorators that handle retries automatically with exponential backoff. Exponential backoff ensures that if a service is temporarily unavailable, you don’t hammer it with repeated requests that might make the situation worse. Instead, you wait progressively longer between attempts. For example, you might wait 2 seconds before the first retry, 4 seconds before the second, 8 seconds before the third, and so on. This gives temporary service disruptions time to resolve while still detecting permanent failures relatively quickly.
Here’s an example of implementing retry logic with exponential backoff:
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def call_sentinelone_api(endpoint, data):
    # Make the HTTP request here and raise on failure so tenacity retries.
    # Body omitted: the exact request shape depends on your SentinelOne API version.
    pass
3. Set Appropriate Timeouts
Timeouts are critical for preventing your system from hanging indefinitely when services are unavailable or responding slowly. Each operation in your pipeline should have a timeout appropriate to its nature. S3 operations are typically very fast and should timeout quickly if they’re not responding, allowing the system to fail fast and potentially retry. The SentinelOne API might take longer for large file analysis, so it deserves a more generous timeout. The Lambda function itself should have sufficient timeout to complete the entire workflow including download, scan, and upload, but not so long that failed executions consume excessive compute resources. A 300-second timeout for Lambda is typically appropriate for this use case, giving ample time for SentinelOne analysis even for larger files while still detecting serious problems within 5 minutes.
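One way to make these budgets explicit is to centralize them and apply the S3 portion through a botocore Config. The specific numbers below are illustrative assumptions, not prescriptions; tune them to your file sizes and SentinelOne response times:

```python
# Suggested per-stage timeout budget in seconds (illustrative values).
TIMEOUTS = {
    "s3_connect": 5,          # fail fast if S3 is unreachable
    "s3_read": 10,            # S3 responses should be quick
    "sentinelone_read": 120,  # allow time for large-file analysis
    "lambda_total": 300,      # overall Lambda function timeout
}

def build_s3_client(timeouts=TIMEOUTS):
    """Create an S3 client that fails fast instead of hanging."""
    import boto3  # deferred AWS dependency
    from botocore.config import Config
    cfg = Config(
        connect_timeout=timeouts["s3_connect"],
        read_timeout=timeouts["s3_read"],
        retries={"max_attempts": 2, "mode": "standard"},
    )
    return boto3.client("s3", config=cfg)
```

Keeping every stage's budget in one place makes it easy to verify that the sum of worst-case stage times stays under the Lambda timeout.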
4. Enable Versioning
S3 versioning is a powerful feature that maintains the complete history of every file stored in your bucket. When versioning is enabled, every time you upload a new version of a file with the same key, S3 retains the previous versions as well. This enables several important capabilities: first, you can recover previous versions of files that were accidentally deleted or overwritten; second, you have a complete audit trail showing every modification to every file; third, you can implement compliance policies that retain versions for specific time periods. For file verification workflows, versioning provides protection against data loss and creates an immutable record of all file movements through your system.
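If your buckets already exist, versioning can be switched on after the fact with boto3. A minimal sketch (the bucket name is supplied by the caller):

```python
def versioning_request(bucket):
    """Build the put_bucket_versioning request body enabling versioning."""
    return {
        "Bucket": bucket,
        "VersioningConfiguration": {"Status": "Enabled"},
    }

def enable_versioning(bucket):
    """Enable versioning on an existing bucket (idempotent to re-run)."""
    import boto3  # deferred AWS dependency
    boto3.client("s3").put_bucket_versioning(**versioning_request(bucket))
```

Note that once enabled, versioning can be suspended but never fully disabled, and deletes become delete markers rather than physical removals, so pair it with a lifecycle policy to expire old versions and control storage cost.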
5. Implement Dead Letter Handling
Despite best efforts to catch every error, some files will invariably fail processing for reasons you didn’t anticipate or couldn’t prevent. Rather than losing these files entirely, a dead letter queue (DLQ) pattern captures them separately for investigation. When a file fails verification for unexpected reasons—perhaps the SentinelOne API returns an error format you didn’t account for, or the file copy operation fails due to insufficient permissions—instead of ignoring the error or leaving the file in the quarantine bucket, you move it to a DLQ bucket. The DLQ bucket acts as a holding area where you can analyze failures, understand root causes, and take corrective action. You might batch process files from the DLQ daily, retrying them after fixing underlying issues.
Here’s an example of moving problematic files to a dead letter queue:
import boto3

s3_client = boto3.client('s3')

def move_to_dlq(bucket, key, reason):
    """Copy a problematic file to the DLQ bucket, then remove the original."""
    s3_client.copy_object(
        CopySource={'Bucket': bucket, 'Key': key},
        Bucket='file-dlq-bucket',
        Key=f"dlq/{key}",
        Metadata={'FailureReason': reason},
        MetadataDirective='REPLACE'  # required, or the new metadata is ignored on copy
    )
    s3_client.delete_object(Bucket=bucket, Key=key)
Security Considerations
Security must be woven throughout every layer of your file verification system. The defense-in-depth approach means that even if one security control fails, others still protect your system.
IAM Least Privilege
The principle of least privilege means granting the minimum permissions necessary to accomplish a task, no more. Rather than granting a Lambda function full S3 access, you should explicitly grant only the specific permissions it needs. The Lambda function reads files only from the quarantine bucket, so it should have s3:GetObject permission only on that bucket and its contents. It writes files only to the verified and infected buckets, so it should have s3:PutObject permission only on those specific buckets. It deletes files only from the quarantine bucket after they’ve been copied to a destination, so it should have s3:DeleteObject only on the quarantine bucket. This granular permission boundary means that if the Lambda function’s credentials are somehow compromised, an attacker cannot access files in the verified bucket, create files elsewhere, or perform other actions outside the explicitly granted permissions.
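The resulting policy can be expressed programmatically, which makes it easy to review and test. This sketch assumes the three bucket names are passed in by the caller, and adds s3:PutObjectTagging alongside s3:PutObject since the workflow tags objects in their destination buckets:

```python
def least_privilege_policy(quarantine, verified, infected):
    """Build an IAM policy granting only the S3 actions the Lambda needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # read and delete only within the quarantine bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{quarantine}/*",
            },
            {   # write and tag only in the two destination buckets
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:PutObjectTagging"],
                "Resource": [
                    f"arn:aws:s3:::{verified}/*",
                    f"arn:aws:s3:::{infected}/*",
                ],
            },
        ],
    }
```

Because the resources are object ARNs (the /* suffix), the role cannot list or reconfigure the buckets themselves, only operate on objects within them.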
VPC Isolation (Optional)
For additional security in highly sensitive environments, you can run your Lambda function within a VPC (Virtual Private Cloud). This is optional because it adds complexity and can slow cold starts slightly, but it provides network isolation. When a Lambda function runs in a VPC, it has no direct internet access unless you explicitly route its traffic through a NAT Gateway, so it cannot make HTTP requests to external APIs until you configure that routing. If you do configure VPC access with a NAT Gateway for SentinelOne communication, you gain the ability to use security groups to control exactly what network traffic is allowed. This is particularly valuable in regulated environments where network isolation is required.
Here’s how to configure VPC isolation for your Lambda function:
resource "aws_lambda_function" "file_processor" {
...
vpc_config {
subnet_ids = var.private_subnets
security_group_ids = [aws_security_group.lambda.id]
}
}
Audit Logging
Comprehensive logging is essential for security, compliance, and troubleshooting. Every significant action in your file verification pipeline should be logged with sufficient detail to reconstruct what happened. When you log file scanning operations, include the bucket name, object key, file hash, scan timestamp, SentinelOne verdict, and the username or role that initiated the action. These logs enable you to answer critical questions during incident response: Were any infected files served to users? Who uploaded which files? When did a particular system start misbehaving? Are there patterns in file uploads that suggest malicious activity? CloudWatch Logs retains these logs indefinitely by default (retention is configurable per log group) and makes them searchable and analyzable. For compliance purposes, you can configure CloudWatch Logs to forward logs to long-term storage in S3 or to third-party security monitoring platforms.
logger.info(f"AUDIT: File scanned - Bucket: {bucket}, Key: {key}, Result: {result}")
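A structured (JSON) variant of the same audit line is easier to query with CloudWatch Logs Insights than free-form text. A sketch, with the field names chosen here as assumptions:

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger(__name__)

def audit_record(bucket, key, result, **extra):
    """Build a machine-parseable audit record; extra kwargs become extra fields."""
    record = {
        "event": "file_scanned",
        "bucket": bucket,
        "key": key,
        "result": result,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    record.update(extra)
    return record

def log_scan(bucket, key, result, **extra):
    """Emit the audit record as a single JSON log line."""
    logger.info("AUDIT: %s", json.dumps(audit_record(bucket, key, result, **extra)))
```

With one JSON object per line, a Logs Insights query can filter on fields like result or bucket directly instead of pattern-matching interpolated strings.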
Conclusion
This architecture provides a scalable, secure solution for automated file verification using AWS S3, Lambda, and SentinelOne. The event-driven design ensures files are scanned automatically upon upload, with clear separation between verified and infected files. The Terraform configuration makes it easy to reproduce this infrastructure across environments while maintaining security best practices. By deploying this solution, you eliminate the risk of malware spreading through user-uploaded files while maintaining a seamless user experience.
The solution is designed to be immediately useful in production while remaining extensible for future requirements. You can enhance it by adding additional workflow steps—for example, immediately notifying application owners when clean files become available, implementing automated quarantine policies for certain file types, or integrating with downstream systems that need guaranteed-clean files. CloudWatch metrics and logs provide the observability necessary to understand system behavior, identify bottlenecks, and optimize costs.
The cost of running this solution is minimal, typically less than $10 per month at small volumes, and it scales predictably as upload volume increases. You pay for S3 storage (inexpensive), S3 data transfer (minimal within AWS), Lambda execution time (typically seconds per file), and SentinelOne API calls (depending on your SentinelOne contract). There are no idle infrastructure costs and no minimum commitments.
Most importantly, this solution protects your organization from a real and growing threat. The FBI and CISA regularly issue warnings about ransomware campaigns that begin with compromised file uploads. By implementing automated file verification, you significantly reduce the attack surface and demonstrate to security auditors that file upload processes have appropriate controls. Deploy this solution today to add intelligent threat detection to your file upload workflows.