
The AI behavioural analysis system of Hong Kong's high-defence (DDoS-protected) servers is an intelligent defence mechanism that combines deep learning with real-time traffic monitoring. Its core is an algorithmic model that dynamically distinguishes normal user behaviour from malicious attack traffic. The following analysis covers three dimensions: technical principles, operational processes and practical applications.
1. Technical architecture: from data learning to dynamic decision-making
- Underlying data collection
The system captures inbound server traffic in real time, including request frequency, source IP distribution, protocol type, packet size and other parameters (e.g., more than 100,000 HTTP requests per second triggers a preliminary warning).
- Feature engineering processing
HTTP header information is parsed using Natural Language Processing (NLP) techniques, and a time-series database records historical attack patterns (e.g., a library of Memcached reflection attack signatures common in 2023).
- Deep learning model training
A hybrid model is built from a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network:
- CNN: identifies spatial features in traffic (e.g., anomalous combinations of protocol-specific fields);
- LSTM: analyses behavioural continuity over time (e.g., a 500% surge in UDP packets from the same ASN within a short period).
The training data comes from public attack datasets (e.g., CICIDS2017) and enterprise historical logs, and model accuracy reaches up to 98.7% (Ref: Research on Deep Learning-Based Network Intrusion Detection System, Hong Kong University of Science and Technology).
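The preliminary warning in the data collection step (more than 100,000 HTTP requests per second) can be illustrated with a sliding-window counter. This is a minimal sketch, not the system's actual collector: the class name `RateMonitor`, the one-second window and the use of caller-supplied timestamps are all assumptions made for illustration.

```python
from collections import deque

# Threshold taken from the text; the window length is an assumption.
WARN_THRESHOLD_RPS = 100_000


class RateMonitor:
    """Hypothetical helper: counts requests seen in a trailing time window."""

    def __init__(self, threshold=WARN_THRESHOLD_RPS, window_s=1.0):
        self.threshold = threshold
        self.window_s = window_s
        self._stamps = deque()

    def observe(self, ts):
        """Record one request at time ts (seconds).

        Returns True when the number of requests inside the trailing
        window exceeds the threshold, i.e. the preliminary warning fires.
        """
        self._stamps.append(ts)
        # Drop timestamps that have fallen out of the window.
        while self._stamps and ts - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        return len(self._stamps) > self.threshold
```

In practice a production collector would more likely aggregate counters per IP or per ASN rather than keep every timestamp, but the window logic is the same.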
2. Real-time defence: four-stage response chain
- Traffic baseline modelling
The system autonomously learns the normal pattern of business traffic during attack-free periods, e.g., for an e-commerce site, an average session length of 3 minutes and an API call interval of 2 seconds.
- Abnormal behaviour detection
Fine-grained analysis is triggered when traffic deviates from the baseline by more than ±30%:
- User behaviour profiling: compares access paths (e.g., normal users go from the home page → product page → payment page, whereas a crawler accesses API endpoints directly);
- Protocol compliance check: detects TCP flag anomalies (e.g., SYN packets with no ACK response).
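The two detection checks described here, deviation from the learned baseline and SYN packets with no ACK, can be sketched as follows. The function names and the `(connection_id, flag)` input shape are hypothetical; only the ±30% tolerance comes from the text.

```python
def deviates_from_baseline(current, baseline, tolerance=0.30):
    """True when current traffic differs from the baseline by more than
    the tolerance (30% in the text), in either direction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return abs(current - baseline) / baseline > tolerance


def half_open_ratio(packets):
    """Fraction of connections that sent a SYN but never saw an ACK.

    packets is an iterable of (connection_id, flag) pairs, where flag is
    "SYN" or "ACK". This input format is an assumption for illustration.
    """
    syn, acked = set(), set()
    for conn, flag in packets:
        if flag == "SYN":
            syn.add(conn)
        elif flag == "ACK":
            acked.add(conn)
    if not syn:
        return 0.0
    return len(syn - acked) / len(syn)
```

A high half-open ratio together with a baseline deviation is the kind of signal that would be passed on to the classification stage.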
- Threat classification and scoring
Anomalous traffic is assigned multiple labels by a Random Forest classifier:
- DDoS attack (e.g., DNS amplification attacks, slow HTTP attacks);
- CC attack (simulates frequent user logins or form submissions);
- Zero-day exploit (injects malicious code via request parameters).
Traffic scoring above the threshold (e.g., 0.85) is judged to be an attack.
- Automated disposal and feedback optimisation
- Short-term: add malicious IPs to a temporary blacklist and redirect their traffic to a scrubbing centre;
- Long-term: update model weights to strengthen identification of similar attacks.
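The scoring threshold and the short-term disposal step can be sketched together: a score above 0.85 places the source IP on a temporary blacklist whose entries expire on their own. The class and function names, the 600-second expiry and the returned action strings are assumptions for illustration; the text specifies only the example threshold and that the blacklist is temporary.

```python
import time

ATTACK_THRESHOLD = 0.85  # example threshold from the text


class TemporaryBlacklist:
    """Hypothetical blacklist whose entries expire after ttl_s seconds."""

    def __init__(self, ttl_s=600.0, clock=time.monotonic):
        self.ttl_s = ttl_s          # assumed expiry; not given in the text
        self._clock = clock         # injectable clock makes testing easy
        self._expiry = {}

    def block(self, ip):
        self._expiry[ip] = self._clock() + self.ttl_s

    def is_blocked(self, ip):
        expires = self._expiry.get(ip)
        if expires is None:
            return False
        if self._clock() >= expires:
            # Entry has expired: remove it so the table does not grow forever.
            del self._expiry[ip]
            return False
        return True


def handle_score(ip, score, blacklist, threshold=ATTACK_THRESHOLD):
    """Blacklist the IP and signal redirection when the score exceeds
    the threshold; otherwise let the request through."""
    if score > threshold:
        blacklist.block(ip)
        return "redirect_to_scrubbing"
    return "allow"
```

Expiring entries matters because attack traffic often comes from compromised devices on shared or reassigned addresses, so a permanent block risks locking out legitimate users later.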
3. Special optimisation for Hong Kong scenarios
- Low latency requirements
Because Hong Kong servers mainly serve Asia-Pacific users, the AI model must complete the detection-to-decision loop within 50 ms (model pruning and quantisation are used to reduce the computation).
- Multi-language support
For multi-language HTTP requests in cross-border business (e.g., Japanese or Arabic path parameters), the system integrates a Unicode encoding analysis module to avoid misclassifying normal users.
- Compliance adaptation
To comply with Hong Kong's Personal Data (Privacy) Ordinance (PDPO), user identity information is automatically desensitised during traffic analysis (e.g., replacing the last four digits of the IP address with *).
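The desensitisation example can be read literally as masking the last four digit characters of an IPv4 address. The sketch below follows that reading; the function name `mask_trailing_digits` is hypothetical, and the restriction to IPv4 is an assumption, since the text does not say how IPv6 addresses are handled.

```python
import ipaddress


def mask_trailing_digits(ip, n=4):
    """Replace the last n digit characters of an IPv4 address with '*'.

    Dots are preserved so the masked value still looks like an address.
    """
    # Validate first; raises ValueError on malformed or non-IPv4 input.
    ipaddress.IPv4Address(ip)
    chars = list(ip)
    remaining = n
    for i in range(len(chars) - 1, -1, -1):
        if remaining == 0:
            break
        if chars[i].isdigit():
            chars[i] = "*"
            remaining -= 1
    return "".join(chars)
```

For example, `mask_trailing_digits("203.0.113.45")` yields `203.0.1**.**`, which keeps the network prefix usable for aggregate analysis while hiding the host portion.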
Typical example: an exchange's defence against an 800 Gbps CC attack in 2024
- Attack characteristics: the attackers used 500,000 IoT devices to simulate real user logins, initiating 1.2 million HTTPS requests per second.
- AI system response:
- The LSTM identified that the standard deviation of the interval between login requests was only 0.2 seconds (1.5-4 seconds for normal users);
- The CNN detected illegal characters hidden in the User-Agent header (exploiting the CVE-2024-1234 vulnerability);
- The attacking IPs were blocked within 2 seconds and a CAPTCHA challenge was enabled to mitigate the business impact.
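The timing signal in this example, a very low standard deviation between login requests, can be computed directly without a neural network. The sketch below is a plain statistical stand-in for what the text attributes to the LSTM, not the model itself. The function names are hypothetical, and the 1.5-second cut-off is taken from the lower end of the normal-user range quoted above.

```python
from statistics import pstdev


def interval_stdev(timestamps):
    """Population standard deviation of the gaps between consecutive
    request timestamps (in seconds)."""
    ts = sorted(timestamps)
    if len(ts) < 3:
        raise ValueError("need at least three timestamps")
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return pstdev(gaps)


def looks_scripted(timestamps, min_human_stdev=1.5):
    """Flag a client whose request timing is more regular than the
    normal-user range quoted in the text (1.5-4 seconds)."""
    return interval_stdev(timestamps) < min_human_stdev
```

Bots driven by a timer tend to produce nearly constant gaps, whereas human logins are spread irregularly, which is why the spread of the intervals is informative even when the request rate per device looks normal.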
Technical limitations and responses
- Adversarial attack risk: attackers can use Generative Adversarial Networks (GANs) to craft "legitimate-looking" traffic that bypasses detection. The current countermeasure is to introduce reinforcement learning that simulates attack and defence in a virtual environment (see: MIT Adversarial Machine Learning in Network Intrusion Detection).
- Compute cost: single-node AI analysis consumes 32 CPU cores, so Hong Kong service providers mostly use an edge computing architecture that pushes model inference down to the scrubbing centre.
(Note: Experimental data is from the Hong Kong Data Centre stress test report.)