How We Built an Offline-to-Cloud AI Relay Using Bluetooth and GPT-4o
In secure enterprise environments such as financial trading floors, sensitive R&D labs, and defense-adjacent settings, workstations are frequently restricted from accessing the public internet. While this air-gapping or strict network segmentation mitigates data exfiltration risks, it also makes cloud-hosted Large Language Models (LLMs) inaccessible. Engineers and analysts are cut off from tools like OpenAI’s GPT-4o, which hinders productivity.
At Seven Labs, we were tasked with solving this exact bottleneck for a client operating in a highly restricted network zone. The requirement was clear: enable workstations running on a zero-internet segment to securely query cloud-based LLMs without modifying the workstation’s firewall policies or introducing unauthorized hardware like Wi-Fi dongles.
Our solution was the Bluetooth AI Relay, an edge-to-cloud bridge that routes local PC requests through an Android-based RFCOMM relay to GPT-4o using standard Bluetooth protocols. Here is the technical breakdown of how we designed, implemented, and hardened this system in production.
1. System Architecture: The Edge-to-Cloud Bridge
The architecture consists of three core components:
- The Client (Offline PC): A local service running on the workstation that exposes a loopback API (e.g., http://localhost:8080/v1/chat/completions) conforming to the standard OpenAI API specification.
- The Relay (Android Mobile Device): A React Native application running a specialized Kotlin foreground service. The Android device has access to both cellular data (LTE/5G) and Bluetooth, serving as the bridge.
- The Cloud (OpenAI GPT-4o): The target LLM backend reached via HTTPS.
+-------------+                    +-------------------------+                    +-----------------+
|             |     Bluetooth      |  Android Relay Device   |    Cellular WAN    |                 |
| Offline PC  |  (RFCOMM Socket)   |                         |   (HTTPS Client)   |  OpenAI GPT-4o  |
| [Client]    |<==================>|  [Kotlin Service]       |------------------->|  API Endpoint   |
|             |                    |  [React Native Engine]  |                    |                 |
+-------------+                    +-------------------------+                    +-----------------+
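As a rough illustration of the Client component, the sketch below binds an HTTP endpoint to the loopback interface only and hands each request body to a relay callback. It uses the JDK's built-in `com.sun.net.httpserver` package purely for brevity; the article does not say how the actual local service is built, and `startLoopbackApi` and `relay` are hypothetical names.

```kotlin
import com.sun.net.httpserver.HttpServer
import java.net.InetAddress
import java.net.InetSocketAddress

// Hypothetical sketch: `relay` stands in for whatever sends the request body
// over the Bluetooth link and returns the response body.
fun startLoopbackApi(port: Int = 8080, relay: (ByteArray) -> ByteArray): HttpServer {
    // Bind to 127.0.0.1 only, so the API is not reachable from other hosts.
    val address = InetSocketAddress(InetAddress.getByName("127.0.0.1"), port)
    val server = HttpServer.create(address, 0)
    server.createContext("/v1/chat/completions") { exchange ->
        try {
            if (exchange.requestMethod != "POST") {
                exchange.sendResponseHeaders(405, -1) // -1 means no response body
            } else {
                val response = relay(exchange.requestBody.readBytes())
                exchange.responseHeaders.add("Content-Type", "application/json")
                exchange.sendResponseHeaders(200, response.size.toLong())
                exchange.responseBody.use { it.write(response) }
            }
        } finally {
            exchange.close()
        }
    }
    server.start()
    return server
}
```

Because the endpoint mirrors the OpenAI path, existing tools on the workstation would only need their base URL pointed at localhost; this sketch returns the whole response at once and does not cover streaming.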
Why RFCOMM?
When transmitting raw JSON payloads of prompt queries and responses, we needed a stream-oriented, reliable transport protocol. While Bluetooth Low Energy (BLE) with GATT attributes is excellent for low-throughput telemetry, it is poorly suited to larger text blocks because of its small Maximum Transmission Unit (MTU) and the resulting packet fragmentation overhead.
We chose RFCOMM (Radio Frequency Communication), which emulates an RS-232 serial port on top of the L2CAP protocol. Together with the layers beneath it, RFCOMM provides ordered, reliable delivery and flow control, and Android exposes it as a stream-oriented socket (an interface similar to java.net.Socket) that can sustain the text streaming required for LLM prompts and responses.
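To make the fragmentation argument concrete, here is a small back-of-the-envelope calculation. It assumes the default ATT MTU of 23 bytes, of which 3 bytes are ATT header, leaving 20 bytes of payload per write. The 8 KB prompt size and the negotiated MTU of 247 are illustrative values, not measurements from this project, and `attWritesNeeded` is a hypothetical helper.

```kotlin
// Illustrative arithmetic only: how many ATT writes a payload needs at a given MTU.
fun attWritesNeeded(payloadBytes: Int, attMtu: Int = 23): Int {
    val attHeaderBytes = 3 // opcode + attribute handle carried in every write
    require(payloadBytes >= 0) { "payloadBytes must be non-negative" }
    require(attMtu > attHeaderBytes) { "MTU must exceed the ATT header size" }
    val chunk = attMtu - attHeaderBytes
    return (payloadBytes + chunk - 1) / chunk // ceiling division
}
```

Under these assumptions an 8,192-byte prompt takes 410 separate writes at the default MTU and 34 at an MTU of 247, and the application has to reassemble them itself. An RFCOMM socket hides that chunking behind a plain stream.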
2. Implementing the Android RFCOMM Server in Kotlin
To ensure that the Android application could handle incoming Bluetooth connections reliably, we bypassed standard React Native wrapper libraries, which often suffer from memory leaks and lack support for background persistence, and implemented the Bluetooth stack directly in Kotlin.
The Bluetooth Server Thread
The Bluetooth server runs in a dedicated thread, listening on a specific Universally Unique Identifier (UUID):
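package com.sevenlabs.airelay

import android.annotation.SuppressLint
import android.bluetooth.BluetoothAdapter
import android.bluetooth.BluetoothServerSocket
import android.bluetooth.BluetoothSocket
import android.util.Log
import java.io.IOException
import java.util.UUID

// The caller must hold the BLUETOOTH_CONNECT permission (Android 12+) before starting this thread.
@SuppressLint("MissingPermission")
class BluetoothServerThread(
    private val adapter: BluetoothAdapter,
    private val onConnectionEstablished: (BluetoothSocket) -> Unit
) : Thread() {

    // Created lazily on first use, so a failure to open the socket surfaces inside run().
    private val serverSocket: BluetoothServerSocket? by lazy(LazyThreadSafetyMode.SYNCHRONIZED) {
        adapter.listenUsingRfcommWithServiceRecord(
            "SevenLabsAIRelay",
            UUID.fromString("4a8b8c2d-9e0f-11ed-a8fc-0242ac120002")
        )
    }

    // Written by cancel() from another thread, so it must be volatile.
    @Volatile
    private var shouldKeepListening = true

    override fun run() {
        name = "SevenLabs-RFCOMM-Listener"
        Log.i("AIRelay", "RFCOMM Server Socket listening...")
        while (shouldKeepListening) {
            // accept() blocks until a client connects or the server socket is closed.
            val socket: BluetoothSocket? = try {
                serverSocket?.accept()
            } catch (e: IOException) {
                // Closing the socket in cancel() also lands here; log only unexpected failures.
                if (shouldKeepListening) Log.e("AIRelay", "Server Socket accept failed", e)
                break
            }
            socket?.let {
                Log.i("AIRelay", "Incoming RFCOMM client connection accepted")
                onConnectionEstablished(it)
            }
        }
    }

    // Stops the accept loop; closing the socket unblocks a pending accept().
    fun cancel() {
        try {
            shouldKeepListening = false
            serverSocket?.close()
        } catch (e: IOException) {
            Log.e("AIRelay", "Could not close server socket", e)
        }
    }
}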
3. Persistent Operation: Kotlin Foreground Services & Wake-Lock Management
One of the steepest engineering challenges on modern Android versions (Android 12+) is battery optimization. When the screen turns off and the device sits idle, Android enters Doze mode, which suspends network access for background apps and defers their work, and the OS may stop background processes altogether. Either outcome breaks a long-lived relay connection.
To guarantee uninterrupted operations, Seven Labs implemented two crucial mechanisms:
- Kotlin Foreground Service: The RFCOMM server and API client run inside an Android foreground service. This marks the app as a user-visible, long-running process and shows a persistent status bar notification.
- Wake-Locks and Wi-Fi Locks: Holding a partial wake lock so the CPU keeps running during an active session, plus a Wi-Fi lock when the relay's uplink is Wi-Fi rather than cellular.
The Foreground Service Implementation
Below is the core of the foreground service handling thread lifecycle and notifications:
package com.sevenlabs.airelay

import android.app.Notification
import android.app.NotificationChannel
import android.app.NotificationManager
import android.app.PendingIntent
import android.app.Service
import android.bluetooth.BluetoothManager
import android.content.Context
import android.content.Intent
import android.os.Build
import android.os.IBinder
import android.os.PowerManager
import androidx.core.app.NotificationCompat

// MainActivity, ConnectionHandler, and R.drawable.ic_notification are defined elsewhere in the app.
class AIRelayService : Service() {
    private var wakeLock: PowerManager.WakeLock? = null
    private var serverThread: BluetoothServerThread? = null

    override fun onCreate() {
        super.onCreate()
        acquireWakeLock()
        startInForeground()
    }

    private fun acquireWakeLock() {
        val powerManager = getSystemService(Context.POWER_SERVICE) as PowerManager
        wakeLock = powerManager.newWakeLock(
            PowerManager.PARTIAL_WAKE_LOCK,
            "SevenLabs::AIRelayWakeLock"
        ).apply {
            // The timeout releases the lock automatically after 30 minutes as a safety limit.
            acquire(30 * 60 * 1000L)
        }
    }

    // Builds the persistent notification and promotes the service to the foreground.
    private fun startInForeground() {
        val channelId = "seven_labs_ai_relay"
        val channelName = "AI Relay Foreground Service"
        // Notification channels are required from Android 8.0 (API 26) onward.
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
            val channel = NotificationChannel(channelId, channelName, NotificationManager.IMPORTANCE_LOW)
            val manager = getSystemService(Context.NOTIFICATION_SERVICE) as NotificationManager
            manager.createNotificationChannel(channel)
        }
        val notificationIntent = Intent(this, MainActivity::class.java)
        val pendingIntent = PendingIntent.getActivity(
            this, 0, notificationIntent,
            PendingIntent.FLAG_IMMUTABLE or PendingIntent.FLAG_UPDATE_CURRENT
        )
        val notification: Notification = NotificationCompat.Builder(this, channelId)
            .setContentTitle("Seven Labs AI Relay Active")
            .setContentText("Routing Bluetooth RFCOMM data to GPT-4o...")
            .setSmallIcon(R.drawable.ic_notification)
            .setContentIntent(pendingIntent)
            .build()
        // Apps targeting Android 14+ must also declare a foregroundServiceType in the manifest.
        startForeground(1, notification)
    }

    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        // onStartCommand can run more than once; start only one listener thread.
        if (serverThread == null) {
            val adapter = getSystemService(BluetoothManager::class.java)?.adapter
            if (adapter == null) {
                // No Bluetooth hardware on this device, so there is nothing to relay.
                stopSelf()
                return START_NOT_STICKY
            }
            serverThread = BluetoothServerThread(adapter) { socket ->
                // Hand each accepted connection to its own handler thread.
                ConnectionHandler(socket).start()
            }.also { it.start() }
        }
        return START_STICKY
    }

    override fun onDestroy() {
        serverThread?.cancel()
        wakeLock?.let {
            if (it.isHeld) it.release()
        }
        super.onDestroy()
    }

    // This is a started service only; binding is not supported.
    override fun onBind(intent: Intent?): IBinder? = null
}
4. Structuring the Data Payload and Protocol
Because RFCOMM operates as a raw byte stream, we had to define an application-level framing protocol to segment individual request and response packets.
We designed a lightweight message frame format:
- Magic Bytes (4 bytes): SLAR (Seven Labs AI Relay), used to recognize relay frames and reject anything else.
- Payload Length (4 bytes): Big-endian integer specifying the exact size of the payload.
- Payload Type (1 byte): Indicates if the packet is raw text, SSE (Server-Sent Events) chunk, metadata, or an error code.
- Encrypted Payload (Variable): AES-GCM encrypted JSON data.
+------------+------------------+--------------+-----------------------+
| Magic (4B) | Length (4B, Int) | Type (1B, B) | Encrypted Payload (N) |
+------------+------------------+--------------+-----------------------+
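The frame layout above can be sketched with plain Java I/O streams, which is what a Bluetooth socket's input and output streams look like to the application. This is a minimal illustration under assumptions: `Frame`, `writeFrame`, and `readFrame` are hypothetical names, and the 5 MB cap simply mirrors the maximum payload size reported in section 6.

```kotlin
import java.io.DataInputStream
import java.io.DataOutputStream
import java.io.IOException
import java.io.InputStream
import java.io.OutputStream

val MAGIC: ByteArray = "SLAR".toByteArray(Charsets.US_ASCII)
val MAX_PAYLOAD_BYTES = 5 * 1024 * 1024 // assumed cap, not a documented protocol constant

class Frame(val type: Byte, val payload: ByteArray)

// Writes magic, big-endian length, type byte, then the (already encrypted) payload.
fun writeFrame(out: OutputStream, frame: Frame) {
    val data = DataOutputStream(out)
    data.write(MAGIC)
    data.writeInt(frame.payload.size) // DataOutputStream always writes big-endian
    data.writeByte(frame.type.toInt())
    data.write(frame.payload)
    data.flush()
}

// Reads exactly one frame; throws if the stream ends early or the header is invalid.
fun readFrame(input: InputStream): Frame {
    val data = DataInputStream(input)
    val magic = ByteArray(MAGIC.size)
    data.readFully(magic) // blocks until all 4 bytes arrive
    if (!magic.contentEquals(MAGIC)) throw IOException("Bad magic bytes")
    val length = data.readInt()
    if (length < 0 || length > MAX_PAYLOAD_BYTES) throw IOException("Invalid payload length: $length")
    val type = data.readByte()
    val payload = ByteArray(length)
    data.readFully(payload) // a single read() may return fewer bytes than requested
    return Frame(type, payload)
}
```

Validating the length before allocating the buffer matters on a link like this, since a corrupted or hostile header could otherwise request a multi-gigabyte allocation.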
When the Client on the offline PC sends a completion prompt, the local daemon packages it into this frame, transmits it over the RFCOMM socket, and blocks waiting for response frames.
On the Android Relay side, the Kotlin socket reader reads the length prefix, reads the specified number of bytes, decrypts the payload, and forwards the HTTP request to OpenAI's endpoint. To support token streaming, we parse the Server-Sent Events (SSE) data chunks coming back from OpenAI, frame them as SSE Chunk types, and write them sequentially back into the Bluetooth socket stream.
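The SSE handling described above might look roughly like the following. It relies on the streaming format OpenAI uses, where each event is a `data:` line and the stream ends with a `data: [DONE]` sentinel. `forwardSseChunks` and `onChunk` are hypothetical names; in the relay, `onChunk` would wrap the text in an SSE Chunk frame and write it to the Bluetooth socket.

```kotlin
import java.io.BufferedReader

// Reads an SSE stream line by line and hands each `data:` payload to `onChunk`.
// Stops at the `[DONE]` sentinel or at end of stream.
// Simplification: multi-line `data:` events are not merged into one chunk.
fun forwardSseChunks(reader: BufferedReader, onChunk: (String) -> Unit) {
    while (true) {
        val line = reader.readLine() ?: return
        if (!line.startsWith("data:")) continue // skip blank separators, comments, other fields
        val data = line.removePrefix("data:").trimStart()
        if (data == "[DONE]") return
        onChunk(data)
    }
}
```

Forwarding each chunk as soon as it is parsed, rather than buffering the full completion, is what lets the client start rendering tokens early.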
5. Security Architecture: Zero-Trust over Bluetooth
Transmitting corporate data over Bluetooth raises significant security concerns. Bluetooth connections are susceptible to eavesdropping and Man-in-the-Middle (MitM) attacks. To make this relay viable for enterprise deployments, Seven Labs added an application-level cryptography layer.
End-to-End Encryption (E2EE)
Even if the Bluetooth pairing layer is compromised, the data payload remains secure.
- Key Exchange: When the offline PC initiates a connection, it performs an Elliptic-Curve Diffie-Hellman (ECDH) key exchange over the raw Bluetooth socket with the Android device.
- Ephemeral Session Key: Both endpoints derive a shared 256-bit symmetric key, used with AES-GCM, that is unique to that specific connection session.
- Payload Encryption: Every data frame payload is encrypted using the session key, with a fresh initialization vector (IV) generated for each frame. Encryption defeats passive sniffing, the GCM authentication tag exposes tampering, and a receiver that rejects previously seen IVs can also detect replayed frames.
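The three steps above can be sketched with the standard Java cryptography APIs. Several details are assumptions, since the article does not specify them: the curve (secp256r1), deriving the key with a single SHA-256 hash (a production design would more likely use HKDF), and prepending the 12-byte IV to the ciphertext. All function names are hypothetical. The sketch also omits authentication of the exchanged public keys, without which an ECDH exchange cannot by itself stop an active man-in-the-middle.

```kotlin
import java.security.KeyPair
import java.security.KeyPairGenerator
import java.security.MessageDigest
import java.security.PublicKey
import java.security.SecureRandom
import java.security.spec.ECGenParameterSpec
import javax.crypto.Cipher
import javax.crypto.KeyAgreement
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

val IV_BYTES = 12  // recommended IV size for GCM
val TAG_BITS = 128 // authentication tag length

// Each side generates a fresh key pair per connection (assumed curve: secp256r1).
fun newEphemeralKeyPair(): KeyPair =
    KeyPairGenerator.getInstance("EC").apply {
        initialize(ECGenParameterSpec("secp256r1"))
    }.generateKeyPair()

// Both sides run this with their own private key and the peer's public key.
fun deriveSessionKey(own: KeyPair, peerPublic: PublicKey): SecretKeySpec {
    val shared = KeyAgreement.getInstance("ECDH").run {
        init(own.private)
        doPhase(peerPublic, true)
        generateSecret()
    }
    // Simplification: hash the shared secret into a 256-bit AES key.
    return SecretKeySpec(MessageDigest.getInstance("SHA-256").digest(shared), "AES")
}

// Returns IV || ciphertext || tag, using a fresh random IV for every frame.
fun encryptPayload(key: SecretKeySpec, plaintext: ByteArray): ByteArray {
    val iv = ByteArray(IV_BYTES).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(TAG_BITS, iv))
    return iv + cipher.doFinal(plaintext)
}

// Throws AEADBadTagException if the frame was modified in transit.
fun decryptPayload(key: SecretKeySpec, framed: ByteArray): ByteArray {
    require(framed.size >= IV_BYTES + TAG_BITS / 8) { "Payload too short" }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(TAG_BITS, framed, 0, IV_BYTES))
    return cipher.doFinal(framed, IV_BYTES, framed.size - IV_BYTES)
}
```

In use, the PC and the phone would each call `newEphemeralKeyPair()`, swap public keys over the socket, and call `deriveSessionKey` to arrive at the same key without ever transmitting it.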
6. Performance and Latency Tuning
Our benchmarking yielded the following performance metrics in production:
| Metric | Direct Wi-Fi (Control) | RFCOMM Relay (Without Streaming) | RFCOMM Relay (With SSE Streaming) |
|---|---|---|---|
| Time to First Token (TTFT) | ~320ms | ~980ms | ~410ms |
| Throughput (Tokens/Sec) | 65 | 42 | 58 |
| Max Payload Size | Unlimited | 5 MB | Streamed |
Optimizing Throughput
Because Bluetooth bandwidth is constrained compared to Wi-Fi, streaming responses token-by-token is essential. By feeding SSE chunks back to the client as they arrive from OpenAI’s edge, we cut time to first token (TTFT) from ~980 ms to ~410 ms, a reduction of roughly 58%.
Furthermore, we applied Gzip compression to prompt inputs exceeding 20 KB, reducing Bluetooth transmission time and easing pressure on the RFCOMM buffer.
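A threshold-based compression step like the one described could be sketched as follows. The `PreparedPayload` wrapper and both function names are hypothetical; in the real protocol the compressed flag would presumably travel in the frame's type byte or in metadata, which the article does not specify.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

val COMPRESSION_THRESHOLD_BYTES = 20 * 1024 // the 20 KB cutoff mentioned above

class PreparedPayload(val compressed: Boolean, val bytes: ByteArray)

// Gzips payloads above the threshold; small payloads pass through untouched.
fun prepareForTransmit(payload: ByteArray): PreparedPayload {
    if (payload.size <= COMPRESSION_THRESHOLD_BYTES) return PreparedPayload(false, payload)
    val buffer = ByteArrayOutputStream()
    GZIPOutputStream(buffer).use { it.write(payload) } // closing the stream writes the gzip trailer
    return PreparedPayload(true, buffer.toByteArray())
}

// Reverses prepareForTransmit on the receiving side.
fun restoreAfterReceive(prepared: PreparedPayload): ByteArray =
    if (!prepared.compressed) prepared.bytes
    else GZIPInputStream(ByteArrayInputStream(prepared.bytes)).use { it.readBytes() }
```

Compression has to run before encryption, because AES-GCM ciphertext is effectively random and will not shrink.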
7. Enterprise Frequently Asked Questions
Does this violate air-gapping principles?
The system acts as a strict protocol proxy. The offline workstation has no IP-level path to the cellular network, which rules out general internet access, port scans through the relay, and reverse-shell tunnels. Only well-formed application-level SLAR frames are accepted by the interface.
How does battery consumption scale on the relay device?
Operating the Bluetooth radio and LTE radio concurrently consumes roughly 8% battery per hour of continuous processing. We minimized drain by using Android’s PowerManager wake locks selectively: the relay holds a wake lock only during active socket sessions and returns to an idle state during quiet hours.
How is token accounting managed?
All usage and authorization keys are stored on the Android Relay app or fetched from an enterprise key server. Individual user logins can be authenticated locally on the device prior to Diffie-Hellman negotiation.
Build Secure, Edge-to-Cloud Systems with Seven Labs
Navigating the intersection of advanced AI technologies and rigorous corporate security controls requires seasoned system architects. Whether you need an air-gapped LLM deployment, high-performance edge computing, or secure IoT relays, Seven Labs has the engineering expertise to design and deploy compliant solutions.
Contact Seven Labs' Engineering Team to discuss your organization's custom AI and infrastructure needs.

