June 7, 2026

How We Built an Offline-to-Cloud AI Relay Using Bluetooth and GPT-4o

In secure enterprise environments, such as financial trading floors, sensitive R&D labs, and defense-adjacent settings, workstations are frequently restricted from accessing the public internet. While this "air-gapping" or strict network segmentation mitigates data exfiltration risks, it also makes cloud-hosted Large Language Models (LLMs) completely inaccessible. Engineers and analysts are cut off from tools like OpenAI’s GPT-4o, which hinders productivity.

At Seven Labs, we were tasked with solving this exact bottleneck for a client operating in a highly restricted network zone. The requirement was clear: enable workstations running on a zero-internet segment to securely query cloud-based LLMs without modifying the workstation’s firewall policies or introducing unauthorized hardware like Wi-Fi dongles.

Our solution was the Bluetooth AI Relay, an edge-to-cloud bridge that routes local PC requests through an Android-based RFCOMM relay to GPT-4o using standard Bluetooth protocols. Here is the technical breakdown of how we designed, implemented, and hardened this system in production.


1. System Architecture: The Edge-to-Cloud Bridge

The architecture consists of three core components:

  1. The Client (Offline PC): A local service running on the workstation that exposes a loopback API (e.g., http://localhost:8080/v1/chat/completions) conforming to the standard OpenAI API specification.
  2. The Relay (Android Mobile Device): A React Native application running a specialized Kotlin foreground service. The Android device has access to both cellular data (LTE/5G) and Bluetooth, serving as the bridge.
  3. The Cloud (OpenAI GPT-4o): The target LLM backend reached via HTTPS.
+-------------+                    +-------------------------+                    +-----------------+
|             |    Bluetooth       |  Android Relay Device   |    Cellular WAN    |                 |
|  Offline PC |  (RFCOMM Socket)   |                         |  (HTTPS Client)    |  OpenAI GPT-4o  |
|  [Client]   |<==================>| [Kotlin Service]        |------------------->|  API Endpoint   |
|             |                    | [React Native Engine]   |                    |                 |
+-------------+                    +-------------------------+                    +-----------------+
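
To make the client side of the diagram concrete, here is a minimal sketch of a loopback-only endpoint on the offline PC. It is an illustration under stated assumptions rather than our production daemon: it uses the JDK's built-in `com.sun.net.httpserver.HttpServer`, and the `forward` callback is a hypothetical stand-in for "frame the request, send it over RFCOMM, and wait for the reply".

```kotlin
import com.sun.net.httpserver.HttpServer
import java.net.InetSocketAddress

// Starts an HTTP endpoint bound to 127.0.0.1 that mimics the OpenAI chat completions path.
// `forward` is a hypothetical hook: in a real client it would hand the request body
// to the Bluetooth link and return the relay's response.
fun startLoopbackApi(port: Int, forward: (ByteArray) -> ByteArray): HttpServer {
    // Binding to the loopback address keeps the API unreachable from other hosts.
    val server = HttpServer.create(InetSocketAddress("127.0.0.1", port), 0)
    server.createContext("/v1/chat/completions") { exchange ->
        try {
            if (exchange.requestMethod != "POST") {
                exchange.sendResponseHeaders(405, -1) // method not allowed, no body
            } else {
                val reply = forward(exchange.requestBody.readBytes())
                exchange.responseHeaders.add("Content-Type", "application/json")
                exchange.sendResponseHeaders(200, reply.size.toLong())
                exchange.responseBody.use { it.write(reply) }
            }
        } finally {
            exchange.close()
        }
    }
    server.start()
    return server
}
```

Because the endpoint speaks the same path as the OpenAI API, existing tools on the workstation only need their base URL pointed at `http://localhost:8080/v1`.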

Why RFCOMM?

When transmitting raw JSON payloads of prompt queries and responses, we needed a stream-oriented, reliable transport protocol. While Bluetooth Low Energy (BLE) with GATT attributes is excellent for low-throughput telemetry, it is poorly suited to larger text blocks because of its small Maximum Transmission Unit (MTU) and the packet fragmentation overhead that follows from it.

We chose RFCOMM (Radio Frequency Communication), which emulates an RS-232 serial port over the L2CAP protocol. Together with the L2CAP and baseband layers beneath it, RFCOMM takes care of packet sequencing, flow control, and retransmission, so the application sees a reliable stream socket (with an interface much like java.net.Socket) capable of sustaining the text streaming required for LLM prompts and responses.
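
To put the BLE limitation in numbers, here is a rough back-of-the-envelope sketch. It assumes the default ATT MTU of 23 bytes (of which 3 bytes are ATT header) and, for comparison, a negotiated MTU of 517 bytes. The function name and the 4 KB prompt size used in the note below are illustrative assumptions, not measurements.

```kotlin
// Number of GATT writes needed to move a payload at a given ATT MTU.
// Each write carries (MTU - 3) bytes of application data after the ATT header.
fun gattWritesNeeded(payloadBytes: Int, attMtu: Int): Int {
    require(attMtu > 3) { "ATT MTU must exceed the 3-byte header" }
    val usable = attMtu - 3
    return (payloadBytes + usable - 1) / usable // ceiling division
}
```

For a hypothetical 4,096-byte prompt, `gattWritesNeeded(4096, 23)` gives 205 writes and `gattWritesNeeded(4096, 517)` gives 8, each of which the application has to sequence and reassemble itself. An RFCOMM socket hides that work behind a single stream write.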


2. Implementing the Android RFCOMM Server in Kotlin

To ensure that the Android application could handle incoming Bluetooth connections reliably, we bypassed standard React Native wrapper libraries, which often suffer from memory leaks and lack support for background persistence, and implemented the Bluetooth stack directly in Kotlin.

The Bluetooth Server Thread

The Bluetooth server runs in a dedicated thread, listening on a specific Universally Unique Identifier (UUID):

package com.sevenlabs.airelay

import android.bluetooth.BluetoothAdapter
import android.bluetooth.BluetoothServerSocket
import android.bluetooth.BluetoothSocket
import android.util.Log
import java.io.IOException
import java.util.UUID

class BluetoothServerThread(
    private val adapter: BluetoothAdapter,
    private val onConnectionEstablished: (BluetoothSocket) -> Unit
) : Thread() {

    private val serverSocket: BluetoothServerSocket? by lazy(LazyThreadSafetyMode.SYNCHRONIZED) {
        adapter.listenUsingRfcommWithServiceRecord(
            "SevenLabsAIRelay",
            UUID.fromString("4a8b8c2d-9e0f-11ed-a8fc-0242ac120002")
        )
    }

    // Written by cancel() on another thread, so it must be visible to run()
    @Volatile
    private var shouldKeepListening = true

    override fun run() {
        name = "SevenLabs-RFCOMM-Listener"
        Log.i("AIRelay", "RFCOMM Server Socket listening...")

        while (shouldKeepListening) {
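            // accept() blocks until a client connects; the result is nullable
            // because serverSocket itself may be null
            val socket: BluetoothSocket? = try {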
                serverSocket?.accept()
            } catch (e: IOException) {
                Log.e("AIRelay", "Server Socket accept failed", e)
                break
            }

            socket?.let {
                Log.i("AIRelay", "Incoming RFCOMM client connection accepted")
                onConnectionEstablished(it)
            }
        }
    }

    fun cancel() {
        try {
            shouldKeepListening = false
            serverSocket?.close()
        } catch (e: IOException) {
            Log.e("AIRelay", "Could not close server socket", e)
        }
    }
}

3. Persistent Operation: Kotlin Foreground Services & Wake-Lock Management

One of the steepest engineering challenges on modern Android versions (Android 12+) is battery optimization. When the device's screen turns off or the app moves to the background, Android's power management (Doze mode and App Standby) lets the CPU sleep and restricts background network access, which stalls or drops long-lived sockets.

To guarantee uninterrupted operations, Seven Labs implemented two crucial mechanisms:

  1. Kotlin Foreground Service: Placing the RFCOMM server and API client inside an Android Foreground Service. This registers the app as a system-recognized persistent process, showing a persistent status bar notification.
  2. Wake-Locks and Wi-Fi Locks: Explicitly asking the OS power manager to keep the CPU awake, and the network radios serviced, during an active session.

The Foreground Service Implementation

Below is the core of the foreground service handling thread lifecycle and notifications:

package com.sevenlabs.airelay

import android.app.Notification
import android.app.NotificationChannel
import android.app.NotificationManager
import android.app.PendingIntent
import android.app.Service
import android.bluetooth.BluetoothAdapter
import android.content.Context
import android.content.Intent
import android.os.Build
import android.os.IBinder
import android.os.PowerManager
import androidx.core.app.NotificationCompat

class AIRelayService : Service() {

    private var wakeLock: PowerManager.WakeLock? = null
    private var serverThread: BluetoothServerThread? = null

    override fun onCreate() {
        super.onCreate()
        acquireWakeLock()
        startForegroundService()
    }

    private fun acquireWakeLock() {
        val powerManager = getSystemService(Context.POWER_SERVICE) as PowerManager
        wakeLock = powerManager.newWakeLock(
            PowerManager.PARTIAL_WAKE_LOCK,
            "SevenLabs::AIRelayWakeLock"
        ).apply {
            acquire(30 * 60 * 1000L) // 30-minute safety limit
        }
    }

    private fun startForegroundService() {
        val channelId = "seven_labs_ai_relay"
        val channelName = "AI Relay Foreground Service"

        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
            val channel = NotificationChannel(channelId, channelName, NotificationManager.IMPORTANCE_LOW)
            val manager = getSystemService(Context.NOTIFICATION_SERVICE) as NotificationManager
            manager.createNotificationChannel(channel)
        }

        val notificationIntent = Intent(this, MainActivity::class.java)
        val pendingIntent = PendingIntent.getActivity(
            this, 0, notificationIntent,
            PendingIntent.FLAG_IMMUTABLE or PendingIntent.FLAG_UPDATE_CURRENT
        )

        val notification: Notification = NotificationCompat.Builder(this, channelId)
            .setContentTitle("Seven Labs AI Relay Active")
            .setContentText("Routing Bluetooth RFCOMM data to GPT-4o...")
            .setSmallIcon(R.drawable.ic_notification)
            .setContentIntent(pendingIntent)
            .build()

        startForeground(1, notification)
    }

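    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        // onStartCommand can run more than once; only start one listener thread
        if (serverThread == null) {
            // getDefaultAdapter() returns null on devices without Bluetooth
            val adapter = BluetoothAdapter.getDefaultAdapter()
            if (adapter == null) {
                stopSelf()
                return START_NOT_STICKY
            }
            serverThread = BluetoothServerThread(adapter) { socket ->
                // Route stream data
                ConnectionHandler(socket).start()
            }.also { it.start() }
        }
        return START_STICKY
    }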

    override fun onDestroy() {
        serverThread?.cancel()
        wakeLock?.let {
            if (it.isHeld) it.release()
        }
        super.onDestroy()
    }

    override fun onBind(intent: Intent?): IBinder? = null
}

4. Structuring the Data Payload and Protocol

Because RFCOMM operates as a raw byte stream, we had to define an application-level framing protocol to segment individual request and response packets.

We designed a lightweight message frame format:

  • Magic Bytes (4 bytes): SLAR (Seven Labs AI Relay) to validate packet origins.
  • Payload Length (4 bytes): Big-endian integer specifying the exact size of the payload.
  • Payload Type (1 byte): Indicates if the packet is raw text, SSE (Server-Sent Events) chunk, metadata, or an error code.
  • Encrypted Payload (Variable): AES-GCM encrypted JSON data.
+------------+------------------+--------------+-----------------------+
| Magic (4B) | Length (4B, Int) | Type (1B, B) | Encrypted Payload (N) |
+------------+------------------+--------------+-----------------------+
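
The frame layout above can be sketched as a small encoder and decoder. This is a minimal illustration, not our production codec: the type codes, the `Frame` class, and the 5 MB length cap are assumptions made for the example, and the payload is treated as opaque (already encrypted) bytes.

```kotlin
import java.io.DataInputStream
import java.io.DataOutputStream
import java.io.IOException
import java.io.InputStream
import java.io.OutputStream

val MAGIC: ByteArray = "SLAR".toByteArray(Charsets.US_ASCII)

// Hypothetical type codes; the article does not specify the actual values.
const val TYPE_TEXT = 0
const val TYPE_SSE_CHUNK = 1
const val TYPE_METADATA = 2
const val TYPE_ERROR = 3

// Assumed upper bound so that a corrupt length field cannot trigger a huge allocation.
const val MAX_PAYLOAD_BYTES = 5 * 1024 * 1024

class Frame(val type: Int, val payload: ByteArray)

fun writeFrame(out: OutputStream, frame: Frame) {
    val data = DataOutputStream(out)
    data.write(MAGIC)                  // 4 magic bytes
    data.writeInt(frame.payload.size)  // 4-byte length, big-endian by definition
    data.writeByte(frame.type)         // 1-byte payload type
    data.write(frame.payload)          // N payload bytes
    data.flush()
}

fun readFrame(input: InputStream): Frame {
    val data = DataInputStream(input)
    val magic = ByteArray(MAGIC.size)
    data.readFully(magic)              // blocks until all 4 bytes have arrived
    if (!magic.contentEquals(MAGIC)) throw IOException("Bad magic bytes")
    val length = data.readInt()
    if (length < 0 || length > MAX_PAYLOAD_BYTES) {
        throw IOException("Invalid payload length: $length")
    }
    val type = data.readUnsignedByte()
    val payload = ByteArray(length)
    data.readFully(payload)            // a stream read may return fewer bytes; readFully loops
    return Frame(type, payload)
}
```

The important detail is `readFully`: a single `read()` on a stream socket can return a partial frame, so the reader has to keep reading until the declared length has been consumed.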

When the Client on the offline PC sends a completion prompt, the local daemon packages it into this frame, transmits it over the RFCOMM socket, and blocks waiting for response frames.

On the Android Relay side, the Kotlin socket reader reads the length prefix, reads the specified number of bytes, decrypts the payload, and forwards the HTTP request to OpenAI's endpoint. To support token streaming, we parse the Server-Sent Events (SSE) data chunks coming back from OpenAI, frame them as SSE Chunk types, and write them sequentially back into the Bluetooth socket stream.
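
The streaming path can be sketched as a small SSE line parser. This is a simplified illustration: it assumes each event carries a single `data:` line and that the stream ends with the `[DONE]` sentinel used by OpenAI's streaming responses. The function name and callback are hypothetical.

```kotlin
import java.io.BufferedReader

// Reads an SSE stream line by line and hands each `data:` payload to `onChunk`.
// Stops at end of stream or at the `[DONE]` sentinel.
fun relaySseChunks(reader: BufferedReader, onChunk: (String) -> Unit) {
    while (true) {
        val line = reader.readLine() ?: break     // null means the stream closed
        if (!line.startsWith("data:")) continue   // skip blank separators and comment lines
        val data = line.removePrefix("data:").trim()
        if (data == "[DONE]") break
        onChunk(data)
    }
}
```

In the relay, the callback would encrypt the chunk, wrap it in a frame of the SSE chunk type, and write it to the Bluetooth socket's output stream, so tokens reach the PC as soon as they arrive.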


5. Security Architecture: Zero-Trust over Bluetooth

Transmitting corporate data over Bluetooth raises significant security concerns. Bluetooth connections are susceptible to eavesdropping and Man-in-the-Middle (MitM) attacks. To make this relay viable for enterprise deployments, Seven Labs added an application-level cryptography layer.

End-to-End Encryption (E2EE)

Even if the Bluetooth pairing layer is compromised, the data payload remains secure.

  1. Key Exchange: When the offline PC initiates a connection, it performs an Elliptic-Curve Diffie-Hellman (ECDH) key exchange over the raw Bluetooth socket with the Android device.
  2. Ephemeral Session Key: Both endpoints derive a shared symmetric key (AES-256-GCM) that is unique to that specific connection session.
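  3. Payload Encryption: Every data frame payload is encrypted using the session key, with a fresh initialization vector (IV) generated for each frame. Encryption defeats passive sniffing, and because GCM authenticates each frame, the receiver can reject tampered frames and, by tracking the IVs it has already seen, replayed ones.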
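
The three steps above can be sketched with the standard JCA classes. This is a minimal sketch under stated assumptions, not our production code: the curve (secp256r1), the use of a plain SHA-256 hash in place of a proper KDF such as HKDF, and the counter-based IV layout are all choices made for the example.

```kotlin
import java.nio.ByteBuffer
import java.security.KeyPair
import java.security.KeyPairGenerator
import java.security.MessageDigest
import java.security.PublicKey
import java.security.spec.ECGenParameterSpec
import javax.crypto.Cipher
import javax.crypto.KeyAgreement
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

// Step 1: each endpoint generates an ephemeral EC key pair per connection.
fun generateEphemeralKeyPair(): KeyPair {
    val generator = KeyPairGenerator.getInstance("EC")
    generator.initialize(ECGenParameterSpec("secp256r1"))
    return generator.generateKeyPair()
}

// Step 2: both sides derive the same 256-bit AES key from the ECDH shared secret.
// Assumption: SHA-256 stands in for a real key derivation function.
fun deriveSessionKey(own: KeyPair, peer: PublicKey): SecretKeySpec {
    val agreement = KeyAgreement.getInstance("ECDH")
    agreement.init(own.private)
    agreement.doPhase(peer, true)
    val digest = MessageDigest.getInstance("SHA-256").digest(agreement.generateSecret())
    return SecretKeySpec(digest, "AES")
}

// 96-bit IV = 4-byte direction id + 8-byte frame counter (hypothetical layout).
// Distinct direction ids keep the two endpoints from ever reusing an IV under one key.
fun ivFor(direction: Int, counter: Long): ByteArray =
    ByteBuffer.allocate(12).putInt(direction).putLong(counter).array()

// Step 3: encrypt each frame payload with AES-256-GCM and a 128-bit auth tag.
fun encryptFrame(key: SecretKeySpec, direction: Int, counter: Long, plaintext: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(128, ivFor(direction, counter)))
    return cipher.doFinal(plaintext)
}

// Throws if the ciphertext was modified or the counter does not match.
fun decryptFrame(key: SecretKeySpec, direction: Int, counter: Long, ciphertext: ByteArray): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(128, ivFor(direction, counter)))
    return cipher.doFinal(ciphertext)
}
```

With a counter-derived IV, a receiver that only accepts the next expected counter gets replay rejection as a side effect: a replayed frame fails authentication because it is decrypted under the wrong IV. Note that an unauthenticated ECDH exchange does not by itself stop an active man-in-the-middle, so the public keys still need to be tied to an identity the endpoints trust.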

6. Performance and Latency Tuning

Our benchmarking yielded the following performance metrics in production:

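+----------------------------+------------------------+----------------------------------+-----------------------------------+
| Metric                     | Direct Wi-Fi (Control) | RFCOMM Relay (Without Streaming) | RFCOMM Relay (With SSE Streaming) |
+----------------------------+------------------------+----------------------------------+-----------------------------------+
| Time to First Token (TTFT) | ~320 ms                | ~980 ms                          | ~410 ms                           |
| Throughput (Tokens/Sec)    | 65                     | 42                               | 58                                |
| Max Payload Size           | Unlimited              | 5 MB                             | Streamed                          |
+----------------------------+------------------------+----------------------------------+-----------------------------------+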

Optimizing Throughput

Because Bluetooth bandwidth is constrained compared to Wi-Fi, streaming responses token-by-token is essential. By feeding SSE chunks back to the client as they arrive from OpenAI’s edge, we cut time to first token on the relay from roughly 980 ms to roughly 410 ms, a reduction of about 58%.

Furthermore, we applied Gzip compression to prompt inputs exceeding 20 KB, reducing Bluetooth transmission time and easing pressure on the RFCOMM buffers.
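
A threshold-based compression step might look like the sketch below. The `PreparedPayload` class and function names are hypothetical. The sketch assumes the receiver learns whether a payload was compressed from a flag carried alongside it, for example in frame metadata.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

const val GZIP_THRESHOLD_BYTES = 20 * 1024

// Payload bytes plus a flag telling the receiver whether to gunzip them.
class PreparedPayload(val compressed: Boolean, val bytes: ByteArray)

// Compresses only payloads larger than the threshold; small ones pass through unchanged.
fun prepareForTransmit(payload: ByteArray): PreparedPayload {
    if (payload.size <= GZIP_THRESHOLD_BYTES) return PreparedPayload(false, payload)
    val buffer = ByteArrayOutputStream()
    GZIPOutputStream(buffer).use { it.write(payload) } // closing the stream writes the gzip trailer
    return PreparedPayload(true, buffer.toByteArray())
}

// Reverses prepareForTransmit on the receiving side.
fun restore(prepared: PreparedPayload): ByteArray =
    if (!prepared.compressed) prepared.bytes
    else GZIPInputStream(ByteArrayInputStream(prepared.bytes)).use { it.readBytes() }
```

Compression has to happen before encryption: ciphertext is effectively random and does not compress, so gzipping an already encrypted payload would only add overhead.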


7. Enterprise Frequently Asked Questions

Does this violate air-gapping principles?

The system acts as a strict protocol proxy. The offline workstation has no IP-level path to the cellular network, which rules out general internet access, port scans, and reverse-shell tunnels through the relay. Only well-formed application-level SLAR frames are permitted through the interface.

How does battery consumption scale on the relay device?

Operating the Bluetooth and LTE radios concurrently consumes roughly 8% battery per hour of continuous processing. We minimized drain by using Android’s PowerManager wake locks selectively: holding them only during active socket sessions and letting the device idle during quiet hours.

How is token accounting managed?

All usage and authorization keys are stored on the Android Relay app or fetched from an enterprise key server. Individual user logins can be authenticated locally on the device prior to Diffie-Hellman negotiation.


Build Secure, Edge-to-Cloud Systems with Seven Labs

Navigating the intersection of advanced AI technologies and rigorous corporate security controls requires seasoned system architects. Whether you need an air-gapped LLM deployment, high-performance edge computing, or secure IoT relays, Seven Labs has the engineering expertise to design and deploy compliant solutions.

Contact Seven Labs' Engineering Team to discuss your organization's custom AI and infrastructure needs.
