Data Pipeline Edge Intelligence: Text to Structured Entities in Milliseconds

The "Inference Tax"

Suppose you need structured data out of free text on an edge node: "Extract all the field units, locations, and timestamps from the following text." The fashionable answer is to hand that prompt to an LLM. On the edge, that answer carries two costs:

- Inference cost: running an 8B-class model burns battery, eats VRAM, and adds latency (300 ms+ per query) just to emit a blob of JSON.
- Hallucinations: LLMs are generative. They predict one token at a time, and nothing stops them from inventing entities or distorting the values they were asked to copy out.

FogAi takes the opposite bet: a small, specialized extraction model (194M parameters). This layer bridges the gap between raw text streams and structured, actionable data, all without a single Python wrapper.

Knowledge Extraction Layer: knowledgator/gliner-bi-base-v2.0 on MNN

The "magic" is the Bi-Encoder architecture. A traditional NER model is trained on a fixed entity schema (e.g., PERSON, ORG, LOC) and then frozen; ask it for WELDING DEFECT or RADIO FREQUENCY and you are back to annotating data and retraining. GLiNER (Generalist and Lightweight Named Entity Recognition) works differently:

Bi-Encoder Architecture

- The Text Encoder: embeds the raw input text.
- The Label Encoder: embeds the entity labels you want to find, expressed as plain strings.

Because spans of text are matched against label embeddings, any entity type can be requested at runtime, with no retraining. Why does that matter for an edge pipeline? Caching. Labels are encoded independently of the text, so their embeddings can be computed once and reused for every request.
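As a point of reference, this is what zero-shot extraction with the same checkpoint looks like from Python using the open-source gliner package (FogAi itself runs the model natively via MNN). The sample text, label set, and threshold here are illustrative, not the benchmark inputs:

```python
# Zero-shot extraction with the gliner package; labels are arbitrary strings,
# so new entity types need no retraining.
from gliner import GLiNER

# Same checkpoint FogAi later converts to MNN for the native edge runtime.
model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")

text = "Unit 7 reported a coolant leak near Dock B at 5 PM on Monday."
labels = ["field unit", "location", "timestamp", "defect"]  # illustrative label set

# Each hit carries the matched span, its label, and a confidence score.
for entity in model.predict_entities(text, labels, threshold=0.5):
    print(f'{entity["label"]:>10}: {entity["text"]} ({entity["score"]:.2f})')
```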
In production the labels are whatever the deployment needs. A warehouse node, for instance, might register:

['worker', 'forklift', 'safety_vest', 'pallet']

FogAi caches these Label Embeddings in RAM. They are computed once and reused on every request, so it makes no difference whether you need 5 labels or 500: the label count drops out of the per-request cost. Constant-Time Inference with respect to the label set.

Data Flow: Zero Python

Both JNI and gRPC can drive MNN inference, but the hot path is entirely devoid of heavy Python runtime overhead:

- Raw Text Ingest -> A raw string lands at the Vert.x Gateway.
- JNI / C++ Hand-off -> The string is passed to native code through off-heap memory buffers.
- MNN Text Encoder -> The gliner-bi-base-v2.0 ONNX graph is executed via the MNN runtime (tuned for edge CPUs and NPUs). The text is projected into a high-dimensional vector space.
- Vector Dot Product -> The C++ engine computes the dot-product matrix between the fresh Text Embeddings and the cached Label Embeddings.
- Structured Output -> Matches above the threshold are emitted as clean JSON, ready for downstream consumers.
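To make the caching and dot-product step concrete, here is a deliberately toy Python sketch of the scoring path. FogAi's real engine runs the GLiNER span head in C++ over MNN tensors; `_toy_encode` below is a stand-in for the two encoder passes, not the actual model:

```python
# Toy illustration of constant-label-cost scoring: encode labels once, cache
# them, and each request is one encoder pass plus a single matmul.
import zlib
import numpy as np

DIM = 256

def _toy_encode(strings):
    """Stand-in for an encoder pass: one unit-norm embedding row per string."""
    rows = [np.random.default_rng(zlib.crc32(s.encode())).standard_normal(DIM)
            for s in strings]
    out = np.stack(rows)
    return out / np.linalg.norm(out, axis=1, keepdims=True)

# Computed once and cached in RAM; 5 labels or 500, this never hits the request path.
LABELS = ["worker", "forklift", "safety_vest", "pallet"]
LABEL_MATRIX = _toy_encode(LABELS)                      # shape: (num_labels, DIM)

def score_spans(candidate_spans):
    """Per request: encode the candidate spans, then one matmul against the cache."""
    span_matrix = _toy_encode(candidate_spans)          # shape: (num_spans, DIM)
    return span_matrix @ LABEL_MATRIX.T                 # shape: (num_spans, num_labels)

print(score_spans(["John", "the red forklift", "loading bay 4"]).round(2))
```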
Benchmarking the Inference Tax

To put numbers on this, the repository ships a small comparison suite in pycompare. Every contender gets the same standard test sentence and the same target labels, ['animal', 'location', 'time', 'date']:

- The Heavyweight (General LLM): Qwen2.5-0.5B-Instruct
- The Specialized Heavyweight: numind/NuExtract-1.5 (a generative model fine-tuned for extraction)
- The Agile Bi-Encoder (FogAi's engine): GLiNER-194M

Benchmark 1: The General LLM (pycompare/test_llm_perf.py)

- Model: Qwen2.5-0.5B-Instruct
- Architecture: Generative Causal LM
- Tokens generated: 53
- Total Inference Time: 3,524.42 ms (yes, 3.5 seconds)
- RAM: 1,116.77 MB
- The Result: the LLM hallucinated, producing a JSON block containing "Fox Brown" and "Lazy dog" after roughly 50 tokens of monologue about the entities it was about to extract.

Benchmark 2: The Specialized LLM (NuExtract 1.5)

- Model: numind/NuExtract-1.5
- Architecture: Generative Causal LM (fine-tuned for JSON extraction)
- Tokens generated: 55
- Total Inference Time: ~1,200.00 ms
- RAM: ~1,200.00 MB
- The Result: a JSON block, but still produced token by token, paying the full autoregressive generation cost.

Benchmark 3: The Bi-Encoder (pycompare/test_gliner_perf.py)

- Model: knowledgator/gliner-bi-base-v2.0
- Architecture: Bi-Encoder
- Tokens processed: 22 (Text + Labels)
- Total Inference Time (Python): 50.83 ms
- Total Inference Time (JNI/C++ Web Gateway): ~750.00 ms (including HTTP framing, routing, and off-heap memcopy)
- RAM: 824.11 MB
- The Result: a clean extraction: {animal: "quick brown fox", location: "New York", time: "5 PM", date: "Monday"}.

The Verdict: Embeddings Are the Lifeblood of Vector Databases

By using GLiNER for the Knowledge Extraction layer, FogAi gets a pipeline roughly 70x faster than the general LLM (3,500 ms vs 50 ms of raw inference, a speedup on the order of 6,800%), and it beats even the specialized generative extractor (NuExtract) by completely bypassing the autoregressive bottleneck.
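The pycompare scripts gather these numbers with nothing more exotic than wall-clock timing and process RSS. Below is a condensed, illustrative sketch of the generative baseline; it is not the repository's test_llm_perf.py, and the prompt wording is assumed, but it uses only standard transformers and psutil APIs:

```python
# Illustrative latency/RSS measurement for the generative baseline.
import os
import time

import psutil
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Assumed prompt wording; the point is that the answer must be generated token by token.
prompt = ("Extract the animal, location, time, and date from the following text and "
          "answer with JSON only: The quick brown fox jumps over the lazy dog "
          "in New York at 5 PM on Monday.")
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
)

start = time.perf_counter()
output = model.generate(inputs, max_new_tokens=128, do_sample=False)
elapsed_ms = (time.perf_counter() - start) * 1000

new_tokens = output[0][inputs.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
print(f"Generated {len(new_tokens)} tokens in {elapsed_ms:.2f} ms")
print(f"RSS: {psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024:.2f} MB")
```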
How Do We Deploy It? The Gateway Integration Test

We tested three integration topologies (nodes A, B, and C) against the Vert.x API Gateway:

- Type A (In-Process JNI): GLiNER runs in C++ through direct memory access (off-heap memory buffers) inside the same JVM as the Vert.x API Gateway.
- Type B (Out-of-Process C++ gRPC): GLiNER runs in a standalone C++ microservice (MNN or ONNX runtime) reached from the Gateway over HTTP/2.
- Type C (Out-of-Process Python gRPC): GLiNER runs in a Python gRPC microservice backed by ONNX Runtime.

With all three node types wired into the Vert.x API Gateway, the results were:

- Type C (Out-of-Process Python gRPC): 3,200 ms - 4,500 ms averaged per request under load. The combined overhead of Protobuf serialization, inter-process HTTP/2 networking, and the crushing weight of the Python Global Interpreter Lock (GIL) created a massive bottleneck.
- Type B (Out-of-Process C++ gRPC): 1,250 ms - 2,100 ms averaged per request under load. Even with a hyper-optimized C++ backend, Protobuf serialization/deserialization and inter-process HTTP/2 networking dominated. Under stress tests (test_integration.sh), the network stack overhead caused queue pileups for a model that normally takes 50 ms to run natively.
- Type A (In-Process JNI): ~750.00 ms sustained end-to-end latency, including HTTP Web Gateway routing, EDF queueing, the "Vanilla" safety checks, and memory mapping. The direct off-heap C++ memory handoff bypassed the networking and serialization layers entirely.

By running GLiNER on a Type A edge node over MNN, the gateway turns raw text into structured entities in well under a second, end to end, and the extracted entities also feed the Temporal Knowledge Graphs we build from the incoming streams.

The Free Unfair Advantage

Most of the hard work was getting the model to run under C++ MNN in the first place. To reach JNI-level integration speeds instead of going through Python, we converted the HuggingFace GLiNER model to MNN's format: export the PyTorch model to ONNX (working around an ONNX export bug tied to a layer choice in the HuggingFace implementation), then run the graph through MNNConvert to produce a .mnn file. The conversion script lives in the repository at scripts/convert_gliner_to_mnn.sh:

```bash
#!/bin/bash
# Convert the exported ONNX graph to MNN and carry the tokenizer/config JSON along.
ONNX_MODEL="models_onnx/gliner-bi-v2/onnx/model.onnx"
MNN_DIR="models_mnn/gliner-bi-v2"

mkdir -p "$MNN_DIR"
mnnconvert -f ONNX --modelFile "$ONNX_MODEL" --MNNModel "$MNN_DIR/model.mnn" --bizCode MNN
cp models_onnx/gliner-bi-v2/*.json "$MNN_DIR/"
```

Reproduce the Benchmarks

Don't take our word for it: you can run the Python benchmarks on your own machine. Clone the FogAi repository, navigate to pycompare, and run the scripts to watch the Inference Tax live:

```bash
git clone https://github.com/NickZt/FogAi.git
cd FogAi
python3 -m venv venv && source venv/bin/activate
pip install psutil gliner transformers accelerate
python3 pycompare/test_gliner_perf.py
python3 pycompare/test_llm_perf.py
```

The OpenAI-Compatible API

FogAi natively exposes an OpenAI-compatible API (/v1/chat/completions), so any client that speaks the OpenAI protocol can talk to the gateway. A docker-compose setup is included so you can try it from a chat UI on your own machine. Navigate to the UI directory and start the services:

```bash
cd UI
docker-compose up -d
```

Open your browser and start chatting:

- Open WebUI: http://localhost:3000
- Lobe Chat: http://localhost:3210 (the password is simply fogai)

Both UIs are automatically configured to point at the gateway at http://host.docker.internal:8080/v1.
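If you would rather skip the UIs, you can hit the same endpoint directly. A minimal sketch, assuming the gateway is reachable on localhost:8080; the model id and API key below are placeholders, so substitute whatever your gateway is configured to expect:

```python
# Minimal call to FogAi's OpenAI-compatible chat completions endpoint.
import requests

BASE_URL = "http://localhost:8080/v1"  # same endpoint the bundled UIs point at

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": "Bearer fogai"},  # placeholder key, if your setup requires one
    json={
        "model": "fogai",  # placeholder model id
        "messages": [
            {
                "role": "user",
                "content": "Extract the field units, locations, and timestamps "
                           "from: Unit 7 reached Dock B at 5 PM on Monday.",
            }
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```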