A new 27-billion-parameter cell model is not just a biology story. It is a data-engineering story, and a blueprint for the future of applied AI. If you are an AI engineer, you need to understand why it works.

The new collaboration between Yale and Google, the C2S-Scale preprint, could easily pass for a niche bioinformatics paper. In reality, it is one of the most important architectural manifestos for applied AI in a long time. The team built a 27B-parameter model trained on biological data, and it produced a novel, wet-lab-validated scientific discovery about a potential cancer therapy.

As exciting as the potential therapy is, the methodology is what every AI architect and engineer should bookmark.

The Core Problem: AI Models Speak Language, Not Spreadsheets

The central difficulty in applying LLMs to scientific or business data is that these models are built on language, while our data lives in databases, matrices, and massive high-dimensional structures. Asking an LLM to parse a raw scRNA-seq gene-expression matrix is a non-starter.

For years, the standard answer was to build bespoke, specialized scientific architectures: custom neural networks designed from scratch to ingest numerical data. That path is difficult, expensive, and cut off from the scaling gains and rich tooling of the mainstream LLM ecosystem.

The C2S-Scale team flipped the problem on its head. Instead of changing the model to fit the data, they changed the data to fit the model.

The Architectural Masterstroke: Cell2Sentence

The genius of the Cell2Sentence (C2S) framework is its simplicity.
It takes the massive numerical gene-expression profile of a single cell and converts it into a simple text string. How? Rank all of the cell's genes by expression level, then write out the names of the top-K genes in that order.

The complex biological state of a cell, something like:

{'GeneA': 0.1, 'GeneB': 0.9, 'GeneC': 0.4, ...}

becomes a simple, human-readable "cell sentence":

GeneB GeneC GeneA ...

It is a brilliantly pragmatic act of data translation. With this one move, the team:

Eliminated the need for custom architectures. The "language" of biology can now be fed directly into a standard, off-the-shelf Transformer architecture such as Gemma or Llama, inheriting the entire mainstream LLM training ecosystem for free.

Unlocked multimodality. The training corpus is not limited to cell sentences. The team could freely mix in the abstracts of the scientific papers the data came from, so the model learns a cell's state and the scientist's language in a single, unified training run.

Enabled true "vibe coding" for biology. The final model doesn't just analyze. It can take a prompt like "Generate a pancreatic CD8+ T cell" and produce a plausible, synthetic gene-expression profile for a cell that has never existed.

The Payoff: Industrializing Scientific Discovery

This architecture is what made the paper's killer application possible. The team ran a virtual screen for drugs that could boost the visibility of cancer cells to the immune system. This was not a simple database lookup; it was an in-silico experiment. The model predicted that one drug, silmitasertib, would have this effect, but only in combination with interferon signaling. The researchers then took this novel, AI-generated hypothesis into a real wet lab, ran the experiments, and proved it was correct.

This is a new paradigm: the AI did not simply retrieve an answer it had memorized.
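For the hands-on reader, the core C2S transformation described above — rank genes by expression, keep the names of the top-K — fits in a few lines of Python. This is a minimal sketch of the idea, not the paper's code; the function name and the `top_k` default are my own:

```python
def cell_to_sentence(expression: dict, top_k: int = 3) -> str:
    """Turn a gene-expression profile into a 'cell sentence':
    gene names sorted by descending expression, truncated to the top k."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    return " ".join(ranked[:top_k])

# The toy profile from the example above:
cell = {'GeneA': 0.1, 'GeneB': 0.9, 'GeneC': 0.4}
print(cell_to_sentence(cell))  # -> GeneB GeneC GeneA
```

Strings like this, rather than raw expression matrices, are what the Transformer actually ingests.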
The model synthesized its understanding of the language of biology and the language of human science to generate a new, non-obvious, and ultimately correct hypothesis. It is a system for industrializing serendipity.

The Playbook: What This Means for Builders

The C2S-Scale paper is a playbook for building high-impact AI systems in any specialized, non-textual domain, from finance to logistics to manufacturing.

Stop comparing models. Start comparing your data. The highest-leverage work is no longer the design of bespoke neural networks. It is the creative, strategic act of designing a "Data-to-Sentence" representation for your own domain. What is the language of your supply chain? What is the grammar of your financial data?

Multimodality is a requirement, not a bonus. The breakthrough came from training on cell sentences and paper abstracts together. Your enterprise AI should be trained not only on your structured data, but on the human knowledge that surrounds it: support tickets, maintenance logs, design documents, meeting transcripts.

The goal is a hypothesis generator, not an answer machine. The most valuable AI systems in a specialized domain won't be chatbots that summarize what is already known. They will be engines that, like C2S-Scale, generate novel, testable hypotheses that push the limits of what's possible.

Let's Build It: A Data-to-Sentence Example

Talk is cheap, so let's make this concrete. Below is a deliberately simplified Python implementation of the "Data-to-Sentence" pattern, applied to a very different domain: server log analysis. Imagine you have structured log data. Instead of feeding raw JSON to an LLM, we translate each entry into a "log sentence."

import json

def server_log_to_sentence(log_entry: dict) -> str:
    """
    Translates a structured server log dictionary into a
    human-readable "log sentence".

    The "grammar" of our sentence is a fixed order of importance:
    status -> method -> path -> latency -> user_agent
    """
    # Define the order of importance for our "grammar"
    grammar_order = ['status', 'method', 'path', 'latency_ms', 'user_agent']

    sentence_parts = []
    for key in grammar_order:
        value = log_entry.get(key)
        if value is not None:
            # We don't just append the value; we give it a semantic prefix.
            # This helps the LLM understand the meaning of each part.
            sentence_parts.append(f"{key.upper()}_{value}")

    return " ".join(sentence_parts)

def create_multimodal_prompt(log_sentence: str, human_context: str) -> str:
    """
    Combines the machine-generated "log sentence" with human-provided
    context to create a rich, multimodal prompt for an LLM.
    """
    prompt = f"""
Analyze the following server request.

**Human Context:** "{human_context}"

**Log Sentence:** "{log_sentence}"

Based on both the human context and the log sentence, what is the
likely user intent and should we be concerned?
"""
    return prompt

# --- Main Execution ---
if __name__ == "__main__":
    # 1. Our raw, structured data (e.g., from a database or log file)
    raw_log = {
        "timestamp": "2025-10-26T10:00:05Z",
        "method": "GET",
        "path": "/api/v1/user/settings",
        "status": 403,
        "latency_ms": 150,
        "user_agent": "Python-requests/2.25.1"
    }

    # 2. Translate the data into the new "language"
    log_sentence = server_log_to_sentence(raw_log)
    print("--- Original Structured Data ---")
    print(json.dumps(raw_log, indent=2))
    print("\n--- Translated 'Log Sentence' ---")
    print(log_sentence)

    # 3. Combine with human context for a multimodal prompt
    human_context = "We've been seeing a series of failed API calls from a script, not a browser."
    final_prompt = create_multimodal_prompt(log_sentence, human_context)
    print("\n--- Final Multimodal Prompt for LLM ---")
    print(final_prompt)

    # Now, this final_prompt can be sent to any standard LLM for deep analysis.
    # The LLM can now reason about both the structured log data (as a sentence)
    # and the unstructured human observation, simultaneously.

This simple script demonstrates the architectural core of the idea. The Data-to-Sentence translation is the key move: it takes opaque, structured data and renders it legible to a language model, which can then reason about it alongside unstructured human context.
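The same pattern transfers directly to the other domains named in the playbook. As one more illustration, here is a hypothetical "grammar" for financial transaction data, mirroring the server-log example; the schema and field names are invented for this sketch, not drawn from any real system:

```python
def transaction_to_sentence(txn: dict) -> str:
    """A hypothetical 'grammar' for payment data: fields emitted in a
    fixed order of importance, each with a semantic prefix."""
    grammar_order = ['type', 'amount', 'currency', 'merchant_category', 'country']
    parts = [f"{key.upper()}_{txn[key]}" for key in grammar_order
             if txn.get(key) is not None]
    return " ".join(parts)

# An invented transaction record for illustration:
txn = {
    "type": "card_present",
    "amount": 4999,
    "currency": "EUR",
    "merchant_category": "electronics",
    "country": "DE",
}
print(transaction_to_sentence(txn))
# -> TYPE_card_present AMOUNT_4999 CURRENCY_EUR MERCHANT_CATEGORY_electronics COUNTRY_DE
```

Choosing that fixed order of importance is exactly the design work the playbook calls for: it is you deciding what the grammar of your domain's language should be.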