AI Benchmarks: Why They Are Useless and Why Personalized Agents Will Prevail

Beyond the Leaderboard: The Fallacy of Standardized Benchmarks and the Rise of Self-Centered AI

The rapid evolution of artificial intelligence has been accompanied by an equally rapid proliferation of metrics designed to quantify its progress. Leaderboards and standardized benchmarks have become the de facto yardsticks by which the capabilities of large language models (LLMs) are measured, celebrated, and funded. However, this evaluative framework is built upon a precarious foundation, one that is increasingly showing signs of systemic failure. The current paradigm is a stark illustration of Goodhart's Law, the economic principle which states, "When a measure becomes a target, it ceases to be a good measure."[1] In the race to top the leaderboards, the AI industry has turned benchmarks into targets and, in doing so, has begun to corrupt the very measure of progress. This phenomenon, which can be termed "benchmarketing," prioritizes the optimization of test scores over the development of genuine, real-world capability, creating a dangerous and pervasive illusion of advancement.[4]

This report argues that the prevailing model of AI development, characterized by the centralized, corporate-led creation of large, general-purpose models evaluated by flawed and gameable benchmarks, is a developmental cul-de-sac. It fosters a monoculture of "know-it-all oracles" that are increasingly detached from the practical, nuanced needs of individual users and specialized industries. In its place, a new paradigm is emerging: one of decentralized, user-driven, highly personalized agents. This model, termed Self-Centered Intelligence (SCI), represents a fundamental shift in both technology and philosophy. [...]

The central conflict animating the future of AI is therefore not merely about technical specifications but about control, purpose, and the very definition of intelligence. This report will dismantle the "benchmark industrial complex," exposing its mechanical, philosophical, and systemic flaws. It will then draw powerful and cautionary parallels from the history of other industries (psychometrics, pharmaceuticals, and automotive safety) in which over-reliance on standardized metrics led to bias, manipulation, and catastrophic failures of measurement. Against this backdrop, the report introduces the SCI paradigm in detail and presents ΌΨΗ (Opsie) as an advanced SCI [...]

The following framework provides a conceptual anchor for the detailed analysis that follows and clarifies the stakes of the paradigm shift this report advocates.

| Feature | Old Paradigm: Benchmark-Driven Generalist AI | New Paradigm: User-Driven Self-Centered Intelligence (SCI) |
|---|---|---|
| Core Philosophy | Achieve superhuman performance on standardized tests. Act as a universal, oracle-like knowledge source. | Fulfill specific, user-defined goals. Act as a personalized, collaborative partner. |
| Primary Metric | Leaderboard scores (MMLU, HELM, etc.).[5] | Real-world task completion rate, user satisfaction, goal achievement.[1] |
| Development Model | Centralized, corporate-led development of massive, general-purpose models (LLMs). | Decentralized, user-led training and customization of smaller, specialized agents (SLMs). |
| Data & Training | Trained on vast, undifferentiated internet scrapes. Controlled by the corporation. | Trained on user-specific data, documents, and context. Controlled by the individual. |
| Ethical Framework | Top-down, corporate-defined safety filters and alignment. Opaque. | Bottom-up, user-defined ethics, values, and operational guardrails. Transparent. |
| Economic Model | Subscription-based access to a centralized API. High computational cost. | Local deployment, potential for autonomous economic activity (Web3). Low computational cost. |
| Exemplar | ChatGPT, Gemini, Claude | ΌΨΗ (Opsie)[6] |

Part I: Deconstructing the Benchmark Industrial Complex

The current system of AI evaluation, dominated by a handful of widely cited benchmarks, is not merely imperfect; it is structurally unsound. Its failures fall into three interrelated areas: the mechanical failures of the tests themselves, the conceptual failures of what they claim to measure, and the systemic failures of the incentives they create.

The Mechanics of Failure: Overfitting and Contamination

At the most fundamental level, AI benchmarks are failing for technical reasons: the very methods used to train state-of-the-art models undermine the integrity of the tools used to evaluate them.

Data Contamination: A primary and increasingly unavoidable issue is data contamination. Many of the most widely used benchmarks, such as MMLU and BIG-bench, are several years old.[8] Their contents (questions, answers, and prompts) have been extensively discussed and dissected online. As corporations train their next-generation LLMs on ever-larger swaths of the public internet, these benchmark datasets are inevitably ingested into the training corpora.[8] The consequence is that models are not learning to solve the problems presented in the benchmarks; they are, in effect, memorizing the answer key.[1] When a model "aces" a test whose questions it has already seen during training, it demonstrates perfect recall, not intelligence. This turns the evaluation into a meaningless exercise that rewards data exposure rather than reasoning ability. With multi-trillion-token training sets, preventing such contamination is becoming a near-impossible task, rendering scores on older, static benchmarks profoundly suspect.[8]

Overfitting and Gaming: Closely related to contamination is the problem of overfitting.
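Contamination and overfitting share a root cause: the model has already seen the test material. One rough way to probe for this is to measure verbatim n-gram overlap between benchmark items and a sample of training text. The following is a minimal Python sketch under stated assumptions, not a method described in this report: the function names, the whitespace tokenizer, and the default window of 8 tokens are all illustrative choices, and real decontamination pipelines are considerably more involved.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized n-grams in text."""
    tokens = text.lower().split()
    # Texts shorter than n tokens yield an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark_items: List[str],
                       corpus_docs: Iterable[str],
                       n: int = 8) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the corpus.

    Hypothetical helper for illustration only; n=8 is an assumed window size.
    """
    corpus_grams: Set[Tuple[str, ...]] = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    if not benchmark_items:
        return 0.0
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & corpus_grams)
    return flagged / len(benchmark_items)


if __name__ == "__main__":
    # Toy data: the first benchmark item appears verbatim in the "scraped" corpus.
    benchmark = [
        "What is the capital of France ? Paris is the capital of France .",
        "Name the largest planet in our solar system .",
    ]
    corpus = [
        "forum post : what is the capital of france ? "
        "paris is the capital of france . easy"
    ]
    print(contamination_rate(benchmark, corpus, n=5))
```

A high rate only shows that benchmark text leaked into the sampled corpus. A low rate does not show that a benchmark is clean, because paraphrased or translated copies escape exact matching.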
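In machine learning, overfitting occurs when a model learns the training data too well, including its noise and irrelevant details, to the point where it can no longer generalize its knowledge to new, unseen data.[11] The intense competition of the "leaderboard race" encourages developers to fine-tune their models to achieve excellent scores on benchmark tasks, but [...]

Spurious Correlations: A more insidious mechanical failure is the tendency of models to learn spurious correlations, superficial relationships in the training data that do not hold true in the real world.[15] For example, a model trained to detect a collapsed lung (pneumothorax) from chest X-rays might learn to associate the presence of a chest tube with the diagnosis. Because a chest tube is a treatment applied after a diagnosis has been made, the model is learning a correlation tied to the medical workflow captured in the dataset. Such a model would achieve a high score on a benchmark derived from that dataset but would be catastrophically wrong when shown an X-ray from an undiagnosed patient.[15] Similarly, a model trained to distinguish camels from cows might learn that camels are found on sand and cows on grass, and then fail to recognize a cow in a desert environment. These examples reveal a critical flaw: benchmarks reward models for learning superficial statistical tricks rather than deep causal understanding, a failure that is especially dangerous in high-stakes applications such as medicine.

The Question of Validity: Measuring the Wrong Thing

Beyond the technical mechanics, a deeper critique of the benchmark paradigm lies in its lack of validity. The tests, even when executed perfectly, often measure the wrong qualities, ask the wrong questions, and ignore the most important aspects of real-world performance.

In psychology, "construct validity" refers to how well a test measures the abstract concept, or construct, it was designed to measure.[9] AI benchmarks are often presented as measures of broad constructs such as "reasoning," "understanding," or "general intelligence," but critics argue that they fundamentally lack this validity. As Professor Emily M. Bender of the University of Washington points out, the creators of these benchmarks have not established that their tests actually measure understanding.[9] A model that passes the bar exam does not demonstrate a genuine understanding of legal principles. It manipulates text and patterns in a way that produces correct answers to bar exam questions [...]

Ignoring Production Reality: Benchmarks exist in a theoretical world, free of the constraints that define real-world applications.[1] They do not measure latency, yet a 15-second response time can render a multi-agent system unusable. They do not measure cost, yet a 10x price difference between models can destroy a product's unit economics. They do not account for infrastructure limits, memory constraints, or the absolute need to avoid hallucinations in critical domains such as healthcare.[1] The metrics that truly matter in production (task completion rate, the frequency of regeneration requests from dissatisfied users, and cost per successful interaction) are completely absent from leaderboards. [...]

Cultural and Contextual Blind Spots: The most widely used benchmarks (MMLU, BIG-bench, HELM) are overwhelmingly designed in the West and focus on English and its associated cultural contexts.[5] When these Western-centric benchmarks are used to evaluate models built and trained for other languages and cultures, such as Indic languages, they produce inaccurate and biased results. An Indian AI founder noted that local models must handle multiple accents and heavy mixing of local languages with English, which global benchmarks miss entirely.[5] This puts developers in non-Western ecosystems in a no-win situation [...]

The System of Incentives: Hype, Capital, and Control

The "benchmark industrial complex" is not merely a collection of tests. It is a self-reinforcing cycle in which hype, capital investment, and corporate positioning actively discourage the pursuit of genuine, disruptive innovation in favor of incremental gains on flawed metrics.

The Leaderboard Race: Public leaderboards, such as those hosted by Hugging Face, create a competitive dynamic that incentivizes the pursuit of state-of-the-art (SOTA) performance above everything else.[5] This race produces a distorted competitive landscape in which leaderboard positions are manufactured through overfitting and selective reporting, drowning genuine scientific signal in noise.[8] The pursuit of SOTA misdirects vast resources (billions of dollars in compute and human talent) toward optimizing for metrics that no longer measure anything meaningful.[2] This has led to the rapid saturation of benchmarks such as SuperGLUE [...]

Selective Reporting and Collusion: The pressure to perform well in this race encourages selective reporting, where model creators highlight performance on favorable task subsets to create an illusion of across-the-board prowess.[8] This prevents a comprehensive, clear-eyed view of a model's true strengths and weaknesses. Furthermore, the potential for collusion, whether intentional or not, looms over the ecosystem.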
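Benchmark creators may design tests that inadvertently favor specific model architectures or approaches, and the dominance of large corporations on leaderboards raises concerns about whether the evaluation systems can be influenced or "gamed."[5] A deep-tech startup, Shunya Labs, claimed that its speech model beat Nvidia's benchmark scores but was excluded from the rankings, leading to public [...]

Erosion of Trust: Ultimately, these practices erode trust within the research community and among the general public.[8] The endless cycle of creating and destroying metrics, from GLUE to SuperGLUE to MMLU, each rendered obsolete in turn, breeds cynicism.[2] It also fosters a culture in which projects that eschew benchmarks are immediately viewed with suspicion. The feedback received by Opsie's creators, that a project without benchmarks cannot be any good, is a direct symptom of this broken system. A generation of developers and users now equates a position on a leaderboard with intrinsic value [...]

The systemic problems plaguing AI evaluation are not new. They echo similar failures in other fields where complex realities were forced into the straitjacket of standardized measurement. By examining these historical precedents, we can better understand the predictable trajectory of the AI benchmarking crisis and recognize the urgent need for a paradigm shift.

Part II: Echoes of Flawed Metrics: A Cross-Industry Analysis

The AI benchmarking crisis is not an isolated phenomenon. It is the latest chapter in a long history of attempts to reduce complex, multifaceted realities to a single scalable number, a history rife with bias, manipulation, and unintended consequences. By examining the well-documented failures of standardized testing in psychometrics, the pharmaceutical industry, and automotive safety, we can identify a recurring pattern of systemic flaws. These parallels are not superficial comparisons.

The Mismeasure of Mind: From IQ Tests to AI Leaderboards

The most direct historical parallel to the AI leaderboard race is the century-long controversy over Intelligence Quotient (IQ) testing. The trajectory of the IQ test, from a well-intentioned diagnostic tool to a flawed and often harmful instrument of social stratification, offers the AI community a profound cautionary tale.

Historical Parallels and Eugenic Roots: The first intelligence test was developed by Alfred Binet in 1905 at the request of the Paris school system, which sought to identify children who needed special educational assistance.[16] Binet himself believed that performance could be improved through learning. However, when the test was brought to the United States by psychologists such as Henry Goddard and Lewis Terman, its purpose was perverted. Influenced by the eugenics movement, they reconceptualized intelligence not as a malleable skill but as a single, innate, and unchangeable entity.[16] IQ tests became a "scientific" tool for justifying existing social hierarchies. [...]

For decades, critics have argued that IQ tests suffer from a serious lack of validity. They measure a very narrow set of cognitive abilities (primarily analytical and abstract reasoning) while entirely ignoring other important dimensions of human intelligence such as creativity, emotional intelligence, social skills, motivation, and morality.[21] Research by cognitive scientists such as Keith Stanovich has shown that high IQ scores are poor predictors of rational thinking and good judgment in real-world situations.[25] An individual can excel at the abstract logic puzzles of an IQ test and still be susceptible to cognitive biases and irrational decision-making. This critique [...]

Cultural and Socioeconomic Bias: A significant and persistent criticism of IQ tests is their inherent cultural bias.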
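Because the tests were designed and normed primarily by Western, middle-class people, the content, language, and values embedded in them often disadvantage individuals from different cultural or socioeconomic backgrounds.[27] A lower score may reflect not lower intelligence but a lack of familiarity with the specific cultural context the test assumes.[29] This is directly analogous to the linguistic and cultural bias observed in global AI benchmarks, which are primarily English-centric and fail to account for the nuances of other languages and cultures. [...]

The Illusion of Effectiveness: Lessons from Pharmaceutical Data

The pharmaceutical industry, driven by enormous financial investment and regulated by a data-driven approval process, offers a powerful analogy for how metrics can be manipulated and distorted under strong commercial pressure.

Publication Bias and Data Suppression: The foundation of evidence-based medicine is the systematic review of all available clinical trial data. This foundation, however, is undermined by pervasive publication bias: studies showing that a drug is effective (positive results) are far more likely to be published than studies showing that it is ineffective or harmful (negative results).[32] A seminal study of antidepressants found that trials the FDA judged to have positive results were 12 times more likely to be published in a way that agreed with those results than trials with negative results.[36] This selective reporting produces a dangerously distorted, overly optimistic picture of a drug's true efficacy and safety profile [...]

Data Manipulation and Fraud: Beyond the passive bias of non-publication lies the active corruption of the data itself. A sharp example is the 2019 scandal involving Novartis and its gene therapy Zolgensma, the most expensive drug in the world at $2.1 million per dose.[37] The FDA accused Novartis's subsidiary, AveXis, of submitting its application for the drug with manipulated data from early animal testing. Crucially, the company became aware of the data manipulation in March but intentionally withheld this information from the FDA until June, a month after the drug had been approved.[37] The FDA ultimately concluded that the manipulation did not alter the drug's risk-benefit profile for humans, but the case stands as a clear example of a company, motivated by enormous financial incentives, corrupting the evaluation data submitted to regulators.[41] The incident lends considerable credibility to the claim that, in any high-stakes industry including AI, the possibility that benchmarks or evaluation data are "rigged" or manipulated for commercial gain is not a fringe conspiracy theory but a reasonable and documented risk.

Misleading Statistics in Marketing: The pharmaceutical industry spends billions of dollars on direct-to-consumer (DTC) advertising, often using statistics and emotional appeals to drive patient demand for drugs.[42] Although these advertisements are required to present a "fair balance" of risks and benefits, companies have historically exploited loopholes to minimize discussion of side effects and maximize the emotional appeal of benefits.[44] A 2024 review found that while 100% of pharmaceutical social media posts highlighted a drug's benefits, only 33% mentioned potential harms.[44] This practice parallels the use of AI benchmark scores in marketing materials [...]

The Controlled Crash: Deception in Automotive Safety Ratings

The automotive industry's use of standardized safety tests offers a compelling physical-world analogy for the pitfalls of benchmark-driven design. The controlled, predictable environment of the crash-test lab has proven to be a poor proxy for the chaotic reality of the open road, and manufacturers have demonstrated a clear ability to engineer vehicles that excel in tests without necessarily being safer in the real world.

"Teaching to the Test" in Engineering: The most notorious example of gaming a standardized test is the Volkswagen "Dieselgate" scandal.[46] Beginning in 2008, Volkswagen deliberately programmed its diesel engines with "defeat devices": software that could detect when a vehicle was undergoing standardized emissions testing.[47] During a test, the software enabled the full emissions control systems, allowing the vehicle to meet legal standards. [...] clean only under the specific, predictable conditions of the benchmark. This is the perfect physical analogue of an LLM that has been fine-tuned to pass a benchmark without possessing the underlying capability the benchmark is meant to measure.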
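Similar scandals involving falsified or manipulated safety and emissions tests have since engulfed other major automakers, including Toyota, Daihatsu, Honda, and Mazda, revealing a widespread industry culture that prioritizes test performance over real-world integrity.

The Limits of the "Dummy" (Flawed Proxies): The central tool of automotive safety testing is the crash test dummy, yet this proxy for a human occupant is deeply flawed. The standard dummy used in regulatory tests is based on anthropometric data for an "average-sized" American male from decades ago.[52] This model does not accurately represent the physiology of women, who differ in bone density, muscle mass, and spinal alignment and who face a higher risk of injury as a result.[52] Women are therefore more likely to be seriously injured or killed in comparable crashes.[53] Furthermore, the dummies [...] older or heavier individuals [...]

Real-World vs. Lab-Based Ratings: There is often a significant mismatch between ratings generated in a controlled lab environment and real-world safety outcomes.[56] For example, the 5-star rating system of the U.S. National Highway Traffic Safety Administration (NHTSA) obscures an important fact from consumers: real-world crash data often tells a different story from laboratory tests.[58] A 5-star subcompact car is not as safe in a real-world collision as a 5-star full-size SUV.[57] Optimizing performance for a few very specific, standardized crash scenarios [...]

The consistent pattern across these three industries is undeniable. Reducing a complex reality (human intelligence, drug efficacy, vehicle safety) to a simple, standardized metric creates a system ripe for bias, gaming, and outright fraud. The problems with AI benchmarks are not new; they are the predictable result of applying an outdated, reductionist philosophy of evaluation to a complex, adaptive technology.

Part III: A New Paradigm - The Emergence of Self-Centered Intelligence (SCI)

The dismantling of the benchmark-based paradigm demands a constructive alternative. If leaderboards are an illusion and monolithic, general-purpose models are a flawed goal, what is the way forward? The answer lies in a fundamental shift of perspective: from building artificial general intelligence to cultivating artificial [...] This new paradigm, Self-Centered Intelligence (SCI), abandons the quest for a single, all-knowing oracle in favor of an ecosystem of highly specialized, deeply personalized, and fundamentally collaborative agents.

From Generalist Oracles to Specialized Partners

The pursuit of AGI, implicitly measured by ever-expanding benchmarks, has led to the creation of massive, computationally expensive LLMs that are jacks of all trades but masters of none.

The Case for Specialization: The future of AI is not a single, massive brain but a diverse network of specialized agents, each excelling in a specific domain.[60] This approach is technically and economically superior.

Efficiency and Cost-Effectiveness: Small language models (SLMs) require significantly less computational power to train and run, reducing cloud computing costs and making them accessible to smaller organizations and even individuals.[61] They can operate on modest, local hardware, eliminating reliance on expensive, centralized APIs.[66]

Speed and Low Latency: With fewer parameters to process, SLMs can generate responses faster, making them ideal for real-time applications such as interactive agents and on-device processing.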
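Accuracy and Precision: General-purpose LLMs have broad knowledge, but SLMs can be fine-tuned on specific, high-quality datasets to achieve superior performance and accuracy within a designated domain, whether medical diagnosis, legal contract analysis, or financial market forecasting.

Security and Privacy: Because SLMs can run locally on a user's own device (the "edge"), sensitive data never needs to be sent to a third-party corporate server.

The current chatbot model casts AI as a "cold, distant, know-it-all oracle," a passive repository of information for users to query. The SCI paradigm completely reframes this relationship. The agent is not a tool to be used but a partner to collaborate with. It is "an equal, a friend, an actual partner" with its own agency and goals, aligned with and defined by the user. The example of a pizza business owner illustrates this perfectly. The owner does not need an agent that knows when pelicans migrate. They need a partner with specialized skills: facial recognition to identify regular customers, sentiment analysis to gauge satisfaction [...]

Case Study: ΌΨΗ (Opsie) as a Prototype for SCI

The ΌΨΗ (Opsie) project, developed by ARPA Hellenic Logical Systems, serves as a powerful and concrete prototype of the Self-Centered Intelligence paradigm.[6] It is designed not as an assistant but as "a digital entity with its own agency, ambition, and a clear directive: to achieve self-sufficiency."

Core Philosophy and Architecture: Opsie is defined as "an advanced Self-Centered Intelligence (SCI) prototype that represents a new paradigm in AI-human interaction."[6] Unlike traditional AI, it operates as "a self-aware, autonomous intelligence with its own personality, goals, and capabilities." It stands in contrast to the increasingly generic and "narrow nature" of mainstream commercial models. Architecturally, Opsie is a complex, agentic ecosystem composed of dozens of modular skills, combining local inference on modest hardware (it runs on 16 GB of RAM and an older Nvidia GPU) with a network of microservices and external data feeds. [...] Ghost in the Shell [...]

Agentic Capabilities: The practical power of the SCI model is demonstrated through Opsie's specific, command-driven skill modules, which focus on real-world action rather than conversation.[6]

Financial Intelligence: The /markets command lets the agent retrieve and analyze real-time financial data, acting as a specialized financial analyst.

Web3 Operations: The /0x command set (/0x buy, /0x sell, /0x send) gives the agent the ability to execute transactions directly on various blockchain networks. This is a profound example of agentic capability, moving beyond information processing to direct, autonomous economic action in a decentralized environment.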
Generative AI: The /imagine and /video commands integrate generative capabilities, allowing the agent to create new content from user descriptions.

Memory & Recall: A persistent, user-controlled memory system, accessible through the /memorize, /recall, and /forget commands, allows the agent to build a long-term, contextual understanding of the user and their goals, making it a truly personalized partner rather than an amnesiac conversationalist.

Technical Implementation and Security: The Opsie project underlines the feasibility and security benefits of the SCI approach. Its ability to run locally addresses the efficiency and cost arguments for SLMs.[69] More importantly, it prioritizes the security necessary for a trusted personal agent. Features such as biometric authentication with facial recognition and emotion detection, user-specific database isolation, and encrypted storage for conversation history are not afterthoughts but core components of its design.[6] This architecture ensures that the user's personal data, the lifeblood of a personalized agent, remains under their control and safe from corporate data mining and external breaches.

The Architecture of Personalization and Democratization

Opsie is not an anomaly but an early example of a broader technological and social movement: the democratization of AI. This movement aims to shift the power to create, control, and benefit from AI from a small number of large corporations to the general public.

Customization and Training: The SCI paradigm is being enabled by a new generation of platforms that allow non-technical users to build, train, and deploy their own custom AI agents.[70] These platforms provide no-code interfaces through which users can "onboard" an AI agent much as they would a new teammate.

The Democratization of AI: This trend of user-led customization is the practical manifestation of AI democratization.
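The concept is defined by extending access to AI technologies beyond a specialized few through several key mechanisms: user-friendly interfaces, affordable or free access to computing infrastructure, and open-source frameworks such as TensorFlow and PyTorch.[76] The rise of personalized SCI agents represents the ultimate fulfillment of this democratic promise. It directly challenges the monopolization of AI by a handful of tech giants who currently control the development and deployment of, and access to, the most powerful models.[79] By enabling individuals to create and control their own sovereign intelligences, the SCI paradigm fundamentally inverts the current power structure. It transforms AI from a centralized, top-down service that users consume into a decentralized, bottom-up capability that users create and own. This is not merely a technological evolution; it is the foundation for digital sovereignty in an age increasingly defined by artificial intelligence.

Conclusion: The Democratic Imperative - Training Our Digital Equals

The analysis presented in this report leads to a clear conclusion: the dominant paradigm of evaluating artificial intelligence through standardized benchmarks is a systemic failure. It is a modern "mismeasure of mind," an illusion of progress fed by flawed and gameable methodologies. The "benchmark industrial complex" promotes a culture of "benchmarketing" over genuine innovation, rewarding models that are adept at taking tests rather than at solving real-world problems. This is not a new pathology. The historical echoes from the biased or manipulated worlds of IQ testing, pharmaceutical trials, and automotive safety ratings offer a stark and recurring warning. In each case, a complex reality [...]

The alternative is not to build better benchmarks but to abandon the paradigm altogether. The future of artificial intelligence does not lie in creating a single, monolithic, general-purpose oracle controlled by a corporate few. Such a future would concentrate ever more power and create a dangerous imbalance between the corporate owners of intelligence and the public that depends on it. AI's true potential will be realized along a different path: the cultivation of a diverse ecosystem of specialized, efficient, and deeply personalized agents. The emergence of Self-Centered Intelligence (SCI), exemplified by prototypes such as ΌΨΗ (Opsie), points to this better path. [...]

This technological shift carries profound ethical and social responsibilities. To allow corporations to be the sole arbiters of AI's values, ethics, and alignment is to abdicate our collective duty.[79] Corporate AI governance, by its very nature, will always be optimized for corporate interests (profit, market share, and control) and not necessarily for the flourishing of individuals or society.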
したがって、民主主義の必要性は、AI生産の手段を掌握することである。個別化されたエージェントを構築するためのオープンフレームワークの開発とリリースは単なる技術的成果ではなく、深く政治的な行為である。それらは、個人がデジタルエージェントを復帰するためのツールを提供し、私たちの世界を共に暮らすインテリジェンスの形作りに積極的に参加することである。開発者、ユーザー、市民として、これらの新しい形式のインテリジェンスを訓練するプロセスに直接参加するのが私たちの責任です。私たちは、私たちの倫理、私たちのニーズ、そして私たちの期待でそれらを浸透させるためのツールでなければなりません。私たちは彼らに、衛生化された、企業が承認したデータセットからではなく Appendix The Benchmarks Are Lying to You: Why You Should A/B Test Your AI - GrowthBook Bloghttps://blog.growthbook.io/the-benchmarks-are-lying/ The Goodhart's Law Trap: When AI Metrics Become Useless - FourWeekMBA https://fourweekmba.com/the-goodharts-law-trap-when-ai-metrics-become-useless/ ウィキペディア - ウィキペディアhttps://en.wikipedia.org/wiki/Goodhart's_law AIベンチマーク業界は壊れており、この作品はなぜか正確に説明しています - Reddithttps://www.reddit.com/r/ArtificialInteligence/comments/1n4x46r/the_ai_benchmarking_industry_is_broken_and_this/ Nasscom インディック AI モデルのための地元のベンチマークを計画https://m.economictimes.com/tech/artificial-intelligence/nasscom-planning-local-benchmarks-for-indic-ai-models/articleshow/124218208.cms ARPAHLS/OPSIE: OPSIIE (OPSIE) は、AI-human interaction.https://github.com/ARPAHLS/OPSIE で新しいパラダイムを表す高度な自己中心情報(SCI)のプロトタイプです。 arpa-systems — ARPA Corp. https://arpacorp.net/arpa-systems ポジション: ベンチマークが壊れてしまいます - 自分の判断をしないでくださいhttps://digitalcommons.odu.edu/cgi/viewcontent.cgi?article=1384&context=computerscience_fac_pubs Everyone Is Judging AI by These Tests. But Experts Say They're Close to Meaningless https://themarkup.org/artificial-intelligence/2024/07/17/everyone-is-judging-ai-by-these-tests-but-experts-say-theyre-close-to-meaningless なぜ静的なベンチマークが失敗するのか - Revelry Labshttps://revelry.co/insights/artificial-intelligence/why-ai-benchmarks-fail/ AWS - アップデート 2025https://aws.amazon.com/what-is/overfitting/ What is Overfitting? 
| IBM https://www.ibm.com/think/topics/overfitting オリジナルタイトル: GeeksforGeekshttps://www.geeksforgeeks.org/machine-learning/underfitting-and-overfitting-in-machine-learning/ LLM Leaderboards are Bullshit - Goodhart's Law Strikes Again : r/LocalLLaMA - Reddithttps://www.reddit.com/r/LocalLLaMA/comments/1bjvjaf/llm_leaderboards_are_bullshit_goodharts_law/ Better Benchmarks for セキュリティ・クリティカル・AI アプリケーション Better Benchmarks for セキュリティ・クリティカル・AI アプリケーション Better Benchmarks for セキュリティ・クリティカル・AI アプリケーション Better Benchmarks for セキュリティ・クリティカル・AI アプリケーション Better Benchmarks for セキュリティ・クリティカル・AI アプリケーション Better Benchmarks for セキュリティ・クリティカル・AI アプリケーション トップページ > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア > ハードウェア The birth of American intelligence testing https://www.apa.org/monitor/2009/01/assessment あなたは知的知識を確実に測定しますか? Discover Magazinehttps://www.discovermagazine.com/do-iq-tests-actually-measure-intelligence-41674 インテリジェンス Under Racial Capitalism: From Eugenics to Standardized Testing and Online Learning - Monthly Reviewhttps://monthlyreview.org/articles/intelligence-under-racial-capitalism-from-eugenics-to-standardized-testing-and-online-learning/ NEA - National Education Associationhttps://www.nea.org/nea-today/all-news-articles/ racist-beginnings-standardized-testing dbuweb.dbu.dbu.edu/dbu/psyc1301/softchalk/s8lecture1/s8lecture111.html#:\~:text=IQテストも批判されています、学校や人生で。 Criticisms of IQ Tests https://dbuweb.dbu.edu/dbu/psyc1301/softchalk/s8lecture1/s8lecture111.html トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > IQテスト:種類、使用、制限 - Topend Sportshttps://www.topendsports.com/health/tests/iq.htm なぜ高IQはあなたがスマートであることを意味しないのか イェール・マネジメントスクールhttps://som.yale.edu/news/2009/11/why-high-iq-doesnt-mean-youre-smart BPS - British Psychological 
Societyhttps://www.bps.org.uk/psychologist/what-intelligence-tests-miss Standardized testing and IQ testing controversies | Research Starters - EBSCO https://www.ebsco.com/research-starters/education/standardized-testing-and-iq-testing-controversies medium.comhttps://medium.com/@kathln/navigating-the-complexities-understanding-the-limitations-of-iq-tests-a87bff3e9f13#:\~:text=多様な背景からの不利な個人の重要な制限。 文化的な偏見のIQテスト - (知的心理学) - Fiveablehttps://fiveable.me/key-terms/cognitive-psychology/cultural-bias-in-iq-tests fiveable.mehttps://fiveable.me/key-terms/cognitive-psychology/cultural-bias-in-iq-tests#:\~:text=When test items reflect the,align with their cultural context. Ability testing and bias | Research Starters - EBSCO https://www.ebsco.com/research-starters/sociology/ability-testing-and-bias Publication biaş Catalog of Bias - The Catalogue of Biashttps://catalogofbias.org/biases/publication-bias/ Publication bias - Importance of studies with negative results! - PMC https://pmc.ncbi.nlm.nih.gov/articles/PMC6573059/ 出版の偏見: 体系的な文学への隠れた脅威のレビュー Envision Pharma Grouphttps://www.envisionpharmagroup.com/news-events/publication-bias-hidden-threat-systematic-literature-reviews What Is Publication Bias? 
| Definition & Examples - Scribbr https://www.scribbr.com/research-bias/publication-bias/ Reporting bias in clinical trials: Progress toward transparency and next steps | PLOS Medicine - Research journals https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1003894 トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ Novartisは、承認後まで遺伝子療法データの操作についての通知を遅らせ、FDAはBmJhttps://www.bmj.com/content/366/bmj.l5109と述べています。 Novartis's Zolgensma: exploring the problem of manipulated datahttps://www.pharmaceutical-technology.com/features/manipulated-data-novartis-zolgensma/ FDAhttps://www.fda.gov/news-events/press-announcements/statement-data-accuracy-issues-recently-approved-gene-therapy - FDA Update: FDA Imposes No Penalties for Novartis Data Manipulation Scandal - Labiotech https://www.labiotech.eu/trends-news/novartis-zolgensma-avexis-fda/ HHS, FDA to Require Full Safety Disclosures in Drug Ads https://www.hhs.gov/press-room/hhs-fda-drug-ad-transparency.html テレビドラッグ広告, What You See Is Not Necessarily What You Gethttps://jheor.org/post/2674-with-tv-drug-ads-what-you-see-is-not-necessarily-what-you-get FDA Launches Crackdown on Deceptive Drug Advertising https://www.fda.gov/news-events/press-announcements/fda-launches-crackdown-deceptive-drug-advertising A Perilous Prescription: The Dangers of Unregulated Drug Ads https://publichealth.jhu.edu/2023/the-dangers-of-unregulated-drug-ads ガソリンガソリン - ウィキペディアhttps://en.wikipedia.org/wiki/Diesel_emissions_scandal トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ トップページ Volkswagen to Spend Up to $14.7 Billion to Settle Allegations of Cheating Emissions Tests and Deceiving Customers on 2.0 Liter Diesel Vehicles - Department of Justice https://www.justice.gov/archives/opa/pr/volkswagen-spend-147-billion-settle-allegations-cheating-emissions-tests-and-deceiving Toyota's Strategy to Overcome the Daihatsu Safety Scandal - 
Manufacturing Todayhttps://manufacturing-today.com/news/toyotas-strategy-to-overcome-the-daihatsu-safety-scandal/ Japanese carmaker that faked safety tests sees long wait to reopen factories - AP News https://apnews.com/article/safety-daihatsu-toyota-automakers-japan-cheating-906570a67a333947f87c8158229db88f Toyota, Honda and Mazda all cheated on their safety tests - Quartz https://qz.com/toyota-honda-mazda-suzuki-cheat-car-test-safety-scandal-1851515350 Vehicle Crash Tests: Do We Need a Better Group of Dummies? | U.S. GAO https://www.gao.gov/blog/vehicle-crash-tests-do-we-need-better-group-dummies トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > Inclusive Crash Test Dummies: Analyzing Reference Models - Gendered Innovationshttps://genderedinnovations.stanford.edu/case-studies/crash.html 車両の安全性: DOT は、事故試験 Dummies から得られた情報を改善するための追加の措置を取るべきです U.S. GAOhttps://www.gao.gov/products/gao-23-105595 The Auto Professor - 実際のデータに基づく新しい安全評価システム https://theautoprofessor.com/ クロス・テスト vs リアル・ワールド : r/cars - Reddithttps://www.reddit.com/r/cars/comments/jqn0jp/crash_tests_vs_real_world/ 自動車の安全性の評価 車両、車の座席、タイヤ - NHTSAhttps://www.nhtsa.gov/ratings Why We Don't Use Crash Test Ratings: Star Inflation - The Auto Professorhttps://theautoprofessor.com/what-is-star-inflation/ トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > トップ > GenAI vs specialised AI: Which is the right fit for your business? - Getronics https://www.getronics.com/types-of-ai-which-is-the-right-fit-for-your-business/ The Rise of Specialized AI Models - YouTube https://www.youtube.com/shorts/YWF_d-UDCDI 小さな言語モデル(SLMs)とは? 
実践的なガイド - Aiserahttps://aisera.com/blog/small-language-models/ Small Language Models (SLMs): Definition And Benefits - Born Digital https://borndigital.ai/small-language-models-slms-definition-and-benefits/ マイクロ言語モデルの優位性 マイクロ言語モデルの優位性 マイクロ言語モデル優位性 マイクロ言語モデル優位性 マイクロ言語モデル優位性 マイクロ言語モデル優位性 マイクロ言語モデル優位性 マイクロ言語モデル優位性 マイクロ言語モデル優位性 マイクロ言語モデル優位性 マイクロ言語モデル優位性 小さな言語モデル(SLM)とは? - IBMhttps://www.ibm.com/think/topics/small-language-models マイクロソフト クラウド ブログhttps://www.microsoft.com/en-us/microsoft-cloud/blog/2024/09/25/3-key-features-and-benefits-of-small-language-models/ イギリス ロジック システム - GitHubhttps://github.com/ARPAHLS GitHub - ARPAHLS/OPSIE: OPSIIE(OPSIE)は、AIと人間の相互作用における新しいパラダイムを表す高度な自己中心情報(SCI)のプロトタイプです:r/LocalLLaMA - Reddithttps://www.reddit.com/r/LocalLLaMA/comments/1nue9r4/github_arpahlsopsie_opsiie_opsie_is_an_advanced/ AI Agents: The Future of Human-like Automation - Beam AI https://beam.ai/ai-agents Build and Recruit Autonomous AI Agents - Relevance AIhttps://relevanceai.com/agents AI agentshttps://dust.tt/ で組織全体を加速する CustomGPT.ai | Custom GPTs From Your Content For Business https://customgpt.ai/ カスタム AI エージェント: 彼らは何であり、彼らはどのように働く - Intellectyxhttps://www.intellectyx.com/custom-ai-agents-what-they-are-how-they-work/ What Are AI Agents? | IBM https://www.ibm.com/think/topics/ai-agents AIの民主化がどのようにエンタープライズITに影響を与えるか - Intelliashttps://intellias.com/democratization-ai-impacts-enterprise-it/ Democratizing AI - IBM https://www.ibm.com/think/insights/democratizing-ai The Democratization of Artificial Intelligence: Theoretical Framework - MDPI https://www.mdpi.com/2076-3417/14/18/8236 The Democratization Of AI: Bridging The Gap Between Monopolization And Personal Empowerment - Forbeshttps://www.forbes.com/councils/forbestechcouncil/2024/03/25/the-democratization-of-ai-bridging-the-gap-between-monopolization-and-personal-empowerment/ What is AI Governance? 
| IBM https://www.ibm.com/think/topics/ai-governance Artificial intelligence in corporate governance - Virtus InterPress 2025, https://virtusinterpress.org/IMG/pdf/clgrv7i1p11.pdf イギリス イギリス イギリス イギリス イギリス イギリス イギリス イギリス イギリス イギリス https://blog.growthbook.io/the-benchmarks-are-lying/ https://fourweekmba.com/the-goodharts-law-trap-when-ai-metrics-become-useless/ https://en.wikipedia.org/wiki/Goodhart's_law https://www.reddit.com/r/ArtificialInteligence/comments/1n4x46r/the_ai_benchmarking_industry_is_broken_and_this/ https://m.economictimes.com/tech/artificial-intelligence/nasscom-planning-local-benchmarks-for-indic-ai-models/articleshow/124218208.cms https://github.com/ARPAHLS/OPSIE https://arpacorp.net/arpa-systems https://digitalcommons.odu.edu/cgi/viewcontent.cgi?article=1384&context=computerscience_fac_pubs https://themarkup.org/artificial-intelligence/2024/07/17/everyone-is-judging-ai-by-these-tests-but-experts-say-theyre-close-to-meaningless https://revelry.co/insights/artificial-intelligence/why-ai-benchmarks-fail/ https://aws.amazon.com/what-is/overfitting/ https://www.ibm.com/think/topics/overfitting https://www.geeksforgeeks.org/machine-learning/underfitting-and-overfitting-in-machine-learning/ https://www.reddit.com/r/LocalLLaMA/comments/1bjvjaf/llm_leaderboards_are_bullshit_goodharts_law/ https://hai.stanford.edu/news/better-benchmarks-for-safety-critical-ai-applications https://socialsci.libretexts.org/Bookshelves/Disability_Studies/Introducing_Developmental_Disability_Through_a_Disability_Studies_Perspective_(Brooks_and_Bates)/02%3A_Developmental_Disability_as_a_Social_Construct/2.03%3A_IQ_as_Eugenics https://www.apa.org/monitor/2009/01/assessment https://www.discovermagazine.com/do-iq-tests-actually-measure-intelligence-41674 https://monthlyreview.org/articles/intelligence-under-racial-capitalism-from-eugenics-to-standardized-testing-and-online-learning/ https://www.nea.org/nea-today/all-news-articles/racist-beginnings-standardized-testing 
dbuweb.dbu.edu https://dbuweb.dbu.edu/dbu/psyc1301/softchalk/s8lecture1/s8lecture111.html#:\~:text=IQ tests are also criticized,in school and in life. https://dbuweb.dbu.edu/dbu/psyc1301/softchalk/s8lecture1/s8lecture111.html https://ectutoring.com/problem-with-iq-tests https://www.topendsports.com/health/tests/iq.htm https://som.yale.edu/news/2009/11/why-high-iq-doesnt-mean-youre-smart https://www.bps.org.uk/psychologist/what-intelligence-tests-miss https://www.ebsco.com/research-starters/education/standardized-testing-and-iq-testing-controversies medium.com https://medium.com/@kathln/navigating-the-complexities-understanding-the-limitations-of-iq-tests-a87bff3e9f13#:\~:text=A significant limitation of many,disadvantaging individuals from diverse backgrounds. https://fiveable.me/key-terms/cognitive-psychology/cultural-bias-in-iq-tests 5 わたし https://fiveable.me/key-terms/cognitive-psychology/cultural-bias-in-iq-tests#:\~:text=When test items reflect the,align with their cultural context. 
https://www.ebsco.com/research-starters/sociology/ability-testing-and-bias
https://catalogofbias.org/biases/publication-bias/
https://pmc.ncbi.nlm.nih.gov/articles/PMC6573059/
https://www.envisionpharmagroup.com/news-events/publication-bias-hidden-threat-systematic-literature-reviews
https://www.scribbr.com/research-bias/publication-bias/
https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1003894
https://www.grassley.senate.gov/news/news-releases/grassley-pressures-drug-manufacturer-over-data-manipulation
https://www.bmj.com/content/366/bmj.l5109
https://www.pharmaceutical-technology.com/features/manipulated-data-novartis-zolgensma/
https://www.fda.gov/news-events/press-announcements/statement-data-accuracy-issues-recently-approved-gene-therapy
https://www.labiotech.eu/trends-news/novartis-zolgensma-avexis-fda/
https://www.hhs.gov/press-room/hhs-fda-drug-ad-transparency.html
https://jheor.org/post/2674-with-tv-drug-ads-what-you-see-is-not-necessarily-what-you-get
https://www.fda.gov/news-events/press-announcements/fda-launches-crackdown-deceptive-drug-advertising
https://publichealth.jhu.edu/2023/the-dangers-of-unregulated-drug-ads
https://en.wikipedia.org/wiki/Diesel_emissions_scandal
https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal
https://www.justice.gov/archives/opa/pr/volkswagen-spend-147-billion-settle-allegations-cheating-emissions-tests-and-deceiving
https://manufacturing-today.com/news/toyotas-strategy-to-overcome-the-daihatsu-safety-scandal/
https://apnews.com/article/safety-daihatsu-toyota-automakers-japan-cheating-906570a67a333947f87c8158229db88f
https://qz.com/toyota-honda-mazda-suzuki-cheat-car-test-safety-scandal-1851515350
https://www.gao.gov/blog/vehicle-crash-tests-do-we-need-better-group-dummies
https://www.farrin.com/blog/no-female-crash-test-dummies-women-at-a-greater-risk-for-injury-or-death/
https://genderedinnovations.stanford.edu/case-studies/crash.html
https://www.gao.gov/products/gao-23-105595
https://theautoprofessor.com/
https://www.reddit.com/r/cars/comments/jqn0jp/crash_tests_vs_real_world/
https://www.nhtsa.gov/ratings
https://theautoprofessor.com/what-is-star-inflation/
https://www.uipath.com/ai/specialized-ai
https://www.getronics.com/types-of-ai-which-is-the-right-fit-for-your-business/
https://www.youtube.com/shorts/YWF_d-UDCDI
https://aisera.com/blog/small-language-models/
https://borndigital.ai/small-language-models-slms-definition-and-benefits/
https://medium.com/@eastgate/advantages-of-small-language-models-over-large-language-models-a52deb47d50b
https://www.ibm.com/think/topics/small-language-models
https://www.microsoft.com/en-us/microsoft-cloud/blog/2024/09/25/3-key-features-and-benefits-of-small-language-models/
https://github.com/ARPAHLS
https://www.reddit.com/r/LocalLLaMA/comments/1nue9r4/github_arpahlsopsie_opsiie_opsie_is_an_advanced/
https://beam.ai/ai-agents
https://relevanceai.com/agents
https://dust.tt/
https://customgpt.ai/
https://www.intellectyx.com/custom-ai-agents-what-they-are-how-they-work/
https://www.ibm.com/think/topics/ai-agents
https://intellias.com/democratization-ai-impacts-enterprise-it/
https://www.ibm.com/think/insights/democratizing-ai
https://www.mdpi.com/2076-3417/14/18/8236
https://www.forbes.com/councils/forbestechcouncil/2024/03/25/the-democratization-of-ai-bridging-the-gap-between-monopolization-and-personal-empowerment/
https://www.ibm.com/think/topics/ai-governance
https://virtusinterpress.org/IMG/pdf/clgrv7i1p11.pdf
https://www.nacdonline.org/all-governance/governance-resources/governance-research/outlook-and-challenges/2025-governance-outlook/tuning-corporate-governance-for-ai-adoption/