Can LLMs Generate Quality Code? A 40,000-Line Experiment

Executive Summary

I spent about four weeks part-time (maybe 80 hours in total) building a complete reactive UI framework with more than 40 components, a router, and a supporting interactive website, using only LLM-generated code. The takeaway: LLMs can produce quality code, but like human developers, they need the right guidance.

Key Findings

On Code Quality:
- Well-specified tasks yield clean first-pass code.
- Vague or underspecified requirements produce sloppy implementations.
- Code degrades over time without intentional refactoring.
- LLMs defensively over-engineer when asked to improve reliability.

On The Development Process:
- It is hard to be "well specified" when a task is large.
- Extended reasoning ("thinking") leads to better results, though it sometimes leads to circular or overly expansive logic.
- Multiple LLM perspectives (switching models) provide valuable architecture review and debugging help.
- Using a structured framework such as Bau.js or Lightview prevents slop better than unconstrained development.
- Formal metrics objectively identify code complexity and guide its removal.

The bottom line: in many ways, LLMs behave like the average of the humans who trained them. They make similar mistakes, but faster and at greater scale. So you can get six months of maintenance and enhancement "slop" six minutes after you generate a clean initial code base and then ask for changes.

The Challenge

Four weeks ago, I set out to answer a question that has been hotly debated in the development community: can LLMs generate production-quality code?

Not a toy application. Not a simple CRUD app. A complete, modern reactive UI framework with lots of pre-built components, a router, and a supporting website with:
- Thousands of lines of JavaScript, CSS, and HTML
- Memory, performance, and security considerations
- UX and developer experience

I chose to build Lightview (at lightview.dev), a reactive UI framework combining the best features of Bau.js, HTMX, and Juris.js. The constraint: 100% LLM-generated code, using Anthropic's Claude (Opus 4.5, Sonnet 4.5) and Google's Gemini 3 Pro (Flash had not been released when I started).

Starting With Questions, Not Code

I started with Claude Opus:

"I want to create a reactive UI library that combines HTMX, Bau, and Juris. For hypermedia, I prefer no special attribute names, just enhanced behavior. I also want string literal processing in HTML, an SPA router, a UI component library with automatic custom element creation, SEO-enabled apps with no extra work, and a website to promote and educate users. Do you have any questions?"

Claude didn't dive into code right away. It asked dozens of clarifying questions:
- TypeScript or vanilla JavaScript?
- Which UI component library for styling? (It provided options with pros and cons.)
- Which HTMX features specifically?
- Hosting preferences?
- Routing strategy?

However, at times it started to write code before I thought it was ready, and I had to abort the response and redirect it.

Finding: LLMs have a strong bias toward premature code generation. Even when reminded not to code yet, they forget after a few interactions. This happened with all models: Claude, Gemini, and GPT. They seem especially triggered to start generating when given example code, even when the examples accompany a question rather than an implementation request.

Guidance: If an LLM starts generating code before you're ready, cancel the completion immediately and redirect: "Don't generate code yet. Do you have more questions?" You may need to repeat this several times as the LLM "forgets" and drifts back toward code generation. The Planning vs. Fast mode toggle in Antigravity, or similar modes in other IDEs, should help with this, but it's inconvenient to use repeatedly. A better solution would be: if the user asks a question, the LLM should assume they want a discussion, not code. Only generate or modify code when explicitly requested or after asking permission.

After an hour of back-and-forth, Claude finally said: "No more questions. Would you like me to generate an implementation plan since there will be many steps?"

The resulting plan was comprehensive: a detailed Markdown file with checkboxes, design decisions, and considerations for:
- The core reactive library
- 40+ UI components
- The routing system
- A website ... though the website received less attention, a gap I would address later.

I did not change this plan in any substantial way, apart from clarifications on the website items and the addition of one major feature, declarative events, at the end of development.

The Build Begins

With the plan in place, I hit my token limit on Opus. No problem: I switched to Gemini 3 (High), which had full context from the conversation plus the plan file. Within minutes, Gemini generated lightview.js, the core reactivity engine, along with two example files: a "Hello, World!" demo showing both Bau-like syntax and vDOM-like syntax.

Then I made a mistake. "Build the website as an SPA," I said, without specifying that it should use Lightview itself. I left for lunch. When I came back, a beautiful website was running in my browser. I looked at the code and my heart sank: React with Tailwind CSS.

Finding: LLMs will use the most common/popular solution if you don't specify otherwise. React + Tailwind is an extremely common pattern for SPAs. Without explicit guidance to use Lightview, the very framework I'd just built, the LLM defaulted to what it had seen most often in training data.

Worse, when I asked it to rebuild with Lightview, I forgot to say "delete the existing site first." So it processed and modified all 50 files one by one, burning tokens at an alarming rate.

Guidance: When asking an LLM to redo work, be explicit about the approach:
- Delete the existing site and rebuild from scratch using Lightview
vs.
- Modify the existing site to use Lightview
The first is often more efficient for large changes. The second is better for targeted fixes. The LLM will not choose the efficient path automatically; you need to direct it.

The Tailwind Surprise

One issue caught me off guard. After Claude generated the website using Lightview components, I noticed it was still full of Tailwind CSS classes. I asked Claude about this.

"Well," Claude effectively explained, "you chose DaisyUI for the UI components, and DaisyUI requires Tailwind as a dependency. I assumed you would be okay with Tailwind being used throughout the site."

Fair point, but I wasn't okay with it. I prefer semantic CSS classes and wanted the site to use classic CSS approaches.

Finding: LLMs make reasonable but sometimes unwanted inferences. When you specify one technology that has dependencies, LLMs will extend that choice to related parts of the project. They are being logical, but they can't read your mind about preferences.

Guidance: Be explicit about what you don't want, not just what you do want, e.g. "I want DaisyUI components, but only use Tailwind for them, not elsewhere." If you have strong preferences about architectural approaches, state them upfront.

I asked Claude to rewrite the site using classic CSS and semantic classes. I liked the design and did not want to delete the files, so once again I suffered through a refactor that consumed a lot of tokens because it touched so many files. I again ran out of tokens and tried GPT-OSS, but ran into errors and had to switch to another IDE to keep working.

Guidance: When one LLM struggles with your codebase, switch back to one that was previously successful. Different models have different "understanding" of your project context. And if you are using Antigravity when you run out of tokens, you can switch to MS Visual Studio Code in the same directory and use a light GitHub Copilot account with Claude. Antigravity is based on Visual Studio Code, so it works in a very similar manner.

The Iterative Dance

Over the next few weeks, as I worked to build out the website and test and iterate on components, I moved between multiple LLMs as token limits reset: Claude, Gemini, back to Claude. Each brought different strengths and weaknesses:
- Claude excelled at architectural questions and generated clean website code with Lightview components.
- Gemini Pro consistently tried to use local tools and shell helper scripts to support its own work, which was valuable for speed and token efficiency. However, it sometimes failed with catastrophic results: many files zeroed out or corrupted, with no option but to roll back.

Switching perspectives proved powerful: "You are a different LLM. What are your thoughts?" often led to breakthrough insights or a rapid fix for a bug that one LLM had been spinning on.

I found the real winner to be Gemini Flash. It did an amazing job of refactoring code without introducing syntax errors and needed minimal guidance on what code to put where. Sometimes I was skeptical of a change and would say so. Sometimes Flash would agree and adjust; other times it would give a sound justification for its choice. And talk about fast ... wow!

The Router Evolution

The router also needed work. Claude initially implemented a hash-based router (#/about, #/docs, etc.). This is entirely appropriate for an SPA: it's simple, reliable, and requires no server configuration.

But I had additional requirements I hadn't clearly stated: I wanted conventional paths (/about, /docs) for deep linking and SEO. Search engines can handle hash routes now, but path-based routing is still cleaner for indexing and sharing.

Finding: LLMs will sometimes default to the simplest valid solution. Hash-based routing is easy to implement and works without server-side configuration. Since I didn't say I wanted path-based routing, the LLM chose the simpler approach.

When I told Claude I needed conventional paths for SEO and deep linking, it very rapidly rewrote the router and came up with what I consider a clever solution: a hybrid approach that makes the SPA pages both deep-linkable and SEO-indexable without the complexity of server-side rendering. However, it did leave some of the original code in place, which obscured what was going on and was entirely unnecessary.

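To make the two routing styles discussed above concrete, here is a minimal sketch of how each one derives the current route. It is not Lightview's router or the hybrid approach Claude produced; the route table and every function name are hypothetical, and the only browser APIs assumed are the standard History API (`pushState`, `popstate`) and `Element.closest`.

```javascript
// Hypothetical route table: path -> page name (names are illustrative only).
const routes = new Map([
  ['/', 'home'],
  ['/about', 'about'],
  ['/docs', 'docs'],
]);

// Hash-based routing reads the URL fragment: "#/about" -> "/about".
function pathFromHash(hash) {
  const path = hash.replace(/^#/, '');
  return path === '' ? '/' : path;
}

// Path-based routing reads the pathname: "/about/" -> "/about".
function pathFromPathname(pathname) {
  const trimmed = pathname.replace(/\/+$/, '');
  return trimmed === '' ? '/' : trimmed;
}

// Both styles can share one lookup once the path is normalized.
function resolve(path) {
  return routes.get(path) ?? 'not-found';
}

// Browser-only wiring for the path-based style, using the History API.
function startPathRouter(render) {
  const renderCurrent = () =>
    render(resolve(pathFromPathname(window.location.pathname)));

  // Back/forward buttons fire popstate; pushState itself does not.
  window.addEventListener('popstate', renderCurrent);

  // Intercept clicks on root-relative links so navigation stays in the page.
  document.addEventListener('click', (event) => {
    const link = event.target.closest('a[href^="/"]');
    if (!link) return;
    event.preventDefault();
    window.history.pushState({}, '', link.getAttribute('href'));
    renderCurrent();
  });

  renderCurrent();
}

console.log(resolve(pathFromHash('#/about')));    // about
console.log(resolve(pathFromPathname('/docs/'))); // docs
```

The practical difference is on the server side: a hash route never reaches the server, while a path route generally needs the host to return the application page for every path the router knows about.
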
I had to tell it to remove this code, which supported the vestiges of hash-based routes. This kind of code retention is the sort of thing that can lead to slop. I suppose many people would blame the LLM, but if I had been clear to start with and had also said "completely rewrite," my guess is the vestiges would not have existed.

Guidance: For architectural patterns, be explicit about your requirements early. Don't assume the LLM knows you want the more complex but SEO-friendly approach. Specify: "I want path-based routing with the History API for SEO" rather than just "I want routing."

Guidance: I also found that LLMs defensively try to preserve compatibility with previous versions, which can lead to overly complex code. If you are writing from scratch, you need to remind them that backward compatibility is not needed.

Confronting The Numbers

The Final Tally

Project Size:
- 60 JavaScript files, 78 HTML files, 5 CSS files
- 41,405 total lines of code (including comments and blanks)
- Over 40 custom UI components
- 70+ website files

At this point the files seemed reasonable, not overly complex. But my intuition and my feelings about code after more than 40 years of software development were not enough. I decided to run formal metrics on the core files.

Core Libraries:

| File | Lines | Minified Size |
|---|---|---|
| lightview.js | 603 | 7.75K |
| lightview-x.js | 1,251 | 20.2K |
| lightview-router.js | 182 | 3K |

The website's component gallery scores well on Lighthouse for performance without having had super focused optimization.

But then came the complexity metrics.

The Slop Revealed

I asked Gemini Flash to evaluate the code using three formal metrics:

1. Maintainability Index (MI): A combined metric where 0 is unmaintainable and 100 is perfectly documented, clean code. The calculation considers:
- Halstead Volume (a measure of code size and complexity)
- Cyclomatic Complexity
- Lines of code
- Comment density
Scores above 65 are considered healthy for library code. This metric gives you a single number to track code health over time.

2. Cyclomatic Complexity: An older but still valuable metric that measures the number of independent paths through the code. High cyclomatic complexity means:
- More potential bugs
- Harder to test thoroughly (the metric can actually tell you how many tests you may need to write)
- More cognitive load to understand

3. Cognitive Complexity: A modern metric that measures the mental effort a human needs to understand code. Unlike cyclomatic complexity (which treats all control flow equally), cognitive complexity penalizes:
- Nested conditionals and loops (deeper nesting = higher penalty)
- Boolean operator chains
- Recursion and breaks in linear flow

Thresholds:
- 0-15: Clean Code - easy to understand and maintain
- 16-25: High Friction - refactoring suggested to reduce technical debt
- 26+: Critical - immediate attention needed, maintenance nightmare

Finding: LLMs excel at creating analysis tools. Gemini Flash initially searched for an existing metrics library, couldn't find one, then wrote its own complete analyzer (metrics-analysis.js) using the Acorn JavaScript parser, without asking permission. This is both impressive and occasionally problematic. I cover the problem with this case later.

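As a rough illustration of why nesting drives the cognitive score, the sketch below scores a toy syntax tree and maps the result onto the thresholds listed above. It is a deliberate simplification and not the analyzer Flash wrote: the node shape, the `CONTROL_TYPES` set, and both function names are hypothetical, and Boolean chains and recursion are ignored entirely.

```javascript
// Hypothetical, simplified node shape: { type: string, children?: node[] }.
const CONTROL_TYPES = new Set(['if', 'for', 'while', 'switch', 'catch']);

// Each control structure costs 1, plus 1 for every control structure it sits inside.
function cognitiveScore(node, nesting = 0) {
  const isControl = CONTROL_TYPES.has(node.type);
  const own = isControl ? 1 + nesting : 0;
  const childNesting = isControl ? nesting + 1 : nesting;
  const children = node.children || [];
  return children.reduce(
    (sum, child) => sum + cognitiveScore(child, childNesting),
    own
  );
}

// Bands taken from the thresholds in the article.
function classify(score) {
  if (score <= 15) return 'Clean Code';
  if (score <= 25) return 'High Friction';
  return 'Critical';
}

// Three sequential ifs: the same number of branches as the nested case below.
const flat = {
  type: 'function',
  children: [{ type: 'if' }, { type: 'if' }, { type: 'if' }],
};

// if -> for -> if: each level pays for the levels above it.
const nested = {
  type: 'function',
  children: [
    { type: 'if', children: [{ type: 'for', children: [{ type: 'if' }] }] },
  ],
};

console.log(cognitiveScore(flat));   // 3
console.log(cognitiveScore(nested)); // 6
console.log(classify(35));           // Critical
```

Both toy functions contain three decision points, so a purely path-counting metric would treat them alike; only the nesting-aware score separates them.
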
The Verdict

Overall health looked good:

| File | Functions | Avg Maintainability | Avg Cognitive | Status |
|---|---|---|---|---|
| lightview.js | 58 | 65.5 | 3.3 | ⚖️ Good |
| lightview-x.js | 93 | 66.5 | 3.6 | ⚖️ Good |
| lightview-router.js | 27 | 68.6 | 2.1 | ⚖️ Good |

But drilling into individual functions told a different story. Two functions hit "Critical" status:

handleSrcAttribute (lightview-x.js):
- Cognitive Complexity: 35 🛑
- Cyclomatic Complexity: 22
- Maintainability Index: 33.9

Anonymous Template Processing (lightview-x.js):
- Cognitive Complexity: 31 🛑
- Cyclomatic Complexity: 13

This was slop. Technical debt waiting to become maintenance nightmares.

Can AI Fix Its Own Slop?

Here's where it gets interesting. The code was generated by Claude Opus, Claude Sonnet, and Gemini 3 Pro several weeks earlier. Could the newly released Gemini 3 Flash clean it up?

I asked Flash to refactor handleSrcAttribute to address its complexity. This seemed to take a little longer than necessary, so I aborted and spent some time reviewing its thinking process. There were obvious places where it got side-tracked or even went in circles, but I told it to continue. After it completed, I manually inspected the code and thoroughly tested all website areas that use this feature. No bugs found.

Critical Discovery #2: Gemini Flash "thinks" extensively. While reviewing all its thought processes would be tedious, important insights flash by in the IDE. When an LLM seems stuck in a loop, abort, review its historical thoughts for possible sidetracks, and tell it to continue or redirect as needed.

After the fixes to handleSrcAttribute, I asked for revised statistics to see the improvement.

Flash's Disappearing Act

Unfortunately, Gemini Flash had deleted its metrics-analysis.js file! It had to recreate the entire analyzer.

Finding: Gemini Flash aggressively deletes temporary files. After Flash uses a script or analysis tool it created, it often deletes the file, assuming it is temporary. This happens even for files that took significant effort to create and that you might want to keep or reuse.

Guidance: Tell Gemini to put utility scripts in a specific directory (like /home/claude/tools/ or /home/claude/scripts/) and explicitly ask it to keep them. You can say: "Create this in /home/claude/tools/ and keep it for future use." Otherwise, you'll find yourself regenerating the same utilities multiple times.

The Dev Dependencies Problem

When I told Gemini to keep the metrics scripts permanently, another issue surfaced: it failed to officially install dev dependencies like acorn (the JavaScript parser). Flash simply assumed that because it found packages in node_modules, it could safely use them. The only reason acorn was available was that I had already installed a Markdown parser that depended on it.

Finding: LLMs don't always manage dependencies properly. They'll use whatever's available in node_modules without verifying it's officially declared in package.json. This creates fragile builds that break on fresh installs.

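One way to catch this kind of undeclared dependency is to compare the packages a script imports with what package.json declares. The sketch below does that with a rough text scan; it is an assumption-laden illustration, not a tool from the project. The function names, the short built-in list, and the `some-markdown-parser` package are all hypothetical, and a regex scan can be fooled by import-like text inside strings or comments.

```javascript
// Matches require('pkg'), import ... from 'pkg', and import 'pkg'.
const SPECIFIER_PATTERN = /(?:require\(\s*|from\s+|import\s+)['"]([^'"]+)['"]/g;

// "acorn/dist/x" -> "acorn"; "@scope/name/sub" -> "@scope/name".
function packageNameOf(specifier) {
  const parts = specifier.split('/');
  return specifier.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0];
}

// builtins is a deliberately partial list of Node core modules; extend as needed.
function findUndeclaredDependencies(
  sourceText,
  packageJson,
  builtins = ['fs', 'path', 'url', 'util', 'os']
) {
  const declared = new Set([
    ...Object.keys(packageJson.dependencies || {}),
    ...Object.keys(packageJson.devDependencies || {}),
  ]);
  const missing = new Set();
  for (const match of sourceText.matchAll(SPECIFIER_PATTERN)) {
    const specifier = match[1];
    // Local files and node:-prefixed core modules are not npm packages.
    if (
      specifier.startsWith('.') ||
      specifier.startsWith('/') ||
      specifier.startsWith('node:')
    ) {
      continue;
    }
    const name = packageNameOf(specifier);
    if (builtins.includes(name)) continue;
    if (!declared.has(name)) missing.add(name);
  }
  return [...missing].sort();
}

// Hypothetical script text and manifest, for illustration only.
const script = `
  const acorn = require('acorn');
  import { readFileSync } from 'node:fs';
  import { helper } from './helper.js';
  import { parse } from 'some-markdown-parser';
`;
const manifest = {
  dependencies: { 'some-markdown-parser': '^1.0.0' },
  devDependencies: {},
};

console.log(findUndeclaredDependencies(script, manifest)); // [ 'acorn' ]
```

Here acorn is flagged because it only reaches node_modules indirectly, which mirrors the situation described above.
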
Guidance: After an LLM creates utility scripts that use external packages, explicitly ask: "Did you add all required dependencies to package.json? Please verify and install any that are missing." Better yet, review the script's imports and cross-check against your declared dependencies yourself.

The Refactoring Results

With the analyzer recreated, Flash showed how it had decomposed the monolithic function into focused helpers:
- fetchContent (cognitive: 5)
- parseElements (cognitive: 5)
- updateTargetContent (cognitive: 7)
- elementsFromSelector (cognitive: 2)
- handleSrcAttribute orchestrator (cognitive: 10)

The Results:

| Metric | Before | After | Improvement |
|---|---|---|---|
| Cognitive Complexity | 35 🛑 | 10 ✅ | -71% |
| Cyclomatic Complexity | 22 | 7 | -68% |
| Status | Critical Slop | Clean Code | - |

Manual inspection and thorough website testing revealed zero bugs. The cost? A 0.5K increase in file size, which is negligible.

Emboldened, I tackled the template processing logic.
Since it spanned multiple functions, this required more extensive refactoring:

Extracted Functions:
- collectNodesFromMutations - iteration logic
- processAddedNode - scanning logic
- transformTextNode - template interpolation for text
- transformElementNode - attribute interpolation and recursion

Results:

| Function Group | Previous Max | New Max | Status |
|---|---|---|---|
| MutationObserver Logic | 31 🛑 | 6 ✅ | Clean |
| domToElements Logic | 12 ⚠️ | 6 ✅ | Clean |

Final Library Metrics

After refactoring, lightview-x.js improved significantly:
- Functions: 93 → 103 (better decomposition)
- Avg Maintainability: 66.5 → 66.8
- Avg Cognitive: 3.6 → 3.2

All critical slop eliminated. The increased function count reflects healthier modularity: complex logic delegated to specialized, low-complexity helpers. In fact, it is as good as or better than established frameworks from a metrics perspective:

| File | Functions | Maintainability (min/avg/max) | Cognitive (min/avg/max) | Status |
|---|---|---|---|---|
| lightview.js | 58 | 7.2 / 65.5 / 92.9 | 0 / 3.4 / 25 | ⚖️ Good |
| lightview-x.js | 103 | 0.0 / 66.8 / 93.5 | 0 / 3.2 / 23 | ⚖️ Good |
| lightview-router.js | 27 | 24.8 / 68.6 / 93.5 | 0 / 2.1 / 19 | ⚖️ Good |
| react.development.js | 109 | 0.0 / 65.2 / 91.5 | 0 / 2.2 / 33 | ⚖️ Good |
| bau.js | 79 | 11.2 / 71.3 / 92.9 | 0 / 1.5 / 20 | ⚖️ Good |
| htmx.js | 335 | 0.0 / 65.3 / 92.9 | 0 / 3.4 / 116 | ⚖️ Good |
| juris.js | 360 | 21.2 / 70.1 / 96.5 | 0 / 2.6 / 51 | ⚖️ Good |

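The min/avg/max columns matter because a healthy average can hide a single critical function, which is what happened with handleSrcAttribute earlier. The sketch below shows that effect on made-up numbers: every function name and score is hypothetical, and `summarize` and `refactoringQueue` are illustrative helpers rather than part of the article's analyzer.

```javascript
// Hypothetical per-function cognitive complexity scores for one file.
const scores = [
  { name: 'alpha', cognitive: 1 },
  { name: 'beta', cognitive: 2 },
  { name: 'gamma', cognitive: 0 },
  { name: 'delta', cognitive: 3 },
  { name: 'epsilon', cognitive: 35 },
  { name: 'zeta', cognitive: 18 },
  { name: 'eta', cognitive: 1 },
  { name: 'theta', cognitive: 4 },
];

// Per-file summary in the same min/avg/max shape as the table (avg to 1 decimal).
function summarize(values) {
  const total = values.reduce((sum, value) => sum + value, 0);
  return {
    min: Math.min(...values),
    avg: Math.round((total / values.length) * 10) / 10,
    max: Math.max(...values),
  };
}

// Functions above the threshold, worst first, so Critical ones are handled
// before High Friction ones.
function refactoringQueue(functions, threshold = 15) {
  return functions
    .filter((fn) => fn.cognitive > threshold)
    .sort((a, b) => b.cognitive - a.cognitive)
    .map((fn) => fn.name);
}

console.log(summarize(scores.map((fn) => fn.cognitive))); // { min: 0, avg: 8, max: 35 }
console.log(refactoringQueue(scores)); // [ 'epsilon', 'zeta' ]
```

The average of 8 sits comfortably in the clean band even though one function is at 35, so the per-function queue is the more useful output when deciding what to refactor.
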
1. LLMs Mirror Human Behavior, for Better and Worse

LLMs exhibit the same tendencies as average developers:
- Rush to code without fully understanding
- Don't admit defeat or ask for help soon enough
- Generate defensive, over-engineered solutions when asked to improve reliability
- Produce cleaner code with structure and frameworks

The difference? They do it faster and at greater volume. They can generate mountains of slop in hours that would take humans weeks.

2. Thinking Helps

Extended reasoning (visible in "thinking" modes) shows alternatives, self-corrections, and occasional "oh, but" moments. The thinking is usually productive and sometimes golden. Don't walk away or do something else while a task you believe is complex or critical is running. LLMs rarely say "I give up" or "Please give me guidance"; I wish they would more often. Watch the flow of thoughts and abort the request if necessary. Read the thinking and redirect, or just say continue; you will learn a lot.

3. Multiple Perspectives Are Powerful

When I told a second LLM, "You are a different LLM reviewing this code. What are your thoughts?", magic happened.

Finding: LLMs are remarkably non-defensive. When faced with an implementation that's critiqued as too abstract, insufficiently abstract, or inefficient, leading LLMs (Claude, Gemini, GPT) won't argue. They'll do a rapid, thorough analysis and return with honest pros/cons of current versus alternative approaches.

This behavior is actually beyond what most humans provide:
- How many human developers give rapid, detailed feedback without any defensive behavior?
- How many companies have experienced architects who can be questioned by any developer at any time?
- How many code review conversations happen without ego getting involved?

Guidance: Before OR after making changes, switch LLMs deliberately:
1. Make progress with one LLM (e.g., Claude builds a feature).
2. Switch to another (e.g., Gemini) and say: "You are a different LLM reviewing this implementation. What are your thoughts on the architecture, potential issues, and alternative approaches?"
3. Then switch back to the first and ask what it thinks now!
This is especially valuable before committing to major architectural decisions or after implementing complex algorithms. The second opinion costs just a few tokens but can save hours of refactoring later.

4. Structure Prevents Slop

Finding: Telling an LLM to use "vanilla JavaScript" without constraints invites slop. Vanilla JavaScript is a wonderful but inherently loose language through which a sometimes sloppy or inconsistent browser API is exposed. Without constraints, it's easy to create unmaintainable code, for both humans and LLMs. Specifying a framework (Bau.js, React, Vue, Svelte, etc.) provides guardrails that lead to cleaner, more maintainable code.

Guidance: When starting a project, based on what you want to accomplish, ask for advice on:
- The framework/library to use (React, Vue, Svelte, etc.)
- The architectural pattern (MVC, MVVM, component-based, etc.)
- Code organization preferences (feature-based vs. layer-based folders)
- Naming conventions
- Whether to use TypeScript or JSDoc for type safety
- Other libraries to use ... to prevent re-invention

Don't say: "Build me a web app in JavaScript"
Do say: "Build me a React application using functional components, hooks, TypeScript, and feature-based folder organization. Follow Airbnb style guide for naming."

The more structure you provide upfront, the less slop you'll get. This applies to all languages, not just JavaScript.

5. Metrics Provide Objective Truth

Finding: Formal software metrics can guide development objectively. I love that formal software metrics can guide LLM development. They are often considered too dull, mechanical, difficult, or costly to obtain for human development. But in an LLM-enhanced IDE, with an LLM that can write code to do formal source analysis (no IDE plugin subscription required), they deserve far more attention than they get. They're perfect for:
- Identifying technical debt automatically
- Tracking code health over time
- Guiding refactoring priorities
- Validating that "improvements" actually improve things

Metrics don't lie. They identified the slop that my intuition missed.

Guidance: Integrate metrics into your LLM workflow:
1. After initial implementation: Run complexity metrics on all files. Flag functions with cognitive complexity > 15 or cyclomatic complexity > 10.
2. Prioritize refactoring: Address Critical (cognitive 26+) functions first, then "High Friction" (16-25) functions.
3. Request targeted refactoring: Don't just say 'improve this'. Say 'Refactor handleSrcAttribute to reduce cognitive complexity to target range'.
4. Verify improvements: After refactoring, re-run metrics. Ensure complexity actually decreased and maintainability increased. Sometimes 'improvements' just shuffle complexity around.
5. Set quality gates: Before marking code as done, try to have all functions in the Clean Code range for cognitive complexity and above 65 on the Maintainability Index.

The Verdict

After 40,000 lines of LLM-generated code, I'm cautiously optimistic.
Yes, LLMs can generate quality code. But like human developers, they need:
- Clear, detailed specifications
- Structural constraints (frameworks, patterns)
- Regular refactoring guidance
- Objective quality measurements
- Multiple perspectives on architectural decisions

The criticism that LLMs generate slop isn't wrong, but it's incomplete. They generate slop for the same reasons humans do: unclear requirements, insufficient structure, and lack of quality enforcement.

The difference is iteration speed. What might take a human team months to build and refactor, LLMs can accomplish in hours. The cleanup work remains, but the initial generation accelerates dramatically.

Looking Forward

I'm skeptical that most humans will tolerate the time required to be clear and specific with LLMs, just as they don't today when product managers or developers push for detailed requirements from business staff. The desire to "vibe code" and iterate will persist.

But here's what has changed: the feedback loop has compressed from weeks to hours. We can now iterate and clean up faster when requirements evolve or prove insufficient.

As coding environments evolve to wrap LLMs in better structure (automated metrics, enforced patterns, multi-model reviews), the quality will improve. We're not there yet, but the foundation is promising.

The real question isn't whether LLMs can generate quality code. It's whether we can provide them, and ourselves, with the discipline to do so consistently.

And I have a final concern: if LLMs are based on history and have a tendency to stick with what they know, then how are we going to evolve the definition and use of things like UI libraries? Are we forever stuck with React unless we ask for something different? Or are libraries an anachronism? Will LLMs and image or video models soon just generate the required image of a user interface with no underlying code?

Given its late entry into the game and the anchoring LLMs already have, I don’t hold high hopes for the adoption of Lightview, but it was an interesting experiment. You can visit the project at: https://lightview.dev