Kwiimeko "Big Tech" (ngokuyazi, iintlobo kunye namasebenzisi ezininzi, iinkcukacha ezininzi kunye nezidingo ezintsha ngokukhawuleza), ukhuseleko database Iingxaki yokuvimbela idatha ezimbini-ukuba kuxhomekeke into efana ne-financial reconciliation apho yonke i-penny kufuneka efanelekileyo-ukungabonakali, ingaba akuyona efanelekileyo njengoko ucinga. Plus, i-cost ofining them can be surprisingly high. Umgangatho engcono ngokuvamile ukulawula i-mass of deduplication logic kwi-application layer. Ukuba unako ukunceda ukusetyenziswa kwe-database unique index, ukhangela ukwenza oku, okanye okungenani ukhangela ngokugqithisileyo ngaphambi kokuphumelela enye. UNIQUE INDEX 1. Yintoni ndiqala ukufikelela kwi-indices eyenziwe ngexabiso? Ngenxa yokuba ndizawula. Iindeksi ezizodwa ze-database ziyafumaneka kakhulu, ngoko ke? Umgca lokugqibela lokuphendula kwimibelelwano yedatha. Ndingathanda ukuba ndingathanda nangoko. Xa ibhasi e-tabula kufuneka yinto ezizodwa, ndingathanda iindeksi ezizodwa kwi-tabula. Kwiimeko ezininzi, i-reality iye yenza i-wake-up call. Okwangoku, kwiminyaka emininzi, xa i-haar yaba kakhulu, ndingathanda ukongeza iindeksi ye-composite ye-indice kwi-table enezigidi zeemilioni zeengxasi (ngokuthi, ngenxa yeengxasi afana neengxasi ze-composite). iimveliso Yintoni, ke? Ngoko ke, yonke inkqubo yokuguqula lithunyelwe Kwixesha elide, i-master-slave replication lag yaba kwi-rollercoaster, kwaye sinxininzi iingxaki malunga ne-service hiccups. Emva koko, ndingathanda ukuba ndingathanda: le "i-uniqueness" kwinqanaba le-database iye yenza yonke into kunye ne-risk? tenant_id is_deleted iintsuku Kwixesha lokugqibela, yaba enye ingxaki. I-business-wise, thina bonke bayazi iimveliso i-imeyile efanelekileyo. Ikhowudi yakho ye-application uya kukuguquka (isib. ukuba i-downcase) ngaphambi kokucinga i-duplicates ngexesha lokubhalisa. Kodwa i-indice yobugcisa ye-database (eyenza i-case-sensitive ngexesha elidlulileyo) ayifumaneka ngoko. 
So, thanks to legacy data or a side-channel data sync that skipped the normalization, you end up with two "identical" emails in the database: user@example.com and USER@EXAMPLE.COM. To the business they are duplicates; to the byte-for-byte unique index they are different values, so it waves both through.

And unique indexes don't adapt to new business requirements, either. Say "email uniqueness" was enough for a long time, but now the requirement becomes "tenant ID + email uniqueness." Great. The application code has to change, obviously. And the database unique index has to change too: DROP the old one, CREATE the new one. How do you coordinate the two operations? Which goes first? What if something fails in between? Running that kind of DDL against huge tables feels like defusing a bomb every time. Completely nerve-wracking.

All of this pushed me to ask: in scenarios with massive data, high concurrency, and fast-changing requirements, is our fondness for unique indexes still justified? This article collects my reflections on that question.

2. Unique Indexes: What Do We Love About Them?

Before digging into the problems, let's be fair and acknowledge that unique indexes really are appealing. Their advantages are obvious:

The ultimate safeguard for data integrity: the last barrier against duplicate data sneaking in.
Easy to implement: a few lines of SQL when creating the table, or one DDL statement added later, and you're done.
Schema as documentation: the constraint lives right in the schema; this column can never contain duplicates.
A potential query performance boost: since it is an index, queries on that column can be fast.

All of these advantages shine in small projects, or wherever data volumes are modest and the business logic isn't complicated.
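To make the DROP/CREATE coordination concrete, here is a minimal sketch (in-memory SQLite, illustrative names) of an ordering that avoids a window with no constraint at all. Because "tenant_id + email unique" is strictly looser than "email unique", the new composite index can be created while the old one still exists, and only then is the old one dropped. On a real production table the DDL itself would still be heavy, which is exactly the pain described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (tenant_id INTEGER, email TEXT)")
conn.execute("CREATE UNIQUE INDEX uq_email ON accounts (email)")  # old rule
conn.execute("INSERT INTO accounts VALUES (1, 'a@example.com')")

# The new rule is strictly looser, so it can coexist with the old one;
# create it first, then drop the old index, leaving no unguarded window.
conn.execute("CREATE UNIQUE INDEX uq_tenant_email ON accounts (tenant_id, email)")
conn.execute("DROP INDEX uq_email")

# Same email in a different tenant is now allowed...
conn.execute("INSERT INTO accounts VALUES (2, 'a@example.com')")
# ...but a true duplicate is still rejected:
try:
    conn.execute("INSERT INTO accounts VALUES (2, 'a@example.com')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Note that this safe ordering only works because the new constraint is weaker; tightening a constraint instead would first require cleaning up existing violations, which is a whole project of its own.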
But things look very different once you step onto the "battleground" of big data and rapid iteration.

3. Under the "Big Tech" Lens: Do These Advantages Still Hold?

Let's revisit each of those "advantages" and see how they fare in a large-scale, fast-moving engineering environment.

"The ultimate safeguard"? Is this safeguard reliable? What exactly is it safeguarding against? It doesn't fully recognize business-level "duplicates"! Take the email case-sensitivity issue I mentioned earlier (which could be solved with a suitable collation, at the cost of more complexity in the DB layer), or phone numbers with or without a +44 prefix, or usernames with or without special characters stripped... these nuances, which business logic considers "the same," are beyond the grasp of a database's simplistic "byte-for-byte identical" unique index. It can't prevent "logical duplicates" at the business layer.

The application layer has to do the heavy lifting anyway. Since all these complex "sameness" checks must be handled in the application code (you can't just throw raw database errors at users, can you?), the application layer is the true workhorse ensuring "business data uniqueness." The database's unique index is, at best, an "auxiliary police officer" whose standards might not even align with the business rules.

In distributed systems, it's merely a "local bodyguard." Once you shard your tables in a distributed scenario, an in-table unique index can't ensure global uniqueness. Global uniqueness then relies on ID generation services or application-level global validation. At this point, the "safeguard" provided by the local database index becomes even less significant.

This "ultimate safeguard" might miss the mark, has limited coverage, and relying solely on it is a bit precarious.

"Easy to implement"? One-time setup, week-long headache. Adding a unique index to a brand new table is indeed just one SQL statement.
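These "business-level sameness" checks are exactly what the application layer is good at. A small sketch of illustrative canonicalizers (the specific rules here, such as the default country code and which characters are stripped, are assumptions, not a standard; a real system would lean on a library like phonenumbers):

```python
import re

def canonical_email(email: str) -> str:
    """Business rule: emails compare case-insensitively, ignoring padding."""
    return email.strip().lower()

def canonical_phone(phone: str, default_cc: str = "44") -> str:
    """Illustrative rule: normalize to +<country code><number>."""
    digits = re.sub(r"\D", "", phone)      # drop spaces, dashes, parentheses
    if digits.startswith("00"):
        digits = digits[2:]                # international 00 prefix -> bare CC
    elif digits.startswith("0"):
        digits = default_cc + digits[1:]   # national format -> default CC
    return "+" + digits

# "+44 7700 900123" and "07700 900123" are the same number to the business,
# but byte-for-byte different to a unique index:
print(canonical_phone("+44 7700 900123") == canonical_phone("07700 900123"))  # True
print(canonical_email("  USER@EXAMPLE.COM "))  # user@example.com
```

Comparing canonical forms is what makes "logical duplicates" detectable; a byte-for-byte unique index on the raw column never sees them.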
But more often, you're changing the rules for an old table that's been running for ages and has accumulated mountains of data. Trying to alter a unique index on a table with tens of millions of rows (e.g., changing from a single-field unique to a composite unique) could mean several minutes of table locking! Online DDL tools might save you from service downtime, but the entire process can still be lengthy, resource-intensive, and risky.

Agile? Not so fast! In scenarios with rapid iteration, multi-region synchronization, and compliance requirements, a single unique index change at the database level can hold you up for days. So much for agility. So, that initial "simplicity" is like bait compared to the "hell" of modifying it later.

"Schema as documentation"? The documentation might not match reality! Yes, a unique index in the table structure acts as a form of "technical documentation." But "documentation" can be misleading. If the "uniqueness" defined by this index doesn't align with the actual, more complex business rules (like the case-insensitivity example), then this "documentation" is not only useless but can also mislead future developers. If changing this "documentation" (i.e., modifying the unique index) involves an epic struggle, why not write down the business rules properly in actual design documents, wikis, or code comments? Those are far easier to update.

"A potential query performance boost"? Is the tail wagging the dog? This is a common misconception, or rather, an overemphasized "added value." If you simply want to speed up queries on a specific field or set of fields, you can absolutely create a regular, non-unique index for them! A non-unique index will boost query speeds just fine, and it comes without the write overhead, DDL pains, and rigid business logic constraints of a unique index.
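A quick way to convince yourself of the last point, again with in-memory SQLite (illustrative names): a plain, non-unique index is picked up by the query planner just the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.execute("CREATE INDEX idx_email ON users (email)")  # regular, non-unique

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?", ("a@b.c",)
).fetchall()
# The plan's detail column mentions idx_email: the lookup uses the index,
# with no uniqueness constraint attached to it.
print(plan)
```

The same holds in MySQL or PostgreSQL: equality lookups use a B-tree index whether or not it is declared UNIQUE, so "faster queries" alone is never a reason to accept the constraint.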
Master-slave index inconsistency can instantly "paralyze" replication. I've seen it happen multiple times: the unique index configuration on the primary database is updated (e.g., a field is added, or a constraint is changed), but the index on the replica isn't modified in sync. Then, as soon as data changes on the primary (e.g., a row is inserted that would be considered a duplicate on the replica, or the primary can write it but the replica can't due to the incorrect or outdated index), the binlog is applied to the replica, and bam! Slave_SQL_Running: No. Replication just dies. When this happens, you get data lag, read-write splitting is affected, and it can even impact failover capabilities. What a nightmare, right?

4. Let the Application Layer Do the Job: It's What It's Good At!

Given all the trouble database unique indexes can cause, the heavy lifting of guaranteeing data uniqueness should fall primarily on our application layer. Application-layer uniqueness checks have plenty going for them:

Flexible and precise: whatever the business considers a duplicate, we can implement exactly that logic: case sensitivity, formatting, multi-field conditions, you name it.
Better user experience: when a user hits a duplicate, we can return friendly feedback such as "This phone number is already registered. Would you like to log in instead?" rather than a cold, cryptic database error.
Efficient early rejection: intercept duplicates at the service interface layer or the gateway layer, before the data ever reaches the database, cutting out pointless round trips.
Interface idempotency: idempotency is the proper answer to duplicate operations. If a user double-clicks the submit button, or a network glitch triggers a retry, well-designed idempotency at the application layer ensures the data isn't written twice. A unique index can't help with that.
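The points above can be sketched in one place. This is a minimal, single-process illustration, assuming the client sends an idempotency key with each submit (class and method names are made up for the example; a real service would keep the key store and user store in shared storage such as Redis or the database):

```python
import threading

class RegistrationService:
    """Toy registration service: normalization + friendly errors + idempotency."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen_keys = {}   # idempotency_key -> previously returned result
        self._users = set()    # canonical emails already registered

    def register(self, idempotency_key: str, email: str) -> str:
        normalized = email.strip().lower()  # business-level "sameness"
        with self._lock:
            if idempotency_key in self._seen_keys:
                # Retry or double-click: replay the earlier answer, no second write.
                return self._seen_keys[idempotency_key]
            if normalized in self._users:
                result = ("This email is already registered. "
                          "Would you like to log in instead?")
            else:
                self._users.add(normalized)
                result = "registered"
            self._seen_keys[idempotency_key] = result
            return result

svc = RegistrationService()
print(svc.register("k1", "User@Example.com"))  # registered
print(svc.register("k1", "User@Example.com"))  # registered (retry replayed)
print(svc.register("k2", "user@example.com"))  # friendly duplicate message
```

Each capability in the list maps to a line or two here: the normalization step, the human-readable rejection, and the key replay that keeps a retried request from ever becoming a second row.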
Final Thoughts

Reach for a unique index only when its benefits (chiefly, serving as an absolute last-resort data backstop in truly critical scenarios) clearly outweigh the many problems it brings under massive data volumes and rapid iteration (lost agility, operational burden). Invest first in solid application-layer mechanisms: front-end validation, asynchronous checks, idempotency, global ID generation, and so on. As for the database unique index itself: avoid it if you can. If you really must use one, think it through carefully first.