This is the first in a series of blogs we’re going to bring to you directly from the trenches, going into the nitty-gritty technical detail of some of the things we’re doing with the Aventus Protocol at the moment. Today’s article comes from Alex Pinto, a recent addition to the blockchain engineering team who has spent the past few weeks getting up to speed with Solidity, and who will take us through some of the challenges and particularities of the language.

Today I give you a post about programming for the Ethereum blockchain using the Solidity language. I won’t follow any plan in doing this: my objective is only to write about my obstacles in learning this language and the practical difficulties I encounter in my daily work. I want the freedom to write about any topic without having first to introduce preliminary material, as I’d have to do if I were writing a textbook. If you notice me talking about things I have not explained before, that is by design. Leave me a comment below and I’ll come back to them in a later post.

Basic access

Today, I want to talk about strings in Solidity.

Solidity is, at first sight, similar in syntax to JavaScript and other C-like languages. Because of that, a newcomer with a grounding in any of several widespread languages can quickly grasp what a Solidity program does. Nevertheless, the devil is in the proverbial details, and Solidity’s details hide unforeseen difficulties. That is the case with the type string and the related type bytes.

Both of these are dynamic array types, which means that they can store data of an arbitrary size. Each element of a variable of type bytes is, unsurprisingly, a single byte. Each element of a variable of type string is a character of the string. So far so good, but first looks are deceiving.
One who comes from other languages might expect the type string to provide several useful functions, like:

- determining the string’s length
- reading or changing the character at a given location in the string
- joining two strings
- extracting part of a string

Bad news: Solidity’s string does none of this! If we need any of the above, we have to do it manually.

So, let’s explore some of these difficulties and see what we can do about them. I open Remix and type a minimal contract, String, in a new file called string.sol.

The right side of the screen, in Remix, is taken by the developer’s area. In the Compile tab, I check the Auto-Compile option, so that Remix will notify me of errors and code-analysis warnings as I write my code. The static code analysis is controlled by the options in the Analysis tab, and I usually have all options selected.

In the current case, Remix reports two warnings of the same kind: the methods I have written can potentially have a high-to-infinite gas cost. I will ignore that in this post.

The contract is very minimal. It defines a state variable store of type string, a method to set it and a method to get it. Let’s test it.

In the Run tab, I hit Deploy and, if there are no problems with the contract, a new area appears below that button with the address where the contract is located and the functions that are available.

Below the working area, Remix shows a detailed record of the transaction’s result. Initially, it shows only a line indicating the account that deployed the contract, the contract and method that was called, i.e. String.(constructor), and how much Ether was passed to the execution (this is shown in Wei, the smallest unit of Ether, corresponding to 10^-18 Ether). We can expand it by clicking on the header, revealing logs, execution and transaction costs, available gas, final result, etc.
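For reference, the minimal contract described above could look roughly like the sketch below. The names String, store, setStore and getStore come from the post; the pragma range and the initial literal are assumptions, since the original listing is not reproduced here. The data locations are written out explicitly so that the sketch should compile on both older and current compilers.

```solidity
// Assumed compiler range; the post does not state which version it used.
pragma solidity >=0.4.22 <0.9.0;

contract String {
    // State variable: it always lives in storage.
    // The initial literal is a placeholder, not the value used in the post.
    string store = "abcdef";

    // Returns a memory copy of the stored string.
    function getStore() public view returns (string memory) {
        return store;
    }

    // Copies the argument into storage, replacing the previous value.
    function setStore(string memory _value) public {
        store = _value;
    }
}
```

After deploying it in Remix, calling getStore should return the placeholder literal, and calling setStore with a new value followed by getStore should return that new value.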
With the String.(constructor) transaction logged, I just want to press the getStore button on the right and notice how the result shows beneath it. Likewise, there is a new transaction log on the left, and by clicking it we can see the same value in the decoded output. All is well.

Now, I type “0123456789” in the textbox to the right of setStore and hit that button. Then I call getStore again and I receive that string. Thumbs up, we can do basic storage/retrieval with strings!

Let’s now go for more interesting things.

Creating new strings: data location

So far, I have accessed a literal string, and we have seen how we can change it by assigning to it. But that is only a very coarse way of dealing with strings. Let us create a string character by character. This will introduce us to one peculiarity of Solidity programming: data location.

I create a new method that only returns a new string with three specific characters: “Abc”. My first attempt declares the string with string newString = new string(3); and then assigns each character in turn, starting with newString[0] = "A";. This is a well-intentioned effort, but it does not work. Remix is kind enough to immediately point out 4 errors and 1 warning.

Two of these are on the same line, string newString = new string(3);

- Warning: Variable is declared as a storage pointer. Use an explicit “storage” keyword to silence this warning.
- TypeError: Type string memory is not implicitly convertible to expected type string storage pointer

The other three occur in the following lines, e.g. newString[0] = "A"; and are all of the same type:

- TypeError: Index access for string is not possible.

To understand the first issue, I have to tell you about data location.

Writing to the blockchain is very expensive. Every node that runs the transaction has to do the same writing, which makes the transaction more expensive and the blockchain bigger. When a node downloads a block containing this transaction, it will incur larger storage costs because of this writing. In Ethereum, every transaction has an associated cost, called gas, to incentivise programmers to be as economical as possible.
Gas is what makes the choice of data location matter. When writing a contract, authors have a choice of what kind of data to use:

- memory is cheap (i.e. it costs relatively little gas, but the data are volatile and lost after a function finishes executing);
- storage is the most expensive (and is absolutely needed for contract state, which must persist from function call to function call);
- calldata is a read-only, non-persistent location that holds the arguments of external function calls. This is the cheapest location to use, but its values are referenced from the stack, of which only a limited number of slots can be reached at once. In particular, that means that functions may be limited in their number of arguments.

Every data type has a default location. This is from the Solidity documentation at the time of writing:

Forced data location:
- parameters (not return) of external functions: calldata
- state variables: storage

Default data location:
- parameters (also return) of functions: memory
- all other local variables: storage

Notice the subtlety: function parameters are by default stored in memory, except if the function is external, in which case they stay in calldata. This means that a function that is perfectly alright when public can suddenly have too many arguments when made external.

Now, let’s come back to our code and examine the line

string newString = new string(3);

This is a local variable inside the function, and so by default it is in storage. The keyword new is used to specify the initial size of a memory dynamic array. Memory arrays cannot be resized. On the other hand, we can change the size of a storage dynamic array by changing its length property, but can’t use new with it.

This is the source of our error: new string(3) produces a memory array, which cannot be assigned to a variable that is, by default, a storage pointer. In this case, all we want to do with this string is create it and return it to the outside. Let the outside world decide what to do with it, and whether it is temporary only or important enough to persist on the blockchain. In this example, persistence is not important, and the string will be created in memory.
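The contrast between the two kinds of dynamic array can be sketched with a small contract. This is an illustration only: the contract Locations, the array numbers and the function names are hypothetical, and the pragma range is an assumption, since the post does not state a compiler version.

```solidity
// Assumed compiler range; not taken from the post.
pragma solidity >=0.4.22 <0.9.0;

contract Locations {
    // State variables are always in storage and persist between calls.
    uint[] numbers;

    // A storage dynamic array can grow after it has been created.
    // push is used here because it works across compiler versions;
    // assigning to numbers.length is the older way described in the text.
    function grow(uint _value) public {
        numbers.push(_value);
    }

    // Number of elements currently held in storage.
    function count() public view returns (uint) {
        return numbers.length;
    }

    // A memory dynamic array gets its size from new and keeps that size.
    // Its contents are gone once the call has finished.
    function threeValues() public pure returns (uint[] memory) {
        uint[] memory tmp = new uint[](3);
        tmp[0] = 1;
        tmp[1] = 2;
        tmp[2] = 3;
        return tmp;
    }
}
```

Calling grow changes contract state and so costs gas for the storage write, while threeValues only works with temporary data and leaves nothing behind on the blockchain.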
To do that, we add the keyword memory in the declaration, like this:

string memory newString = new string(3);

Direct access to strings: equivalence with bytes

Let’s see the second sort of errors now. This is simple and unavoidable: Solidity does not currently allow index access to strings. From the FAQ:

“string is basically identical to bytes only that it is assumed to hold the UTF-8 encoding of a real string. Since string stores the data in UTF-8 encoding it is quite expensive to compute the number of characters in the string (the encoding of some characters takes more than a single byte). Because of that, string s; s.length; is not yet supported and not even index access s[2]”

The alternative is to first transform the string into bytes, and then access it directly. This works because string is an array type, albeit with some restrictions.

But there is a trap to watch out for. bytes stores raw data; string stores UTF-8 characters. A function that converts its input _s to bytes and returns the length of the result does not always return the number of characters in _s.

The problem occurs if _s contains any character that takes more than 1 byte to represent in UTF-8. In that case, the function returns the length of the byte representation of the input string, which will be more than the number of characters.

This also has an impact when trying to address a particular character of the string, as we cannot predict at which location the character’s bytes will be. We have to parse the string linearly, identifying any multi-byte character, or else make sure we restrict our input to characters of fixed length. If we work exclusively with ASCII strings, for example, we’ll be safe.

Returning to our previous function, it works once newString is declared in memory and its characters are written through a bytes conversion. But, for example, a function that tries to set the third character of a string to X in the same way will fail when it receives multi-byte characters.
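The three functions discussed in this section could be sketched as follows. The original listings are not reproduced here, so the contract name StringBytes, the function names byteLength, abc and setThirdToX, and the pragma range are all assumptions; only newString, _s and the overall approach come from the text.

```solidity
// Assumed compiler range; not taken from the post.
pragma solidity >=0.4.22 <0.9.0;

contract StringBytes {
    // Length of the UTF-8 byte representation, not the character count.
    // "Abcdef" gives 6, but a lone euro sign gives 3.
    function byteLength(string memory _s) public pure returns (uint) {
        return bytes(_s).length;
    }

    // Builds "Abc" in memory by writing through a bytes conversion.
    function abc() public pure returns (string memory) {
        string memory newString = new string(3);
        bytes memory b = bytes(newString);
        b[0] = "A";
        b[1] = "b";
        b[2] = "c";
        return string(b);
    }

    // Overwrites the byte at index 2 with "X". This is the third
    // character only when the first two characters are one byte each.
    function setThirdToX(string memory _s) public pure returns (string memory) {
        bytes memory b = bytes(_s);
        require(b.length > 2); // guard against inputs shorter than 3 bytes
        b[2] = "X";
        return string(b);
    }
}
```

With ASCII-only input the byte index and the character index coincide, which is why these functions behave as expected there. The require guard is an addition of this sketch, so that short inputs fail cleanly instead of writing out of bounds.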
The function that sets the third character returns “AbXdef” for an input of “Abcdef”, but returns “XbÁnç!” for an input of “€bÁnç!”. The euro sign takes three bytes in UTF-8, so the write at byte index 2 overwrites the last byte of that character rather than the third character of the string.

Conclusion

There are still many more things that can be said about this topic, but this is a long enough post already, so I’ll wrap up.

The key concept regarding the type string is that it is an array of UTF-8 characters that can be seamlessly converted to bytes. This is the only way of manipulating the string at all. But it is important to note that UTF-8 characters do not map one-to-one onto bytes. The conversion in either direction will be accurate, but there is no immediate relation between each byte index and the corresponding string index. For most things, there may be an advantage in representing the string directly as the type bytes (avoiding conversions) and being very careful when using characters that are encoded in UTF-8 by more than one byte.

That’s enough for now. See you another day, with more steps in this coding adventure.

About the Author

Alex is a software engineer at Aventus, working on the blockchain engineering team. He has 20 years of experience working in technology, completing a PhD in Computer Science as well as a post-doctorate in Cryptography. As part of his research, Alex has published papers on Kolmogorov Complexity, Cryptography, Database Anonymization and Code Obfuscation.

Alex also spent seven years lecturing at the University Institute of Maia, including directing the degree programmes for BSc Computer Science and Information Systems and Software.

This article was originally posted on his blog.

About Aventus

Aventus is a blockchain-based protocol that delivers increased trust, security and control for the live-event ticketing industry, practically eliminating counterfeit tickets and unfair scalping. Organisers can create, manage and promote their events and associated tickets, dramatically reduce platform costs, and significantly influence secondary markets.
For more information, visit Aventus.io and follow Aventus on Twitter, Telegram and Reddit.

Working with Strings in Solidity
