A Short Guide to Mastering Strings in Golang

I thought I’d write this article after a friend mentioned that he hadn’t dealt with runes before in Go. After doing a quick search on string manipulation in Go, I noticed that a few tutorials and forum answers were operating on strings as []byte. It was at this point that I realized that Go strings and their relationship to runes and bytes aren’t very intuitive, so I thought I’d make an effort to explain it as compactly as possible (for a proper exploration I recommend reading https://blog.golang.org/strings). This post assumes that at the very minimum you’ve used string literals (e.g. "this is a string") and string variables (e.g. firstName := "Chris").

 Fundamentals of Strings

In my mind, there are two really important fundamentals that need to be understood in order to master strings in Go.

  1. The components of a string.
  2. The slice behaviour of a string.

The Components of a String

The first important rule of strings is that strings are made up of runes (not bytes) and, as such, can be converted to a []rune. Under the hood a string is stored as UTF-8 encoded bytes, but the unit you almost always want to work with is the rune. A rune is a single Unicode code point, which you can think of as a character, like “A”, “b” and “*”, but also “我” or “私”. It’s important to understand that a rune represents a single character and that different languages have different definitions of what constitutes a “character”.

In English a character is a letter from the alphabet but, in the Chinese and Japanese examples I’ve provided, each character represents an entire word (in those examples the words mean “me”). As you can imagine, languages like Chinese and Japanese have literally thousands of glyphs to represent the thousands of words in those languages. Thousands of glyphs are obviously not going to fit in a single byte, so these characters are stored as multiple bytes. This leads to the second important rule: characters/runes in a string are of variable length!
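To make the variable length concrete, here is a minimal sketch that reports how many bytes each rune needs once it is encoded as UTF-8. The helper name RuneSizes is my own invention for illustration; utf8.RuneLen comes from the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// RuneSizes (a hypothetical helper) returns the number of bytes each
// rune in s occupies when encoded as UTF-8.
func RuneSizes(s string) []int {
	sizes := make([]int, 0, utf8.RuneCountInString(s))
	for _, r := range s {
		sizes = append(sizes, utf8.RuneLen(r))
	}
	return sizes
}

func main() {
	// Prints one line per rune with its encoded size.
	for _, r := range "A我😀" {
		fmt.Printf("%c takes %d byte(s)\n", r, utf8.RuneLen(r))
	}
}
```

Here “A” takes 1 byte, “我” takes 3 and the emoji takes 4, even though each of them is a single rune.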

What I’ve noticed in a number of tutorials and forum answers online is that people have been incorrectly converting strings to []byte slices and then performing formatting operations etc. on the slice of bytes. If you did that with something that contained characters from many different languages, you would actually break the meaning of the sentence! Below is an example that breaks a sentence in a different language but turns out fine in English (you can run it yourself at https://play.golang.org/p/gDejvQbEUL).

package main

import (
    "fmt"
)

// AddSpacesAsBytes inserts a space after every byte of phrase.
// This splits multi-byte characters apart and corrupts them.
func AddSpacesAsBytes(phrase string) string {
    phraseAsBytes := []byte(phrase)
    spacedOutPhrase := make([]byte, 0, len(phraseAsBytes)*2)

    for _, b := range phraseAsBytes {
        spacedOutPhrase = append(spacedOutPhrase, b, ' ')
    }

    return string(spacedOutPhrase)
}

// AddSpacesAsRunes inserts a space after every rune of phrase,
// so each character stays intact.
func AddSpacesAsRunes(phrase string) string {
    phraseAsRunes := []rune(phrase)
    spacedOutPhrase := make([]rune, 0, len(phraseAsRunes)*2)

    for _, r := range phraseAsRunes {
        spacedOutPhrase = append(spacedOutPhrase, r, ' ')
    }

    return string(spacedOutPhrase)
}

func main() {
    englishPhrase := "How are you?"
    fmt.Printf("Bytes - '%v'\n", AddSpacesAsBytes(englishPhrase))
    fmt.Printf("Runes - '%v'\n", AddSpacesAsRunes(englishPhrase))
 
    chinesePhrase := "你好吗"
    fmt.Printf("Bytes - '%v'\n", AddSpacesAsBytes(chinesePhrase))
    fmt.Printf("Runes - '%v'\n", AddSpacesAsRunes(chinesePhrase))
}

 

Output looks like –

Bytes - 'H o w   a r e   y o u ? '
Runes - 'H o w   a r e   y o u ? '
Bytes - '� � � � � � � � � '
Runes - '你 好 吗 '

At this point you may be thinking to yourself “meh, not a big deal, I’m only going to cater to English speaking people anyway”. The problem is that the character standards upon which runes are defined also provide codes for emojis, so unless you don’t want to cater to the 92% of online consumers that use emojis daily, you may want to consider using []rune instead of []byte.
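As a rough illustration of how an emoji trips up byte-based code even in an otherwise English message, the sketch below shortens a string by bytes and by runes. TruncateBytes and TruncateRunes are hypothetical helpers written for this post, and both assume n is not negative.

```go
package main

import "fmt"

// TruncateBytes (hypothetical) keeps the first n bytes of s, which can
// cut a multi-byte character in half. Assumes n >= 0.
func TruncateBytes(s string, n int) string {
	if n > len(s) {
		n = len(s)
	}
	return s[:n]
}

// TruncateRunes (hypothetical) keeps the first n runes of s, so every
// character that survives is still whole. Assumes n >= 0.
func TruncateRunes(s string, n int) string {
	runes := []rune(s)
	if n > len(runes) {
		n = len(runes)
	}
	return string(runes[:n])
}

func main() {
	msg := "Hi 👍!"
	fmt.Printf("Bytes - %q\n", TruncateBytes(msg, 4)) // Bytes - "Hi \xf0"
	fmt.Printf("Runes - %q\n", TruncateRunes(msg, 4)) // Runes - "Hi 👍"
}
```

The byte version keeps only the first of the emoji’s four bytes, leaving an invalid fragment at the end of the string, while the rune version keeps the whole emoji.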

 

Slice Behaviour of a String

Now that you know what a rune is (roughly) and now understand (hopefully) why it’s better to use runes rather than bytes, let’s talk about the relationship between strings and []runes.

The third important rule is that ranging over a string with a for loop returns runes. For example this (play with it here https://play.golang.org/p/6gfuyV1tLC) –

package main

import (
 "fmt"
 "reflect"
)

func main() {
 phrase := "How are you?"
 
 for _, c := range phrase {
  fmt.Println(reflect.TypeOf(c))
 }
}

Will return you this –

int32
int32
.
.
.

(By the way, rune is defined in Go as an alias for int32. I haven’t brought this up because it’s not that important and it can be confusing, but you can see it in the source at https://golang.org/src/builtin/builtin.go#L90)
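One detail worth seeing in code is the index that a range loop over a string hands you: it is the byte offset at which each rune starts, not the rune’s position in the sequence. The sketch below shows this with the Chinese phrase from earlier; RuneOffsets is a hypothetical helper name.

```go
package main

import "fmt"

// RuneOffsets (hypothetical) returns the byte offset at which each rune
// of s starts, which is the index a range loop over a string reports.
func RuneOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	for i, c := range "你好吗" {
		fmt.Printf("byte offset %d: %c\n", i, c)
	}
	// byte offset 0: 你
	// byte offset 3: 好
	// byte offset 6: 吗
}
```

The offsets jump by three because each of these characters occupies three bytes. For plain English text the offsets and the rune positions happen to be the same, which is why the difference is easy to miss.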

Because a string is broken down into individual runes when you range over it, and because you can convert it to a []rune slice, you could be forgiven for thinking that a string IS a []rune slice. It’s important to know that this is NOT the case, as strings are immutable while a []rune slice is mutable. The consequence of this is that you CAN’T build string manipulation operations like the following (play with this example at https://play.golang.org/p/P9sd21DbAv) –

package main

import (
    "fmt"
)

func MakeURLSafe(phrase string) string {
    for i, c := range phrase {
        if c == '\'' || c == ' ' || c == '?' {
            phrase[i] = '-'
        }
    }
 
    return phrase
}

func main() {
    phrase := "How's it going?"
    fmt.Println(MakeURLSafe(phrase))
}

In fact it won’t even compile and you’ll get the following error –

main.go:10: cannot assign to phrase[i]

Even if you did the following it still wouldn’t work (play with this example at https://play.golang.org/p/qQyWZ1ZVlb) –

package main

import (
    "fmt"
)

func MakeURLSafe(phrase string) string {
    for i, c := range phrase {
        if c == '\'' || c == ' ' || c == '?' {
            []rune(phrase)[i] = '-'
        }
    }

    return phrase
}

func main() {
    phrase := "How's it going?"
    fmt.Println(MakeURLSafe(phrase))
}

Your output would be –

How's it going?

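The reason is that every time you write []rune(phrase), Go builds a brand new slice of runes copied from the string. The assignment changes that temporary slice, which is then thrown away, and the value stored in the “phrase” variable is never touched. (Note also that i is a byte offset into the string, so it only lines up with the []rune index when every character is a single byte.)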

In order to get the desired result you would need to do this (play at https://play.golang.org/p/x1ipjNt0Jl) –

package main

import (
    "fmt"
)

func MakeURLSafe(phrase string) string {
    phraseAsRunes := []rune(phrase)

    for i, c := range phraseAsRunes {
        if c == '\'' || c == ' ' || c == '?' {
            phraseAsRunes[i] = '-'
        }
    }

    return string(phraseAsRunes)
}

func main() {
    phrase := "How's it going?"
    fmt.Println(MakeURLSafe(phrase))
}

Which would give you the desired output of –

How-s-it-going-
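If you would rather not manage the []rune slice yourself, the standard library’s strings.Map applies a function to every rune and builds the new string for you. This is only an alternative sketch, not a replacement for the version above, and the name MakeURLSafeWithMap is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// MakeURLSafeWithMap (hypothetical name) replaces apostrophes, spaces
// and question marks with hyphens, one rune at a time.
func MakeURLSafeWithMap(phrase string) string {
	return strings.Map(func(r rune) rune {
		switch r {
		case '\'', ' ', '?':
			return '-'
		}
		return r
	}, phrase)
}

func main() {
	fmt.Println(MakeURLSafeWithMap("How's it going?")) // How-s-it-going-
}
```

Because the mapping function receives runes, multi-byte characters pass through untouched.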

Update: A Slight Confusion with len(string)

Special thanks to The_Jare for bringing this up.

To make the confusion worse, when you use len() on a string it actually returns the number of bytes, not the number of runes. With behaviour like this, it’s really no wonder that a lot of people think that a []byte slice is the natural type for a string. To help with this, the unicode/utf8 package in the stdlib has functions such as RuneCountInString that find the correct length without having to resort to len([]rune(string)).
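A small sketch of the difference, using the Chinese phrase from the earlier example. Lengths is a hypothetical helper; len and utf8.RuneCountInString are the real built-in and stdlib function.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Lengths (hypothetical) returns the size of s in bytes and in runes.
func Lengths(s string) (byteCount, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := Lengths("你好吗")
	fmt.Println("bytes:", b) // bytes: 9
	fmt.Println("runes:", r) // runes: 3
}
```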

Summary

Just to recap what I feel are the 4 important rules to properly understanding strings in Go –

  1. strings are made up of runes (not bytes)
  2. characters/runes in a string are of variable length
  3. ranging over a string with a for loop returns runes
  4. strings are immutable while a []rune slice is mutable

I hope this quick tutorial has given you enough reason (and knowledge) to ditch []byte("my string") in favour of []rune("my string") in the future and to embrace the magic of runes!
