How can I see all the characters in a Unicode category?

I've read the documentation and can't find any examples.

http://golang.org/pkg/unicode/#IsPunct

Is there a place in the documentation that explicitly lists all characters in these categories? I'd like to see what characters are contained in category P or category M.


It's not in the documentation, but you can still read the source code. The categories you're talking about are defined in this file: http://golang.org/src/pkg/unicode/tables.go

For example, the P category is defined this way:

var _P = &RangeTable{
    R16: []Range16{
        {0x0021, 0x0023, 1},
        {0x0025, 0x002a, 1},
        {0x002c, 0x002f, 1},
        {0x003a, 0x003b, 1},
        {0x003f, 0x0040, 1},
        {0x005b, 0x005d, 1},
        {0x005f, 0x007b, 28},
        ...
        {0xff5d, 0xff5f, 2},
        {0xff60, 0xff65, 1},
    },
    R32: []Range32{
        {0x10100, 0x10102, 1},
        {0x1039f, 0x103d0, 49},
        {0x10857, 0x1091f, 200},
        ...
        {0x12470, 0x12473, 1},
    },
    LatinOffset: 11,
}
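To read the table: each entry is {Lo, Hi, Stride} and covers Lo, Lo+Stride, Lo+2*Stride, ... up to Hi. So {0x005f, 0x007b, 28} means only 0x5f ('_') and 0x7b ('{'), because 0x5f + 28 = 0x7b. R16 holds ranges below U+10000, R32 the ones above, and LatinOffset is the number of R16 entries whose Hi is at most U+00FF (MaxLatin1). The small sketch below expands two entries copied from the listing; the `expand` helper is hypothetical, not part of package unicode. It does its arithmetic in `rune` so that `c += stride` cannot wrap around on a range that ends at 0xFFFF.

```go
package main

import (
	"fmt"
	"unicode"
)

// expand lists the code points covered by one table entry {Lo, Hi, Stride}:
// Lo, Lo+Stride, Lo+2*Stride, ... while the value stays <= Hi.
func expand(lo, hi, stride uint16) []rune {
	var out []rune
	// Work in rune (int32) so that c += stride cannot overflow past 0xFFFF.
	for c := rune(lo); c <= rune(hi); c += rune(stride) {
		out = append(out, c)
	}
	return out
}

func main() {
	// Two entries copied from the _P listing above.
	entries := []unicode.Range16{
		{Lo: 0x0021, Hi: 0x0023, Stride: 1},
		{Lo: 0x005f, Hi: 0x007b, Stride: 28},
	}
	for _, r := range entries {
		fmt.Printf("%#x..%#x step %d: %q\n", r.Lo, r.Hi, r.Stride, string(expand(r.Lo, r.Hi, r.Stride)))
	}
}
```

The first entry expands to `!"#` and the second to just `_{`, which is why a stride other than 1 appears wherever the members of a category are evenly spaced.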

And here is a simple way to print all of them:

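// Punct is the P table (needs imports "fmt" and "unicode"); walk R16 and R32.
for _, r := range unicode.Punct.R16 {
    for c := rune(r.Lo); c <= rune(r.Hi); c += rune(r.Stride) {
        fmt.Print(string(c))
    }
}
for _, r := range unicode.Punct.R32 {
    for c := rune(r.Lo); c <= rune(r.Hi); c += rune(r.Stride) {
        fmt.Print(string(c))
    }
}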
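The same walk works for any category, including M from the question, because package unicode also exports `unicode.Categories`, a map from category names such as "P", "M" or "Lu" to their `*RangeTable`. The following is a minimal runnable sketch built on that map; `categoryChars` is a hypothetical helper name, and reading the category from the first command-line argument is just one possible interface. It returns a `[]rune` rather than a `string` so that surrogate code points (category Cs) are not silently turned into U+FFFD by a string conversion.

```go
package main

import (
	"fmt"
	"os"
	"unicode"
)

// categoryChars returns every code point in the named general category
// by walking the 16-bit and 32-bit ranges of its table in
// unicode.Categories. The bool is false for unknown category names.
func categoryChars(name string) ([]rune, bool) {
	table, ok := unicode.Categories[name]
	if !ok {
		return nil, false
	}
	var out []rune
	for _, r := range table.R16 {
		for c := rune(r.Lo); c <= rune(r.Hi); c += rune(r.Stride) {
			out = append(out, c)
		}
	}
	for _, r := range table.R32 {
		for c := rune(r.Lo); c <= rune(r.Hi); c += rune(r.Stride) {
			out = append(out, c)
		}
	}
	return out, true
}

func main() {
	name := "M" // marks by default; pass another category name as the first argument
	if len(os.Args) > 1 {
		name = os.Args[1]
	}
	chars, ok := categoryChars(name)
	if !ok {
		fmt.Fprintf(os.Stderr, "unknown category %q\n", name)
		os.Exit(1)
	}
	for _, c := range chars {
		fmt.Printf("%U %q\n", c, c)
	}
}
```

Running it as `go run main.go P` lists the punctuation characters one per line with their code points; with no argument it lists category M.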