Char is:module
The Char type and associated operations.
Warning: this is an internal module, and does not have a stable
API or name. Functions in this module may not check or enforce
preconditions expected by public modules. Use at your own risk!
Fast character manipulation functions.
Commonly used character parsers.
Parsec compatibility module
Commonly used character parsers.
Utilities for working with KnownChar constraints.
This module is only available on GHC 9.2 or later.
Parsers for character streams
Unicode Char. Import as:
import qualified RIO.Char as C
This module does not export any partial functions. For those, see
RIO.Char.Partial
Access to the Unicode Character Database, implemented as bindings to
the International Components for Unicode (ICU) libraries.
Unicode assigns each codepoint (not just assigned character) values
for many properties. Most are simple boolean flags, or constants from
a small enumerated list. For some, values are relatively more complex
types.
For more information see "About the Unicode Character Database"
http://www.unicode.org/ucd/ and the ICU User Guide chapter on
Properties
http://icu-project.org/userguide/properties.html.
A collection of character utilities that follows the naming in
Data.Char and is intended to be imported qualified. It is also
recommended that you use the OverloadedStrings extension to allow
literal strings to be used as symbolic strings when working with
symbolic characters and strings.
The SChar type covers all Unicode characters, following the
specification in
https://smt-lib.org/theories-UnicodeStrings.shtml. However,
some of the recognizers support only the Latin-1 subset; these are
suffixed with L1. The reason is that there is no performant way of
performing these functions for the entire Unicode set. As SMTLib's
capabilities increase, we will provide full Unicode versions as well.
Agda strings use Data.Text [1], which can only represent Unicode
scalar values [2], excluding the surrogate code points [3]. To allow
primStringFromList to be injective, we make sure character values
also exclude surrogate code points, mapping them to the replacement
character U+FFFD.
See #4999 for more information.
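The surrogate mapping described above can be sketched with base alone. This is only an illustration of the idea, not Agda's actual implementation: the names isSurrogateChar, replaceSurrogate and charFromCodePoint are hypothetical, and the code-point argument is assumed to lie in the range 0..0x10FFFF.

```haskell
import Data.Char (chr)

-- | True for code points in the surrogate range U+D800..U+DFFF.
-- (Hypothetical helper name, chosen for this sketch.)
isSurrogateChar :: Char -> Bool
isSurrogateChar c = c >= '\xD800' && c <= '\xDFFF'

-- | Map surrogate code points to the replacement character U+FFFD
-- and leave every other character unchanged.
replaceSurrogate :: Char -> Char
replaceSurrogate c
  | isSurrogateChar c = '\xFFFD'
  | otherwise         = c

-- | Build a character from a code point, assumed to be in 0..0x10FFFF,
-- so that the result is always a Unicode scalar value.
charFromCodePoint :: Int -> Char
charFromCodePoint = replaceSurrogate . chr
```

With this normalisation in place, no resulting character is a surrogate, so converting a list of such characters to Data.Text does not need to substitute anything further.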
The Char type has 128 nullary constructors, listed in order
according to each character's 7-bit numeric code.
This module provides APIs to access the Unicode character database
(UCD) corresponding to Unicode Standard version 15.1.0.
This module re-exports several sub-modules under it. The sub-module
structure under Char is largely based on the "Property Index by Scope
of Use" in Unicode® Standard Annex #44.
The Unicode.Char.* modules in turn depend on
Unicode.Internal.Char.* modules which are programmatically
generated from the Unicode standard's Unicode character database
files. The module structure under Unicode.Internal.Char is
largely based on the UCD text file names from which the properties are
generated.
For the original UCD files used in this code, please refer to the
UCD section on the Unicode standard page. See
https://www.unicode.org/reports/tr44/ to understand the
contents and the format of the Unicode database files.
Unicode character parsers. The character classification is identical
to the classification in the Data.Char module.
Functions for identifying and manipulating character codes.
Manipulate ByteStrings using Char operations. All Chars
will be truncated to 8 bits. It can be expected that these functions
will run at identical speeds to their Word8 equivalents in
Data.ByteString.
More specifically these byte strings are taken to be in the subset of
Unicode covered by code points 0-255. This covers Unicode Basic Latin,
Latin-1 Supplement and C0+C1 Controls.
See:
This module is intended to be imported qualified, to avoid
name clashes with Prelude functions, e.g.
import qualified Data.ByteString.Char8 as C
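The 8-bit truncation mentioned above is easy to observe with the qualified import. The following is a minimal sketch using the bytestring package; the names truncatedBytes and roundTrips are hypothetical and exist only for this illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import Data.Word (Word8)

-- | The Greek letter lambda is U+03BB. Packing it through the Char8
-- interface keeps only the low 8 bits, leaving the single byte 0xBB.
truncatedBytes :: [Word8]
truncatedBytes = B.unpack (C.pack "\x3BB")

-- | A String survives a pack/unpack round trip only when every
-- character lies in the code point range 0-255.
roundTrips :: String -> Bool
roundTrips s = C.unpack (C.pack s) == s
```

Under these definitions, roundTrips holds for a Latin-1 string such as "caf\xE9" and fails for "\x3BB", so text outside code points 0-255 should be encoded explicitly (for example as UTF-8) rather than packed through Char8.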
The Char8 interface to bytestrings provides an instance of IsString
for the ByteString type, enabling you to use string literals, and have
them implicitly packed to ByteStrings. Use
{-# LANGUAGE OverloadedStrings #-} to enable this.
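As a brief sketch of the extension in use, the literal below is packed to a ByteString through the IsString instance. The names greeting and greetingLength are hypothetical, introduced only for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as C

-- | With OverloadedStrings enabled, the type signature selects the
-- IsString instance for ByteString, so no explicit C.pack is needed.
greeting :: ByteString
greeting = "hello"

-- | Char8 operations then work on the literal directly.
greetingLength :: Int
greetingLength = C.length greeting
```

The literal is packed the same way as with C.pack, so characters above code point 255 are truncated here as well.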