Read (package: hledger-lib)
This is the entry point to hledger's reading system, which can read
Journals from various data formats. Use this module if you want to
parse journal data or read journal files. Generally it should not be
necessary to import modules below this one.
Journal reading
Reading an input file (in journal, csv, timedot, or timeclock
format, etc.) involves these steps:
- select an appropriate file format "reader" based on filename
extension, file path prefix, or function parameter. A reader contains
a parser and a finaliser (usually journalFinalise).
- run the parser to get a ParsedJournal (this may run additional
sub-parsers to parse included files)
- run the finaliser to get a complete Journal, which passes standard
checks
- if reading multiple files: merge the per-file Journals into one
overall Journal
- if using -s/--strict: run additional strict checks
- if running print --new: save .latest files for each input file.
(import also does this, as its final step.)
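The steps above can be sketched as a toy pipeline. Everything here is hypothetical and heavily simplified (ParsedJournal, Journal, Reader, journalReader, selectReader, readOne and readMany are illustrative names, not hledger-lib's API), and the .latest file step is omitted; the point is only the order of operations: select a reader, parse, finalise each file, merge, then run strict checks once.

```haskell
import Control.Monad (when)
import Data.List (isSuffixOf)

-- Hypothetical, heavily simplified stand-ins; not hledger's real types.
newtype ParsedJournal = ParsedJournal [String]
newtype Journal = Journal [String] deriving (Show, Eq)

instance Semigroup Journal where
  Journal a <> Journal b = Journal (a ++ b)

instance Monoid Journal where
  mempty = Journal []

-- A reader pairs a parser with a finaliser.
data Reader = Reader
  { rExt      :: String                                  -- file extension it handles
  , rParse    :: String -> Either String ParsedJournal   -- the parser
  , rFinalise :: ParsedJournal -> Either String Journal  -- the finaliser
  }

-- Toy journal reader: one "transaction" per non-blank line.
journalReader :: Reader
journalReader = Reader ".journal" parse finalise
  where
    parse = Right . ParsedJournal . filter (not . null) . lines
    finalise (ParsedJournal ts) = Right (Journal ts)

-- Step 1: select a reader by file extension, defaulting to the journal reader.
selectReader :: [Reader] -> FilePath -> Reader
selectReader rs path =
  case [r | r <- rs, rExt r `isSuffixOf` path] of
    (r : _) -> r
    []      -> journalReader

-- Steps 2 and 3: parse, then finalise, each file on its own.
readOne :: [Reader] -> (FilePath, String) -> Either String Journal
readOne rs (path, txt) = do
  let r = selectReader rs path
  pj <- rParse r txt
  rFinalise r pj

-- Steps 4 and 5: merge the per-file journals, then run strict checks once.
readMany :: Bool -> (Journal -> Either String ()) -> [Reader] -> [(FilePath, String)] -> Either String Journal
readMany strict check rs files = do
  js <- mapM (readOne rs) files
  let j = mconcat js
  when strict (check j)
  pure j
```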
Journal merging
Journal implements the Semigroup class, so two Journals can be merged
into one Journal with j1 <> j2. This is implemented by the
journalConcat function, whose documentation explains exactly what
merging Journals means.
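A minimal sketch of what Semigroup-based merging looks like, assuming a hypothetical two-field Journal rather than hledger's real type. Here `<>` simply concatenates, and an empty journal serves as the identity so that many journals can be combined with mconcat.

```haskell
-- A hypothetical two-field journal, just to show what the Semigroup instance provides.
data Journal = Journal
  { jFiles :: [FilePath]
  , jTxns  :: [String]
  } deriving (Show, Eq)

-- Merge by concatenation, keeping the left journal's items first.
-- (The real journalConcat has specific rules for each Journal field;
-- those are not modelled here.)
instance Semigroup Journal where
  Journal f1 t1 <> Journal f2 t2 = Journal (f1 ++ f2) (t1 ++ t2)

-- An empty journal is the identity, so a list of journals can be folded with mconcat.
instance Monoid Journal where
  mempty = Journal [] []
```

With these instances, j1 <> j2 keeps j1's transactions before j2's, matching the intuition of reading files in order.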
Journal finalising
This is post-processing done after parsing an input file, such as
inferring missing information, normalising amount styles, checking for
errors, and so on: a delicate and influential stage of data
processing. In hledger it is done by journalFinalise, which
converts a preliminary ParsedJournal to a validated, ready-to-use
Journal. This is called immediately after the parsing of each input
file. It is not called when Journals are merged.
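One typical finalising task is inferring a missing posting amount. As a hedged illustration (hypothetical Posting type, single commodity, integer cents; not journalFinalise's actual code), a transaction may omit at most one amount, which is then set so that all postings sum to zero:

```haskell
-- Hypothetical simplified posting: single commodity, amounts in integer cents.
data Posting = Posting
  { pAccount :: String
  , pAmount  :: Maybe Integer   -- Nothing = amount omitted in the journal
  } deriving (Show, Eq)

-- Infer at most one missing amount so that the postings sum to zero,
-- and report an error if the transaction cannot be balanced.
balanceTxn :: [Posting] -> Either String [Posting]
balanceTxn ps =
  case filter ((== Nothing) . pAmount) ps of
    [] | total == 0 -> Right ps
       | otherwise  -> Left ("unbalanced by " ++ show total)
    [_] -> Right (map fill ps)
    _   -> Left "more than one posting has no amount"
  where
    total = sum [a | Just a <- map pAmount ps]
    fill p = case pAmount p of
      Nothing -> p { pAmount = Just (negate total) }
      Just _  -> p
```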
Journal reading API
There are three main Journal-reading functions:
- readJournal to read from a Handle. Selects a reader and calls
its parser and finaliser, then does strict checking if needed.
- readJournalFile to read one file, or stdin if the file path is
"-". Uses the file path/file name to help select the reader,
calls readJournal, then writes .latest files if needed.
- readJournalFiles to read multiple files. Calls readJournalFile for
each file (without strict checking or .latest file writing) then
merges the Journals into one, then does strict checking and .latest
file writing at the end if needed.
Each of these also has an easier variant with a ' suffix (readJournal',
readJournalFile', readJournalFiles'), which uses default options and
has a simpler type signature.
One more variant, readJournalFilesAndLatestDates, is like
readJournalFiles but also exposes the latest transaction date (and how
many transactions fall on that day) seen for each file. This is used by
the import command.
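The relationship between a function and its primed variant can be sketched as follows. The names Opts, defaultOpts, readJournalSketch and readJournalSketch' are hypothetical; only the pattern comes from the text: the primed version fixes the options to defaults and turns an error value into an IO failure (here via fail).

```haskell
-- Hypothetical options record; hledger's real InputOpts has many more fields.
newtype Opts = Opts { strictChecks :: Bool }

defaultOpts :: Opts
defaultOpts = Opts { strictChecks = False }

-- "Full" variant: explicit options, errors returned as values.
readJournalSketch :: Opts -> String -> Either String [String]
readJournalSketch opts txt
  | strictChecks opts && null txns = Left "no transactions found"
  | otherwise                      = Right txns
  where
    txns = filter (not . null) (lines txt)

-- Primed variant: default options, and any error becomes an IO exception.
readJournalSketch' :: String -> IO [String]
readJournalSketch' = either fail pure . readJournalSketch defaultOpts
```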
A hledger journal reader is a triple of storage format name, a
detector of that format, and a parser from that format to Journal. The
type variable m appears here so that rParser can hold a journal
parser, which depends on it.
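A sketch of why the reader type is parameterised by a monad, using a hypothetical Reader record (not the real hledger definition, which has more fields): the parser's result lives in m, so the same reader value can be run purely (in Identity) or in IO.

```haskell
import Data.Functor.Identity (Identity (..))
import Data.List (isSuffixOf)

-- Hypothetical reader triple: format name, format detector, parser.
-- The parser runs in some monad m (e.g. IO, if it must read included files),
-- which is why the type has to be parameterised by m.
data Reader m = Reader
  { rFormat :: String
  , rDetect :: FilePath -> Bool
  , rParser :: String -> m (Either String [String])
  }

-- A toy reader usable in any monad, since its parser performs no effects.
timedotReader :: Monad m => Reader m
timedotReader = Reader
  { rFormat = "timedot"
  , rDetect = (".timedot" `isSuffixOf`)
  , rParser = pure . Right . lines
  }

-- Run the parser of the first reader whose detector accepts the path.
runFirst :: Monad m => [Reader m] -> FilePath -> String -> m (Either String [String])
runFirst rs path txt =
  case filter (`rDetect` path) rs of
    (r : _) -> rParser r txt
    []      -> pure (Left ("no reader for " ++ path))
```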
Read a JSON file and decode it to the target type, or raise an error
if we can't. Eg: readJsonFile "a.json" :: IO Transaction
readJournal iopts mfile txt
Read a Journal from some handle, with strict checks if enabled, or
return an error message.
The reader (data format) is chosen based on, in this order:
- a reader name provided in iopts
- a reader prefix in the mfile path
- a file extension in mfile
If none of these is available, or if the reader name is unrecognised,
the journal reader is used.
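The priority order above might be modelled like this. The reader table, its extension lists, and the names splitReaderPrefix and chooseReader are illustrative assumptions; only the ordering, the "-" default path, and the journal fallback come from the text.

```haskell
import Control.Applicative ((<|>))
import Data.List (find, isSuffixOf)
import Data.Maybe (fromMaybe)

-- Hypothetical reader table: (reader name, file extensions). Illustrative only.
knownReaders :: [(String, [String])]
knownReaders =
  [ ("journal",   [".journal", ".hledger"])
  , ("csv",       [".csv"])
  , ("timedot",   [".timedot"])
  , ("timeclock", [".timeclock"])
  ]

isKnown :: String -> Bool
isKnown n = n `elem` map fst knownReaders

-- Split off a "READER:" prefix, but only when READER is a known reader name.
splitReaderPrefix :: FilePath -> (Maybe String, FilePath)
splitReaderPrefix p =
  case break (== ':') p of
    (name, ':' : rest) | isKnown name -> (Just name, rest)
    _                                 -> (Nothing, p)

-- The first available source wins: options, then path prefix, then extension.
-- If nothing is available, or the chosen name is unrecognised, use "journal".
chooseReader :: Maybe String -> Maybe FilePath -> String
chooseReader moptName mfile =
  fromMaybe "journal" (keepKnown =<< (moptName <|> mprefix <|> byExt))
  where
    (mprefix, path) = maybe (Nothing, "-") splitReaderPrefix mfile
    byExt = fst <$> find (\(_, exts) -> any (`isSuffixOf` path) exts) knownReaders
    keepKnown n = if isKnown n then Just n else Nothing
```

For example, chooseReader Nothing (Just "timedot:hours.txt") picks "timedot" from the prefix, while chooseReader (Just "bogus") (Just "bank.csv") falls back to "journal" because the first available name is unrecognised.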
If a file path is not provided, "-" is assumed (and may appear in
error messages, files output, etc., where it is a slight lie: it
means "not from a file", not necessarily "from standard input").
An easy version of readJournal which assumes default options,
and fails in the IO monad.
An even easier version of readJournal' which takes a Text
instead of a Handle.
Read a Journal from this file, or from stdin if the file path is "-",
with strict checks if enabled, or return an error message. (XXX: if
the file does not exist, it calls error instead.) Strict checks are
temporarily disabled here when this is called by readJournalFiles.
The file path can have a READER: prefix.
The reader (data format) to use is determined from (in priority
order): the mformat_ specified in the input options, if any; the file
path's READER: prefix, if any; or a recognised file name extension. If
none of these identifies a known reader, the journal reader is used.
The input options can also configure balance assertion checking,
automated posting generation, a rules file for converting CSV data,
etc.
If using --new, and if latest-file writing is enabled in input
options, and not deferred by readJournalFiles, and after passing
strict checks if enabled, a .latest.FILE file will be created/updated
(for the main file only, not for included files), to remember the
latest transaction date processed.
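The latest-date bookkeeping could be sketched with two small helpers. Both are hypothetical: latestDates computes the latest transaction date and how many transactions share it (the information readJournalFilesAndLatestDates is described as exposing), and latestFilePath assumes the .latest.FILE file sits in the same directory as FILE.

```haskell
import System.FilePath (takeDirectory, takeFileName, (</>))

-- Hypothetical helper: the latest transaction date seen, and how many
-- transactions fall on that date. Dates are ISO 8601 strings (YYYY-MM-DD),
-- so string order matches date order.
latestDates :: [String] -> Maybe (String, Int)
latestDates [] = Nothing
latestDates ds = Just (d, length (filter (== d) ds))
  where
    d = maximum ds

-- Assumed location of the .latest.FILE file: next to FILE itself.
latestFilePath :: FilePath -> FilePath
latestFilePath f = takeDirectory f </> (".latest." ++ takeFileName f)
```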
An easy version of readJournalFile which assumes default
options, and fails in the IO monad.
Read a Journal from each specified file path (using
readJournalFile) and combine them into one; or return the
first error message.
Combining Journals means concatenating them, basically. The parse
state resets at the start of each file, which means that directives
& aliases do not affect subsequent sibling or parent files. They
do affect included child files though. Also the final parse state
saved in the Journal does span all files.
Strict checks, if enabled, are temporarily deferred until all files
are read, to ensure they see the whole journal, and/or to avoid
redundant work. (Some checks, like assertions and ordereddates, may
still be doing some redundant work.)
Writing .latest files, if enabled, is also deferred till the end, and
is done only if strict checks pass.
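To illustrate the per-file parse state reset and the deferred check, here is a toy sketch. The format (alias lines followed by account names), parseFile, and readFilesSketch are hypothetical; it does not model included child files, which in hledger do see their parent's aliases.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical toy format: "alias FROM=TO" lines define an alias, and any other
-- non-blank line is an account name, rewritten by the aliases seen so far.
parseFile :: String -> [String]
parseFile = go [] . lines
  where
    go _ [] = []
    go aliases (l : ls) =
      case words l of
        [] -> go aliases ls
        ["alias", def] | (from, '=' : to) <- break (== '=') def -> go ((from, to) : aliases) ls
        _ -> fromMaybe l (lookup l aliases) : go aliases ls

-- Each file is parsed with a fresh (empty) alias list, so aliases never leak
-- into sibling files. The results are concatenated, and the check runs once,
-- on the combined result, after all files are read.
readFilesSketch :: ([String] -> Either String ()) -> [String] -> Either String [String]
readFilesSketch check files = do
  let combined = concatMap parseFile files
  check combined
  pure combined
```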
An easy version of readJournalFiles which assumes default
options, and fails in the IO monad.
Read a Journal from the given CSV data (and filename, used for error
messages), or return an error. Proceed as follows:
- Conversion rules are provided, or they are parsed from the
specified rules file, or from the default rules file for the CSV data
file. If rules parsing fails, or the required rules file does not
exist, throw an error.
- Parse the CSV data using the rules, or throw an error.
- Convert the CSV records to hledger transactions using the
rules.
- Return the transactions as a Journal.
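The CSV steps can be sketched with a deliberately tiny, hypothetical rules type that just names column positions and two accounts. Real hledger CSV rules are far richer, and the naive comma splitting below does not handle quoted fields; only the overall shape (rules, then records, then transactions, then a journal) follows the text.

```haskell
-- Hypothetical, minimal conversion "rules": which CSV column holds each field,
-- and which two accounts each transaction posts to.
data Rules = Rules
  { dateCol, descCol, amountCol :: Int
  , account1, account2          :: String
  }

-- Naive CSV field splitting on commas (no quoting or escaping support).
splitCsv :: String -> [String]
splitCsv s = case break (== ',') s of
  (f, ',' : rest) -> f : splitCsv rest
  (f, _)          -> [f]

-- Convert one CSV record to a transaction in journal format, or fail.
recordToTxn :: Rules -> [String] -> Either String String
recordToTxn r fields
  | length fields <= maximum [dateCol r, descCol r, amountCol r] =
      Left ("too few fields in record: " ++ show fields)
  | otherwise = Right $ unlines
      [ fields !! dateCol r ++ " " ++ fields !! descCol r
      , "    " ++ account1 r ++ "  " ++ fields !! amountCol r
      , "    " ++ account2 r
      ]

-- Parse all non-blank lines as records, convert each, and join the results.
csvToJournal :: Rules -> String -> Either String String
csvToJournal r = fmap concat . mapM (recordToTxn r . splitCsv) . filter (not . null) . lines
```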
Like readFilePortably, but read from standard input if the path is
"-".
Like readFileOrStdinPortably, but take an optional converter.
Read text from a file, converting any \r\n line endings to \n, using the
system locale's text encoding, and ignoring any UTF-8 BOM prefix (as
seen, e.g., in PayPal's 2018 CSV) if that encoding is UTF-8.
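The text clean-up described above, applied after decoding, amounts to two small transformations. This normaliseText is a hypothetical pure helper, not the library function; it assumes the BOM has already been decoded to the character U+FEFF.

```haskell
-- Drop a leading byte order mark (U+FEFF after decoding) and turn
-- CRLF line endings into LF. Lone '\r' characters are left alone.
normaliseText :: String -> String
normaliseText = crlfToLf . dropBom
  where
    dropBom ('\xFEFF' : rest) = rest
    dropBom s = s
    crlfToLf ('\r' : '\n' : rest) = '\n' : crlfToLf rest
    crlfToLf (c : rest) = c : crlfToLf rest
    crlfToLf [] = []
```

For example, normaliseText ("\xFEFF" ++ "a\r\nb") gives "a\nb".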
Like readFilePortably, but read all of the file before proceeding.
Read text from a handle with a specified encoding, using the encoding
package. Or if no encoding is specified, it uses the handle's current
encoding, after first changing it to UTF-8BOM if it was UTF-8, to
allow a Byte Order Mark at the start. Also it converts Windows line
endings to newlines. If decoding fails, this throws an IOException (or
possibly a UnicodeException or something else from the encoding
package).
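A hedged sketch of the handle-based behaviour using only base's System.IO: GHC provides a utf8_bom encoding that skips a leading byte order mark when decoding. The helper names allowBom and readHandleText are hypothetical, and unlike the described function this sketch does not accept an explicit encoding argument or use the encoding package.

```haskell
import System.IO

-- If the handle is currently UTF-8, switch it to utf8_bom,
-- which strips a leading byte order mark when reading.
allowBom :: Handle -> IO ()
allowBom h = do
  menc <- hGetEncoding h
  case menc of
    Just enc | textEncodingName enc == "UTF-8" -> hSetEncoding h utf8_bom
    _ -> pure ()

-- Read all text from a handle, allowing a BOM and converting CRLF to LF.
readHandleText :: Handle -> IO String
readHandleText h = do
  allowBom h
  s <- hGetContents h
  length s `seq` pure (crlfToLf s)   -- force the lazy read before returning
  where
    crlfToLf ('\r' : '\n' : rest) = '\n' : crlfToLf rest
    crlfToLf (c : rest) = c : crlfToLf rest
    crlfToLf [] = []
```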
Read a decimal number from a Text. Assumes the input consists only of
digit characters.
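A minimal sketch of such a conversion, assuming (as the description does) that every character is a decimal digit; readDecimalSketch is a hypothetical name. It folds left over the digits, multiplying the accumulator by ten at each step.

```haskell
import Data.Char (digitToInt)
import qualified Data.Text as T

-- Most significant digit first: acc * 10 + digit.
-- Assumes the input consists only of decimal digit characters.
readDecimalSketch :: T.Text -> Integer
readDecimalSketch = T.foldl' step 0
  where
    step acc c = acc * 10 + toInteger (digitToInt c)
```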
When this journal was last read from its file(s). NOTE: after adding
new fields, e.g. involving account names, consider updating the Anon
instance in Hledger.Cli.Anon.
Set this journal's last read time, ie when its files were last read.
findReader mformat mpath
Find the reader named by mformat, if provided. ("ssv" and
"tsv" are recognised as alternate names for the csv reader, which also
handles those formats.) Or, if a file path is provided, find the first
reader that handles its file extension, if any.
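The lookup could be sketched as below. The reader table and its extension lists are illustrative assumptions, as is the name findReaderSketch; the "ssv"/"tsv" aliasing and the name-then-extension order come from the description.

```haskell
import Data.List (find)
import System.FilePath (takeExtension)

-- Hypothetical reader table: (canonical name, handled extensions).
readers :: [(String, [String])]
readers =
  [ ("journal",   ["journal", "hledger"])
  , ("csv",       ["csv", "ssv", "tsv"])
  , ("timedot",   ["timedot"])
  , ("timeclock", ["timeclock"])
  ]

-- Look up by name first ("ssv"/"tsv" mean the csv reader); otherwise by
-- the file path's extension; otherwise give up.
findReaderSketch :: Maybe String -> Maybe FilePath -> Maybe String
findReaderSketch (Just fmt) _ = fst <$> find ((== canonical) . fst) readers
  where
    canonical = if fmt `elem` ["ssv", "tsv"] then "csv" else fmt
findReaderSketch Nothing (Just path) = fst <$> find (elem ext . snd) readers
  where
    ext = drop 1 (takeExtension path)   -- ".tsv" -> "tsv"
findReaderSketch Nothing Nothing = Nothing
```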