From 4c9d411ac880eb23a89535b015327b90f9e03bb1 Mon Sep 17 00:00:00 2001
From: qykth-git <14939671+qykth-git@users.noreply.github.com>
Date: Wed, 11 Aug 2021 17:10:23 +0900
Subject: [PATCH] UTF-8/ANSI input support for Unix (#60)

* Use UTF-8 for multi-byte character I/O everywhere

This fix is required for "libstdc++".
Without it, "libstdc++" handles only 7-bit ASCII.

It fixes https://github.com/microsoft/pict/issues/24 .

* Use current locale instead of fixed UTF-8 locale

This also allows input in non-UTF-8 (ANSI/OEM) encodings.
Set the environment variable "LANG" or "LC_CTYPE" to change the input encoding.

(example)
$ LC_CTYPE="C.UTF-8" pict utf8-input.txt

* Add documentation for non-ASCII (UTF-8/ANSI) input encoding
---
 cli/pict.cpp | 4 ++++
 doc/pict.md  | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/cli/pict.cpp b/cli/pict.cpp
index 4a5654c..d1b0770 100644
--- a/cli/pict.cpp
+++ b/cli/pict.cpp
@@ -5,6 +5,7 @@
 
 #include <ctime>
 #include <cstring>
+#include <locale>
 using namespace std;
 
 #include "cmdline.h"
@@ -128,6 +129,9 @@ int main
     IN char* args[]
     )
 {
+    // Use the current locale for multi-byte character I/O everywhere
+    std::locale::global(std::locale(""));
+
     // convert all args to wchar_t's
     wchar_t** wargs = new wchar_t*[ argc ];
     for ( int ii = 0; ii < argc; ++ii )
diff --git a/doc/pict.md b/doc/pict.md
index 97de18a..7edb6d4 100644
--- a/doc/pict.md
+++ b/doc/pict.md
@@ -66,6 +66,12 @@ A comma is the default separator but you can specify a different one using **/d*
 
 By default, PICT generates a pair-wise test suite (all pairs covered), but the order can be set by option **/o** to a value larger than two. For example, if **/o:3** is specified, the test suite will cover all triplets of values thereby producing a larger number of tests but potentially making the test suite even more effective. The maximum order for a simple model is equal to the number of parameters, which will result in an exhaustive test suite. Following the same principle, specifying **/o:1** will produce a test suite that merely covers all values (combinations of 1).
 
+Note: On Unix/Linux, the input/output file encoding depends on your locale settings. For example, to use UTF-8, set the environment variable **LANG** or **LC_CTYPE** to a UTF-8-capable locale such as **C.UTF-8**.
+
+Example:
+
+    $ LC_CTYPE="C.UTF-8" pict utf8-input.txt
+
 ## Output Format
 
 All errors, warning messages, and the randomization seed are printed to the error stream. The test cases are printed to the standard output stream. The first line of the output contains names of the parameters. Each of the following lines represents one generated test case. Values in each line are separated by a tab. This way redirecting the output to a file creates a tab-separated value format.
-- 
GitLab