• Edited

Ooper Also what settings do your mysql tables use for charset and collation(this is what node-ustats-lite and node-utstats-2 use):
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Thanks for the info, I was reading a few things mentioning utf8mb4 for mysql where others were having similar problems. Not sure yet if that is the problem but that will be the first thing I check and/or try when I work on it again tomorrow. I probably naively set the charset and collation to utf8 instead.

Looks like it is set to utf8mb4 already. The engine for utstats tables are MyISAM but I don't think that makes a difference when it comes to encoding. Will look into possibly converting the logs to utf16 next.

+--------------------+---------+--------------------+
| database           | charset | collation          |
+--------------------+---------+--------------------+
| information_schema | utf8    | utf8_general_ci    |
| performance_schema | utf8    | utf8_general_ci    |
| mysql              | utf8mb4 | utf8mb4_general_ci |
| utstats            | utf8mb4 | utf8mb4_general_ci |
| flarum             | utf8mb4 | utf8mb4_general_ci |
+--------------------+---------+--------------------+
5 rows in set (0.001 sec)

Just updating the thread with some info I've found in case anyone cares. Finally checked out Engine.StatLogFile.uc from v469d where I found the following relevant stuff:

// OldUnreal File Encoding support
enum EFileEncoding
{
	// Plain ANSI single-byte encoding
	FILE_ENCODING_ANSI,
	// Windows UTF-16LE encoding without a BOM
	FILE_ENCODING_UTF16LE,
	// Windows UTF-16 encoding with a BOM
	FILE_ENCODING_UTF16LE_BOM,
	// UTF-8 encoding without a BOM
	FILE_ENCODING_UTF8,
	// UTF-8 encoding with a BOM
	FILE_ENCODING_UTF8_BOM
};

var EFileEncoding Encoding; // must be set before calling OpenLog
...
...
...
defaultproperties
{
	Encoding=FILE_ENCODING_UTF16LE
	StatLogFile="../Logs/unreal.ngStats.Unknown.log"
}

So by default it should indeed be UTF16LE encoding. Also, found a pretty good write up on all the encoding stuff:
https://kunststube.net/encoding/

A few different options seem clear for a solution now. Just needs a bit of work and testing to get it sorted out hopefully.

I'm a PHP Developer, but I've no idea if this can help on your case.
you can convert to UTF16LE with:

$str = mb_convert_encoding($str, "utf-8", "UTF-16LE");

Issue fixed finally. Stats are back to importing automatically every few minutes.

Thanks for all the help and discussion guys. I verified that the logs, and the special char that broke the importing, were actually encoded in UTF-16LE. My editor was not detecting the correct encoding.

mb_convert_encoding did the trick 🍻

25 days later

I think it's happening again ⚠️

    sus

    You are right and I didn't see your post until now so the forum broke again 🤣 I have stopped the importing again for now. Will look into the issue first thing tomorrow.

    12 days later

    Maybe it's further up in the text and I just missed it, sorry.

    To avoid multiple loading, you could set a primary key for the target table that is derived from one of the natural attributes (e.g., the date the map was started). If this isn't feasible with a primary key, a unique key (UK) can also help. It might be possible to resolve the UK, which includes multiple columns, but this would make the inserts longer.

    Alternatively, you can also pursue the following solution strategy: You don't load the data immediately into the actual target table, but rather into a staging table. You can then perform the inserts row by row using a cursor. If the UK matches (see above), an existing record is rejected, and the remaining ones are entered into the target table.

    If you are interested in the rejected data, you can simply write the data to an error table in the event of an error using this procedure.

    Depending on how results are calculated (the ranks mentioned above), there are also workarounds: either you calculate only a subset of the ranking in a suitable loop (e.g., only player names beginning with "A," then "B," etc.).

    Alternatively, you have to look at the query to determine the ranking. Nested full table scans are expensive. Here, you could speed up the query by using indexes on the appropriate columns.

    If all else fails, you can consider rewriting the ranking calculation (query rewrite) or storing intermediate results in a temporary table. If the database starts swapping data with large amounts of data, the query quickly becomes imperformant.

    I took a look at the code, and yes, to say the least, recalculating player stats is terrible. Can you please run the SQL? It will show you how many players statistics should be generated for.

    SELECT count(*) as cnt FROM
    (
    SELECT 	
      p.id, 
    	p.matchid, 
    	p.pid, 
    	p.gid,
    	m.gamename
    FROM uts_player p, 
    		 uts_pinfo pi,
    		 uts_match m
    WHERE 	pi.id = p.pid 
    	AND 	pi.banned <> 'Y' 
    	AND	  m.id = p.matchid
    -- ORDER BY p.matchid ASC, p.playerid ASC
    )

    PS: I don't understand why the original select contains an ORDER BY statement. It shouldn't really matter which player or match the ranking is based on. But perhaps there's a deeper meaning to it that I'm not currently seeing.

    A comprehensive ranking calculation is then created for each result row. You can see this in the while loop in admin/recalcranking.php:

      while( ...) {
         ...
         
         include('import/import_ranking.php');
      }

    WOW! I can well imagine this getting out of hand.

    PS: The include within the while loop is terrible, by the way. I think this whole thing could be solved today with the PHP option of snippets, no, it's called traits.

    Suggestion: You could probably speed up the ranking by restricting the ranking to "active" players. For example, by only considering matches from the last 90, 180, or 365 days. It would then look something like this:

    SELECT 	
      p.id, 
    	p.matchid, 
    	p.pid, 
    	p.gid,
    	m.gamename
    FROM uts_player p, 
    		 uts_pinfo pi,
    		 uts_match m
    WHERE 	pi.id = p.pid 
    	AND 	pi.banned <> 'Y' 
    	AND	  m.id = p.matchid
    	AND   m.time < _______         /*   <=== NEW!  */
    ORDER BY p.matchid ASC, p.playerid ASC

    You'll have to take a look at what the "time" column looks like in the database (it's defined as varchar(14), so you don't specify time values!!).

    Other aspects that caught my attention:
    The colleagues who wrote the tool are familiar with UT and statistics, as you can see. But they have no idea about databases.

    • In admin/check.php, most CREATE TABLE scripts are coded with "CREATE TABLE uts...". Only one table has the suffix "CREATE TABLE IF NOT EXIST uts....", which would be the safer option.

    • Every column in every table is defined with "NOT NULL." Why? Presumably because of the subsequent HTML output?!

    • Numeric data types are usually initialized with an incorrect default (... DEFAULT '0')

    • More or less everything in the database is processed using single-record processing. Set-based operations, especially for INSERTS, UPDATES, etc., would be much more efficient.

    [Okay, that's high-level criticism. The guys did a great job 🙂]

    Otherwise: I hope your admin key is secure. The site is easily accessible. Could you guess the key using a dictionary?

    I haven't checked whether it's an Apache or something else server, but maybe you (@snowguy) are securing the admin pages with an http login (htpasswd)?

      5 days later

      Titan

      Thanks for checking into it and providing your insights.

      Titan Can you please run the SQL?

      MariaDB [utstats]> SELECT count(*) FROM (SELECT p.id, p.matchid, p.pid, p.gid, m.gamename FROM uts_player p, uts_pinfo pi, uts_match m WHERE pi.id = p.pid AND pi.banned <> 'Y' AND m.id = p.matchid) AS cnt;
      +----------+
      | count(*) |
      +----------+
      |   973492 |
      +----------+
      1 row in set (1.512 sec)
      

      And just to help with a visual of what the first query returns: (Edited for brevity obviously)

      MariaDB [utstats]> SELECT p.id, p.matchid, p.pid, p.gid, m.gamename FROM uts_player p, uts_pinfo pi, uts_match m WHERE pi.id = p.pid AND pi.banned <> 'Y' AND m.id = p.matchid;
      
      | 1156121 |   88977 | 28807 |   9 | Capture the Flag |
      | 1156109 |   88976 | 28807 |   9 | Capture the Flag |
      | 1156097 |   88975 | 28807 |   9 | Capture the Flag |
      | 1156138 |   88978 | 28808 |   9 | Capture the Flag |
      | 1156126 |   88977 | 28808 |   9 | Capture the Flag |
      | 1156114 |   88976 | 28808 |   9 | Capture the Flag |
      | 1156102 |   88975 | 28808 |   9 | Capture the Flag |
      | 1156139 |   88978 | 28809 |   9 | Capture the Flag |
      | 1156127 |   88977 | 28809 |   9 | Capture the Flag |
      | 1156115 |   88976 | 28809 |   9 | Capture the Flag |
      | 1156103 |   88975 | 28809 |   9 | Capture the Flag |
      | 1156141 |   88978 | 28810 |   9 | Capture the Flag |
      | 1156129 |   88977 | 28810 |   9 | Capture the Flag |
      | 1156117 |   88976 | 28810 |   9 | Capture the Flag |
      | 1156105 |   88975 | 28810 |   9 | Capture the Flag |
      | 1156130 |   88977 | 28811 |   9 | Capture the Flag |
      | 1156142 |   88978 | 28811 |   9 | Capture the Flag |
      | 1156118 |   88976 | 28811 |   9 | Capture the Flag |
      | 1156106 |   88975 | 28811 |   9 | Capture the Flag |
      +---------+---------+-------+-----+------------------+
      973492 rows in set (3.240 sec)
      

      And with the Ordered By:

      MariaDB [utstats]> SELECT p.id, p.matchid, p.pid, p.gid, m.gamename FROM uts_player p, uts_pinfo pi, uts_match m WHERE pi.id = p.pid AND pi.banned <> 'Y' AND m.id = p.matchid ORDER BY p.matchid ASC, p.playerid ASC;
      
      | 1156125 |   88977 |  7553 |   9 | Capture the Flag |
      | 1156126 |   88977 | 28808 |   9 | Capture the Flag |
      | 1156127 |   88977 | 28809 |   9 | Capture the Flag |
      | 1156128 |   88977 |   266 |   9 | Capture the Flag |
      | 1156129 |   88977 | 28810 |   9 | Capture the Flag |
      | 1156130 |   88977 | 28811 |   9 | Capture the Flag |
      | 1156131 |   88977 |   197 |   9 | Capture the Flag |
      | 1156132 |   88978 |  1739 |   9 | Capture the Flag |
      | 1156133 |   88978 | 28807 |   9 | Capture the Flag |
      | 1156134 |   88978 |   175 |   9 | Capture the Flag |
      | 1156135 |   88978 |  1359 |   9 | Capture the Flag |
      | 1156136 |   88978 | 27511 |   9 | Capture the Flag |
      | 1156137 |   88978 |  7553 |   9 | Capture the Flag |
      | 1156138 |   88978 | 28808 |   9 | Capture the Flag |
      | 1156139 |   88978 | 28809 |   9 | Capture the Flag |
      | 1156140 |   88978 |   266 |   9 | Capture the Flag |
      | 1156141 |   88978 | 28810 |   9 | Capture the Flag |
      | 1156142 |   88978 | 28811 |   9 | Capture the Flag |
      | 1156143 |   88978 |   197 |   9 | Capture the Flag |
      +---------+---------+-------+-----+------------------+
      973492 rows in set (4.035 sec)
      

      Titan PS: I don't understand why the original select contains an ORDER BY statement. It shouldn't really matter which player or match the ranking is based on. But perhaps there's a deeper meaning to it that I'm not currently seeing.

      Not sure that the ordered by is needed either.

      Titan PS: The include within the while loop is terrible, by the way. I think this whole thing could be solved today with the PHP option of snippets, no, it's called traits.

      Good tip, I wasn't really aware that was being done or the implications of it.

      Titan Suggestion: You could probably speed up the ranking by restricting the ranking to "active" players. For example, by only considering matches from the last 90, 180, or 365 days. It would then look something like this:

      Sounds like a solid idea.

      Titan Otherwise: I hope your admin key is secure. The site is easily accessible. Could you guess the key using a dictionary?

      I use a very strong random password. Quantum computers would need to advance quite a bit to crack it in any sane time span 🙂

      Titan I haven't checked whether it's an Apache or something else server, but maybe you (@snowguy) are securing the admin pages with an http login (htpasswd)?

      It's nginx, but that's a good idea. I'll look into adding additional restrictions to the admin area.

      Write a Reply...