
I've not tested this (I'd have to install an old PHP), but it should save having to delete matches one by one; luckily utstats doesn't cache totals, so there is no need to recalculate them. I haven't touched PHP in years, so I'll have a play around later to see whether it works or is just pseudocode.

SELECT time, COUNT(*) AS total_matches FROM uts_match GROUP BY time HAVING total_matches > 1;

use the dates returned from the query above

SELECT id FROM uts_match WHERE time IN (...)
  AND id NOT IN (SELECT MIN(id) FROM uts_match GROUP BY time); -- keep one copy of each duplicated match

use the match ids returned from the query above in the ones below

DELETE FROM uts_events WHERE matchid IN (...);
DELETE FROM uts_killsmatrix WHERE matchid IN (...);
DELETE FROM uts_match WHERE id IN (...);
DELETE FROM uts_player WHERE matchid IN (...);
DELETE FROM uts_weaponstats WHERE matchid IN (...);
DELETE FROM uts_chartdata WHERE mid IN (...);

Use the admin recalculate rankings option to fix the rankings.
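Since the real thing needs old PHP to test, here's the same dedupe logic sketched in Python against SQLite (table/column names taken from the thread; the production DB is MySQL, so treat this only as a way to sanity-check the logic, including keeping one copy of each duplicated match):

```python
import sqlite3

# Toy uts_match table with one duplicated import (same "time", two ids)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE uts_match (id INTEGER PRIMARY KEY, time TEXT)")
con.executemany("INSERT INTO uts_match (id, time) VALUES (?, ?)",
                [(1, "2025.01.01 12:00"), (2, "2025.01.01 12:00"),
                 (3, "2025.01.02 13:00")])

# Step 1: times that occur more than once
dup_times = [r[0] for r in con.execute(
    "SELECT time FROM uts_match GROUP BY time HAVING COUNT(*) > 1")]

# Step 2: ids to delete -- every match with a duplicated time EXCEPT the
# lowest id per time, so one copy of each match survives
marks = ",".join("?" * len(dup_times))
doomed = [r[0] for r in con.execute(
    f"SELECT id FROM uts_match WHERE time IN ({marks}) "
    "AND id NOT IN (SELECT MIN(id) FROM uts_match GROUP BY time)", dup_times)]

# Step 3: the same id list would feed the DELETEs on uts_events,
# uts_killsmatrix, uts_player, uts_weaponstats, uts_chartdata, uts_match
con.executemany("DELETE FROM uts_match WHERE id = ?", [(i,) for i in doomed])
print(doomed)  # [2]
```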
5 days later

I made the script to delete the duplicate matches, and that works well. Unfortunately, when I run recalculate rankings it just runs for hours at full CPU usage. On my local test server, the recalc seemed to stop running when the browser timed out. I removed the link for the totals page sometime last year because it was also timing out in the browser before it could finish processing, and the web crawler bots were hitting it regularly too. Lots of stuff can be done to fix all of this, but it will take some work and time.

It just seems like 80k matches is probably too much to hold onto and still expect SQL queries to complete in a timely manner. I'm considering separating the stats by year and, for the time being, just emptying the database and starting fresh for 2025. I still have all the old matches and a backup of the database, so the old stats won't be lost. Just need to decide what I'm going to do.

The quickest way to get the stats fully functional again is to start fresh with only the matches from 2025. Is that what those of you who care about stats would want?

    I'm not going to speak for everyone, but I generally look to see my accuracy over my last few games. So getting the 2025 stats would be great.

    snowguy Is that what those of you who care about stats would want?

    Why not delete it altogether, who cares about stats anyway? 🙄 Saves you the trouble and time of fixing it....


    Rankings are pretty meaningless with the way the combo server is: if you only play normal CTF maps you will be nowhere, while playing joust, action, and woot maps will get you a lot more ranking points.

    I'm guessing ranking is only calculated for the players in the currently imported log, so you could just empty the rankings table and have rankings only for recent matches, without losing years of data or waiting hours for the recalculation (rankings would cover current active players, more or less; if older players come back, their rankings would return as well).

    edit

    Looking at the recalculation code, it's not very efficient, but it is 20-ish-year-old code, so some things may not have existed back then.

      Ooper Rankings are pretty meaningless with the way the combo server is: if you only play normal CTF maps you will be nowhere, while playing joust, action, and woot maps will get you a lot more ranking points.

      Agreed, and still all the aliases make them even worse.

      Ooper I'm guessing ranking is only calculated for the players in the currently imported log, so you could just empty the rankings table and have rankings only for recent matches, without losing years of data or waiting hours for the recalculation (rankings would cover current active players, more or less; if older players come back, their rankings would return as well).

      Good idea for a short-term solution; I will give it a shot. Eventually something else will need to be done, like separate DBs per year or some such, as some functions are broken with this many matches/players in the DB, and it will only get worse over time.

      The totals page takes way too long to load, more than 90 seconds IIRC. I could increase the web server's max timeout for it, but waiting two or so minutes for the totals page doesn't seem practical. Pretty sure browsers have built-in timeouts too if they aren't actively receiving data from the site during processing.

      Even deleting a single match in the admin section is broken, because the drop-down list locks up my browser before it can fully populate with matches. I'd expect there are other problems too; I haven't really tested all the functions.

      Ooper Looking at the recalculation code, it's not very efficient, but it is 20-ish-year-old code, so some things may not have existed back then.

      Yeah, can't really blame the old design or implementation. It is quite the endeavor to put something like this together (as you most certainly know), even with today's tools, let alone 20 years ago. Maybe a paid QA team would have tested worst-case performance, but with a hobby project like this you would never get anywhere if you tried to be perfect and optimize everything from the onset. This is the current state of the tables, just for reference, to help judge the complexity of the queries.

      Well, all was going well until another special character stopped the import from happening. Tired of messing with it today. Will look at it again when I get some more time.

      If you made it on the rankings from the 2 new matches that did import, enjoy your rank while you can 🙂

      From what I could tell, utstats calculates all the totals and stuff per page load, making it slow (the |uk| stats home page takes ages to load, and that's just basic server totals). That's one of the main reasons I calculate totals on import and keep tables with cached player/gametype/map data, so the work is done once per import rather than every time someone views certain pages. It does have the downside of making things a bit more complicated when you just want to delete a single match or merge players, compared to utstats.
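The totals-on-import idea can be sketched like this (SQLite in Python; the table and column names here are made up for illustration, not utstats' actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE matches (id INTEGER PRIMARY KEY, map TEXT, kills INT)")
con.execute("CREATE TABLE map_totals (map TEXT PRIMARY KEY, matches INT, kills INT)")

def import_match(map_name, kills):
    """Insert the match AND update the cached per-map totals in the same step,
    so pages read one precomputed row instead of aggregating 80k matches."""
    con.execute("INSERT INTO matches (map, kills) VALUES (?, ?)", (map_name, kills))
    con.execute("""INSERT INTO map_totals (map, matches, kills) VALUES (?, 1, ?)
                   ON CONFLICT(map) DO UPDATE SET
                     matches = matches + 1, kills = kills + excluded.kills""",
                (map_name, kills))

import_match("CTF-Face", 30)
import_match("CTF-Face", 20)
row = con.execute("SELECT matches, kills FROM map_totals WHERE map='CTF-Face'").fetchone()
print(row)  # (2, 50)
```

The downside mentioned above shows up here too: deleting a match now also means reversing its contribution to `map_totals`.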

      Not sure what encoding utstats uses for the log importing but UTF16 is what I ended up using.


        Ooper

        The server is running v469d on Linux and the logs seem to be UTF-8 now, thankfully. I distinctly remember when I set everything up here that I made sure everything was set to use UTF-8: ut-server log files -> transfer to web server -> php import/read -> mysql connection -> mysql database storage -> php query connection -> php output/write -> browser. I was doing an extra processing step to make the log files UTF-8 compatible back then but now I am skipping that step.

        I see in the release notes of v469c they fixed the log file encoding:

        Added custom file encoding support to Engine.StatLog. This new feature allows UnrealScript mods to write files in plain ANSI, UTF-8, or UTF-16LE. Please refer to the Engine.StatLog source code for instructions

        I should probably read through the source code in Engine.StatLog just for my info. I don't have a dev environment set up at the moment higher than v469b. Will need to patch and extract headers when I'm at that computer. Do you know if anyone has put the .uc files on github or somewhere else to quickly browse?

        Anyway, a character from one of the player names is causing issues with either the encoding of the file and/or the database accepting it. My editor identifies the file as a different encoding with that character present and the database fails to accept the string. I will strip the IPs from the log so you can test with nodestats2 if you want.


        I convert all logs to UTF-16 during import without modifying the original log; there were issues with logs imported by utstats looking like they were Mandarin, and doing that fixed it.
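For what it's worth, logs "looking like Mandarin" is the classic symptom of UTF-16 bytes being run through the wrong decoder; a quick Python illustration:

```python
# ASCII-ish log text written as UTF-16LE, as the UT server does
raw = "Player1 died".encode("utf-16-le")

# Decoded with the wrong codec, every byte pair fuses into one CJK-looking
# code point -- hence imports that look like Mandarin
garbled = raw.decode("utf-16-be")
print(garbled)

# Decoded correctly, the text round-trips cleanly
print(raw.decode("utf-16-le"))  # Player1 died
```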

        Also what settings do your mysql tables use for charset and collation(this is what node-ustats-lite and node-utstats-2 use):
        ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci


          Ooper Also what settings do your mysql tables use for charset and collation(this is what node-ustats-lite and node-utstats-2 use):
          ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

          Thanks for the info, I was reading a few things mentioning utf8mb4 for mysql where others were having similar problems. Not sure yet if that is the problem but that will be the first thing I check and/or try when I work on it again tomorrow. I probably naively set the charset and collation to utf8 instead.

          Looks like it is set to utf8mb4 already. The engine for the utstats tables is MyISAM, but I don't think that makes a difference when it comes to encoding. Will look into possibly converting the logs to UTF-16 next.

          +--------------------+---------+--------------------+
          | database           | charset | collation          |
          +--------------------+---------+--------------------+
          | information_schema | utf8    | utf8_general_ci    |
          | performance_schema | utf8    | utf8_general_ci    |
          | mysql              | utf8mb4 | utf8mb4_general_ci |
          | utstats            | utf8mb4 | utf8mb4_general_ci |
          | flarum             | utf8mb4 | utf8mb4_general_ci |
          +--------------------+---------+--------------------+
          5 rows in set (0.001 sec)
          

          Just updating the thread with some info I've found in case anyone cares. Finally checked out Engine.StatLogFile.uc from v469d where I found the following relevant stuff:

          // OldUnreal File Encoding support
          enum EFileEncoding
          {
          	// Plain ANSI single-byte encoding
          	FILE_ENCODING_ANSI,
          	// Windows UTF-16LE encoding without a BOM
          	FILE_ENCODING_UTF16LE,
          	// Windows UTF-16 encoding with a BOM
          	FILE_ENCODING_UTF16LE_BOM,
          	// UTF-8 encoding without a BOM
          	FILE_ENCODING_UTF8,
          	// UTF-8 encoding with a BOM
          	FILE_ENCODING_UTF8_BOM
          };
          
          var EFileEncoding Encoding; // must be set before calling OpenLog
          ...
          ...
          ...
          defaultproperties
          {
          	Encoding=FILE_ENCODING_UTF16LE
          	StatLogFile="../Logs/unreal.ngStats.Unknown.log"
          }
          

          So by default it should indeed be UTF16LE encoding. Also, found a pretty good write up on all the encoding stuff:
          https://kunststube.net/encoding/

          A few different options seem clear for a solution now. Just needs a bit of work and testing to get it sorted out hopefully.
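Based on that enum, a small sniffing helper (my own sketch, not part of utstats or the engine) could pick the decoder from the first bytes of a log:

```python
# Guess a StatLog file's encoding from its leading bytes; return values
# mirror the EFileEncoding names above. ANSI and BOM-less UTF-8 can't be
# distinguished for pure-ASCII logs, so they share a branch.
def sniff_statlog_encoding(data: bytes) -> str:
    if data.startswith(b"\xef\xbb\xbf"):
        return "FILE_ENCODING_UTF8_BOM"
    if data.startswith(b"\xff\xfe"):
        return "FILE_ENCODING_UTF16LE_BOM"
    # BOM-less UTF-16LE ASCII text has a NUL as every second byte
    if b"\x00" in data[:64]:
        return "FILE_ENCODING_UTF16LE"
    return "FILE_ENCODING_UTF8"  # or plain ANSI

print(sniff_statlog_encoding("hello".encode("utf-16-le")))  # FILE_ENCODING_UTF16LE
```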

          I'm a PHP developer, but I've no idea if this helps in your case. You can convert a UTF-16LE string to UTF-8 with:

          $str = mb_convert_encoding($str, "UTF-8", "UTF-16LE");

          (The second argument is the target encoding, the third is the source.)
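For anyone scripting the conversion outside PHP, the Python equivalent (a sketch; the sample string is made up):

```python
# Same direction as the mb_convert_encoding() call above:
# from UTF-16LE bytes to a string that can be stored as UTF-8
raw = "Flag captured by Player".encode("utf-16-le")  # pretend log bytes

text = raw.decode("utf-16-le")     # proper str
utf8_bytes = text.encode("utf-8")  # what goes into the UTF-8 database
print(text)  # Flag captured by Player
```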

          Issue fixed finally. Stats are back to importing automatically every few minutes.

          Thanks for all the help and discussion guys. I verified that the logs, and the special char that broke the importing, were actually encoded in UTF-16LE. My editor was not detecting the correct encoding.

          mb_convert_encoding did the trick 🍻

          25 days later

          I think it's happening again ⚠️

            sus

            You are right and I didn't see your post until now so the forum broke again 🤣 I have stopped the importing again for now. Will look into the issue first thing tomorrow.

            12 days later

            Maybe it's further up in the thread and I just missed it, sorry.

            To avoid importing the same data twice, you could give the target table a primary key derived from one of the natural attributes (e.g., the date the map was started). If that isn't feasible as a primary key, a unique key (UK) can also help. The UK may need to span multiple columns, which would make the inserts longer.

            Alternatively, you can pursue the following strategy: don't load the data straight into the actual target table, but into a staging table first. Then perform the inserts row by row using a cursor; if the UK matches (see above), the existing record is rejected, and the remaining ones are entered into the target table.

            If you are interested in the rejected data, you can simply write the data to an error table in the event of an error using this procedure.
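The unique-key rejection can be sketched with SQLite's INSERT OR IGNORE (MySQL's counterpart would be INSERT IGNORE or ON DUPLICATE KEY UPDATE; the schema here is illustrative, not utstats' real one):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Natural unique key on the match start time rejects a second import of the same log
con.execute("""CREATE TABLE uts_match (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    time TEXT NOT NULL UNIQUE)""")

for t in ["2025.01.01 12:00", "2025.01.01 12:00", "2025.01.02 13:00"]:
    con.execute("INSERT OR IGNORE INTO uts_match (time) VALUES (?)", (t,))

count = con.execute("SELECT COUNT(*) FROM uts_match").fetchone()[0]
print(count)  # 2 -- the duplicate row was silently rejected
```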

            Depending on how results are calculated (the rankings mentioned above), there are also workarounds: for example, calculate only a subset of the ranking in a suitable loop (e.g., only player names beginning with "A," then "B," etc.).

            Alternatively, look at the query that determines the ranking. Nested full table scans are expensive; indexes on the appropriate columns could speed the query up considerably.

            If all else fails, you can consider rewriting the ranking calculation (query rewrite) or storing intermediate results in a temporary table. Once the database starts swapping with large amounts of data, the query quickly becomes very slow.
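The index suggestion, sketched in SQLite (the CREATE INDEX syntax is the same in MySQL; the EXPLAIN just confirms the lookup uses the index instead of a full scan):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE uts_player (id INTEGER PRIMARY KEY, matchid INT, pid INT)")
# Index on a join/filter column used by the ranking queries
con.execute("CREATE INDEX idx_player_matchid ON uts_player (matchid)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM uts_player WHERE matchid = 5").fetchall()
print(plan[0][3])  # e.g. SEARCH uts_player USING INDEX idx_player_matchid (matchid=?)
```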

            I took a look at the code, and, to say the least, recalculating player stats is terrible. Can you please run this SQL? It will show how many player rows statistics have to be generated for.

            SELECT COUNT(*) AS cnt FROM
            (
              SELECT p.id,
                     p.matchid,
                     p.pid,
                     p.gid,
                     m.gamename
              FROM uts_player p,
                   uts_pinfo pi,
                   uts_match m
              WHERE pi.id = p.pid
                AND pi.banned <> 'Y'
                AND m.id = p.matchid
              -- ORDER BY p.matchid ASC, p.playerid ASC
            ) AS t;

            PS: I don't understand why the original SELECT contains an ORDER BY. It shouldn't really matter in which order the ranking processes players or matches, but perhaps there's a deeper meaning I'm not currently seeing.

            A full ranking calculation is then run for every result row. You can see this in the while loop in admin/recalcranking.php:

              while( ...) {
                 ...
                 
                 include('import/import_ranking.php');
              }

            WOW! I can well imagine this getting out of hand.

            PS: The include inside the while loop is terrible, by the way. I think this whole thing could be solved today with PHP traits.

            Suggestion: you could probably speed up the ranking by restricting it to "active" players, for example by only considering matches from the last 90, 180, or 365 days. It would then look something like this:

            SELECT p.id,
                   p.matchid,
                   p.pid,
                   p.gid,
                   m.gamename
            FROM uts_player p,
                 uts_pinfo pi,
                 uts_match m
            WHERE pi.id = p.pid
              AND pi.banned <> 'Y'
              AND m.id = p.matchid
              AND m.time > _______         /*   <=== NEW: cutoff for "active" players   */
            ORDER BY p.matchid ASC, p.playerid ASC
            You'll have to take a look at what the "time" column actually contains before filling in that cutoff (it's defined as varchar(14), so you can't compare it as a real time value!).
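On that varchar comparison caveat: string comparison is lexicographic, so it only matches numeric/chronological order when every value has the same fixed-width format. Quick check:

```python
# Variable-length numeric strings sort wrongly...
print("9" < "10")  # False, although 9 < 10 numerically

# ...but fixed-width digit strings (e.g. YYYYMMDDHHMMSS) sort correctly
print("20250102130000" > "20250101120000")  # True

# So m.time > '<cutoff>' is only safe if every row uses one fixed-width
# format; otherwise cast first (e.g. CAST(time AS UNSIGNED) in MySQL)
```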

            Other aspects that caught my attention:
            The colleagues who wrote the tool are familiar with UT and statistics, as you can see, but they had little idea about databases.

            • In admin/check.php, most CREATE TABLE statements are coded as plain "CREATE TABLE uts...". Only one table uses "CREATE TABLE IF NOT EXISTS uts...", which would be the safer option.

            • Every column in every table is defined as "NOT NULL". Why? Presumably because of the subsequent HTML output?!

            • Numeric data types are usually initialized with a string default (... DEFAULT '0') instead of a numeric one.

            • More or less everything in the database is processed using single-record processing. Set-based operations, especially for INSERTS, UPDATES, etc., would be much more efficient.

            [Okay, that's high-level criticism. The guys did a great job 🙂]

            Otherwise: I hope your admin key is secure. The site is easily accessible; could the key be guessed with a dictionary?

            I haven't checked whether it's Apache or some other server, but maybe you (@snowguy) could secure the admin pages with an HTTP login (htpasswd)?
