It is actually almost instant, so I had to add a small delay to make it more "human-like".

It's quite fast because a neural network translates the input string into a fixed-size vector, which is then compared with an existing database of vectors for other pre-translated semantic patterns. This allows me to cover many different wordings of the same command (e.g. the vectors for "Repair the ship" and "Fix the damage please" are quite similar). When a similar pattern is found, I can simply return its recorded response or call its attached action.
The database currently contains around 10k pairs of patterns and responses / actions. But even with 1M patterns there shouldn't be any problem with performance, since each comparison is a simple cosine similarity calculation.
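
To make the lookup concrete, here is a rough Python sketch of the idea, not my actual code. The 3-dimensional vectors, the `PATTERNS` list, the `best_match` helper and the 0.7 threshold are all made up for illustration; real vectors would come from the neural network and be much larger.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

# Hypothetical pattern database: (pattern vector, response or action).
# These toy vectors are invented; they are not real embeddings.
PATTERNS = [
    ([0.9, 0.1, 0.0], "action:repair_ship"),
    ([0.0, 0.2, 0.9], "action:open_map"),
    ([0.1, 0.9, 0.1], "response:Hello, captain."),
]

def best_match(query_vec, patterns=PATTERNS, threshold=0.7):
    """Return the response/action of the most similar pattern, or None."""
    best_score, best_result = -1.0, None
    for vec, result in patterns:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_result = score, result
    # Assumed cutoff: below it, the input is treated as not understood.
    return best_result if best_score >= threshold else None

# Made-up vector standing in for the encoding of "Fix the damage please".
print(best_match([0.8, 0.2, 0.1]))  # prints "action:repair_ship"
```

The scan is one dot product per stored pattern, so the cost grows linearly with the size of the database. If the stored vectors are normalized ahead of time, each comparison reduces to a plain dot product, and the whole scan becomes a single matrix-vector multiplication.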
