
Hello, I'm writing a script that lets strings refer properly to a custom player character (chiefly by implementing pronouns and the grammatical rules that go with them). Right now I'm having trouble working out how to dynamically capitalize {placeholder}s according to their context within a string (the most obvious example being when they appear at the beginning of a sentence or string).

Oversimplified example:

var testcase = "Susie is a {species}. {species}s have soft fur and four legs."
print(testcase.format({"species" : "cat"}))

naïve output: Susie is a cat. cats have soft fur and four legs.

By now I've designed a function which in its current state looks like this:

assert(typeof(a) == TYPE_STRING)
var i : int = 0 # Character pointer

var s = ""
var t = ""
var skip = []
var res = a 
while i < res.find_last("{"):
    var cap = false

    # Locate and store the placeholder
    i = res.find("{",0)
    while i in skip:
        i = res.find("{",i+1)
    s = res.substr(i, (res.find("}",i+1) - i) + 1)

    match s:
        "{NAME}": t = pcname # yes this is against style; i'm not bothered
        "{THEY}" : t = set[prns.SUB] # "set" is the selected array containing the pronouns themselves
        "{THEM}" : t = set[prns.OBJ] #  and "prns" is an enum defining the indices of each class of pronoun.
        "{THEIR}" : t = set[prns.POS] # This structure can present its own problems but it's what I have right now
        "{THEIRS}" : t = set[prns.IPOS] # Placeholder name differs from abbreviated technical name of pronoun
        "{THEYRE}" : t = set[prns.PRES] # so as to be more natural to type in a sentence
        "{THEYVE}" : t = set[prns.HAVE]
        "{IS}" : t = set[prns.IS] # Singular "they" requires this
        "{S}" : t = set[prns.S] # For grammatical reasons
        "{ES}" : t = set[prns.ES] # Also for grammatical reasons
        "{N}" : t = set[prns.N] # Ditto
        _ : # placeholder not found
            t = s 

    if i == 0:
        cap = true
    elif i >= 2 and res[i-2] == "." and res[i-1] == " ": # guard against indexing before the start of the string
        cap = true
    t = t.capitalize() if cap else t
    res.erase(i, (res.find("}",i+1) - i) + 1)
    res = res.insert(i,t)

Forgive my beginner programming skills but I can't help but feel like there's a better or more efficient way to do this. This doesn't seem like that out-there of a use case so maybe I'm missing something obvious? Yet I can't find any questions or instruction relating to this online.

EDIT: I've made a few fixes which should prevent improper capitalization or hanging given the wrong string but my question still stands.

Godot version 3.2.3.stable.official

1 Answer

Best answer

Just a thought, but assuming the tokenized strings are all canned, why not just record the appropriate case within the token representation itself? So, for your example above, maybe something like this:

var testcase = "Susie is a {species}. {Species}s have soft fur and four legs."

Note that I've used two different character cases for the species token as required for the sentence structure.

If something like that would work, you'd then just need to find the tokens in a case insensitive way, but then transfer the token's actual case to the substituted string. That might make the substitution a bit trickier, but at least then, you'd have absolute control over the use of case in the sentences, rather than relying on an algorithm to "figure it out".
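A minimal sketch of that idea, assuming Godot 3.2 GDScript. The function name `fill_tokens` and the `values` dictionary (lowercase token name mapped to replacement text) are hypothetical names used only for illustration; they are not part of the original script. Tokens are looked up case-insensitively, and an uppercase first letter in the token is transferred to the replacement:

```gdscript
# Hypothetical helper: `values` maps lowercase token names to replacement text.
func fill_tokens(text: String, values: Dictionary) -> String:
    var out := ""
    var pos := 0
    while true:
        var open: int = text.find("{", pos)
        if open == -1:
            break
        var close: int = text.find("}", open + 1)
        if close == -1:
            break
        # Copy the literal text that precedes the token.
        out += text.substr(pos, open - pos)
        var token := text.substr(open + 1, close - open - 1)
        var key := token.to_lower()
        if token != "" and values.has(key):
            var value := str(values[key])
            # Transfer the token's case: "{Species}" -> "Cat", "{species}" -> "cat".
            if value != "" and token[0] != token[0].to_lower():
                value = value[0].to_upper() + value.substr(1, value.length() - 1)
            out += value
        else:
            # Unknown token: copy it through unchanged, braces included.
            out += text.substr(open, close - open + 1)
        pos = close + 1
    # Append whatever follows the last token.
    return out + text.substr(pos, text.length() - pos)
```

With this sketch, `fill_tokens("Susie is a {species}. {Species}s have soft fur and four legs.", {"species": "cat"})` should return "Susie is a cat. Cats have soft fur and four legs." Only the first letter is uppercased by hand here, because `String.capitalize()` also lowercases the remaining letters and rewrites underscores, which may not be what you want for names.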


Regarding your general method of parsing an input string, finding the tokens, and substituting them in some output string...

I'd assume a more elegant solution could be devised leveraging regular expressions to locate the tokens and basic string replace calls to substitute them, with some logic in between to determine appropriate substitution strings.

That's probably a path I'd investigate if I were doing this. That said, I don't have any code I can point to that'd be particularly helpful here ATM...
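For what it's worth, here is a rough sketch of what the regular-expression route might look like with Godot's built-in `RegEx` class (Godot 3.2 assumed). The function name `fill_tokens_regex` and the `values` dictionary are hypothetical, and the pattern assumes tokens consist only of word characters between braces:

```gdscript
# Hypothetical helper: `values` maps lowercase token names to replacement text.
func fill_tokens_regex(text: String, values: Dictionary) -> String:
    var regex := RegEx.new()
    # Matches "{token}" where the token is one or more word characters.
    regex.compile("\\{(\\w+)\\}")
    var out := ""
    var pos := 0
    for m in regex.search_all(text):
        # Copy the literal text between the previous match and this one.
        out += text.substr(pos, m.get_start() - pos)
        var token: String = m.get_string(1)
        var key := token.to_lower()
        if values.has(key):
            var value := str(values[key])
            # An uppercase first letter in the token is carried over to the value.
            if value != "" and token[0] != token[0].to_lower():
                value = value[0].to_upper() + value.substr(1, value.length() - 1)
            out += value
        else:
            # Unknown token: keep the original "{token}" text.
            out += m.get_string()
        pos = m.get_end()
    # Append whatever follows the last match.
    return out + text.substr(pos, text.length() - pos)
```

Building the result piece by piece, rather than calling `replace()` once per token, keeps each occurrence's own case and avoids rescanning text that has already been substituted.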



I was hoping there'd be some way to avoid essentially reimplementing format(), but now that it's all said and done the function doesn't seem all that bad. When I first considered how to approach this problem, I rejected the idea of encoding the case in the token because the only way I could think of to do it was to create a duplicate of every token, which is obviously absurd. Instead, I've used token[1].casecmp_to("a") to implement your suggested approach, and it works perfectly.
This solution doesn't work with diacritics, but I wasn't really planning to use them in tokens anyhow.
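For anyone reading later, the check amounts to roughly the following. This is a sketch only: `starts_uppercase` is a hypothetical name, and `token` is assumed to still include its braces, so `token[1]` is the first character of the name.

```gdscript
# Hypothetical helper: `token` includes its braces, e.g. "{Species}".
func starts_uppercase(token: String) -> bool:
    # casecmp_to() compares by character code, and ASCII uppercase letters
    # sort before "a". Digits and "_" also sort before "a", and accented
    # capitals such as "É" sort after it, hence the diacritics limitation.
    return token.length() > 2 and token[1].casecmp_to("a") < 0
```

`starts_uppercase("{Species}")` is true and `starts_uppercase("{species}")` is false, while a token like "{Écureuil}" is reported as not uppercase.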

Yes, exactly. You don't want to duplicate the tokens, just compare them in a case insensitive manner. Anyway, glad you have it working.
