By GenoProSupport - Tuesday, August 30, 2005
|
Not all languages have the same phrase structure. I decided to create a topic dedicated to this issue. From the Universal Report Generator:
rjn (8/18/2005)
GenoProSupport (8/18/2005) Every problem can be solved. This is my philosophy anyway. Send me several sample Finnish translations in the context of VBScript and I will draft a routine to handle this.
That's great! I didn't quite understand what you meant by sending samples in the context of VBScript... I'm still having difficulties with that language, so I'm not sure if I can do that, whatever it was :? Still, I can elaborate and send you more specific information and examples of suffixes in possessive structures that I can think of:
Basic possessive structure in Finnish: add suffix "n"
Examples: Rami -> Ramin, Toni -> Tonin, Risto -> Riston, Marika -> Marikan, etc.
Exceptions:
1) When the last letter is a CONSONANT (except s), add suffix "in"
Examples: Tom -> Tomin, Aslak -> Aslakin, Jasmin -> Jasminin, Mikael -> Mikaelin, Abraham -> Abrahamin, Elisabet -> Elisabetin
2) But if the last letter is the consonant s, it is left out and replaced by suffix "ksen"
Examples: Armas -> Armaksen, Iiris -> Iiriksen, Joonas -> Joonaksen, Markus -> Markuksen, Johannes -> Johanneksen
3) If there is a double consonant kk, pp or tt just before the ending vowel, it is reduced to a single consonant k, p or t
Examples: Eppu -> Epun, Pekka -> Pekan, Titta -> Titan, Seppo -> Sepon, Tuukka -> Tuukan, Riitta -> Riitan, Jukkapekka -> Jukkapekan, Markku-Pekka -> Markku-Pekan (in the last two cases you see that only the latter kk is affected)
4) Rare case: if the name ends in "tar", an extra t is added and the suffix is "en"
Examples: Ilmatar -> Ilmattaren, Suometar -> Suomettaren
I am thinking of defining some rules for possessive names. GenoPro would look up each rule, and if one matches the pattern, the processing would stop. In Finnish the rules would look like this:
<PossessiveRules>
  <Rule EndWith="s" ReplaceWith="ksen" />
  <Rule EndWith="tar" ReplaceWith="ttarten" />
  <Rule EndWith="kk?" ReplaceWith="k?" />
  <Rule EndWith="pp?" ReplaceWith="p?" />
  <Rule EndWith="tt?" ReplaceWith="t?" />
  <Rule Append="n" />
</PossessiveRules>
In English, the rules would look like this:
<PossessiveRules>
  <Rule EndWith="s" Append="'" />
  <Rule EndWith="'" /> <!-- Do nothing. The processing will stop at this rule if the noun ends with the apostrophe -->
  <Rule Append="'s" /> <!-- Otherwise, append the "'s" to the noun -->
</PossessiveRules>
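To make the "first matching rule wins" idea concrete, here is a minimal Python sketch of how such a rule list might be evaluated. It is not GenoPro's implementation. The function names are hypothetical, and two things are assumptions on my part: a "?" in ReplaceWith copies the character matched by the "?" in EndWith, and a rule may carry both ReplaceWith and Append. The Finnish rules below are a subset, adjusted so the output matches rjn's examples (Pekka -> Pekan, Ilmatar -> Ilmattaren).

```python
def match_suffix(word, pattern):
    """Return the characters matched by '?' if word ends with pattern, else None."""
    if len(pattern) > len(word):
        return None
    tail = word[len(word) - len(pattern):]
    captured = []
    for ch, p in zip(tail, pattern):
        if p == "?":
            captured.append(ch)      # wildcard: remember the matched character
        elif ch != p:
            return None              # literal character does not match
    return captured


def apply_rules(word, rules):
    """Apply the first rule whose EndWith matches, then stop."""
    for rule in rules:
        pattern = rule.get("EndWith", "")   # no EndWith means "always matches"
        captured = match_suffix(word, pattern)
        if captured is None:
            continue
        if "ReplaceWith" in rule:
            chars = iter(captured)
            # Assumption: each '?' in ReplaceWith is filled with the next captured character.
            replacement = "".join(
                next(chars, "") if c == "?" else c for c in rule["ReplaceWith"]
            )
            word = word[: len(word) - len(pattern)] + replacement
        return word + rule.get("Append", "")
    return word


# English rules, as posted above.
ENGLISH = [
    {"EndWith": "s", "Append": "'"},
    {"EndWith": "'"},
    {"Append": "'s"},
]

# Finnish subset; the extra Append="n" on the wildcard rules is an assumption.
FINNISH = [
    {"EndWith": "s", "ReplaceWith": "ksen"},
    {"EndWith": "tar", "ReplaceWith": "ttaren"},
    {"EndWith": "kk?", "ReplaceWith": "k?", "Append": "n"},
    {"EndWith": "pp?", "ReplaceWith": "p?", "Append": "n"},
    {"EndWith": "tt?", "ReplaceWith": "t?", "Append": "n"},
    {"Append": "n"},
]

print(apply_rules("James", ENGLISH))   # James'
print(apply_rules("Pekka", FINNISH))   # Pekan
```

Because processing stops at the first match, the order of the rules matters: the most specific endings have to come before the catch-all rule that has no EndWith.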
|
By Rjn - Wednesday, August 31, 2005
|
I think this is a great idea.
With such rules, each language could easily have its own list, and there wouldn't have to be a common one. Items could be added or deleted as needed.
This kind of list of rules might be a way to solve the place name as well as last name inflection I told you about. If there were a rule, for example, stating that names ending in ...mäki or ...joki become ...mäen and ...joen in the genitive:
1.1. Kivimäki -> Kivimäen
1.2. Marjamäki -> Marjamäen
2.1. Seinäjoki -> Seinäjoen
2.2. Kauhajoki -> Kauhajoen
2.3. Törnävän joki -> Törnävän joen
etc., and all names not mentioned on the "rules for irregular inflection" list followed the general rule (add n), it just might work in most cases.
|
By GenoProSupport - Wednesday, August 31, 2005
|
Also, having a pattern in a rule using a * or ? wildcard could help to simplify the rules. I prefer to avoid patterns as much as possible, so the rules are much easier to read. For instance, it is easier to read
<Rule EndWith="a" Append="n" />
than
<Rule Pattern="*a" Append="n" />
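For what it's worth, the two spellings can be handled by the same matching step. The Python sketch below is only an illustration: `rule_matches` is a hypothetical helper, and using the standard `fnmatch` module for the * and ? wildcards is my assumption, not something the report generator is known to do.

```python
from fnmatch import fnmatchcase


def rule_matches(word, rule):
    """Return True if the rule applies to word.

    A rule is a dict holding either a wildcard "Pattern" or a plain "EndWith".
    """
    if "Pattern" in rule:
        # Shell-style wildcards: * matches any run of characters, ? exactly one.
        return fnmatchcase(word, rule["Pattern"])
    # A rule with no EndWith matches every word.
    return word.endswith(rule.get("EndWith", ""))


print(rule_matches("Marika", {"Pattern": "*a"}))   # True
print(rule_matches("Marika", {"EndWith": "a"}))    # True
print(rule_matches("Tom", {"EndWith": "a"}))       # False
```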
|
By Alfi - Wednesday, September 7, 2005
|
I think the rules should instead become:
<Rule EndWith="a" ReplaceWith="an" />
<Rule EndWith="i" ReplaceWith="in" />
<Rule EndWith="o" ReplaceWith="on" />
<Rule Append="in" />
|
By V.L.o - Wednesday, September 7, 2005
|
Something like that could help me too!
|
By GenoProSupport - Wednesday, September 7, 2005
|
I think we should have two topics for the linguistic rules: one topic for possessive rules and another topic for place prefix rules. This way, we can discuss the minute details of each rule without confusion.
|
By Yehudad - Wednesday, September 7, 2005
|
In Hebrew there is no EndWith. We can append the word "של" ("SHEL") before or after the individual. The best example that I can give in English is "his parents" versus "the parents of"; it depends on the sentence you want to write. When the word is appended after the name, there is the gender to consider. It should look like this:
<PossessiveRules>
  <Rule AppendFirst="של" />
  <Rule AppendLast_M="שלו" />
  <Rule AppendLast_F="שלה" />
</PossessiveRules>
|
By GenoProSupport - Wednesday, September 7, 2005
|
You raise a good point. The possessive rules must support gender too. The routine will look like
LanguageDictionary.PossessiveName("Daniel", "car", "M")
LanguageDictionary.PossessiveName(i.Name.First, "car", i)
The method PossessiveName will apply the PossessiveRules to produce text for the possession. The gender will be optional, so you could use PossessiveName("Daniel", "car") without having to supply "M" or "F".
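As a rough illustration of an optional gender argument, here is a small Python sketch of the rule selection only. The attribute names and words are the ones from Yehudad's proposal above; the function `select_rule` and the fallback to the gender-neutral rule when no gender is supplied are assumptions of mine. How the selected word is then placed in the sentence is deliberately left open.

```python
# Attribute names and words copied from Yehudad's <PossessiveRules> proposal.
HEBREW_RULES = {
    "AppendFirst": "של",
    "AppendLast_M": "שלו",
    "AppendLast_F": "שלה",
}


def select_rule(rules, gender=None):
    """Pick the gendered rule when gender is "M" or "F", else the neutral one.

    Returns a (attribute name, word) pair. The fallback to "AppendFirst"
    when gender is omitted is an assumption, not a documented behaviour.
    """
    if gender in ("M", "F"):
        key = "AppendLast_" + gender
        if key in rules:
            return key, rules[key]
    return "AppendFirst", rules["AppendFirst"]


print(select_rule(HEBREW_RULES, "F")[0])   # AppendLast_F
print(select_rule(HEBREW_RULES)[0])        # AppendFirst
```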
|
By Alfi - Wednesday, September 7, 2005
|
Before dealing with possessive rules, some thought should be given to dividing this behaviour into 3 subgroups:
"Possessive" nominal affixes (of NOUNS)
"Possessive" pronominal affixes (of PRONOUNS)
"Possessive" adjectival affixes (of ADJECTIVES preceding their NOUNS)
By the way, affixes can be of 3 types: prefixes, infixes (such as the Finnish "k" before "s"), and suffixes.
|
By maru-san - Wednesday, September 7, 2005
|
If possible, the possessive pronoun should not be written in the file "Lang.vbs", since Japanese characters will not be accepted there. The Japanese pronoun is "no = の" and is written as a suffix.
|
By Yehudad - Wednesday, September 7, 2005
|
As I understand all the rules should be in the dictionary.xml file and the report generator will read them from there.
|
By GenoProSupport - Wednesday, September 7, 2005
|
Yehudad (9/8/2005) As I understand all the rules should be in the dictionary.xml file and the report generator will read them from there.
Absolutely. The file dictionary.xml may have as many sections as needed to describe linguistic rules, such as possessive names, plural form, place prefix, etc. I would like all rules to have the same syntax for orthogonality, so they can be accessed from the LanguageDictionary object (i.e. LanguageDictionary.PossessiveName(...), LanguageDictionary.Plural(...), LanguageDictionary.PlacePrefix(...), etc.)
<LinguisticRules>
  <PossessiveName>
    <Rule #1 />
    <Rule #2 />
    ...
  </PossessiveName>
  <Plural>
    <Rule #1 />
    <Rule #2 />
    ...
  </Plural>
  <PlacePrefix>
    <Rule #1 />
    <Rule #2 />
    ...
  </PlacePrefix>
</LinguisticRules>
I think this proposed solution would work.
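For anyone curious how such a layout could be read, here is a minimal Python sketch using the standard library's ElementTree. It is an assumption-laden illustration, not the report generator's code: `load_rules` is a hypothetical name, the PossessiveName rules are the English ones posted earlier in this topic, and the single Plural rule is made up purely to show a second section.

```python
import xml.etree.ElementTree as ET

# Sample content in the proposed layout. The Plural rule is hypothetical.
DICTIONARY_XML = """
<LinguisticRules>
  <PossessiveName>
    <Rule EndWith="s" Append="'" />
    <Rule EndWith="'" />
    <Rule Append="'s" />
  </PossessiveName>
  <Plural>
    <Rule Append="s" />
  </Plural>
</LinguisticRules>
"""


def load_rules(xml_text):
    """Return {section name: [rule attributes, ...]} in document order."""
    root = ET.fromstring(xml_text)
    return {
        section.tag: [dict(rule.attrib) for rule in section.findall("Rule")]
        for section in root
    }


rules = load_rules(DICTIONARY_XML)
print(list(rules))                  # ['PossessiveName', 'Plural']
print(rules["PossessiveName"][0])   # {'EndWith': 's', 'Append': "'"}
```

Since every section shares the same Rule syntax, one evaluation routine could serve PossessiveName, Plural and PlacePrefix alike, which is the point of keeping the sections orthogonal.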
|
By Yehudad - Thursday, September 8, 2005
|
GenoProSupport I think this proposed solution would work.
I'm sure that it will work with the help of the good people here. What I love about dictionary.xml is its flexibility: the ability to edit the file without needing to know any programming language.
|
By genome - Wednesday, November 15, 2006
|
GenoProSupport (8/30/2005)
I am thinking of defining some rules for possessive names. GenoPro would look up each rule, and if one matches the pattern, the processing would stop. In Finnish the rules would look like this:
<PossessiveRules>
  <Rule EndWith="s" ReplaceWith="ksen" />
  <Rule EndWith="tar" ReplaceWith="ttarten" />
  <Rule EndWith="kk?" ReplaceWith="k?" />
  <Rule EndWith="pp?" ReplaceWith="p?" />
  <Rule EndWith="tt?" ReplaceWith="t?" />
  <Rule Append="n" />
</PossessiveRules>
The next skin update has an interim solution to this issue by using 'regular expression' syntax in the dictionary. The Finnish rules can be expressed as
<PossessiveProperNoun T="s$:ksen;tar$:ttarsen;kk(.$):k$1;pp(.$):p$1;tt(.$):t$1;(.$):$1n" />
Not quite as readable as Dan's proposed solution but that's life!
|
By Yehudad - Wednesday, November 15, 2006
|
Ron (11/15/2006)
The next skin update has an interim solution to this issue by using 'regular expression' syntax in the dictionary. The Finnish rules can be expressed as
<PossessiveProperNoun T="s$:ksen;tar$:ttarsen;kk(.$):k$1;pp(.$):p$1;tt(.$):t$1;(.$):$1n" />
Not quite as readable as Dan's proposed solution but that's life!
Ron, can you please explain the meaning of the "$" and "(.$)" signs? What do they stand for (although I think I got it...)?
|
By genome - Wednesday, November 15, 2006
|
Sorry, the Dictionary will have the following by way of explanation.
<!-- PossessiveProperNoun - Conversion of Proper Nouns to their possessive form
  This is arranged as repeating pairs of 'regular expressions' as
    <part to find>:<replacement>;
  $ in the '<part to find>' represents the end of the word, so '(s$)' means ending in 's'
  . is a wild card representing any character, so (.$) means ending with any character.
  $n in the '<replacement>' means 'matched substring n', so $1 means the 1st bracketed string in '<part to find>'
  Therefore
    (s$):$1';
  means: if the word ends with 's', replace 's' by itself ($1) followed by ' (apostrophe)
  and
    (.$):$1's
  means: replace any last character in a word by that last character followed by 's
  Another example: to always add the Japanese character 'の' use
    (.$):$1の;
  as the PossessiveProperNoun translation string (you will need to be using a font that supports Japanese to see this!)
  Once a replacement is made no further pairs are tested. Simple, isn't it!
  For more information on regular expressions in VBScript, google for 'Windows 5.6 Script Documentation', download script56.chm and open it.
-->
<PossessiveProperNoun T="(s$):$1';(.$):$1's;" />
|
Hope that will help to clarify things.
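For readers more at home in another language, the pair format can be mimicked in a few lines of Python. This is only a sketch of the behaviour described in the dictionary comment, not the skin's VBScript code. The function names are hypothetical, and the sketch assumes that patterns contain no ':' or ';' and that $n is the only special form in the replacement. Python writes back-references as \g<n> rather than $n, so the replacement is converted first.

```python
import re


def parse_pairs(spec):
    """Split '<find>:<replacement>;...' into (compiled pattern, replacement) pairs."""
    pairs = []
    for item in spec.split(";"):
        if not item:
            continue                      # skip the empty piece after a trailing ';'
        pattern, _, replacement = item.partition(":")
        # Convert VBScript-style $1 into Python's \g<1> back-reference.
        replacement = re.sub(
            r"\$(\d)", lambda m: "\\g<" + m.group(1) + ">", replacement
        )
        pairs.append((re.compile(pattern), replacement))
    return pairs


def possessive(word, pairs):
    """Apply the first pair whose pattern matches; later pairs are not tested."""
    for pattern, replacement in pairs:
        if pattern.search(word):
            return pattern.sub(replacement, word, count=1)
    return word


english = parse_pairs("(s$):$1';(.$):$1's;")
japanese = parse_pairs("(.$):$1の;")

print(possessive("James", english))    # James'
print(possessive("Daniel", english))   # Daniel's
print(possessive("Tanaka", japanese))  # Tanakaの
```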
|
By Yehudad - Wednesday, November 15, 2006
|
Of course... Thanks for all of your efforts to the Report Generator.
|