Element identifier format

https://support.genopro.com/Topic30397.aspx

By Nand - Wednesday, July 18, 2012

I received a genogram which I exported to XML. The resulting file contains strangely formatted element identifiers.
Instead of the usual "ind00001" for an individual ID, it reads "I1". The same applies to family IDs.

GenoPro does not seem to have any problem with this: even after I modify the genogram, save the new version and export it, it keeps the same short identifier format. Is there a flag somewhere that sets this format? I could not find one.

Sample:

<Individuals>
<Individual ID="I1">

and:

<PedigreeLink PedigreeLink="Parent" Family="F13" Individual="I1"/>

Regards,
Nand

By jcmorin - Wednesday, July 18, 2012

The identifiers are only used to link the objects (individuals, pedigree links, families) together.

In more recent versions we use shorter IDs to make the file smaller.

For a single individual, I know you can specify the ID in the table layout.

Just curious: why does the ID matter to you?

By Nand - Wednesday, July 18, 2012

Hi J.C.

It matters to me because I'm processing the XML file directly in my "merge" tool and I'm using the IDs as a numerical index.

In the past, GenoPro used the GEDCOM format for the IDs (like ind00001) and now, all of a sudden this changed. I agree that the new version is "better" in some cases (forget about sorting) and that it removes the 99999 limit, but I would like to know how to easily differentiate between both. GenoPro does not seem to have any problem with this, so I presume there's a switch somewhere. Or are you testing "on the fly" when loading the file? Like if it starts with "ind" it's the old version, if it starts with "I" it's the new one. Let us not make the same mistake as "they" did with UTF-8 where you sometimes also have to "guess" what you are dealing with.
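The "test on the fly" idea can be sketched in a few lines of Python. This is only an illustration of the prefix check described above, not GenoPro's actual logic. The names `ParsedId` and `parse_id` are hypothetical, and the rule "a prefix longer than one letter means the old format" is an assumption based solely on the two formats quoted in this thread.

```python
import re
from typing import NamedTuple, Optional


class ParsedId(NamedTuple):
    prefix: str  # letter prefix, e.g. "ind" or "I"
    number: int  # numeric part, usable as an index
    style: str   # "long" (e.g. ind00001) or "short" (e.g. I1)


# Letters followed by digits; anything else is treated as free-form.
_ID_PATTERN = re.compile(r"([A-Za-z]+)(\d+)")


def parse_id(identifier: str) -> Optional[ParsedId]:
    """Split an identifier into prefix and number, or return None."""
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None:
        return None  # not in either known format; the caller must decide
    prefix, digits = match.groups()
    # Assumption: multi-letter prefixes ("ind") are the old, padded format.
    style = "long" if len(prefix) > 1 else "short"
    return ParsedId(prefix, int(digits), style)


for sample in ("ind00001", "I1", "F13"):
    print(sample, parse_id(sample))
```

Both "ind00001" and "I1" yield the number 1, so the numeric part can be used as an index regardless of which format the file uses.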

Regards,
Nand

By Nand - Wednesday, July 18, 2012

Regarding the table layout, we now have two different numberings in the ID column. Some people will see formats like "ind00074" in one table, others "I74" in another table, and those like me who are comparing tables may see both at the same time. Fun. But no problem, converting to a uniform format is no issue.

Question: why did you not opt for the simplest version, without the letter prefix?
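As a rough sketch of such a conversion, the Python snippet below rewrites both formats to the short one before comparing two ID columns. The function `to_short` and the `SHORT_PREFIX` mapping are hypothetical names. Only the "ind" to "I" pair is taken from this thread, so any other prefix is passed through unchanged.

```python
import re

_ID_PATTERN = re.compile(r"([A-Za-z]+)(\d+)")

# Assumed mapping from long prefixes to short ones.
# Only "ind" -> "I" is quoted in this thread.
SHORT_PREFIX = {"ind": "I"}


def to_short(identifier: str) -> str:
    """Return the short form of an ID; leave unknown formats untouched."""
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None:
        return identifier
    prefix, digits = match.groups()
    # int() drops the zero padding: "00074" becomes 74.
    return f"{SHORT_PREFIX.get(prefix, prefix)}{int(digits)}"


# Two ID columns in different formats, as they might appear in two tables.
table_a = ["ind00074", "ind00075"]
table_b = ["I74", "I76"]

common = {to_short(i) for i in table_a} & {to_short(i) for i in table_b}
print(common)
```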

Like in:

<PedigreeLink PedigreeLink="Parent" Family="13" Individual="1"/>

(not Family="F13" and Individual="I1")

That would save you another two Unicode bytes per reference.

By genome - Wednesday, July 18, 2012

I have many compact-format identifiers in my .gno file. They arose when I imported a GEDCOM file many years ago. Evidently some other packages hold identifiers in this format and export them unchanged to GEDCOM, so mixed formats are unavoidable irrespective of how GenoPro derives its own, because data can arrive from various other packages. For example, one could typically receive a GEDCOM file from a relative with many overlapping records that are to be merged.

But does the format have any impact on a merge? Many years ago I started to write, but never actually finished, a VBScript to merge files. The only way I could see of achieving a merge was to compare key items such as name and dates of birth and death. Where a match is vague, e.g. only names and no dates are available, I decided to prompt for confirmation of the match before merging.
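The matching rule described above might look roughly like this in Python. It is a minimal sketch rather than the unfinished VBScript: the `Person` fields and the three result values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical record layout; real data has many more fields.
    name: str
    birth: Optional[str] = None
    death: Optional[str] = None


def classify_match(a: Person, b: Person) -> str:
    """Return "no", "match", or "confirm" (ask the user first)."""
    if a.name.strip().lower() != b.name.strip().lower():
        return "no"
    compared = 0
    for left, right in ((a.birth, b.birth), (a.death, b.death)):
        if left and right:
            if left != right:
                return "no"  # a known date contradicts the match
            compared += 1
    # Names agree; without any comparable date the match is vague.
    return "match" if compared else "confirm"


print(classify_match(Person("Ann Lee", "1900"), Person("ann lee", "1900")))
print(classify_match(Person("Ann Lee"), Person("Ann Lee", "1900")))
```

Note that the identifiers play no part in the comparison, which is the point of the question: the ID format matters only once a pair has been chosen and its links have to be rewritten.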

By jcmorin - Wednesday, July 18, 2012

Nand (18-Jul-2012)

Question: why did you not opt for the simplest version, without the letter prefix?

Like in:

<PedigreeLink PedigreeLink="Parent" Family="13" Individual="1"/>

(not Family="F13" and Individual="I1")

That would save you another two Unicode bytes per reference.

We wanted all IDs to be unique. Using a bare number as an identifier would give some IDs the same value, such as an individual with the ID 1 and a family with the same ID.
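A small Python sketch shows the collision. It assumes, purely for illustration, that all objects share a single lookup table keyed by ID. `build_index` is a hypothetical helper and says nothing about how GenoPro stores its objects internally.

```python
def build_index(individual_ids, family_ids):
    """Map every ID to its object kind; fail if an ID is used twice."""
    index = {}
    for kind, ids in (("individual", individual_ids), ("family", family_ids)):
        for identifier in ids:
            if identifier in index:
                raise ValueError(
                    f"duplicate ID {identifier!r}: "
                    f"already used by {index[identifier]}, now by {kind}"
                )
            index[identifier] = kind
    return index


# Prefixed IDs stay unique across object kinds.
print(build_index(["I1"], ["F1"]))

# Bare numbers collide as soon as both kinds start counting at 1.
try:
    build_index(["1"], ["1"])
except ValueError as error:
    print(error)
```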

By vlepore - Wednesday, July 18, 2012

I confirm everything genome says:
IDs in different formats (123, I123 or ind00123) come from importing different .ged files produced by several software packages. GenoPro accepts any ID in the Import function; it just erases identical IDs. Even when entering the "Permanent ID" manually in the Family Folder, you can insert any ID (even aaa, bbb, ccczxz, 1, ...). Then, on Save, it builds the missing IDs, always in the format ind00123, by adding 1 to the highest ID of that format.

Given that the ID has value only as a key linking records, GenoPro should provide a function that completely rebuilds the IDs in a single format.
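The Save behaviour described here can be approximated as follows. This Python sketch is based only on the description in this post, not on GenoPro's code. The function name, the five-digit padding and the starting value of 1 when no ID in that format exists are all assumptions.

```python
import re

# Only IDs in the "ind" + digits format take part in the numbering.
_LONG_IND = re.compile(r"ind(\d+)")


def next_individual_id(existing_ids):
    """Return the next ID in the ind00123 format."""
    highest = 0
    for identifier in existing_ids:
        match = _LONG_IND.fullmatch(identifier)
        if match:
            highest = max(highest, int(match.group(1)))
    # Assumption: zero-pad to five digits, as in "ind00123".
    return f"ind{highest + 1:05d}"


# IDs in other formats ("I200", "aaa") are ignored when looking for the highest.
print(next_individual_id(["ind00123", "I200", "aaa", "ind00007"]))
```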

Moreover, it seems very dangerous to perform a merge automatically. I would consider it safer to ask for confirmation of a match before merging.

I hope I have made myself understood with my Google English.
Greetings

By Nand - Thursday, July 19, 2012

Wait a moment. We are mixing up three different things here.

(1) The fact that I'm processing identifiers in a numeric format has nothing to do with the merge tool but is used in a completely different project that is of no benefit to GenoPro users.

(2) My basic question was: how does GenoPro differentiate between "ind00001" and "I1" without analyzing the values? The official answer seems to be that it does not differentiate between them at all and that it just processes them as identifier strings. I strongly doubt this, but anyhow, so be it.

(3) I never mentioned an "automatic" merge, did I? That would be suicidal. On the contrary, version 1 will actually ask you to identify a matching individual pair by specifying the identifiers of both individuals. I have to start somewhere, hence my renewed interest in identifiers. Version 2 is intended to help you find matching pairs. But that's another story and has nothing to do with this topic.

Stay tuned,
Nand