question@mail.ru · 01.01.1970 03:00

Python 3 multidimensional dictionary

I'm learning Python 3.x and have gone through the basics. Before that, I wrote PHP.

Now I've decided to carry out this plan:

  • read a CSV file
  • while reading its lines, process them and store them in an array, building the desired structure
  • ...

    My code so far:

        import csv


        def gene_name(gene, polfzm, genotype):
            return '_'.join([gene, polfzm, genotype]).strip().replace(" ", "-")


        data = {}
        genes = []

        with open('example-2.csv', 'r') as file:
            reader = csv.DictReader(file, delimiter=';')
            for row in reader:
                group = row['group']
                person_id = row['person_id']
                ab = row['genotype']
                name = gene_name(row['gene_name'], row['polfzm_name'], ab)
                data[group][person_id][name] = ab  # raises KeyError

    And here is the problem. If I understand correctly, a dictionary is the right type for storing my data in a structured form (data). In PHP, I could simply build the nested structure with data[group][person_id][name] = ab. Here I immediately get an error: KeyError: '3'

    What is wrong with my approach? The courses I took did not cover multidimensional structures. Did I choose the right type for storing the data, a dictionary? What is the right way to do this?
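    To make the difference concrete, here is a minimal sketch using sample keys from the data below. Assigning through two missing levels of a plain dict fails on the very first lookup, while creating each level explicitly works.

```python
data = {}

try:
    # data['3'] is looked up first, and that key does not exist yet
    data['3']['1086']['F11'] = 'CC'
except KeyError as exc:
    print('KeyError:', exc)  # prints: KeyError: '3'

# Creating each level explicitly works
data['3'] = {}
data['3']['1086'] = {}
data['3']['1086']['F11'] = 'CC'
print(data)  # {'3': {'1086': {'F11': 'CC'}}}
```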

    The data structure I need to get:

    A sample of the CSV data:

        person_id;gene_name;polfzm_name;genotype;expr1;group
        1086;at1;F11;T/C rs2036914;CC;take a rally;3
        1086;at1;F11;C/T rs2289252;TT;take away;3
        1086;at1;ITGA2;C/T rs1126643;CT;take away;3
        1086;at1;ITGB3;T/C rs5918;CC;take a rally;3
        1086;at1;VEGF;C/T rs3025039;CC;take a rally;3
        1085;at2;F11;T/C rs2036914;TT;take a rally;3
        1085;at2;F11;C/T rs2289252;CC;take a rally;3
        1085;at2;ITGA2;C/T rs1126643;TT;take on frequent;3
        1085;at2;ITGB3;T/C rs5918;TT;take a rally;3
        1085;at2;VEGF;C/T rs3025039;CC;take a rally;3
        23;Foreign Ministry 6;ACE;Alu Ins/Del;ID;;1
        23;Foreign Ministry 6;F1;Thr312Ala;ThrThr;pregnanthrm;1
        23;Foreign Ministry 6;F11;T/C rs2036914;TC;pregnanturm;1
        23;Foreign Ministry 6;F11;C/T rs2289252;CT;pregnant gram;1
        23;Foreign Ministry 6;F13;Val34Leu;ValLeu;pregnant gram;1
        23;Foreign Ministry 6;F2;G20210A;GG;pregnant gram;1
        23;Foreign Ministry 6;F5;Arg506Gln;GG;pregnant core;1
        23;Foreign Ministry 6;ITGA2;C/T rs1126643;CC;PCP;1
        23;Foreign Ministry 6;ITGB3;T/C rs5918;TC;;1
        23;Foreign Ministry 6;VEGF;G-634C;GG;taken;1

    The data is a flat extract from a relational database. Because it was selected from several joined tables, many values repeat from row to row. For example, the first 5 data rows (not counting the header) belong to one person (person_id) and one group (group) but have different genes (gene_name), etc.

    I plan to run many comparisons on this data, search for combinations, etc., on the order of ~4 million passes, so I need a structure that spares me from scanning all the CSV rows that many times.
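    Once the rows are grouped, each person's markers are a single dictionary lookup away, so comparisons no longer rescan the CSV. A minimal sketch, assuming the nested layout group → person_id → {marker: genotype} with the marker key built from gene and polymorphism only (a hypothetical variation on gene_name that leaves the genotype out of the key). The genotypes are the group 3 sample values, and shared_genotypes is a hypothetical helper.

```python
from itertools import combinations

# Assumed layout: group -> person_id -> {marker: genotype}, where the marker
# key is gene + polymorphism only (hypothetical variation on gene_name).
data = {
    '3': {
        '1085': {'F11_T/C-rs2036914': 'TT', 'F11_C/T-rs2289252': 'CC',
                 'ITGA2_C/T-rs1126643': 'TT', 'ITGB3_T/C-rs5918': 'TT',
                 'VEGF_C/T-rs3025039': 'CC'},
        '1086': {'F11_T/C-rs2036914': 'CC', 'F11_C/T-rs2289252': 'TT',
                 'ITGA2_C/T-rs1126643': 'CT', 'ITGB3_T/C-rs5918': 'CC',
                 'VEGF_C/T-rs3025039': 'CC'},
    },
}


def shared_genotypes(a, b):
    """Hypothetical helper: markers on which two persons have the same genotype."""
    return {marker: g for marker, g in a.items() if b.get(marker) == g}


# Compare every pair of persons in a group using direct dict lookups,
# without rereading the CSV.
for (id1, markers1), (id2, markers2) in combinations(data['3'].items(), 2):
    print(id1, id2, shared_genotypes(markers1, markers2))
    # prints: 1085 1086 {'VEGF_C/T-rs3025039': 'CC'}
```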

  • answer@mail.ru · 01.01.1970 03:00

    Initially your dictionary is empty, so data[group] already fails: looking up a missing key in a dict raises KeyError (here KeyError: '3'), and unlike PHP arrays, Python dicts do not create the intermediate levels for you. In this case you can use defaultdict:

        from collections import defaultdict

        # A dictionary whose default value is another defaultdict,
        # whose default value is a regular dict
        data = defaultdict(lambda: defaultdict(dict))

        group = 'group'
        person_id = '3'
        name = 'John Smith'
        ab = 'ab'

        data[group][person_id][name] = ab
        print(data[group][person_id][name])  # Outputs: ab

    Full code:

        import csv
        from collections import defaultdict
        from pprint import pprint


        def gene_name(gene, polfzm, genotype):
            return '_'.join([gene, polfzm, genotype]).strip().replace(" ", "-")


        data = defaultdict(lambda: defaultdict(dict))
        genes = []

        with open('example-2.csv', 'r') as file:
            reader = csv.DictReader(file, delimiter=';')
            for row in reader:
                group = row['group']
                person_id = row['person_id']
                ab = row['genotype']
                name = gene_name(row['gene_name'], row['polfzm_name'], ab)
                data[group][person_id][name] = ab

        pprint(data)

    Output using your data as an example:

        defaultdict(<function <lambda> at 0x7f73a7732ee0>,
                    {'1': defaultdict(<class 'dict'>,
                                      {'23': {'ACE_Alu-Ins/Del_ID': 'ID',
                                              'F11_T/C-rs2036914_TC': 'TC',
                                              'F11_C/T-rs2289252_CT': 'CT',
                                              'F13_Val34Leu_ValLeu': 'ValLeu',
                                              'F1_Thr312Ala_ThrThr': 'ThrThr',
                                              'F2_G20210_GG': 'GG',
                                              'F5_Arg506Gln_GG': 'GG',
                                              'ITGA2_C/T-rs1126643_CC': 'CC',
                                              'ITGB3_T/C-rs5918_TC': 'TC',
                                              'VEGF_G-634C_GG': 'GG'}}),
                     '3': defaultdict(<class 'dict'>,
                                      {'1085': {'F11_T/C-rs2036914_TT': 'TT',
                                                'F11_C/T-rs2289252_CC': 'CC',
                                                'ITGA2_C/T-rs1126643_TT': 'TT',
                                                'ITGB3_T/C-rs5918_TT': 'TT',
                                                'VEGF_C/T-rs3025039_CC': 'CC'},
                                       '1086': {'F11_T/C-rs2036914_CC': 'CC',
                                                'F11_C/T-rs2289252_TT': 'TT',
                                                'ITGA2_C/T-rs1126643_CT': 'CT',
                                                'ITGB3_T/C-rs5918_CC': 'CC',
                                                'VEGF_C/T-rs3025039_CC': 'CC'}})})
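    To try the same loop without the file, here is a sketch that feeds csv.DictReader from an in-memory string via io.StringIO. The SAMPLE text is an assumption: a few rows from the question, trimmed to the five columns the code actually reads.

```python
import csv
import io
from collections import defaultdict

# Assumed sample: a few rows from the question, reduced to the columns used below
SAMPLE = """person_id;gene_name;polfzm_name;genotype;group
1086;F11;T/C rs2036914;CC;3
1086;VEGF;C/T rs3025039;CC;3
1085;F11;T/C rs2036914;TT;3
23;F2;G20210A;GG;1
"""


def gene_name(gene, polfzm, genotype):
    return '_'.join([gene, polfzm, genotype]).strip().replace(' ', '-')


data = defaultdict(lambda: defaultdict(dict))
reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=';')
for row in reader:
    ab = row['genotype']
    name = gene_name(row['gene_name'], row['polfzm_name'], ab)
    data[row['group']][row['person_id']][name] = ab

print(data['3']['1086'])
# {'F11_T/C-rs2036914_CC': 'CC', 'VEGF_C/T-rs3025039_CC': 'CC'}
```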

    defaultdict is a subclass of dict and works exactly the same way, except that it creates a default value when a missing key is requested. It fits the task you described; it is only displayed differently when printed.
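    One practical consequence of that difference, shown in a short sketch: membership tests with `in` and calls to `.get()` leave a defaultdict unchanged, but indexing a missing key silently inserts an empty value. When you only want to check whether a group or person exists, `in` or `.get()` is the safer choice.

```python
from collections import defaultdict

data = defaultdict(lambda: defaultdict(dict))
data['3']['1086']['F11_T/C-rs2036914_CC'] = 'CC'

print(isinstance(data, dict))  # True: defaultdict is a dict subclass

# Membership tests and .get() do not create keys...
print('1' in data)    # False
print(data.get('1'))  # None
print(len(data))      # 1

# ...but plain indexing of a missing key inserts an empty default value
_ = data['1']
print(len(data))      # 2
```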

    To print it in a more familiar form, you can serialize it to JSON, for example:

        import json

        print(json.dumps(data, indent=4, ensure_ascii=False))
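    If you would rather have regular dicts again (for example, so that pprint shows them without the defaultdict(...) wrappers), a small recursive conversion is enough. A sketch; to_dict is a hypothetical helper name.

```python
from collections import defaultdict
from pprint import pprint


def to_dict(obj):
    """Hypothetical helper: recursively turn nested (default)dicts into plain dicts."""
    if isinstance(obj, dict):
        return {key: to_dict(value) for key, value in obj.items()}
    return obj


data = defaultdict(lambda: defaultdict(dict))
data['3']['1086']['VEGF_C/T-rs3025039_CC'] = 'CC'

plain = to_dict(data)
pprint(plain)  # {'3': {'1086': {'VEGF_C/T-rs3025039_CC': 'CC'}}}
```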


    A solution without defaultdict:

        import csv
        from pprint import pprint


        def gene_name(gene, polfzm, genotype):
            return '_'.join([gene, polfzm, genotype]).strip().replace(" ", "-")


        data = dict()
        genes = []

        with open('example-2.csv', 'r') as file:
            reader = csv.DictReader(file, delimiter=';')
            for row in reader:
                group = row['group']
                person_id = row['person_id']
                ab = row['genotype']
                name = gene_name(row['gene_name'], row['polfzm_name'], ab)

                if group not in data:
                    data[group] = dict()
                    # micro-optimization: a new group cannot contain this person yet
                    data[group][person_id] = dict()
                elif person_id not in data[group]:
                    data[group][person_id] = dict()

                data[group][person_id][name] = ab

        pprint(data)

    The result is a regular dictionary.
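    The same nesting can also be built with the built-in dict.setdefault, which returns the value for a key if it exists and otherwise stores and returns the given default. A sketch of just the assignment step, using a few (group, person_id, name, ab) tuples from the sample data in place of the CSV reader:

```python
data = {}

rows = [
    ('3', '1086', 'F11_T/C-rs2036914_CC', 'CC'),
    ('3', '1086', 'VEGF_C/T-rs3025039_CC', 'CC'),
    ('3', '1085', 'F11_T/C-rs2036914_TT', 'TT'),
]

for group, person_id, name, ab in rows:
    # setdefault returns the existing inner dict, or stores {} and returns it
    data.setdefault(group, {}).setdefault(person_id, {})[name] = ab

print(data)
```

    The trade-off is that a new empty {} is created on every call, even when the key already exists; the result is still a regular dictionary.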


    And one more option: a custom class instead of defaultdict, with unlimited nesting depth and pickle support (but no JSON serialization):

        import pickle
        from collections import UserDict


        class NestedDict(UserDict):
            def __getitem__(self, key):
                # Create a nested level on first access to a missing key
                if key not in self.data:
                    self.data[key] = NestedDict()
                return self.data[key]


        d = NestedDict()
        d['Lorem']['ipsum']['dolor']['sit']['amet']['consectetur']['adipiscing']['elit'] = 1
        print(d)
        print(type(d))

        p = pickle.dumps(d)
        d1 = pickle.loads(p)
        print(d1)
        {'Lorem': {'ipsum': {'dolor': {'sit': {'amet': {'consectetur': {'adipiscing': {'elit': 1}}}}}}}}
        <class '__main__.NestedDict'>
        {'Lorem': {'ipsum': {'dolor': {'sit': {'amet': {'consectetur': {'adipiscing': {'elit': 1}}}}}}}}
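    If JSON output is needed after all, json.dumps accepts a `default` function that is called for objects it cannot serialize; returning the underlying `.data` dict of a UserDict makes a class like NestedDict serializable. A sketch; nested_default is a hypothetical name.

```python
import json
from collections import UserDict


class NestedDict(UserDict):
    def __getitem__(self, key):
        if key not in self.data:
            self.data[key] = NestedDict()
        return self.data[key]


def nested_default(obj):
    """Hypothetical hook for json.dumps: hand over the plain dict inside a UserDict."""
    if isinstance(obj, UserDict):
        return obj.data
    raise TypeError(f'{type(obj).__name__} is not JSON serializable')


d = NestedDict()
d['Lorem']['ipsum'] = 1

print(json.dumps(d, default=nested_default))  # {"Lorem": {"ipsum": 1}}
```

    Alternatively, a dict subclass that defines `__missing__` would be serialized by json directly, since json handles dict subclasses.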
