2.2.3. Scripts for applying graphs and extracting concordances by analysis graph

2.2.3.1. Application of Unitex graphs

To apply the graphs, I ran the Unitex external programs from the command line. The script had to perform the following operations:

  1. Normalize
  2. Create the file_snt directory for each abstract
  3. Tokenize
  4. Dico
  5. Locate
  6. Concord

Normalize normalizes the text separators and produces a .snt file.
Tokenize splits the text into lexical units and produces four files: tokens.txt, tok_by_freq.txt, tok_by_alph.txt and stats.n.
These files are written to the file_snt directory, which must be created before running the Tokenize command.
Dico applies the dictionaries to the text.
Locate applies a graph in .fst2 format and generates the concordance index file. The Concord command then generates the concordance file in HTML format from this index.

This gives the following sequence to apply the graph « argumentation.fst2 » to the file « 14c as a tool to trace terrestrial carbon in a complex lake: implications for food-web structure and carbon cycling.txt »:

  1. ./Normalize unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling.txt
  2. mkdir unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling_snt
  3. ./Tokenize unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling.snt
  4. ./Dico -t unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling.snt Unitex3.0/English/Dela/dela-en-public.bin
  5. ./Locate -t unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling.snt unitex/English/Graphs/argumentation.fst2
  6. ./Concord unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling_snt/concord.ind -fCourier\ new -s12 -l40 -r55
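
The same six steps can be wrapped in a loop over all abstracts. The following is only a minimal sketch, not the script actually used: the function names apply_graph and apply_graph_all and the variables UNITEX_BIN, DICT and GRAPH are assumptions introduced for illustration, and the program options are copied unchanged from the sequence above.

```shell
#!/bin/sh
# Sketch only: UNITEX_BIN, DICT and GRAPH are assumed locations, adjust them.
UNITEX_BIN="${UNITEX_BIN:-.}"
DICT="${DICT:-Unitex3.0/English/Dela/dela-en-public.bin}"
GRAPH="${GRAPH:-unitex/English/Graphs/argumentation.fst2}"

# apply_graph FILE.txt : run the six steps on one abstract.
apply_graph() {
    txt="$1"
    base="${txt%.txt}"        # path without the .txt extension
    snt="$base.snt"           # file produced by Normalize
    snt_dir="${base}_snt"     # working directory expected by Tokenize

    "$UNITEX_BIN/Normalize" "$txt" || return 1
    mkdir -p "$snt_dir" || return 1
    "$UNITEX_BIN/Tokenize" "$snt" || return 1
    "$UNITEX_BIN/Dico" -t "$snt" "$DICT" || return 1
    "$UNITEX_BIN/Locate" -t "$snt" "$GRAPH" || return 1
    "$UNITEX_BIN/Concord" "$snt_dir/concord.ind" "-fCourier new" -s12 -l40 -r55
}

# apply_graph_all DIR : process every .txt file found directly in DIR.
apply_graph_all() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue   # skip the literal pattern if DIR has no .txt
        apply_graph "$f" || echo "failed: $f" >&2
    done
}
```

A call such as apply_graph_all unitex/English/Corpus/Projects would then process the whole corpus. Quoting every variable avoids having to escape the spaces and colons in the abstract titles by hand.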

The result of this script is a directory containing other folders, each of which holds several concordance files. We then regroup these concordance files by type of analysis.

2.2.3.2. Extraction of concordances by analysis graph

We used a script to extract the concordances and regroup them by analysis type. The result of this extraction is a set of 40 concordance files.
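
One possible way to gather the concordances by analysis type is sketched below. It is not the script that was used: it assumes a hypothetical layout in which each abstract directory contains one concordance per analysis graph, named after the graph (for example CORPUS/<abstract>_snt/<type>.html), and it only collects the files into one folder per type without merging them. The function name regroup_concordances is likewise an assumption.

```shell
#!/bin/sh
# Sketch only. Assumed (hypothetical) layout: CORPUS/<abstract>_snt/<type>.html
# regroup_concordances CORPUS OUT : copy each file to OUT/<type>/<abstract>.html
regroup_concordances() {
    corpus="$1"
    out="$2"
    for f in "$corpus"/*_snt/*.html; do
        [ -e "$f" ] || continue                  # nothing matched the pattern
        kind=$(basename "$f" .html)              # analysis type, from the file name
        abstract=$(basename "$(dirname "$f")")   # enclosing <abstract>_snt directory
        abstract="${abstract%_snt}"              # drop the _snt suffix
        mkdir -p "$out/$kind" || return 1
        cp "$f" "$out/$kind/$abstract.html" || return 1
    done
}
```

After running regroup_concordances on the corpus directory, each subfolder of OUT corresponds to one analysis type and can be processed on its own.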

2.2.3.3. Creation of analysis database

We then exported the data extracted by Unitex (number of occurrences and forms of each pattern) to a new CSV file, which contains the initial project data together with the new data extracted with Unitex. For each type of analysis we added two fields, one for the number of occurrences of the pattern and one for its forms:

  • action_occ
  • action_forms
  • agricultural_occ
  • agricultural_forms
  • biodiversity_occ
  • biodiversity_forms
  • composedwords_occ
  • composedwords_forms
  • cultural_occ
  • cultural_forms
  • disservice_occ
  • disservice_forms
  • ecosystem_occ
  • ecosystem_forms
  • ecosystem_function_occ
  • ecosystem_function_forms
  • ecosystem_service_occ
  • ecosystem_service_forms
  • entity_occ
  • entity_forms
  • expected_occ
  • expected_forms
  • expected_se2_occ
  • expected_se2_forms
  • fields_occ
  • fields_forms
  • future_occ
  • future_forms
  • future_se2_occ
  • future_se2_forms
  • goal_occ
  • goal_forms
  • goal_se2_occ
  • goal_se2_forms
  • management_occ
  • management_forms
  • organism_occ
  • organism_forms
  • past_occ
  • past_forms
  • past_se2_occ
  • past_se2_forms
  • place_occ
  • place_forms
  • present_occ
  • present_forms
  • present_se2_occ
  • present_se2_forms
  • production_provision_occ
  • production_provision_forms
  • provision_occ
  • provision_forms
  • regulation_occ
  • regulation_forms
  • reqWOS_occ
  • reqWOS_forms
  • reqWOS_niveau3_occ
  • reqWOS_niveau3_forms
  • resilience_occ
  • resilience_forms
  • se1_occ
  • se1_forms
  • se2_occ
  • se2_forms
  • se2_niveau3_occ
  • se2_niveau3_forms
  • service_occ
  • service_forms
  • service_function_occ
  • service_function_forms
  • support_occ
  • support_forms
  • types_ecosystems_occ
  • types_ecosystems_forms
  • types_ecosystems_tempag_occ
  • types_ecosystems_tempag_forms
  • vu_occ
  • vu_forms
  • level
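
The pair of fields for one analysis type can be computed along the following lines. This is a minimal sketch under stated assumptions: the input is a hypothetical UTF-8 text file holding one matched form per line (Unitex output files may need converting first), the function name pattern_fields is invented for the example, and the choice of "|" to separate the distinct forms is arbitrary.

```shell
#!/bin/sh
# Sketch only. Assumes a hypothetical UTF-8 file with one matched form per line.
# pattern_fields FILE : print '<occurrences>,"<form1|form2|...>"' for one type.
pattern_fields() {
    # Number of non-empty lines, i.e. number of occurrences of the pattern.
    occ=$(grep -c . "$1" || true)
    # Distinct forms, with double quotes doubled so the CSV field stays valid.
    forms=$(grep . "$1" | sort -u | sed 's/"/""/g' | paste -s -d '|' -)
    printf '%s,"%s"\n' "$occ" "$forms"
}
```

The first value would fill the _occ column and the quoted value the _forms column of the corresponding analysis type.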