2.2.3.1. Application of Unitex graphs
To apply the graphs, I used the Unitex external programs from the command line. The script had to perform the following operations:
- Normalize
- Create a file_snt directory for each abstract
- Tokenize
- Dico
- Locate
- Concord
Normalize normalizes the text separators and produces a .snt file.
Tokenize splits the text into lexical units and produces four files: tokens.txt, tok_by_freq.txt, tok_by_alph.txt and stat.n.
These files are written to a file_snt directory, which must be created before running the Tokenize command.
Dico applies dictionaries to the text.
Locate applies a graph in fst2 format and generates the concordance index file. To generate the concordance file in HTML format, we use the Concord command.
We thus obtain the following command sequence to apply the graph "argumentation.fst2" to the file "14c as a tool to trace terrestrial carbon in a complex lake: implications for food-web structure and carbon cycling.txt":
- ./Normalize unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling.txt
- mkdir unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling_snt
- ./Tokenize unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling.snt
- ./Dico -t unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling.snt Unitex3.0/English/Dela/dela-en-public.bin
- ./Locate -t unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling.snt unitex/English/Graphs/argumentation.fst2
- ./Concord unitex/English/Corpus/Projects/14c\ as\ a\ tool\ to\ trace\ terrestrial\ carbon\ in\ a\ complex\ lake\:\ implications\ for\ food-web\ structure\ and\ carbon\ cycling_snt/concord.ind -fCourier\ new -s12 -l40 -r55
The result of this script is a directory containing other folders, each of which holds several concordance files. We then group these concordance files by type of analysis.
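The command sequence above can be wrapped in a small shell function so that the same steps run for every abstract. The following is only a minimal sketch, not the script actually used here: the function names snt_file, snt_dir and apply_graph are hypothetical, the Unitex programs are assumed to sit in the current directory as in the commands above, and the options are copied from those commands, with the font name quoted so that the shell passes it as a single argument. Quoting the variables also avoids escaping every space of the file name by hand.

```shell
#!/bin/sh
# Hypothetical helpers: derive the Unitex working names from a .txt path.
snt_file() { printf '%s\n' "${1%.txt}.snt"; }   # file produced by Normalize
snt_dir()  { printf '%s\n' "${1%.txt}_snt"; }   # directory used by the other programs

# Run the whole sequence on one abstract.
# $1 = text file, $2 = compiled graph (.fst2), $3 = compiled dictionary (.bin)
apply_graph() {
    txt=$1
    graph=$2
    dic=$3
    snt=$(snt_file "$txt")
    dir=$(snt_dir "$txt")

    # Each step runs only if the previous one succeeded.
    ./Normalize "$txt" &&
    mkdir -p "$dir" &&
    ./Tokenize "$snt" &&
    ./Dico -t "$snt" "$dic" &&
    ./Locate -t "$snt" "$graph" &&
    ./Concord "$dir/concord.ind" -f"Courier new" -s12 -l40 -r55
}

# Process every text file given on the command line; stop at the first failure.
for txt in "$@"; do
    apply_graph "$txt" \
        unitex/English/Graphs/argumentation.fst2 \
        Unitex3.0/English/Dela/dela-en-public.bin || exit 1
done
```

Saved under a hypothetical name such as apply_graph.sh, the sketch would be called as sh apply_graph.sh unitex/English/Corpus/Projects/*.txt.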
2.2.3.2. Extraction of concordances by analysis graph
We used a script to extract the concordances and group them by analysis type. The result of this extraction is a set of 40 concordance files:
- Future
- Past
- Present
- Goals
- Expected
- Future and ecosystem services
- Past and ecosystem services
- Present and ecosystem services
- Goals and ecosystem services
- Expected and ecosystem services
- Ecosystem services and keywords of level 3
- Query WOS and keywords of level 3
- Ecosystem service
- Ecosystem function
- Service or function
- Production or provision of ecosystem services
- Query WOS
- Ecosystem services 1
- Ecosystem services 2
- Provision
- Regulation
- Support
- Cultural
- Ecosystem
- Types of ecosystems
- Types of ecosystems of temperate agriculture
- Agricultural
- Biodiversity
- Resilience
- Entity
- Actions
- Disservices
- Fields
- Management
- Organisms
- Places
- Services
- Vu
- Composed words
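The grouping step could look like the sketch below. It is an illustration under stated assumptions, not the script used for this work: it assumes that each file_snt directory holds one plain-text concordance per analysis graph, named concord_<type>.txt (a hypothetical naming scheme), and it writes one file per analysis type in which every line is prefixed with the name of the abstract it comes from. The function name group_concordances is also hypothetical.

```shell
#!/bin/sh
# Assumed (hypothetical) layout:
#   <corpus>/<abstract>_snt/concord_<type>.txt
# group_concordances CORPUS_DIR OUT_DIR writes OUT_DIR/<type>.txt, where each
# line has the form "<abstract><TAB><concordance line>".
group_concordances() {
    corpus=$1
    out=$2
    mkdir -p "$out" || return 1
    for dir in "$corpus"/*_snt; do
        [ -d "$dir" ] || continue
        abstract=$(basename "$dir" _snt)
        for conc in "$dir"/concord_*.txt; do
            [ -f "$conc" ] || continue
            name=$(basename "$conc" .txt)
            kind=${name#concord_}          # analysis type, e.g. "future"
            # Copy each line, prefixed with the abstract name.
            while IFS= read -r line || [ -n "$line" ]; do
                printf '%s\t%s\n' "$abstract" "$line"
            done < "$conc" >> "$out/$kind.txt"
        done
    done
}
```

Because the output files are opened in append mode, the output directory should be empty before the function is run again.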
2.2.3.3. Creation of analysis database
We then exported the data extracted by Unitex (the number of occurrences and the forms of each pattern) to a new CSV file, which contains both the initial project data and the new data extracted with Unitex. For each type of analysis, we added two fields, one for the number of occurrences of the pattern and one for its forms:
- action_occ
- action_forms
- agricultural_occ
- agricultural_forms
- biodiversity_occ
- biodiversity_forms
- composedwords_occ
- composedwords_forms
- cultural_occ
- cultural_forms
- disservice_occ
- disservice_forms
- ecosystem_occ
- ecosystem_forms
- ecosystem_function_occ
- ecosystem_function_forms
- ecosystem_service_occ
- ecosystem_service_forms
- entity_occ
- entity_forms
- expected_occ
- expected_forms
- expected_se2_occ
- expected_se2_forms
- fields_occ
- fields_forms
- future_occ
- future_forms
- future_se2_occ
- future_se2_forms
- goal_occ
- goal_forms
- goal_se2_occ
- goal_se2_forms
- management_occ
- management_forms
- organism_occ
- organism_forms
- past_occ
- past_forms
- past_se2_occ
- past_se2_forms
- place_occ
- place_forms
- present_occ
- present_forms
- present_se2_occ
- present_se2_forms
- production_provision_occ
- production_provision_forms
- provision_occ
- provision_forms
- regulation_occ
- regulation_forms
- reqWOS_occ
- reqWOS_forms
- reqWOS_niveau3_occ
- reqWOS_niveau3_forms
- resilience_occ
- resilience_forms
- se1_occ
- se1_forms
- se2_occ
- se2_forms
- se2_niveau3_occ
- se2_niveau3_forms
- service_occ
- service_forms
- service_function_occ
- service_function_forms
- support_occ
- support_forms
- types_ecosystems_occ
- types_ecosystems_forms
- types_ecosystems_tempag_occ
- types_ecosystems_tempag_forms
- vu_occ
- vu_forms
- level
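The pairs of fields listed above follow a regular naming scheme, and their values can be derived from a list of matched forms. The sketch below illustrates this under assumptions of my own: field_names and occ_and_forms are hypothetical names, the input of occ_and_forms is assumed to be one matched form per line, and the forms are assumed to contain no double quotes, so that they can be written as a single quoted CSV value.

```shell
#!/bin/sh
# Print the two field names added for each analysis type given as argument.
field_names() {
    for kind in "$@"; do
        printf '%s_occ\n%s_forms\n' "$kind" "$kind"
    done
}

# Read matched forms on standard input, one per line (assumed input format),
# and print: <number of occurrences>,"<distinct forms joined by |>"
# Distinct forms are kept in order of first appearance; blank lines are ignored.
occ_and_forms() {
    awk '
        NF {
            n++
            if (!($0 in seen)) {
                seen[$0] = 1
                forms = forms (forms == "" ? "" : "|") $0
            }
        }
        END { printf "%d,\"%s\"\n", n, forms }
    '
}
```

For example, field_names past goal prints past_occ, past_forms, goal_occ and goal_forms, one per line.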