# Bulk Processing
- [Introduction](#introduction)
- [Python Regular Expressions](Python-Regular-Expressions)
- [GAM Configuration](gam.cfg)
- [Meta Commands and File Redirection](Meta-Commands-and-File-Redirection)
- [Definitions](#definitions)
- [Batch files](#batch-files)
- [CSV files](#csv-files)
- [CSV files with redirection and select](#csv-files-with-redirection-and-select)
- [Automatic batch processing](#automatic-batch-processing)
- [Process Google Sheet commands and save results](#process-google-sheet-commands-and-save-results)

## Introduction
Batch and CSV file processing can improve performance by executing Gam commands in parallel.
The variables `num_threads`, `num_tbatch_threads` and `auto_batch_min` in `gam.cfg` control parallelism.

## Definitions
* [Command data from Google Docs/Sheets/Storage](Command-Data-From-Google-Docs-Sheets-Storage) `gdoc ` and `gsheet `

## Batch files
There are two types of batch processing: one uses processes and the other uses threads.
Using processes gives higher performance, but `gam csv` commands are not supported.
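The batch-file line types listed below can be illustrated with a small thread-based runner in the spirit of `gam tbatch`. This is a hypothetical sketch, not GAM's implementation. `run_batch` and the `run_command` callback are invented names; `run_command` stands in for actually executing one GAM command. `max_workers` only loosely corresponds to the `num_tbatch_threads` setting. The sketch omits the `execute` line and the message-and-keypress form of `commit-batch`.

```python
import re
import shlex
import sys
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Matches %name% references defined by earlier "set" lines.
VARIABLE_REF = re.compile(r"%(\w+)%")


def run_batch(lines, run_command, max_workers=5):
    """Hypothetical tbatch-style runner; returns command results in file order.

    run_command is a caller-supplied stand-in for executing one GAM command;
    it receives the argument list that follows the leading "gam".
    """
    variables = {}
    pending = []    # futures submitted since the last commit-batch
    submitted = []  # every future, kept so results come back in file order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for raw_line in lines:
            line = raw_line.strip()
            if not line or line.startswith("#"):
                continue  # blank lines and comment lines are ignored
            # Replace %name% with the value from an earlier "set" line;
            # names that are not currently set are left untouched.
            line = VARIABLE_REF.sub(
                lambda m: variables.get(m.group(1), m.group(0)), line)
            words = shlex.split(line)
            keyword = words[0]
            if keyword == "gam":
                future = pool.submit(run_command, words[1:])
                pending.append(future)
                submitted.append(future)
            elif keyword == "commit-batch":
                # Barrier: wait for every running command to complete.
                wait(pending)
                pending.clear()
            elif keyword == "sleep":
                time.sleep(float(words[1]))
            elif keyword == "print":
                print(" ".join(words[1:]), file=sys.stderr)
            elif keyword == "set":
                variables[words[1]] = words[2]
            elif keyword == "clear":
                variables.pop(words[1], None)
            else:
                raise ValueError(f"Unsupported batch line: {raw_line!r}")
    # Leaving the with-block waits for any commands still running.
    return [future.result() for future in submitted]
```

A thread pool fits here because `tbatch` runs commands as threads. With a pool, `commit-batch` maps naturally onto `concurrent.futures.wait`: commands after the barrier are not submitted until every earlier command has finished.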
* `gam batch` - gam commands are run as processes; gam csv commands are not allowed in the batch file
* `gam tbatch` - gam commands are run as threads; gam csv commands are allowed in the batch file

```
gam batch |-|(gdoc ) [charset ] [showcmds []]
gam tbatch |-|(gdoc ) [charset ] [showcmds []]
```
* `` - A flat file containing Gam commands
* `-` - Gam commands coming from stdin
* `gdoc ` - A Google Doc containing Gam commands
* `showcmds` - Write `timestamp,command number/number of commands,command` to stderr when each command starts; write `timestamp,command number/number of commands,complete` to stderr when each command completes

Batch files can contain the following types of lines:
* Blank lines - Ignored
* \# Comment line - Ignored
* gam \ - Execute a GAM command
* commit-batch
    * GAM waits for all running GAM commands to complete
    * GAM continues
* commit-batch \
    * GAM waits for all running GAM commands to complete
    * GAM prints \ and waits for the user to press any key
    * GAM continues
* sleep \ - Batch processing will suspend for \ seconds before the next command line is processed
    * To be effective, this should immediately follow commit-batch
* print \ - Print \ on stderr
* set \ \
    * Subsequent lines will have %\% replaced with \
* clear \
    * Subsequent lines will not be scanned for %\%

Tbatch files can also contain the following line:
* execute \ \ - Execute an arbitrary command; use the full path to specify \

### Example
* You need to create accounts for your new students and assign them to groups based on their graduation year.
* You have a CSV file NewStudents.csv with columns: Email,First,Last,GradYear,Password
* You have a batch file NewStudents.bat containing these commands:
```
gam csv NewStudents.csv gam create user "~Email" firstname "~First" lastname "~Last" org "/Students/~~GradYear~~" password "~Password"
commit-batch
gam update group seniors sync members ou /Students/2020
gam update group juniors sync members ou /Students/2021
gam update group sophomores sync members ou /Students/2022
gam update group highschool sync members ous "'/Students/2020','/Students/2021','/Students/2022'"
```
* Execute the batch file
```
gam redirect stdout ./NewStudents.out redirect stderr ./NewStudents.err tbatch NewStudents.bat showcmds
```

## CSV files
```
gam csv |-|(gsheet )|(gdoc ) [charset ] [warnifnodata]
        [columndelimiter ] [noescapechar ] [quotechar ] [fields ]
        (matchfield|skipfield )* [showcmds []]
        [skiprows ] [maxrows ]
        gam
gam loop |-|(gsheet )|(gdoc ) [charset ] [warnifnodata]
        [columndelimiter ] [noescapechar ] [quotechar ] [fields ]
        (matchfield|skipfield )* [showcmds []]
        [skiprows ] [maxrows ]
        gam
```
* `gam csv` - Use parallel processing
* `gam loop` - Use serial processing
* `` - A CSV file and the one or more columns that contain data
* `-` - The one or more columns that contain data from stdin
* `gsheet ` - A Google Sheet and the one or more columns that contain data
* `gdoc ` - A Google Doc and the one or more columns that contain data
* `columndelimiter ` - Columns are separated by ``; if not specified, the value of `csv_input_column_delimiter` from `gam.cfg` is used
* `noescapechar ` - Should `\` be ignored as an escape character; if not specified, the value of `csv_input_no_escape_char` from `gam.cfg` is used
* `quotechar ` - The column quote character is ``; if not specified, the value of `csv_input_quote_char` from `gam.cfg` is used
* `fields ` - The column headings of a CSV file that does not contain column headings.
* `(matchfield|skipfield )*` - The criteria to select rows from the CSV file; can be used multiple times; if not specified, all rows are selected
* `showcmds` - Write `timestamp,command number/number of commands,command` to stderr when each command starts; write `timestamp,command number/number of commands,complete` to stderr when each command completes
* `skiprows ` - Skip filtered rows from the CSV file/Google Sheet.
    * `skiprows 0` - All rows are processed; this is the default
    * `skiprows N` - The first N filtered rows are skipped
* `maxrows ` - Limit the number of filtered rows processed from the CSV file/Google Sheet after any skipped rows.
    * `maxrows 0` - All rows are processed; this is the default
    * `maxrows N` - N filtered rows are processed

### Use CSV file values in command line
You can make substitutions in `` with values from the CSV file.
- Reference the field xxx with `~xxx` if the argument contains no other text
- Reference the field xxx with `~~xxx~~` if the argument contains other text
- An argument containing exactly `~xxx` is replaced by the value of field xxx
- An argument containing instances of `~~xxx~~` has each `~~xxx~~` replaced by the value of field xxx
- An argument containing instances of `~~xxx~!~pattern~!~replacement~~` has each `~~xxx~!~pattern~!~replacement~~` replaced by `re.sub(pattern, replacement, value of field xxx)`

See: https://docs.python.org/3/library/re.html

If an argument specifies a file path that starts with `~`, e.g., `targetfolder "~/Documents/GamWork"`, GAM flags it as an error:
```
ERROR: Header "/Documents/GamWork/" not found in CSV headers of "Owner,id,title".
```
To avoid the error, put a space in front of the `~`: `targetfolder " ~/Documents/GamWork"`.
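The substitution rules above can be modeled in a few lines of Python. This is a hypothetical sketch, not GAM's code. The names `FIELD_REF` and `substitute` are invented for illustration. A CSV row is assumed to be a `dict`, as `csv.DictReader` yields. GAM's real parser and error messages may differ. The sketch also shows why `"~/Documents/GamWork"` is treated as a field reference while `" ~/Documents/GamWork"` is not.

```python
import re

# Hypothetical pattern for ~~field~~ and ~~field~!~pattern~!~replacement~~.
FIELD_REF = re.compile(r"~~([^~]+)(?:~!~(.*?)~!~(.*?))?~~")


def substitute(arg, row):
    """Return arg with CSV field references replaced by values from row."""
    if arg.startswith("~") and not arg.startswith("~~"):
        # Whole-argument reference: ~field (this is what trips up "~/path").
        field = arg[1:]
        if field not in row:
            raise KeyError(f'Header "{field}" not found in CSV headers')
        return row[field]

    def expand(match):
        field, pattern, replacement = match.groups()
        value = row[field]
        if pattern is None:
            return value  # plain ~~field~~
        # ~~field~!~pattern~!~replacement~~ -> re.sub on the field value
        return re.sub(pattern, replacement, value)

    return FIELD_REF.sub(expand, arg)


# Hypothetical row from the Users.csv example in this section.
row = {"primaryEmail": "jsmith@example.com", "Street": "1 Main St",
       "City": "Springfield", "State": "IL", "ZIP": "62701"}
print(substitute("~primaryEmail", row))  # jsmith@example.com
print(substitute("~~Street~~, ~~City~~, ~~State~~ ~~ZIP~~", row))  # 1 Main St, Springfield, IL 62701
print(substitute(r"~~primaryEmail~!~^(.+)@(.+)$~!~\1 AT \2~~", row))  # jsmith AT example.com
```

The replacement text is passed straight to `re.sub`, so group references such as `\1` and `\2` work as they do in Python.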
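The row-selection options in the CSV option list above (`matchfield`, `skipfield`, `skiprows`, `maxrows`) can also be sketched in Python. This is a hypothetical model, not GAM's code; `select_rows` is an invented name. It assumes two things. First, a row is kept only when every `matchfield` regular expression is found in its field and no `skipfield` expression is. Second, matching uses `re.search`. GAM's exact semantics may differ.

```python
import csv
import io
import re
from itertools import islice


def select_rows(rows, matches=(), skips=(), skiprows=0, maxrows=0):
    """Hypothetical model of matchfield/skipfield/skiprows/maxrows.

    matches and skips are (field_name, regex) pairs.
    """
    match_rules = [(field, re.compile(rx)) for field, rx in matches]
    skip_rules = [(field, re.compile(rx)) for field, rx in skips]

    def keep(row):
        # Missing or empty fields are treated as "" (assumption).
        return (all(rx.search(row.get(field) or "") for field, rx in match_rules)
                and not any(rx.search(row.get(field) or "") for field, rx in skip_rules))

    filtered = (row for row in rows if keep(row))
    # skiprows and maxrows count filtered rows; 0 means no limit.
    stop = skiprows + maxrows if maxrows > 0 else None
    return list(islice(filtered, skiprows, stop))


# Hypothetical data: keep GradYear 2020, skip the first match, process one row.
data = ("Email,GradYear\na@example.com,2020\nb@example.com,2021\n"
        "c@example.com,2020\nd@example.com,2020\n")
rows = list(csv.DictReader(io.StringIO(data)))
selected = select_rows(rows, matches=[("GradYear", "^2020$")], skiprows=1, maxrows=1)
print([row["Email"] for row in selected])  # ['c@example.com']
```

Applying `skiprows` and `maxrows` after filtering, via `itertools.islice`, mirrors the wording above: both count filtered rows, not raw rows.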
### Example
* You need to update the work addresses of a set of users
* You want a note field that shows their email address as name AT domain.com
* You have a CSV file Users.csv with columns: primaryEmail,Street,City,State,ZIP
```
gam csv Users.csv gam update user "~primaryEmail" address type work unstructured "~~Street~~, ~~City~~, ~~State~~ ~~ZIP~~" primary note text_plain "~~primaryEmail~!~^(.+)@(.+)$~!~\1 AT \2~~"
```
* You want to do the above using a Google Sheet
```
gam csv gsheet "" gam update user "~primaryEmail" address type work unstructured "~~Street~~, ~~City~~, ~~State~~ ~~ZIP~~" primary note text_plain "~~primaryEmail~!~^(.+)@(.+)$~!~\1 AT \2~~"
```

## CSV files with redirection and select
You should use the `multiprocess` option on any redirected files: `csv`, `stdout`, `stderr`.
```
gam redirect csv ./filelistperms.csv multiprocess csv Users.csv gam user "~primaryEmail" print filelist fields id,name,mimetype,basicpermissions
gam redirect csv - multiprocess todrive csv Users.csv gam user "~primaryEmail" print filelist fields id,name,mimetype,basicpermissions
```
If you want to select a `gam.cfg` section for the command, you can either select the section at the outer `gam` and save it, or select the section at the inner `gam`.
```
gam select  save redirect csv ./filelistperms.csv multiprocess csv Users.csv gam user "~primaryEmail" print filelist fields id,name,mimetype,basicpermissions
gam redirect csv ./filelistperms.csv multiprocess csv Users.csv gam select  user "~primaryEmail" print filelist fields id,name,mimetype,basicpermissions
gam select  save redirect csv - multiprocess todrive csv Users.csv gam user "~primaryEmail" print filelist fields id,name,mimetype,basicpermissions
gam redirect csv - multiprocess todrive csv Users.csv gam select  user "~primaryEmail" print filelist fields id,name,mimetype,basicpermissions
```

## Automatic batch processing
You can enable automatic batch (parallel) processing when issuing commands of the form `gam ...`.
In the following example, if the number of users in group sales@domain.com exceeds 1, the `print filelist` command is processed in parallel.
```
gam config auto_batch_min 1 redirect csv ./filelistperms.csv multiprocess group sales@domain.com print filelist fields id,name,mimetype,basicpermissions
gam config auto_batch_min 1 redirect csv - multiprocess todrive group sales@domain.com print filelist fields id,name,mimetype,basicpermissions
```
With automatic batch processing, you should use the `multiprocess` option on any redirected files: `csv`, `stdout`, `stderr`.

If you want to select a `gam.cfg` section for the command, you must both select and save it for the command to be processed correctly.
```
gam select  save config auto_batch_min 1 redirect csv ./filelistperms.csv multiprocess group sales@domain.com print filelist fields id,name,mimetype,basicpermissions
```

## Process Google Sheet commands and save results
You want to process data from a Google Sheet tab and save the results to another tab in the same sheet.

Make a Google Sheet with two tabs, Commands and Results, and get the file ID and the two tab IDs.

Put your command data in the Commands tab.

Run your command and write the results to Results.txt:
```
gam redirect stdout ./Results.txt multiprocess redirect stderr stdout csv gsheet user@domain.com id: gam ... Command
```
Upload Results.txt to the Results tab of the sheet:
```
gam user user@domain.com update drivefile localfile Results.txt retainname gsheet id:
```