# Bulk Processing - [Introduction](#introduction) - [Python Regular Expressions](Python-Regular-Expressions) - [GAM Configuration](gam.cfg) - [Meta Commands and File Redirection](Meta-Commands-and-File-Redirection) - [Definitions](#definitions) - [Batch files](#batch-files) - [CSV files](#csv-files) - [CSV files with redirection and select](#csv-files-with-redirection-and-select) - [Automatic batch processing](#automatic-batch-processing) ## Introduction Batch and CSV file processing can improve performance by executing Gam commands in parallel. The variables `num_threads`, `num_tbatch_threads` and `auto_batch_min` in `gam.cfg` control parallelism. ## Definitions * [Command data from Google Docs/Sheets/Storage](Command-Data-From-Google-Docs-Sheets-Storage) `gdoc ` and `gsheet ` ## Batch files There are two types of batch processing, one that uses processes and one that uses threads. Using processes is higher performance but `gam csv` commands are not supported. * `gam batch` - gam commands are run as processes, gam csv commands are not allowed in the batch file * `gam tbatch` - gam commands are run as threads, gam csv commands are allowed in the batch file ``` gam batch |-|(gdoc ) [charset ] [showcmds []] gam tbatch |-|(gdoc ) [charset ] [showcmds []] ``` * `` - A flat file containing Gam commands * `-` - Gam commands coming from stdin * `gdoc ` - A Google Doc containing Gam commands * `showcmds` - Write `timestamp,command number/number of commands,command` to stderr when each command starts; write `timestamp, command number/numberof commands,complete` to stderr when command completes Batch files can contain the following types of lines: * Blank lines - Ignored * \# Comment line - Ignored * gam \ - Execute a GAM command * commit-batch * GAM waits for all running GAM commands to complete * GAM continues * commit-batch \ * GAM waits for all running GAM commands to complete * GAM prints \ and waits for the user to press any key * GAM continues * print \ - Print \ on stderr * set \ \ * Subsequent lines will have %\% replaced with \ * clear \ * Subsequent lines will not be scanned for %\% Tbatch files can also contain the following line: * execute \ \ - Execute an arbitrary command; use the full path to specify \ ### Example * You need to create accounts for your new students and assign them to groups based on their graduation year. * You have a CSV file NewStudents.csv with columns: Email,First,Last,GradYear,Password * You have a batch file NewStudents.bat containing these commands: ``` gam csv NewStudents.csv gam create user ~Email firstname ~First lastname ~Last org "/Students/~~GradYear~~" password ~Password commit-batch gam update group seniors sync members ou /Students/2020 gam update group juniors sync members ou /Students/2021 gam update group sophomores sync members ou /Students/2022 gam update group highschool sync members ous "'/Students/2020','/Students/2021','/Students/2022'" ``` * Execute the batch file ``` gam redirect stdout ./NewStudents.out redirect stderr ./NewStudents.err tbatch NewStudents.bat showcmds ``` ## CSV files ``` gam csv |-|(gsheet )|(gdoc ) [charset ] [warnifnodata] [columndelimiter ] [quotechar ] [fields ] (matchfield|skipfield )* [showcmds []] [maxrows ] gam gam loop |-|(gsheet )|(gdoc ) [charset ] [warnifnodata] [columndelimiter ] [quotechar ] [fields ] (matchfield|skipfield )* [showcmds []] [maxrows ] gam ``` * `gam csv` - Use parallel processing * `gam loop` - Use serial processing * `` - A CSV file and the one or more columns that contain data * `-` - The one or more columns that contain data from stdin * `gsheet ` - A Google Sheet and the one or more columns that contain data * `gdoc ` - A Google Doc and the one or more columns that contain data * `columndelimiter ` - Columns are separated by ``; if not specified, the value of `csv_input_column_delimiter` from `gam.cfg` will be used * `quotechar ` - The column quote characer is ``; if not specified, the value of `csv_input_quote_char` from `gam.cfg` will be used * `fields ` - The column headings of a CSV file that does not contain column headings. * `(matchfield|skipfield )*` - The criteria to select rows from the CSV file; can be used multiple times; if not specified, all rows are selected * `showcmds` - Write `timestamp,command number/number of commands,command` to stderr when each command starts; write `timestamp, command number/numberof commands,complete` to stderr when command completes * `maxrows ` - Limit the number of filtered rows processed from the CSV file/Google Sheet. * `maxrows 0` - All rows are processed, this is the default * `maxrows N` - N filtered rows are processed ### Use CSV file values in command line You can make substitutions in `` with values from the CSV file. - Reference the field xxx with `~xxx` if the argument contains no other text - Reference the field xxx with `~~xxx~~` if the argument contains other text - An argument containing exactly `~xxx` is replaced by the value of field xxx - An argument containing instances of `~~xxx~~` has `~~xxx~~` replaced by the value of field xxx - An argument containing instances of `~~xxx~!~pattern~!~replacement~~` has `~~xxx~!~pattern~!~replacement~~` replaced by re.sub(pattern, replacement, value of field xxx) See: https://docs.python.org/3/library/re.html If an argument is specifying a file path and it starts with a `~`, e.g., `targetfolder "~/Documents/GamWork"`, GAM will flag it as an error: ``` ERROR: Header "/Documents/GamWork/" not found in CSV headers of "Owner,id,title". ``` Put a space in front of the `~`: `targetfolder " ~/Documents/GamWork"` to avoid the error. ### Example * You need to update the work addresses of a set of users * You want a note field that shows their email address as name AT domain.com * You have a CSV file Users.csv with columns: primaryEmail,Street,City,State,ZIP ``` gam csv Users.csv gam update user ~primaryEmail address type work unstructured "~~Street~~, ~~City~~, ~~State~~ ~~ZIP~~" primary note text_plain "~~primaryEmail~!~^(.+)@(.+)$~!~\1 AT \2~~" ``` * You want to do the above using a Google Sheet ``` gam csv gsheet "" gam update user "~primaryEmail" address type work unstructured "~~Street~~, ~~City~~, ~~State~~ ~~ZIP~~" primary note text_plain "~~primaryEmail~!~^(.+)@(.+)$~!~\1 AT \2~~" ``` ## CSV files with redirection and select You should use the `multiprocess` option on any redirected files: `csv`, `stdout`, `stderr`. ``` gam redirect csv ./filelistperms.csv multiprocess csv Users.csv gam user ~primaryEmail print filelist fields id,title,permissions,owners.emailaddress ``` If you want to select a `gam.cfg` section for the command, you can select the section at the outer `gam` and save it or select the section at the inner `gam`. ``` gam select
save redirect csv ./filelistperms.csv multiprocess csv Users.csv gam user ~primaryEmail print filelist fields id,title,permissions,owners.emailaddress gam redirect csv ./filelistperms.csv multiprocess csv Users.csv gam select
user ~primaryEmail print filelist fields id,title,permissions,owners.emailaddress ``` ## Automatic batch processing You can enable automatic batch (parallel) processing when issuing commands of the form `gam ...`. In the following example, if the number of users in group sales@domain.com exceeds 1, then the `print filelist` command will be processed in parallel. ``` gam config auto_batch_min 1 redirect csv ./filelistperms.csv multiprocess group sales@domain.com print filelist fields id,title,permissions,owners.emailaddress ``` With automatic batch processing, you should use the `multiprocess` option on any redirected files: `csv`, `stdout`, `stderr`. If you want to select a `gam.cfg` section for the command, you must select and save it for it to be processed correctly. ``` gam select
save config auto_batch_min 1 redirect csv ./filelistperms.csv multiprocess group sales@domain.com print filelist fields id,title,permissions,owners.emailaddress ```