# Week 2 Discussion - Working with Unix

## Self-Help

```{exercise}
:label: self_help_cmd_man

What could the command `man` stand for (as an abbreviation)?
```

```{solution} self_help_cmd_man
:class: dropdown

From `man man`: an interface to the system reference *manuals*
```

```{exercise}
:label: self_help_man_section_numbers

Browse the manual page for the command `man`. `man` supports section numbers from 1 to 9. What could be the motivation behind this?

Hint: try `man 5 passwd` and `man passwd`
```

```{solution} self_help_man_section_numbers
:class: dropdown

`man` also contains documentation about configuration files like `/etc/passwd`. So the keyword `passwd` could refer either to the configuration file `/etc/passwd` or to the command `passwd`. Sections are used to differentiate between these types of entries.
```

```{exercise}
:label: self_help_man_passwd_vs_etc_passwd

Why does `man passwd` show the manual for the command `passwd` but not for the configuration file `/etc/passwd`? Browse `man man` and find the section where this is explained.
```

```{solution} self_help_man_passwd_vs_etc_passwd
:class: dropdown

From the section [`DEFAULTS`](https://man.archlinux.org/man/man.1#DEFAULTS):

> The order of sections to search ... . By default it is as follows: 1 ... 8 ... 5

Section 1 (commands) comes before section 5 (file formats) in this order, so `man passwd` finds the command first.
```

```{exercise}
:label: self_help_man_browsing_shortcuts

Which keys are very useful when browsing a man page?
```

```{solution} self_help_man_browsing_shortcuts
:class: dropdown

- `/` for searching, then `n` and `Shift+n` for jumping to the next and previous match, respectively
- `q` for quitting
```

```{exercise}
:label: self_help_apropos_naming_motivation

Why might the authors of the program `apropos` have chosen this name?
```

```{solution} self_help_apropos_naming_motivation
:class: dropdown

[Cambridge Dictionary - apropos](https://dictionary.cambridge.org/us/dictionary/english/apropos)

> used to introduce something that is related to or connected with something that has just been said:
> I had an email from Sally yesterday - apropos (of) which, did you send her that article?

`apropos` lists manual pages related to a keyword, which could be the reason why the authors chose this name.
```

```{exercise}
:label: self_help_comparing_different_methods

You have a directory `2021-Cambridge-travel` with text and image files. The former have the file extensions `txt` and `odt`, and the latter `jpg`. You want to compress the directory using the tool `tar` and the algorithm `lzma`. Try the following approaches to achieve your goal. Which approach do you like most?

- using `man` and `apropos`
- using a search engine
- `curl cheat.sh/COMMAND_LINE_TOOL`, e.g., `curl cheat.sh/tar`
```

```{solution} self_help_comparing_different_methods
:class: dropdown

Personally I like to search the manual pages first. If the manual page does not exist, is too complex to read, or does not have examples, then I proceed with a web search.

For very common tasks a web search may be much faster, but some results on the web may be outdated. The reason is that the first results you get are determined by an algorithm which probably favors the most-clicked ones, and it takes a while until a more up-to-date page moves to the top. Compared to web results, manual pages tend to be up-to-date.

A major disadvantage of manual pages is that they traditionally have a strict structure which begins with the description and has some examples at the end, if you are lucky. Most of the time you just want a short example of how to use the tool.

I recently discovered [cheat.sh](https://cheat.sh), which provides examples of the command (like *cheat sheets*) and thus overcomes this disadvantage of manual pages.
For example, compare `man cut` with [cheat.sh/cut](https://cheat.sh/cut)
```

````{exercise}
:label: self_help_builtin_echo_vs_standalone_echo

You look at the manual page for `echo` and see the following options:

> ...
> --help display this help and exit
> --version output version information and exit
> ...

Then you try the following command:

```bash
echo --version
```

which outputs `--version` instead of the version number of `echo`. What could be the reason?
````

````{solution} self_help_builtin_echo_vs_standalone_echo
:class: dropdown

Some basic commands like `echo`, `alias`, `type` are [builtin in Bash](https://www.gnu.org/software/bash/manual/html_node/Shell-Builtin-Commands.html). The documentation for builtin commands is in the Bash manual (`man bash`).

But what does `man echo` show you then? It shows the documentation of the standalone `echo` program, which is located at `/usr/bin/echo`:

```sh
$ /usr/bin/echo --version
# outputs:
# echo (GNU coreutils) 9.0
# ...
```

`/usr/bin/echo` would be used if the shell you are using does not provide its own `echo`. Shells tend to integrate `echo` in their core, because `echo` is a very frequently used command. A builtin does not require starting a separate program, so it may run faster than `/usr/bin/echo`.

You may be asking:

> How can I know if a command that I am using is a builtin or a standalone command?

You can find out by using `type`. For example `type echo` in Bash outputs:

> echo is a shell builtin

But `type grep`:

> grep is /usr/bin/grep
````

## Get Wild

```{exercise}
:label: get_wild_iso8601

- What is ISO 8601?
- What is its advantage?
```

```{solution} get_wild_iso8601
:class: dropdown

[ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) is an international standard for communicating date and time. Date notations differ between countries, for example [in the USA the date is written as month-day-year](https://en.wikipedia.org/wiki/Date_and_time_notation_in_the_United_States).
Without an agreed order, a date like `01-02-2021` can be read as January 2 or as February 1. ISO 8601 fixes the order as year-month-day (`2021-01-02`), which removes this ambiguity.
```

```{exercise}
:label: get_wild_searching_for_date_command

Find a command which generates today's date in ISO 8601 format.

Optional challenge: Use only `apropos` and `man` as an exercise (instead of a web search). Hint: `apropos -s 1` only searches in section 1 (commands).
```

```{solution} get_wild_searching_for_date_command
:class: dropdown

`date -I`
```

```{exercise}
:label: get_wild_globbing_wildcard

- What is a *wildcard*?
- When would you use a wildcard?
```

```{solution} get_wild_globbing_wildcard
:class: dropdown

A (single) character which represents a set of characters. In a directory with different file types, a wildcard could be used to select only the images, e.g., `ls *.jpg`
```

```{exercise}
:label: get_wild_ls_globbing

You have a folder with files that have the following format: `date-name.filesuffix` (date in ISO 8601 format). How would you list the files which have a year from 2010 to 2019 and the month May in their name?
```

````{solution} get_wild_ls_globbing
:class: dropdown

```bash
# 201? matches the years 2010 to 2019, -05- matches the month May
ls 201?-05-*
```
````

## Search

```{exercise}
:label: search_regex

Which regex characters do you know, and what does each of them represent?
```

```{solution} search_regex
:class: dropdown

- single character `.`
- alternation `|`
- quantifiers `+`, `*`, `?`, `{n,m}`
- subexpressions `(...)`
- bracket expressions `[...]`
- negation `[^...]`
- anchors `^`, `$`
- escape character `\`
- character classes `[:alnum:]`, `[:alpha:]`, ...
```

```{exercise}
:label: search_regex_charsets

What does `[:alnum:]` mean in `grep`?
```

```{solution} search_regex_charsets
:class: dropdown

According to the [`grep` man-page](https://man.archlinux.org/man/core/grep/grep.1.en#Character_Classes_and_Bracket_Expressions) `[:alnum:]` is a character class, and as the name suggests, `[:alnum:]` represents an alphanumeric character.

> ... For example, [[:alnum:]] means the character class of numbers and letters in the current locale.
> In the C locale and ASCII character set encoding, this is the same as [0-9A-Za-z].
```

```{exercise}
:label: search_regex_extra_exercises

Do the exercises on [RegexOne interactive exercises](https://regexone.com)

If you are looking for more advanced exercises, look at [Regex Crossword](https://regexcrossword.com)
```

```{exercise}
:label: search_grep_vs_egrep

What is the difference between `grep` and `egrep`?
```

```{solution} search_grep_vs_egrep
:class: dropdown

`grep` uses *basic regular expressions* (BRE) by default. In basic regular expressions the metachars `?`, `+`, `{`, `|`, `(`, and `)` lose their special meaning and become literal chars. To get the special meaning we have to use their backslash-escaped versions `\?`, `\+`, `\{`, `\|`, `\(`, and `\)`. Other metachars like `-` (when used in a bracket expression), `*`, `^`, `$`, `[`, `]`, and `.` do not have to be escaped.

`egrep` is equivalent to `grep -E`. ``--extended-regexp`` or ``-E`` switches to ERE (*extended regular expression*) mode, in which we can use many regex metachars without escaping.

Note that [`grep` man - description](https://man.archlinux.org/man/core/grep/grep.1.en#DESCRIPTION) states:

> the variant programs egrep and fgrep are the same as grep -E and grep -F, respectively. These variants are deprecated, but are provided for backward compatibility.

We should therefore use `grep -E` instead of `egrep`.
```

```{exercise}
:label: search_find

What is the difference between:

1. `find . -name lsi` and `find -name lsi`?
2. `find -iname lsi` and `find -name lsi`?
```

````{solution} search_find
:class: dropdown

1. From [`find` man-page - description](https://man.archlinux.org/man/find.1#DESCRIPTION):

   > ... If no starting-point is specified, `.` is assumed.

   There is no difference. Both commands search recursively (not only in the current directory but also in its subdirectories) for files with the name `lsi`.

2.
   `-iname` stands for case-insensitive name, so the pattern `lsi` additionally matches all combinations of capital and small letters like `Lsi`, `lSI`, etc.
````

```{exercise}
:label: search_ls_and_shell_expanding_vs_find

What is the difference between `ls *.jpg` and `find -iname '*.jpg'`?
```

````{solution} search_ls_and_shell_expanding_vs_find
:class: dropdown

Practical difference: The former lists the jpg files in the current directory. The latter searches not only in the current directory, but also in subdirectories (also called recursive search), and it ignores case.

Moreover, there is a subtle but important difference: In the former command Bash first expands the pattern `*.jpg` to the matching filenames in the current folder, and `ls` then lists the expanded filenames, e.g., `ls f1.jpg f2.jpg g1.jpg`. In the latter no shell expansion is done, because the pattern is quoted (`'*.jpg'`), which protects it from shell expansion. The pattern `*.jpg` is then interpreted directly by `find`.

Note that Bash leaves an unquoted pattern like `*.jpg` untouched if there are no filenames that match this glob pattern. That is the reason why the following command fails with the given error message:

```console
$ ls *non-existing-suffix
ls: cannot access '*non-existing-suffix': No such file or directory
```

In this example we see that `ls` receives `*non-existing-suffix` untouched by Bash.

More about shell expansion follows in the chapter Shell Programming.
````

```{exercise}
:label: search_glob_vs_regex

What is the difference between *globbing* and *regular expressions*?
```

````{solution} search_glob_vs_regex
:class: dropdown

[Glob patterns or globbing](https://en.wikipedia.org/wiki/Glob_(programming)) are mainly used for describing filenames and file paths, whereas regular expressions can be used for any string. Glob patterns tend to be shorter than regular expression patterns, but less powerful in return.
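
To make the difference in notation concrete, the following sketch expresses the same condition ("name ends in `.jpg`") once as a glob pattern and once as an extended regular expression. The helper names `matches_glob` and `matches_regex` and the two filenames are made up for this illustration; they are not part of any tool.

```bash
# Hypothetical helpers, for illustration only

matches_glob() {
    # Glob: `*` on its own means "any string", `.` is a literal dot
    case $1 in
        *.jpg) return 0 ;;
        *)     return 1 ;;
    esac
}

matches_regex() {
    # ERE: `.` means "any character", `*` is a quantifier,
    # so the literal dot has to be escaped and the pattern anchored
    printf '%s\n' "$1" | grep -Eq '^.*\.jpg$'
}

for name in holiday.jpg notes.txt; do
    if matches_glob "$name"; then echo "$name matches the glob"; fi
    if matches_regex "$name"; then echo "$name matches the regex"; fi
done
```

The regular expression needs more characters for the same condition, which is the price for its additional expressive power.
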
For example we cannot quantify a character using an interval in a glob pattern:

```bash
mktemp XXX.txt
mktemp XXXX.txt
mktemp XXXXX.txt
mktemp XXXXXX.txt

# Now try to find the files whose names (without suffix) are three to five characters long
ls ?{3,5}.txt # does not work
find -iname '?{3,5}.txt' # does not work either

# Regex helps:
find -regextype egrep -regex './.{3,5}\.txt'
```

Note that we used the regex type `egrep` in the last command because there are different regular expression syntaxes, and `-regex` in `find` uses the basic [`findutils-default` regular expression syntax](https://www.gnu.org/software/findutils/manual/html_node/find_html/findutils_002ddefault-regular-expression-syntax.html) by default, which does not support interval quantifiers.

If you are working with files on the shell, then glob patterns are sufficient most of the time.
````

````{exercise}
:label: search_find_globbing_problem

```bash
# create files and folders with random filenames
mktemp XXX.a
mktemp XXXX.a
mktemp XXX.c
DIR=$(mktemp -d XXX)
mktemp -p "$DIR" XXXXX.a
mktemp -p "$DIR" XXXXX.b
mktemp -p "$DIR" XXXXX.c

# searching for .a files
find -iname *.a # errors out
find -iname '*.a' # works

# searching for .b files
find -iname *.b # works
find -iname '*.b' # works

# searching for .c files
find -iname *.c # works but only shows a single file
find -iname '*.c' # works and shows all .c files
```

1. Why does the first `find` command error out but the second one works?
1. Why do both the third and fourth `find` commands work?
1. Why does the fifth command only show a single file compared to the sixth command?
````

````{solution} search_find_globbing_problem
:class: dropdown

1. In the first case the shell expands `*.a` to the two matching files, so `find` receives two filenames after `-iname`. In the second case the quoted pattern is not expanded. For a more detailed explanation see [“paths must precede expression” error message in find(1) manpages](https://man.archlinux.org/man/find.1#“paths_must_precede_expression”_error_message).
1.
   There are no `.b` files in the current directory, so Bash leaves the pattern unexpanded, and there is no difference between the third and fourth `find`.
1. The shell expands `*.c` to the only matching file in the current directory, so the command searches only for this filename.

For an elaborate explanation see [bash - find and globbing (and wildcards) - Stack Exchange](https://unix.stackexchange.com/a/429309)
````

## Configure

```{exercise}
:label: configure_check_bash_history

You have a file called `diary.txt`. Somehow the file seems to be corrupted. You suspect that your sibling could have played with your shell and edited the file to annoy you. Do you have an idea how to find out what happened to `diary.txt`?
```

````{solution} configure_check_bash_history
:class: dropdown

```bash
grep diary.txt ~/.bash_history
```
````

```{exercise}
:label: configure_alias_cmd

At first sight the Unix command line may seem very clunky and inefficient. For example navigating to directories using `cd` and `cd ..` may take more time compared to navigating in a file explorer.

Imagine that you have a file called `projects/inf1/notes.md` that you access very often. How could you access this file very efficiently using the shell?
```

````{solution} configure_alias_cmd
:class: dropdown

Create an alias using `alias` in the shell configuration file, e.g., `.bash_profile` for Bash:

```bash
alias inf1notes='vim ~/projects/inf1/notes.md'
```
````

## Differentiate

```{exercise}
:label: differentiate_diff_vs_sdiff

What is the difference between `diff` and `sdiff`?
```

```{solution} differentiate_diff_vs_sdiff
:class: dropdown

`diff` outputs only the lines which differ between files. `sdiff` shows a side-by-side comparison.

Advice: also try `vimdiff` if you want to show differences and edit files at the same time.
```

```{exercise}
:label: differentiate_checksum_integrity_concept

You have 4 GB of sequencing data which you want to store in the cloud for the next five years.
How can you ensure that the data is intact (not corrupted) when you download it after five years?
```

```{solution} differentiate_checksum_integrity_concept
:class: dropdown

Backups could help against data loss, but in case of corruption you may need at least three backups to find out which copy is the corrupted one.

A better solution is to create a checksum of the file and store the checksum along with the file. A large file is much more likely to be corrupted than its short checksum. After downloading, you recompute the checksum and compare it with the stored one to check whether the sequencing data is corrupted. In other words you check the *data integrity*.
```

```{exercise}
:label: differentiate_checksum_cmds

Find at least three checksum generation commands available in your shell
```

```{solution} differentiate_checksum_cmds
:class: dropdown

`shasum` for SHA1, `sha256sum` for SHA256, `md5sum` for MD5
```

## Pipes

```{exercise}
:label: pipes_three_cmds

Write a sequence of at least three commands which are piped together
```

````{solution} pipes_three_cmds
:class: dropdown

```bash
curl -sH "Accept: text/plain" https://icanhazdadjoke.com/ | tr '[a-z]' '[A-Z]' | cowsay
```
````

## Make

```{exercise}
:label: makefile_concepts_rule_and_dependencies

- What do *rule* and *dependency* mean in the context of `make`?
- What happens with a rule when we invoke `make`?
```

````{solution} makefile_concepts_rule_and_dependencies
:class: dropdown

The following depicts a rule:

```makefile
target: files-that-target-depends-on(dependencies)
	recipe-how-to-make-the-target
```

If we invoke `make`, then the recipe will be processed, but only if

- the target does not exist, or
- one of the dependencies has changed, so the target should be remade.

`make` treats a dependency as changed, and reexecutes the recipe, if the modification time of the dependency is newer than that of the target.
````

```{exercise}
:label: makefile_capitalize

You have three files in your directory, `seq1.txt`, `seq2.txt`, and `seq3.txt`, which contain sequences of letters.
You want to open these files using an ancient program which only supports reading capital letters.

- Write a makefile which creates the capital-letter versions of these files with the names `seq*-capitalized.txt`. You can use the `tr` command.
- Optional: Write a makefile which can capitalize any txt file and store it as `*-capitalized.txt`. Hint: use wildcards in the makefile
```

````{solution} makefile_capitalize
:class: dropdown

first version:

```makefile
all: \
	seq1-capitalized.txt \
	seq2-capitalized.txt \
	seq3-capitalized.txt

# $< is the first dependency (the input file), $@ is the target
seq1-capitalized.txt: seq1.txt
	tr '[a-z]' '[A-Z]' < $< > $@

seq2-capitalized.txt: seq2.txt
	tr '[a-z]' '[A-Z]' < $< > $@

seq3-capitalized.txt: seq3.txt
	tr '[a-z]' '[A-Z]' < $< > $@
```
````

````{solution} makefile_capitalize
:class: dropdown

optional version with wildcards:

```makefile
# Skip already capitalized files so that they are not capitalized again
SOURCE_FILES := $(filter-out %-capitalized.txt,$(wildcard *.txt))
TARGET_FILES := $(SOURCE_FILES:.txt=-capitalized.txt)

all: $(TARGET_FILES)

%-capitalized.txt: %.txt
	tr '[a-z]' '[A-Z]' < $< > $@
```
````

## Summary and reflection

```{exercise}
:label: learning_objectives

Did you reach the following learning objectives for this week? Discuss with your partner.

- Solve problems by consulting documentation
- Use wildcards on the command line to work with multiple files and folders
- Use grep, egrep, metacharacters, regular expressions, and find to search in files and directories
- Customize Bash
- Examine differences among files
- Use pipes to deploy the output of one command as the input of another command
```

## Last weeks' review

```{exercise}
:label: w2_last_weeks_review

Look at at least ten problems from the previous weeks. A short review of the previous weeks will reinforce what you have already learned.
```