[arch-general] grep

Jeanette C. julien at mail.upb.de
Mon Oct 14 20:23:49 UTC 2019


Hey hey Silvio,
hm this looks more like a challenge for a whole script. I can script, but I'm 
not always the most efficient.

If your .md files always look the same, i.e. each one contains the exact line 
"date: yyyy-mm-dd", and you can be sure that one language folder holds all 
articles because they are originally written in that language, I'd have an 
idea. Say your articles are all created in German:
# -h drops the file name prefix, so only the "date: ..." text is stored
grep -h -e "date: 2019-10-1" content/de/blog/*.md >orig.list
TOTAL=`wc -l <orig.list` # get number of entries (bash may overwrite $LINES itself)
# do the same for the other folders:
grep -h -e "date: 2019-10-1" content/en/blog/*.md >en.list
grep -h -e "date: 2019-10-1" content/fr/blog/*.md >fr.list
# complete for other folders

# now check
CURLINE=1
while [[ $CURLINE -le $TOTAL ]]; do
   CURDATE=`sed -n "${CURLINE}p" orig.list` # get an article date
   for FILE in en.list fr.list ru.list and_so_on; do
     COUNT=`grep -c -e "${CURDATE}" "${FILE}"`
     if [[ $COUNT -eq 0 ]]; then # not found in translation
       echo "${CURDATE} missing in ${FILE}"
     fi
   done
   let CURLINE=CURLINE+1 # go to next original date
done >missing.files # redirect once here, so earlier messages are not overwritten
rm *.list # remove your temporary files

You can beautify the echo line; at the moment this would write something
like:
date: 2019-10-12 missing in ru.list
But it should do. I'm sure a more elegant script can be written.
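
As one possible tidier variant, here is a rough sketch that skips the
temporary .list files. The function name missing_dates and its arguments are
made up for illustration, and it assumes a grep that supports -h (GNU grep
does) and that a translated article carries exactly the same "date: ..." line
as the original:

```shell
# Sketch only: missing_dates PATTERN ORIG_DIR TRANSLATION_DIR...
# Prints every date line found in ORIG_DIR that no .md file in a
# translation folder contains as an exact, whole line.
missing_dates() {
    pattern=$1
    orig_dir=$2
    shift 2
    for dir in "$@"; do
        # collect the distinct date lines of the original articles
        grep -h -e "$pattern" "$orig_dir"/*.md | sort -u |
        while IFS= read -r line; do
            # -F: fixed string, -x: whole line must match, -q: quiet
            if ! grep -q -x -F -e "$line" "$dir"/*.md 2>/dev/null; then
                echo "$line missing in $dir"
            fi
        done
    done
}
```

It could be called like this:
missing_dates "date: 2019-10-1" content/de/blog content/en/blog content/fr/blog >missing.files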

Best wishes,

Jeanette

-- 
  * Website: http://juliencoder.de - for summer is a state of sound
  * Youtube: https://www.youtube.com/channel/UCMS4rfGrTwz8W7jhC1Jnv7g
  * SoundCloud: https://soundcloud.com/jeanette_c
  * Twitter: https://twitter.com/jeanette_c_s
  * Audiobombs: https://www.audiobombs.com/users/jeanette_c
  * GitHub: https://github.com/jeanette-c

I thought love was just a tingling of the skin <3
(Britney Spears)

