Regex: last occurrence of all characters between two strings


I'm trying to extract the torrent name from torrent files. Without looking too deeply into how torrent files are structured, I noticed that I only need to match the last occurrence of all characters between two strings, which in my case are : and 12:piece lengthi.

Here is the beginning of the Arch Linux ISO torrent file:

d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi

I need to extract archlinux-2015.07.01-dual.iso, which is between : and 12:piece lengthi. I checked this pattern against other torrent files, and in my case it works. I can't figure out how to combine the regexes (?<=:)(.*)(?=12:piece lengthi) and :(?:.(?!:))+$, or whether they are even correct at all.
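Rather than combining the two patterns, one option is to keep the lookbehind and lookahead and forbid colons in between: with [^:]* instead of .*, a match can only start after the last colon before 12:piece lengthi. This is a sketch that assumes GNU grep built with PCRE support (-P is not available in every grep) and assumes the name itself contains no colon; extract_name is a hypothetical helper name.

```shell
#!/bin/sh
# Sample first line of the Arch Linux torrent, copied from the question
sample='d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi'

# Hypothetical helper: reads torrent data on stdin, prints the name.
#   (?<=:)                the match must be preceded by a colon
#   [^:]*                 the name: a run of non-colon characters, so the
#                         match cannot start before the last colon
#   (?=12:piece lengthi)  must be followed by the marker (not printed)
# -a treats binary input as text, -o prints only the matched part,
# -m 1 stops after the first matching line.
# LC_ALL=C makes matching byte-based, so invalid UTF-8 in real torrent
# files does not cause errors; multibyte names pass through unchanged.
extract_name() {
  LC_ALL=C grep -a -o -m 1 -P '(?<=:)[^:]*(?=12:piece lengthi)'
}

printf '%s\n' "$sample" | extract_name   # archlinux-2015.07.01-dual.iso
```

For a real file the call would be something like extract_name < file.torrent.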

I'm trying to do this in a bash script with grep, awk, sed, or anything else that works as a standard Linux command.

Final, fully working solution (thoroughly tested). It works with all kinds of non-standard characters, for example Cyrillic:

torrent_title=$(tr -d "\n" < "$filename" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/')
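Here is the same pipeline wrapped in a small function, with each stage commented. The name torrent_title_of is hypothetical, and one deliberate change from the one-liner is marked in the code: sed -n with the p flag, so a file that lacks the marker prints nothing instead of its whole binary contents. It assumes an iconv that supports -c (POSIX specifies it; glibc's iconv has it).

```shell
#!/bin/sh
# Hypothetical helper: print the torrent name of the file given as $1.
torrent_title_of() {
  # tr -d '\n'  join everything into one line so sed sees the name and
  #             the marker together
  # iconv -c    drop byte sequences that are not valid UTF-8 (such as the
  #             binary piece hashes); valid UTF-8 such as Cyrillic is kept
  # sed         the greedy ".*:" reaches the last colon before
  #             "12:piece lengthi"; \1 is the text in between, and the
  #             trailing ".*" discards the rest of the file
  # Change from the one-liner: -n plus the p flag print only when the
  # substitution succeeded, so a file without the marker yields nothing.
  tr -d '\n' < "$1" |
    iconv -f utf-8 -t utf-8 -c |
    sed -n 's/.*:\(.*\)12:piece lengthi.*/\1/p'
}

# Usage: torrent_title=$(torrent_title_of "$filename")
```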

Update: All the suggestions work, but torrent files are binary files. For example, I tried grep --text, and piping strings file to grep or sed, but random strings from the binary data mess up the output.

Update 2 (solved): the final command is:

head -n 1 file.torrent | strings | tr -d "\n\r" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/'

I figured out that the info is only on the first line of the file. In my original example I forgot to copy a few more strings at the end:

d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi524288e6:pieces25840:

which are part of the first line, so I needed to slightly change hek2mgl's sed answer by adding a trailing .* to the pattern.
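To see why the trailing .* matters, this sketch runs both sed patterns on the full first line quoted above (the variable names are illustrative):

```shell
#!/bin/sh
# The full first line of the Arch Linux torrent, as quoted above
line='d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi524288e6:pieces25840:'

# Original answer: the match ends at the marker, so "524288e6:pieces25840:"
# is left behind after the name
without_tail=$(printf '%s\n' "$line" | sed 's/.*:\(.*\)12:piece lengthi/\1/')

# Modified pattern: the trailing .* also consumes everything after the marker
with_tail=$(printf '%s\n' "$line" | sed 's/.*:\(.*\)12:piece lengthi.*/\1/')

printf 'without .*: %s\n' "$without_tail"  # archlinux-2015.07.01-dual.iso524288e6:pieces25840:
printf 'with .*:    %s\n' "$with_tail"     # archlinux-2015.07.01-dual.iso
```

In both cases the greedy .*: stops at the colon of "name29:", because a later colon would leave no "12:piece lengthi" for the rest of the pattern to match.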

Update 3: The right way to do this is to use a proper parser; I learned that the hard way.
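As a step toward parsing: torrent files are bencoded, and every string is written as <length>:<bytes>, so "4:name29:" announces a 29-byte name. The sketch below reads exactly that many bytes instead of guessing where the name ends, so colons or "12" inside the name do no harm. It is still a heuristic, not a full bencode parser: the function name torrent_name and the 64 KiB read limit are assumptions for illustration, and it assumes the first "4:name" in the file is the info dictionary's key.

```shell
#!/bin/sh
# Hypothetical helper: print the "name" value of the torrent file $1
# using its bencode length prefix. Runs in a subshell so LC_ALL=C
# does not leak into the caller.
torrent_name() (
  # Byte-based matching: bencode lengths count bytes, not characters
  LC_ALL=C
  export LC_ALL
  # Assumed limit: the name appears within the first 64 KiB.
  # NUL bytes are removed because shell variables cannot hold them.
  data=$(head -c 65536 "$1" | tr -d '\000')
  case $data in
    *4:name*) ;;
    *) exit 1 ;;               # no "name" key found
  esac
  rest=${data#*4:name}         # text after the first "4:name"
  len=${rest%%:*}              # digits before the next colon, e.g. 29
  case $len in
    ''|*[!0-9]*) exit 1 ;;     # not a valid bencode string length
  esac
  rest=${rest#*:}              # drop the "29:" length prefix
  # Print exactly $len bytes of the name
  printf '%s' "$rest" | head -c "$len"
)

# Usage: torrent_title=$(torrent_name "$filename")
```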


I would use sed for that, like this:

sed 's/.*:\(.*\)12:piece lengthi/\1/' input.torrent