Friday, November 05, 2010

Make grep stop after the first match - The macosxhints Forums

Question: I'm trying to use grep to find all the href links in my asp files (except in-page links that start with #).
I want it to show me the href="blahblah" part only - nothing before href or after href's closing quote.
Here's what I have so far. This:
grep -i -r -o 'href="[^#].*"' rentals.asp
gives me the following. I want only the part up to and including href's closing quote; everything after that quote is unwanted:
href="<%= webaddr %>/rentalStyle.css" type="text/css"
href="<%= webaddr %>/rentals_s.asp"
href="<%= webaddr %>/rentals_o.asp">rentals
href="<%= webaddr %>/search.asp">Search
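Solution:
"I think the problem is the 'greediness' of the regular expression when the closing character (here, the double quote) occurs more than once on the same line: .* matches as much as it can, so the match runs to the last double quote rather than the first. To match only up to the first closing quote, use a negated character class instead: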
Code:

grep -i -r -o 'href="[^#][^"]*"' rentals.asp
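
The difference between the greedy pattern and the negated character class can be checked on a single line of input. The following is a minimal sketch; the sample line is made up for illustration (modeled on the output in the question) and is fed through a pipe instead of reading rentals.asp.

```shell
# Hypothetical sample line with two links, one of them an in-page (#) link.
line='<a href="<%= webaddr %>/rentals_o.asp">rentals</a> <a href="#top">top</a>'

# Greedy version: .* extends the match to the LAST double quote on the line.
greedy=$(printf '%s\n' "$line" | grep -i -o 'href="[^#].*"')

# Negated class: [^"]* cannot cross a double quote, so the match
# ends at the FIRST closing quote after href=".
bounded=$(printf '%s\n' "$line" | grep -i -o 'href="[^#][^"]*"')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

With this input the bounded pattern should print only the first href attribute, and the in-page link is skipped because [^#] rejects the # right after the opening quote.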

----------------
A similar problem: extracting URL links from HTML source such as:

<a href="ftp://encodeftp.cse.ucsc.edu:21/users/akundaje/rawdata/peaks/jul2010/idr0_02/narrowPeak_blacklistFiltered/wgEncodeBroadHistoneGm12878CtcfStdAlnRep0.bam_VS_wgEncodeBroadHistoneGm12878ControlStdAlnRep0.bam.regionPeak.gz">wgEncodeBroadHistoneGm12878CtcfStdAlnRep0.bam_VS_wgEncodeBroadHistoneGm12878ControlStdAlnRep0.bam.regionPeak.gz</a>

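The hint from commandlinefu:
egrep -o "ftp://[^[:space:]]*.gz"
matches too much, because [^[:space:]]* also runs across the closing quote and the link text, up to the last .gz on the line: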
ftp://encodeftp.cse.ucsc.edu:21/users/akundaje/rawdata/peaks/jul2010/idr0_02/narrowPeak_blacklistFiltered/wgEncodeSydhTfbsK562bTr4UcdAlnRep0.bam_VS_wgEncodeSydhTfbsK562bInputUcdAlnRep1.bam.regionPeak.gz">wgEncodeSydhTfbsK562bTr4UcdAlnRep0.bam_VS_wgEncodeSydhTfbsK562bInputUcdAlnRep1.bam.regionPeak.gz

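So the double quote should be excluded in the character class as well (and the dot before gz escaped, so that it matches a literal period):
egrep -o "ftp://[^[:space:]\"]*\.gz"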
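
The same comparison can be run for the FTP links. This is a small sketch under stated assumptions: the URL and file name below are hypothetical stand-ins for the long ENCODE paths above, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical anchor tag; example.org and sample.regionPeak.gz are placeholders.
html='<a href="ftp://example.org/data/sample.regionPeak.gz">sample.regionPeak.gz</a>'

# Only whitespace is excluded, so the match runs through "> and the link text
# up to the last .gz on the line.
loose=$(printf '%s\n' "$html" | grep -E -o 'ftp://[^[:space:]]*\.gz')

# Excluding the double quote as well stops the match inside the href value.
tight=$(printf '%s\n' "$html" | grep -E -o 'ftp://[^[:space:]"]*\.gz')

printf 'loose: %s\n' "$loose"
printf 'tight: %s\n' "$tight"
```

Single quotes around the pattern avoid having to escape the double quote inside the bracket expression.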

Smart, ha!"
