Make grep stop after the first match
Question: I'm trying to use grep to find all the href links in my asp files (except in-page links that start with #).
I want it to show me the href="blahblah" part only - nothing before href or after href's closing quote.
Here's what I have so far. This:
grep -i -r -o 'href="[^#].*"' rentals.asp
gives me this (I only want the href="..." part; anything after its closing quote is unwanted):
href="<%= webaddr %>/rentalStyle.css" type="text/css"
href="<%= webaddr %>/rentals_s.asp"
href="<%= webaddr %>/rentals_o.asp">rentals
href="<%= webaddr %>/search.asp">Search
Solution:
I think the problem is the 'greediness' of the regular expression when the closing character (in this case, the double quote) occurs more than once on the same line: .* runs on to the last quote on the line. To match everything only up to the first closing quote, try:
Code:
grep -i -r -o 'href="[^#][^"]*"' rentals.asp
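To see the difference between the greedy and the bounded pattern side by side, here is a small sketch. The two sample lines are made up for illustration (they are not from rentals.asp), and the variable names are arbitrary.

```shell
# Hypothetical sample input: one stylesheet link and one in-page link
sample=$(printf '%s\n' \
  '<link href="style.css" type="text/css">' \
  '<a href="#top">Top</a>')

# Greedy: .* runs to the LAST double quote on the line
greedy=$(printf '%s\n' "$sample" | grep -i -o 'href="[^#].*"')

# Bounded: [^"]* cannot cross a quote, so the match ends at the FIRST closing quote
bounded=$(printf '%s\n' "$sample" | grep -i -o 'href="[^#][^"]*"')

printf 'greedy:  %s\n' "$greedy"    # href="style.css" type="text/css"
printf 'bounded: %s\n' "$bounded"   # href="style.css"
```

In both cases the in-page link is skipped, because [^#] refuses a # right after the opening quote.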
----------------
A similar question: how to get the URL links out of HTML source such as:
<a href="ftp://encodeftp.cse.ucsc.edu:21/users/akundaje/rawdata/peaks/jul2010/idr0_02/narrowPeak_blacklistFiltered/wgEncodeBroadHistoneGm12878CtcfStdAlnRep0.bam_VS_wgEncodeBroadHistoneGm12878ControlStdAlnRep0.bam.regionPeak.gz">wgEncodeBroadHistoneGm12878CtcfStdAlnRep0.bam_VS_wgEncodeBroadHistoneGm12878ControlStdAlnRep0.bam.regionPeak.gz</a>
A hint from commandlinefu:
egrep -o "ftp://[^[:space:]]*.gz"
which returns too much, because neither the closing quote nor the link text contains whitespace:
ftp://encodeftp.cse.ucsc.edu:21/users/akundaje/rawdata/peaks/jul2010/idr0_02/narrowPeak_blacklistFiltered/wgEncodeSydhTfbsK562bTr4UcdAlnRep0.bam_VS_wgEncodeSydhTfbsK562bInputUcdAlnRep1.bam.regionPeak.gz">wgEncodeSydhTfbsK562bTr4UcdAlnRep0.bam_VS_wgEncodeSydhTfbsK562bInputUcdAlnRep1.bam.regionPeak.gz
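So the character class should also exclude the double quote, so that the match stops at the end of the href value (the dot before gz is escaped so it matches a literal dot):
egrep -o 'ftp://[^[:space:]"]*\.gz'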
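A quick way to check this is to run both patterns on one short anchor tag. This is only a sketch: the host ftp.example.org and the file name are placeholders, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical anchor tag; the URL is a placeholder, not a real file
html='<a href="ftp://ftp.example.org/pub/peaks.gz">peaks.gz</a>'

# Only whitespace is excluded, so the match runs through "> into the link text
loose=$(printf '%s\n' "$html" | grep -E -o 'ftp://[^[:space:]]*\.gz')

# The double quote is excluded too, so the match stops inside the href value
strict=$(printf '%s\n' "$html" | grep -E -o 'ftp://[^[:space:]"]*\.gz')

printf 'loose:  %s\n' "$loose"    # ftp://ftp.example.org/pub/peaks.gz">peaks.gz
printf 'strict: %s\n' "$strict"   # ftp://ftp.example.org/pub/peaks.gz
```

Note that inside a bracket expression a ^ anywhere but the first position is a literal caret, so it is not needed to negate the quote.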
Smart, ha!