Here is an example of how to use awk to extract a column from a text file.
The default field separator is a space, so this will work with some text files:
cat somefile.txt | awk '{print $2}'
But, if the fields are separated by a delimiter, such as tab, or comma, just specify the field separator by using "-F" before the print statement
cat somefile.txt | awk -F"\t" '{print $2}'
Unix and linux distributions generally come with sed and awk, among other things.
The problem.
You want to get a list of companies to research before investing in them, and paste only the stock symbols into a Yahoo finance portfolio. You run a stock screen on a site such as magicformulainvesting.com, or investors.com but if you try to paste the text into a spreadsheet it may show up as a single row, with no way to extract the column.Here is the process
1.) Run a Stock Screen
Get output similar to this:Company Name (in alphabetical order) Ticker Market Cap ($ Millions) Price From Most Recent Quarter Data
Almost Family Inc AFAM 201.49 09/24 06/30
Apollo Group Inc APOL 3,317.35 09/24 05/31
Argan Inc AGX 231.55 09/24 07/31
Block (H&R) Inc. HRB 4,635.98 09/24 07/31
Body Central Corp BODY 179.63 09/24 06/30
CACI International Inc. CACI 1,333.66 09/24 06/30
Capella Education Co CPLA 478.40 09/24 06/30
CF Industries Holdings Inc CF 13,579.97 09/24 06/30
Cisco Systems Inc CSCO 99,628.89 09/24 07/31
Dell Inc DELL 17,586.96 09/24 07/31
Deluxe Corp DLX 1,567.52 09/24 06/30
Dice Holdings Inc DHX 522.27 09/24 06/30
Dolby Laboratories Inc DLB 3,681.33 09/24 06/30
Express Inc EXPR 1,313.50 09/24 07/31
GameStop Corp. GME 2,755.19 09/24 07/31
Great Northern Iron Ore Properties GNI 129.44 09/24 06/30
GT Advanced Technologies Inc GTAT 703.72 09/24 06/30
Iconix Brand Group Inc ICON 1,269.35 09/24 06/30
Intersections Inc INTX 188.20 09/24 06/30
ITT Educational Services Inc ESI 839.48 09/24 06/30
j2 Global Inc JCOM 1,435.93 09/24 06/30
KLA-Tencor Corp KLAC 7,857.04 09/24 06/30
Kulicke and Soffa Industries Inc KLIC 763.82 09/24 06/30
Lender Processing Services Inc LPS 2,435.63 09/24 06/30
LML Payment Systems Inc LMLP 96.04 09/24 06/30
ManTech International Corp MANT 900.87 09/24 06/30
Maxygen Inc MAXY 73.00 09/24 06/30
Metabolix Inc MBLX 54.72 09/24 06/30
Microsoft Corp MSFT 257,967.20 09/24 06/30
Momenta Pharmaceuticals Inc MNTA 754.55 09/24 06/30
Nature's Sunshine Products Inc NATR 254.74 09/24 06/30
Nevsun Resources Ltd NSU 913.88 09/24 06/30
Nu Skin Enterprises Inc. NUS 2,263.52 09/24 06/30
PDL BioPharma Inc PDLI 1,086.98 09/24 06/30
PetMed Express Inc PETS 201.72 09/24 06/30
Pozen Inc POZN 194.06 09/24 06/30
Questcor Pharmaceuticals Inc. QCOR 1,138.54 09/24 06/30
Raytheon Co. RTN 19,209.52 09/24 06/30
Seagate Technology Plc STX 12,175.83 09/24 06/30
Spirit Airlines Inc SAVE 1,272.78 09/24 06/30
Strayer Education Inc STRA 792.35 09/24 06/30
TeleNav Inc TNAV 250.19 09/24 06/30
Ubiquiti Networks Inc UBNT 1,103.50 09/24 06/30
Unisys Corp UIS 939.02 09/24 06/30
USA Mobility Inc USMO 263.79 09/24 06/30
Utstarcom Holdings Corp UTSI 159.41 09/24 06/30
Vaalco Energy Inc EGY 505.50 09/24 06/30
Veeco Instruments Inc VECO 1,229.09 09/24 06/30
Vonage Holdings Corp VG 514.25 09/24 06/30
Warner Chilcott Plc WCRX 3,334.05 09/24 06/30
2.) Copy and paste to a text file
vi magic_20120925_50M_50.txt
3.) Extract the column
Use awk with a field separator of "\t" for tab, and print the second column:
cat magic_20120925_50M_50.txt | awk -F"\t" '{print $2}'
AFAM
APOL
AGX
HRB
BODY
CACI
CPLA
CF
CSCO
DELL
DLX
DHX
DLB
EXPR
GME
GNI
GTAT
ICON
INTX
ESI
JCOM
KLAC
KLIC
LPS
LMLP
MANT
MAXY
MBLX
MSFT
MNTA
NATR
NSU
NUS
PDLI
PETS
POZN
QCOR
RTN
STX
SAVE
STRA
TNAV
UBNT
UIS
USMO
UTSI
EGY
VECO
VG
WCRX
4.) Then just copy the text, and paste it into your portfolio.
That's all there is to it.
For more advanced processing, you may have to use printf, which follows the standard C format.
Disclaimer: We do not provide investing advice or recommend stocks. We may or may not have positions in some of these stocks. To understand what to do with the information from the stock screener, it is best to read the book listed on the site.
More awk Examples
We will use a one-liner using echo, so the incoming data and the output are both visible.Here it is in action on a comma separated list:
$ echo "Almost Family Inc, AFAM, 201.49, 09/24, 06/30" | awk -F, '{print $2}'
AFAM
Where have we seen comma separated lists before? Oh yeah, spreadsheets can be saved as .csv files.
Here we can print any number of fields:
$ echo "Almost Family Inc, AFAM, 201.49, 09/24, 06/30" | awk -F, '{print $2,$3}'
AFAM 201.49
Now suppose we want columns one and two, but in reverse order? We try this:
$ echo "Almost Family Inc, AFAM, 201.49, 09/24, 06/30" | awk -F, '{print $2 $1}'
AFAMAlmost Family Inc
Not quite what we want. Or should we say Almost..., no pun intended? What we really want is column two, followed by a space, and then column one. Separating the fields by a comma may give us what we want.
$ echo "Almost Family Inc,AFAM,201.49,09/24,06/30" | awk -F, '{print $2,$1}'
AFAM Almost Family Inc
Much better, but if we want exact control of the output, we could use printf instead of print.
$ echo "Almost Family Inc, AFAM, 201.49, 09/24, 06/30" | awk -F, '{printf("%s %s\n",$2,$1)}'
AFAM Almost Family Inc
Notice how printf follows the C or perl syntax, and requires a newline character "\n", unless you want all the output on the same line. We use "%s" for string, and anything between the quotes including spaces, and special characters such as tab \t, and newline \n will give us the format we want. Then after the close quote we give it a comma separated list of variables. The number of variables must match the format in the quotes. In the example above we want to print two variables with a space in between, and a newline at the end. The first part of printf is for the format, and the second is for the variables.