# Excel: Extract text with FIND and MID

Data analysis begins with usable data, and that means every piece organized nicely into its own field where we can count, sort and otherwise test it out.

What if you get a spreadsheet where the pieces of data are all packed in one field? Say, something like this (which I cobbled together from Major League Baseball data in honor of the Nationals’ first playoff appearance):

NAME: Sean Burnett POS: RP AGE: 30 WT: 200 BORN: Dunedin, FL SALARY: 2350000 NAME: Tyler Clippard POS: RP AGE: 27 WT: 200 BORN: Lexington, KY SALARY: 1650000 NAME: Ross Detwiler POS: SP AGE: 26 WT: 174 BORN: St. Louis, MO SALARY: 485000 NAME: Christian Garcia POS: RP AGE: 27 WT: 215 BORN: Miami, FL SALARY: N/A NAME: Gio Gonzalez POS: SP AGE: 27 WT: 205 BORN: Hialeah, FL SALARY: 3335000 NAME: Mike Gonzalez POS: RP AGE: 34 WT: 215 BORN: Robstown, TX SALARY: N/A NAME: Ryan Mattheus POS: RP AGE: 28 WT: 215 BORN: Sacramento, CA SALARY: 481000 NAME: Craig Stammen POS: RP AGE: 28 WT: 200 BORN: Coldwater, OH SALARY: 485000 NAME: Drew Storen POS: RP AGE: 25 WT: 180 BORN: Indianapolis, IN SALARY: 498750 NAME: Jordan Zimmermann POS: SP AGE: 26 WT: 218 BORN: Auburndale, WI SALARY: 2300000 |

Let’s say you want to extract the city of each player’s birth into a separate field. The varying length of each player’s name means the birth place isn’t always in the same position in the string, so a typical text-to-columns operation won’t work. So, how to do it?

The answer lies in two very handy Excel functions: FIND and MID.

FIND locates characters you specify and returns its numeric place in the string.

MID returns X characters from a string beginning at a location X you specify.

For example, we can locate the position where each city name begins by using FIND to locate the string “BORN:” in each cell. The city name itself always starts six characters after the position of that string, so we add six to the result:

=FIND("BORN:",A2)+6 |

In the first row above, the functions returns 50. In the second row, 52. We’ll feed that value to the MID function as the starting point for our extraction.

MID takes three arguments: Text or cell, position to start, number of characters to return. So, we use the above FIND function as the second argument and, for now, extract 10 characters:

=MID(A2,FIND("BORN:",A2)+6,10) |

That gets us part of the way there. We’re starting in the right spot, but 10 characters isn’t always the length of the city and state, so it leads to choppy results:

Dunedin, F Lexington, St. Louis, Miami, FL Hialeah, F Robstown, Sacramento Coldwater, Indianapol Auburndale |

What we need to do is tell MID the exact number of characters to return each time even though the length of the city varies. We can figure this out using FIND again.

The city name is always followed by the word “SALARY”. So, if we search for the position of that word and subtract the position of “BORN,” we’ll get the length of what’s between the two. The ultimate formula looks like spaghetti but works just fine:

=MID(A2,FIND("BORN:",A2)+6,(FIND("SALARY",A2)-(FIND("BORN:",A2)+6))) |

Used on the example text, it returns:

Dunedin, FL Lexington, KY St. Louis, MO Miami, FL Hialeah, FL Robstown, TX Sacramento, CA Coldwater, OH Indianapolis, IN Auburndale, WI |

That’s it. Fairly handy and further proof that Excel is a versatile part of the data journalists’ tool kit.

When I have this problem I always use code to solve it. I prefer to parse every row with PHP or Python and then clean the data using the functions available on the language. I supposed everyone use the tool they feel better using it.

Agreed — if you’re a coder, that’s a good way to go. I like using regular expressions, for example, or string functions in Python.

For those people who don’t code but use Excel quite a bit, this is an approach they can take without too much pain.

This was super helpful! Thank you!

This was great..thank you!

Excellent! Now I can move on to the truly horrific part of my project.

Lyla,

I hope it’s not that bad!

very helpful

thank you

Anthony,

Thanks man! Perfect to what I need.

salomsalom,

May I ask why coders and programmers with years and decades of experience feel the need to hijack every one else’s blog posts? I am new to coding and programming. I searched for (and found) this post which does exactly what I need it to.

If you want me to know how you do it why not write your own blog post so beginners like me have a choice of ways to accomplish this? Otherwise your comment adds nothing regardless of how much you know.

Troy

Sean Burnett,RP,30,200,Dunedin, FL, 2350000.

What if the string is as shown above. How do we extract “Dunedin” using excel text functions??

Thanks in advance

Samuel,

I don’t have a ready answer, but whatever the answer is it’s going to be clunky. See this discussion:

https://groups.google.com/forum/#!topic/microsoft.public.excel/FAteST1C2Ks

Nevertheless, if that’s the string you’re given, then it’s simply a comma-delimited list that you can separate into columns using Excel’s text-to-columns feature. I’d go that route before trying to compete in Excel gymnastics.

This has been really useful, thank you!

this is exactly what i needed! thanks! excel for the win

Very instructive, solved my issue.

Thanks!!

This has set me on the right path to solving my problem. I am taking specific pieces of data from a huge chunk of html pulled into the document.

I have a unique code on each piece of data to start the search at but it appears that if I used a FIND term which is not unique, like “The” or “0” which appear in the data multiple times… it just returns #VALUE!

However if I continue typing after “The” until it becomes a unique string it works… for that single piece of data alone. It is very frustrating.

Jeff,

Extracting data from big pieces of HTML might be a task better served by using Regular Expressions inside some type of scripting language, such as Python.

Jeff,

You need Python, and Beautiful Soup, and probably some love from the StackExchange community!

Very helpful information! Made task easy to accomplish!

Thank you very much!

Regards,

Maria P.

Nicely explained. Since im just learning excel, here a little question: Could i use LEFT/RIGHT function to do this? From what i understand, these functions require one less argument (i hope this is right, i mostly have my knowledge from this beginners site: http://www.excel-aid.com/the-excel-left-and-right-functions-2.html, i hope the information given there is correct). So i could maybe use FIND with one of these functions, then return the city name plus some other data that is still left (either to the left or the right, depending on the function you use). Then i have something like “NAME: Sean Burnett POS: RP AGE: 30 WT: 200 BORN: Dunedin, FL” which i could use the RIGHT function on in cunjunction with FIND, and that would give me the city name. Of course, this is impractical, and i would have to temporarily store unnecessary data (probably in a hidden column), but i was curious if i get the concept. Sorry for my bad english btw.

Thank You Very Much for the detailed explanation of both the function. It really helped me.

Mary,

It’s possible, though I haven’t tried with this example.

Canada (CA) – Quebec – Bromont – 2 De L’Aeroport Blvd || Bromont PQ CA

Hi Anthony,

I have an above example where I want to extract the Country, State and City from in three seperate columns.

In the above example the First one is Country (Canada (CA)) and Second one after first “-” (Quebec)is State and the third one after second “-” (Bromont) is City

Can you please help me to make a formula to get the result for all three.

Purshottam,

You should be able to do it if you simply apply the techniques in the post. You’ll just have to write three separate formulas — one in each of three cells — to pull in the three values.

Hi Anthony,

How to extract numeric value from below text in excel :

CPG3_ROM_NETALIA_14378

Please help me to make a formula to get the result

please help me to calculate two different times and dates which are in one cell with some other data…how can isolate or seperate them from the others data.i believe that the calculation will be bettrt and easier when the calculation data are isolated.what do you mean,do anyone knows?

hello. what is the program(formula) for finding the third word in a sentence?

thanks Sir

its really very useful and wonderful

thanks this method worked a treat for me – although i’m dealing with messier data so have some more work to do and might need to use a bit of the old manual labor!

Alex,

Understood. I’ve resorted to the brute-force method many a time with data!

Hi Anthony.

With mid and find can you do an “OR” option.

Example- say the weight was either in kgs or lbs and I

want it to find WTKG or WTLB in the in the middle of the string.

NAME: Sean Burnett WTKG: 200 KGBORN: Dunedin, FL

NAME: Tyler Clippard WTLB: 200 LBS BORN: Lexington, KY

NAME: Ross Detwiler WTKG: 174 KG BORN: St. Louis, MO NAME: NAME: Christian Garcia WTLB: 215 BORN: Miami, FL

Jennifer,

Yes, if you nest them. See if this helps:

https://groups.google.com/forum/#!topic/excel-vba-help/C1bavzNYwl4

Anthony-

Thank you. Very helpful.

A4 Residual Hills F1

A5 Aluvial Plain #VALUE!

A6 Hills Residual F1

A7 harvest Hills F1

A8 Plain Water #VALUE!

A9 Structural Hills F1

A10 Land use #VALUE!

A11 Residual Plain #VALUE!

Please solve my problem =IF(FIND(“Hills”,A4),”F1″,IF(FIND(“Plain”,A4),”F2″,IF(FIND(“Land”,A4),”F3″””)))

in this wherever Hills are the that should be f1.

If you want to know more about “Searching for text in Excel”, check this link ……..

http://www.exceltip.com/excel-editing/searching-for-text-in-microsoft-excel.html

Hi,

Thanks for the help on FIND function!

I was glad to see the Nationals make it to the playoffs this year!

I went to the same college that Jordan Zimmerman went to, UWSP, and the small town he is from is about 20 minutes from Stevens

Point! I actually dated a gal from Auberndale! I was very proud

to see him get a no-hitter on the last day of the regular season!

Made Point prouder I am sure!

Service Request/Sales Order Handling/201295/4601491 QUART_POBLET_B__CARCER 201407018338-S1 SWAP/Ordering/Normal

Hi I want to extract 201407018338-S1 from the above string

It’s really helpful…

exactly what i want

Really good information for every student.

Thank you very much!!!

Finally somebody took the time to make an example easy to understand.

Hey,

I have data in following format

3):XYZ #2(3):224 #3(7):MIAMI #4(3):IND #5(15):14301-8909493-3 #6(1):8

I want extract XYZ in column 1 and 224 in Column 2 and 14301-8909493-3 in column 3 and 8 in column 4

Please help me on this.

Hey Anthony,

You have shared really useful tips about text extract in excel. It is very very useful for me as I have just started to learn excel.

Thanks

Aarti.

Hey Aarti,

Its useful for every one not only for you. Thanks Mr. Anthony to share this command. Please share more useful commands about Excel which helps to all blog readers.

Regards

Rajesh

Hi, I want to extract the Town and the State from the following, I’m trying to use a “,” as a separator but it doesn’t work. Any ideas how to do it? Thanks

355 Country Club Drive, Downingtown, PA 19335

“=MID(C4,FIND(“,”,C4)+1,11)

Mary,

There are probably a couple of ways to do this, and this is where Excel formulas start to get complicated. But still:

To get the town, try:

To get the state, use:

It’s worth taking some time, especially with the first formula, to understand the logic. We’re basically looking for the first comma, and then calculating the number of characters to extract using a nested find. Crazy, I know.

Thank you for taking the time to answer this! Great source!

Elaine

Sir – within one cell I have up to four pieces of information. Which looks like

Address of Installed: 23.Bridgeway.Ave

Correct Address : 24.Bridgeway.Ave

Address of Installed: 25.Bridgeway.Ave

Correct Address : 26.Bridgeway.Ave

Sometimes the cell has one of these sometimes it has 5. How do I extract Installed addresses into one cell, and the correct addresses into another cell without using VBA?

Thank you,

Joe

I want to extract the check number from this string of information (This is generic info that shows you what I’m looking at):

ORIG CO NAME:BCBS ILLINOIS ORIG ID:99999999 DESC DATE: CO ENTRY DESCR:HCCLAIMPMTSEC:CCD TRACE#:0101010101010 EED:199999 IND ID:C15272E54732230 IND NAME:CP20150929E547322300-1 TRN*1*C162547E85472617*1357954792*CP 20151104E224821515-1962644807\ PAYABLE TRN: 9574575158FB

The information that I need is after the TRN*1* until the next *. The C162547E85472617. The check numbers are not always the same length. All of the formulas that I have tried are not giving me what I’m looking for. Any suggestions?

Lots of help! Thanks!