Excel: Extract text with FIND and MID

Data analysis begins with usable data, and that means every piece organized nicely into its own field where we can count, sort and otherwise test it out.

What if you get a spreadsheet where the pieces of data are all packed in one field? Say, something like this (which I cobbled together from Major League Baseball data in honor of the Nationals’ first playoff appearance):

NAME: Sean Burnett POS: RP AGE: 30 WT: 200 BORN: Dunedin, FL SALARY: 2350000
NAME: Tyler Clippard POS: RP AGE: 27 WT: 200 BORN: Lexington, KY SALARY: 1650000
NAME: Ross Detwiler POS: SP AGE: 26 WT: 174 BORN: St. Louis, MO SALARY: 485000
NAME: Christian Garcia POS: RP AGE: 27 WT: 215 BORN: Miami, FL SALARY: N/A
NAME: Gio Gonzalez POS: SP AGE: 27 WT: 205 BORN: Hialeah, FL SALARY: 3335000
NAME: Mike Gonzalez POS: RP AGE: 34 WT: 215 BORN: Robstown, TX SALARY: N/A
NAME: Ryan Mattheus POS: RP AGE: 28 WT: 215 BORN: Sacramento, CA SALARY: 481000
NAME: Craig Stammen POS: RP AGE: 28 WT: 200 BORN: Coldwater, OH SALARY: 485000
NAME: Drew Storen POS: RP AGE: 25 WT: 180 BORN: Indianapolis, IN SALARY: 498750
NAME: Jordan Zimmermann POS: SP AGE: 26 WT: 218 BORN: Auburndale, WI SALARY: 2300000

Let’s say you want to extract the city of each player’s birth into a separate field. The varying length of each player’s name means the birth place isn’t always in the same position in the string, so a typical text-to-columns operation won’t work. So, how to do it?

The answer lies in two very handy Excel functions: FIND and MID.

FIND locates characters you specify and returns its numeric place in the string.

MID returns X characters from a string beginning at a location X you specify.

For example, we can locate the position where each city name begins by using FIND to locate the string “BORN:” in each cell. The city name itself always starts six characters after the position of that string, so we add six to the result:

=FIND("BORN:",A2)+6

In the first row above, the functions returns 50. In the second row, 52. We’ll feed that value to the MID function as the starting point for our extraction.

MID takes three arguments: Text or cell, position to start, number of characters to return. So, we use the above FIND function as the second argument and, for now, extract 10 characters:

=MID(A2,FIND("BORN:",A2)+6,10)

That gets us part of the way there. We’re starting in the right spot, but 10 characters isn’t always the length of the city and state, so it leads to choppy results:

Dunedin, F
Lexington,
St. Louis,
Miami, FL 
Hialeah, F
Robstown, 
Sacramento
Coldwater,
Indianapol
Auburndale

What we need to do is tell MID the exact number of characters to return each time even though the length of the city varies. We can figure this out using FIND again.

The city name is always followed by the word “SALARY”. So, if we search for the position of that word and subtract the position of “BORN,” we’ll get the length of what’s between the two. The ultimate formula looks like spaghetti but works just fine:

=MID(A2,FIND("BORN:",A2)+6,(FIND("SALARY",A2)-(FIND("BORN:",A2)+6)))

Used on the example text, it returns:

Dunedin, FL 
Lexington, KY 
St. Louis, MO 
Miami, FL 
Hialeah, FL 
Robstown, TX 
Sacramento, CA 
Coldwater, OH 
Indianapolis, IN 
Auburndale, WI

That’s it. Fairly handy and further proof that Excel is a versatile part of the data journalists’ tool kit.

24 Responses to “Excel: Extract text with FIND and MID”

  1. salomsalom says:

    When I have this problem I always use code to solve it. I prefer to parse every row with PHP or Python and then clean the data using the functions available on the language. I supposed everyone use the tool they feel better using it.

  2. Anthony says:

    Agreed — if you’re a coder, that’s a good way to go. I like using regular expressions, for example, or string functions in Python.

    For those people who don’t code but use Excel quite a bit, this is an approach they can take without too much pain.

  3. sbakke says:

    This was super helpful! Thank you!

  4. Michael says:

    This was great..thank you!

  5. Lyla says:

    Excellent! Now I can move on to the truly horrific part of my project.

  6. Anthony says:

    Lyla,

    I hope it’s not that bad!

  7. cam says:

    very helpful
    thank you

  8. troy says:

    Anthony,

    Thanks man! Perfect to what I need.

    salomsalom,

    May I ask why coders and programmers with years and decades of experience feel the need to hijack every one else’s blog posts? I am new to coding and programming. I searched for (and found) this post which does exactly what I need it to.

    If you want me to know how you do it why not write your own blog post so beginners like me have a choice of ways to accomplish this? Otherwise your comment adds nothing regardless of how much you know.

    Troy

  9. Samuel says:

    Sean Burnett,RP,30,200,Dunedin, FL, 2350000.

    What if the string is as shown above. How do we extract “Dunedin” using excel text functions??

    Thanks in advance

  10. Anthony says:

    Samuel,

    I don’t have a ready answer, but whatever the answer is it’s going to be clunky. See this discussion:

    https://groups.google.com/forum/#!topic/microsoft.public.excel/FAteST1C2Ks

    Nevertheless, if that’s the string you’re given, then it’s simply a comma-delimited list that you can separate into columns using Excel’s text-to-columns feature. I’d go that route before trying to compete in Excel gymnastics.

  11. Michael says:

    This has been really useful, thank you!

  12. anne says:

    this is exactly what i needed! thanks! excel for the win :)

  13. Wim Vaasen says:

    Very instructive, solved my issue.

    Thanks!!

  14. Jeff says:

    This has set me on the right path to solving my problem. I am taking specific pieces of data from a huge chunk of html pulled into the document.

    I have a unique code on each piece of data to start the search at but it appears that if I used a FIND term which is not unique, like “The” or “0″ which appear in the data multiple times… it just returns #VALUE!

    However if I continue typing after “The” until it becomes a unique string it works… for that single piece of data alone. It is very frustrating.

  15. Anthony says:

    Jeff,

    Extracting data from big pieces of HTML might be a task better served by using Regular Expressions inside some type of scripting language, such as Python.

  16. Amanda says:

    Jeff,

    You need Python, and Beautiful Soup, and probably some love from the StackExchange community!

  17. Maria says:

    Very helpful information! Made task easy to accomplish!

    Thank you very much!

    Regards,
    Maria P.

  18. Mary says:

    Nicely explained. Since im just learning excel, here a little question: Could i use LEFT/RIGHT function to do this? From what i understand, these functions require one less argument (i hope this is right, i mostly have my knowledge from this beginners site: http://www.excel-aid.com/the-excel-left-and-right-functions-2.html, i hope the information given there is correct). So i could maybe use FIND with one of these functions, then return the city name plus some other data that is still left (either to the left or the right, depending on the function you use). Then i have something like “NAME: Sean Burnett POS: RP AGE: 30 WT: 200 BORN: Dunedin, FL” which i could use the RIGHT function on in cunjunction with FIND, and that would give me the city name. Of course, this is impractical, and i would have to temporarily store unnecessary data (probably in a hidden column), but i was curious if i get the concept. Sorry for my bad english btw. :)

  19. Sujo says:

    Thank You Very Much for the detailed explanation of both the function. It really helped me.

  20. Anthony says:

    Mary,

    It’s possible, though I haven’t tried with this example.

  21. Purshottam says:

    Canada (CA) – Quebec – Bromont – 2 De L’Aeroport Blvd || Bromont PQ CA

    Hi Anthony,

    I have an above example where I want to extract the Country, State and City from in three seperate columns.
    In the above example the First one is Country (Canada (CA)) and Second one after first “-” (Quebec)is State and the third one after second “-” (Bromont) is City
    Can you please help me to make a formula to get the result for all three.

  22. Anthony says:

    Purshottam,

    You should be able to do it if you simply apply the techniques in the post. You’ll just have to write three separate formulas — one in each of three cells — to pull in the three values.

  23. Kapil says:

    Hi Anthony,

    How to extract numeric value from below text in excel :
    CPG3_ROM_NETALIA_14378

    Please help me to make a formula to get the result

  24. lourdiman says:

    please help me to calculate two different times and dates which are in one cell with some other data…how can isolate or seperate them from the others data.i believe that the calculation will be bettrt and easier when the calculation data are isolated.what do you mean,do anyone knows?

Leave a Reply