How to Use SAS to Count Word Frequency in an Essay
The SCAN function in SAS
The SCAN function in SAS provides a simple and convenient way to parse out words from character strings. The SCAN function can be used to select individual words from text or variables which contain text and then store those words in new variables. This article provides a number of different examples and uses for the SCAN function, including some of the most commonly used options to help you get the most from this function.
In particular, this article will cover:
- Selecting the nth word in a character string.
- Selecting the last word in a character string.
- Handling different word delimiters.
- Using SCAN with DO LOOPs to parse long character strings.
Software
Before we continue, make sure you have access to SAS Studio. It's free!
Data Sets
In this article, the CARS and BASEBALL datasets from the SASHELP library will be used to illustrate a number of different uses for the SCAN function.
Selecting the Nth Word in a Character String
Let's start with an example to demonstrate how to find the first word in a character string and then store the result in a separate variable. The most basic use of the SCAN function requires only two arguments. After specifying SCAN and an open parenthesis, the first argument is the character string that you are planning to select words from. This can be either a variable or an explicit character string. In this first example we are using the explicit character string, "I am a SAS Programming Expert".
The second argument is the count, which is the numeric position of the word within the character string that you want to select. So, to return the first word, we can explicitly specify the number 1. This could also be replaced with a variable containing the desired count value.
The SAS syntax is as follows:
data example;
first_word = scan( "I am a SAS Programming Expert",1 );
run;
In the output dataset, the new variable FIRST_WORD has been created and its value is the first word, "I", from the character string "I am a SAS Programming Expert".
The string can also be stored in a variable first and that variable passed to SCAN:
data example;
text = "I am a SAS Programming Expert";
first_word = scan(text,1);
run;
The output dataset now contains both the original TEXT variable and the newly created FIRST_WORD variable, which contains the first word from the TEXT variable, "I".
To select additional words, such as the second, third and fourth word, we can modify the count argument of the SCAN function. To select the second word from a string, simply set the count argument to 2. For the third word, set the count equal to 3, and so on.
In the following example, we create 3 additional variables, SECOND_WORD, THIRD_WORD and FOURTH_WORD, which select the second, third and fourth word respectively from the TEXT variable:
data example;
text = "I am a SAS programming expert";
first_word = scan(text,1);
second_word = scan(text,2);
third_word = scan(text,3);
fourth_word = scan(text,4);
run;
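As noted earlier, the count does not have to be a literal number. A minimal sketch, assuming a hypothetical variable N (not part of the original example) that holds the word position:

```sas
data example;
    text = "I am a SAS Programming Expert";
    n = 4;                     /* hypothetical variable holding the word position */
    nth_word = scan(text, n);  /* fourth word of TEXT: "SAS" */
run;
```

Changing the value of N selects a different word without touching the SCAN call itself.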
Selecting the Last Word in a Character String
Using the SCAN function, you also have the ability to read from right to left, effectively allowing you to capture the last word in a character string.
To tell SAS to read from right to left, we simply change the count argument to a negative number that indicates the word we would like to read, starting from the right and moving left. So, to select the word "Expert" in our TEXT variable, we can use a count of -1, as shown here:
data example;
text = "I am a SAS Programming Expert";
last_word = scan(text,-1);
run;
In the output data, we now have a new variable, LAST_WORD, which contains the last word of the text string, "Expert".
Alternatively, instead of using a negative count you can use the "b" modifier available with the SCAN function. By specifying a "b" argument with the SCAN function, you can tell SAS to read from right to left instead of the default left to right. Note that when using a modifier with the SCAN function, the modifier needs to be the fourth argument, so you must always explicitly state the third argument (the delimiter) together with the fourth modifier argument so that SAS won't treat your modifier as the delimiter!
Here is the syntax with the "b" modifier included:
data example;
text = "I am a SAS Programming Expert";
last_word = scan(text,1," ","b");
run;
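The same two approaches extend to words other than the last one. The sketch below is an illustration only (the variable names SECOND_LAST and SECOND_LAST_B are made up here) and picks the second word from the right both ways:

```sas
data example;
    text = "I am a SAS Programming Expert";
    second_last   = scan(text, -2);           /* negative count: "Programming" */
    second_last_b = scan(text, 2, " ", "b");  /* "b" modifier with count 2: same word */
run;
```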
Handling Different Word Delimiters
So far, the examples we have looked at have only had blanks or spaces as the delimiter between words. What happens when there is a different delimiter, such as a comma?
In the example below, the code has been modified so that the words in the character string of the TEXT variable are delimited with a comma instead of spaces. Here, we are trying to select the fourth word:
data example;
text ="I,am,a,SAS,Programming,Expert";
fourth_word = scan(text,4);
run;
In the output data, the SCAN function still works even with commas as the delimiter.
The reason this still works is that by default, on any computer using ASCII characters, the SCAN function automatically treats any of the following characters as delimiters:
blank ! $ % & ( ) * + , - . / ; < ^ |
When your data contains a delimiter between words that is not in the default list, or when you want only one specific character to be treated as a delimiter, you can use the charlist argument (the third argument) of the SCAN function to specify your own custom delimiter.
For example, if the words in your character string are delimited with a plus sign (+), you simply need to enclose the plus sign in quotation marks as the third argument to the SCAN function. The plus sign is already in the default list, but naming it explicitly makes SCAN split on that character only.
The syntax below demonstrates how to select the 5th word from a plus sign delimited character string:
data example;
text ="I+am+a+SAS+Programming+Expert";
fifth_word = scan(text,5,"+");
run;
In the output data, the fifth word in the string, "Programming", has been successfully selected.
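Naming the delimiter explicitly matters most when the words themselves contain characters from the default list. The sketch below uses a made-up sample string (not from the SASHELP datasets) in which each plus-delimited item contains a blank, and compares the default behavior with an explicit "+" delimiter:

```sas
data example;
    text = "New York+Los Angeles+San Francisco";  /* hypothetical sample string */
    default_word = scan(text, 2);       /* blank and + both split words: "York" */
    plus_word    = scan(text, 2, "+");  /* only + splits words: "Los Angeles" */
run;
```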
In the SASHELP.BASEBALL dataset, the NAME variable contains a list of first, last and middle names. The structure is as follows: <last name>,<firstname><blank><middlename>. You would like to create two new variables: LASTNAME and GIVEN_NAMES.
Since commas and spaces are default delimiters, we start without specifying our own delimiter:
data baseball;
set sashelp.baseball;
lastname = scan(name,1);
given_names = scan(name,2);
keep name given_names lastname;
run;
At first glance it may appear as though the results are correct, but on further inspection you will find that some names were not parsed properly. For example, Andy Van Slyke's given name should have been "Andy" and not "Slyke". Because the space inside "Van Slyke" is also treated as a delimiter, the last name is split in two. Specifying the comma as the only delimiter fixes this:
data baseball;
set sashelp.baseball;
lastname = scan(name,1, "," );
given_names = scan(name,2, "," );
keep name given_names lastname;
run;
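One detail worth checking with a comma-only delimiter: if a blank follows the comma in NAME, that blank is no longer a delimiter and stays at the front of the extracted given name. A minimal sketch, assuming a made-up value in "Last, First" form, that wraps SCAN in the STRIP function to remove it:

```sas
data names;
    name = "Van Slyke, Andy";                 /* hypothetical sample value */
    lastname    = scan(name, 1, ",");         /* "Van Slyke" */
    given_names = strip(scan(name, 2, ","));  /* leading blank removed: "Andy" */
run;
```

Whether this step is needed depends on how the names are actually stored, so inspect a few rows of the data first.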
Using SCAN with DO LOOPs to Parse Long Character Strings
When combined with a simple DO LOOP and a SAS array, the SCAN function makes it easy to parse out each word from a character string into separate variables.
For example, in the SASHELP.CARS dataset, you would like to parse out each word from the MODEL variable into 5 separate variables. The words of the full model name are delimited by spaces, so the delimiter list only needs a blank; the code also includes a comma, so that blanks and commas are the only characters that split words.
The code below uses a DO LOOP to scan the MODEL variable and create the variables MODEL1 to MODEL5:
data cars_parse;
set sashelp.cars;
array modelname[5] $15 model1-model5;
do i = 1 to 5;
modelname[i] = scan(model,i,", ");
end;
keep model model1-model5;
run;
In the output data, we now have 5 new MODEL variables, with one word per variable.
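The loop above assumes at most 5 words per value. When the number of words is not known in advance, one option is to let the COUNTW function supply the loop limit and write one row per word. The sketch below is an illustration rather than part of the original walkthrough: the WORDS dataset and the WORD variable are hypothetical names, and the optional PROC FREQ step is one way to tabulate how often each word occurs.

```sas
data words;
    text = "I am a SAS Programming Expert";
    length word $ 20;                /* assumed maximum word length */
    do i = 1 to countw(text, " ");   /* COUNTW returns the number of blank-delimited words */
        word = scan(text, i, " ");
        output;                      /* one output row per word */
    end;
    keep i word;
run;

/* Optional: frequency of each distinct word */
proc freq data=words;
    tables word;
run;
```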
Source: https://sascrunch.com/scan-function/
Posted by: lewisplar1972.blogspot.com