Some Useful SAS Tips

Number of Obs in a Dataset
Using PROC SUMMARY for Descriptive and Frequency Statistics
Last Date of Month
SAS to CSV
Does a Dataset Exist
Reordering Variables
Additional Codes to an Existing Format
Concatenating Datasets
Deleting SAS Datasets based on a Date
Reading Variable Length Record Files
Changing the Height of a HEADLINE in PROC REPORT when using ODS RTF
How Quantiles are Calculated
Variance Calculation Differences
Getting the Comment text of an Excel Cell
Getting the Background Color of a Cell in Excel
A DOS .BAT File for Backing Up a File with a Date Stamp
Calculating the Distance Between Two Points on the Earth's Surface

Your Questions and Our Answers

1) Hi, I have read in one book about a demographic table: summary statistics were calculated for the AGE variable, and n (%) was calculated for the gender and race variables.

Why was it calculated like that? P-values were calculated for the above three variables as follows: for AGE with NPAR1WAY (Wilcoxon scores), for gender with a chi-square test, and for race with Fisher's exact test. Why were three different methods used?
Please explain the concept behind the demographics table. By Sai......

Ans) P-values can be calculated by many different SAS procedures; in this case, three different procedures contributed the p-values. Three types of variables are regularly treated in statistical analysis for clinical trials: categorical, quantitative, and survival. Categorical variables are characterized as having a limited number of discrete values, which can be nominal, ordinal, or interval. Quantitative, or continuous, variables are those that can take an infinite number of continuous values.

Survival variables measure the duration of time until the occurrence of a specific event. In the example you provided, the categorical variables are Race and Gender, whereas Age is the quantitative variable. Before defining the analysis methods, the statistician needs to make certain assumptions about the distribution of the population (e.g. normal distribution, Poisson distribution, etc.).

In the example, for the key variable Age, it was better to calculate the p-value without imposing any strict distributional assumptions; that is, it was a case where a nonparametric test yields the best result, and the nonparametric SAS procedure for calculating a p-value for a quantitative (or continuous) variable is PROC NPAR1WAY. For Gender, the population distribution was considered in the analysis; the parametric approach for calculating a p-value for a categorical variable is the chi-square test through PROC FREQ. For Race, the population distribution was not taken into consideration, so a nonparametric approach was used.

The nonparametric approach for calculating a p-value for a categorical variable is Fisher's exact test, also through PROC FREQ. So, in a nutshell, the test you use to get a p-value depends on whether the variable of interest is categorical, continuous, or survival, and also on how the population is distributed for that particular variable. As for your other question, why summary statistics were calculated for the AGE variable while n (%) was calculated for gender and race: Age is a continuous variable, and for continuous variables descriptive statistics provide useful information (they can be obtained with PROC MEANS, PROC UNIVARIATE, PROC SUMMARY, etc.). Gender and Race are categorical variables, and frequency distributions work best with variables whose values are best summarized by counts rather than averages. .........by Cathy.
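As a rough sketch of how those three p-values might be produced (the dataset DEMOG and the variables TRT, AGE, GENDER, and RACE are hypothetical, not from the original question):

/* Age (continuous, no strict distributional assumption):
   Wilcoxon rank-sum p-value from PROC NPAR1WAY */
proc npar1way data=demog wilcoxon;
   class trt;
   var age;
run;

/* Gender (categorical): chi-square test through PROC FREQ */
proc freq data=demog;
   tables trt*gender / chisq;
run;

/* Race (categorical, often with sparse cells): Fisher's exact test */
proc freq data=demog;
   tables trt*race / fisher;
run;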


2) Rajini
Creating reports? Can anyone tell me how to create ad hoc reports, data validation, and edit checks? If an interviewer asks about them, what should we tell him?

ans)
a) Creating ad hoc reports is just a bit of programming in response to queries that can come from the FDA, the IDMB, or whoever. For example: "I need a report of the top 10 AEs reported in the ITT population." My best suggestion is to imagine a programming situation for some kind of report, listing, or specific graph (such as the distribution of lab values across the populations) and explain in detail how you would do it.

b) Data validation: we can validate the analysis datasets, or the reports, listings, or graphs created from them. Say that you programmed the already-created report (or dataset) independently, based on the same specs the other programmer used, checked the results with PROC COMPARE, and documented any discrepancies or redundancy. Companies generally use version control software to track changes to programs, which also aids validation. If you are working in a UNIX environment, mention RCS (Revision Control System); there is plenty of material on how to use RCS in UNIX, and it is easy to understand.
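A minimal PROC COMPARE sketch for this kind of double programming (the library and dataset names PROD.ADSL and QC.ADSL are hypothetical):

/* Production dataset vs. the independently programmed QC copy */
proc compare base=prod.adsl compare=qc.adsl
             listall            /* report variables/observations found in only one dataset */
             criterion=0.00001; /* tolerance for numeric differences */
   id usubjid;                  /* match observations by subject identifier */
run;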

c) Edit checks are checks on whether the data in the datasets are appropriate, i.e. whether all the categorical and continuous variables fall within the limits they are supposed to. Generally, pharma companies already have around 75-100 edit-check macros, so you can just run them on your data to confirm whether the data are OK. You may have heard of Oracle Clinical; the CDM staff run edit checks to make sure the data they are going to send to a SAS programmer are clean, so that there are no data problems later once you start working on the data................ by Sreekanth Sharma.....

3) Srikanth Sharma: My experience...... interview questions. These were some of the questions asked by the biggest pharma company in the US; hope this will help some folks.
1) How many analysis datasets did you create?
2) How many of them are safety analysis datasets? How many are efficacy analysis datasets?
3) Name all the efficacy datasets you created.
4) Tell me from scratch how you create an efficacy dataset: all the parent datasets, derived variables, and derivation criteria.
5) How frequently do you create analysis datasets in your project? How frequently do you do reporting?
6) How many studies did you integrate in the ISS and ISE? Explain those studies. How many protocols did you follow?
7) Tell me the syntax for PROC LOGISTIC, GLM, MIXED, LIFETEST, PHREG, and PROC CORR.
8) You said your study was a double-blind study, so did you have a chance to unblind the study? If not, who unblinds it? At what stage do you have to unblind? You know which subjects got which treatment, so how can you say it is a double-blind study?

ans)

1) How many analysis datasets did you create? Ans: 7; safety datasets plus the remaining efficacy datasets. Actually it differs for each study.

3) Name all the efficacy datasets you created. Ans: CHFB, CHFHDL, CHFBIDL; I named some of them.

4) Tell me from scratch how you create an efficacy dataset: all the parent datasets, derived variables, and derivation criteria. Ans: it was a long answer detailing each and every variable and the derivation for each variable.

5) How frequently do you create analysis datasets in your project? How frequently do you do reporting? Ans: analysis datasets are created at the beginning of the project for a particular study (for each different study we create analysis datasets at the beginning), and reporting is done in drafts: draft I, draft II, 3, 4, and so on.

6) How many studies did you integrate in the ISS and ISE? Explain those studies. How many protocols did you follow? Ans: I said 5 studies were integrated, with 2 protocols.

7) Tell me the syntax for PROC LOGISTIC, GLM, MIXED, LIFETEST, PHREG, and PROC CORR. Ans: I wrote the whole syntax and explained the logic, described in detail what has to go in the CLASS and MODEL statements, what kind of p-value is produced, and where that p-value is in each output dataset.

8) You said your study was a double-blind study, so did you have a chance to unblind the study? If not, who unblinds it? At what stage do you have to unblind? You know which subjects got which treatment, so how can you say it is a double-blind study? Ans: I read the article called "INDEPENDENT DATA REVIEWS" from http://www.lexjansen.com/ by Ananth Kumar of Gilead Sciences and told them the same thing.

4) temple: NEED INFO. Hi, can anybody give me a link to find a brief description or more information about the Clinical Development Analysis and Reporting System (CDARS)?

ans) CDARS provides user-friendly tools to facilitate easy retrieval and collation of patient and clinical information, which forms the core component for further clinical research or audit studies. It helps us create reports, extract patient lists with specific criteria or outcomes, and track all episodes related to the respective patient, from a clinical data management system such as OC. Pfizer is one company that uses this type of tool....... by Cathy.....

5) Sai, regarding Phase III data. I am preparing for interviews and have some doubts:

1. From where will the raw data be extracted (Phase III trial data)?
2. Can't we get the Phase III data of a trial in PC files (Excel sheets) in real-life scenarios, or do we have to fetch it only from the OC database?
3. What are the analysis datasets?
4. How are they created from the original raw data?
5. What are safety analysis and efficacy analysis? What is the difference between safety and efficacy datasets? Please clear up my doubts.


ans) 1. From where will the raw data be extracted (Phase III trial data)? The raw data are extracted from the clinical data management system. Every company has its own favourite CDMS; OC is just one among them. A few others are Clinplus, Clinaccess, Medidata Rave, etc.

2. Can't we get the Phase III data of a trial in PC files (Excel sheets) in real-life scenarios, or do we have to fetch it only from the OC database? Well, since clinical trials and the associated reporting are highly regulated, companies go for a sophisticated database such as OC rather than a poor Excel "database". One thing you must remember is that whatever electronic medium we use to capture and store data must be 21 CFR Part 11 compliant, and Excel definitely does not satisfy the conditions mentioned in that part.

3. What are the analysis datasets? It depends on your project/trial. Some common ones are demographics, adverse events, concomitant medications, physical examination (vitals), disposition, lab data, etc. These are some of the safety-related analysis datasets. These analysis datasets mostly remain the same across therapeutic areas, whereas the efficacy analysis datasets are the ones that are unique to your study.

4. How are they created from the original raw data? A lot of formatting and derivation goes into the creation of analysis datasets, and there will be a specification to guide us. There is no straightforward answer to this question that I can type here; I suggest you read further on this.

5. What are safety analysis and efficacy analysis? What is the difference between safety and efficacy datasets? Safety data are the data we use to analyze how safe a drug is; they are the major part of most Phase I, II, and III studies. In addition, Phase III also concentrates on the efficacy part. Efficacy analysis is done essentially to find out to what extent drug A is efficacious compared to drug B. For an antihypertensive medication, efficacy might be analysed using the change from baseline in systolic/diastolic blood pressure when a particular drug is taken. Hope you now have some idea!!.............by Cathy......
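To make the blood-pressure example concrete, a change-from-baseline derivation might look roughly like this (the VITALS dataset and the USUBJID, VISIT, and SBP variables are assumptions used only for illustration):

/* One record per subject per visit, with systolic BP at a baseline visit and at post-dose visits */
proc sort data=vitals;
   by usubjid visit;
run;

data chg_sbp;
   set vitals;
   by usubjid;
   retain base_sbp;
   if first.usubjid then base_sbp = .;
   if visit = 'BASELINE' then base_sbp = sbp;  /* keep the baseline value                    */
   else chg = sbp - base_sbp;                  /* change from baseline at each later visit   */
run;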

6) temple
pivot tables
What are pivot tables in SAS? Also, can anyone let me know which version of SAS is mostly used in companies?


ans) Pivot tables are used in the financial environment to enable analysts to group and summarize statistics. A pivot table is a way to extract data from a long list of information and present it in a readable form. For example, given a spreadsheet of student scores, you could turn it into a pivot table and then view only the Maths scores for each pupil, or view just Paul's scores and nobody else's.

Why are they called pivot tables? Basically, they allow us to pivot our data via drag-and-drop to produce meaningful information. This makes pivot tables interactive: once the table is complete, we can very easily see what effect moving (or pivoting) our data has on our information. This becomes clear once you give pivot tables a go. Believe me, no matter how experienced you get with pivot tables, there will always be an element of trial and error involved in producing the desired results, which means you will find yourself pivoting your table a lot! Continued at http://www.orkut.com/CommMsgs.aspx?cmm=42949801&tid=5202377049730991529
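SAS itself has no pivot-table object; the closest analogue is a summary procedure such as PROC TABULATE (or PROC REPORT / PROC MEANS). A minimal sketch, assuming a hypothetical SCORES dataset with PUPIL, SUBJECT, and SCORE variables:

proc tabulate data=scores;
   class pupil subject;              /* the dimensions you would "pivot" on           */
   var score;
   table pupil, subject*score*mean;  /* pupils down the side, subjects across the top */
run;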

7) Pallav
Variable names longer than 32 characters....
How do you export a file having variable names longer than 32 characters and with embedded spaces?


ans) Hi Pallav, correct me if I am wrong:

1) How can a variable name have spaces? Only variable labels can have spaces. 2) The primary purpose of creating an export file in .xpt format is to send the file to the FDA, but unfortunately the FDA requests only SAS Version 5 compatible files. So even though SAS 9 allows variable names of up to 32 characters, the FDA won't accept them. If your intention is to create a transport file to send to another SAS user on your team or to a cross-functional team member, then it's fine; you can send a file that has variable names 32 characters long... by Sreekanth....
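A minimal sketch of writing a Version 5 transport file with the XPORT engine (the library paths and the AE dataset name are placeholders); the V5 restrictions, such as 8-character variable names, are checked when the file is written:

libname src    'c:\mystudy\data';
libname xptout xport 'c:\mystudy\transport\ae.xpt';

proc copy in=src out=xptout memtype=data;
   select ae;   /* dataset(s) to write into the transport file */
run;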

8) About the RANUNI function
What is the use of the RANUNI function? Please provide me a relevant link for the RANUNI function.
ans:
do nething &
RANUNI is a function used for random number generation; it returns values between 0 and 1. We use it for selecting samples randomly. Let us say you want to select 10% of blood samples: that's where we can use it.
sarath_sas
Here is one example:

%let n = 3;

data generate;
   do Subj = 1 to &n;
      x = int(100*ranuni(0) + 1);
      output;
   end;
run;

title "Data Set with &n Random Numbers";
proc print data=generate noobs;
run;

In this example we generate 3 numbers, each selected randomly between 1 and 100. If we run this code again, it selects another 3 random numbers.

9)padmakar
related to colon
Could anybody explain how the X: notation works in selecting variables, i.e. specifying all variables that begin with the letter X? Please explain with an example showing its application. Thanks in advance ....padhu.........

ans) Ram
Please look at The Little SAS Book, Section 2.10, "Reading Messy Raw Data".

For example, given this line of raw data:

My dog Sam Breed: Rottweiler Vet Bills: $478

suppose you need to get only the value "Rottweiler". Look at these different INPUT statements and the value of the variable DogBreed that each one reads:

1. INPUT @'Breed:' DogBreed $;        value: Rottweil
2. INPUT @'Breed:' DogBreed $20.;     value: Rottweiler Vet Bill
3. INPUT @'Breed:' DogBreed :$20.;    value: Rottweiler

You can see the difference: in the first statement you get only 8 characters, in the second statement 20 characters (including embedded blanks), and in the third statement, with the colon modifier, you get only "Rottweiler", which is what you want. This example is from The Little SAS Book, so you can read it there if it is not clear here.


10) m
_null_
Why do you use _NULL_? Tell me some instances where you have used that technique.


do nething &
We use DATA _NULL_ to create customized reports without manipulating the dataset; no new dataset is created by this statement. We can also use it to write raw data files from datasets, using the FILE and PUT statements. I think this helps; or you can go to The Little SAS Book....
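Two minimal sketches (the output path is a placeholder, and SASHELP.CLASS is just a convenient sample dataset): writing a raw data file from a dataset, and printing a small customized report without creating a new dataset:

/* Write a comma-delimited raw data file from SASHELP.CLASS */
data _null_;
   set sashelp.class;
   file 'c:\temp\class.csv' dlm=',';
   put name sex age height weight;
run;

/* Simple customized report: one line per observation, written to the log */
data _null_;
   set sashelp.class;
   put 'Student ' name 'is ' age 'years old.';
run;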

11) What about ETL?

ETL process: Extract, Transform, and Load
Extract: The first part of an ETL process is to extract the data from the source systems. Most data warehousing projects consolidate data from different source systems, and each separate system may use a different data organization or format. Common data source formats are relational databases and flat files, but sources may include non-relational database structures such as IMS, or other data structures such as VSAM or ISAM. Extraction converts the data into a format suitable for transformation processing. An intrinsic part of extraction is parsing the extracted data, checking whether the data meet an expected pattern or structure; if not, the data are rejected entirely.

Transform: The transform stage applies a series of rules or functions to the extracted data to derive the data to be loaded into the end target. Some data sources will require very little or even no manipulation of data. In other cases, one or more of the following transformation types may be required to meet the business and technical needs of the end target:
· Selecting only certain columns to load (or selecting null columns not to load)
· Translating coded values (e.g., if the source system stores 1 for male and 2 for female, but the warehouse stores M for male and F for female); this is called automated data cleansing; no manual cleansing occurs during ETL
· Encoding free-form values (e.g., mapping "Male" to "1" and "Mr" to "M")
· Joining together data from multiple sources (e.g., lookup, merge, etc.)
· Generating surrogate key values
· Transposing or pivoting (turning multiple columns into multiple rows or vice versa)
· Splitting a column into multiple columns (e.g., putting a comma-separated list specified as a string in one column as individual values in different columns)
· Applying any form of simple or complex data validation; if it fails, the data may be fully, partially, or not at all rejected, and thus none, part, or all of the data is handed over to the next step, depending on the rule design and exception handling
Most of the above transformations might themselves result in an exception, e.g. when a code translation parses an unknown code in the extracted data.

Load: The load phase loads the data into the end target, usually the data warehouse (DW). Depending on the requirements of the organization, this process varies widely. Some data warehouses might overwrite existing information weekly with cumulative, updated data, while other DWs (or even other parts of the same DW) might add new data in a historized form, e.g. hourly. The timing and scope of replacing or appending data are strategic design choices that depend on the time available and the business needs. More complex systems can maintain a history and audit trail of all changes to the data loaded into the DW. As the load phase interacts with a database, the constraints defined in the database schema, as well as triggers activated upon data load, apply (e.g. uniqueness, referential integrity, mandatory fields), which also contributes to the overall data quality performance of the ETL process.
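A toy illustration of the transform and load ideas in SAS terms (the EXTRACT_RAW and DW.SUBJECTS datasets and the SEX_CODE variable are made up): recode a coded value, then append the result to a warehouse table.

/* Transform: translate coded values (1 -> M, 2 -> F) */
data staged;
   set extract_raw;
   length gender $1;
   if      sex_code = 1 then gender = 'M';
   else if sex_code = 2 then gender = 'F';
run;

/* Load: append the transformed rows to the target table */
proc append base=dw.subjects data=staged force;
run;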

12) Can anyone tell me the advantages of the CDISC standards? Can anyone explain the CDISC standards?

ans) What are the benefits of the CDISC standards? Ultimately, all the benefits associated with standards implementation (efficiency, time savings, process improvement, reduced time for regulatory submissions, more efficient regulatory reviews of submissions, savings in time and money on data transfers among business partners, more efficient archive and recovery procedures, more accessible information, better communication among team members) come down to saving money, whether in time, resources, or actual funds.

• CDISC SDTM: the core model that is the basis for the specialised dataset templates (SDTM domains) optimised for medical reviewers
• CDISC Define.xml: metadata describing the data exchange structures (domains)

Basic Concepts in CDISC SDTM
Observations and Variables
• The SDTM provides a general framework for describing the organization of information collected during human and animal studies.
• The model is built around the concept of observations, which consist of discrete pieces of information collected during a study. Observations normally correspond to rows in a dataset.
• Each observation can be described by a series of named variables. Each variable, which normally corresponds to a column in a dataset, can be classified according to its Role.
• Observations are reported in a series of domains, usually corresponding to data that were collected together.
Example: Subject AB-525 weighs 52 kg 30 days after the first dose (May 30, 2006).
Domain: "VS" (Vital Signs, a Findings domain)
Identifier: Unique Subject Identifier is USUBJID="AB-525"
Topic: Vital Signs Test Short Name is VSTESTCD="WEIGHT"
Timing: date of measurement is VSDTC=2006-06-29; study day of the vital sign is VSDY=30
Result Qualifier: result or finding in original units is VSORRES=52
Variable Qualifier: original unit is VSORRESU="KG"
Additional timing variables and qualifiers may be added as necessary, as they are in the model.

CDISC's Submission Standard
• Underlying models: CDISC Study Data Tabulation Model
  - Clinical Observations: general classes Events, Findings, Interventions
  - Trial Design Model: Elements, Arms, Trial Summary Parameters, etc.
• Domains, submission dataset templates: CDISC SDTM Implementation Guide.

13) What are the current versions of SDTM, ODM, LAB, ADaM, and Define.xml?

ans) Current versions of SDTM, ODM, LAB, ADaM, and Define.xml:
Study Data Tabulation Model (SDTM) - the current version is 3.1.1.
Standard for the Exchange of Nonclinical Data (SEND) - the current version is 2.3; this is actually considered part of the SDTM (for animal/tox data vs. human data).
Operational Data Model (ODM) - the current version is 1.2.1. Version 1.3 draft is currently posted for comment; this version will enhance v1.2 so that it can support SDTM metadata for regulatory submission.
Laboratory Data Model (LAB) - the current version is 1.0.1. This content standard can be implemented via ASCII, SAS, XML, or an ANSI-accredited HL7 V3 message.
Case Report Tabulation Data Definition Specification (CRT-DDS), Define.xml - the current version is 1.0. This is a means to submit SDTM metadata to the FDA in an ODM XML format.
Analysis Data Model (ADaM) - version 2.0.

14) What do you do to get clean data?
- Use PROC FREQ to check variables that have a limited number of categories, such as gender or race.
- Use a DATA step to identify invalid character values (use PUT statements in a DATA _NULL_ step).
- Use a WHERE statement with PROC PRINT to list out-of-range data.
- Use user-defined formats to detect invalid values.
- Use PROC MEANS / PROC UNIVARIATE to detect invalid values.
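A few of those checks, sketched out (the DEMOG and VITALS datasets and their variables are assumptions used only for illustration):

/* Frequency check on variables with a limited number of levels */
proc freq data=demog;
   tables gender race / nocum;
run;

/* Flag invalid character values with a DATA _NULL_ step */
data _null_;
   set demog;
   if gender not in ('M','F') then
      put 'WARNING: invalid GENDER value ' usubjid= gender=;
run;

/* List out-of-range values with WHERE plus PROC PRINT */
proc print data=vitals;
   where sbp < 60 or sbp > 250;
   var usubjid visit sbp;
run;

/* Spot impossible minimums/maximums with PROC MEANS */
proc means data=demog n nmiss min max;
   var age;
run;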

15) If you need the value of a variable rather than the variable itself, what would you use to load the value into a macro variable?
Use CALL SYMPUT to take a value from a dataset variable and assign it to a macro variable.
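A minimal sketch (the answer mentions CALL SYMPUT; CALL SYMPUTX, available since SAS 9, additionally strips leading and trailing blanks):

/* Load the number of observations in SASHELP.CLASS into a macro variable */
data _null_;
   set sashelp.class nobs=n;
   call symputx('nobs', n);
   stop;
run;

%put The dataset has &nobs observations.;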

16) Efficiently duplicating SAS data sets
Question: How can a duplicate copy of a data set be created within the same library without copying the data set to an intermediate location, renaming it, and copying it back?

Answer: The most efficient way to do this is to use the APPEND procedure. Also, unlike the DATA step, indexes are copied as well.
If the data set supplied on the BASE= option does not exist, it is created with the contents of the data set supplied on the DATA= option. Here is an example of creating a duplicate data set with a different name in the same library using PROC APPEND.
proc append base=sasuser.new
data=sasuser.class;
run;
NOTE: Appending SASUSER.CLASS to SASUSER.NEW.
NOTE: BASE data set does not exist. DATA file is being copied to BASE file.
NOTE: The data set SASUSER.NEW has 19 observations and 5 variables.
real time 1.34 seconds
cpu time 0.11 seconds

17) SAS Data Step Debugger:
Have you ever found yourself with a SAS data step which runs, but not quite as you want it to? The SAS Data Step debugger can help you find logic errors in your program by allowing you to step through your code, examine data, and even temporarily modify data values. The full text of this tip appears in the 1997 Q1 edition of SAS Communications.

To invoke the data step debugger, add the DEBUG option in the DATA statement.
For example:
data temp / debug;
   do i=1 to 10;
      x=i*2;
      output;
   end;
run;

When you submit this code you will see two new windows on your workstation screen, the DEBUGGER LOG window and the DEBUGGER SOURCE window. In the DEBUGGER LOG window you can type commands to examine variable values, set breakpoints, and change values. The results of these commands are shown as well. The DEBUGGER SOURCE window contains the source code for the DATA step you are debugging.

To end the debugger session, type QUIT on the command line in the DEBUGGER LOG window.
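A short hypothetical session in the DEBUGGER LOG window might use commands like these (illustrative values; the command names are standard DATA step debugger commands):

break 3        (set a breakpoint at line 3 of the DATA step)
go             (run until the breakpoint is reached)
examine i x    (display the current values of I and X)
set x = 99     (temporarily change the value of X)
step           (execute the next statement)
quit           (end the debugger session)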

18) is
Recent interview questions I faced
1. Suppose we have 2 datasets in the WORK library and I want to delete one dataset from the WORK library; what will you do?
2. What is concomitant medication? Tell me about it.
3. We have a dataset with two variables, name and sal; I want the SUM of the sal variable at the bottom of the sal column. Write the syntax for that.
4. Tell me about SAS MAUTOSOURCE.
5. Significance of the p-value.

1) SAS datasets created with just a name (i.e. without specifying the name of a library) are by default in the WORK library. The WORK library and the datasets in it are automatically deleted at the end of the SAS session, so we don't need to delete them separately. But during the SAS session, if we want to delete datasets from the WORK library, we can use either PROC DATASETS or PROC SQL. I feel PROC SQL is better for this purpose simply because it writes less information to the log; with the DROP TABLE statement in PROC SQL we can delete a dataset from the WORK library. In PROC DATASETS, the DELETE statement removes the named datasets from a library, whereas the KILL option deletes all the datasets in the library. The following example shows how to delete a specific member of a permanent SAS library with the DELETE statement:

LIBNAME input 'temp';

PROC DATASETS LIBRARY=input;
   DELETE x;        /* X is the dataset we need to delete from the library */
RUN;
QUIT;
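The PROC SQL equivalent mentioned above would be:

PROC SQL;
   DROP TABLE work.x;   /* delete dataset X from the WORK library */
QUIT;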

2) Concomitant medications are drugs taken along with the study drug by patients in the study. Suppose a patient named Stacy is enrolled in an oncology (drug A) clinical trial of ABC Pharmaceuticals; if she is also taking some other drugs (aspirin, acetaminophen, etc.) along with drug A, those are called concomitant medications. Concomitant medications are used in either the safety or the efficacy analysis of the drug; they may be examined to determine whether they interact with the study drug (in this case drug A) or whether they can explain the presence of certain adverse events.

3) A SUM statement in PROC PRINT is used to compute the total of the sal variable and print it at the bottom of the column.

Procedure Code:
PROC PRINT DATA=namesal;
   VAR name sal;
   SUM sal;
RUN;

4) When you want SAS to search for your macro programs in autocall libraries, you must set two options: MAUTOSOURCE and SASAUTOS. The MAUTOSOURCE option must be enabled to tell the macro processor to search autocall libraries when resolving macro references, and SASAUTOS points to the libraries to search. When a macro is invoked that has not yet been compiled, the macro processor searches the autocall libraries for a file with the requested name.
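A minimal sketch (the directory path and the macro name %mymacro are placeholders):

/* Make macros stored as .sas files in the given directory available via the autocall facility */
options mautosource sasautos=('c:\mymacros', sasautos);

%mymacro(param=1)   /* the first call compiles c:\mymacros\mymacro.sas */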

5) If the p-value is greater than 0.05, you would say that the group of independent variables does not show a statistically significant relationship with the dependent variable, or that the group of independent variables does not reliably predict the dependent variable. Note that this is an overall significance test assessing whether the group of independent variables, when used together, reliably predicts the dependent variable; it does not address the ability of any particular independent variable to predict the dependent variable.
source:http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter1/statareg_annotated2.htm

sarath

19) Sarath, one question from me too, regarding the ISS & ISE. Could you please explain in a bit more detail exactly how the statistics group integrates the safety and efficacy information in these datasets across studies (i.e. are these sister studies? are datasets across studies merged, and how?) with a few examples. Seekin for one

ans) The ISS will have all the clinical trial data, collected from normal volunteers (from the Phase I study) and patients (all other studies). The ISE will have the clinical trial data only from Phase II, Phase III, and Phase IV, not from Phase I. The reason is that a Phase I study is conducted to assess the safety, not the efficacy, of the drug, so Phase I data are not included in the ISE. The ISE should include a description of the entire efficacy database, demographics, and baseline characteristics. The ISS should include details such as the extent of exposure to the drug, the characteristics of the patients enrolled in the studies, listings of deaths that occurred during the studies, how many patients dropped out of the studies, and potential SAEs, other AEs, and lab results.

The ISS is considered one of the most important documents required for filing the NDA (New Drug Application). Safety data from different trials can be integrated by pooling all the safety data together in order to identify AEs that are rare. The data integration approaches for the ISS and the ISE are different: pooling the efficacy data from different studies is not strictly required (although pooled data give more information about the efficacy of the drug), whereas pooling all the safety data is necessary for the ISS. The ISS needs thorough research because it involves safety, and the safety parameter is considered even more important than efficacy in a clinical trial, because a study should always benefit patients. The ISR (integrated summary report) is a compilation of all the information collected from the safety and efficacy analyses in all the studies; the ISS and ISE are different parts of the ISR. Both the ISS and ISE reports are necessary for all new drug applications (NDAs) in the United States. Every clinical trial is different, because each one is conducted for a specific purpose (Phase I for safety in a normal population, the others for efficacy in patients). The reasons for creating the ISR are to provide an integrated report that compares and contrasts all the study results and reaches one conclusion after reviewing the patient benefit/risk profile, to satisfy the FDA requirement, and, last but not least, to reach a definite conclusion by thoroughly checking all of the integrated data.
source: encyclopedia of biopharmaceutical statistics

Resources from where you can learn SAS online:

http://www.ats.ucla.edu/stat/sas/
http://www.psych.yorku.ca/lab/sas/
http://learn.sdstate.edu/Dwight_Galster/510docs/Tutorial%20Programs/sas_tutorial_contents.htm
http://www.itc.virginia.edu/research/sas/training/v8/
http://www.umanitoba.ca/centres/mchp/teaching/sasmanual/



SAS tutorial for UNIX:

http://its.unm.edu/introductions/Sas_tutorial/

What is SAS?


The SAS System (originally Statistical Analysis System) is an integrated system of software products provided by SAS Institute that enables the programmer to perform:
data entry, retrieval, management, and mining
report writing and graphics
statistical and mathematical analysis
business planning, forecasting, and decision support
operations research and project management
quality improvement
applications development
data warehousing (extract, transform, load)
platform independent and remote computing
In addition, the SAS System integrates with many SAS business solutions that enable large scale software solutions for areas such as human resource management, financial management, business intelligence, customer relationship management and more.
Discovering and developing safe and effective new medicines is a long, difficult and expensive process:
Preclinical Testing --> Investigational New Drug Application (IND) --> Clinical Trials, Phase- I --> Phase-II --> Phase-III --> New Drug Application (NDA) --> Approval --> Phase-IV.
SAS programming is very important from the preclinical testing stage to the approval stage. When you look at the sequence of drug development and the approval process, you can see for yourself how important the clinical trials and SAS programming areas are, and why there is an urgent need for training programs in these areas.

A training program on how to effectively monitor clinical trials can be designed based on Good Clinical Practices and ICH guidelines. Various topics may include: design of the protocol, case report forms, reporting serious adverse events, early-phase clinical evaluation, processing clinical research data, the clinical database, data display, the report and analysis plan, the clinical trial report, standard operating procedures in regulatory affairs, auditing of clinical trials, the new drug application, etc.

Clinical trial data arriving on case report forms are fairly standard (for example demography, adverse events, medications, laboratory, etc.) and hence can be stored in fairly standard data structures. Designing clinical data structures for data entry is important, but it should be done with some understanding of the analysis that will be performed. Once an appropriate clinical data structure for data entry is arrived at, it is important to determine how best to use the data in the SAS analysis environment.

The US FDA considers SAS validation an important component of the quality assurance, reliability, and accuracy of much of the information used to develop and approve drugs and medical devices. The crucial points in pharmaceutical research and development are the submissions of various clinical reports to the FDA, and these clinical reports need to be created using SAS programming techniques.
Different phases in clinical trials and SAS programming include:
(1) Clinical Report Generation for example:
(a) Adverse Event tables and listings,
(b) Demographics,
(c) Safety,
(d) Efficacy,
(e) Lab Data,
(2) Custom Derived SAS data sets,
(3) SAS-Programming for NDAs (New Drug Applications),
(4) Experience with Phases I-IV,
(5) Data Cleaning,
(6) Statistical Analysis using SAS tools and
(7) Data Warehousing.