* I write this as a short guide though I do not always stick to it. This post was inspired by a thoughtful discussion on the LinkedIn Stata-Users group.
* The number one rule is: Always, always comment your code!
* See my reasons:
* http://www.econometricsbysimulation.com/2012/07/7-reasons-to-comment-your-code.html
* I prefer using <*> before a comment as the primary method of commenting,
* and <//> before my comment lines when in Mata.
* I only like using /* and */ when I have a large amount of comments.
* I think it is useful to add comments after commands, like this:
clear // remove data in memory
* With long commands, though, I think trailing comments are hard to read. For example:
twoway (scatter length gear_ratio) (scatter foreign mpg_price) (scatter price mpg) ///
, title("This is a useless and meaningless graph") // Graphs length against gear_ratio
* -----------------------------SAMPLE DOCUMENT----------------------------
/***** Title of Do File
Description of do file. This might have several paragraphs, for which
I recommend hard-breaking the lines since Stata does not have word wrap.
*********************** Section 0: Initialization **********************
If your do file is very long and has multiple sections consider including
an index.
Index
1. Parameter declaration
2. Input/clean data, generate temporary data
3. Manipulate variables
4. Generate summary statistics
5. Generate estimates
6. Delete temporary data/variables
You might also consider including a variable glossary at the beginning of your do file.
For example:
cntgdp: Country GDP
cntgdp2: Country GDP demeaned
year: Year
nsrvy: Number of Survey Wave
As for naming variables, I would suggest not letting variable names get longer
than six letters and two numbers.
For example, you might be tempted to use names like these:
dstgdpp: district gross domestic product per capita, nominal currency of that year.
dstgdppcchgyr00: district gross domestic product per capita change from year 2000.
It might seem like a good idea to write them this way, but they are really confusing to
read, especially since Stata will start truncating variable names.
I would suggest naming the variables something like this instead:
dstgdp1: district gross domestic product per capita change from year 2000.
dstgdp0: district gross domestic product per capita, nominal currency of that year.
Define the variables in two places:
the variable glossary at the beginning of your document and the label that you
give your variables.
*/
******************** End Section 0: Initialization **********************
****************** Section 1: Parameter declaration *********************
* Often you might find it useful to specify globals or locals that help you
* control your analysis when you run your file.
* For example:
global exmin = 1
* When set to 1 minorities will be excluded from the analysis.
global ppp = 0
* When set to 1 purchasing power parity will be used instead of GDP per capita.
* Of course you will need to code up within the analysis what the globals actually do.
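* A minimal sketch of how such a switch might be used later in the analysis.
* The variable name "minority" is hypothetical and is not part of any data set
* used in this file, so the drop only runs if that variable actually exists.
if ${exmin} == 1 {
    capture confirm variable minority
    if _rc == 0 {
        * Exclude minorities because the exmin switch is turned on
        drop if minority == 1
    }
}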
* Specify a working directory. This can be done with the "cd" command.
* Personally I don't think this is sufficient.
* Often I am loading multiple data files from multiple directories.
* I prefer using globals specified in the parameter section.
* This allows users to have slightly or largely different file organization
* yet still be able to run your analysis.
* For example:
* Use globals to specify directories of interest
* Read directory
global rdir = "C:/data_files/my_project/original_data/"
* Save directory
global sdir = "C:/my_project/modified_data/"
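* A possible way to let a collaborator with a different folder layout run the
* same file is sketched below. The user name "coauthor" and the D:/ paths are
* hypothetical placeholders; c(username) is the name of the user running Stata.
if "`c(username)'" == "coauthor" {
    * Override both directories for this user only
    global rdir "D:/projects/original_data/"
    global sdir "D:/projects/modified_data/"
}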
**************** End Section 1: Parameter declaration *********************
****************** Section 2: Input/clean data *********************
* When you load in data, always load it first and then save a copy of it somewhere else.
* Load original data:
sysuse auto, clear
* Save data to a new directory where it will never accidentally overwrite your original data
save "${sdir}auto.dta", replace
*************** End Section 2: Input/clean data *********************
****************** Section 3: Manipulate variables *********************
* Always give your variables labels when you define them.
gen mpg_price = mpg*price
label var mpg_price "Miles Per Gallon times Price"
* Use spaces to help denote commands which are secondary.
* Never use tabs; use spaces instead,
    * because
        * they
            * are
                * hard
                    * to
                        * read
                            * and can
                                * substantially
                                    * decrease
                                        * your
                                            * page space.
* Also, your code may
    * look different with different
        * programs.
* This is very annoying.
* I stuck a
    * lot of spaces to
        * simulate the
            * Stata editor.
* Always explain why you do things.
drop if foreign == 1
* We are only interested in domestic cars (for example).
* When doing any kind of looping also use indentation:
forv i = 1(1)10 {
    * When using forvalues, prefer i = 1(1)10 to i = 1/10. The two are equivalent,
    * but the i = 1/10 notation can cause problems when using macros.
    * It is very important to indent.
    if (`i' == 3) {
        * Do something
        * I am displaying filler text when i==3 and only then
        di "Filler"
        * This will display i squared when i==3 (which is obviously 9)
        di `i'^2
    }
    * End if
}
* End forv i loop
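* A minimal sketch of the same range notation with a macro as the upper bound.
* The local name nmax is hypothetical and is only here for illustration.
local nmax = 3
forv i = 1(1)`nmax' {
    * Report progress through the loop
    di "Iteration `i' of `nmax'"
}
* End forv i loop with macro bound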
* Also, indent when commands span multiple lines.
twoway (scatter length gear_ratio) (scatter foreign mpg_price) (scatter price mpg) ///
    , title("This is a useless and meaningless graph")
* This can make commands much easier to read.
*************** End Section 3: Manipulate variables *********************
* Also, take a look at some of the comments below. There have been some very thoughtful contributions by Stata users.