Setting up Sublime Text 3 REPL for IPython

There are many editors that can be used to do python programming including spyder and pycharm. Sublime Text 3 with REPL provides the best workflow for a python programmers. SublimeREPL with Sublime Text 3 allows you to send chunks of python code or lines of python code without having to take your hand of your keyboard and interrupt your workflow using keyboard shortcut. This post is a tutorial on how to step up Sublime Text with python on Windows.

 

Sublime Text 3 is available to be download at the following link:  https://www.sublimetext.com/3

 

Once sublime text 3 is installed, install package control from the following link: https://packagecontrol.io/installation

Launch sublime text 3 and press Ctrl Shift p to open the command palette.

Type Anaconda into the search bar and click it to install it.

Once anaconda is installed open, open sublimerepl.py in C:\Users\<Username>\AppData\Roaming\Sublime Text 3\Packages\SublimeREPL.py

 

Copy and paste

res["file_path"] = os.path.expanduser('~')

 

below

res["win_cmd_encoding"] = "utf8"

 

and save the file.

Open ipy_repl in  C:\Users\<Username>\AppData\Roaming\Sublime Text 3\Packages\SublimeREPL\config\Python directory and copy and paste the following code, overwriting the existing content.

 

Next open command prompt and run the following code:

ipython profile create

Open

~/.ipython/profile_default/ipython_config.py

and type

c.InteractiveShell.colors = "NoColor" 

in the bottom of the file and save it.

 

Open subprocess_repl.py

and delete


def is_alive(self):
    return self.popen.poll() is None

 

Paste

def is_alive(self):
     return True
 

Sublime Text is now properly set up to work with IPython using REPL. You can send code from the python file to the interactive session using ctrl + ,+ l.

Reading Character Count in Resume

This is a side project I did to improve my character reading skills using R. The goal of this project was to count the frequency of letters and numbers contained within the resume using R and then to plot the frequency of each character using a histogram. The end result is shown below.

 

The following code was used to generate the graph seen below.

 

Rplot

The difficulty I had with this project was trying to figure out how to display the chart with decreasing bars. I tried to use the arrange() function to arrange the column frequency using arrange(dataf,desc(Frequency)) but ggplot still graphed the bars in an increasing manner. After some googling, I found a post on stackoverflow which suggested using the reorder() function to get the desired results.

http://stackoverflow.com/questions/16961921/plot-data-in-descending-order-as-appears-in-data-frame

Importing data from SQL database into R

If the data, that you want to perform analyze is stored in a SQL (Structured Query Language) database such as MySql, Oracle, or Microsoft SQL then the RODBC R package can be used to import the data into R. ODBC (Open Database Connectivity) is a standard protocol used to connect to databases regardless of the operating system they are running.

In order to connect to the database you will need the username, password, the RODBC library and the database name.

The RODBC library can be installed and loaded into R using the following R code

install.packages("RODBC")
library(RODBC)

Now, we will connect to the database, with the database name CARS:


mydb<-odbcConnect("CARS",uid="username",pwd="password") -- connect to database

After an open database connection has been successfully established, the sqlQuery() function can be used to create a data.frame from the database rows.


data_query <- "SELECT * from car_names WHERE price > 15000"; #create the sql query to use in sqlQuery function

car_data <- sqlQuery(channel=mydb, query=data_query, stringAsFactors=False); # stores the data in a data.frame containing all the rows
#from data_query. stringAsFactor = False is used so R does not transform the character data into factors

After you are done the connection can be closed by.


odbcClose(mydb)

Posted in SQL

Updating WordPress Password using SQL

If it's been a while since I made post, I sometimes forget the login  admin account password. Usually the admin password can be reset using the reset password page but due to hosting server issues, the reset email might not be sent. In that case, the fastest way to reset the password would be to login into the hosting cpanel and modify the password directly using SQL commands.

Inorder to change the password using SQL you must:

1. Login to the cpanel of your hosting provider.

2. Access phpMyadmin and browse to where your wordpress directory is stored.

3. Open the operations tab and enter in the following sql query.

USE databasename; -- to select the database

SELECT ID, user_login, user_pass, from wp_users; -- retrieves the user_login, user_pass rows

UPDATE wp_users SET user_pass MD5=('Newpassword') where ID=1  Limit=1; 
--updates the user password for the admin account, the ID=1 identifies the admin row, MD5 is used
--because the password stored in the database are MD5 hashed

Now you can login to your account using Newpassword as the login password.

Posted in SQL

Time Series Analysis of Housing Starts

PSTAT 174 Class Project: Time Series Analysis of Housing Starts

The decline of the United States housing market in 2007 had a pervasive negative effect on the United States economy. Housing prices peaked in 2006 and started declining in 2007. The cause of the housing market is believed to have been caused by several factors including relaxed standards for mortgages rates and deregulation. Increased foreclosure rates resulted in a credit crisis which is believed to be the primary cause pf the 2007-2009 recession. During the recession, housing demand fell and construction for new housing declined rapidly.

This is why I decided to do a time series analysis of United States housing starts. Because housing starts tend to be a good economic indicator of how well the economy is doing and how well it going to do. If the housing market is doing well then it results in economic growth. If there is sustained decline in the housing market than the economic growth will tend to decrease as well. By building a time series model, we can forecast whether the economic conditions are going to improve to decline.

For this project I used data released by the Federal Reserve Bank of St. Louis to do my time series analysis. The first thing I did was I used the Quandl package to import the data into R and then plotted the time series.

library("astsa", lib.loc="~/R/win-library/3.1")
library("Defaults", lib.loc="~/R/win-library/3.1")
library("forecast", lib.loc="~/R/win-library/3.1")
library("fracdiff", lib.loc="~/R/win-library/3.1")
library("plotrix", lib.loc="~/R/win-library/3.1")
library("tseries", lib.loc="~/R/win-library/3.1")
library("Quandl", lib.loc="~/R/win-library/3.1")

housing=Quandl("FRED/HOUST", trim_start="1979-12-31", trim_end="2014-04-30", collapse="monthly", type="ts")
housingts=ts(housing, frequency = 12, start = c(1979,12))

plot(housingts)

plotts

From the graph, we can see housing starts declining around 2007 because of the subprime mortgage crisis.


acf2(ts(housingts, frequency=1))

> adf.test(housingts)

Augmented Dickey-Fuller Test

data: housingts
Dickey-Fuller = -2.1506, Lag order = 7, p-value = 0.5138
alternative hypothesis: stationary

pacfplot

From the time series graph, we can conclude that the series can conclude that the series is not stationary. There is a slow decay in the acf which indicates that there is strong correlation in the data and that the data is not stationary. The large p-value from the adf test future supports our hypothesis that the data is not stationary.

The next step is to make the data stationary.

In order to make the data stationary, we can take a lagged difference of the data and then look at the acf again.

housingdiff=diff(housingts)

> acf2(housingdiff)

plottsseries

The acf plot of the differenced data no longer has a slow decay and the Dickey-Fuller test also supports the hypothesis that the data is stationary because of the small p-value. There are a few significant lag and that is due to the seasonal nature of the data.

 


> adf.test(housingdiff)

Augmented Dickey-Fuller Test

data: housingdiff
Dickey-Fuller = -6.6809, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(housingdiff) : p-value smaller than printed p-value

Now that the data set is stationary we can now start working with it by looking at the ACF and PACF lags to get the values. We want to be conservative with our estimates so we begin by trying ARIMA(2,1,3)(2,1,3) model.

> fit213213&amp;amp;amp;amp;amp;lt;-arima(housingts, order = c(2, 1, 3),seasonal = list(order = c(2, 1, 3), period = 12)) 
> tsdisplay(residuals(fit213213))
|> fit213213$aic
[1] 4851.315

residualspls

We can also use the auto.arima() function to see which values R gets.

 

</span>

<pre>
> fit.start=auto.arima(housingts)
> fit.start

Series: housingts
ARIMA(2,1,3)(2,0,0)[12]

Coefficients:
ar1 ar2 ma1 ma2 ma3 sar1 sar2
1.2874 -0.3256 -1.6559 0.8715 -0.1725 -0.1352 -0.2074
s.e. 0.5709 0.5348 0.5665 0.7266 0.2052 0.0520 0.0517

sigma^2 estimated as 9231: log likelihood=-2466.11
AIC=4948.21 AICc=4948.57 BIC=4980.38

We can try other models to see if we can do better.
> fit112200&amp;amp;amp;amp;amp;lt;-arima(housingts, order = c(1, 1, 2),seasonal = list(order = c(2, 0, 0), period = 12)) 
> fit112200$aic
[1] 4945.036

residualsplot2

Since this has a lower AIC value, then it tells us that this model is the better model to use.

 

</span>

<pre>
tsdisplay(residuals(fit112200))

residualsplot2

Looking at the plot, there seems to be significant lag at 13 and 26. Considering that we have 36 observations and there is 0.05 percent chance for the observation to outside the confidence interval we can expect 36*0.05 = 1.8 or 2 lags to be outside the confidence interval. Since most of the values fall within the confidence interval then the values in the graph are sufficiently close to being white noise.

We can also look at the tsdiag.

> tsdiag(fit112200)

> summary(fit112200)
Series: housingts
ARIMA(1,1,2)(2,0,0)[12]

Coefficients:
ar1 ma1 ma2 sar1 sar2
0.9308 -1.2837 0.362 -0.1427 -0.2054
s.e. 0.0601 0.0720 0.045 0.0517 0.0517

sigma^2 estimated as 9250: log likelihood=-2466.52
AIC=4945.04 AICc=4945.24 BIC=4969.16

standarizedresiduals

From the summary, we can see that the coefficient are significant because 0 or 1 do not fall in their interval.

Now that we have a seasonal arima model can now use it for forecasting.


> plot(forecast(fit112200, h=10))
(forecast(fit112200, h=4))
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
May 2014 1034.663 911.4092 1157.916 846.1627 1223.163
Jun 2014 1043.814 897.0021 1190.625 819.2848 1268.342
Jul 2014 1044.469 875.3739 1213.563 785.8605 1303.077
Aug 2014 1049.710 859.2006 1240.218 758.3513 1341.068

forecastarima

The housing data tells us that we can expect the new housing starts to increase and then decrease and increase. Since the units are in thousands, the small changes the observations makes a big difference. Because of the significant lags in the acf and pacf plot then our model will only work for future observations that are not too far into the future.

These observations makes sense because the housing market recently has had a slow upward trend. From this, we can expect the economy to slowly recover over the coming month and for the financial conditions to improve. Since if the economy is improving then consumers have more money and are creating demand for housing. In order to meet the demand, new housing is built. This assumption is backed by the following graph which shows that during a recession the new housing declines and when economic conditions improve housing starts start increasing.

forecast2arima


plot(forecast(fit112200, h=4))
rect(xleft=2006, xright=2009, ybottom=0, ytop=7000, col="#123456A0")
rect(xleft=1990, xright=1991, ybottom=0, ytop=7000, col="#123456A0")
rect(xleft=1981, xright=1982, ybottom=0, ytop=7000, col="#123456A0")

Since the end of the 2006-2009 recession, there has been a steady increase in new housing. And our model supports that steady growth.

Deleting Duplicate Entries Within A Column

The following code orders the values from A1 to A10 in ascending order and then deletes the rows that have duplicate values. This code could be generalized for large data sets.

 
Sub DelDuplicates()
Dim currentcell, nextcell 'Declares currentcell and nextcell as variant data types
Range("A1:A10").Sort key1:=Range("A1:A10") 
Set currentcell = Range("A1")
Do While Not IsEmpty(currentcell) 'Continues loop until a the current cell is found to be empty
    Set nextcell = currentcell.Offset(1, 0)
    If nextcell.Value = currentcell.Value Then
        currentcell.EntireRow.Delete
    End If
    Set currentcell = nextcell
    Loop
End Sub

Posted in VBA

Using the VBA Recorder

When starting to write a VBA program, it is sometimes useful to first using the VBA record to translate the keystrokes into VBA code and to edit the code to make it more concise. It is useful to use the VBA recorder when the actions consists  of sequence of menu commands, as for example creating a chart or sorting the data. If the actions consist mostly of cursor movements, then the code generated by the recorder tends to be very verbose.

The VBA recorder is accessed by clicking the record macro button under the developer tab. After recording the code, the macro can be run by pressing alt-F8.

For instance, if you use the VBA recorder and create a table like the one shown below then the VBA code will look similar to this.


Sub Macro1()
'
' Macro1 Macro
'

'
Range("A1").Select
ActiveCell.FormulaR1C1 = "Jake"
Range("B1").Select
ActiveCell.FormulaR1C1 = "400"
Range("A2").Select
ActiveCell.FormulaR1C1 = "Amy"
Range("B2").Select
ActiveCell.FormulaR1C1 = "1000"
Range("B3").Select
ActiveCell.FormulaR1C1 = "=SUM(R[-2]C:R[-1]C)"
Range("B4").Select
End Sub

 

The code can be more concise by editing out the select operations and removing the formula property since it is not unnecessary for a single range select object:

 


Sub Macroexample()

Range("A1") = "Jake"
Range("B1") = 400
Range("A2") = "Amy"
Range("B2") = 1000

Range("B3") = Application.Sum(Range("B1:B2")) 'Need to use application.sum because sum is an excel function.
End Sub