2. LITERATURE REVIEW

This section of the report presents theories and concepts from previous studies that are considered significant to this study. The relevant concepts to be described in this section include classification of trips, factors influencing trip generation, techniques of trip generation modelling and discrete choice models.

2.1 Trip Generation Modelling

Trip generation is a crucial stage in the four-stage modelling process because failure to estimate trip productions and attractions accurately can lead to an underestimated or overestimated number of trips. In determining the number of trips, trip purpose and activity have been noted as key determinants. In practice, it is essential to characterize journeys in terms of trip purpose in order to obtain a clear understanding of travel and of trip generation models. ‘A trip or journey is described as the one-way movement from a point of origin to a point of destination’ (Ortúzar and Willumsen, 2011). By purpose, trips can be classified into work trips, education trips, shopping trips, social and recreational journeys and escort trips, among others. Work and education trips are usually termed compulsory or commute trips, and the others are referred to as optional or non-commute trips. The National Travel Survey for England shows that shopping, social and recreational trips account for 19%, 15% and 15% respectively of the average number of trips made (DfT, 2015).

In order to predict and estimate the average number of trips, prior knowledge of the socio-economic and demographic characteristics of the communities is useful for efficient planning. Consequently, the impact of the various socio-economic factors on transportation planning can then be determined using statistical approaches/models (Penn et al., 2008).

Models for estimating household trip production rates have existed for decades. Data-driven analyses, namely correlation analysis and cross-categorization, have been able to detect linear relationships between the core variable and the various explanatory variables. For the total number of trips in particular, estimation methods have evolved from linear regression analysis and category analysis (FHWA, 1975; Caldwell and Demetsky, 1978) to generalised linear models (Said and Young, 1990; Lan and Hu, 2000) and Structural Equation Modelling (SEM) (Gim, 2011; Etminani-Ghasrodashti and Ardeshiri, 2015).

The two most frequently used approaches to trip generation modelling are linear regression analysis and category analysis. Both methods have their strengths and weaknesses (Hu, 2010). In regression analysis, it is assumed that there is a linear relationship between the independent and dependent variables. This approach is quite effective because there are statistical tests for goodness of fit; however, the assumption of linearity is restrictive. Moreover, the absence of fixed upper and lower limits on the number of trips could lead to perverse estimates as the model’s covariates increase, or to negative trip numbers at minimal covariate values (Paez et al., 2006). In addition, linear regression assumes that the number of trips is continuous, but this assumption is questionable, mainly when the number of trips is low. Unlike in random utility theory, the relationship between the number of trips and the covariates in regression analysis lacks interactive justification (Ben-Akiva and Lerman, 1985).
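The risk of negative trip estimates noted above can be illustrated with a minimal sketch. The household sizes and trip counts below are invented purely for illustration and are not taken from any of the cited studies; `fit_line` is a hypothetical helper that fits an ordinary least squares line with a single covariate.

```python
# Hypothetical data: household size and observed daily trips (illustrative only).
household_size = [3, 4, 5, 6, 7]
daily_trips = [1, 3, 4, 6, 8]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one covariate."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(household_size, daily_trips)

# Predict for a one-person household, which lies below the smallest size sampled.
predicted = intercept + slope * 1
print(f"intercept={intercept:.2f}, slope={slope:.2f}, prediction={predicted:.2f}")
```

Because the fitted line is unbounded, the prediction for the one-person household falls below zero, which is not a meaningful number of trips.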

Alternatively, cross-classification/category analysis models have long been used to estimate the number of trips as a function of various household attributes (income, car ownership). However, category analysis models do not permit extrapolation beyond their strata, have no statistical goodness-of-fit measures and require large sample sizes, making them impractical and expensive (Rouphail et al., 2013; Ortúzar and Willumsen, 2011). Therefore, in order to close this gap, new model forms are increasingly being considered for trip generation. These models are based on discrete choice analysis and have been regarded as a major innovation in the transport field (Ben-Akiva and Lerman, 1985; Rouphail et al., 2013).
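As a rough sketch of how category analysis works, the example below computes a mean trip rate for each household category and applies those rates to a zone. The records, the income bands and the zone composition are all hypothetical, and `trip_rates` and `forecast` are illustrative helper names rather than functions from any cited study.

```python
from collections import defaultdict

# Hypothetical survey records: (income band, cars owned, daily household trips).
records = [
    ("low", 0, 2), ("low", 0, 4), ("low", 1, 5),
    ("high", 1, 6), ("high", 1, 8), ("high", 2, 9),
]


def trip_rates(rows):
    """Mean trips per household in each (income, cars) cell."""
    sums = defaultdict(lambda: [0, 0])  # cell -> [total trips, households]
    for income, cars, trips in rows:
        sums[(income, cars)][0] += trips
        sums[(income, cars)][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


def forecast(rates, zone_households):
    """Zone trip productions: households in each cell times the cell rate."""
    return sum(rates[cell] * n for cell, n in zone_households.items())


rates = trip_rates(records)
zone = {("low", 0): 100, ("high", 1): 50}  # hypothetical zone composition
total = forecast(rates, zone)
print(rates, total)

# A cell that was never observed has no rate, so it cannot be forecast.
print(("high", 0) in rates)
```

The missing `("high", 0)` cell shows why the method cannot extrapolate beyond its strata and why every cell needs enough sampled households.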

Logistic regression models, on the other hand, are discrete choice models which can take the form of binary logit models (where an individual will or will not make a trip), multinomial logit models (determination of the probability of making 1, 2, 3 or more trips), etc. In this way, one can examine the trip frequency of each individual or household (Hosmer and Lemeshow, 2000; Hosmer, 2013). Discrete choice analysis has previously been used for mode choice modelling but, quite recently, it has been considered for the determination of generation choice, where the frequency of daily person or household trips can be estimated. Choice behaviour can further be modelled using two approaches: aggregate, normally defined for a group of individuals (area, zone, household), or disaggregate, to describe individual decisions (Koppelman and Pas, 1984).

Some studies have recommended disaggregate models because they are inclined to retain the variance and interactive context of the particular response variable hence better results are expected especially in terms of transferability (Atherton and Ben-Akiva, 1976). More research has also indicated that disaggregate models are more efficient when the descriptive power of the model is required and also further preferred because of their independence from aggregate or zonal definitions (Downes, 1976; Wilmot, 1995; Hess et al., 2005).

Nevertheless, despite the various criticisms of aggregate models in terms of their inflexibility and inaccuracy, they are sometimes considered more favourable than disaggregate approaches. Disaggregate approaches are constrained by their inability to produce forecasts, and they have also been seen to require data which may not reasonably be forecast (Ortúzar and Willumsen, 2011). Furthermore, from the empirical test carried out by Badoe and Chen (2004) to determine the forecast performance of household and person trip generation models in Canada, the study concluded that the household was the preferred unit of analysis in trip generation modelling for prediction and comparison. In this study, a comparison between years is to be made; hence the household aggregate approach will be used.

2.2 Previous Trip Generation Studies

As earlier indicated in 2.1, trip production is a function of the socio-economic and demographic characteristics of the individual choice makers in a household or zone coupled with the land-use patterns of the study area and trip attributes. These explanatory variables differ based on the trip purpose and are keenly used to determine the degree to which these characteristics affect the trip maker’s propensity to make a trip (Hu, 2010).

Various studies have investigated the factors that affect travel, more specifically with regard to shopping, social and recreational trips, using different modelling approaches. These studies, however, have mostly been carried out in developed countries rather than developing countries. The key characteristics are further discussed in this section.

2.2.1 Developed Country Context

Income has been cited as a key determinant of trip generation. Research shows that an increase in household income increases the ability of members of that household to pay for journeys, hence a rise in the number of trips generated (Hu, 2010; Jang, 2005; Lim and Srinivasan, 2011). Jang (2005) adds that an increase in income encourages individuals to participate more in social life, hence increasing non-home-based trips.

Secondly, an increase in household size has also been cited as a major factor that increases the number of trips generated (Ortúzar and Willumsen, 2011). In a study carried out by Penn et al. (2008) on the National Household Travel Survey, USA, results showed that an increase in the number of adults in a household increases the number of shopping, social and recreational trips. This is largely because adults are responsible for undertaking the shopping in a home. The presence of children in a household was, on the other hand, regarded as a double-edged sword with regard to travel since it at times reduces the number of trips. In the same study, the authors are in agreement with Agyemang-Duah et al. (1995), indicating that an increase in the number of children in the home reduced shopping trips but registered as a utility for social and recreational trips. Agyemang-Duah et al. (1995) add that the number of children may act as a scale factor which could lead to an increase in shopping trips. Also, Lim and Srinivasan (2011), in their comparative analysis of alternative econometric approaches using the 2001 U.S. National Household Travel Survey data, concluded that children (0-5 years and 5-15 years) significantly increase the trip rate for Home-Based Other (HBO) and Non-Home-Based (NHB) trips but decrease the rate for Home-Based Work (HBW) trips.

Similarly, vehicle ownership has been regarded as having a significant influence on trip generation. Households with more cars have been seen to generate more trips. It has also been noted that households with a single car tend to use it more intensively; hence non-priority activities like shopping or socializing may not have access to the vehicle (El Pas, 1983; Penn et al., 2008). For instance, Hunt and Broadstock (2010) developed a trip generation model in the UK aimed at testing the dependence of trip-making patterns on car ownership for residential developments. The study concluded that car ownership has a positive effect on travel. Moreover, the authors link car ownership to other household attributes like income, household size and employment, in agreement with Wootton and Pick (1967), Dargay and Gately (1999) and Dargay et al. (2007). Other studies, such as Giuliano (2003), Giuliano and Dargay (2006) and Giuliano and Narayan (2003), investigated differences in travel patterns between the various socio-demographic groups in the USA and the UK. The results showed that the average number of trips in the USA is 4.4 trips/day compared to 3 trips/day in the UK. The authors attributed this behaviour to higher car ownership in the USA, which is driven by higher household incomes and hence increases the demand for travel. The studies also identified age and gender as key determinants of travel in these two countries. In the UK, participants aged 65 years and older were seen to travel 50% of the distance travelled by younger participants, versus 60% for those in the USA. The lower trip rate in the UK was also attributed to its high transport costs.

In addition to car ownership, the number of people with a driving licence in the household is also relevant. Studies were carried out in London (Schmöcker et al., 2005) and Canada (Mercado and Paez, 2008; Newbold et al., 2005) to investigate whether the number of driving licences increases the need for travel, mainly among the elderly and disabled. The studies concluded that car ownership and driving licence ownership are positively correlated and that they increase the need for travel among the socio-demographic groups tested. However, the study carried out by Vickerman and Barmby (1985) in the UK showed that car ownership has no automatic effect on shopping trips.

Also, the number of workers in the household has been identified as a significant factor in trip generation. Rouphail et al. (2013) used cumulative logistic models to estimate trip generation; the models showed that an increase in the number of workers in the household increases the rate of Home-Based Work (HBW) trips, whereas for Home-Based Other (HBO) trips it is considered a disutility.

Furthermore, gender has also been studied by various researchers as a key driver of travel. Polk (2004) in Sweden and Best and Lanzendorf (2005) in Cologne investigated the role of gender in generating trips. The latter study determined that women made fewer work trips but that their trip rate increased for shopping or childcare-related trips.

2.2.2 Developing Country Context

In developing countries, few studies have been carried out, mainly due to lack of data, unavailability of funds to carry out the surveys or lack of prioritization of the sector. However, progress has been registered in trip generation research, as described in this sub-section.

In Nigeria, Oyedepo & Makinde (2009) undertook a household trip generation study in Ado-Ekiti in which a linear regression model was developed. The survey mainly focussed on the collection of socio-economic characteristics in the area. The model aimed at estimating the number of trips generated per household per trip purpose (to school, to work, to shopping, to church, to home, etc.). The results showed that households with higher income and automobile availability make more trips than those with low income and less automobile accessibility. The study also showed that home-based other trips accounted for 52% of total trips, versus 31% and 17% for non-home-based and home-based work trips respectively. In addition, a study carried out by Srinivasan et al. (2007) in Chennai, India, revealed that vehicle ownership increases the rate of travel, since these vehicular trips are assumed to replace walk trips, mainly in developing countries.

Still in Nigeria, Okoko and Fasakin (2007) estimated predictive models to determine the relationship between residential density and trip rate in Akure town. The study observed differentials in trip rate, although these were found to be statistically insignificant. Moreover, Okoko (2008) carried out a study to determine women’s propensity for trip making in Akure. The study evaluated the factors that affect the frequency of all-purpose trips undertaken by women. The impacts were assessed using multiple regression models, and the study recommended the economic empowerment of women as key to increasing trip making.

Also, in Iraq, research was carried out in Dohuk with the aim of developing a trip production analysis model. The independent variables collected during the household interview survey included household size, car ownership, family income, number of workers in the home, employment status and age (persons > 6 years), whereas the dependent variables included the total number of trips by purpose (HBW, HBC, HBS, HBSH, HBO, NHB). Using the cross-classification model, the study concluded that family size and number of workers were the most effective independent variables in influencing trip rate (Al-Taei and Amal, 2006). Additionally, a study carried out in Chennai, India, a developing city, found that household members aged 6-17 years (school-going age) contribute to an increase in the number of trips made, although the level of significance is low (Srinivasan et al., 2007). The results of Al-Taei and Amal (2006) are consistent with the study undertaken by Moussa (2013) in Gaza City, which also concluded that household size is a key determinant of trip production. The author also cited number of licensed drivers, household income and vehicle ownership as key factors that influence trip production.

Using linear regression techniques, studies have been carried out in Al-Diwaniyah City and Baghdad, Iraq (Sofia et al., 2012; Sarsam and Al-Hassani, 2011), Yogyakarta in Indonesia (Priyanto and Friandi, 2010) and Kuwait (Said, 2010), and all of these conclude that household size, income, number of workers and car ownership increase the trip rate in the household. Srinivasan et al. (2007) went a step further and investigated whether the presence of female drivers in the household also contributes to an increased need for travel.

Moreover, Petterson and Schmöcker (2010), using the ordered probit regression method, analysed the travel patterns of the elderly (> 60 years) in Manila to determine the variation in trip frequency compared to London. The research indicated that there was a reduction in the total number of trips among the elderly in Manila. In a deeper analysis by trip purpose, the study illustrated that recreational trips in the developed-country context (London) are fairly constant compared to those in Manila.

2.3 Discrete Choice Models

Substantial work has been done for decades on choice models for cases where the alternatives considered are discrete and limited in number (Hensher and Johnson, 1981). The earlier work of Domencich and McFadden (1973) showed that discrete choice models mainly focussed on mode choice and less frequently on trip frequency.

However, quite recently, the application of these models in the transport sector has diversified, mainly into trip generation (Srinivasan et al., 2007; Penn et al., 2008; Koppelman and Bhat, 2006). These choice models aim at determining the relationship between trip-making levels and utility by expressing the utility of choosing an alternative in terms of its characteristics. The models can then explicitly represent an individual’s preference ordering. Levels of trip making are harder to evaluate unless a binary choice of whether or not to carry out a particular trip with specified characteristics is available.

Discrete choice models are mainly used to assess and forecast a decision maker’s choice of an alternative from a set of alternatives that is finite, mutually exclusive and exhaustive. These three characteristics are prerequisites for the set of alternatives, called the choice set, to fit within a discrete choice framework. As explained earlier in 2.1, the choices can be considered at either aggregate or disaggregate levels (Train, 2009; Koppelman and Bhat, 2006).

Generally discrete choice models assume that probability of an individual choice is dependent on the varying socioeconomic characteristics and relative desirability of that choice. Therefore, in order to represent the desirability of a particular alternative, the concept of utility is used. The utility theory specifies that the individual alternatives do not create utility, rather, it is derived from their characteristics or a combination of variables (Lancaster, 1966; Ben-Akiva and Lerman, 1985).

In transport, choice models are generally developed using random utility theory. Random utility theory assumes that an individual/decision maker chooses the alternative which gives the most satisfaction and hence provides the greatest utility (Train, 2009). Utility consists of a deterministic component and a random component, as expressed in equation 2.1.

$$ U_{nj} = V_{nj} + \varepsilon_{nj} \tag{2.1} $$

where $U_{nj}$ is the utility that decision maker $n$ obtains from alternative $j$; $V_{nj}$ (also known as representative utility) represents the observed attributes of the alternatives as encountered by the decision maker; and $\varepsilon_{nj}$ is the error term that includes the unobservable factors that affect utility but are not captured in $V_{nj}$. The error/random term is mainly influenced by omitted variables, variations in taste, instrumental variables and measurement errors (Train, 2009). Therefore the probability $P_{ni}$ that an individual $n$ will choose alternative $i$ is summarized in equation 2.2:

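$$ P_{ni} = \Pr(U_{ni} > U_{nj} \;\forall j \neq i) = \Pr(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj} \;\forall j \neq i) = \Pr(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj} \;\forall j \neq i) \tag{2.2} $$

In this case, $\varepsilon_{nj}$ is unknown for all $j$ and is therefore treated as random. With the joint density of the random vector $\varepsilon_n = (\varepsilon_{n1}, \ldots, \varepsilon_{nJ})$ denoted $f(\varepsilon_n)$, the probability of the decision maker’s choice can be determined as shown in equation 2.2. This probability is a cumulative distribution: it is the probability that each random term $\varepsilon_{nj} - \varepsilon_{ni}$ is less than the observed quantity $V_{ni} - V_{nj}$. Using the density function, the cumulative probability can therefore be rewritten as:

$$ P_{ni} = \int_{\varepsilon} I(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj} \;\forall j \neq i)\, f(\varepsilon_n)\, d\varepsilon_n \tag{2.3} $$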

If the expression in the parentheses is true, then, the indicator function, I(.) is equal to 1, otherwise equal to 0 (Train, 2009).
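The indicator-function integral in equation 2.3 can be approximated by simulation, which gives some intuition for what the choice probability means. The sketch below assumes three alternatives with hypothetical representative utilities and independent standard Gumbel error terms, and compares the simulated choice shares with the closed-form logit probabilities that this error assumption implies. The function names are illustrative only.

```python
import math
import random


def gumbel(rng):
    """Draw a standard Gumbel variate by inverting its distribution function."""
    u = rng.random()
    while u <= 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_shares(v, draws, seed=0):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(draws):
        utilities = [vj + gumbel(rng) for vj in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / draws for c in counts]


def logit_probs(v):
    """Closed-form logit probabilities exp(V_i) / sum_j exp(V_j)."""
    top = max(v)
    expv = [math.exp(vj - top) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]


v = [1.0, 0.5, 0.0]  # hypothetical representative utilities
simulated = simulate_shares(v, draws=20000)
analytic = logit_probs(v)
print(simulated)
print(analytic)
```

With enough draws the simulated shares approach the analytic probabilities, which is why the Gumbel assumption leads to the logit model without any need for numerical integration.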

More still, different discrete choice models can be achieved when this density function assumes different specifications, for example, assuming the unobserved part of utility having different distribution. Assuming that the unobserved term follows a Gumbel distribution or normal distribution, logit or probit models will be estimated respectively. This implies that discrete choice models can be used to deal with nominal and ordinal choices which are simply predicted by unordered and ordered response models respectively (Manski, 1973; Ben-Akiva and Lerman, 1985; Train, 2009).

Trip frequency models have evolved over the years from linear regression (Monzon et al., 1989; Barmby & Doornik, 1989; Paez et al., 2006) and category analysis (Said and Young, 1990; Said et al., 1991; Guevera and Thomas, 2007) to ordered or unordered response models. Trip frequency studies have used both ordered (Vickerman and Barmby, 1984; Srinivasan, 2007; Schmöcker et al., 2005) and unordered (Schmöcker et al., 2006; Penn et al., 2008; Daly, 1997; Daly and Miller, 2006) response models. Since three trip purposes are being investigated in this study, it is necessary to consider model forms that accommodate multi-response variables, that is, dependent variables which can take more than two categories. These include multinomial logit, multinomial probit, and ordered logit or probit models (Penn et al., 2008).

2.3.1 Ordered Response Models

Modelling trip frequency as ordered choices assumes that there is a correlation between the alternatives for each trip purpose. Ordered alternatives further assume that each alternative is similar to those close to it and progressively less similar to those further away. Train (2009) adds that unordered response models like nested logit, mixed logit or probit could be used since they also account for correlation among alternatives. However, this might be complex since the specification of these models associates a utility with each alternative.

Ordered response models can be logit or probit. These two differ in terms of the distribution of the error terms: logit assumes a Gumbel distribution and probit assumes a normal distribution. Long (1997) suggests that the choice between logit and probit depends on the preference of the researcher since both models give very similar results. Schmöcker et al. (2005) used ordered probit models to predict trip making according to trip purpose for both elderly and disabled people in London, with trip frequency as the latent variable. The trips investigated included shopping trips, personal trips, work trips and recreational trips; however, social trips were not included in that study. Paez et al. (2006) also point out that treating trip frequency as a set of mutually exclusive and exhaustive alternatives incorporates upper and lower limits called thresholds; these will be explained in detail in the methodology. Though the Paez study examined the socio-demographic factors that affect travel, the dependent variable was the mean distance travelled. Generally, to the best of the author’s knowledge, no previous studies have investigated non-commute trips (shopping, social and recreational trips) in developing countries using ordered models.

The major critique of ordered distributions, however, is that the different categories of the dependent variables are examined simultaneously with respect to the independent variable implying that it may not be feasible to have different sets of estimators for the changing dependent variable values (Drucker and Khattak, 2000).
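A minimal sketch of an ordered probit may help to show both the role of the thresholds and the critique above. It assumes the usual formulation in which a single latent index (a linear combination of the explanatory variables) is compared with ascending thresholds; the index values and thresholds used here are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordered_probit_probs(index, thresholds):
    """Category probabilities for a latent index and ascending thresholds.

    Category k is chosen when the latent variable (index + normal error)
    falls between threshold k-1 and threshold k.
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [normal_cdf(c - index) for c in cuts]
    return [cdf[k + 1] - cdf[k] for k in range(len(cuts) - 1)]


thresholds = [0.0, 1.0, 2.0]  # hypothetical cut points for 0, 1, 2 and 3+ trips
probs_low = ordered_probit_probs(0.2, thresholds)   # household with a low index
probs_high = ordered_probit_probs(1.5, thresholds)  # household with a high index
print(probs_low)
print(probs_high)
```

Only one index enters every category probability, so a variable that raises the index shifts probability towards higher trip counts across the board; this is the sense in which separate coefficient sets per category are not available.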

2.3.2 Unordered Response Models

On the other hand, unordered models can also be used to model trip frequencies. Unlike ordered models, unordered models are able to assess the different categories of the dependent variable separately and allow flexibility in the choice of explanatory variables for each category. The author therefore also intends to apply the Multinomial Logit (MNL) model, based on its ability to consider multi-response dependent variables as well as to address some of the shortcomings of the ordered models (Penn et al., 2008). Daly (1997) adds that a model using the logit form is quite suitable for estimating the number of trips by determining the probability that one could make a trip. These models represent one’s trip-making ability.

Train (2009) explains that three factors elucidate the strength of logit models and also delineate their limitations: taste variation, substitution patterns and repeated choices. Logit models can represent taste variation that is related to observed characteristics of the decision maker, but cannot represent random taste variation, that is, differences in taste that cannot be linked to observed characteristics. Regarding substitution patterns, logit models imply proportional substitution across alternatives, a property known as independence from irrelevant alternatives (IIA), which follows from the assumption that the error terms are independently distributed across alternatives. This IIA assumption can lead to unrealistic predictions when some alternatives are close substitutes. Finally, logit models can capture the dynamics of repeated choices only if the unobserved factors are independent over repeated choice situations; they cannot handle unobserved factors that are correlated over time.
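The IIA property can be demonstrated numerically. The sketch below uses hypothetical utilities for two modes and then adds a third alternative identical to one of them; the labels `car`, `bus` and `second_bus` are illustrative and do not come from the studies cited.

```python
import math


def mnl_probs(utilities):
    """Multinomial logit probabilities from a dict of representative utilities."""
    top = max(utilities.values())
    expv = {name: math.exp(v - top) for name, v in utilities.items()}
    total = sum(expv.values())
    return {name: e / total for name, e in expv.items()}


base = {"car": 1.0, "bus": 0.0}
extended = {"car": 1.0, "bus": 0.0, "second_bus": 0.0}  # duplicate of bus

p_base = mnl_probs(base)
p_ext = mnl_probs(extended)

ratio_base = p_base["car"] / p_base["bus"]
ratio_ext = p_ext["car"] / p_ext["bus"]
print(ratio_base, ratio_ext)
print(p_base["car"], p_ext["car"])
```

The ratio of car to bus probabilities is unchanged by the new alternative, so the duplicate draws share from car as well as from bus. Intuitively, a duplicate bus service should mostly split the existing bus share, which is the kind of unrealistic prediction the IIA assumption can produce.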

2.3.3 Discrete Choice Model Estimation Methods

Once the likelihood function of the model has been determined, statistical inferences about the population can be made, meaning that the probability distribution underlying the data can be identified. Since different parameter values imply different probability distributions, it is vital to find the parameter values that correspond to the appropriate probability distribution. One of the most widely used methods of statistical inference is Maximum Likelihood Estimation (MLE), which estimates the parameter values that maximise the likelihood (probability) of observing the sample data (Myung, 2003; Manski and McFadden, 1981).

The method stipulates that the probability of a decision maker $n$ choosing the alternative that was actually observed to be chosen can be expressed in terms of the choice probabilities $P_{ni}$:

$$ \prod_{i} (P_{ni})^{y_{ni}} \tag{2.4} $$

In this case, $y_{ni}$ is equal to 1 if the decision maker chooses alternative $i$, and zero otherwise. This implies that for each unchosen alternative, $y_{ni} = 0$ and the factor $(P_{ni})^{y_{ni}}$ is equal to 1 (any number raised to the power zero). Expression 2.4 therefore reduces to the probability of the selected alternative.

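Furthermore, assuming that the choice made by each decision maker $n$ is independent of every other decision maker’s choice, the probability of each decision maker in the sample choosing the alternative that they were observed to choose is given by equation 2.5:

$$ L(\beta) = \prod_{n=1}^{N} \prod_{i} (P_{ni})^{y_{ni}} \tag{2.5} $$

where $\beta$ represents the parameters of the model, $N$ is the number of sampled decision makers and $L(\beta)$ is the likelihood function. However, because the likelihood function is a product of probabilities and can take very small values, logarithms are taken on both sides of the function, transforming equation 2.5 into equation 2.6, where $LL(\beta)$ is the log-likelihood:

$$ LL(\beta) = \sum_{n=1}^{N} \sum_{i} y_{ni} \ln P_{ni} \tag{2.6} $$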

The log-likelihood is usually concave when representative utility is linear in parameters, although representative utility can also be specified as non-linear in parameters. In either case, the maximum likelihood estimate is characterised by the condition that, at the maximum, the derivative of the log-likelihood with respect to the parameters is zero (equation 2.7):

$$ \frac{dLL(\beta)}{d\beta} = 0 \tag{2.7} $$
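The estimation steps described in this section can be sketched for the simplest case, a binary logit with two alternatives (trip or no trip), one covariate and an intercept. The data are hypothetical, and Newton-Raphson iteration is used here as one common way of finding the point where the derivative of the log-likelihood is zero; it is an assumption of this sketch, not a method prescribed by the cited sources.

```python
import math

# Hypothetical sample: household size (x) and whether a trip was made (y).
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]


def prob_trip(b0, b1, xi):
    """Binary logit probability of making a trip."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def log_likelihood(b0, b1):
    """Sum over households of the log probability of the observed outcome."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = prob_trip(b0, b1, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def gradient(b0, b1):
    """Derivative of the log-likelihood with respect to b0 and b1."""
    g0 = sum(yi - prob_trip(b0, b1, xi) for xi, yi in zip(x, y))
    g1 = sum((yi - prob_trip(b0, b1, xi)) * xi for xi, yi in zip(x, y))
    return g0, g1


def newton_step(b0, b1):
    """One Newton-Raphson update using the 2x2 negative Hessian."""
    g0, g1 = gradient(b0, b1)
    w = [prob_trip(b0, b1, xi) * (1.0 - prob_trip(b0, b1, xi)) for xi in x]
    a = sum(w)
    b = sum(wi * xi for wi, xi in zip(w, x))
    c = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = a * c - b * b
    return b0 + (c * g0 - b * g1) / det, b1 + (a * g1 - b * g0) / det


b0, b1 = 0.0, 0.0
for _ in range(25):
    b0, b1 = newton_step(b0, b1)

print(b0, b1, log_likelihood(b0, b1), gradient(b0, b1))
```

At the final estimates the gradient is numerically zero and the log-likelihood is higher than at the starting values, which is the first-order condition the section refers to.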

2.5 Model Specification Testing

This section reflects on estimation methods of discrete choice models, coupled with the goodness-of-fit indicators to be used in the model. Model specification testing is necessary to improve estimation efficiency and consequently reduce bias. The most immediate model estimation outputs are the signs of the estimated coefficients and their significance. When more than one specification is estimated, it is also useful to compare goodness-of-fit measures, as further explained in the subsequent sections (Ben-Akiva and Lerman, 1985; Train, 2009; Ortúzar and Willumsen, 2011).

2.5.1 The Likelihood Ratio test

The likelihood ratio test compares the log-likelihood of the estimated model with that of a constrained (restricted) model. A common restriction is equating all the model parameters to zero (Train, 2009). Equation 2.8 shows the likelihood ratio test.

$$ LR = -2\,[LL(\hat{\beta}^{R}) - LL(\hat{\beta})] \tag{2.8} $$

where $LR$ is the likelihood ratio statistic, and $LL(\hat{\beta})$ and $LL(\hat{\beta}^{R})$ represent the maximum value of the log-likelihood for the estimated and constrained models respectively.

The likelihood ratio statistic follows a chi-squared distribution, with degrees of freedom equal to the number of restrictions, and is compared with the corresponding critical value. The test considers the null hypothesis that the restrictions hold (here, that all parameters are equal to zero); if the LR value is greater than the critical value, the null hypothesis is rejected.
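The test can be sketched as a short calculation. It assumes the usual form of the statistic, minus two times the difference between the constrained and estimated log-likelihoods. The log-likelihood values and the number of restrictions are hypothetical, and the critical values are the standard 5% chi-squared values for one to three degrees of freedom.

```python
# Standard 5% critical values of the chi-squared distribution by degrees of freedom.
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}


def likelihood_ratio(ll_restricted, ll_estimated):
    """LR statistic: -2 * (constrained log-likelihood - estimated log-likelihood)."""
    return -2.0 * (ll_restricted - ll_estimated)


ll_restricted = -120.0  # hypothetical constrained model
ll_estimated = -110.5   # hypothetical estimated model
restrictions = 2        # hypothetical number of parameters set to zero

lr = likelihood_ratio(ll_restricted, ll_estimated)
reject_null = lr > CHI2_CRITICAL_5PCT[restrictions]
print(lr, reject_null)
```

In this illustration the statistic exceeds the critical value, so the hypothesis that the restricted parameters are zero would be rejected at the 5% level.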

2.5.2 Likelihood ratio index

This is often used in discrete choice models to measure how well the models fit the data. The statistic measures how well the model, with its estimated parameters, performs in comparison with a model in which all the parameters are zero. The comparison is made on the basis of the log-likelihood function, evaluated at both the estimated parameters and at zero for all parameters (Train, 2009). The likelihood ratio index is given as:

$$ \rho^{2} = 1 - \frac{LL(\hat{\beta})}{LL(0)} \tag{2.9} $$

where $LL(\hat{\beta})$ is the value of the log-likelihood at the estimated parameters and $LL(0)$ is the log-likelihood value with all parameters equal to zero. The likelihood ratio index ranges from zero to one, with one showing that the estimated parameters perfectly predict the choices of the sampled decision makers and zero indicating that the estimated parameters are no better than the zero parameters (Train, 2009).

2.5.3 Rho-square test

According to Ben-Akiva and Lerman (1985), an increase in the number of explanatory variables can lead to an improvement of the likelihood ratio index. However, this may create a model with many explanatory variables, making it inefficient. Therefore, the likelihood ratio index is commonly adjusted to eliminate the influence of the number of explanatory variables on model goodness-of-fit, as shown in equation 2.10.

$$ \bar{\rho}^{2} = 1 - \frac{LL(\hat{\beta}) - K}{LL(0)} \tag{2.10} $$

The test value $\bar{\rho}^{2}$ ranges between 0 and 1. $LL(\hat{\beta})$ and $LL(0)$ are the log-likelihood values at the estimated parameters and at zero parameters respectively, and $K$ is the number of explanatory parameters in the model.
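Both goodness-of-fit measures can be computed directly from the log-likelihood values. The sketch below assumes the standard forms, 1 - LL(estimated)/LL(0) for the likelihood ratio index and 1 - (LL(estimated) - K)/LL(0) for the adjusted index; the log-likelihood values and the parameter count are hypothetical.

```python
def likelihood_ratio_index(ll_estimated, ll_zero):
    """Likelihood ratio index: 1 - LL(estimated) / LL(0)."""
    return 1.0 - ll_estimated / ll_zero


def adjusted_likelihood_ratio_index(ll_estimated, ll_zero, n_params):
    """Adjusted index, penalising the number of estimated parameters K."""
    return 1.0 - (ll_estimated - n_params) / ll_zero


ll_zero = -120.0       # hypothetical log-likelihood with all parameters zero
ll_estimated = -90.0   # hypothetical log-likelihood at the estimates
n_params = 3           # hypothetical number of parameters K

rho2 = likelihood_ratio_index(ll_estimated, ll_zero)
rho2_adj = adjusted_likelihood_ratio_index(ll_estimated, ll_zero, n_params)
print(rho2, rho2_adj)
```

The adjusted value is lower than the unadjusted one, so adding a variable improves the adjusted index only if it raises the log-likelihood by more than one unit per added parameter.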

2.6 Summary of the Literature